[GitHub] spark pull request: [Minor] [Doc] [ML] ml.clustering scala & pytho...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/13291#discussion_r64699243 --- Diff: python/pyspark/ml/clustering.py --- @@ -64,6 +64,21 @@ class GaussianMixture(JavaEstimator, HasFeaturesCol, HasPredictionCol, HasMaxIte .. note:: Experimental GaussianMixture clustering. +This class performs expectation maximization for multivariate Gaussian +Mixture Models (GMMs). A GMM represents a composite distribution of +independent Gaussian distributions with associated "mixing" weights +specifying each's contribution to the composite. + +Given a set of sample points, this class will maximize the log-likelihood +for a mixture of k Gaussians, iterating until the log-likelihood changes by +less than convergenceTol, or until it has reached the max number of iterations. +While this process is generally guaranteed to converge, it is not guaranteed +to find a global optimum. + +Note: For high-dimensional data (with many features), this algorithm may perform poorly. --- End diff -- super minor: This formats oddly in Sphinx. To match the Scala doc format-wise, I think you could drop the indentation for the sentences under note, or if you wanted to do a PyDoc note callout you could use the `.. Note::` syntax. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
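To make the stopping rule in the docstring above concrete, here is a minimal 1-D sketch of expectation maximization in plain Python. It is not PySpark's implementation (which delegates to the JVM and fits multivariate Gaussians); the function names, the quantile-based initialization and the variance floor are assumptions made purely for illustration.

```python
import math


def _normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(points, k, max_iter=100, convergence_tol=0.01):
    """Fit a 1-D mixture of k Gaussians with expectation maximization.

    Iterates until the log-likelihood changes by less than convergence_tol
    or max_iter iterations have run. Returns
    (weights, means, variances, log_likelihood, iterations).
    """
    n = len(points)
    ordered = sorted(points)
    # Hypothetical initialization: spread the means over quantiles of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(points) / n
    overall_var = sum((x - overall_mean) ** 2 for x in points) / n or 1.0
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    prev_ll = -math.inf
    log_likelihood = prev_ll
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # E-step: responsibility of each component for each point, plus the
        # log-likelihood of the current parameters.
        resp = []
        log_likelihood = 0.0
        for x in points:
            dens = [w * _normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            log_likelihood += math.log(total)
            resp.append([d / total for d in dens])

        # M-step: re-estimate mixing weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, points)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, points)) / nj
            variances[j] = max(var_j, 1e-6)  # floor to avoid a collapsing component

        # Stop once the log-likelihood has essentially stopped changing.
        if abs(log_likelihood - prev_ll) < convergence_tol:
            break
        prev_ll = log_likelihood

    return weights, means, variances, log_likelihood, iterations
```

Standard EM never decreases the log-likelihood from one iteration to the next, so the tolerance check eventually fires; which fixed point it lands on still depends on the initialization, which is why the docstring only promises convergence and not a global optimum.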
[GitHub] spark pull request: [Minor] [Doc] [ML] ml.clustering scala & pytho...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/13291#discussion_r64699018 --- Diff: python/pyspark/ml/clustering.py --- @@ -227,15 +242,15 @@ class KMeans(JavaEstimator, HasFeaturesCol, HasPredictionCol, HasMaxIter, HasTol .. versionadded:: 1.5.0 """ -k = Param(Params._dummy(), "k", "number of clusters to create", +k = Param(Params._dummy(), "k", "The number of clusters to create. Must be > 1.", typeConverter=TypeConverters.toInt) initMode = Param(Params._dummy(), "initMode", - "the initialization algorithm. This can be either \"random\" to " + + "The initialization algorithm. This can be either \"random\" to " + "choose random points as initial cluster centers, or \"k-means||\" " + "to use a parallel variant of k-means++", typeConverter=TypeConverters.toString) -initSteps = Param(Params._dummy(), "initSteps", "steps for k-means initialization mode", - typeConverter=TypeConverters.toInt) +initSteps = Param(Params._dummy(), "initSteps", "The number of steps for k-means|| " + --- End diff -- Since we're copying this over, we might as well also include "This is an advanced setting -- the default of 5 is almost always enough." from the Scala side?
[GitHub] spark pull request: [SPARK-15543][SQL] Rename DefaultSources to ma...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13311
[GitHub] spark pull request: [SPARK-15543][SQL] Rename DefaultSources to ma...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13311#issuecomment-221792095 Merging in master/2.0.
[GitHub] spark pull request: [MINOR] Fix Typos
Github user zhengruifeng commented on the pull request: https://github.com/apache/spark/pull/13317#issuecomment-221792040 @holdenk Thanks. I think you are right. I will revert `an one-xxx` to `a one-xxx`.
[GitHub] spark pull request: [SPARK-15391] [SQL] manage the temporary memor...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13318#issuecomment-221791831 FYI, regarding "This PR also change the loadFactor of BytesToBytesMap to 0.5 (it was 0.75)": this is a pretty low load factor.
[GitHub] spark pull request: [SPARK-15457][MLLIB][ML] Eliminate some warnin...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/13314#issuecomment-221791580 @srowen willing to help with that too btw :)
[GitHub] spark pull request: [SPARK-15391] [SQL] manage the temporary memor...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13318#issuecomment-221791386 Can we add a unit test for this behavior?
[GitHub] spark pull request: [WIP] [SPARK-8426] Enhance Blacklist mechanism...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/13234#discussion_r64698507 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -249,10 +249,16 @@ private[spark] class TaskSchedulerImpl( availableCpus: Array[Int], tasks: Seq[ArrayBuffer[TaskDescription]]) : Boolean = { var launchedTask = false +// TODO unit test, and also add executor-stage filtering as well +// This is an optimization -- the taskSet might contain a very long list of pending tasks. +// Rather than wasting time checking the offer against each task, and then realizing the +// executor is blacklisted, just filter out the bad executor immediately. +val nodeBlacklist = taskSet.blacklistTracker.map{_.nodeBlacklistForStage(taskSet.stageId)} + .getOrElse(Set()) --- End diff -- Before this change, there is an `O(n^2)` (where `n` is the number of pending tasks) cost when you've got one bad executor. The tasks assigned to the bad executor fail, but then we get another resource offer for the bad executor. So we find another task for the bad executor, it fails, and we continue the process, going through all of the pending tasks. Each time we respond to the resource offer, we need to (a) iterate through the list of tasks to find one that is *not* blacklisted and (b) then remove it from the task list. Those are both `O(1)` operations when there isn't any blacklisting -- we just pop the last task off the stack. But as our bad executor makes its way through the tasks, it has to go deeper into the list each time, and both searching the list and then removing an element from it become expensive. After we've gone through *all* of the tasks for the bad executor once, we will wait for there to be resource offers from good executors. However, even though we then start scheduling on the good executors, scheduling as a whole is still much slower, because we still have an `O(n)` cost at each call to resourceOffer.
The offer still includes the (now idle) bad executor, and we have to iterate through the entire list of pending tasks to decide that nope, there aren't any tasks we can schedule on that node. In my performance tests with a 3k-task job, this leads to about a 10x slowdown, but obviously this depends a lot on the number of tasks. But that is the really scary thing -- it's not a function of how many bad nodes you have, but of how many tasks you are trying to run. So on a large cluster, where a bad node is more likely and lots of tasks are more likely, the slowdown will be much worse. Note that as implemented in this version of the patch, this slowdown is only avoided when we blacklist the entire node. But we should add blacklisting for an executor as well, to avoid the slowdown in that case also.
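The quadratic cost described above can be reproduced with a toy model. This sketch is not Spark scheduler code: it assumes pending tasks form a stack that is scanned from the top, that a task dispatched to the bad executor fails and is appended back onto the stack, and that `filter_offers` (a hypothetical helper) only mirrors the idea of computing `nodeBlacklist` once per task set, as in the diff.

```python
def bad_executor_scan_cost(num_tasks):
    """Count task checks made while one bad executor keeps receiving offers.

    Simplified model: pending tasks form a stack (the last element is tried
    first); a task dispatched to the bad executor fails, is marked as
    blacklisted there, and is appended back onto the stack.
    Returns the total number of checks until a full scan finds nothing.
    """
    pending = list(range(num_tasks))
    failed_on_bad = set()
    checks = 0
    while True:
        chosen = None
        # Scan from the top of the stack for a task not yet blacklisted here.
        for i in range(len(pending) - 1, -1, -1):
            checks += 1
            if pending[i] not in failed_on_bad:
                chosen = i
                break
        if chosen is None:
            # Every pending task is blacklisted on this executor, yet each later
            # offer would still pay this full O(n) scan.
            return checks
        task = pending.pop(chosen)  # removal from the middle is itself O(n)
        failed_on_bad.add(task)     # the task fails on the bad executor...
        pending.append(task)        # ...and goes back onto the stack


def filter_offers(offers, node_blacklist):
    """Drop offers from blacklisted hosts before any per-task scan happens.

    `offers` is a list of (executor_id, host) pairs; this mirrors the idea of
    the `nodeBlacklist` check in the diff, not its actual code.
    """
    return [(executor, host) for executor, host in offers if host not in node_blacklist]
```

In this model the bad executor costs n(n+1)/2 + n checks, so going from 3,000 to 30,000 pending tasks multiplies the work by roughly 100 whether there is one bad node or several; these are counts from the toy model, not measurements from Spark.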
[GitHub] spark pull request: [MINOR] Fix Typos
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/13317#issuecomment-221791122 Also, your change seems to have made a few odd edits: "an one way" sounds odd; generally "a one way" is considered to sound "better" (I'm a bit fuzzy on the exact rule, but if you look you'll see people say "a one way ticket" instead of "an one way ticket", and some other similar things).
[GitHub] spark pull request: [SPARK-15543][SQL] Rename DefaultSources to ma...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13311#issuecomment-221791056 Merged build finished. Test PASSed.
[GitHub] spark pull request: [MINOR] Fix Typos
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13317#issuecomment-221791117 **[Test build #59352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59352/consoleFull)** for PR 13317 at commit [`230c801`](https://github.com/apache/spark/commit/230c80148cdcd29242fa8fb828ca12ec8c402221).
[GitHub] spark pull request: [SPARK-15543][SQL] Rename DefaultSources to ma...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13311#issuecomment-221791057 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59343/ Test PASSed.
[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and exa...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/13176#discussion_r64698276 --- Diff: docs/ml-features.md --- @@ -145,9 +148,11 @@ for more details on the API. passed to other algorithms like LDA. During the fitting process, `CountVectorizer` will select the top `vocabSize` words ordered by - term frequency across the corpus. An optional parameter "minDF" also affects the fitting process + term frequency across the corpus. An optional parameter `minDF` also affects the fitting process by specifying the minimum number (or fraction if < 1.0) of documents a term must appear in to be - included in the vocabulary. + included in the vocabulary. Another optional binary toggle parameter controls the output vector. --- End diff -- You haven't addressed my previous comment on this part, both here and in `HashingTF`: Let's make this consistent with the doc for HashingTF above. I'd prefer both to read: "... optional parameter binary controls the output term frequencies. When set to true, all nonzero term frequencies are set to 1. This is especially useful for discrete probabilistic models that model binary, rather than integer, counts."
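The fitting rules in the doc text above (`vocabSize`, `minDF` as a count or a fraction, and the `binary` toggle) can be modeled in a few lines of plain Python. This is only a sketch under assumptions: Spark's `CountVectorizer` works on DataFrame columns and produces sparse vectors, while `fit_vocabulary` and `transform` here are hypothetical helpers, and the alphabetical tie-breaking is an arbitrary choice of this sketch.

```python
from collections import Counter


def fit_vocabulary(docs, vocab_size, min_df=1.0):
    """Pick up to vocab_size terms ordered by corpus-wide term frequency.

    min_df >= 1.0 is a minimum document count; min_df < 1.0 is a fraction
    of the number of documents. Ties are broken alphabetically (an assumption).
    """
    threshold = min_df * len(docs) if min_df < 1.0 else min_df
    term_freq = Counter()
    doc_freq = Counter()
    for doc in docs:
        term_freq.update(doc)
        doc_freq.update(set(doc))  # count each term at most once per document
    eligible = [t for t in term_freq if doc_freq[t] >= threshold]
    eligible.sort(key=lambda t: (-term_freq[t], t))
    return eligible[:vocab_size]


def transform(doc, vocabulary, binary=False):
    """Map a token list to a dense term-frequency vector over the vocabulary.

    With binary=True, every nonzero count is replaced by 1.0.
    """
    index = {term: i for i, term in enumerate(vocabulary)}
    vec = [0.0] * len(vocabulary)
    for term in doc:
        if term in index:
            vec[index[term]] += 1.0
    if binary:
        vec = [1.0 if count != 0 else 0.0 for count in vec]
    return vec
```

For example, with the documents `["a", "b", "c", "a"]`, `["a", "b", "b"]` and `["a", "d"]`, a `min_df` of 2 keeps only `a` and `b`, and the first document becomes `[2.0, 1.0]`, or `[1.0, 1.0]` with `binary=True`.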
[GitHub] spark pull request: [SPARK-15543][SQL] Rename DefaultSources to ma...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13311#issuecomment-221790919 **[Test build #59343 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59343/consoleFull)** for PR 13311 at commit [`94d6e7b`](https://github.com/apache/spark/commit/94d6e7b218e0a969b41f32bd61878cf890c3ba99). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [MINOR] Fix Typos
Github user zhengruifeng commented on the pull request: https://github.com/apache/spark/pull/13317#issuecomment-221790963 @holdenk Thanks. I have fixed this and run `lint-java` to check the Java files.
[GitHub] spark pull request: [SPARK-15552][SQL] Remove unnecessary private[...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13319#issuecomment-221790817 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59349/ Test FAILed.
[GitHub] spark pull request: [SPARK-15552][SQL] Remove unnecessary private[...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13319#issuecomment-221790798 **[Test build #59349 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59349/consoleFull)** for PR 13319 at commit [`8d1958e`](https://github.com/apache/spark/commit/8d1958e6e0ded35fa29282aa35da548a059f15fe). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15100][DOC] Modified user guide and exa...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/13176#discussion_r64698136 --- Diff: docs/ml-features.md --- @@ -53,7 +53,10 @@ collisions, where different raw features may become the same term after hashing. chance of collision, we can increase the target feature dimension, i.e. the number of buckets of the hash table. Since a simple modulo is used to transform the hash function to a column index, --- End diff -- I think we can add it, but we can simply say "The hash function used is MurmurHash 3".
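The "simple modulo" mapping discussed in this diff can be sketched as follows. This is an illustration of the hashing trick, not Spark's `HashingTF`: `zlib.crc32` is a deliberate stand-in for MurmurHash 3 (it yields different indices), and `term_index` and `hashing_tf` are hypothetical helper names.

```python
import zlib


def term_index(term, num_features):
    """Map a term to a column index with the hashing trick.

    zlib.crc32 is only a stand-in hash; the review thread says Spark's
    HashingTF uses MurmurHash 3, which produces different indices.
    """
    h = zlib.crc32(term.encode("utf-8"))  # unsigned 32-bit value
    return h % num_features


def hashing_tf(doc, num_features):
    """Sparse term-frequency vector as a dict {column index: count}."""
    vec = {}
    for term in doc:
        i = term_index(term, num_features)
        vec[i] = vec.get(i, 0.0) + 1.0  # colliding terms share a column
    return vec
```

With a power-of-two `num_features`, the modulo simply keeps the low-order bits of the hash (`h % 16 == h & 15` for non-negative `h`), so every column receives exactly the same share of the 2^32 possible hash values; any other modulus leaves a small remainder bias. Shrinking `num_features` makes collisions more likely: with a single column, every term lands in index 0.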
[GitHub] spark pull request: [SPARK-15552][SQL] Remove unnecessary private[...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13319#issuecomment-221790816 Merged build finished. Test FAILed.
[GitHub] spark pull request: [MINOR] Fix Typos
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/13317#discussion_r64698017 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -105,7 +105,7 @@ private[spark] abstract class MapOutputTracker(conf: SparkConf) extends Logging } } - /** Send a one-way message to the trackerEndpoint, to which we expect it to reply with true. */ + /** Send an one-way message to the trackerEndpoint, to which we expect it to reply with true. */ --- End diff -- I don't think this change is correct.
[GitHub] spark pull request: [MINOR] Fix Typos
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/13317#issuecomment-221790462 So it seems that in a few places adding the extra character has pushed the line over the 100-character limit. You should probably run the linter explicitly (`./dev/lint-scala`) if you have it disabled by default.
[GitHub] spark pull request: [SPARK-15551][MINOR][DOCS][SQL] Replace groupB...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13316#issuecomment-221790073 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59348/ Test FAILed.
[GitHub] spark pull request: [SPARK-15551][MINOR][DOCS][SQL] Replace groupB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13316#issuecomment-221790065 **[Test build #59348 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59348/consoleFull)** for PR 13316 at commit [`325a2ea`](https://github.com/apache/spark/commit/325a2ea5fb9de05f866aa4eab56dea5563223712). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15551][MINOR][DOCS][SQL] Replace groupB...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13316#issuecomment-221790072 Merged build finished. Test FAILed.
[GitHub] spark pull request: [MINOR] Fix Typos
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13317#issuecomment-221789832 Merged build finished. Test FAILed.
[GitHub] spark pull request: [MINOR] Fix Typos
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13317#issuecomment-221789830 **[Test build #59351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59351/consoleFull)** for PR 13317 at commit [`cff3aa8`](https://github.com/apache/spark/commit/cff3aa81f2417ff5bc0d1e7bf205ed2ff5a8eb7f). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15552][SQL] Remove unnecessary private[...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13319#issuecomment-221789707 cc @cloud-fan @andrewor14
[GitHub] spark pull request: [MINOR] Fix Typos
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13317#issuecomment-221789833 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59351/ Test FAILed.
[GitHub] spark pull request: [SPARK-15552][SQL] Remove unnecessary private[...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13319#discussion_r64697510 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala --- @@ -178,12 +180,14 @@ class SparkSession private( def udf: UDFRegistration = sessionState.udf /** + * :: Experimental :: * Returns a [[ContinuousQueryManager]] that allows managing all the * [[org.apache.spark.sql.ContinuousQuery ContinuousQueries]] active on `this`. * * @group basic * @since 2.0.0 */ + @Experimental --- End diff -- this is a "bug" fix
[GitHub] spark pull request: [SPARK-15391] [SQL] manage the temporary memor...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13318#issuecomment-221789570 cc @ericl
[GitHub] spark pull request: [SPARK-15391] [SQL] manage the temporary memor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13318#issuecomment-221789635 **[Test build #59350 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59350/consoleFull)** for PR 13318 at commit [`6d074f6`](https://github.com/apache/spark/commit/6d074f6e3ad41f427e6dcb9f5a72674798a40b5e).
[GitHub] spark pull request: [SPARK-15552][SQL] Remove unnecessary private[...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/13319

[SPARK-15552][SQL] Remove unnecessary private[sql] methods in SparkSession

## What changes were proposed in this pull request?

SparkSession has a list of unnecessary private[sql] methods. These methods cause some trouble because private[sql] doesn't apply in Java. In cases where they are easy to remove, we can simply remove them. This patch does that.

## How was this patch tested?

Updated test cases to reflect the changes.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-15552

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13319.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13319

commit 8d1958e6e0ded35fa29282aa35da548a059f15fe
Author: Reynold Xin
Date: 2016-05-26T06:36:05Z

    [SPARK-15552][SQL] Remove unnecessary private[sql] methods in SparkSession
[GitHub] spark pull request: [SPARK-15552][SQL] Remove unnecessary private[...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13319#issuecomment-221789649 **[Test build #59349 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59349/consoleFull)** for PR 13319 at commit [`8d1958e`](https://github.com/apache/spark/commit/8d1958e6e0ded35fa29282aa35da548a059f15fe).
[GitHub] spark pull request: [MINOR] Fix Typos
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13317#issuecomment-221789634 **[Test build #59351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59351/consoleFull)** for PR 13317 at commit [`cff3aa8`](https://github.com/apache/spark/commit/cff3aa81f2417ff5bc0d1e7bf205ed2ff5a8eb7f).
[GitHub] spark pull request: [SPARK-15391] [SQL] manage the temporary memor...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13318 [SPARK-15391] [SQL] manage the temporary memory of timsort ## What changes were proposed in this pull request? Currently, the memory for the temporary buffer used by TimSort is always allocated on-heap without bookkeeping, which could cause an OOM in both on-heap and off-heap mode. This PR tries to manage that memory by preallocating it together with the pointer array, as is already done for RadixSort. This works in both on-heap and off-heap mode. This PR also changes the loadFactor of BytesToBytesMap to 0.5 (it was 0.75), which enables us to use radix sort and also makes sure that we have enough memory for timsort. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/davies/spark fix_timsort Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13318.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13318 commit 6d074f6e3ad41f427e6dcb9f5a72674798a40b5e Author: Davies Liu Date: 2016-05-26T06:29:09Z manage the temporary memory of timsort
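The load-factor argument can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is a simplified model, not Spark's code, and every sizing rule in it is an assumption: the map's long array holds 2 longs per hash slot, the sorter stores 2 longs (pointer + prefix) per record, radix sort needs a scratch area as large as the records it sorts, and TimSort needs up to half of that. `sort_buffer_budget` is a hypothetical helper.

```python
def sort_buffer_budget(load_factor: float, capacity: int = 1024):
    """Model whether a map's pointer array can be reused for sorting in place.

    All sizing rules here are illustrative assumptions, not Spark internals.
    Returns (records, array_longs, radix_ok, timsort_ok).
    """
    array_longs = 2 * capacity              # assumed: 2 longs per hash-table slot
    records = int(capacity * load_factor)   # max keys before the map would grow
    used = 2 * records                      # assumed: pointer + prefix per record
    free = array_longs - used               # slack left over for sort scratch space
    radix_ok = free >= used                 # assumed: radix sort needs a same-sized buffer
    timsort_ok = free >= used // 2          # assumed: TimSort merges need up to half
    return records, array_longs, radix_ok, timsort_ok
```

Under these assumptions, a load factor of 0.5 leaves exactly enough slack for radix sort (and therefore for TimSort), whereas 0.75 leaves room for neither, which is consistent with the motivation given in the PR description.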
[GitHub] spark pull request: [MINOR] Fix Typos
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/13317 [MINOR] Fix Typos ## What changes were proposed in this pull request? `a` -> `an` ## How was this patch tested? local build You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhengruifeng/spark a_an Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13317.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13317 commit cff3aa81f2417ff5bc0d1e7bf205ed2ff5a8eb7f Author: Zheng RuiFeng Date: 2016-05-26T06:29:10Z create pr
[GitHub] spark pull request: [SPARK-15434][SQL] improve EmbedSerializerInFi...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13216#discussion_r64697298 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/TypedFilterOptimizationSuite.scala --- @@ -34,40 +35,47 @@ class TypedFilterOptimizationSuite extends PlanTest { Batch("EliminateSerialization", FixedPoint(50), EliminateSerialization) :: Batch("EmbedSerializerInFilter", FixedPoint(50), -EmbedSerializerInFilter) :: Nil +EmbedSerializerInFilter, +RemoveAliasOnlyProject, +CombineFilters) :: Nil } implicit private def productEncoder[T <: Product : TypeTag] = ExpressionEncoder[T]() - test("back to back filter") { + test("embed deserializer in filter condition if there is only one filter") { val input = LocalRelation('_1.int, '_2.int) -val f1 = (i: (Int, Int)) => i._1 > 0 -val f2 = (i: (Int, Int)) => i._2 > 0 +val f = (i: (Int, Int)) => i._1 > 0 -val query = input.filter(f1).filter(f2).analyze +val query = input.filter(f).analyze val optimized = Optimize.execute(query) -val expected = input.deserialize[(Int, Int)] - .where(callFunction(f1, BooleanType, 'obj)) - .select('obj.as("obj")) - .where(callFunction(f2, BooleanType, 'obj)) - .serialize[(Int, Int)].analyze +val deserializer = input.deserialize[(Int, Int)].analyze + .asInstanceOf[DeserializeToObject].deserializer +val boundReference = BoundReference(0, deserializer.dataType, nullable = false) +val callFunc = callFunction(f, BooleanType, boundReference) +val condition = ReferenceToExpressions(callFunc, deserializer :: Nil) +val expected = input.where(condition).analyze comparePlans(optimized, expected) } - test("embed deserializer in filter condition if there is only one filter") { + test("embed deserializer in filter condition if there are two filters") { --- End diff -- Shall we add a new test case instead of replacing the original one?
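The removed "back to back filter" case exercises two adjacent typed filters. With `CombineFilters` now in the batch, the intent is that such a pair behaves like a single filter on the conjunction of both predicates. The plain-Python model below (no Catalyst involved) checks only that semantic equivalence, using predicates that mirror the removed test's `f1` and `f2`. It does not model the plan shape that `comparePlans` checks, and the helper names are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[int, int]
Pred = Callable[[Row], bool]


def back_to_back(rows: Iterable[Row], f1: Pred, f2: Pred) -> List[Row]:
    # Two separate filters applied one after the other, like filter(f1).filter(f2).
    first_pass = (r for r in rows if f1(r))
    return [r for r in first_pass if f2(r)]


def combined(rows: Iterable[Row], f1: Pred, f2: Pred) -> List[Row]:
    # A single filter on the conjunction; `and` short-circuits, so f2 still
    # only runs on rows that already passed f1.
    return [r for r in rows if f1(r) and f2(r)]


f1: Pred = lambda i: i[0] > 0  # mirrors the test's f1: i._1 > 0
f2: Pred = lambda i: i[1] > 0  # mirrors the test's f2: i._2 > 0
rows: List[Row] = [(1, 1), (1, -1), (-1, 1), (0, 0), (2, 3)]
```

A retained two-filter test could pin down this equivalence alongside the new single-filter case.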
[GitHub] spark pull request: [SPARK-15551][MINOR][DOCS][SQL] Replace groupB...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13316#issuecomment-221788985 **[Test build #59348 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59348/consoleFull)** for PR 13316 at commit [`325a2ea`](https://github.com/apache/spark/commit/325a2ea5fb9de05f866aa4eab56dea5563223712).
[GitHub] spark pull request: [SPARK-15551][MINOR][DOCS][SQL] Replace groupB...
GitHub user holdenk opened a pull request: https://github.com/apache/spark/pull/13316 [SPARK-15551][MINOR][DOCS][SQL] Replace groupBy with groupByKey in KeyValueGroupedDataset Scaladoc ## What changes were proposed in this pull request? Replace groupBy with groupByKey in the KeyValueGroupedDataset Scaladoc, and update the Scaladoc on Dataset groupByKey to mention that it replaces the old groupBy + keyAs. ## How was this patch tested? Verified against the Spark 2.0 preview that groupByKey behaves the way groupBy + keyAs used to, and built unidoc locally. You can merge this pull request into a Git repository by running: $ git pull https://github.com/holdenk/spark minor-scaladoc-KeyValueGroupedDataset Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13316.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13316 commit 325a2ea5fb9de05f866aa4eab56dea5563223712 Author: Holden Karau Date: 2016-05-26T06:19:18Z Minor: replace groupBy with groupByKey in KeyValueGroupedDataset and mention groupByKey replaces groupBy combined with keyAs from Spark 1.6
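In the typed Dataset API, `groupByKey` takes a function that computes each record's key and returns a `KeyValueGroupedDataset`; per-key operations such as `mapGroups` then run once per group. The single-machine Python model below illustrates only that semantics. `group_by_key` and `map_groups` are hypothetical stand-ins and do not represent PySpark APIs.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, Iterator, List


def group_by_key(records: Iterable[Any],
                 key_fn: Callable[[Any], Hashable]) -> Dict[Hashable, List[Any]]:
    """Bucket records by key_fn(record), keeping first-seen key order."""
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for record in records:
        groups[key_fn(record)].append(record)
    return dict(groups)


def map_groups(groups: Dict[Hashable, List[Any]],
               fn: Callable[[Hashable, Iterator[Any]], Any]) -> List[Any]:
    """Call fn(key, iterator_over_values) once per key, like a mapGroups step."""
    return [fn(key, iter(values)) for key, values in groups.items()]


words = ["a", "bb", "cc", "d"]
by_len = group_by_key(words, len)  # the key function plays the role of groupByKey's func
counts = map_groups(by_len, lambda k, it: (k, sum(1 for _ in it)))
```

Unlike this model, Spark does not guarantee any particular order of groups or of values within a group.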
[GitHub] spark pull request: [SPARK-15236][SQL][SPARK SHELL] Add spark-defa...
Github user xwu0226 commented on the pull request: https://github.com/apache/spark/pull/13088#issuecomment-221788192 @rxin @andrewor14 @cloud-fan Please help review! Thanks!
[GitHub] spark pull request: [SPARK-15439][SparkR]:Failed to run unit test ...
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/13284#issuecomment-221786544 @shivaram I will create a JIRA soon. On Thursday and Friday I will be traveling to NYC, so I will do it on Saturday.
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-221786353 **[Test build #59347 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59347/consoleFull)** for PR 12836 at commit [`9cacd4d`](https://github.com/apache/spark/commit/9cacd4dbfa0e20d2a855e23f2962a258abbba553).
[GitHub] spark pull request: [SPARK-15515] [SQL] Error Handling in Running ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13283#issuecomment-221786340 **[Test build #59346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59346/consoleFull)** for PR 13283 at commit [`b9e12f8`](https://github.com/apache/spark/commit/b9e12f8742e76984445f9d498248704b1c9e9973).
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-221785096 **[Test build #59345 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59345/consoleFull)** for PR 12836 at commit [`0928740`](https://github.com/apache/spark/commit/09287408137f7d6fbe8f899b12810ab16cbb5c3e).
[GitHub] spark pull request: [SPARK-15463][SQL] support creating dataframe ...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/13300#discussion_r64694941 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala --- @@ -142,6 +145,75 @@ object CSVRelation extends Logging { if (nonEmptyLines.hasNext) nonEmptyLines.drop(1) } } + + def baseRdd( + sparkSession: SparkSession, + options: CSVOptions, + inputPaths: Seq[String]): RDD[String] = { +readText(sparkSession, options, inputPaths.mkString(",")) + } + + def tokenRdd( + options: CSVOptions, + header: Array[String], + rdd: RDD[String]): RDD[Array[String]] = { +val firstLine = if (options.headerFlag) findFirstLine(options, rdd) else null +univocityTokenizer(rdd, header, firstLine, options) + } + + /** + * Returns the first line of the first non-empty file in path + */ + def findFirstLine(options: CSVOptions, rdd: RDD[String]): String = { +if (options.isCommentSet) { + val comment = options.comment.toString + rdd.filter { line => +line.trim.nonEmpty && !line.startsWith(comment) + }.first() +} else { + rdd.filter { line => +line.trim.nonEmpty + }.first() +} + } + + def readText( + sparkSession: SparkSession, + options: CSVOptions, + location: String): RDD[String] = { +if (Charset.forName(options.charset) == StandardCharsets.UTF_8) { + sparkSession.sparkContext.textFile(location) +} else { + val charset = options.charset + sparkSession.sparkContext +.hadoopFile[LongWritable, Text, TextInputFormat](location) +.mapPartitions(_.map(pair => new String(pair._2.getBytes, 0, pair._2.getLength, charset))) +} + } + + def verifySchema(schema: StructType): Unit = { +schema.foreach { field => + field.dataType match { +case _: ArrayType | _: MapType | _: StructType => + throw new UnsupportedOperationException( +s"CSV data source does not support ${field.dataType.simpleString} data type.") +case _ => + } +} + } + + def getHeader(rdd: RDD[String], csvOptions: CSVOptions): Array[String] = { --- End diff -- This is also used in a few places to get the header from CSV records.
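Two of the helpers in this diff encode small but easy-to-miss rules. `findFirstLine` skips lines that are blank after trimming and, when a comment character is set, lines that start with it (the check is on the untrimmed line). `readText` decodes only the first `getLength` bytes of each Hadoop `Text`, because `getBytes` returns the backing array, which may be longer than the valid data. The Python sketch below mirrors both rules on plain lists and bytes. The function names are hypothetical, and the `ValueError` simply stands in for the error that `first()` raises on an empty RDD.

```python
from typing import Iterable, Optional


def find_first_line(lines: Iterable[str], comment: Optional[str] = None) -> str:
    """Return the first line that is non-blank and, if set, not a comment."""
    for line in lines:
        if not line.strip():
            continue  # blank or whitespace-only line
        if comment is not None and line.startswith(comment):
            continue  # prefix check on the untrimmed line, as in the diff
        return line
    raise ValueError("empty collection")  # stand-in for RDD.first() failing


def decode_record(buf: bytes, length: int, charset: str) -> str:
    """Decode only the valid prefix of a reused buffer (cf. Text.getLength)."""
    return buf[:length].decode(charset)
```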
[GitHub] spark pull request: [SPARK-8603][SPARKR] Use shell() instead of sy...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13165#discussion_r64694857 --- Diff: R/pkg/inst/tests/testthat/test_includeJAR.R --- @@ -21,10 +21,13 @@ runScript <- function() { sparkTestJarPath <- "R/lib/SparkR/test_support/sparktestjar_2.10-1.0.jar" jarPath <- paste("--jars", shQuote(file.path(sparkHome, sparkTestJarPath))) scriptPath <- file.path(sparkHome, "R/lib/SparkR/tests/testthat/jarTest.R") - submitPath <- file.path(sparkHome, "bin/spark-submit") - res <- system2(command = submitPath, - args = c(jarPath, scriptPath), - stdout = TRUE) + if (.Platform$OS.type == "windows") { --- End diff -- You can call determineSparkSubmitBin() here.
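The reviewer's suggestion amounts to centralizing platform dispatch in one helper instead of repeating an inline `if` in each test. The Python sketch below shows that pattern. `spark_submit_path` is hypothetical, and the Windows script name is an assumption for illustration; this is not necessarily what SparkR's `determineSparkSubmitBin()` returns.

```python
import os


def spark_submit_path(spark_home: str, os_type: str) -> str:
    """Hypothetical helper: pick the launcher script for the current platform.

    os_type mirrors R's .Platform$OS.type, which is "unix" or "windows".
    The Windows script name below is an assumption, not taken from SparkR.
    """
    name = "spark-submit.cmd" if os_type == "windows" else "spark-submit"
    return os.path.join(spark_home, "bin", name)
```

With one helper, a test only builds its arguments (`--jars ...`, the script path) and asks the helper for the binary, so adding a platform later touches a single place.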
[GitHub] spark pull request: [SPARK-15463][SQL] support creating dataframe ...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/13300#discussion_r64694834 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -42,16 +42,23 @@ private[csv] object CSVInferSchema { tokenRdd: RDD[Array[String]], header: Array[String], options: CSVOptions): StructType = { -val startType: Array[DataType] = Array.fill[DataType](header.length)(NullType) -val rootTypes: Array[DataType] = - tokenRdd.aggregate(startType)(inferRowType(options), mergeRowTypes) +val structFields = if (options.inferSchemaFlag) { --- End diff -- This method is used in both `csv.DefaultSource` and `DataFrameReader.csv(ds: Dataset[String])`, so I refactored it here to take care of both the default schema type and the `inferSchemaFlag=true` cases.
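The diff shows the shape of the inference step. It starts from an all-`NullType` array, one entry per header column. It then aggregates over the token rows with a per-row inference function and a merge function. When `inferSchemaFlag` is off, it presumably falls back to a default type. The Python sketch below mirrors that shape with a deliberately reduced type lattice (null < integer < double < string). That lattice, the string default, and the rule that never-populated columns become string are assumptions for this sketch; Spark's real inference handles more types.

```python
from functools import reduce
from typing import List, Sequence, Tuple

# Assumed widening order for this sketch only.
ORDER = ["null", "integer", "double", "string"]


def infer_field(token: str) -> str:
    """Infer the narrowest type in ORDER that can represent one token."""
    if token == "":
        return "null"
    for parse, name in ((int, "integer"), (float, "double")):
        try:
            parse(token)
            return name
        except ValueError:
            pass
    return "string"


def merge(a: str, b: str) -> str:
    """Widen to whichever of the two types comes later in ORDER."""
    return max(a, b, key=ORDER.index)


def infer_schema(rows: Sequence[Sequence[str]], header: Sequence[str],
                 infer_schema_flag: bool) -> List[Tuple[str, str]]:
    if not infer_schema_flag:
        return [(name, "string") for name in header]  # assumed default type
    start = ["null"] * len(header)  # like Array.fill(header.length)(NullType)

    def step(acc: List[str], row: Sequence[str]) -> List[str]:
        return [merge(t, infer_field(tok)) for t, tok in zip(acc, row)]

    types = reduce(step, rows, start)
    # Columns that never held a value fall back to string in this sketch.
    return [(name, "string" if t == "null" else t) for name, t in zip(header, types)]
```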
[GitHub] spark pull request: [SPARK-10372] [CORE] basic test framework for ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8559#issuecomment-221784338 On a related note, @squito, in the future can you leave a message saying which branch a PR was merged into once you merge it? There have been cases that led to race conditions in merging, and also mistakes in branches that we had to go back and audit.
[GitHub] spark pull request: [SPARK-10372] [CORE] basic test framework for ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8559#issuecomment-221784058 This is pretty cool!
[GitHub] spark pull request: [WIP] [SPARK-8426] Enhance Blacklist mechanism...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13234#issuecomment-221783307 **[Test build #59344 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59344/consoleFull)** for PR 13234 at commit [`8f2534b`](https://github.com/apache/spark/commit/8f2534b1d4d90f1ed42c695a77f5a2fa588d3428).
[GitHub] spark pull request: [SPARK-10372] [CORE] basic test framework for ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8559
[GitHub] spark pull request: [SPARK-15532] [SQL] Add SQLConf.ALLOW_MULTIPLE...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13310#issuecomment-221780558 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15532] [SQL] Add SQLConf.ALLOW_MULTIPLE...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13310#issuecomment-221780560 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59333/ Test PASSed.
[GitHub] spark pull request: [SPARK-15532] [SQL] Add SQLConf.ALLOW_MULTIPLE...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13310#issuecomment-221780470 **[Test build #59333 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59333/consoleFull)** for PR 13310 at commit [`f40a898`](https://github.com/apache/spark/commit/f40a89873ba92eaf5821dce4728d2aab84e1289e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13290#issuecomment-221777604 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13290#issuecomment-221777607 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59334/ Test PASSed.
[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13290#issuecomment-221777507 **[Test build #59334 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59334/consoleFull)** for PR 13290 at commit [`127024d`](https://github.com/apache/spark/commit/127024da7e1058cd39b71e85c6dcd08b5e3e2b53). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15543][SQL] Rename DefaultSources to ma...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13311#issuecomment-221777001 **[Test build #59343 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59343/consoleFull)** for PR 13311 at commit [`94d6e7b`](https://github.com/apache/spark/commit/94d6e7b218e0a969b41f32bd61878cf890c3ba99).
[GitHub] spark pull request: [SPARK-15533][SQL]Deprecate Dataset.explode
Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/13313
[GitHub] spark pull request: [SPARK-15327] [SQL] fix split expression in wh...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/13235#issuecomment-221776141 It looks like #12351 is the same issue about whole stage codegen with `splitExpressions`.
[GitHub] spark pull request: [SPARK-15515] [SQL] Error Handling in Running ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13283#issuecomment-221775767 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59332/ Test FAILed.
[GitHub] spark pull request: [SPARK-15515] [SQL] Error Handling in Running ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13283#issuecomment-221775766 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15515] [SQL] Error Handling in Running ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13283#issuecomment-221775683 **[Test build #59332 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59332/consoleFull)** for PR 13283 at commit [`76f4f80`](https://github.com/apache/spark/commit/76f4f80f962e0271a2073a4cb8de0d513013cf87). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13308#issuecomment-221775528 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13308#issuecomment-221775529 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59342/ Test PASSed.
[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13308#issuecomment-221775481 **[Test build #59342 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59342/consoleFull)** for PR 13308 at commit [`cbd5163`](https://github.com/apache/spark/commit/cbd5163d73fa56a58e18598ece64aaa60e06cc1d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9192#issuecomment-221774354 **[Test build #59341 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59341/consoleFull)** for PR 9192 at commit [`f67095e`](https://github.com/apache/spark/commit/f67095ef72540140aa2348b5262ffdf91685846a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9192#issuecomment-221774407 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9192#issuecomment-221774409 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59341/ Test PASSed.
[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13308#issuecomment-221774053 **[Test build #59342 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59342/consoleFull)** for PR 13308 at commit [`cbd5163`](https://github.com/apache/spark/commit/cbd5163d73fa56a58e18598ece64aaa60e06cc1d).
[GitHub] spark pull request: [YARN][Doc][Minor] Remove several obsolete env...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13296#issuecomment-221773158 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59329/ Test PASSed.
[GitHub] spark pull request: [YARN][Doc][Minor] Remove several obsolete env...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13296#issuecomment-221773157 Merged build finished. Test PASSed.
[GitHub] spark pull request: [YARN][Doc][Minor] Remove several obsolete env...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13296#issuecomment-221773071 **[Test build #59329 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59329/consoleFull)** for PR 13296 at commit [`367e3b8`](https://github.com/apache/spark/commit/367e3b8de0633c100bc1a9bf4742f6af80ecfa68). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13308#issuecomment-221773031 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13308#issuecomment-221773032 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59340/ Test PASSed.
[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13308#issuecomment-221772977 **[Test build #59340 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59340/consoleFull)** for PR 13308 at commit [`88319c0`](https://github.com/apache/spark/commit/88319c022b8eb55f59f8080d488e30726f475580). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8603][SPARKR] Use shell() instead of sy...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13165#issuecomment-221772828 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/9192#issuecomment-221772896 Thanks for the update. LGTM. Will merge after Jenkins passes.
[GitHub] spark pull request: [SPARK-8603][SPARKR] Use shell() instead of sy...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13165#issuecomment-221772829 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59339/ Test PASSed.
[GitHub] spark pull request: [SPARK-8603][SPARKR] Use shell() instead of sy...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13165#issuecomment-221772778 **[Test build #59339 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59339/consoleFull)** for PR 13165 at commit [`0482ebb`](https://github.com/apache/spark/commit/0482ebbc43ff1bef8e7a6a16376c6ec36840a366). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15439][SparkR]:Failed to run unit test ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13284
[GitHub] spark pull request: [SPARK-15439][SparkR]:Failed to run unit test ...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/13284#issuecomment-221772558 Yeah, that's a good idea. @wangmiao1981, can you open a JIRA to not mask `startsWith` and `endsWith` by updating our generics? LGTM - Merging this to master and branch-2.0.
[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9192#issuecomment-221772591 **[Test build #59341 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59341/consoleFull)** for PR 9192 at commit [`f67095e`](https://github.com/apache/spark/commit/f67095ef72540140aa2348b5262ffdf91685846a).
[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13308#issuecomment-221772521 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59338/ Test PASSed.
[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13308#issuecomment-221772519 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13308#issuecomment-221772475 **[Test build #59338 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59338/consoleFull)** for PR 13308 at commit [`07806de`](https://github.com/apache/spark/commit/07806de09f4be0dd9501fe81684c07a45ad68672). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15439][SparkR]:Failed to run unit test ...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/13284#issuecomment-221772080 Looks fine. I think we should really try to make `startsWith` and `endsWith` work, though that could be a follow-up.
[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13308#issuecomment-221771654 **[Test build #59340 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59340/consoleFull)** for PR 13308 at commit [`88319c0`](https://github.com/apache/spark/commit/88319c022b8eb55f59f8080d488e30726f475580).
[GitHub] spark pull request: [SPARK-15542][SparkR] Make error message clear...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13308#issuecomment-221771180 **[Test build #59338 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59338/consoleFull)** for PR 13308 at commit [`07806de`](https://github.com/apache/spark/commit/07806de09f4be0dd9501fe81684c07a45ad68672).
[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13165#issuecomment-221771183 **[Test build #59339 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59339/consoleFull)** for PR 13165 at commit [`0482ebb`](https://github.com/apache/spark/commit/0482ebbc43ff1bef8e7a6a16376c6ec36840a366).
[GitHub] spark pull request: [SPARK-15515] [SQL] Error Handling in Running ...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/13283#issuecomment-221770911 **Update**: The latest code changes contain: - For the JDBC format, we added an extra check in the `ResolveRelations` rule of `Analyzer`. Without this PR, Spark returns an error message like `Option 'url' not specified`; now it reports `Unsupported data source type for direct query on files: jdbc`. - Made the data source format name case-insensitive so that error handling behaves consistently with the normal cases. - Added test cases for all the supported formats.
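To make the two behaviour changes above concrete, here is a minimal Scala sketch of case-insensitive format matching combined with rejecting non-file formats such as `jdbc`. It is a toy model, not the code in the PR: the object `DirectQueryFormats`, its `resolve` method, and the exact set of file formats are hypothetical names and assumptions chosen for illustration; only the error message text is taken from the comment.

```scala
import java.util.Locale

// Hypothetical toy model; NOT Spark's Analyzer/ResolveRelations code.
object DirectQueryFormats {
  // Assumed set of file-based formats that can be queried directly by path.
  private val fileFormats: Set[String] = Set("parquet", "json", "csv", "orc", "text")

  /** Normalizes the format name case-insensitively; returns an error for non-file formats. */
  def resolve(format: String): Either[String, String] = {
    // Locale.ROOT avoids locale-dependent lowercasing (e.g. the Turkish dotless i).
    val normalized = format.toLowerCase(Locale.ROOT)
    if (fileFormats.contains(normalized)) Right(normalized)
    else Left(s"Unsupported data source type for direct query on files: $format")
  }
}
```

With this shape, `resolve("JSON")` and `resolve("json")` take the same path, so an unsupported name produces the same error regardless of how it is capitalized.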
[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13165#issuecomment-221770814 retest this please
[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13165#issuecomment-221770536 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13165#issuecomment-221770524 **[Test build #59336 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59336/consoleFull)** for PR 13165 at commit [`0482ebb`](https://github.com/apache/spark/commit/0482ebbc43ff1bef8e7a6a16376c6ec36840a366). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13165#issuecomment-221770538 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59336/ Test FAILed.
[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9192#issuecomment-221770384 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59337/ Test FAILed.
[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9192#issuecomment-221770383 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9192#issuecomment-221770380 **[Test build #59337 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59337/consoleFull)** for PR 9192 at commit [`90641a7`](https://github.com/apache/spark/commit/90641a71ff1860ddfe1a8e0bcb64cc0f0d2a56c6). * This patch **fails R style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SQL] Prevent illegal NULL propagation when fi...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/13290#discussion_r64688437 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1448,6 +1450,37 @@ class Analyzer( } /** + * Fixes nullability of Attributes in a resolved LogicalPlan by using the nullability of + * corresponding Attributes of its children output Attributes. This step is needed because + * users can use a resolved AttributeReference in the Dataset API and outer joins + * can change the nullability of an AttribtueReference. Without the fix, a nullable column's + * nullable field can be actually set as non-nullable, which cause illegal optimization + * (e.g., NULL propagation) and wrong answers. + * See SPARK-13484 and SPARK-13801 for the concrete queries of this case. + */ + object FixNullability extends Rule[LogicalPlan] { + +def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { + case q: LogicalPlan if q.resolved => +val childrenOutput = q.children.flatMap(c => c.output).groupBy(_.exprId).flatMap { --- End diff -- yes, I got your point.
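The quoted doc comment describes the idea behind `FixNullability`. A node's attribute references should take their nullability from the matching attributes, matched by `exprId`, in its children's output. An outer join can make a column nullable even though a user-held `AttributeReference` still says otherwise. The Scala sketch below models only that idea with a stand-in case class. `Attr` and `FixNullabilitySketch` are hypothetical names, not Catalyst classes, and the rule is simplified to a plain function. The diff is cut off right after `groupBy(_.exprId).flatMap {`, so it does not show how multiple child attributes with the same `exprId` are combined. Treating the result as nullable when any of them is nullable is an assumption made here.

```scala
// Toy stand-in for Catalyst's AttributeReference; NOT the real class.
final case class Attr(name: String, exprId: Long, nullable: Boolean)

object FixNullabilitySketch {
  /** Groups the children's output by exprId; assumed: nullable if ANY occurrence is nullable. */
  def nullabilityByExprId(childrenOutput: Seq[Attr]): Map[Long, Boolean] =
    childrenOutput.groupBy(_.exprId).map { case (id, attrs) => id -> attrs.exists(_.nullable) }

  /** Rewrites a node's attribute references so their nullability agrees with the children. */
  def fix(references: Seq[Attr], childrenOutput: Seq[Attr]): Seq[Attr] = {
    val nullability = nullabilityByExprId(childrenOutput)
    references.map { a =>
      nullability.get(a.exprId) match {
        case Some(n) if n != a.nullable => a.copy(nullable = n) // stale flag: take the child's
        case _                          => a                    // unknown exprId or already consistent
      }
    }
  }
}
```

For example, suppose a reference to column `a` was captured as non-nullable before a left outer join. Once the join's output marks the same `exprId` as nullable, `fix` flips the reference to nullable. That stops the optimizer from assuming the column can never be NULL.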