[GitHub] spark pull request: [SPARK-12316] Wait a minutes to avoid cycle ca...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/10475#issuecomment-167655015 CC: @harishreedharan @SaintBacchus could you add test cases for this change? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [DOC] Adjust coverage for partitionBy()
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10499#issuecomment-167655997 **[Test build #48373 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48373/consoleFull)** for PR 10499 at commit [`7884e87`](https://github.com/apache/spark/commit/7884e87975e8655f0e3a20cc0455e0d7cd614fe4).
[GitHub] spark pull request: [SPARK-12489][Core][SQL][MLib]Fix minor issues...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10440#issuecomment-167656781 ML changes look good to me. Thanks!
[GitHub] spark pull request: [SPARK-12513] [Streaming] SocketReceiver hang ...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/10464#issuecomment-167661660 Looks like a race condition between `restart` and `finally { ... socket.stop() ...}`: `restart` will start a new thread and call `receiver.onStart`, so `receiver.onStart` may run before `socket.stop()`. However, it looks unlikely in practice since `restart` sleeps 2 seconds before calling `startReceiver()`.
[GitHub] spark pull request: [SPARK-12539][SQL][WIP] support writing bucket...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10498#issuecomment-167664157 BTW in GitHub you can use square brackets (with a space inside) to create a checklist, e.g.

```
- [ ] item a
- [ ] item b
```

becomes

- [ ] item a
- [ ] item b
[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...
Github user falaki commented on the pull request: https://github.com/apache/spark/pull/9185#issuecomment-167664825 ping @marmbrus
[GitHub] spark pull request: [SPARK-12222] [Core] Deserialize RoaringBitmap...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/10253#issuecomment-167665206 LGTM. Merging this into `master` and `branch-1.6`.
[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9185#issuecomment-167668475 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9185#issuecomment-167668477 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48375/ Test FAILed.
[GitHub] spark pull request: [SPARK-12525] Fix fatal compiler warnings in K...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10479#issuecomment-167668461 **[Test build #48376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48376/consoleFull)** for PR 10479 at commit [`422ef49`](https://github.com/apache/spark/commit/422ef494b56f9ac4c770311743fb2a01a9d19ae1).
[GitHub] spark pull request: [SPARK-7995][SPARK-6280][Core]Remove AkkaRpcEn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10459#issuecomment-167654637 **[Test build #48372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48372/consoleFull)** for PR 10459 at commit [`1f5a523`](https://github.com/apache/spark/commit/1f5a5237c9fe238a23d0601293da3ae33f1f9fa2).
[GitHub] spark pull request: [SPARK-7995][SPARK-6280][Core]Remove AkkaRpcEn...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/10459#discussion_r48505487

--- Diff: core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala ---
```
@@ -61,9 +55,14 @@ class AkkaUtilsSuite extends SparkFunSuite with LocalSparkContext with ResetSyst
     val slaveRpcEnv = RpcEnv.create("spark-slave", hostname, 0, conf, securityManagerBad)
     val slaveTracker = new MapOutputTrackerWorker(conf)
-    intercept[akka.actor.ActorNotFound] {
+    try {
       slaveTracker.trackerEndpoint =
-        slaveRpcEnv.setupEndpointRef("spark", rpcEnv.address, MapOutputTracker.ENDPOINT_NAME)
+        slaveRpcEnv.setupEndpointRef(rpcEnv.address, MapOutputTracker.ENDPOINT_NAME)
+    } catch {
+      case e: RuntimeException =>
+        assert(e.getMessage.contains("javax.security.sasl.SaslException"))
+      case e: SparkException =>
+        assert(e.getMessage.contains("Message is dropped because Outbox is stopped"))
```
--- End diff --

Adding this catch clause because there is a race condition in `Outbox` that may throw `SparkException` instead. Imagine the following execution order:

Execution Order | Thread1 | Thread2
- | - | -
1 | nettyEnv.createClient (Outbox.scala; will call channel.close in this method if authentication fails) |
2 | catch NonFatal(e) |
3 | | connectionTerminated (NettyRpcHandler)
4 | | nettyEnv.removeOutbox
5 | | outbox.stop
6 | | message.onFailure(new SparkException("Message is dropped because Outbox is stopped"))
7 | Outbox.handleNetworkFailure |
[GitHub] spark pull request: [SPARK-12415] Do not use closure serializer to...
Github user tedyu commented on a diff in the pull request: https://github.com/apache/spark/pull/10368#discussion_r48507734

--- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ---
```
@@ -109,6 +111,9 @@ class KryoSerializer(conf: SparkConf)
     kryo.register(classOf[SerializableJobConf], new KryoJavaSerializer())
     kryo.register(classOf[HttpBroadcast[_]], new KryoJavaSerializer())
     kryo.register(classOf[PythonBroadcast], new KryoJavaSerializer())
+    kryo.register(classOf[TaskMetrics], new KryoJavaSerializer())
+    kryo.register(classOf[DirectTaskResult[_]], new KryoJavaSerializer())
+    kryo.register(classOf[IndirectTaskResult[_]], new KryoJavaSerializer())
```
--- End diff --

bq. people may forget to register new classes if they just add an Option field to TaskMetrics in future

Additions to TaskMetrics would be reviewed, right? A comment can be added to TaskMetrics reminding contributors to register the corresponding class.
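The risk being discussed, forgetting to register a newly added class with a registration-required serializer, can be illustrated with a toy registry. This is a Python analogy with hypothetical names, not Spark's `KryoSerializer`:

```python
class StrictSerializer:
    """Toy registry-based serializer: like Kryo with registration
    required, it fails fast on classes nobody registered."""

    def __init__(self):
        self._handlers = {}

    def register(self, cls, encode):
        self._handlers[cls] = encode

    def serialize(self, obj):
        handler = self._handlers.get(type(obj))
        if handler is None:
            # The failure mode discussed above: a newly added type that
            # was never registered only blows up at serialization time.
            raise TypeError(f"unregistered class: {type(obj).__name__}")
        return handler(obj)

ser = StrictSerializer()
ser.register(int, lambda o: str(o).encode())
print(ser.serialize(42))   # b'42'
try:
    ser.serialize(3.14)    # float was never registered
except TypeError as e:
    print(e)
```

The review-time alternative tedyu suggests (a reminder comment on the class) trades this runtime failure for human vigilance, which is exactly the trade-off the quoted concern is about.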
[GitHub] spark pull request: [SPARK-12486] Worker should kill the executors...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10438#issuecomment-167664384 **[Test build #48374 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48374/consoleFull)** for PR 10438 at commit [`67611ac`](https://github.com/apache/spark/commit/67611acec29cb6cadadc038f27759c19578f6e21).
[GitHub] spark pull request: [SPARK-12222] [Core] Deserialize RoaringBitmap...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10253
[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/9185#issuecomment-167665605 This seems fine to me as a first step. Eventually we will probably want to make the RBackend multi-session aware.
[GitHub] spark pull request: [SPARK-12525] Fix fatal compiler warnings in K...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/10479#issuecomment-167665578 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-12489][Core][SQL][MLib]Fix minor issues...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/10440#issuecomment-167668145 @andrewor14 could you take a look at this PR? Thanks!
[GitHub] spark pull request: [SPARK-12489][Core][SQL][MLib]Fix minor issues...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/10440#discussion_r48510554

--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/Node.scala ---
```
@@ -386,9 +386,9 @@ private[tree] object LearningNode {
     var levelsToGo = indexToLevel(nodeIndex)
     while (levelsToGo > 0) {
       if ((nodeIndex & (1 << levelsToGo - 1)) == 0) {
-        tmpNode = tmpNode.leftChild.asInstanceOf[LearningNode]
+        tmpNode = tmpNode.leftChild.get
```
--- End diff --

@jkbradley was this code never run before?
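The bit test in this hunk walks a complete binary tree stored with 1-based indices (children of node `i` at `2i` and `2i+1`), choosing left or right at each level from one bit of the target index. A minimal Python sketch of the same navigation logic (helper names are illustrative, not Spark's):

```python
def index_to_level(node_index):
    # Level of a 1-based node index in a complete binary tree:
    # the root (index 1) is level 0, its children (2, 3) are level 1, etc.
    return node_index.bit_length() - 1

def path_from_root(node_index):
    """Return the left/right turns taken from the root to reach
    node_index, mirroring the bit test in the hunk above."""
    path = []
    levels_to_go = index_to_level(node_index)
    while levels_to_go > 0:
        if node_index & (1 << (levels_to_go - 1)) == 0:
            path.append("left")   # bit is 0: descend into the left child
        else:
            path.append("right")  # bit is 1: descend into the right child
        levels_to_go -= 1
    return path

# Node 6 is 0b110: from the root (1) go right (to 3), then left (to 6).
print(path_from_root(6))  # ['right', 'left']
```

Every interior node visited on such a path must have both children, which is why replacing the unchecked cast with `Option.get` is behavior-preserving here when the tree is well-formed.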
[GitHub] spark pull request: [SPARK-12489][Core][SQL][MLib]Fix minor issues...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10440#issuecomment-167671951 Looks good.
[GitHub] spark pull request: [SPARK-12489][Core][SQL][MLib]Fix minor issues...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/10440#discussion_r48510515 --- Diff: launcher/src/main/java/org/apache/spark/launcher/Main.java --- @@ -151,7 +151,7 @@ private static String prepareWindowsCommand(List cmd, Map
[GitHub] spark pull request: [SPARK-12490] Don't use Javascript for web UI'...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/10441#issuecomment-167672868 @zsxwing, I've pushed a new commit which aims to preserve the old behavior when increasing the number of items displayed per page while pageNumber > 1; see fd2d1f2a49ad4f6bc2b5ed8bf3aecd65093abc65.
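One common way to preserve the reader's place when the page size grows is to pick the new page that still contains the first item visible on the old page. A small Python sketch of that arithmetic (illustrative only; this is an assumption about the approach, not the code in the commit referenced above):

```python
def page_after_resize(old_page, old_page_size, new_page_size):
    """Pick the 1-based page that still shows the first item that was
    visible on the old page, so changing the page size does not lose
    the reader's place in the table."""
    first_visible_index = (old_page - 1) * old_page_size  # 0-based item index
    return first_visible_index // new_page_size + 1

# Viewing page 3 at 10 items/page (items 20-29), then switching to
# 25 items/page: item 20 now lives on page 1.
print(page_after_resize(3, 10, 25))  # 1
```

Without this adjustment, keeping the old page number after enlarging the page size silently skips past the items the user was looking at.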
[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9185#issuecomment-167689398 **[Test build #48386 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48386/consoleFull)** for PR 9185 at commit [`0633a73`](https://github.com/apache/spark/commit/0633a73ddbc6a328d579434f3c3ec349765d70ef).
[GitHub] spark pull request: [SPARK-6624][WIP] Draft of another alternative...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/10444#discussion_r48516709

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
```
@@ -47,6 +48,34 @@ trait Predicate extends Expression {
   override def dataType: DataType = BooleanType
 }
+object Predicate extends PredicateHelper {
+  def toCNF(predicate: Expression, maybeThreshold: Option[Double] = None): Expression = {
+    val cnf = new CNFExecutor(predicate).execute(predicate)
+    val threshold = maybeThreshold.map(predicate.size * _).getOrElse(Double.MaxValue)
+    if (cnf.size > threshold) predicate else cnf
```
--- End diff --

Maximizing the number of simple predicates sounds reasonable. We may do the conversion in a depth-first manner, i.e. always convert the left branch of an `And` and then its right branch, until either no more predicates can be converted or we reach the size limit. In this way the intermediate result is still useful.

BTW, I searched for CNF conversion in Hive and found [HIVE-9166][1], which also tries to put an upper limit on ORC SARG CNF conversion. @nongli Any clues about how Impala does this?

[1]: https://issues.apache.org/jira/browse/HIVE-9166
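The threshold bailout in the `toCNF` hunk can be sketched on toy tuple expressions: distribute OR over AND, then fall back to the original predicate when the CNF form grows past a size factor. This is a sketch of the idea only, not Spark's `CNFExecutor`, and all names below are hypothetical:

```python
def to_cnf(expr):
    """Naively convert a boolean expression tree to CNF by pushing OR
    below AND. Expressions are tuples ('and', l, r) / ('or', l, r),
    or leaf strings standing in for simple predicates."""
    if isinstance(expr, str):
        return expr
    op, l, r = expr
    l, r = to_cnf(l), to_cnf(r)
    if op == 'and':
        return ('and', l, r)
    # op == 'or': distribute over an AND on either side,
    # (a AND b) OR c  ->  (a OR c) AND (b OR c)
    if isinstance(l, tuple) and l[0] == 'and':
        return ('and', to_cnf(('or', l[1], r)), to_cnf(('or', l[2], r)))
    if isinstance(r, tuple) and r[0] == 'and':
        return ('and', to_cnf(('or', l, r[1])), to_cnf(('or', l, r[2])))
    return ('or', l, r)

def size(expr):
    # Node count, playing the role of Expression.size in the diff.
    return 1 if isinstance(expr, str) else 1 + size(expr[1]) + size(expr[2])

def to_cnf_bounded(expr, max_growth=None):
    """Like toCNF above: bail out to the original predicate when the
    CNF result exceeds `max_growth` times the input size."""
    cnf = to_cnf(expr)
    if max_growth is not None and size(cnf) > size(expr) * max_growth:
        return expr
    return cnf

# Distribution can blow up exponentially, which is why both this diff
# and HIVE-9166 cap the output size.
print(to_cnf(('or', ('and', 'a', 'b'), 'c')))
```

liancheng's depth-first refinement would instead stop distributing partway through, so that even a truncated conversion yields some extra top-level conjuncts.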
[GitHub] spark pull request: [SPARK-12522] [SQL] [MINOR] Add the missing do...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10471#issuecomment-167689959 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48382/ Test PASSed.
[GitHub] spark pull request: [SPARK-12522] [SQL] [MINOR] Add the missing do...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10471#issuecomment-167689958 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12522] [SQL] [MINOR] Add the missing do...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10471#issuecomment-167689909 **[Test build #48382 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48382/consoleFull)** for PR 10471 at commit [`91ec5de`](https://github.com/apache/spark/commit/91ec5ded23df41006554c5c3401e94e5f0a1fa5d).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12522] [SQL] [MINOR] Add the missing do...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10471
[GitHub] spark pull request: [SPARK-12522] [SQL] [MINOR] Add the missing do...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10471#issuecomment-167690183 Thanks - I've merged it.
[GitHub] spark pull request: [SPARK-12536] [SQL] Added "Empty Seq" in Expla...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10494#discussion_r48517137

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LocalRelation.scala ---
```
@@ -62,6 +62,10 @@ case class LocalRelation(output: Seq[Attribute], data: Seq[InternalRow] = Nil)
     case _ => false
   }
+  override def simpleString: String =
+    if (data == Seq.empty) super.simpleString + " [Empty Seq]"
```
--- End diff --

should be https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala#L401
[GitHub] spark pull request: [SPARK-12534][DOC] update documentation to lis...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/10491#discussion_r48517225

--- Diff: docs/configuration.md ---
```
@@ -120,7 +120,8 @@ of the most common options to set are:
 spark.driver.cores
 1
-Number of cores to use for the driver process, only in cluster mode.
+Number of cores to use for the driver process, only in cluster mode. This can be set through
+--driver-cores command line option.
```
--- End diff --

I moved these to the running-on-yarn doc, would that work?
[GitHub] spark pull request: [SPARK-12534][DOC] update documentation to lis...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10491#issuecomment-167692412 **[Test build #48387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48387/consoleFull)** for PR 10491 at commit [`27c6976`](https://github.com/apache/spark/commit/27c6976cb33c8a418635a46255301b027db8615c).
[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/10480#discussion_r48517423

--- Diff: R/pkg/R/DataFrame.R ---
```
@@ -2272,3 +2260,40 @@ setMethod("with",
   newEnv <- assignNewEnv(data)
   eval(substitute(expr), envir = newEnv, enclos = newEnv)
 })
+
+#' Saves the content of the DataFrame to an external database table via JDBC
+#'
+#' Additional JDBC database connection properties can be set (...)
+#'
+#' Also, mode is used to specify the behavior of the save operation when
+#' data already exists in the data source. There are four modes: \cr
+#'  append: Contents of this DataFrame are expected to be appended to existing data. \cr
+#'  overwrite: Existing data is expected to be overwritten by the contents of this DataFrame. \cr
+#'  error: An exception is expected to be thrown. \cr
+#'  ignore: The save operation is expected to not save the contents of the DataFrame
+#'     and to not change the existing data. \cr
+#'
+#' @param x A SparkSQL DataFrame
+#' @param url JDBC database url of the form `jdbc:subprotocol:subname`
+#' @param tableName The name of the table in the external database
+#' @param mode One of 'append', 'overwrite', 'error', 'ignore' save mode (it is 'error' by default)
+#' @family DataFrame functions
+#' @rdname write.jdbc
+#' @name write.jdbc
+#' @export
+#' @examples
+#'\dontrun{
+#' sc <- sparkR.init()
+#' sqlContext <- sparkRSQL.init(sc)
+#' jdbcUrl <- "jdbc:mysql://localhost:3306/databasename"
+#' write.jdbc(df, jdbcUrl, "table", user = "username", password = "password")
+#' }
+setMethod("write.jdbc",
+          signature(x = "DataFrame", url = "character", tableName = "character"),
+          function(x, url, tableName, mode = "error", ...) {
+            jmode <- convertToJSaveMode(mode)
+            jprops <- envToJProperties(varargsToEnv(...))
```
--- End diff --

vararg -> env -> properties seems a little redundant. I would prefer vararg -> properties.
[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/10480#discussion_r48517433

--- Diff: R/pkg/R/SQLContext.R ---
```
@@ -556,3 +556,61 @@ createExternalTable <- function(sqlContext, tableName, path = NULL, source = NUL
   sdf <- callJMethod(sqlContext, "createExternalTable", tableName, source, options)
   dataFrame(sdf)
 }
+
+#' Create a DataFrame representing the database table accessible via JDBC URL
+#'
+#' Additional JDBC database connection properties can be set (...)
+#'
+#' Only one of partitionColumn or predicates should be set. Partitions of the table will be
+#' retrieved in parallel based on the `numPartitions` or by the predicates.
+#'
+#' Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash
+#' your external database systems.
+#'
+#' @param sqlContext SQLContext to use
+#' @param url JDBC database url of the form `jdbc:subprotocol:subname`
+#' @param tableName the name of the table in the external database
+#' @param partitionColumn the name of a column of integral type that will be used for partitioning
+#' @param lowerBound the minimum value of `partitionColumn` used to decide partition stride
+#' @param upperBound the maximum value of `partitionColumn` used to decide partition stride
+#' @param numPartitions the number of partitions, This, along with `lowerBound` (inclusive),
+#'                      `upperBound` (exclusive), form partition strides for generated WHERE
+#'                      clause expressions used to split the column `partitionColumn` evenly.
+#'                      This defaults to SparkContext.defaultParallelism when unset.
+#' @param predicates a list of conditions in the where clause; each one defines one partition
```
--- End diff --

OK
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/10480#discussion_r48517532 --- Diff: R/pkg/R/generics.R --- @@ -537,6 +537,12 @@ setGeneric("write.df", function(df, path, ...) { standardGeneric("write.df") }) #' @export setGeneric("saveDF", function(df, path, ...) { standardGeneric("saveDF") }) --- End diff -- yeah, correct
[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/10480#discussion_r48517634 --- Diff: R/pkg/R/SQLContext.R --- @@ -556,3 +556,61 @@ createExternalTable <- function(sqlContext, tableName, path = NULL, source = NUL sdf <- callJMethod(sqlContext, "createExternalTable", tableName, source, options) dataFrame(sdf) } + +#' Create a DataFrame representing the database table accessible via JDBC URL +#' +#' Additional JDBC database connection properties can be set (...) +#' +#' Only one of partitionColumn or predicates should be set. Partitions of the table will be +#' retrieved in parallel based on the `numPartitions` or by the predicates. +#' +#' Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash +#' your external database systems. +#' +#' @param sqlContext SQLContext to use +#' @param url JDBC database url of the form `jdbc:subprotocol:subname` +#' @param tableName the name of the table in the external database +#' @param partitionColumn the name of a column of integral type that will be used for partitioning +#' @param lowerBound the minimum value of `partitionColumn` used to decide partition stride +#' @param upperBound the maximum value of `partitionColumn` used to decide partition stride +#' @param numPartitions the number of partitions, This, along with `lowerBound` (inclusive), +#' `upperBound` (exclusive), form partition strides for generated WHERE +#' clause expressions used to split the column `partitionColumn` evenly. +#' This defaults to SparkContext.defaultParallelism when unset. 
+#' @param predicates a list of conditions in the where clause; each one defines one partition +#' @return DataFrame +#' @rdname read.jdbc +#' @name read.jdbc +#' @export +#' @examples +#'\dontrun{ +#' sc <- sparkR.init() +#' sqlContext <- sparkRSQL.init(sc) +#' jdbcUrl <- "jdbc:mysql://localhost:3306/databasename" +#' df <- read.jdbc(sqlContext, jdbcUrl, "table", predicates = list("field<=123"), user = "username") +#' df2 <- read.jdbc(sqlContext, jdbcUrl, "table2", partitionColumn = "index", lowerBound = 0, +#' upperBound = 1, user = "username", password = "password") +#' } + +read.jdbc <- function(sqlContext, url, tableName, + partitionColumn = NULL, lowerBound = NULL, upperBound = NULL, + numPartitions = 0L, predicates = list(), ...) { --- End diff -- default parameter for predicates can be NULL?
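The `lowerBound`/`upperBound`/`numPartitions` parameters documented above drive stride-based partitioning: the bounds are split into one WHERE clause per partition. A rough pure-Python sketch of that stride arithmetic follows; the function name and exact clause shapes are illustrative, not SparkR's actual implementation.

```python
def partition_where_clauses(column, lower, upper, num_partitions):
    """Build one WHERE clause per partition, splitting [lower, upper) into
    roughly equal strides on `column`."""
    if num_partitions <= 1:
        return [None]  # single partition: scan the whole table, no WHERE clause
    stride = max((upper - lower) // num_partitions, 1)
    clauses = []
    current = lower
    for i in range(num_partitions):
        if i == 0:
            # first partition is unbounded below, so rows under lowerBound are kept
            clauses.append("{} < {}".format(column, current + stride))
        elif i == num_partitions - 1:
            # last partition is unbounded above, so rows at/over upperBound are kept
            clauses.append("{} >= {}".format(column, current))
        else:
            clauses.append("{} >= {} AND {} < {}".format(
                column, current, column, current + stride))
        current += stride
    return clauses
```

For example, `partition_where_clauses("index", 0, 100, 4)` yields four clauses with stride 25, the outer two left open-ended so no rows outside the bounds are silently dropped.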
[GitHub] spark pull request: [SPARK-12534][DOC] update documentation to lis...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10491#issuecomment-167697095 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12534][DOC] update documentation to lis...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10491#issuecomment-167696817 **[Test build #48387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48387/consoleFull)** for PR 10491 at commit [`27c6976`](https://github.com/apache/spark/commit/27c6976cb33c8a418635a46255301b027db8615c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12534][DOC] update documentation to lis...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10491#issuecomment-167697099 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48387/
[GitHub] spark pull request: [SPARK-12495][SQL] use true as default value f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10443#issuecomment-167697369 **[Test build #48389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48389/consoleFull)** for PR 10443 at commit [`a6b826c`](https://github.com/apache/spark/commit/a6b826c4cd55545e2ca2f1478a16c030bc0a86df).
[GitHub] spark pull request: [DOC] Adjust coverage for partitionBy()
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10499#issuecomment-167697956 Merged build finished. Test PASSed.
[GitHub] spark pull request: [DOC] Adjust coverage for partitionBy()
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10499#issuecomment-167697671 **[Test build #48381 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48381/consoleFull)** for PR 10499 at commit [`f655bbe`](https://github.com/apache/spark/commit/f655bbe37fee7903ca8446996f971f629b1c5450). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [DOC] Adjust coverage for partitionBy()
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10499#issuecomment-167697959 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48381/
[GitHub] spark pull request: [SPARK-12480][SQL] add Hash expression that ca...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10435#issuecomment-167698232 **[Test build #48388 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48388/consoleFull)** for PR 10435 at commit [`6311aa7`](https://github.com/apache/spark/commit/6311aa75a7a41fee8464ee96e5949ccad3e7d7a5).
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48517911 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { ) ) +// Adding extra Limit below UNION ALL if both left and right childs are not Limit. +// This heuristic is valid assuming there does not exist any Limit push-down rule. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => --- End diff -- is `left.maxRows.isEmpty` equal to `check if left is a Limit`?
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48518043 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { ) ) +// Adding extra Limit below UNION ALL if both left and right childs are not Limit. +// This heuristic is valid assuming there does not exist any Limit push-down rule. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => --- End diff -- Actually I think this branch is safe without this check, did I miss something here?
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48518201 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { ) ) +// Adding extra Limit below UNION ALL if both left and right childs are not Limit. +// This heuristic is valid assuming there does not exist any Limit push-down rule. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => + Limit(exp, +Union( + CombineLimits(Limit(exp, left)), + CombineLimits(Limit(exp, right --- End diff -- We can get rid of this manual call now.
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48518190 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -153,6 +153,15 @@ object SetOperationPushDown extends Rule[LogicalPlan] with PredicateHelper { ) ) +// Adding extra Limit below UNION ALL if both left and right childs are not Limit. +// This heuristic is valid assuming there does not exist any Limit push-down rule. +case Limit(exp, Union(left, right)) + if left.maxRows.isEmpty || right.maxRows.isEmpty => --- End diff -- The goal is to avoid double pushdown even if the limit has been pushed past another operator (i.e. a project).
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48518228 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -91,6 +91,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the limited number of rows to be returned. --- End diff -- Specify that any operator that a `Limit` can be pushed past should override this function.
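The rewrite discussed in this thread relies on the identity that limiting a UNION ALL is equivalent to limiting each side first and then limiting the combined result. A toy pure-Python check of that identity on lists (Spark's UNION ALL is a bag, so ordering is not guaranteed there, but the multiset argument is the same):

```python
def limit(n, rows):
    # LIMIT n: keep at most the first n rows
    return rows[:n]

def union_all(left, right):
    # UNION ALL: concatenation, duplicates kept
    return left + right

def limit_through_union(n, left, right):
    # Optimized plan: push an extra Limit below the Union, keep the outer Limit
    return limit(n, union_all(limit(n, left), limit(n, right)))

left, right, n = list(range(10)), list(range(100, 120)), 5
assert limit(n, union_all(left, right)) == limit_through_union(n, left, right)

# Still equal when one side has fewer than n rows
short_left = [1, 2]
assert limit(n, union_all(short_left, right)) == limit_through_union(n, short_left, right)
```

Pushing the limit lets each child produce at most n rows before the union, which is the whole point of the optimizer rule; the outer limit then trims the at-most-2n combined rows back to n.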
[GitHub] spark pull request: Add more exceptions to Guava relocation
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10442#issuecomment-167700718 @microhello please file a JIRA and add it to the title of this PR. See how other patches are opened.
[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/10480#discussion_r48518375 --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala --- @@ -355,6 +355,13 @@ private[spark] object SerDe { writeInt(dos, v.length) v.foreach(elem => writeObject(dos, elem)) +// Handle Properties --- End diff -- my preference is to do more in R. if you feel strongly about having a helper in Scala instead of handling Properties then we could move most of the code into a Scala helper.
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10451#issuecomment-167700694 > add a comment and explain the current solution. In the future, if we add such an operator, we can change the current way and fix the issue? (Already added a comment in the code) I like this option
[GitHub] spark pull request: [SPARK-12503] [SQL] Pushing Limit Through Unio...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10451#discussion_r48518442 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -91,6 +91,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging { } /** + * Returns the limited number of rows to be returned. --- End diff -- And, thus, we should fix `Project` too.
[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/10480#discussion_r48518467 --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala --- @@ -355,6 +355,13 @@ private[spark] object SerDe { writeInt(dos, v.length) v.foreach(elem => writeObject(dos, elem)) +// Handle Properties --- End diff -- I got it, java.util.Properties implements Map interface.
[GitHub] spark pull request: [SPARK-12512][SQL] support column name with do...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10500#issuecomment-167700900 ok to test
[GitHub] spark pull request: [DOC] Adjust coverage for partitionBy()
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10499#discussion_r48518567 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -119,7 +119,7 @@ final class DataFrameWriter private[sql](df: DataFrame) { * Partitions the output by the given columns on the file system. If specified, the output is * laid out on the file system similar to Hive's partitioning scheme. * - * This is only applicable for Parquet at the moment. + * This was initally applicable for Parquet but in 1.5+ covers JSON as well. --- End diff -- also "text"
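The "Hive's partitioning scheme" the quoted doc refers to lays output out as one `col=value` subdirectory per partition column. A minimal pure-Python illustration of that path layout (the helper is hypothetical, written only to show the directory convention):

```python
import posixpath

def hive_partition_path(base, partition_values):
    """Build a Hive-style partition directory: base/col1=v1/col2=v2/...

    `partition_values` is an ordered list of (column, value) pairs, matching
    the order the columns were passed to partitionBy().
    """
    parts = ["{}={}".format(col, val) for col, val in partition_values]
    return posixpath.join(base, *parts)

# e.g. a writer partitioned by ("year", "month") would place files under:
hive_partition_path("/data/events", [("year", 2015), ("month", 12)])
# -> "/data/events/year=2015/month=12"
```

This layout is what lets readers prune partitions by directory name instead of scanning file contents.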
[GitHub] spark pull request: [SPARK-12457] [SQL] Add ExpressionDescription ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10418#issuecomment-167701068 **[Test build #48385 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48385/consoleFull)** for PR 10418 at commit [`64f32a4`](https://github.com/apache/spark/commit/64f32a43848a0d458b5d42f37688e7af17c5f336). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12526][SPARKR]`ifelse`, `when`, `otherw...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/10481#discussion_r48518527 --- Diff: R/pkg/R/column.R --- @@ -225,7 +225,7 @@ setMethod("%in%", setMethod("otherwise", signature(x = "Column", value = "ANY"), function(x, value) { -value <- ifelse(class(value) == "Column", value@jc, value) +value <- if(class(value) == "Column") { value@jc } else { value } --- End diff -- if( :)
[GitHub] spark pull request: [SPARK-12457] [SQL] Add ExpressionDescription ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10418#issuecomment-167701117 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48385/
[GitHub] spark pull request: [SPARK-12457] [SQL] Add ExpressionDescription ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10418#issuecomment-167701116 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12443][SQL] encoderFor should support D...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10399#issuecomment-167702564 @rxin Could you check this? Thanks.
[GitHub] spark pull request: [SPARK-12512][SQL] support column name with do...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10500#issuecomment-167702840 **[Test build #48390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48390/consoleFull)** for PR 10500 at commit [`6372f92`](https://github.com/apache/spark/commit/6372f92d7ce57cdf12ed98af513b11d97e613a88).
[GitHub] spark pull request: [SPARK-12547][SQL] Tighten scala style checker...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/10501 [SPARK-12547][SQL] Tighten scala style checker enforcement for UDF registration We use scalastyle:off to turn off style checks in certain places where it is not possible to follow the style guide. This is usually ok. However, in udf registration, we disable the checker for a large amount of code simply because some of them exceed 100 char line limit. It is better to just disable the line limit check rather than everything. In this pull request, I only disabled the line length check and fixed a problem (lack of explicit types for public methods). You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-12547 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10501.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10501 commit 5157f276a68eef3eebf70df66ee526f1529ac354 Author: Reynold Xin Date: 2015-12-29T02:40:04Z [SPARK-12547][SQL] Tighten scala style checker enforcement for UDF registration.
[GitHub] spark pull request: [DOC] Adjust coverage for partitionBy()
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10499#issuecomment-167703422 **[Test build #48391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48391/consoleFull)** for PR 10499 at commit [`dff3935`](https://github.com/apache/spark/commit/dff3935b571bcbf121aa017b1cf52bc5757d04ab).
[GitHub] spark pull request: [SPARK-12453] [Streaming] Spark Streaming Kine...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/10416#issuecomment-167509793 Roger that, @Schadix would you mind closing this PR?
[GitHub] spark pull request: [SPARK-8641][SPARK-12455][SQL] Native Spark Wi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10402#issuecomment-167516759 **[Test build #48362 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48362/consoleFull)** for PR 10402 at commit [`767305a`](https://github.com/apache/spark/commit/767305a58c47fffb1ced4483e3c4a938e5383143).
[GitHub] spark pull request: [SPARK-12457] [SQL] Add ExpressionDescription ...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/10418#discussion_r48469649 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -44,6 +48,11 @@ case class Size(child: Expression) extends UnaryExpression with ExpectsInputType * Sorts the input array in ascending / descending order according to the natural ordering of * the array elements and returns it. */ +@ExpressionDescription( + usage = "_FUNC_(array, ascendingOrder) - Sorts the input array for the given column in " + --- End diff -- This will fail to compile in Scala 2.11. Use a raw string here.
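To make the reviewer's point concrete, here is a minimal, standalone sketch (not the actual Spark code; the usage text is illustrative) contrasting the "+"-concatenated string in the diff with the triple-quoted raw-string form the reviewer suggests, which is a single literal rather than a concatenation expression:

```scala
object RawStringSketch {
  // Concatenated form, as in the diff above; built from two literals with "+",
  // which is not a single constant expression.
  val concatenated: String =
    "_FUNC_(array, ascendingOrder) - Sorts the input array for the given column in " +
    "ascending order."

  // Raw (triple-quoted) form: one string literal, no concatenation needed.
  // stripMargin removes the leading whitespace up to and including each '|',
  // and the newline is collapsed to a space to match the concatenated text.
  val raw: String =
    """_FUNC_(array, ascendingOrder) - Sorts the input array for the given column in
      |ascending order.""".stripMargin.replace("\n", " ")

  def main(args: Array[String]): Unit = {
    // Both forms yield the same usage text.
    assert(raw == concatenated)
    println(raw)
  }
}
```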
[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/10480#discussion_r48472149 --- Diff: R/pkg/R/SQLContext.R --- @@ -556,3 +556,61 @@ createExternalTable <- function(sqlContext, tableName, path = NULL, source = NUL sdf <- callJMethod(sqlContext, "createExternalTable", tableName, source, options) dataFrame(sdf) } + +#' Create a DataFrame representing the database table accessible via JDBC URL +#' +#' Additional JDBC database connection properties can be set (...) +#' +#' Only one of partitionColumn or predicates should be set. Partitions of the table will be +#' retrieved in parallel based on the `numPartitions` or by the predicates. +#' +#' Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash +#' your external database systems. +#' +#' @param sqlContext SQLContext to use +#' @param url JDBC database url of the form `jdbc:subprotocol:subname` +#' @param tableName the name of the table in the external database +#' @param partitionColumn the name of a column of integral type that will be used for partitioning +#' @param lowerBound the minimum value of `partitionColumn` used to decide partition stride +#' @param upperBound the maximum value of `partitionColumn` used to decide partition stride +#' @param numPartitions the number of partitions, This, along with `lowerBound` (inclusive), +#' `upperBound` (exclusive), form partition strides for generated WHERE +#' clause expressions used to split the column `partitionColumn` evenly. +#' This defaults to SparkContext.defaultParallelism when unset. +#' @param predicates a list of conditions in the where clause; each one defines one partition --- End diff -- State that parameter predicates is mutually exclusive from partitionColumn/lowerBound/upperBound/numPartitions
[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/10480#discussion_r48472173 --- Diff: R/pkg/R/SQLContext.R --- @@ -556,3 +556,61 @@ createExternalTable <- function(sqlContext, tableName, path = NULL, source = NUL sdf <- callJMethod(sqlContext, "createExternalTable", tableName, source, options) dataFrame(sdf) } + +#' Create a DataFrame representing the database table accessible via JDBC URL +#' +#' Additional JDBC database connection properties can be set (...) +#' +#' Only one of partitionColumn or predicates should be set. Partitions of the table will be +#' retrieved in parallel based on the `numPartitions` or by the predicates. +#' +#' Don't create too many partitions in parallel on a large cluster; otherwise Spark might crash +#' your external database systems. +#' +#' @param sqlContext SQLContext to use +#' @param url JDBC database url of the form `jdbc:subprotocol:subname` +#' @param tableName the name of the table in the external database +#' @param partitionColumn the name of a column of integral type that will be used for partitioning +#' @param lowerBound the minimum value of `partitionColumn` used to decide partition stride +#' @param upperBound the maximum value of `partitionColumn` used to decide partition stride +#' @param numPartitions the number of partitions, This, along with `lowerBound` (inclusive), +#' `upperBound` (exclusive), form partition strides for generated WHERE +#' clause expressions used to split the column `partitionColumn` evenly. +#' This defaults to SparkContext.defaultParallelism when unset. 
+#' @param predicates a list of conditions in the where clause; each one defines one partition +#' @return DataFrame +#' @rdname read.jdbc +#' @name read.jdbc +#' @export +#' @examples +#'\dontrun{ +#' sc <- sparkR.init() +#' sqlContext <- sparkRSQL.init(sc) +#' jdbcUrl <- "jdbc:mysql://localhost:3306/databasename" +#' df <- read.jdbc(sqlContext, jdbcUrl, "table", predicates = list("field<=123"), user = "username") +#' df2 <- read.jdbc(sqlContext, jdbcUrl, "table2", partitionColumn = "index", lowerBound = 0, +#' upperBound = 1, user = "username", password = "password") +#' } + +read.jdbc <- function(sqlContext, url, tableName, + partitionColumn = NULL, lowerBound = NULL, upperBound = NULL, + numPartitions = 0L, predicates = list(), ...) { + jprops <- envToJProperties(varargsToEnv(...)) + + read <- callJMethod(sqlContext, "read") + if (!is.null(partitionColumn)) { --- End diff -- add mutual exclusive check for predicates?
[GitHub] spark pull request: [SPARK-12453][Streaming] Remove explicit depen...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/10492#issuecomment-167543175 @JoshRosen it's not the scope that's an issue but the version. Not specifying it lets the SDK version required by the Kinesis client come in at whatever it needs to be. I am not sure provided scope works since it does really need to be bundled and isn't necessarily otherwise available from the env. There's a little wrinkle here in that the Kinesis code uses SDK classes directly, so technically the POM should declare that. However the SDK is used only in the context of the Kinesis client, so it seems like the lesser evil to rely on it as a transitive dependency but at the right version.
[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/10488#discussion_r48475897 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -57,9 +57,10 @@ case class Md5(child: Expression) extends UnaryExpression with ImplicitCastInput * the hash length is not one of the permitted values, the return value is NULL. */ @ExpressionDescription( - usage = "_FUNC_(input, bitLength) - Returns a checksum of SHA-2 family as a hex string of the " + -"input. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent " + -"to 256", + usage = +"""_FUNC_(input, bitLength) - Returns a checksum of SHA-2 family as a hex string of the input. + SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent to 256.""" + , --- End diff -- Does that work under 2.11?
[GitHub] spark pull request: [SPARK-12224][SPARKR] R support for JDBC sourc...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/10480#issuecomment-167517583 For testing JDBC, we could add a helper function on the Scala side that reuses code in JDBCSuite to start an in-memory JDBC server?
[GitHub] spark pull request: [SPARK-8641][SPARK-12455][SQL] Native Spark Wi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10402#issuecomment-167520443 **[Test build #48363 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48363/consoleFull)** for PR 10402 at commit [`13f9c95`](https://github.com/apache/spark/commit/13f9c95590bbee7790e74768e7b42fb0e0161b9d).
[GitHub] spark pull request: [SPARK-12525] Fix fatal compiler warnings in K...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/10479#issuecomment-167539129 LGTM
[GitHub] spark pull request: [Spark-10625] [SQL] Spark SQL JDBC read/write ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8785#issuecomment-167539588 **[Test build #2258 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2258/consoleFull)** for PR 8785 at commit [`6af8fd8`](https://github.com/apache/spark/commit/6af8fd8b824f5f343a01868560b74a1f55acd02f).
[GitHub] spark pull request: [Spark-10625] [SQL] Spark SQL JDBC read/write ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8785#issuecomment-167539724 **[Test build #2258 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2258/consoleFull)** for PR 8785 at commit [`6af8fd8`](https://github.com/apache/spark/commit/6af8fd8b824f5f343a01868560b74a1f55acd02f). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12536] [SQL] Added "Empty Seq" in Expla...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10494#issuecomment-167539570 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48361/ Test PASSed.
[GitHub] spark pull request: [SPARK-12340][SQL]fix Int overflow in the Spar...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/10487#issuecomment-167540446 LGTM
[GitHub] spark pull request: [SPARK-12513] [Streaming] SocketReceiver hang ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/10464#issuecomment-167541966 Does this really solve the problem? The current code appears to clean up the socket on stopping already, so I wonder why this would fix it. Did you test it? It would make more sense to open the socket in onStart if you close it in onStop.
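For reference, the lifecycle pairing srowen describes (open the socket in onStart, close it in onStop) looks roughly like the following hypothetical receiver. The class name and details are illustrative, not the PR's actual fix; this depends on the Spark Streaming receiver API and will not compile without the spark-streaming artifact on the classpath:

```scala
import java.io.{BufferedReader, InputStreamReader}
import java.net.Socket
import java.nio.charset.StandardCharsets

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Hypothetical sketch: the socket is owned by the receiver lifecycle, so
// onStop() can close it and unblock a thread stuck in a read, which is the
// hang scenario discussed in SPARK-12513.
class SocketSketchReceiver(host: String, port: Int)
  extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  @volatile private var socket: Socket = _

  override def onStart(): Unit = {
    socket = new Socket(host, port) // open in onStart...
    new Thread("socket-sketch-receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  override def onStop(): Unit = {
    // ...close in onStop; closing interrupts a blocked readLine() below.
    if (socket != null) socket.close()
  }

  private def receive(): Unit = {
    val reader = new BufferedReader(
      new InputStreamReader(socket.getInputStream, StandardCharsets.UTF_8))
    var line = reader.readLine()
    while (!isStopped && line != null) {
      store(line)
      line = reader.readLine()
    }
  }
}
```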
[GitHub] spark pull request: we should have training and test sets
Github user XD-DENG commented on the pull request: https://github.com/apache/spark/pull/10434#issuecomment-167541958 Thanks for clarifying. Will have a look if I can proceed as you suggested with JIRA. Thanks
[GitHub] spark pull request: Add more exceptions to Guava relocation
Github user microhello commented on a diff in the pull request: https://github.com/apache/spark/pull/10442#discussion_r48467911 --- Diff: pom.xml --- @@ -99,14 +99,14 @@ sql/hive unsafe assembly -external/twitter -external/flume -external/flume-sink -external/flume-assembly -external/mqtt -external/mqtt-assembly -external/zeromq -examples + --- End diff -- OK
[GitHub] spark pull request: [SPARK-12536] [SQL] Added "Empty Seq" in Expla...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10494#discussion_r48468287 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LocalRelation.scala --- @@ -62,6 +62,10 @@ case class LocalRelation(output: Seq[Attribute], data: Seq[InternalRow] = Nil) case _ => false } + override def simpleString: String = +if (data == Seq.empty) super.simpleString + " [Empty Seq]" --- End diff -- should we do it in `TreeNode`? cc @marmbrus @yhuai
[GitHub] spark pull request: [SPARK-8641][SPARK-12455][SQL] Native Spark Wi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10402#issuecomment-167539426 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8641][SPARK-12455][SQL] Native Spark Wi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10402#issuecomment-167539345 **[Test build #48362 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48362/consoleFull)** for PR 10402 at commit [`767305a`](https://github.com/apache/spark/commit/767305a58c47fffb1ced4483e3c4a938e5383143). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10488#discussion_r48472393 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -57,9 +57,10 @@ case class Md5(child: Expression) extends UnaryExpression with ImplicitCastInput * the hash length is not one of the permitted values, the return value is NULL. */ @ExpressionDescription( - usage = "_FUNC_(input, bitLength) - Returns a checksum of SHA-2 family as a hex string of the " + -"input. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent " + -"to 256", + usage = +"""_FUNC_(input, bitLength) - Returns a checksum of SHA-2 family as a hex string of the input. + SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent to 256.""" + , --- End diff -- Yes, this keeps lines within 100 characters.
[GitHub] spark pull request: [SPARK-8641][SPARK-12455][SQL] Native Spark Wi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10402#issuecomment-167539427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48362/ Test PASSed.
[GitHub] spark pull request: [SPARK-12263][Docs]: IllegalStateException: Me...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/10483#issuecomment-167539239 @nssalian you need to fix the line that's too long now
[GitHub] spark pull request: we should have training and test sets
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/10434#issuecomment-167539553 @XD-DENG can you address my comments or close this PR?
[GitHub] spark pull request: [SPARK-12536] [SQL] Added "Empty Seq" in Expla...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10494#issuecomment-167539491 **[Test build #48361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48361/consoleFull)** for PR 10494 at commit [`21080af`](https://github.com/apache/spark/commit/21080afd995e3df141db1531c065ace6eac4fa77). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Spark-10625] [SQL] Spark SQL JDBC read/write ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/8785#discussion_r48472434 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/UnserializableDriverHelper.scala --- @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.jdbc + +import java.sql.{DriverManager, Connection} +import java.util.Properties +import java.util.logging.Logger + +object UnserializableDriverHelper { + + import scala.collection.JavaConverters._ --- End diff -- This is imported locally in a few places, why? Below you don't import org.h2.Driver though. I'm not worried about changing it though.
[GitHub] spark pull request: [SPARK-12536] [SQL] Added "Empty Seq" in Expla...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10494#issuecomment-167539569 Merged build finished. Test PASSed.
[GitHub] spark pull request: we should have training and test sets
Github user XD-DENG commented on the pull request: https://github.com/apache/spark/pull/10434#issuecomment-167540341 @srowen Hi Owen, sure. Thanks a lot for your clarification. My understanding was that you find this modification unnecessary, so I didn't proceed further. Happy new year.
[GitHub] spark pull request: we should have training and test sets
Github user XD-DENG closed the pull request at: https://github.com/apache/spark/pull/10434
[GitHub] spark pull request: [SPARK-12457] [SQL] Add ExpressionDescription ...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/10418#discussion_r48469610 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala --- @@ -335,7 +335,7 @@ object FunctionRegistry { val df = clazz.getAnnotation(classOf[ExpressionDescription]) if (df != null) { (name, -(new ExpressionInfo(clazz.getCanonicalName, name, df.usage(), df.extended()), +(new ExpressionInfo(clazz.getCanonicalName, name, df.usage(), df.extended().stripMargin), --- End diff -- Why would you want to add a new line character in a raw string? It would be nice to add ```stripMargin``` to ```df.usage``` as well.
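As a quick, standalone illustration of what the `stripMargin` call in the diff does at registration time (the example text below is made up, not Spark's actual extended-usage string):

```scala
object StripMarginDemo {
  // The '|' margin characters let a triple-quoted literal be indented in
  // source; stripMargin removes everything up to and including each '|'.
  val extended: String =
    """|Extended example:
       |  > SELECT _FUNC_(array(2, 1), true);
       |  [1, 2]""".stripMargin

  def main(args: Array[String]): Unit = {
    // The leading indentation and margin pipes are gone after stripMargin.
    assert(extended.startsWith("Extended example:"))
    assert(!extended.contains("|"))
    println(extended)
  }
}
```

This is why applying `stripMargin` once where the annotation is read, as the diff does, lets the annotation itself stay a plain raw string literal.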
[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/10488#discussion_r48470392 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -57,9 +57,10 @@ case class Md5(child: Expression) extends UnaryExpression with ImplicitCastInput * the hash length is not one of the permitted values, the return value is NULL. */ @ExpressionDescription( - usage = "_FUNC_(input, bitLength) - Returns a checksum of SHA-2 family as a hex string of the " + -"input. SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent " + -"to 256", + usage = +"""_FUNC_(input, bitLength) - Returns a checksum of SHA-2 family as a hex string of the input. + SHA-224, SHA-256, SHA-384, and SHA-512 are supported. Bit length of 0 is equivalent to 256.""" + , --- End diff -- Nit style. I guess this was needed to stay within 100 characters?
[GitHub] spark pull request: [SPARK-12534][DOC] update documentation to lis...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/10491#discussion_r48471720 --- Diff: docs/configuration.md --- @@ -120,7 +120,8 @@ of the most common options to set are: spark.driver.cores 1 -Number of cores to use for the driver process, only in cluster mode. +Number of cores to use for the driver process, only in cluster mode. This can be set through +--driver-cores command line option. --- End diff -- I don't think the purpose of this file is to document how the CLI works. It should stick to documenting the underlying properties.
[GitHub] spark pull request: [SPARK-12526][SPARKR]`ifelse`, `when`, `otherw...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/10481#issuecomment-167540565

The fix is good, but a style nit:

if (...) {
  ...
} else {
  ...
}
[GitHub] spark pull request: [SPARK-12424][ML] The implementation of ParamM...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10381#issuecomment-167551863 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48364/ Test PASSed.
[GitHub] spark pull request: [SPARK-12424][ML] The implementation of ParamM...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10381#issuecomment-167551695 **[Test build #48364 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48364/consoleFull)** for PR 10381 at commit [`ea924e9`](https://github.com/apache/spark/commit/ea924e935ea8adb7cb9bdc8a7ac0da1fa32c0328).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12424][ML] The implementation of ParamM...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10381#issuecomment-167551862 Merged build finished. Test PASSed.