[GitHub] spark pull request: Spark 1442 step 1
GitHub user guowei2 opened a pull request: https://github.com/apache/spark/pull/2953 Spark 1442 step 1 You can merge this pull request into a Git repository by running: $ git pull https://github.com/guowei2/spark SPARK-1442-STEP-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2953.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2953 commit 19af53f5b2b55a1c38ed8879c562abc346c4da25 Author: guowei2 guow...@asiainfo.com Date: 2014-10-24T09:55:47Z window function commit 060d42656536d20771525fbe93c28929c440542c Author: guowei2 guow...@asiainfo.com Date: 2014-10-27T05:29:35Z window function commit cfa0e2a105f3fc6e4b61433a1ba8c246399978b8 Author: guowei2 guow...@asiainfo.com Date: 2014-10-27T06:03:17Z window function --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4030] Make destroy public for broadcast...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2922#issuecomment-60553565 [Test build #22277 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22277/consoleFull) for PR 2922 at commit [`a11abab`](https://github.com/apache/spark/commit/a11ababb21a0c2378a4d2f665f16d16112d7b469). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1442] [SQL] window function implement
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2953#issuecomment-60553550 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-1442] [SQL] window function implement
Github user guowei2 commented on the pull request: https://github.com/apache/spark/pull/2953#issuecomment-60553573 Aims of step 1 (this PR): 1. support parsing SQL with complex window definitions 2. support most aggregate functions with a window spec 3. support window ranges. Aims of step 2: support the remaining unsupported features below: 1. multiple different window partitions are not supported, though multiple identical window partitions are 2. combining a window partition with GROUP BY is not supported 3. lead and lag are not supported (the default lead and lag functions looked up in HiveFunctionRegistry are GenericUDFs; we need GenericUDAFs) 4. rank and dense_rank are not supported 5. SQL parsing with TOK_PTBLFUNCTION is not supported
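As an illustration of the window-aggregate semantics step 1 targets (partition the rows, order within each partition, then aggregate over a row-based range), here is a minimal standalone Scala sketch. The object, case class, and data are hypothetical and this is not the PR's actual implementation:

```scala
// Minimal sketch of window-aggregate semantics: partition the rows,
// order within each partition, then compute a running aggregate over
// ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
object WindowSketch {
  case class Row(dept: String, name: String, salary: Int)

  // Roughly: SUM(salary) OVER (PARTITION BY dept ORDER BY salary)
  def runningSum(rows: Seq[Row]): Map[Row, Int] =
    rows.groupBy(_.dept).values.flatMap { part =>
      val ordered = part.sortBy(_.salary)
      // scanLeft produces cumulative sums; drop the leading 0 so each
      // row pairs with the running total up to and including itself.
      ordered.zip(ordered.scanLeft(0)(_ + _.salary).tail)
    }.toMap

  def main(args: Array[String]): Unit = {
    val data = Seq(Row("a", "x", 10), Row("a", "y", 20), Row("b", "z", 5))
    data.foreach(r => println(s"$r -> ${runningSum(data)(r)}"))
  }
}
```

A real implementation additionally has to handle frame boundaries other than UNBOUNDED PRECEDING / CURRENT ROW, which is part of what "support window range" covers.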
[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60553581 @JoshRosen I think we still have it (in tests at tonight): ``` [info] org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 1 times, most recent failure: Lost task 0.0 in stage 11.0 (TID 11, localhost): java.io.IOException: unexpected exception type [info] java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1538) [info] java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1025) [info] java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893) [info] java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) [info] java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) [info] java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) [info] java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) [info] java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) [info] java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) [info] java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) [info] org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) [info] org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87) [info] org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:164) [info] java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [info] java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [info] java.lang.Thread.run(Thread.java:745) [info] Driver stacktrace: [info] at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1192) [info] at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1181) [info] at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1180) [info] at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) [info] at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) [info] at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1180) [info] at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:695) [info] at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:695) [info] at scala.Option.foreach(Option.scala:236) [info] at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:695) [info] at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1398) [info] at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) [info] at akka.actor.ActorCell.invoke(ActorCell.scala:456) [info] at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) [info] at akka.dispatch.Mailbox.run(Mailbox.scala:219) [info] at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) [info] at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [info] at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [info] at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [info] at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) ```
[GitHub] spark pull request: [SPARK-4030] Make destroy public for broadcast...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/2922#issuecomment-60553667 @pwendell - I made destroy blocking by default and only made that version public (it's not clear we need the non-blocking version to also be public -- we can add it later if required). Also, all the Broadcast stuff in the Java API seems to come directly from the Java classes? Let me know if I missed something.
[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60553702 This is really strange; I thought that the unexpected exception type would have been addressed by https://github.com/apache/spark/pull/2932
[GitHub] spark pull request: [SPARK-3795] Heuristics for dynamically scalin...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2746#discussion_r19389557 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -0,0 +1,409 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark + +import scala.collection.mutable + +import org.apache.spark.scheduler._ + +/** + * An agent that dynamically allocates and removes executors based on the workload. + * + * The add policy depends on the number of pending tasks. If the queue of pending tasks is not + * drained in N seconds, then new executors are added. If the queue persists for another M + * seconds, then more executors are added and so on. The number added in each round increases + * exponentially from the previous round until an upper bound on the number of executors has + * been reached. + * + * The rationale for the exponential increase is twofold: (1) Executors should be added slowly + * in the beginning in case the number of extra executors needed turns out to be small. Otherwise, + * we may add more executors than we need just to remove them later. (2) Executors should be added + * quickly over time in case the maximum number of executors is very high. 
Otherwise, it will take + * a long time to ramp up under heavy workloads. + * + * The remove policy is simpler: If an executor has been idle for K seconds (meaning it has not + * been scheduled to run any tasks), then it is removed. This requires starting a timer on each + * executor instead of just starting a global one as in the add case. + * + * There is no retry logic in either case. Because the requests to the cluster manager are + * asynchronous, this class does not know whether a request has been granted until later. For + * this reason, both add and remove are treated as best-effort only. + * + * The relevant Spark properties include the following: + * + * spark.dynamicAllocation.enabled - Whether this feature is enabled + * spark.dynamicAllocation.minExecutors - Lower bound on the number of executors + * spark.dynamicAllocation.maxExecutors - Upper bound on the number of executors + * + * spark.dynamicAllocation.addExecutorThresholdSeconds - How long before new executors are added + * spark.dynamicAllocation.addExecutorIntervalSeconds - How often to add new executors + * spark.dynamicAllocation.removeExecutorThresholdSeconds - How long before an executor is removed + * + * Synchronization: Because the schedulers in Spark are single-threaded, contention should only + * arise when new executors register or when existing executors are removed, both of which are + * relatively rare events with respect to task scheduling. Thus, synchronizing each method on the + * same lock should not be expensive assuming biased locking is enabled in the JVM (on by default + * for Java 6+). This may not be true, however, if the application itself runs multiple jobs + * concurrently. + * + * Note: This is part of a larger implementation (SPARK-3174) and currently does not actually + * request to add or remove executors. The mechanism to actually do this will be added separately, + * e.g. in SPARK-3822 for Yarn. 
+ */ +private[spark] class ExecutorAllocationManager(sc: SparkContext) extends Logging { + import ExecutorAllocationManager._ + + private val conf = sc.conf + + // Lower and upper bounds on the number of executors. These are required. + private val minNumExecutors = conf.getInt("spark.dynamicAllocation.minExecutors", -1) + private val maxNumExecutors = conf.getInt("spark.dynamicAllocation.maxExecutors", -1) + if (minNumExecutors < 0 || maxNumExecutors < 0) { +throw new SparkException("spark.dynamicAllocation.{min/max}Executors must be set!") + } + if (minNumExecutors > maxNumExecutors) { +throw new SparkException("spark.dynamicAllocation.minExecutors must + +
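The exponential add policy described in the quoted doc comment (each round requests more executors than the last, capped at the configured maximum) can be sketched in a few lines of plain Scala. This is illustrative only, not Spark's code; the object name and the doubling factor are assumptions:

```scala
object ExponentialAddSketch {
  // Total executors requested after each add round: doubles every round,
  // capped at the configured maximum, mirroring the rationale in the
  // doc comment (ramp up slowly at first, quickly under sustained load).
  def addRounds(maxExecutors: Int): List[Int] =
    Iterator.iterate(1)(_ * 2).takeWhile(_ < maxExecutors).toList :+ maxExecutors

  def main(args: Array[String]): Unit =
    println(addRounds(10)) // List(1, 2, 4, 8, 10)
}
```

With a high `maxExecutors`, the number of rounds grows only logarithmically, which is the point of the exponential increase: few wasted executors early, fast ramp-up later.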
[GitHub] spark pull request: [SPARK-3795] Heuristics for dynamically scalin...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2746#discussion_r19389583 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -0,0 +1,409 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the License); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark + +import scala.collection.mutable + +import org.apache.spark.scheduler._ + +/** + * An agent that dynamically allocates and removes executors based on the workload. + * + * The add policy depends on the number of pending tasks. If the queue of pending tasks is not + * drained in N seconds, then new executors are added. If the queue persists for another M + * seconds, then more executors are added and so on. The number added in each round increases + * exponentially from the previous round until an upper bound on the number of executors has + * been reached. + * + * The rationale for the exponential increase is twofold: (1) Executors should be added slowly + * in the beginning in case the number of extra executors needed turns out to be small. Otherwise, + * we may add more executors than we need just to remove them later. (2) Executors should be added + * quickly over time in case the maximum number of executors is very high. 
Otherwise, it will take + * a long time to ramp up under heavy workloads. + * + * The remove policy is simpler: If an executor has been idle for K seconds (meaning it has not + * been scheduled to run any tasks), then it is removed. This requires starting a timer on each + * executor instead of just starting a global one as in the add case. + * + * There is no retry logic in either case. Because the requests to the cluster manager are + * asynchronous, this class does not know whether a request has been granted until later. For + * this reason, both add and remove are treated as best-effort only. + * + * The relevant Spark properties include the following: + * + * spark.dynamicAllocation.enabled - Whether this feature is enabled + * spark.dynamicAllocation.minExecutors - Lower bound on the number of executors + * spark.dynamicAllocation.maxExecutors - Upper bound on the number of executors + * + * spark.dynamicAllocation.addExecutorThresholdSeconds - How long before new executors are added + * spark.dynamicAllocation.addExecutorIntervalSeconds - How often to add new executors + * spark.dynamicAllocation.removeExecutorThresholdSeconds - How long before an executor is removed + * + * Synchronization: Because the schedulers in Spark are single-threaded, contention should only + * arise when new executors register or when existing executors are removed, both of which are + * relatively rare events with respect to task scheduling. Thus, synchronizing each method on the + * same lock should not be expensive assuming biased locking is enabled in the JVM (on by default + * for Java 6+). This may not be true, however, if the application itself runs multiple jobs + * concurrently. + * + * Note: This is part of a larger implementation (SPARK-3174) and currently does not actually + * request to add or remove executors. The mechanism to actually do this will be added separately, + * e.g. in SPARK-3822 for Yarn. 
+ */ +private[spark] class ExecutorAllocationManager(sc: SparkContext) extends Logging { + import ExecutorAllocationManager._ + + private val conf = sc.conf + + // Lower and upper bounds on the number of executors. These are required. + private val minNumExecutors = conf.getInt("spark.dynamicAllocation.minExecutors", -1) + private val maxNumExecutors = conf.getInt("spark.dynamicAllocation.maxExecutors", -1) + if (minNumExecutors < 0 || maxNumExecutors < 0) { +throw new SparkException("spark.dynamicAllocation.{min/max}Executors must be set!") + } + if (minNumExecutors > maxNumExecutors) { +throw new SparkException("spark.dynamicAllocation.minExecutors must + +
[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60553953 Can you point me to the commit that produced that stacktrace?
[GitHub] spark pull request: [SPARK-4095][YARN][Minor]extract val isLaunchi...
GitHub user WangTaoTheTonic opened a pull request: https://github.com/apache/spark/pull/2954 [SPARK-4095][YARN][Minor]extract val isLaunchingDriver in ClientBase Instead of checking whether `args.userClass` is null repeatedly, we extract it to a global val as in `ApplicationMaster`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/WangTaoTheTonic/spark MemUnit Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2954.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2954 commit 13bda20c2cafcb3962f74abe600fbb4a01ced88c Author: WangTaoTheTonic barneystin...@aliyun.com Date: 2014-10-27T06:09:41Z extract val isLaunchingDriver in ClientBase
[GitHub] spark pull request: [SPARK-4087] use broadcast for task only when ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2933#issuecomment-60554225 @JoshRosen @pwendell The test branch (internal) did not have that commit.
[GitHub] spark pull request: [SPARK-4095][YARN][Minor]extract val isLaunchi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2954#issuecomment-60554332 [Test build #22278 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22278/consoleFull) for PR 2954 at commit [`13bda20`](https://github.com/apache/spark/commit/13bda20c2cafcb3962f74abe600fbb4a01ced88c). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3904] [SQL] add constant objectinspecto...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/2762#issuecomment-60554357 test this please.
[GitHub] spark pull request: [SPARK-4030] Make destroy public for broadcast...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2922#issuecomment-60554586 Oh right yeah. Great, LGTM.
[GitHub] spark pull request: [SPARK-3594] [PySpark] [SQL] take more rows to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2716#issuecomment-60554724 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22273/ Test PASSed.
[GitHub] spark pull request: [SPARK-3594] [PySpark] [SQL] take more rows to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2716#issuecomment-60554714 [Test build #22273 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22273/consoleFull) for PR 2716 at commit [`567dc60`](https://github.com/apache/spark/commit/567dc60d7ce2c43ec7c1e24e47dc515ab5056ac0). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class NullType(PrimitiveType):`
[GitHub] spark pull request: Streaming KMeans [MLLIB][SPARK-3254]
Github user anantasty commented on the pull request: https://github.com/apache/spark/pull/2942#issuecomment-60554980 Should we create another PR for the python bindings/example?
[GitHub] spark pull request: [SPARK-3911] [SQL] HiveSimpleUdf can not be op...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/2771#issuecomment-6072 test this please.
[GitHub] spark pull request: [SPARK-4096][YARN]Update executor memory descr...
GitHub user WangTaoTheTonic opened a pull request: https://github.com/apache/spark/pull/2955 [SPARK-4096][YARN]Update executor memory description in the help message Here `ApplicationMaster` accepts the executor memory argument only in number format, so we should update the description in the help message. You can merge this pull request into a Git repository by running: $ git pull https://github.com/WangTaoTheTonic/spark modifyDesc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2955.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2955 commit 37797670f70d2332c412d14c8cc9eac2573d8bce Author: WangTaoTheTonic barneystin...@aliyun.com Date: 2014-10-27T06:45:21Z Update executor memory description in the help message
[GitHub] spark pull request: [SPARK-4096][YARN]Update executor memory descr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2955#issuecomment-60555883 [Test build #22279 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22279/consoleFull) for PR 2955 at commit [`3779767`](https://github.com/apache/spark/commit/37797670f70d2332c412d14c8cc9eac2573d8bce). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4094][CORE] checkpoint should still be ...
GitHub user liyezhang556520 opened a pull request: https://github.com/apache/spark/pull/2956 [SPARK-4094][CORE] checkpoint should still be available after any rdd actions JIRA URL: [SPARK-4094](https://issues.apache.org/jira/browse/SPARK-4094) You can merge this pull request into a Git repository by running: $ git pull https://github.com/liyezhang556520/spark cpAfterAction Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2956.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2956 commit 719c29da025519c9282940ac39c398ab860f700f Author: Zhang, Liye liye.zh...@intel.com Date: 2014-10-27T06:50:50Z [SPARK-4094][CORE] checkpoint should still be available after any rdd actions
[GitHub] spark pull request: [SPARK-3911] [SQL] HiveSimpleUdf can not be op...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2771#issuecomment-60555869 [Test build #22280 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22280/consoleFull) for PR 2771 at commit [`1379c73`](https://github.com/apache/spark/commit/1379c7396d04bd5dfab0b7e436661fc7a6a6a096). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3904] [SQL] add constant objectinspecto...
Github user gvramana commented on the pull request: https://github.com/apache/spark/pull/2762#issuecomment-60556024 Yes, one more advantage of passing the object inspector as a parameter is that constants need not be allocated every time; the same value in the constantObjectInspector can be reused. I have already given my comments; I will rework #2802 to support GenericUDAF once this is merged. I am waiting for this PR to be merged.
[GitHub] spark pull request: [SPARK-4094][CORE] checkpoint should still be ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2956#issuecomment-60556120 [Test build #22281 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22281/consoleFull) for PR 2956 at commit [`719c29d`](https://github.com/apache/spark/commit/719c29da025519c9282940ac39c398ab860f700f). * This patch merges cleanly.
[GitHub] spark pull request: [Spark 3922] Refactor spark-core to use Utils....
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/2781#issuecomment-60556812 retest this please
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-60556909 Yeah, I like that idea. On Oct 26, 2014 8:43 PM, wangfei notificati...@github.com wrote: @marmbrus https://github.com/marmbrus, how about make a new sub project named hive-shim to keep all the Hive Shim code in it? Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/2685#issuecomment-60547522.
[GitHub] spark pull request: [SPARK-3453] Netty-based BlockTransferService,...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2753#issuecomment-60556908 [Test build #22275 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22275/consoleFull) for PR 2753 at commit [`8dfcceb`](https://github.com/apache/spark/commit/8dfcceb5127b638ece6817e7858c6cbf93461cd6). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3453] Netty-based BlockTransferService,...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2753#issuecomment-60556917 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22275/ Test PASSed.
[GitHub] spark pull request: [SPARK-3988][SQL] add public API for date type
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2901#issuecomment-60556986 [Test build #22274 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22274/consoleFull) for PR 2901 at commit [`c51a24d`](https://github.com/apache/spark/commit/c51a24d382cc40928f2b90b438ff5f19705bd10b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class DateType(PrimitiveType):`
[GitHub] spark pull request: [SPARK-3988][SQL] add public API for date type
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2901#issuecomment-60556992 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22274/ Test PASSed.
[GitHub] spark pull request: [SPARK-4030] Make destroy public for broadcast...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2922#issuecomment-60557108 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22277/ Test PASSed.
[GitHub] spark pull request: [SPARK-4030] Make destroy public for broadcast...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2922#issuecomment-60557102 [Test build #22277 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22277/consoleFull) for PR 2922 at commit [`a11abab`](https://github.com/apache/spark/commit/a11ababb21a0c2378a4d2f665f16d16112d7b469). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Spark-4041][SQL]attributes names in table sca...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2884#issuecomment-60557917 retest this please
[GitHub] spark pull request: [SPARK-4095][YARN][Minor]extract val isLaunchi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2954#issuecomment-60558127 [Test build #22278 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22278/consoleFull) for PR 2954 at commit [`13bda20`](https://github.com/apache/spark/commit/13bda20c2cafcb3962f74abe600fbb4a01ced88c). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4095][YARN][Minor]extract val isLaunchi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2954#issuecomment-60558131 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22278/ Test PASSed.
[GitHub] spark pull request: [SPARK-3911] [SQL] HiveSimpleUdf can not be op...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2771#issuecomment-60558681 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22280/ Test PASSed.
[GitHub] spark pull request: [SPARK-3911] [SQL] HiveSimpleUdf can not be op...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2771#issuecomment-60558676 [Test build #22280 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22280/consoleFull) for PR 2771 at commit [`1379c73`](https://github.com/apache/spark/commit/1379c7396d04bd5dfab0b7e436661fc7a6a6a096). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4096][YARN]Update executor memory descr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2955#issuecomment-60558850 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22279/ Test FAILed.
[GitHub] spark pull request: [SPARK-4096][YARN]Update executor memory descr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2955#issuecomment-60558845 [Test build #22279 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22279/consoleFull) for PR 2955 at commit [`3779767`](https://github.com/apache/spark/commit/37797670f70d2332c412d14c8cc9eac2573d8bce). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4094][CORE] checkpoint should still be ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2956#discussion_r19391269 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1204,6 +1204,8 @@ abstract class RDD[T: ClassTag]( } else if (checkpointData.isEmpty) { checkpointData = Some(new RDDCheckpointData(this)) checkpointData.get.markForCheckpoint() + // There is supposed to be doCheckpoint in the following, reset doCheckpointCalled first + doCheckpointCalled = false --- End diff -- From the docs, it's clear that this is not intended to be called after operations have executed on the RDD. These changes kind of hack it so it doesn't directly fail, but are you certain this is valid? Race conditions and so on? What's the point of `doCheckpointCalled` after this change, really? The criterion seems to collapse to "allow checkpoint if no checkpoint data has been written." If it's that easy I do wonder why it wasn't this way in the first place.
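srowen's observation that the criterion collapses can be seen in a stripped-down model of the flag logic under discussion. This is an illustrative simplification, not Spark's actual code (the real logic lives in `RDD.scala` and `RDDCheckpointData`); the class and method names here are hypothetical.

```scala
// Stripped-down model of the checkpoint bookkeeping the PR changes.
// After the patch, checkpoint() re-arms doCheckpointCalled whenever no
// checkpoint data has been written yet, so the effective rule becomes:
// "allow checkpointing until data is actually written".
class CheckpointModel {
  private var doCheckpointCalled = false
  private var checkpointDataDefined = false

  // User-facing checkpoint(): marks the RDD for checkpointing.
  def checkpoint(): Unit = {
    if (!checkpointDataDefined) {
      checkpointDataDefined = true
      doCheckpointCalled = false // the line added by the PR
    }
  }

  // Called at the end of every action; returns true if checkpoint
  // data would be written by this call.
  def doCheckpoint(): Boolean = {
    if (!doCheckpointCalled) {
      doCheckpointCalled = true
      checkpointDataDefined
    } else false
  }
}
```

In this model, an action that runs before `checkpoint()` writes nothing, but a later `checkpoint()` re-arms the flag so the next action performs the write, which is exactly the behavior the JIRA asks for.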
[GitHub] spark pull request: [SPARK-4096][YARN]Update executor memory descr...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/2955#issuecomment-60558978 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-4096][YARN]Update executor memory descr...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2955#issuecomment-60559230 It seems much more desirable to just support 3g or 200m in this argument, as was intended. `Utils.memoryStringToMb` can do the conversion.
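srowen's suggestion leans on Spark's `Utils.memoryStringToMb`. The sketch below is an illustrative re-implementation of that kind of suffix parsing, not Spark's actual code; the supported suffixes are an assumption for the example.

```scala
// Illustrative memory-string parser (not Spark's Utils.memoryStringToMb):
// converts strings like "3g", "200m", "2048k", or a bare megabyte count
// into an Int number of megabytes.
def memoryStringToMb(str: String): Int = {
  val lower = str.trim.toLowerCase
  val (digits, suffix) = lower.span(_.isDigit)
  require(digits.nonEmpty, s"No numeric part in memory string: $str")
  val n = digits.toLong
  suffix match {
    case "k" | "kb"      => (n / 1024).toInt   // kilobytes, rounded down
    case "" | "m" | "mb" => n.toInt            // already megabytes
    case "g" | "gb"      => (n * 1024).toInt
    case "t" | "tb"      => (n * 1024 * 1024).toInt
    case other => throw new IllegalArgumentException(s"Unknown suffix: $other")
  }
}
```

For example, `memoryStringToMb("3g")` yields 3072 and `memoryStringToMb("200m")` yields 200, which is the compatibility behavior the discussion below is about.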
[GitHub] spark pull request: [SPARK-4096][YARN]Update executor memory descr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2955#issuecomment-60559246 [Test build #22282 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22282/consoleFull) for PR 2955 at commit [`3779767`](https://github.com/apache/spark/commit/37797670f70d2332c412d14c8cc9eac2573d8bce). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3997][Build]scalastyle should output th...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2846#issuecomment-60559452 The `awk` command here is still a little wrong, as it matches any string `error`. `grep \[error\]` definitely works on `scalastyle.txt` to match `[error]`. There's a confusion here somewhere but hey this may end up working just fine to look for `error` only, and `awk` is probably installed everywhere, just suboptimal.
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-60559682 Sorry, are we going to support both Hive 0.12 and 0.13.1 in the long term? I am working on the SerDe stuff in #2570; it seems lots of method signatures changed after upgrading to 0.13.1, as well as the `ObjectInspector`. If we need to support both versions, the Shim code will probably be complicated.
[GitHub] spark pull request: [SPARK-4096][YARN]Update executor memory descr...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/2955#issuecomment-60559959 @srowen I thought this way at the beginning, but after more digging I found that in `ClientBase.scala` there is an initialization of `ApplicationMaster` using arguments from `ClientArguments`, which already converts args like 3g or 200m into numbers in MB. So considering compatibility, maybe just modifying the description is better.
[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2570#issuecomment-60560204 [Test build #22283 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22283/consoleFull) for PR 2570 at commit [`53d0c7a`](https://github.com/apache/spark/commit/53d0c7a911748efce5670ec79f4f565fa5b17950). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/2570#issuecomment-60560261 @marmbrus, I've rebased the code with the latest master (with Hive 0.13.1 supported, but not compatible with Hive 0.12). Please let me know if you have concerns on this.
[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2570#issuecomment-60560430 [Test build #22283 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22283/consoleFull) for PR 2570 at commit [`53d0c7a`](https://github.com/apache/spark/commit/53d0c7a911748efce5670ec79f4f565fa5b17950). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class CreateTableAsSelect[T](` * `logDebug(s"Found class for $serdeName")`
[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2570#issuecomment-60560431 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22283/ Test FAILed.
[GitHub] spark pull request: [SPARK-4094][CORE] checkpoint should still be ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2956#issuecomment-60560699 [Test build #22281 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22281/consoleFull) for PR 2956 at commit [`719c29d`](https://github.com/apache/spark/commit/719c29da025519c9282940ac39c398ab860f700f). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4094][CORE] checkpoint should still be ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2956#issuecomment-60560702 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22281/ Test PASSed.
[GitHub] spark pull request: [SPARK-4097] Fix the race condition of 'thread...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/2957 [SPARK-4097] Fix the race condition of 'thread' There is a chance that `thread` is null when calling `thread.interrupt()`.

```Scala
override def cancel(): Unit = this.synchronized {
  _cancelled = true
  if (thread != null) {
    thread.interrupt()
  }
}
```

Should put `thread = null` into a `synchronized` block to fix the race condition. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark SPARK-4097 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2957.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2957 commit c5cfeca22b46d6538f669ebbe5dd10fd198583c9 Author: zsxwing zsxw...@gmail.com Date: 2014-10-27T08:21:32Z Fix the race condition of 'thread'
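The fix the PR describes can be sketched as follows. The class and member names are illustrative, not Spark's actual code; the point is only that the writer clearing `thread` and the reader in `cancel()` hold the same lock, so `interrupt()` never races against the reference being cleared.

```scala
// Minimal sketch of the synchronized-null-out pattern from the PR
// description (hypothetical class, not Spark's implementation).
class CancellableJob extends Runnable {
  @volatile private var _cancelled = false
  private var thread: Thread = _

  def start(): Unit = this.synchronized {
    thread = new Thread(this)
    thread.start()
  }

  override def run(): Unit = {
    try {
      while (!_cancelled) Thread.sleep(10) // simulated interruptible work
    } catch {
      case _: InterruptedException => // interrupted by cancel(); exit
    } finally {
      // Clearing the reference under the same lock closes the race window.
      this.synchronized { thread = null }
    }
  }

  def cancel(): Unit = this.synchronized {
    _cancelled = true
    if (thread != null) thread.interrupt()
  }

  def cancelled: Boolean = _cancelled
}
```

Without the `synchronized` around `thread = null`, the worker could clear the field between `cancel()`'s null check and the `interrupt()` call, which is the race the PR title refers to.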
[GitHub] spark pull request: Add range join support to spark-sql
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2939#discussion_r19392124 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -556,6 +556,8 @@ class SQLQuerySuite extends QueryTest with BeforeAndAfterAll { sql("SELECT * FROM lowerCaseData INTERSECT SELECT * FROM upperCaseData"), Nil) } + + --- End diff -- Please remove extra new lines.
[GitHub] spark pull request: Add range join support to spark-sql
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2939#discussion_r19392140 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/RangeJoins.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import scala.collection.mutable.{ArrayBuffer, BitSet} + + +import org.apache.spark.annotation.DeveloperApi +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.SQLContext + + + --- End diff -- Please remove extra new lines.
[GitHub] spark pull request: Add range join support to spark-sql
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2939#discussion_r19392150 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/RangeJoins.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import scala.collection.mutable.{ArrayBuffer, BitSet} + --- End diff -- Please remove extra new lines.
[GitHub] spark pull request: Add range join support to spark-sql
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2939#discussion_r19392165

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/RangeJoinImpl.scala ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
--- End diff --

extra new line.
[GitHub] spark pull request: Add range join support to spark-sql
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2939#discussion_r19392161

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/RangeJoinImpl.scala ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.{sql, SparkContext}
+import org.apache.spark.SparkContext._
+import scala.reflect.ClassTag
+
+object RangeJoinImpl extends Serializable{
+
+  /**
+   * Multi-joins together two RDDs that contain objects that map to reference regions.
+   * The elements from the first RDD become the key of the output RDD, and the value
+   * contains all elements from the second RDD which overlap the region of the key.
+   * This is a multi-join, so it preserves n-to-m relationships between regions.
+   *
+   * @param sc A spark context from the cluster that will perform the join
+   * @param rdd1 RDD of values on which we build an interval tree. Assume |rdd1| <= |rdd2|
+   */
+  def overlapJoin(sc: SparkContext,
+                  rdd1: RDD[(Interval[Long],sql.Row)],
+                  rdd2: RDD[(Interval[Long],sql.Row)]): RDD[(sql.Row, Iterable[sql.Row])] =
+  {
+
+    val indexedRdd1 = rdd1.zipWithIndex().map(_.swap)
+
+    /*Collect only Reference regions and the index of indexedRdd1*/
+    val localIntervals = indexedRdd1.map(x => (x._2._1, x._1)).collect()
+    /*Create and broadcast an interval tree*/
+    val intervalTree = sc.broadcast(new IntervalTree[Long](localIntervals.toList))
+
+    val kvrdd2: RDD[(Long, Iterable[sql.Row])] = rdd2
+      //join entry with the intervals returned from the interval tree
+      .map(x => (intervalTree.value.getAllOverlappings(x._1), x._2))
+      .filter(x => x._1 != Nil) //filter out entries that do not join anywhere
+      .flatMap(t => t._1.map(s => (s._2, t._2))) //create pairs of (index1, rdd2Elem)
+      .groupByKey
+
--- End diff --

extra new line.
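The broadcast-and-probe pattern in the quoted `overlapJoin` can be hard to see through the RDD plumbing. Below is a minimal, self-contained sketch of the same idea in plain Scala collections: the smaller side is "collected" locally (standing in for the broadcast interval tree) and the larger side probes it. All names here are illustrative; the PR's actual `IntervalTree` lookup is replaced with a linear scan to keep the sketch short.

```scala
// Illustrative stand-in for the PR's Interval type: a closed interval on Long.
case class Iv(start: Long, end: Long) {
  def overlaps(o: Iv): Boolean = start <= o.end && o.start <= end
}

// Local analogue of overlapJoin: each element of `small` becomes a key,
// paired with every element of `large` whose interval overlaps it.
// In the distributed version, `small` is collected and broadcast so each
// partition of `large` can probe it without a shuffle.
def overlapJoinLocal[A, B](small: Seq[(Iv, A)], large: Seq[(Iv, B)]): Seq[(A, Seq[B])] =
  small.map { case (iv, a) =>
    (a, large.collect { case (oiv, b) if iv.overlaps(oiv) => b })
  }.filter(_._2.nonEmpty) // drop keys that join nowhere, like the .filter in the diff
```

The interval tree in the PR replaces the linear `collect` with an O(log n + k) lookup, which is what makes broadcasting the smaller side worthwhile.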
[GitHub] spark pull request: Add range join support to spark-sql
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2939#discussion_r19392217

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -188,8 +191,18 @@ class SqlParser extends AbstractSparkSQLParser {
     }
   }

-  protected lazy val joinConditions: Parser[Expression] =
-    ON ~> expression
+  protected lazy val rangeJoinedRelation: Parser[LogicalPlan] =
+relationFactor ~ RANGEJOIN ~ relationFactor ~ ON ~ OVERLAPS ~
--- End diff --

Wrong indentation.
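For readers unfamiliar with the parser combinators used in `SqlParser` above: `~` sequences two parsers and keeps both results, while `~>` keeps only the right-hand result, which is why the original `ON ~> expression` yields just the join-condition expression. A tiny hedged illustration using the standard `RegexParsers` (this is not the Spark parser itself; `TinyParser` and its rules are made up for this example):

```scala
import scala.util.parsing.combinator.RegexParsers

object TinyParser extends RegexParsers {
  val ON: Parser[String] = "ON"
  val ident: Parser[String] = """[a-zA-Z]+""".r

  // `ON ~> ident` drops the keyword and yields only the identifier,
  // mirroring how `ON ~> expression` yields just the condition.
  val joinCondition: Parser[String] = ON ~> ident
}

// TinyParser.parseAll(TinyParser.joinCondition, "ON foo") succeeds with "foo"
```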
[GitHub] spark pull request: [SPARK-4097] Fix the race condition of 'thread...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2957#issuecomment-60561668 [Test build #22284 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22284/consoleFull) for PR 2957 at commit [`c5cfeca`](https://github.com/apache/spark/commit/c5cfeca22b46d6538f669ebbe5dd10fd198583c9). * This patch merges cleanly.
[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60561667 [Test build #22285 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22285/consoleFull) for PR 2607 at commit [`eff21fe`](https://github.com/apache/spark/commit/eff21fea01393a44c7876542832e752c26cbcd86). * This patch merges cleanly.
[GitHub] spark pull request: Add range join support to spark-sql
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2939#discussion_r19392262

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/RangeJoinImpl.scala ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.{sql, SparkContext}
+import org.apache.spark.SparkContext._
+import scala.reflect.ClassTag
+
+object RangeJoinImpl extends Serializable{
+
+  /**
+   * Multi-joins together two RDDs that contain objects that map to reference regions.
+   * The elements from the first RDD become the key of the output RDD, and the value
+   * contains all elements from the second RDD which overlap the region of the key.
+   * This is a multi-join, so it preserves n-to-m relationships between regions.
+   *
+   * @param sc A spark context from the cluster that will perform the join
+   * @param rdd1 RDD of values on which we build an interval tree. Assume |rdd1| <= |rdd2|
+   */
+  def overlapJoin(sc: SparkContext,
+                  rdd1: RDD[(Interval[Long],sql.Row)],
+                  rdd2: RDD[(Interval[Long],sql.Row)]): RDD[(sql.Row, Iterable[sql.Row])] =
+  {
+
+    val indexedRdd1 = rdd1.zipWithIndex().map(_.swap)
+
+    /*Collect only Reference regions and the index of indexedRdd1*/
--- End diff --

Please add white space after * here and other places.
[GitHub] spark pull request: Add range join support to spark-sql
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2939#discussion_r19392296

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/RangeJoins.scala ---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import scala.collection.mutable.{ArrayBuffer, BitSet}
+
+
+import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.SQLContext
+
+
+
+@DeveloperApi
+case class RangeJoin(left: SparkPlan,
+                     right: SparkPlan,
+                     condition: Seq[Expression],
+                     context: SQLContext) extends BinaryNode with Serializable{
+  def output = left.output ++ right.output
+
+  lazy val (buildPlan, streamedPlan) = (left, right)
+
+  lazy val (buildKeys, streamedKeys) = (List(condition(0),condition(1)),
+    List(condition(2), condition(3)))
+
+  @transient lazy val buildKeyGenerator = new InterpretedProjection(buildKeys, buildPlan.output)
+  @transient lazy val streamKeyGenerator = new InterpretedProjection(streamedKeys,
+    streamedPlan.output)
+
+  def execute() = {
+
+    val v1 = left.execute()
+    val v1kv = v1.map(x => {
+      val v1Key = buildKeyGenerator(x)
+      (new Interval[Long](v1Key.apply(0).asInstanceOf[Long], v1Key.apply(1).asInstanceOf[Long]),
+        x.copy() )
+    })
+
+    val v2 = right.execute()
+    val v2kv = v2.map(x => {
+      val v2Key = streamKeyGenerator(x)
+      (new Interval[Long](v2Key.apply(0).asInstanceOf[Long], v2Key.apply(1).asInstanceOf[Long]),
+        x.copy() )
+    })
+
+    /*As we are going to collect v1 and build an interval tree on its intervals,
+    make sure that its size is the smaller one.*/
+    assert(v1.count <= v2.count)
+
+
--- End diff --

extra new lines.
[GitHub] spark pull request: Add range join support to spark-sql
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2939#discussion_r19392285

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/RangeJoins.scala ---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import scala.collection.mutable.{ArrayBuffer, BitSet}
+
+
+import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.SQLContext
+
+
+
+@DeveloperApi
+case class RangeJoin(left: SparkPlan,
+                     right: SparkPlan,
+                     condition: Seq[Expression],
+                     context: SQLContext) extends BinaryNode with Serializable{
+  def output = left.output ++ right.output
+
+  lazy val (buildPlan, streamedPlan) = (left, right)
+
+  lazy val (buildKeys, streamedKeys) = (List(condition(0),condition(1)),
+    List(condition(2), condition(3)))
+
+  @transient lazy val buildKeyGenerator = new InterpretedProjection(buildKeys, buildPlan.output)
+  @transient lazy val streamKeyGenerator = new InterpretedProjection(streamedKeys,
+    streamedPlan.output)
+
+  def execute() = {
+
+    val v1 = left.execute()
+    val v1kv = v1.map(x => {
+      val v1Key = buildKeyGenerator(x)
+      (new Interval[Long](v1Key.apply(0).asInstanceOf[Long], v1Key.apply(1).asInstanceOf[Long]),
+        x.copy() )
+    })
+
+    val v2 = right.execute()
+    val v2kv = v2.map(x => {
+      val v2Key = streamKeyGenerator(x)
+      (new Interval[Long](v2Key.apply(0).asInstanceOf[Long], v2Key.apply(1).asInstanceOf[Long]),
+        x.copy() )
+    })
+
+    /*As we are going to collect v1 and build an interval tree on its intervals,
+    make sure that its size is the smaller one.*/
+    assert(v1.count <= v2.count)
+
+
+    val v3 = RangeJoinImpl.overlapJoin(context.sparkContext, v1kv, v2kv)
+      .flatMap(l => l._2.map(r => (l._1,r)))
+
+    val v4 = v3.map {
+      case (l: Row, r: Row) => new JoinedRow(l, r).withLeft(l)
+    }
+    v4
+  }
+
+
--- End diff --

extra new lines.
[GitHub] spark pull request: Add range join support to spark-sql
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2939#discussion_r19392318

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/RangeJoins.scala ---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import scala.collection.mutable.{ArrayBuffer, BitSet}
+
+
+import org.apache.spark.annotation.DeveloperApi
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.SQLContext
+
+
+
+@DeveloperApi
+case class RangeJoin(left: SparkPlan,
+                     right: SparkPlan,
+                     condition: Seq[Expression],
+                     context: SQLContext) extends BinaryNode with Serializable{
+  def output = left.output ++ right.output
+
+  lazy val (buildPlan, streamedPlan) = (left, right)
+
+  lazy val (buildKeys, streamedKeys) = (List(condition(0),condition(1)),
+    List(condition(2), condition(3)))
+
+  @transient lazy val buildKeyGenerator = new InterpretedProjection(buildKeys, buildPlan.output)
+  @transient lazy val streamKeyGenerator = new InterpretedProjection(streamedKeys,
+    streamedPlan.output)
+
+  def execute() = {
+
+    val v1 = left.execute()
+    val v1kv = v1.map(x => {
+      val v1Key = buildKeyGenerator(x)
+      (new Interval[Long](v1Key.apply(0).asInstanceOf[Long], v1Key.apply(1).asInstanceOf[Long]),
+        x.copy() )
+    })
+
+    val v2 = right.execute()
+    val v2kv = v2.map(x => {
+      val v2Key = streamKeyGenerator(x)
+      (new Interval[Long](v2Key.apply(0).asInstanceOf[Long], v2Key.apply(1).asInstanceOf[Long]),
+        x.copy() )
+    })
+
+    /*As we are going to collect v1 and build an interval tree on its intervals,
+    make sure that its size is the smaller one.*/
+    assert(v1.count <= v2.count)
+
+
+    val v3 = RangeJoinImpl.overlapJoin(context.sparkContext, v1kv, v2kv)
+      .flatMap(l => l._2.map(r => (l._1,r)))
+
+    val v4 = v3.map {
+      case (l: Row, r: Row) => new JoinedRow(l, r).withLeft(l)
+    }
+    v4
+  }
+
+
+}
+
+case class Interval[T <% Long](start: T, end: T){
--- End diff --

Please add white space between ) and { here and other places.
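The `Interval[T <% Long]` quoted above uses a Scala view bound: any `T` with an implicit conversion to `Long` is accepted. A self-contained sketch of what an overlap test on such an interval might look like (illustrative only; the `overlaps` method and its semantics here are assumptions, not code from the PR):

```scala
// Closed-interval overlap on a type viewable as Long.
// Two intervals overlap iff neither ends before the other starts.
case class Interval[T <% Long](start: T, end: T) {
  def overlaps(other: Interval[T]): Boolean =
    (start: Long) <= (other.end: Long) && (other.start: Long) <= (end: Long)
}

// Interval(100L, 199L).overlaps(Interval(150L, 250L)) // true, matches the suite's expected rows
// Interval(1L, 2L).overlaps(Interval(11L, 44L))       // false, the non-overlapping case
```

With this definition the test suite's expectations line up: (100, 199) joins (150, 250), while (1, 2) joins nothing.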
[GitHub] spark pull request: Add range join support to spark-sql
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2939#discussion_r19392340

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -102,6 +102,13 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
     }
   }

+  object RangeJoin extends Strategy{
--- End diff --

Please add white space between symbol and { here and other places.
[GitHub] spark pull request: Add range join support to spark-sql
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2939#discussion_r19392360

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLRangeJoinSuite.scala ---
@@ -0,0 +1,81 @@
+/**
+ * Licensed to Big Data Genomics (BDG) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The BDG licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.{SQLContext, QueryTest}
+import org.apache.spark.sql.test._
+import TestSQLContext._
+
+case class RecordData1(start1: Long, end1: Long) extends Serializable
+case class RecordData2(start2: Long, end2: Long) extends Serializable
+
+class SQLRangeJoinSuite extends QueryTest {
+
+
+  val sc = TestSQLContext.sparkContext
+  val sqlContext = new SQLContext(sc)
+  import sqlContext._
+
+  test("joining non overlappings results into no entries"){
+
+    val rdd1 = sc.parallelize(Seq((1L,5L), (2L,7L))).map(i => RecordData1(i._1, i._2))
+    val rdd2 = sc.parallelize(Seq((11L,44L), (23L, 45L))).map(i => RecordData2(i._1, i._2))
+
+    rdd1.registerTempTable("t1")
+    rdd2.registerTempTable("t2")
+    checkAnswer(
+      sql("select * from t1 RANGEJOIN t2 on OVERLAPS( (start1, end1), (start2, end2))"),
+      Nil
+    )
+
--- End diff --

extra new line.
[GitHub] spark pull request: Add range join support to spark-sql
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2939#discussion_r19392347

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLRangeJoinSuite.scala ---
@@ -0,0 +1,81 @@
+/**
+ * Licensed to Big Data Genomics (BDG) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The BDG licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.{SQLContext, QueryTest}
+import org.apache.spark.sql.test._
+import TestSQLContext._
+
+case class RecordData1(start1: Long, end1: Long) extends Serializable
+case class RecordData2(start2: Long, end2: Long) extends Serializable
+
+class SQLRangeJoinSuite extends QueryTest {
+
+
+  val sc = TestSQLContext.sparkContext
+  val sqlContext = new SQLContext(sc)
+  import sqlContext._
+
+  test("joining non overlappings results into no entries"){
+
+    val rdd1 = sc.parallelize(Seq((1L,5L), (2L,7L))).map(i => RecordData1(i._1, i._2))
+    val rdd2 = sc.parallelize(Seq((11L,44L), (23L, 45L))).map(i => RecordData2(i._1, i._2))
+
+    rdd1.registerTempTable("t1")
+    rdd2.registerTempTable("t2")
+    checkAnswer(
+      sql("select * from t1 RANGEJOIN t2 on OVERLAPS( (start1, end1), (start2, end2))"),
+      Nil
+    )
+
+  }
+
+  test("basic range join"){
+    val rdd1 = sc.parallelize(Seq((100L, 199L),
+      (200L, 299L),
+      (400L, 600L),
+      (1L, 2L)))
+      .map(i => RecordData1(i._1, i._2))
+
+    val rdd2 = sc.parallelize(Seq((150L, 250L),
+      (300L, 500L),
+      (500L, 700L),
+      (22000L, 22300L)))
+      .map(i => RecordData2(i._1, i._2))
+
+    rdd1.registerTempTable("s1")
+    rdd2.registerTempTable("s2")
+
+
+    checkAnswer(
+      sql("select start1, end1, start2, end2 from s1 RANGEJOIN s2 on OVERLAPS( (start1, end1), (start2, end2))"),
+      (100L, 199L, 150L, 250L) ::
+        (200L, 299L, 150L, 250L) ::
+        (400L, 600L, 300L, 500L) ::
+        (400L, 600L, 500L, 700L) :: Nil
+    )
+
+    checkAnswer(
+      sql("select end1 from s1 RANGEJOIN s2 on OVERLAPS( (start1, end1), (start2, end2))"),
+      Seq(199L) :: Seq(299L) :: Seq(600L) :: Seq(600L) :: Nil
+    )
+  }
+
+
--- End diff --

Extra new lines.
[GitHub] spark pull request: Add range join support to spark-sql
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/2939#discussion_r19392368

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLRangeJoinSuite.scala ---
@@ -0,0 +1,81 @@
+/**
+ * Licensed to Big Data Genomics (BDG) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The BDG licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.{SQLContext, QueryTest}
+import org.apache.spark.sql.test._
+import TestSQLContext._
+
+case class RecordData1(start1: Long, end1: Long) extends Serializable
+case class RecordData2(start2: Long, end2: Long) extends Serializable
+
+class SQLRangeJoinSuite extends QueryTest {
+
+
--- End diff --

extra new line.
[GitHub] spark pull request: [Spark-4041][SQL]attributes names in table sca...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2884#issuecomment-60562034 [Test build #473 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/473/consoleFull) for PR 2884 at commit [`6174046`](https://github.com/apache/spark/commit/617404683c50f631cbe0150189bc7c4e535cc33c). * This patch merges cleanly.
[GitHub] spark pull request: [ SPARK-1812] Adjust build system and tests to...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-60562052 @pwendell I tried your reproducer after changing a few things, so I am not sure whether I fixed it accidentally or simply could not reproduce it at all.
[GitHub] spark pull request: [SPARK-4098][YARN]use appUIAddress instead of ...
GitHub user WangTaoTheTonic opened a pull request: https://github.com/apache/spark/pull/2958 [SPARK-4098][YARN]use appUIAddress instead of appUIHostPort in yarn-client mode https://issues.apache.org/jira/browse/SPARK-4098 You can merge this pull request into a Git repository by running: $ git pull https://github.com/WangTaoTheTonic/spark useAddress Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2958.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2958 commit 29236e60d90f3faf9ab750fb4e6d10f648be1727 Author: WangTaoTheTonic barneystin...@aliyun.com Date: 2014-10-27T08:25:18Z use appUIAddress instead of appUIHostPort in yarn-cluster mode
[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/2570#issuecomment-60562071 Build failed because this PR is only compatible with Hive 0.13.1 (not 0.12 any more).
[GitHub] spark pull request: Add range join support to spark-sql
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2939#issuecomment-60562141 Hi @kozanitis, did you try running `sbt/sbt -Phive scalastyle`? That command tells you which code violates the style rules. Most likely the length of some lines is over 100 characters.
[GitHub] spark pull request: [SPARK-1812] Adjust build system and tests to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-60562506 [Test build #22287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22287/consoleFull) for PR 2615 at commit [`1e2a7f5`](https://github.com/apache/spark/commit/1e2a7f554fc9ea8f65729520c462e4c5cd4351f8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4098][YARN]use appUIAddress instead of ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2958#issuecomment-60562505 [Test build #22286 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22286/consoleFull) for PR 2958 at commit [`29236e6`](https://github.com/apache/spark/commit/29236e60d90f3faf9ab750fb4e6d10f648be1727). * This patch merges cleanly.
[GitHub] spark pull request: [Spark-4041][SQL]attributes names in table sca...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2884#issuecomment-60562729 [Test build #473 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/473/consoleFull) for PR 2884 at commit [`6174046`](https://github.com/apache/spark/commit/617404683c50f631cbe0150189bc7c4e535cc33c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1812] Adjust build system and tests to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-60562791 [Test build #22287 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22287/consoleFull) for PR 2615 at commit [`1e2a7f5`](https://github.com/apache/spark/commit/1e2a7f554fc9ea8f65729520c462e4c5cd4351f8). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1812] Adjust build system and tests to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2615#issuecomment-60562793 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22287/ Test FAILed.
[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...
Github user jongyoul commented on the pull request: https://github.com/apache/spark/pull/2126#issuecomment-60563727 @andrewor14 You mean that the clean way to merge this patch into the main branch is for me to redo this patch on top of the current master branch and make a pull request again, is that right?
[GitHub] spark pull request: [SPARK-4097] Fix the race condition of 'thread...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2957#issuecomment-60563934 You can also just take a local reference to the thread and operate on that. The local reference will of course be consistently null, or consistently non-null, for both the check and the call.
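The pattern srowen suggests — read the shared field once into a local, then null-check and act on the local — can be sketched as follows. All names here (`taskThread`, `safeInterrupt`, `LocalRefDemo`) are illustrative, not the actual Spark `TaskRunner` code:

```java
// Sketch of the "take a local reference" fix for a check-then-act race:
// a volatile field may be set to null by another thread between
// `if (field != null)` and `field.interrupt()`. Reading the field once
// into a local guarantees the check and the call see the same value.
// Names are hypothetical, not Spark's actual code.
public class LocalRefDemo {
    static volatile Thread taskThread = Thread.currentThread();

    // Racy version (for contrast):
    //   if (taskThread != null) taskThread.interrupt();  // may NPE

    static boolean safeInterrupt() {
        Thread t = taskThread;   // single read; the local cannot change
        if (t != null) {
            t.interrupt();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("interrupted=" + safeInterrupt()); // field is set
        Thread.interrupted();  // clear the interrupt flag we just set
        taskThread = null;     // simulate the task finishing concurrently
        System.out.println("interrupted=" + safeInterrupt()); // field is null
    }
}
```

Note that zsxwing's caveat below still applies: even with a local reference, the interrupt can arrive after the thread has moved on, so this pattern only removes the NullPointerException, not the semantic race.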
[GitHub] spark pull request: [SPARK-4096][YARN]Update executor memory descr...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2955#issuecomment-60564242 Hm, how do you mean? The rest of the code already expects this to be an `Int` counting megabytes. Nothing else can be parsing it further (or else that's a bug). The change here is to support a different intended means of specifying that `Int` upstream. There is even a `MemoryParam` helper class for this.
[GitHub] spark pull request: [SPARK-4096][YARN]Update executor memory descr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2955#issuecomment-60564958 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22282/ Test PASSed.
[GitHub] spark pull request: [SPARK-4096][YARN]Update executor memory descr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2955#issuecomment-60564950 [Test build #22282 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22282/consoleFull) for PR 2955 at commit [`3779767`](https://github.com/apache/spark/commit/37797670f70d2332c412d14c8cc9eac2573d8bce). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4097] Fix the race condition of 'thread...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/2957#issuecomment-60565342
> You can also just take a local reference to the thread, and operate on it. The local reference will of course be null in both cases or not-null in both cases.

A local reference may interrupt other tasks in the executor.
[GitHub] spark pull request: [SPARK-4097] Fix the race condition of 'thread...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2957#issuecomment-60565582 [Test build #22289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22289/consoleFull) for PR 2957 at commit [`c5cfeca`](https://github.com/apache/spark/commit/c5cfeca22b46d6538f669ebbe5dd10fd198583c9). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4096][YARN]Update executor memory descr...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/2955#issuecomment-60566095 Guess I didn't make it clear, so I'll paste a code segment to help. Excuse me for this. In ClientBase.scala we can see:

```scala
val amClass =
  if (isLaunchingDriver) {
    Class.forName("org.apache.spark.deploy.yarn.ApplicationMaster").getName
  } else {
    Class.forName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName
  }
val userArgs = args.userArgs.flatMap { arg =>
  Seq("--arg", YarnSparkHadoopUtil.escapeForShell(arg))
}
val amArgs =
  Seq(amClass) ++ userClass ++ userJar ++ userArgs ++ Seq(
    "--executor-memory", args.executorMemory.toString,
    "--executor-cores", args.executorCores.toString,
    "--num-executors", args.numExecutors.toString)
```

That is to say, we launch an `ApplicationMaster` here with the arguments `"--executor-memory", args.executorMemory.toString`, where `args` in `args.executorMemory` is a `ClientArguments` instance in which `executorMemory` is a number. So in the common case we do not directly instantiate `ApplicationMaster` (even though we could); `ApplicationMaster` is usually launched from the `ClientBase` class. Did I make it clear? -__-
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-60566259 [Test build #22290 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22290/consoleFull) for PR 2906 at commit [`2676166`](https://github.com/apache/spark/commit/2676166ba6f307b4605ea1e7ecf6ece5b9e200b3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-60566367 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22290/ Test FAILed.
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-60566364 [Test build #22290 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22290/consoleFull) for PR 2906 at commit [`2676166`](https://github.com/apache/spark/commit/2676166ba6f307b4605ea1e7ecf6ece5b9e200b3). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class JavaHierarchicalClustering ` * `trait HierarchicalClusteringConf extends Serializable ` * `class HierarchicalClustering(` * `class ClusteringModel(object):` * `class KMeansModel(ClusteringModel):` * `class HierarchicalClusteringModel(ClusteringModel):` * `class HierarchicalClustering(object):`
[GitHub] spark pull request: [SPARK-4097] Fix the race condition of 'thread...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2957#issuecomment-60566474 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22288/ Test FAILed.
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-60566674 @mengxr thank you for your feedback.
> Is there a paper that you used as reference? If so, please cite it in the doc.

Yes. I added the comment into the doc. https://github.com/yu-iskw/spark/commit/6b22f0752d5d692912c1e8a5e3390326e5d8ebc6
> Could you send some performance testing results on dense and sparse datasets?

I have only tested the performance on dense datasets so far. You can download the benchmark result at the URL below. However, because I changed the algorithm, I will run the tests again and send you the new results. https://issues.apache.org/jira/secure/attachment/12675783/benchmark2.html
[GitHub] spark pull request: [SPARK-4098][YARN]use appUIAddress instead of ...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/2958#issuecomment-60567168 The same replacement was made in https://github.com/apache/spark/pull/2276; the same change in `runExecutorLauncher` was mentioned in that PR but never applied.
[GitHub] spark pull request: [SPARK-2429] [MLlib] Hierarchical Implementati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2906#issuecomment-60567740 [Test build #22291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22291/consoleFull) for PR 2906 at commit [`8be11da`](https://github.com/apache/spark/commit/8be11da1f045e9ffc8c56886eea7c133aefe3eaf). * This patch merges cleanly.
[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60568456 [Test build #22285 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22285/consoleFull) for PR 2607 at commit [`eff21fe`](https://github.com/apache/spark/commit/eff21fea01393a44c7876542832e752c26cbcd86). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [MLLIB] SPARK-1547: Adding Gradient Boosting t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-60568462 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22285/ Test PASSed.
[GitHub] spark pull request: [SPARK-4097] Fix the race condition of 'thread...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2957#issuecomment-60568910 [Test build #22284 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22284/consoleFull) for PR 2957 at commit [`c5cfeca`](https://github.com/apache/spark/commit/c5cfeca22b46d6538f669ebbe5dd10fd198583c9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4097] Fix the race condition of 'thread...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2957#issuecomment-60568924 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22284/ Test PASSed.
[GitHub] spark pull request: [SPARK-4032] Deprecate YARN alpha support in S...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2878#issuecomment-60569271 [Test build #22292 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22292/consoleFull) for PR 2878 at commit [`17e9857`](https://github.com/apache/spark/commit/17e9857eac51b1a99c08ec6f5d899d907ad5d9fa). * This patch merges cleanly.