[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-5274

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20321/consoleFull) for PR 1983 at commit [`c22e8c2`](https://github.com/apache/spark/commit/c22e8c272bea24e670cf92d2eee5d9aa40f2891b).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Document(docId: Int, content: Array[Int])`

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3377] [Metrics] Metrics can be accident...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2250#issuecomment-5663

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20322/consoleFull) for PR 2250 at commit [`ead8966`](https://github.com/apache/spark/commit/ead8966e4bed34243cda135cb5dd1b5ec5c8c332).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2182] Scalastyle rule blocking non asci...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2358#issuecomment-6122

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20327/consoleFull) for PR 2358 at commit [`3dbf037`](https://github.com/apache/spark/commit/3dbf037c69548fac099b75f9e34a1fbd5076a572).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2182] Scalastyle rule blocking non asci...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2358#issuecomment-6169

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20327/consoleFull) for PR 2358 at commit [`3dbf037`](https://github.com/apache/spark/commit/3dbf037c69548fac099b75f9e34a1fbd5076a572).
* This patch **fails** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class NonASCIICharacterChecker extends ScalariformChecker`
[GitHub] spark pull request: Add a Community Projects page
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2219#issuecomment-6421

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20325/consoleFull) for PR 2219 at commit [`7316822`](https://github.com/apache/spark/commit/7316822935dfd9bd8d9e432e1582f5470da10c32).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class JavaSparkContext(val sc: SparkContext)`
  * `class TaskCompletionListenerException(errorMessages: Seq[String]) extends Exception`
  * `class RatingDeserializer(FramedSerializer):`
  * `class Encoder[T : NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T]`
  * `class Encoder[T : NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T]`
  * `class Encoder[T : NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T]`
  * `class Encoder extends compression.Encoder[IntegerType.type]`
  * `class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[IntegerType.type])`
  * `class Encoder extends compression.Encoder[LongType.type]`
  * `class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[LongType.type])`
  * `class JavaStreamingContext(val ssc: StreamingContext) extends Closeable`
[GitHub] spark pull request: [SPARK-2182] Scalastyle rule blocking non asci...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2358#issuecomment-6582

Hey @pwendell, I can remove the commit once you confirm it works.
[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-6598

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20328/consoleFull) for PR 1977 at commit [`4d4bc86`](https://github.com/apache/spark/commit/4d4bc8671a4ef7e9d2d9924681bed1f8e4695a20).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3501] [SQL] Fix the bug of Hive SimpleU...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/2368#issuecomment-6637

retest this please.
[GitHub] spark pull request: [SPARK-3437][BUILD] Support crossbuilding in m...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2357#issuecomment-6738

You are right about that, but we are forking it because we want a modified install plugin. It would have been nicer if we could run a plugin just before install and have the install plugin magically do the job correctly, but that isn't possible unless I hack things using reflection. It's as if Maven keeps its own copy of objects before letting plugins use them.
[GitHub] spark pull request: [SPARK-3501] [SQL] Fix the bug of Hive SimpleU...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2368#issuecomment-6842

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20329/consoleFull) for PR 2368 at commit [`b804abd`](https://github.com/apache/spark/commit/b804abd5be4161531db38193be310cf628674cec).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3454] Expose JSON representation of dat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2333#issuecomment-7935

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20326/consoleFull) for PR 2333 at commit [`d41b3ca`](https://github.com/apache/spark/commit/d41b3caf1adb0c807aa6ce9d011e5e2553408fe2).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class JavaSparkContext(val sc: SparkContext)`
  * `throw new IllegalStateException("The main method in the given main class must be static")`
  * `class TaskCompletionListenerException(errorMessages: Seq[String]) extends Exception`
  * `class Dummy(object):`
  * `class RatingDeserializer(FramedSerializer):`
  * `class Encoder[T : NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T]`
  * `class Encoder[T : NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T]`
  * `class Encoder[T : NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T]`
  * `class Encoder extends compression.Encoder[IntegerType.type]`
  * `class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[IntegerType.type])`
  * `class Encoder extends compression.Encoder[LongType.type]`
  * `class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[LongType.type])`
  * `class JavaStreamingContext(val ssc: StreamingContext) extends Closeable`
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2390#discussion_r17528496

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -181,11 +182,25 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
     val overwrite: Boolean = o.getOrElse("") == "OVERWRITE"
     InsertIntoTable(r, Map[String, Option[String]](), s, overwrite)
   }
+
+  protected lazy val addCache: Parser[LogicalPlan] =
+    ADD ~ CACHE ~ TABLE ~ ident ~ AS ~ select ~ opt(";") ^^ {
+      case tableName ~ as ~ s =>
+        CacheTableAsSelectCommand(tableName, s)
+    }
--- End diff --

I will remove it.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2390#discussion_r17528493

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -181,11 +182,25 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
     val overwrite: Boolean = o.getOrElse("") == "OVERWRITE"
     InsertIntoTable(r, Map[String, Option[String]](), s, overwrite)
   }
+
+  protected lazy val addCache: Parser[LogicalPlan] =
+    ADD ~ CACHE ~ TABLE ~ ident ~ AS ~ select ~ opt(";") ^^ {
+      case tableName ~ as ~ s =>
+        CacheTableAsSelectCommand(tableName, s)
+    }
   protected lazy val cache: Parser[LogicalPlan] =
-    (CACHE ^^^ true | UNCACHE ^^^ false) ~ TABLE ~ ident ^^ {
-      case doCache ~ _ ~ tableName => CacheCommand(tableName, doCache)
+    CACHE ~ TABLE ~ ident ~ opt(AS) ~ opt(select) ~ opt(";") ^^ {
--- End diff --

Thank you for your comments. Yes, it is better to add it as
```
opt(AS ~ select)
```
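The reviewer's point can be sketched with a toy grammar. This is a minimal illustration assuming the scala-parser-combinators library, not Spark's actual SqlParser; all names (`CacheSketch`, `Cache`) are made up. Wrapping the clause as a single `opt("AS" ~> select)` makes it all-or-nothing, so a dangling `AS` without a query cannot parse, whereas `opt(AS) ~ opt(select)` would silently accept it.

```scala
import scala.util.parsing.combinator.RegexParsers

// Toy grammar, not Spark's SqlParser: the AS clause either matches as a
// whole ("AS SELECT ...") or not at all.
object CacheSketch extends RegexParsers {
  case class Cache(table: String, query: Option[String])

  val ident: Parser[String]  = "[A-Za-z_][A-Za-z0-9_]*".r
  val select: Parser[String] = "(?i)SELECT\\b.*".r

  val cache: Parser[Cache] =
    "(?i)CACHE".r ~> "(?i)TABLE".r ~> ident ~ opt("(?i)AS\\b".r ~> select) ^^ {
      case name ~ q => Cache(name, q)
    }
}

// CacheSketch.parseAll(CacheSketch.cache, "CACHE TABLE t").successful                      // true
// CacheSketch.parseAll(CacheSketch.cache, "CACHE TABLE t AS SELECT * FROM src").successful // true
// CacheSketch.parseAll(CacheSketch.cache, "CACHE TABLE t AS").successful                   // false: dangling AS rejected
```

With the separate `opt(AS) ~ opt(select)` shape, `"CACHE TABLE t AS"` would consume the `AS` and succeed with no query, which is exactly the ambiguity the review comment is avoiding.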
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2390#discussion_r17528508

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -305,6 +305,8 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
         Seq(execution.ExplainCommand(logicalPlan, plan.output, extended)(context))
       case logical.CacheCommand(tableName, cache) =>
         Seq(execution.CacheCommand(tableName, cache)(context))
+      case logical.CacheTableAsSelectCommand(tableName, plan) =>
--- End diff --

I will update.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2390#discussion_r17528600

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -119,4 +119,16 @@ class CachedTableSuite extends QueryTest {
     }
     assert(!TestSQLContext.isCached("testData"), "Table 'testData' should not be cached")
   }
+
+  test("CACHE TABLE tableName AS SELECT Star Table") {
+    TestSQLContext.sql("CACHE TABLE testCacheTable AS SELECT * FROM testData")
+    TestSQLContext.sql("SELECT * FROM testCacheTable WHERE key = 1").collect()
--- End diff --

I will add.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2390#discussion_r17528595

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,22 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
     child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    sqlContext.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)
+    sqlContext.cacheTable(tableName)
+    // It does the caching eagerly.
--- End diff --

Either way is OK, but eager caching exists only for this feature, so it may look inconsistent compared to the other caching commands.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2390#discussion_r17528605

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -214,6 +214,7 @@ private[hive] object HiveQl {
   */
  def getAst(sql: String): ASTNode = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql))
+
--- End diff --

Sorry for the trouble. I am new to git, so I added it accidentally.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2390#discussion_r17528625

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -1097,7 +1109,7 @@ private[hive] object HiveQl {
     case Token(TOK_FUNCTION, Token(functionName, Nil) :: children) =>
       HiveGenericUdtf(functionName, attributes, children.map(nodeToExpr))
-
+
--- End diff --

Ok.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2390#discussion_r17528618

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -229,11 +230,17 @@ private[hive] object HiveQl {
         SetCommand(Some(key), Some(value))
       }
     } else if (sql.trim.toLowerCase.startsWith("cache table")) {
-      CacheCommand(sql.trim.drop(12).trim, true)
+      sql.trim.drop(12).trim.split(" ").toSeq match {
+        case Seq(tableName) =>
+          CacheCommand(tableName, true)
+        case Seq(tableName, as, select @ _*) =>
+          CacheTableAsSelectCommand(tableName,
+            createPlan(sql.trim.drop(12 + tableName.length() + as.length() + 2)))
+      }
     } else if (sql.trim.toLowerCase.startsWith("uncache table")) {
       CacheCommand(sql.trim.drop(14).trim, false)
     } else if (sql.trim.toLowerCase.startsWith("add jar")) {
-      AddJar(sql.trim.drop(8).trim)
+      NativeCommand(sql)
--- End diff --

Sorry for the trouble. I am new to git, so I added it accidentally.
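The prefix-dispatch idea in the diff above can be sketched in isolation. This is a hedged, self-contained toy: `dispatchCache` and the stand-in result strings are made up here so the snippet runs alone; the real HiveQl code builds logical plan objects instead.

```scala
// Self-contained sketch of the "cache table" dispatch: after stripping the
// leading keywords, a single remaining token means a plain cache command,
// while "name AS <query>" means cache-as-select. Result strings are
// placeholders for the real command classes.
object CacheDispatchSketch {
  def dispatchCache(sql: String): String = {
    // Drop the leading "cache table" plus surrounding whitespace.
    val body = sql.trim.drop("cache table".length).trim
    body.split("\\s+").toSeq match {
      case Seq(tableName) =>
        s"CacheCommand($tableName)"
      case Seq(tableName, as, query @ _*) if as.equalsIgnoreCase("as") =>
        s"CacheTableAsSelectCommand($tableName, ${query.mkString(" ")})"
      case _ =>
        sys.error(s"unparsable cache command: $sql")
    }
  }
}

// CacheDispatchSketch.dispatchCache("CACHE TABLE t")
//   // "CacheCommand(t)"
// CacheDispatchSketch.dispatchCache("CACHE TABLE t AS SELECT * FROM src")
//   // "CacheTableAsSelectCommand(t, SELECT * FROM src)"
```

Splitting on `\\s+` and re-joining avoids the brittle character-offset arithmetic (`drop(12 + tableName.length() + ...)`) that the quoted diff uses.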
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2390#discussion_r17528644

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -127,6 +127,7 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
   protected val SUBSTRING = Keyword("SUBSTRING")
   protected val SQRT = Keyword("SQRT")
   protected val ABS = Keyword("ABS")
+  protected val ADD = Keyword("ADD")
--- End diff --

I will remove it.
[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-8998

**[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20324/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3437][BUILD] Support crossbuilding in m...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2357#issuecomment-9135

I am anyway trying more options to avoid needing to modify maven-install-plugin.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2381#discussion_r17528900

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -181,6 +182,12 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
     val overwrite: Boolean = o.getOrElse("") == "OVERWRITE"
     InsertIntoTable(r, Map[String, Option[String]](), s, overwrite)
   }
+
+  protected lazy val addCache: Parser[LogicalPlan] =
+    ADD ~ CACHE ~ TABLE ~ ident ~ AS ~ select ~ opt(";") ^^ {
--- End diff --

Thanks for your comments. Sorry for the misunderstanding; I updated it as per the syntax
```
CACHE TABLE AS SELECT ...
```
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2381#discussion_r17528911

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -181,6 +182,12 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
     val overwrite: Boolean = o.getOrElse("") == "OVERWRITE"
     InsertIntoTable(r, Map[String, Option[String]](), s, overwrite)
   }
+
+  protected lazy val addCache: Parser[LogicalPlan] =
--- End diff --

Updated.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2381#discussion_r17528919

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,24 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
     child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)(
--- End diff --

OK. Updated.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2381#discussion_r17528937

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,24 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
     child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)(
+    @transient context: SQLContext)
--- End diff --

OK. Removed passing `sqlContext` manually.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2381#discussion_r17528957

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -119,4 +119,20 @@ class CachedTableSuite extends QueryTest {
     }
     assert(!TestSQLContext.isCached("testData"), "Table 'testData' should not be cached")
   }
+
+  test("ADD CACHE TABLE tableName AS SELECT Star Table") {
+    TestSQLContext.sql("ADD CACHE TABLE testCacheTable AS SELECT * FROM testData")
+    TestSQLContext.sql("SELECT * FROM testCacheTable WHERE key = 1").collect()
+    TestSQLContext.uncacheTable("testCacheTable")
+  }
+
+  test("'ADD CACHE TABLE tableName AS SELECT ..'") {
+    TestSQLContext.sql("ADD CACHE TABLE testCacheTable AS SELECT * FROM testData")
+    TestSQLContext.table("testCacheTable").queryExecution.executedPlan match {
+      case _: InMemoryColumnarTableScan => // Found evidence of caching
--- End diff --

OK.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2381#discussion_r17528949

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,24 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
     child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)(
+    @transient context: SQLContext)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    context.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)
+    context.cacheTable(tableName)
+    // It does the caching eagerly.
+    // TODO: Does it really require to collect?
--- End diff --

I have added it as `count`.
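The lazy-versus-eager point being discussed can be illustrated without Spark at all. A hedged toy sketch (`EagerCacheSketch` is made up here, not Spark code): marking something as cached is lazy, and nothing is materialized until an action touches it, which is why the command runs a cheap action like `count()` to force the work up front.

```scala
// Toy illustration, not Spark code: a lazily "cached" value is only
// materialized when an action first touches it. Running a count()-style
// action immediately makes the caching effectively eager.
object EagerCacheSketch {
  var materialized = false

  // Stands in for the cached table: the body runs on first access only.
  lazy val cachedData: Seq[Int] = {
    materialized = true            // the "expensive" computation happens here
    (1 to 5).map(_ * 2)
  }

  // Stands in for the eager count() action.
  def forceMaterialization(): Int = cachedData.size
}

// EagerCacheSketch.materialized            // false: caching alone is lazy
// EagerCacheSketch.forceMaterialization()  // forces the work; materialized becomes true
```

`count` was preferred over `collect` in the discussion because both force materialization, but `count` avoids pulling every row back to the driver.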
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2381#discussion_r17529002

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -214,6 +214,7 @@ private[hive] object HiveQl {
   */
  def getAst(sql: String): ASTNode = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql))
+
--- End diff --

Sorry for the trouble. I am new to git. I have removed the new line.
[GitHub] spark pull request: [SPARK-1087] Move python traceback utilities i...
Github user staple commented on a diff in the pull request: https://github.com/apache/spark/pull/2385#discussion_r17528982 --- Diff: python/pyspark/traceback_utils.py --- @@ -0,0 +1,80 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from collections import namedtuple +import os +import traceback + + +__all__ = ["extract_concise_traceback", "SparkContext"] --- End diff -- Looks like I also need to put JavaStackTrace here instead of SparkContext.
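The review point above hinges on how `__all__` works: its entries must be strings naming objects the module actually defines, and they determine what `from module import *` exports. A minimal sketch of that behavior, with illustrative names (not pyspark's actual `traceback_utils` contents):

```python
import types

# Build a throwaway module to show that __all__ entries must be *string*
# names of objects the module defines. Names here are illustrative only.
source = '''
__all__ = ["extract_concise_traceback"]  # string names, not bare identifiers

def extract_concise_traceback():
    return "concise"

def _internal_helper():
    return "hidden"
'''

mod = types.ModuleType("traceback_demo")
exec(source, mod.__dict__)

# `from traceback_demo import *` would copy exactly the names in __all__:
exported = sorted(mod.__all__)
print(exported)                         # ['extract_concise_traceback']
print(mod.extract_concise_traceback())  # concise
```

Listing a name in `__all__` that the module does not define (as the comment notes for `SparkContext`) makes `import *` fail with an `AttributeError`, which is why the entry needs to match the actual definition.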
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2381#discussion_r17529035 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala --- @@ -233,7 +234,7 @@ private[hive] object HiveQl { } else if (sql.trim.toLowerCase.startsWith("uncache table")) { CacheCommand(sql.trim.drop(14).trim, false) } else if (sql.trim.toLowerCase.startsWith("add jar")) { -AddJar(sql.trim.drop(8).trim) +NativeCommand(sql) --- End diff -- Sorry for the trouble; I am new to git. I faced some problems in the rebase, and I have reverted it.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2381#discussion_r17529073 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala --- @@ -243,14 +244,12 @@ private[hive] object HiveQl { } else if (sql.trim.startsWith("!")) { ShellCommand(sql.drop(1)) } else { -val tree = getAst(sql) -if (nativeCommands contains tree.getText) { - NativeCommand(sql) +if (sql.trim.toLowerCase.startsWith("add cache table")) { + sql.trim.drop(16).split(" ").toSeq match { + case Seq(tableName, "as", xs @ _*) => CacheTableAsSelectCommand(tableName, createPlan(sql.trim.drop(16 + tableName.length() + "as".length() + 1))) --- End diff -- Thank you for guiding me. I have run ```sbt/sbt scalastyle``` and updated the code.
[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1847#issuecomment-55560383 I will close this PR, since most of the work was done in #1846 and #1962, and native command support for `EXPLAIN` is probably not necessary; even Hive doesn't support it.
[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...
Github user chenghao-intel closed the pull request at: https://github.com/apache/spark/pull/1847
[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-55560776 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20328/consoleFull) for PR 1977 at commit [`4d4bc86`](https://github.com/apache/spark/commit/4d4bc8671a4ef7e9d2d9924681bed1f8e4695a20). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ResultIterable(object):` * `class FlattedValuesSerializer(BatchedSerializer):` * `class SameKey(object):` * `class GroupByKey(object):` * `class ExternalGroupBy(ExternalMerger):`
[GitHub] spark pull request: [SPARK-2182] Scalastyle rule blocking non asci...
Github user ash211 commented on the pull request: https://github.com/apache/spark/pull/2358#issuecomment-55560892 A flagged character looks like this: ```= Running Scala style checks = Scalastyle checks failed at following occurrences: error file=/home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/SparkContext.scala message=non.ascii.character.disallowed.message line=304 column=22 java.lang.RuntimeException: exists error at scala.sys.package$.error(package.scala:27) at scala.Predef$.error(Predef.scala:142) [error] (core/*:scalastyle) exists error``` Seems reasonable to merge with that confirmation.
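The rule being discussed flags any non-ASCII character in source, reporting its line and column (as in the `line=304 column=22` output above). A rough standalone sketch of the same check, not the actual Scalastyle implementation:

```python
def find_non_ascii(source: str):
    """Return 1-based (line, column) pairs of non-ASCII characters,
    mirroring the line/column reporting in the Scalastyle output above."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # outside the 7-bit ASCII range
                hits.append((lineno, col))
    return hits

code = 'val ok = 1\nval bad = "caf\u00e9"\n'
print(find_non_ascii(code))            # [(2, 15)]
print(find_non_ascii("plain ascii"))   # []
```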
[GitHub] spark pull request: [SPARK-3527] [SQL] Strip the string message
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/2392 [SPARK-3527] [SQL] Strip the string message You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenghao-intel/spark trim Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2392.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2392 commit e52024fc1a093d2464d694546757a988c75b629f Author: Cheng Hao hao.ch...@intel.com Date: 2014-09-15T07:37:56Z trim the string message
[GitHub] spark pull request: [SPARK-2182] Scalastyle rule blocking non asci...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2358#issuecomment-55560989 Yeah, thanks @ash211. I will get rid of that commit.
[GitHub] spark pull request: [SPARK-3527] [SQL] Strip the string message
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2392#issuecomment-55561297 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20330/consoleFull) for PR 2392 at commit [`e52024f`](https://github.com/apache/spark/commit/e52024fc1a093d2464d694546757a988c75b629f). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2182] Scalastyle rule blocking non asci...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2358#issuecomment-55561642 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20331/consoleFull) for PR 2358 at commit [`12a20f2`](https://github.com/apache/spark/commit/12a20f27cf9f1a7a04160add95da4375b123a40d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3529] [SQL] Delete the temp files after...
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/2393 [SPARK-3529] [SQL] Delete the temp files after test exit There are lots of temporary files created by TestHive under /tmp by default, which may cause potential performance issues for testing. This PR automatically deletes them after the tests exit. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenghao-intel/spark delete_temp_on_exit Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2393.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2393 commit 4ecc9d49a83082806b9f713ee49565aecf5df764 Author: Cheng Hao hao.ch...@intel.com Date: 2014-09-12T01:58:51Z Delete the temp files after test exit
[GitHub] spark pull request: allow symlinking to shell scripts
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/2386#discussion_r17530408 --- Diff: bin/spark-shell --- @@ -29,7 +29,7 @@ esac set -o posix ## Global script variables -FWDIR=$(cd `dirname $0`/..; pwd) +FWDIR=$(cd $(dirname $(readlink -f $0))/..; pwd) --- End diff -- You may have to quote these, so that dirs with spaces within their names work, like they do at the moment. The above should look like: ```FWDIR="$(cd "$(dirname "$(readlink -f "$0")")"/..; pwd)```
[GitHub] spark pull request: [SPARK-3529] [SQL] Delete the temp files after...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2393#issuecomment-55563405 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20332/consoleFull) for PR 2393 at commit [`4ecc9d4`](https://github.com/apache/spark/commit/4ecc9d49a83082806b9f713ee49565aecf5df764). * This patch merges cleanly.
[GitHub] spark pull request: allow symlinking to shell scripts
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/2386#discussion_r17530450 --- Diff: bin/spark-shell --- @@ -29,7 +29,7 @@ esac set -o posix ## Global script variables -FWDIR=$(cd `dirname $0`/..; pwd) +FWDIR=$(cd $(dirname $(readlink -f $0))/..; pwd) --- End diff -- Of course, this applies to all the other places as well.
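The `readlink -f` change being reviewed makes the script find its real installation directory even when invoked through a symlink. The same idea can be sketched in Python with `os.path.realpath`, which resolves symlinks the way `readlink -f` does:

```python
import os
import tempfile

# Resolving a symlink back to the real script location, analogous to
# `dirname "$(readlink -f "$0")"` in the shell diff above.
with tempfile.TemporaryDirectory() as d:
    real_dir = os.path.join(d, "spark", "bin")
    os.makedirs(real_dir)
    script = os.path.join(real_dir, "spark-shell")
    open(script, "w").close()

    # A symlink to the script somewhere else on the filesystem:
    link = os.path.join(d, "spark-shell-link")
    os.symlink(script, link)

    # dirname of the symlink itself points at the link's own directory...
    naive_dir = os.path.dirname(link)
    # ...while realpath follows the link back to the actual bin/ directory.
    resolved_dir = os.path.dirname(os.path.realpath(link))

print(naive_dir == d)                                        # True
print(resolved_dir.endswith(os.path.join("spark", "bin")))   # True
```

Without the resolution step, a symlinked `spark-shell` would compute `FWDIR` relative to wherever the symlink lives, not the Spark installation.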
[GitHub] spark pull request: make spark-class to work with openjdk
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2387#issuecomment-55563583 This is also a duplicate of #2301.
[GitHub] spark pull request: Updates to shell globbing in run-example and s...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/449#discussion_r17530603 --- Diff: bin/run-example --- @@ -21,18 +21,25 @@ SCALA_VERSION=2.10 FWDIR=$(cd `dirname $0`/..; pwd) export SPARK_HOME=$FWDIR -EXAMPLES_DIR=$FWDIR/examples -if [ -f $FWDIR/RELEASE ]; then - export SPARK_EXAMPLES_JAR=`ls $FWDIR/lib/spark-examples-*hadoop*.jar` -elif [ -e $EXAMPLES_DIR/target/scala-$SCALA_VERSION/spark-examples-*hadoop*.jar ]; then - export SPARK_EXAMPLES_JAR=`ls $EXAMPLES_DIR/target/scala-$SCALA_VERSION/spark-examples-*hadoop*.jar` -fi +. $FWDIR/bin/load-spark-env.sh +. $FWDIR/bin/sh-funcs.sh --- End diff -- We generally quote strings like the above, as ```. "$FWDIR/bin/sh-funcs.sh"``` and so on.
[GitHub] spark pull request: [SPARK-3501] [SQL] Fix the bug of Hive SimpleU...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2368#issuecomment-55564638 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20329/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3507] Adding RegressionLearner
Github user epahomov commented on the pull request: https://github.com/apache/spark/pull/2371#issuecomment-55565128 Closed, because there is currently similar work underway at Databricks.
[GitHub] spark pull request: [SPARK-3414][SQL] Replace LowerCaseSchema with...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/2382#discussion_r17531198 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -98,12 +98,12 @@ case class Star( override def withNullability(newNullability: Boolean) = this override def withQualifiers(newQualifiers: Seq[String]) = this - def expand(input: Seq[Attribute]): Seq[NamedExpression] = { + def expand(input: Seq[Attribute], resolver: Resolver): Seq[NamedExpression] = { val expandedAttributes: Seq[Attribute] = table match { // If there is no table specified, use all input attributes. case None => input // If there is a table, pick out attributes that are part of this table. - case Some(t) => input.filter(_.qualifiers contains t) + case Some(t) => input.filter(_.qualifiers.filter(resolver(_,t)).nonEmpty) --- End diff -- Nit: space after `,`
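Aside from the spacing nit, the `filter(...).nonEmpty` pattern in the diff is just an existence check: does any qualifier match the table under the resolver? A small Python sketch of the two equivalent formulations (the resolver here is a hypothetical case-insensitive stand-in, not Catalyst's actual code):

```python
def resolver(a: str, b: str) -> bool:
    # Hypothetical stand-in for the (String, String) => Boolean resolver
    # threaded through in this PR: case-insensitive name comparison.
    return a.lower() == b.lower()

qualifiers = ["T1", "t2"]
table = "t1"

# filter(...).nonEmpty style: materializes an intermediate list first.
non_empty = len([q for q in qualifiers if resolver(q, table)]) > 0

# exists/any style: short-circuits on the first match.
exists = any(resolver(q, table) for q in qualifiers)

print(non_empty, exists)  # True True
```

Both produce the same answer; the `any`/`exists` form avoids building the intermediate collection and stops at the first hit.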
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
GitHub user epahomov opened a pull request: https://github.com/apache/spark/pull/2394 [Spark-3525] Adding gradient boosting You can merge this pull request into a Git repository by running: $ git pull https://github.com/epahomov/spark SPARK-3525 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2394.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2394 commit d0dfb7b632715c60ef78964ea4d20aaa7712d2e2 Author: olgaoskina olgaosk...@yandex-team.ru Date: 2014-09-04T06:51:45Z Added stochastic gradient boosting algorithm commit 11c247a72e1681661cef4314fec5d1b4283b087f Author: olgaoskina olgaosk...@yandex-team.ru Date: 2014-09-04T06:52:05Z Added stochastic gradient boosting algorithm commit fdfc88e046a29202058b8f45168d624ed91f6d16 Author: olgaoskina olgaosk...@yandex-team.ru Date: 2014-09-05T12:25:41Z Code refactor commit b91b372c951db8bd1be6bd4d2308bc509bc1b44f Author: olgaoskina olgaosk...@yandex-team.ru Date: 2014-09-06T09:02:51Z Added test 'StochasticGradientBoostingSuite' commit 223f0907b6accaa0bf08c7948b2e6c1d728dab18 Author: olgaoskina olgaosk...@yandex-team.ru Date: 2014-09-10T08:08:30Z Added new test commit da13706bd8101ec8a2b648ce6ddc9777516e121f Author: olgaoskina olgaosk...@yandex-team.ru Date: 2014-09-14T15:33:52Z Refactor tests commit eafa0b75785b2ac570ddbc26a80b08b328f7b29c Author: Egor Pakhomov pahomov.e...@gmail.com Date: 2014-09-15T07:42:53Z Merge branch 'gradient_boosting' of https://github.com/olgaoskina/spark into olgaoskina-gradient_boosting commit 3c56f4ef65fb0df80804b0f4b9436f0623582be7 Author: Egor Pakhomov pahomov.e...@gmail.com Date: 2014-09-15T08:46:43Z Merge branch 'olgaoskina-gradient_boosting' into SPARK-3525 commit ce1934a329783629a12f615cbeac3d7e1a05a791 Author: Egor Pakhomov pahomov.e...@gmail.com Date: 2014-09-15T08:32:48Z [SPARK-3525] Fixing GradientBoostingSuite
[GitHub] spark pull request: [SPARK-3414][SQL] Replace LowerCaseSchema with...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/2382#discussion_r17531193 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/package.scala --- @@ -22,4 +22,9 @@ package org.apache.spark.sql.catalyst * Analysis consists of translating [[UnresolvedAttribute]]s and [[UnresolvedRelation]]s * into fully typed objects using information in a schema [[Catalog]]. */ -package object analysis +package object analysis { + type Resolver = (String, String) => Boolean --- End diff -- `Resolver` is probably too general a name; can we use a more precise name for this?
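The `Resolver` alias in the diff is just a binary predicate over names, `(String, String) => Boolean`, used to decide whether two identifiers refer to the same thing. A sketch of the two obvious implementations, case-sensitive and case-insensitive, in Python (names are hypothetical, not Catalyst's actual API):

```python
from typing import Callable, List

# Python analogue of `type Resolver = (String, String) => Boolean`.
Resolver = Callable[[str, str], bool]

def case_sensitive_resolution(a: str, b: str) -> bool:
    return a == b

def case_insensitive_resolution(a: str, b: str) -> bool:
    return a.lower() == b.lower()

def resolve(name: str, attributes: List[str], resolver: Resolver) -> List[str]:
    """Return the attributes whose name matches under the given resolver."""
    return [attr for attr in attributes if resolver(attr, name)]

attrs = ["Key", "value"]
print(resolve("key", attrs, case_sensitive_resolution))    # []
print(resolve("key", attrs, case_insensitive_resolution))  # ['Key']
```

Passing the predicate down through analysis lets the same resolution code serve dialects with different case-sensitivity rules, which is what this PR appears to thread through `expand` and `LogicalPlan.resolve`.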
[GitHub] spark pull request: [SPARK-2182] Scalastyle rule blocking non asci...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2358#issuecomment-55565526 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20331/consoleFull) for PR 2358 at commit [`12a20f2`](https://github.com/apache/spark/commit/12a20f27cf9f1a7a04160add95da4375b123a40d). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class NonASCIICharacterChecker extends ScalariformChecker `
[GitHub] spark pull request: [SPARK-3414][SQL] Replace LowerCaseSchema with...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/2382#discussion_r17531262 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -105,7 +119,9 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] { // One match, but we also need to extract the requested nested field. case Seq((a, nestedFields)) => Some(Alias(nestedFields.foldLeft(a: Expression)(GetField), nestedFields.last)()) - case Seq() => None // No matches. + case Seq() => +println(s"Could not find $name in ${input.mkString(", ")}") --- End diff -- Use `logTrace` instead? As we did in `Analyzer`.
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2394#issuecomment-55565637 Can one of the admins verify this patch?
[GitHub] spark pull request: [Minor]ignore all config files in conf
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/2395 [Minor]ignore all config files in conf Some config files in ```conf``` should be ignored, such as: conf/fairscheduler.xml conf/hive-log4j.properties conf/metrics.properties ... So ignore all ```sh```/```properties```/```conf```/```xml``` files. You can merge this pull request into a Git repository by running: $ git pull https://github.com/scwf/spark patch-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2395.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2395 commit 3c2986fbfa5b8f2d1a4573ae678ceffa306f0083 Author: wangfei wangf...@huawei.com Date: 2014-09-15T08:54:46Z ignore all config files
[GitHub] spark pull request: [Minor]ignore all config files in conf
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2395#issuecomment-55566939 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3531][SQL]select null from table would ...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/2396 [SPARK-3531][SQL]select null from table would throw a MatchError You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark selectnull Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2396.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2396 commit 0981c4246d267eedf90a232d2b7b8e3aab6b642d Author: Daoyuan Wang daoyuan.w...@intel.com Date: 2014-09-15T09:16:37Z fix select null from table
[GitHub] spark pull request: [SPARK-3527] [SQL] Strip the string message
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2392#issuecomment-55568137 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20330/consoleFull) for PR 2392 at commit [`e52024f`](https://github.com/apache/spark/commit/e52024fc1a093d2464d694546757a988c75b629f). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3531][SQL]select null from table would ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2396#issuecomment-55568780 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20333/consoleFull) for PR 2396 at commit [`0981c42`](https://github.com/apache/spark/commit/0981c4246d267eedf90a232d2b7b8e3aab6b642d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3529] [SQL] Delete the temp files after...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2393#issuecomment-55570548 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20332/consoleFull) for PR 2393 at commit [`4ecc9d4`](https://github.com/apache/spark/commit/4ecc9d49a83082806b9f713ee49565aecf5df764). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3437][BUILD] Support crossbuilding in m...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2357#issuecomment-55570721 - If you bind a plugin to the `install` phase and declare it before `maven-install-plugin`, will it happen to respect the ordering? - This is arguably something that could happen before `install`, in the `package` phase? There's a `prepare-package` phase as well.
[GitHub] spark pull request: [SPARK-3529] [SQL] Delete the temp files after...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2393#discussion_r17533627 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TestHive.scala --- @@ -41,7 +49,27 @@ import org.apache.spark.sql.SQLConf import scala.collection.JavaConversions._ object TestHive - extends TestHiveContext(new SparkContext("local[2]", "TestSQLContext", new SparkConf())) + extends TestHiveContext(new SparkContext("local[2]", "TestSQLContext", new SparkConf())) { + + Signal.handle(new Signal("INT"), new SignalHandler() { --- End diff -- Yikes, this seems a whole lot more heavy-handed than just implementing test lifecycle methods with annotations. Elsewhere in the test framework, temp files are reliably deleted by: - Invoking the standard method to get a temp dir - ... which calls `deleteOnExit()` - ... which also cleans up the declared test dir in an annotated cleanup method I would really avoid use of `Signal`! It does not seem required and is inconsistent with other tests.
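The cleanup pattern srowen describes can be sketched in a few lines. This is a Python analogue under assumed names, not Spark's actual Scala test utility: one standard helper creates the temp dir and registers its removal at interpreter exit, so no test needs a signal handler.

```python
import atexit
import shutil
import tempfile

def create_test_dir():
    """Hand a test a temp dir that is cleaned up automatically at exit.

    Hypothetical helper illustrating the pattern above: the directory is
    created through one standard method, which also registers cleanup.
    """
    path = tempfile.mkdtemp(prefix="spark-test-")
    # Rough equivalent of Java's File.deleteOnExit(): remove the directory
    # when the interpreter shuts down, ignoring any already-deleted races.
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    return path
```

Because cleanup is attached at creation time, every caller gets it for free, which is why a dedicated `Signal` handler is redundant here.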
[GitHub] spark pull request: cycle of deleting history log
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2391#issuecomment-55571126 Can you explain this patch? What problem does it solve and why? There is no JIRA here either.
[GitHub] spark pull request: [SPARK-3437][BUILD] Support crossbuilding in m...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2357#issuecomment-55571483 Hi @srowen, like I said, we can run a plugin before maven install - no question about that. But since maven install gets a copy of its STATE (somehow via Guice), altering it in another maven plugin does not help much. So as of now, I do not see a way around modifying the maven install plugin.
[GitHub] spark pull request: [SPARK-3437][BUILD] Support crossbuilding in m...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2357#issuecomment-55572928 I see, I thought you mentioned above that running a plugin before `install` would work. It sounds like there is some internal state of the plugin you need to modify, OK. I don't know, maybe it's useful to elaborate on this and see if anyone else can see a workaround. It is worth keeping track of alternatives when evaluating just how far to hack this. For example: Scala 2.11 support could be bound up with Spark 2.x support. Or it could live in a branch that is maintained over a few minor versions, which is not such a big deal if the delta is just flipping 2.10 to 2.11 in many places.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2397 [SPARK-2594][SQL] Add CACHE TABLE name AS SELECT ... This feature allows users to cache a table from a select query. Example: CACHE TABLE tableName AS SELECT * FROM TEST_TABLE. Spark treats this type of SQL as a command and caches eagerly. It can be executed from SQLContext and HiveContext. Recreated the pull request after rebasing with master, and fixed all the comments raised in the previous pull requests: https://github.com/apache/spark/pull/2381 https://github.com/apache/spark/pull/2390 Author : ravipesala ravindra.pes...@huawei.com You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark SPARK-2594 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2397.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2397 commit b803fc80efec026784b87c468b2597e5efbb6708 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-11T10:23:45Z Add CACHE TABLE name AS SELECT ... This feature allows user to add cache table from the select query. Example : ADD CACHE TABLE tableName AS SELECT * FROM TEST_TABLE. Spark takes this type of SQL as command and it does eager caching. It can be executed from SQLContext and HiveContext. 
Signed-off-by: ravipesala ravindra.pes...@huawei.com commit 4e858d83b0020a1701ed65eac7047ee2978329db Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-13T12:36:49Z Updated parser to support add cache table command commit 13c8e27c33e8934bbd6fb458536675e97c3d8798 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-13T17:15:10Z Updated parser to support add cache table command commit 7459ce36775126f4c0636585c1d29f30ab35fd06 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-13T17:39:28Z Added comment commit 6758f808d14ec7a3da0953f7720f7f5b9a4e8a85 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-13T18:07:25Z Changed style commit eebc0c17f039d5a281aa4fef07d255daca3b8862 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-11T10:23:45Z Add CACHE TABLE name AS SELECT ... This feature allows user to add cache table from the select query. Example : ADD CACHE TABLE tableName AS SELECT * FROM TEST_TABLE. Spark takes this type of SQL as command and it does eager caching. It can be executed from SQLContext and HiveContext. Signed-off-by: ravipesala ravindra.pes...@huawei.com commit b5276b22c8e0c271e98f445079ea2e3cf61db6dc Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-13T12:36:49Z Updated parser to support add cache table command commit dc3389557d3c14ccbc713a745fcb1a0c97bf8726 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-13T17:15:10Z Updated parser to support add cache table command commit aaf5b59ea71a9ccdc33a8cda7ee33c3341020c4d Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-13T17:39:28Z Added comment commit 724b9db63258936bf0d00cda44ca4d4ea4ff2dc5 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-13T18:07:25Z Changed style commit e3265d0773515821b1a908bb94025ac79807e325 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-14T21:46:09Z Updated the code as per the comments by Admin in pull request. 
commit bc0bffc994857b94831941d3626fdb22edb43c68 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-14T23:30:06Z Merge remote-tracking branch 'ravipesala/Add-Cache-table-as' into Add-Cache-table-as commit d8b37b25cb893bf130d403011425161ae89dd187 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-15T06:02:55Z Updated as per the comments by Admin commit 8c9993cb2786a5c23bdb2328eb46a28823e1f9c6 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-15T06:08:24Z Changed the style commit fb1759bc4f4db17a321041c2167d86d431b0132e Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-15T06:26:54Z Updated as per Admin comments commit 394d5ca28fd39a5785b6eca7f6c476701df31702 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-15T06:30:30Z Changed style commit c18aa3878de86039b09b79a7c0844eafba447462 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-15T10:14:14Z Merge remote-tracking branch 'remotes/ravipesala/Add-Cache-table-as' into SPARK-2594
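The eager caching this PR describes can be modeled in miniature. All names below are illustrative (the real work happens inside Spark's SQL parser and execution layer): the point is that running the command materializes the SELECT result immediately and registers it under the table name, rather than deferring the cache until first use.

```python
class CacheTableAsSelectCommand:
    """Toy model of an eager CACHE TABLE ... AS SELECT command.

    Hypothetical class, not Spark's implementation: select_fn is a
    zero-arg callable standing in for the SELECT, and catalog is a plain
    dict standing in for the session catalog.
    """

    def __init__(self, table_name, select_fn, catalog):
        self.table_name = table_name
        self.select_fn = select_fn
        self.catalog = catalog

    def run(self):
        # Eager: evaluate the query now, so later reads hit cached rows.
        self.catalog[self.table_name] = list(self.select_fn())
        return self.catalog[self.table_name]
```

For example, `CacheTableAsSelectCommand("t", lambda: [1, 2, 3], {}).run()` returns the materialized rows immediately, mirroring the "Spark takes this type of SQL as command and it does eager caching" behavior described above.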
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2381#issuecomment-55574730 As there was some confusion in rebasing, I have created a new pull request, https://github.com/apache/spark/pull/2397, rebased with master, and also fixed the review comments raised here.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2390#issuecomment-55574756 As there was some confusion in rebasing, I have created a new pull request, https://github.com/apache/spark/pull/2397, rebased with master, and also fixed the review comments raised here.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2397#issuecomment-55574757 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3531][SQL]select null from table would ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2396#issuecomment-55575850 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20333/consoleFull) for PR 2396 at commit [`0981c42`](https://github.com/apache/spark/commit/0981c4246d267eedf90a232d2b7b8e3aab6b642d). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3485][SQL] should check parameter type ...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/2355#issuecomment-55576959 I have changed the way the problem is fixed here, keeping most of the original logic. @marmbrus
[GitHub] spark pull request: [SPARK-3485][SQL] should check parameter type ...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/2355#issuecomment-55577018 retest this please.
[GitHub] spark pull request: [SPARK-3485][SQL] should check parameter type ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2355#issuecomment-55577211 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20334/consoleFull) for PR 2355 at commit [`0142696`](https://github.com/apache/spark/commit/01426963e85f33147c1074748cab820126c82cc5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2365#issuecomment-55582578 thanks, +1, lgtm
[GitHub] spark pull request: [SPARK-3485][SQL] should check parameter type ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2355#issuecomment-55582984 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20334/consoleFull) for PR 2355 at commit [`0142696`](https://github.com/apache/spark/commit/01426963e85f33147c1074748cab820126c82cc5). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]LDA based on Graphx
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2388#issuecomment-55586887 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20335/consoleFull) for PR 2388 at commit [`dc7ef13`](https://github.com/apache/spark/commit/dc7ef13c9b5b58cb7b0e12f586432e3140644b10). * This patch merges cleanly.
[GitHub] spark pull request: Added support for accessing secured HDFS
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2320#issuecomment-55588750 yes that is how it works in standalone mode and it will. The master, workers, and all the applications/clients/drivers need to have the same shared secret. It will do authentication before being able to fetch the file that was added. I think this is fine to support as long as we make it very clear exactly what is supported and what is not supported.
[GitHub] spark pull request: [SPARK-3485][SQL] should check parameter type ...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/2355#issuecomment-55590580 Sorry for the missing pieces here...
[GitHub] spark pull request: [SPARK-3519] add distinct(n) to PySpark
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2383#discussion_r17541864 --- Diff: python/pyspark/tests.py --- @@ -586,6 +586,17 @@ def test_repartitionAndSortWithinPartitions(self): self.assertEquals(partitions[0], [(0, 5), (0, 8), (2, 6)]) self.assertEquals(partitions[1], [(1, 3), (3, 8), (3, 8)]) +def test_distinct(self): +rdd = self.sc.parallelize((1, 2, 3)*10).distinct() +self.assertEquals(rdd.count(), 3) + +def test_distinct_numPartitions(self): --- End diff -- can i have a pass? it looks like the python tests could use some attention during the test speed increase effort, but i'd rather wait for a big speedup recommendation before altering these cases. though, if this is important to you, i'll do it
[GitHub] spark pull request: [SPARK-3519] add distinct(n) to PySpark
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2383#discussion_r17542165 --- Diff: python/pyspark/rdd.py --- @@ -353,7 +353,7 @@ def func(iterator): return ifilter(f, iterator) return self.mapPartitions(func, True) -def distinct(self): +def distinct(self, numPartitions=None): --- End diff -- i can do that. fyi, i ran into some problems initially...
```
from pyspark import sql
ssc = sql.SQLContext(sc)
rdd = sc.parallelize(['{"a": 1}', '{"b": 2}', '{"c": 3}']*10)
srdd = ssc.jsonRDD(rdd)
srdd.distinct(10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/sql.py", line 1703, in distinct
    rdd = self._jschema_rdd.distinct(numPartitions)
  File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 304, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o32.distinct. Trace:
py4j.Py4JException: Method distinct([class java.lang.Integer]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
    at py4j.Gateway.invoke(Gateway.java:252)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
```
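For readers following the `distinct(numPartitions)` change under review, PySpark's `RDD.distinct` is built from a map → reduceByKey → map pipeline, and `numPartitions` simply flows into the shuffle. A minimal local mimic of that pipeline (a hedged sketch in plain Python, with a dict standing in for the shuffle that `numPartitions` would control):

```python
def distinct_local(items):
    # Mimic of the distinct pipeline: map each element to (x, None),
    # reduce by key keeping one value per key, then take the keys.
    reduced = {}
    for x in items:
        reduced[x] = None  # reduceByKey(lambda a, b: a) collapses duplicates
    return list(reduced)
```

For example, `sorted(distinct_local([1, 2, 3] * 10))` yields `[1, 2, 3]`, matching the `rdd.count() == 3` assertion in the test above.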
[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2283#issuecomment-55592767 +1 looks good. Thanks @sarutak
[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2283
[GitHub] spark pull request: [SPARK-3396][MLLIB] Use SquaredL2Updater in Lo...
GitHub user BigCrunsh opened a pull request: https://github.com/apache/spark/pull/2398 [SPARK-3396][MLLIB] Use SquaredL2Updater in LogisticRegressionWithSGD SimpleUpdater ignores the regularizer, which leads to an unregularized LogReg. To enable the common L2 regularizer (and the corresponding regularization parameter) for logistic regression the SquaredL2Updater has to be used in SGD (see, e.g., [SVMWithSGD]) You can merge this pull request into a Git repository by running: $ git pull https://github.com/soundcloud/spark fix-regparam-logreg Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2398.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2398 commit 0820c04bf26be840d0137b730e497ce4305938b1 Author: Christoph Sawade christ...@sawade.me Date: 2014-09-15T14:00:02Z Use SquaredL2Updater in LogisticRegressionWithSGD SimpleUpdater ignores the regularizer, which leads to an unregularized LogReg. To enable the common L2 regularizer (and the corresponding regularization parameter) for logistic regression the SquaredL2Updater has to be used in SGD (see, e.g., [SVMWithSGD])
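The fix above hinges on how the updater folds the regularizer into each SGD step: minimizing loss + (regParam / 2) * ||w||² adds regParam · w to the gradient. A simplified sketch of that update (plain Python; hedged: MLlib's actual SquaredL2Updater also rescales the step size per iteration, omitted here):

```python
def squared_l2_step(weights, gradient, step_size, reg_param):
    # One SGD step with an L2 penalty folded into the gradient:
    #   w <- w - step_size * (gradient + reg_param * w)
    # With reg_param = 0 this degenerates to SimpleUpdater's plain step,
    # which is exactly why LogisticRegressionWithSGD was unregularized.
    return [w - step_size * (g + reg_param * w)
            for w, g in zip(weights, gradient)]
```

A nonzero `reg_param` shrinks each weight toward zero on every step in addition to following the loss gradient, which is the behavior the PR enables.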
[GitHub] spark pull request: [SPARK-3519] add distinct(n) to PySpark
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2383#issuecomment-55594346 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20336/consoleFull) for PR 2383 at commit [`6bc4a2c`](https://github.com/apache/spark/commit/6bc4a2c8a184f2c88a2d2d65bf74bb7ead980aab). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3396][MLLIB] Use SquaredL2Updater in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2231#issuecomment-55594378 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20337/consoleFull) for PR 2231 at commit [`0820c04`](https://github.com/apache/spark/commit/0820c04bf26be840d0137b730e497ce4305938b1). * This patch **does not** merge cleanly!
[GitHub] spark pull request: [SPARK-3396][MLLIB] Use SquaredL2Updater in Lo...
Github user BigCrunsh closed the pull request at: https://github.com/apache/spark/pull/2231
[GitHub] spark pull request: [SPARK-3396][MLLIB] Use SquaredL2Updater in Lo...
Github user BigCrunsh commented on the pull request: https://github.com/apache/spark/pull/2231#issuecomment-55594501 Changed target to master (https://github.com/apache/spark/pull/2398)
[GitHub] spark pull request: [SPARK-3485][SQL] should check parameter type ...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/2355#issuecomment-55594998 @chenghao-intel This is not so complex, since it is not a GenericUDF but a simple UDF with limited types. So we do not need to call those here.
[GitHub] spark pull request: [SPARK-3396][MLLIB] Use SquaredL2Updater in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2398#issuecomment-55595028 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20338/consoleFull) for PR 2398 at commit [`0820c04`](https://github.com/apache/spark/commit/0820c04bf26be840d0137b730e497ce4305938b1). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3485][SQL] should check parameter type ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2355#issuecomment-55595044 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20339/consoleFull) for PR 2355 at commit [`5f25ca5`](https://github.com/apache/spark/commit/5f25ca564b2805f0b50e835ad74863a77d739198). * This patch merges cleanly.
[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]LDA based on Graphx
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2388#issuecomment-55595533 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20335/consoleFull) for PR 2388 at commit [`dc7ef13`](https://github.com/apache/spark/commit/dc7ef13c9b5b58cb7b0e12f586432e3140644b10). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class TopicModeling(@transient val tokens: RDD[(TopicModeling.WordId, TopicModeling.DocId)],`
[GitHub] spark pull request: [SPARK-3531][SQL]select null from table would ...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/2396#issuecomment-55595820 This follows how Hive handles immediate null values in queries.
[GitHub] spark pull request: SPARK-3069 [DOCS] Build instructions in README...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2014#issuecomment-55598514 @andrewor14 @nchammas @pwendell Humble ping on this one; I think it's good to go, and it will probably help head off some build questions going forward.
[GitHub] spark pull request: [SPARK-927] detect numpy at time of use
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-55601210 thanks @erikerlandson. @davies @JoshRosen how would you guys like to proceed?
[GitHub] spark pull request: SPARK-3069 [DOCS] Build instructions in README...
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/2014#discussion_r17546110 --- Diff: docs/building-spark.md --- @@ -159,4 +160,13 @@ then ship it over to the cluster. We are investigating the exact cause for this. The assembly jar produced by `mvn package` will, by default, include all of Spark's dependencies, including Hadoop and some of its ecosystem projects. On YARN deployments, this causes multiple versions of these to appear on executor classpaths: the version packaged in the Spark assembly and the version on each node, included with yarn.application.classpath. The `hadoop-provided` profile builds the assembly without including Hadoop-ecosystem projects, like ZooKeeper and Hadoop itself. +# Building with SBT +Maven is the official recommendation for packaging Spark, and is the build of reference. +But SBT is supported for day-to-day development since it can provide much faster iterative +compilation. More advanced developers may wish to use SBT. + +The SBT build is derived from the Maven POM files, and so the same Maven profiles and variables +can be set to control the SBT build. For example: + +sbt -Pyarn -Phadoop-2.3 compile --- End diff -- Do we need to add a bit more color here about how to use `sbt`, to match what used to be in the GitHub README? Or is this sufficient?
[GitHub] spark pull request: SPARK-3069 [DOCS] Build instructions in README...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2014#discussion_r17546414 --- Diff: docs/building-spark.md --- @@ -159,4 +160,13 @@ then ship it over to the cluster. We are investigating the exact cause for this. The assembly jar produced by `mvn package` will, by default, include all of Spark's dependencies, including Hadoop and some of its ecosystem projects. On YARN deployments, this causes multiple versions of these to appear on executor classpaths: the version packaged in the Spark assembly and the version on each node, included with yarn.application.classpath. The `hadoop-provided` profile builds the assembly without including Hadoop-ecosystem projects, like ZooKeeper and Hadoop itself. +# Building with SBT +Maven is the official recommendation for packaging Spark, and is the build of reference. +But SBT is supported for day-to-day development since it can provide much faster iterative +compilation. More advanced developers may wish to use SBT. + +The SBT build is derived from the Maven POM files, and so the same Maven profiles and variables +can be set to control the SBT build. For example: + +sbt -Pyarn -Phadoop-2.3 compile --- End diff -- I think the goal here is just a taste, assuming the advanced developer will understand and figure out the rest if needed. Happy to make further edits though, like, should we still suggest `./sbt/sbt` instead of a local `sbt`?
[GitHub] spark pull request: SPARK-3069 [DOCS] Build instructions in README...
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/2014#discussion_r17547220 --- Diff: docs/building-spark.md --- @@ -159,4 +160,13 @@ then ship it over to the cluster. We are investigating the exact cause for this. The assembly jar produced by `mvn package` will, by default, include all of Spark's dependencies, including Hadoop and some of its ecosystem projects. On YARN deployments, this causes multiple versions of these to appear on executor classpaths: the version packaged in the Spark assembly and the version on each node, included with yarn.application.classpath. The `hadoop-provided` profile builds the assembly without including Hadoop-ecosystem projects, like ZooKeeper and Hadoop itself. +# Building with SBT +Maven is the official recommendation for packaging Spark, and is the build of reference. +But SBT is supported for day-to-day development since it can provide much faster iterative +compilation. More advanced developers may wish to use SBT. + +The SBT build is derived from the Maven POM files, and so the same Maven profiles and variables +can be set to control the SBT build. For example: + +sbt -Pyarn -Phadoop-2.3 compile --- End diff -- Hmm, I don't know enough to make a recommendation; I'll leave that to others. Just wanted to call out the fact that we'd have less info on using `sbt` than before. Maybe that's a good thing.
[GitHub] spark pull request: [SPARK-3519] add distinct(n) to PySpark
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2383#issuecomment-55604353 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20336/consoleFull) for PR 2383 at commit [`6bc4a2c`](https://github.com/apache/spark/commit/6bc4a2c8a184f2c88a2d2d65bf74bb7ead980aab). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-911] allow efficient queries for a rang...
Github user aaronjosephs commented on the pull request: https://github.com/apache/spark/pull/1381#issuecomment-55604431 @JoshRosen this isn't necessarily specified on the ticket, but it's related. Since most of the time something will be range-partitioned because you called sortByKey on it, this could actually be even more efficient (if cached) on smaller data sets if you glommed the partition and did a binary search on the array. I'm not sure whether the glomming overhead would outweigh the benefits of the binary search; I'd like to know if you have any opinions on this.
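The glom-plus-binary-search idea above can be sketched outside Spark in plain Python. This is only an illustration of the lookup step, not the PR's implementation: `range_lookup` and the sample data are hypothetical, and the list stands in for the array that `glom()` would produce from a partition sorted by `sortByKey`.

```python
from bisect import bisect_left, bisect_right

def range_lookup(sorted_partition, lo, hi):
    """Return all (key, value) pairs with lo <= key <= hi.

    sorted_partition simulates the array produced by glom() on one
    partition of a key-sorted RDD; two bisections find the range
    bounds in O(log n) instead of a linear filter over the partition.
    """
    keys = [k for k, _ in sorted_partition]
    start = bisect_left(keys, lo)   # first index with key >= lo
    end = bisect_right(keys, hi)    # one past the last index with key <= hi
    return sorted_partition[start:end]

part = [(1, 'a'), (3, 'b'), (5, 'c'), (7, 'd'), (9, 'e')]
print(range_lookup(part, 3, 7))  # [(3, 'b'), (5, 'c'), (7, 'd')]
```

Whether this wins in practice depends on the trade-off raised above: the one-time cost of materializing each partition as an array versus the per-query saving of binary search over a linear scan.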
[GitHub] spark pull request: [SPARK-3396][MLLIB] Use SquaredL2Updater in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2398#issuecomment-55604710 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20338/consoleFull) for PR 2398 at commit [`0820c04`](https://github.com/apache/spark/commit/0820c04bf26be840d0137b730e497ce4305938b1). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.