[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-5274

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20321/consoleFull) for PR 1983 at commit [`c22e8c2`](https://github.com/apache/spark/commit/c22e8c272bea24e670cf92d2eee5d9aa40f2891b).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Document(docId: Int, content: Array[Int])`

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3377] [Metrics] Metrics can be accident...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2250#issuecomment-5663

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20322/consoleFull) for PR 2250 at commit [`ead8966`](https://github.com/apache/spark/commit/ead8966e4bed34243cda135cb5dd1b5ec5c8c332).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2182] Scalastyle rule blocking non asci...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2358#issuecomment-6122

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20327/consoleFull) for PR 2358 at commit [`3dbf037`](https://github.com/apache/spark/commit/3dbf037c69548fac099b75f9e34a1fbd5076a572).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2182] Scalastyle rule blocking non asci...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2358#issuecomment-6169

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20327/consoleFull) for PR 2358 at commit [`3dbf037`](https://github.com/apache/spark/commit/3dbf037c69548fac099b75f9e34a1fbd5076a572).
* This patch **fails** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class NonASCIICharacterChecker extends ScalariformChecker`
[GitHub] spark pull request: Add a Community Projects page
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2219#issuecomment-6421

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20325/consoleFull) for PR 2219 at commit [`7316822`](https://github.com/apache/spark/commit/7316822935dfd9bd8d9e432e1582f5470da10c32).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class JavaSparkContext(val sc: SparkContext)`
  * `class TaskCompletionListenerException(errorMessages: Seq[String]) extends Exception`
  * `class RatingDeserializer(FramedSerializer):`
  * `class Encoder[T : NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T]`
  * `class Encoder[T : NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T]`
  * `class Encoder[T : NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T]`
  * `class Encoder extends compression.Encoder[IntegerType.type]`
  * `class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[IntegerType.type])`
  * `class Encoder extends compression.Encoder[LongType.type]`
  * `class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[LongType.type])`
  * `class JavaStreamingContext(val ssc: StreamingContext) extends Closeable`
[GitHub] spark pull request: [SPARK-2182] Scalastyle rule blocking non asci...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2358#issuecomment-6582

Hey @pwendell, I can remove the commit once you confirm it works.
[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-6598

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20328/consoleFull) for PR 1977 at commit [`4d4bc86`](https://github.com/apache/spark/commit/4d4bc8671a4ef7e9d2d9924681bed1f8e4695a20).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3501] [SQL] Fix the bug of Hive SimpleU...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/2368#issuecomment-6637

retest this please.
[GitHub] spark pull request: [SPARK-3437][BUILD] Support crossbuilding in m...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2357#issuecomment-6738

You are right about that, but we are forking it because we want a modified install plugin. It would have been nicer if we could run a plugin just before install and have the install plugin magically do the job correctly, but that isn't possible unless I hack things using reflection. It's as if Maven keeps its own copy of objects before letting plugins use them.
[GitHub] spark pull request: [SPARK-3501] [SQL] Fix the bug of Hive SimpleU...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2368#issuecomment-6842

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20329/consoleFull) for PR 2368 at commit [`b804abd`](https://github.com/apache/spark/commit/b804abd5be4161531db38193be310cf628674cec).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3454] Expose JSON representation of dat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2333#issuecomment-7935

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20326/consoleFull) for PR 2333 at commit [`d41b3ca`](https://github.com/apache/spark/commit/d41b3caf1adb0c807aa6ce9d011e5e2553408fe2).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class JavaSparkContext(val sc: SparkContext)`
  * `throw new IllegalStateException("The main method in the given main class must be static")`
  * `class TaskCompletionListenerException(errorMessages: Seq[String]) extends Exception`
  * `class Dummy(object):`
  * `class RatingDeserializer(FramedSerializer):`
  * `class Encoder[T : NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T]`
  * `class Encoder[T : NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T]`
  * `class Encoder[T : NativeType](columnType: NativeColumnType[T]) extends compression.Encoder[T]`
  * `class Encoder extends compression.Encoder[IntegerType.type]`
  * `class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[IntegerType.type])`
  * `class Encoder extends compression.Encoder[LongType.type]`
  * `class Decoder(buffer: ByteBuffer, columnType: NativeColumnType[LongType.type])`
  * `class JavaStreamingContext(val ssc: StreamingContext) extends Closeable`
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2390#discussion_r17528496

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -181,11 +182,25 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
     val overwrite: Boolean = o.getOrElse("") == "OVERWRITE"
     InsertIntoTable(r, Map[String, Option[String]](), s, overwrite)
   }
+
+  protected lazy val addCache: Parser[LogicalPlan] =
+    ADD ~ CACHE ~ TABLE ~ ident ~ AS ~ select ~ opt(";") ^^ {
+      case tableName ~ as ~ s =>
+        CacheTableAsSelectCommand(tableName, s)
+    }
--- End diff --

I will remove it.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2390#discussion_r17528493

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -181,11 +182,25 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
     val overwrite: Boolean = o.getOrElse("") == "OVERWRITE"
     InsertIntoTable(r, Map[String, Option[String]](), s, overwrite)
   }
+
+  protected lazy val addCache: Parser[LogicalPlan] =
+    ADD ~ CACHE ~ TABLE ~ ident ~ AS ~ select ~ opt(";") ^^ {
+      case tableName ~ as ~ s =>
+        CacheTableAsSelectCommand(tableName, s)
+    }
   protected lazy val cache: Parser[LogicalPlan] =
-    (CACHE ^^^ true | UNCACHE ^^^ false) ~ TABLE ~ ident ^^ {
-      case doCache ~ _ ~ tableName => CacheCommand(tableName, doCache)
+    CACHE ~ TABLE ~ ident ~ opt(AS) ~ opt(select) ~ opt(";") ^^ {
--- End diff --

Thank you for your comments. Yes, it is better to add it as
```
opt(AS ~ select)
```
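The reviewer's point can be sketched with a toy grammar. This is a minimal illustration assuming the scala-parser-combinators library, not Spark's actual SqlParser; all names (`CacheSketch`, `Cache`) are made up. Wrapping the clause as a single `opt("AS" ~> select)` makes it all-or-nothing, so a dangling `AS` without a query cannot parse, whereas `opt(AS) ~ opt(select)` would silently accept it.

```scala
import scala.util.parsing.combinator.RegexParsers

// Toy grammar, not Spark's SqlParser: the AS clause either matches as a
// whole ("AS SELECT ...") or not at all.
object CacheSketch extends RegexParsers {
  case class Cache(table: String, query: Option[String])

  val ident: Parser[String]  = "[A-Za-z_][A-Za-z0-9_]*".r
  val select: Parser[String] = "(?i)SELECT\\b.*".r

  val cache: Parser[Cache] =
    "(?i)CACHE".r ~> "(?i)TABLE".r ~> ident ~ opt("(?i)AS\\b".r ~> select) ^^ {
      case name ~ q => Cache(name, q)
    }
}

// CacheSketch.parseAll(CacheSketch.cache, "CACHE TABLE t").successful                      // true
// CacheSketch.parseAll(CacheSketch.cache, "CACHE TABLE t AS SELECT * FROM src").successful // true
// CacheSketch.parseAll(CacheSketch.cache, "CACHE TABLE t AS").successful                   // false: dangling AS rejected
```

With the separate `opt(AS) ~ opt(select)` shape, `"CACHE TABLE t AS"` would consume the `AS` and succeed with no query, which is exactly the ambiguity the review comment is avoiding.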
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2390#discussion_r17528508

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -305,6 +305,8 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
         Seq(execution.ExplainCommand(logicalPlan, plan.output, extended)(context))
       case logical.CacheCommand(tableName, cache) =>
         Seq(execution.CacheCommand(tableName, cache)(context))
+      case logical.CacheTableAsSelectCommand(tableName, plan) =>
--- End diff --

I will update.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2390#discussion_r17528600

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -119,4 +119,16 @@ class CachedTableSuite extends QueryTest {
     }
     assert(!TestSQLContext.isCached("testData"), "Table 'testData' should not be cached")
   }
+
+  test("CACHE TABLE tableName AS SELECT Star Table") {
+    TestSQLContext.sql("CACHE TABLE testCacheTable AS SELECT * FROM testData")
+    TestSQLContext.sql("SELECT * FROM testCacheTable WHERE key = 1").collect()
--- End diff --

I will add.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2390#discussion_r17528595

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,22 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
     child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    sqlContext.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)
+    sqlContext.cacheTable(tableName)
+    // It does the caching eagerly.
--- End diff --

Either way is OK, but eager caching exists only for this feature, so it may look inconsistent compared to the other caching commands.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2390#discussion_r17528605

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -214,6 +214,7 @@ private[hive] object HiveQl {
   */
  def getAst(sql: String): ASTNode = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql))
+
--- End diff --

Sorry for the trouble. I am new to git, so I added it accidentally.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2390#discussion_r17528625

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -1097,7 +1109,7 @@ private[hive] object HiveQl {
     case Token(TOK_FUNCTION, Token(functionName, Nil) :: children) =>
       HiveGenericUdtf(functionName, attributes, children.map(nodeToExpr))
-
+
--- End diff --

Ok.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2390#discussion_r17528618

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -229,11 +230,17 @@ private[hive] object HiveQl {
         SetCommand(Some(key), Some(value))
       }
     } else if (sql.trim.toLowerCase.startsWith("cache table")) {
-      CacheCommand(sql.trim.drop(12).trim, true)
+      sql.trim.drop(12).trim.split(" ").toSeq match {
+        case Seq(tableName) =>
+          CacheCommand(tableName, true)
+        case Seq(tableName, as, select @ _*) =>
+          CacheTableAsSelectCommand(tableName,
+            createPlan(sql.trim.drop(12 + tableName.length() + as.length() + 2)))
+      }
     } else if (sql.trim.toLowerCase.startsWith("uncache table")) {
       CacheCommand(sql.trim.drop(14).trim, false)
     } else if (sql.trim.toLowerCase.startsWith("add jar")) {
-      AddJar(sql.trim.drop(8).trim)
+      NativeCommand(sql)
--- End diff --

Sorry for the trouble. I am new to git, so I added it accidentally.
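The prefix-dispatch idea in the diff above can be sketched in isolation. This is a hedged, self-contained toy: `dispatchCache` and the stand-in result strings are made up here so the snippet runs alone; the real HiveQl code builds logical plan objects instead.

```scala
// Self-contained sketch of the "cache table" dispatch: after stripping the
// leading keywords, a single remaining token means a plain cache command,
// while "name AS <query>" means cache-as-select. Result strings are
// placeholders for the real command classes.
object CacheDispatchSketch {
  def dispatchCache(sql: String): String = {
    // Drop the leading "cache table" plus surrounding whitespace.
    val body = sql.trim.drop("cache table".length).trim
    body.split("\\s+").toSeq match {
      case Seq(tableName) =>
        s"CacheCommand($tableName)"
      case Seq(tableName, as, query @ _*) if as.equalsIgnoreCase("as") =>
        s"CacheTableAsSelectCommand($tableName, ${query.mkString(" ")})"
      case _ =>
        sys.error(s"unparsable cache command: $sql")
    }
  }
}

// CacheDispatchSketch.dispatchCache("CACHE TABLE t")
//   // "CacheCommand(t)"
// CacheDispatchSketch.dispatchCache("CACHE TABLE t AS SELECT * FROM src")
//   // "CacheTableAsSelectCommand(t, SELECT * FROM src)"
```

Splitting on `\\s+` and re-joining avoids the brittle character-offset arithmetic (`drop(12 + tableName.length() + ...)`) that the quoted diff uses.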
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2390#discussion_r17528644

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -127,6 +127,7 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
   protected val SUBSTRING = Keyword("SUBSTRING")
   protected val SQRT = Keyword("SQRT")
   protected val ABS = Keyword("ABS")
+  protected val ADD = Keyword("ADD")
--- End diff --

I will remove it.
[GitHub] spark pull request: [WIP][SPARK-3468] WebUI Timeline-View feature
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2342#issuecomment-8998

**[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20324/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3437][BUILD] Support crossbuilding in m...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2357#issuecomment-9135

I am anyway trying more options to avoid needing to modify maven-install-plugin.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2381#discussion_r17528900

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -181,6 +182,12 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
     val overwrite: Boolean = o.getOrElse("") == "OVERWRITE"
     InsertIntoTable(r, Map[String, Option[String]](), s, overwrite)
   }
+
+  protected lazy val addCache: Parser[LogicalPlan] =
+    ADD ~ CACHE ~ TABLE ~ ident ~ AS ~ select ~ opt(";") ^^ {
--- End diff --

Thanks for your comments. Sorry for the misunderstanding; I updated it as per the syntax
```
CACHE TABLE AS SELECT ...
```
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2381#discussion_r17528911

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -181,6 +182,12 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
     val overwrite: Boolean = o.getOrElse("") == "OVERWRITE"
     InsertIntoTable(r, Map[String, Option[String]](), s, overwrite)
   }
+
+  protected lazy val addCache: Parser[LogicalPlan] =
--- End diff --

Updated.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2381#discussion_r17528919

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,24 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
     child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)(
--- End diff --

OK. Updated.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2381#discussion_r17528937

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,24 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
     child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)(
+    @transient context: SQLContext)
--- End diff --

OK. Removed passing `sqlContext` manually.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2381#discussion_r17528957

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -119,4 +119,20 @@ class CachedTableSuite extends QueryTest {
     }
     assert(!TestSQLContext.isCached("testData"), "Table 'testData' should not be cached")
   }
+
+  test("ADD CACHE TABLE tableName AS SELECT Star Table") {
+    TestSQLContext.sql("ADD CACHE TABLE testCacheTable AS SELECT * FROM testData")
+    TestSQLContext.sql("SELECT * FROM testCacheTable WHERE key = 1").collect()
+    TestSQLContext.uncacheTable("testCacheTable")
+  }
+
+  test("'ADD CACHE TABLE tableName AS SELECT ..'") {
+    TestSQLContext.sql("ADD CACHE TABLE testCacheTable AS SELECT * FROM testData")
+    TestSQLContext.table("testCacheTable").queryExecution.executedPlan match {
+      case _: InMemoryColumnarTableScan => // Found evidence of caching
--- End diff --

OK.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2381#discussion_r17528949

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,24 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
     child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)(
+    @transient context: SQLContext)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    context.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)
+    context.cacheTable(tableName)
+    // It does the caching eagerly.
+    // TODO: Does it really require to collect?
--- End diff --

I have added it as `count`.
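The lazy-versus-eager point being discussed can be illustrated without Spark at all. A hedged toy sketch (`EagerCacheSketch` is made up here, not Spark code): marking something as cached is lazy, and nothing is materialized until an action touches it, which is why the command runs a cheap action like `count()` to force the work up front.

```scala
// Toy illustration, not Spark code: a lazily "cached" value is only
// materialized when an action first touches it. Running a count()-style
// action immediately makes the caching effectively eager.
object EagerCacheSketch {
  var materialized = false

  // Stands in for the cached table: the body runs on first access only.
  lazy val cachedData: Seq[Int] = {
    materialized = true            // the "expensive" computation happens here
    (1 to 5).map(_ * 2)
  }

  // Stands in for the eager count() action.
  def forceMaterialization(): Int = cachedData.size
}

// EagerCacheSketch.materialized            // false: caching alone is lazy
// EagerCacheSketch.forceMaterialization()  // forces the work; materialized becomes true
```

`count` was preferred over `collect` in the discussion because both force materialization, but `count` avoids pulling every row back to the driver.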
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2381#discussion_r17529002

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -214,6 +214,7 @@ private[hive] object HiveQl {
   */
  def getAst(sql: String): ASTNode = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql))
+
--- End diff --

Sorry for the trouble. I am new to git. I have removed the new line.
[GitHub] spark pull request: [SPARK-1087] Move python traceback utilities i...
Github user staple commented on a diff in the pull request: https://github.com/apache/spark/pull/2385#discussion_r17528982 --- Diff: python/pyspark/traceback_utils.py --- @@ -0,0 +1,80 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from collections import namedtuple +import os +import traceback + + +__all__ = ["extract_concise_traceback", "SparkContext"] --- End diff -- Looks like I also need to put JavaStackTrace here instead of SparkContext.
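The review point above hinges on how `__all__` works: its entries must be strings naming objects the module actually defines, and they determine what `from module import *` exports. A minimal sketch of that behavior, with illustrative names (not pyspark's actual `traceback_utils` contents):

```python
import types

# Build a throwaway module to show that __all__ entries must be *string*
# names of objects the module defines. Names here are illustrative only.
source = '''
__all__ = ["extract_concise_traceback"]  # string names, not bare identifiers

def extract_concise_traceback():
    return "concise"

def _internal_helper():
    return "hidden"
'''

mod = types.ModuleType("traceback_demo")
exec(source, mod.__dict__)

# `from traceback_demo import *` would copy exactly the names in __all__:
exported = sorted(mod.__all__)
print(exported)                         # ['extract_concise_traceback']
print(mod.extract_concise_traceback())  # concise
```

Listing a name in `__all__` that the module does not define (as the comment notes for `SparkContext`) makes `import *` fail with an `AttributeError`, which is why the entry needs to match the actual definition.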
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2381#discussion_r17529035 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala --- @@ -233,7 +234,7 @@ private[hive] object HiveQl { } else if (sql.trim.toLowerCase.startsWith("uncache table")) { CacheCommand(sql.trim.drop(14).trim, false) } else if (sql.trim.toLowerCase.startsWith("add jar")) { -AddJar(sql.trim.drop(8).trim) +NativeCommand(sql) --- End diff -- Sorry for the trouble; I am new to git. I faced some problems in the rebase, and I have reverted it.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2381#discussion_r17529073 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala --- @@ -243,14 +244,12 @@ private[hive] object HiveQl { } else if (sql.trim.startsWith("!")) { ShellCommand(sql.drop(1)) } else { -val tree = getAst(sql) -if (nativeCommands contains tree.getText) { - NativeCommand(sql) +if (sql.trim.toLowerCase.startsWith("add cache table")) { + sql.trim.drop(16).split(" ").toSeq match { + case Seq(tableName, "as", xs @ _*) => CacheTableAsSelectCommand(tableName, createPlan(sql.trim.drop(16 + tableName.length() + "as".length() + 1))) --- End diff -- Thank you for guiding me. I have run ```sbt/sbt scalastyle``` and updated the code.
[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1847#issuecomment-55560383 I will close this PR, since most of the work was done in #1846 and #1962, and native command support for `EXPLAIN` is probably not necessary; even Hive doesn't support it.
[GitHub] spark pull request: [SPARK-2918] [SQL] [WIP] Support the extended ...
Github user chenghao-intel closed the pull request at: https://github.com/apache/spark/pull/1847
[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-55560776 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20328/consoleFull) for PR 1977 at commit [`4d4bc86`](https://github.com/apache/spark/commit/4d4bc8671a4ef7e9d2d9924681bed1f8e4695a20). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ResultIterable(object):` * `class FlattedValuesSerializer(BatchedSerializer):` * `class SameKey(object):` * `class GroupByKey(object):` * `class ExternalGroupBy(ExternalMerger):`
[GitHub] spark pull request: [SPARK-2182] Scalastyle rule blocking non asci...
Github user ash211 commented on the pull request: https://github.com/apache/spark/pull/2358#issuecomment-55560892 A flagged character looks like this: ```= Running Scala style checks = Scalastyle checks failed at following occurrences: error file=/home/jenkins/workspace/SparkPullRequestBuilder/core/src/main/scala/org/apache/spark/SparkContext.scala message=non.ascii.character.disallowed.message line=304 column=22 java.lang.RuntimeException: exists error at scala.sys.package$.error(package.scala:27) at scala.Predef$.error(Predef.scala:142) [error] (core/*:scalastyle) exists error``` Seems reasonable to merge with that confirmation.
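The rule being discussed flags any non-ASCII character in source, reporting its line and column (as in the `line=304 column=22` output above). A rough standalone sketch of the same check, not the actual Scalastyle implementation:

```python
def find_non_ascii(source: str):
    """Return 1-based (line, column) pairs of non-ASCII characters,
    mirroring the line/column reporting in the Scalastyle output above."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # outside the 7-bit ASCII range
                hits.append((lineno, col))
    return hits

code = 'val ok = 1\nval bad = "caf\u00e9"\n'
print(find_non_ascii(code))            # [(2, 15)]
print(find_non_ascii("plain ascii"))   # []
```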
[GitHub] spark pull request: [SPARK-3527] [SQL] Strip the string message
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/2392 [SPARK-3527] [SQL] Strip the string message You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenghao-intel/spark trim Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2392.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2392 commit e52024fc1a093d2464d694546757a988c75b629f Author: Cheng Hao hao.ch...@intel.com Date: 2014-09-15T07:37:56Z trim the string message
[GitHub] spark pull request: [SPARK-2182] Scalastyle rule blocking non asci...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2358#issuecomment-55560989 Yeah, thanks @ash211. I will get rid of that commit.
[GitHub] spark pull request: [SPARK-3527] [SQL] Strip the string message
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2392#issuecomment-55561297 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20330/consoleFull) for PR 2392 at commit [`e52024f`](https://github.com/apache/spark/commit/e52024fc1a093d2464d694546757a988c75b629f). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2182] Scalastyle rule blocking non asci...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2358#issuecomment-55561642 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20331/consoleFull) for PR 2358 at commit [`12a20f2`](https://github.com/apache/spark/commit/12a20f27cf9f1a7a04160add95da4375b123a40d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3529] [SQL] Delete the temp files after...
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/2393 [SPARK-3529] [SQL] Delete the temp files after test exit There are lots of temporary files created by TestHive under /tmp by default, which may cause potential performance issues for testing. This PR automatically deletes them after the tests exit. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenghao-intel/spark delete_temp_on_exit Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2393.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2393 commit 4ecc9d49a83082806b9f713ee49565aecf5df764 Author: Cheng Hao hao.ch...@intel.com Date: 2014-09-12T01:58:51Z Delete the temp files after test exit
[GitHub] spark pull request: allow symlinking to shell scripts
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/2386#discussion_r17530408 --- Diff: bin/spark-shell --- @@ -29,7 +29,7 @@ esac set -o posix ## Global script variables -FWDIR=$(cd `dirname $0`/..; pwd) +FWDIR=$(cd $(dirname $(readlink -f $0))/..; pwd) --- End diff -- You may have to quote these, so that dirs with spaces within their names work, like they do at the moment. The above should look like: ```FWDIR="$(cd "$(dirname "$(readlink -f "$0")")"/..; pwd)```
[GitHub] spark pull request: [SPARK-3529] [SQL] Delete the temp files after...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2393#issuecomment-55563405 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20332/consoleFull) for PR 2393 at commit [`4ecc9d4`](https://github.com/apache/spark/commit/4ecc9d49a83082806b9f713ee49565aecf5df764). * This patch merges cleanly.
[GitHub] spark pull request: allow symlinking to shell scripts
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/2386#discussion_r17530450 --- Diff: bin/spark-shell --- @@ -29,7 +29,7 @@ esac set -o posix ## Global script variables -FWDIR=$(cd `dirname $0`/..; pwd) +FWDIR=$(cd $(dirname $(readlink -f $0))/..; pwd) --- End diff -- Of course, this applies to all the other places as well.
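The `readlink -f` change being reviewed makes the script find its real installation directory even when invoked through a symlink. The same idea can be sketched in Python with `os.path.realpath`, which resolves symlinks the way `readlink -f` does:

```python
import os
import tempfile

# Resolving a symlink back to the real script location, analogous to
# `dirname "$(readlink -f "$0")"` in the shell diff above.
with tempfile.TemporaryDirectory() as d:
    real_dir = os.path.join(d, "spark", "bin")
    os.makedirs(real_dir)
    script = os.path.join(real_dir, "spark-shell")
    open(script, "w").close()

    # A symlink to the script somewhere else on the filesystem:
    link = os.path.join(d, "spark-shell-link")
    os.symlink(script, link)

    # dirname of the symlink itself points at the link's own directory...
    naive_dir = os.path.dirname(link)
    # ...while realpath follows the link back to the actual bin/ directory.
    resolved_dir = os.path.dirname(os.path.realpath(link))

print(naive_dir == d)                                        # True
print(resolved_dir.endswith(os.path.join("spark", "bin")))   # True
```

Without the resolution step, a symlinked `spark-shell` would compute `FWDIR` relative to wherever the symlink lives, not the Spark installation.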
[GitHub] spark pull request: make spark-class to work with openjdk
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2387#issuecomment-55563583 This is also a duplicate of #2301.
[GitHub] spark pull request: Updates to shell globbing in run-example and s...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/449#discussion_r17530603 --- Diff: bin/run-example --- @@ -21,18 +21,25 @@ SCALA_VERSION=2.10 FWDIR=$(cd `dirname $0`/..; pwd) export SPARK_HOME=$FWDIR -EXAMPLES_DIR=$FWDIR/examples -if [ -f $FWDIR/RELEASE ]; then - export SPARK_EXAMPLES_JAR=`ls $FWDIR/lib/spark-examples-*hadoop*.jar` -elif [ -e $EXAMPLES_DIR/target/scala-$SCALA_VERSION/spark-examples-*hadoop*.jar ]; then - export SPARK_EXAMPLES_JAR=`ls $EXAMPLES_DIR/target/scala-$SCALA_VERSION/spark-examples-*hadoop*.jar` -fi +. $FWDIR/bin/load-spark-env.sh +. $FWDIR/bin/sh-funcs.sh --- End diff -- We generally quote strings like the above, as ```. "$FWDIR/bin/sh-funcs.sh"``` and so on.
[GitHub] spark pull request: [SPARK-3501] [SQL] Fix the bug of Hive SimpleU...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2368#issuecomment-55564638 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20329/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3507] Adding RegressionLearner
Github user epahomov commented on the pull request: https://github.com/apache/spark/pull/2371#issuecomment-55565128 Closed, because there is currently similar work underway at Databricks.
[GitHub] spark pull request: [SPARK-3414][SQL] Replace LowerCaseSchema with...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/2382#discussion_r17531198 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -98,12 +98,12 @@ case class Star( override def withNullability(newNullability: Boolean) = this override def withQualifiers(newQualifiers: Seq[String]) = this - def expand(input: Seq[Attribute]): Seq[NamedExpression] = { + def expand(input: Seq[Attribute], resolver: Resolver): Seq[NamedExpression] = { val expandedAttributes: Seq[Attribute] = table match { // If there is no table specified, use all input attributes. case None => input // If there is a table, pick out attributes that are part of this table. - case Some(t) => input.filter(_.qualifiers contains t) + case Some(t) => input.filter(_.qualifiers.filter(resolver(_,t)).nonEmpty) --- End diff -- Nit: space after `,`
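Aside from the spacing nit, the `filter(...).nonEmpty` pattern in the diff is just an existence check: does any qualifier match the table under the resolver? A small Python sketch of the two equivalent formulations (the resolver here is a hypothetical case-insensitive stand-in, not Catalyst's actual code):

```python
def resolver(a: str, b: str) -> bool:
    # Hypothetical stand-in for the (String, String) => Boolean resolver
    # threaded through in this PR: case-insensitive name comparison.
    return a.lower() == b.lower()

qualifiers = ["T1", "t2"]
table = "t1"

# filter(...).nonEmpty style: materializes an intermediate list first.
non_empty = len([q for q in qualifiers if resolver(q, table)]) > 0

# exists/any style: short-circuits on the first match.
exists = any(resolver(q, table) for q in qualifiers)

print(non_empty, exists)  # True True
```

Both produce the same answer; the `any`/`exists` form avoids building the intermediate collection and stops at the first hit.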
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
GitHub user epahomov opened a pull request: https://github.com/apache/spark/pull/2394 [Spark-3525] Adding gradient boosting You can merge this pull request into a Git repository by running: $ git pull https://github.com/epahomov/spark SPARK-3525 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2394.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2394 commit d0dfb7b632715c60ef78964ea4d20aaa7712d2e2 Author: olgaoskina olgaosk...@yandex-team.ru Date: 2014-09-04T06:51:45Z Added stochastic gradient boosting algorithm commit 11c247a72e1681661cef4314fec5d1b4283b087f Author: olgaoskina olgaosk...@yandex-team.ru Date: 2014-09-04T06:52:05Z Added stochastic gradient boosting algorithm commit fdfc88e046a29202058b8f45168d624ed91f6d16 Author: olgaoskina olgaosk...@yandex-team.ru Date: 2014-09-05T12:25:41Z Code refactor commit b91b372c951db8bd1be6bd4d2308bc509bc1b44f Author: olgaoskina olgaosk...@yandex-team.ru Date: 2014-09-06T09:02:51Z Added test 'StochasticGradientBoostingSuite' commit 223f0907b6accaa0bf08c7948b2e6c1d728dab18 Author: olgaoskina olgaosk...@yandex-team.ru Date: 2014-09-10T08:08:30Z Added new test commit da13706bd8101ec8a2b648ce6ddc9777516e121f Author: olgaoskina olgaosk...@yandex-team.ru Date: 2014-09-14T15:33:52Z Refactor tests commit eafa0b75785b2ac570ddbc26a80b08b328f7b29c Author: Egor Pakhomov pahomov.e...@gmail.com Date: 2014-09-15T07:42:53Z Merge branch 'gradient_boosting' of https://github.com/olgaoskina/spark into olgaoskina-gradient_boosting commit 3c56f4ef65fb0df80804b0f4b9436f0623582be7 Author: Egor Pakhomov pahomov.e...@gmail.com Date: 2014-09-15T08:46:43Z Merge branch 'olgaoskina-gradient_boosting' into SPARK-3525 commit ce1934a329783629a12f615cbeac3d7e1a05a791 Author: Egor Pakhomov pahomov.e...@gmail.com Date: 2014-09-15T08:32:48Z [SPARK-3525] Fixing GradientBoostingSuite
[GitHub] spark pull request: [SPARK-3414][SQL] Replace LowerCaseSchema with...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/2382#discussion_r17531193 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/package.scala --- @@ -22,4 +22,9 @@ package org.apache.spark.sql.catalyst * Analysis consists of translating [[UnresolvedAttribute]]s and [[UnresolvedRelation]]s * into fully typed objects using information in a schema [[Catalog]]. */ -package object analysis +package object analysis { + type Resolver = (String, String) => Boolean --- End diff -- `Resolver` is probably too general a name; can we use a more precise name for this?
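The `Resolver` alias in the diff is just a binary predicate over names, `(String, String) => Boolean`, used to decide whether two identifiers refer to the same thing. A sketch of the two obvious implementations, case-sensitive and case-insensitive, in Python (names are hypothetical, not Catalyst's actual API):

```python
from typing import Callable, List

# Python analogue of `type Resolver = (String, String) => Boolean`.
Resolver = Callable[[str, str], bool]

def case_sensitive_resolution(a: str, b: str) -> bool:
    return a == b

def case_insensitive_resolution(a: str, b: str) -> bool:
    return a.lower() == b.lower()

def resolve(name: str, attributes: List[str], resolver: Resolver) -> List[str]:
    """Return the attributes whose name matches under the given resolver."""
    return [attr for attr in attributes if resolver(attr, name)]

attrs = ["Key", "value"]
print(resolve("key", attrs, case_sensitive_resolution))    # []
print(resolve("key", attrs, case_insensitive_resolution))  # ['Key']
```

Passing the predicate down through analysis lets the same resolution code serve dialects with different case-sensitivity rules, which is what this PR appears to thread through `expand` and `LogicalPlan.resolve`.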
[GitHub] spark pull request: [SPARK-2182] Scalastyle rule blocking non asci...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2358#issuecomment-55565526 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20331/consoleFull) for PR 2358 at commit [`12a20f2`](https://github.com/apache/spark/commit/12a20f27cf9f1a7a04160add95da4375b123a40d). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class NonASCIICharacterChecker extends ScalariformChecker `
[GitHub] spark pull request: [SPARK-3414][SQL] Replace LowerCaseSchema with...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/2382#discussion_r17531262 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -105,7 +119,9 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] { // One match, but we also need to extract the requested nested field. case Seq((a, nestedFields)) => Some(Alias(nestedFields.foldLeft(a: Expression)(GetField), nestedFields.last)()) - case Seq() => None // No matches. + case Seq() => +println(s"Could not find $name in ${input.mkString(", ")}") --- End diff -- Use `logTrace` instead? As we did in `Analyzer`.
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2394#issuecomment-55565637 Can one of the admins verify this patch?
[GitHub] spark pull request: [Minor]ignore all config files in conf
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/2395 [Minor]ignore all config files in conf Some config files in ```conf``` should be ignored, such as: conf/fairscheduler.xml conf/hive-log4j.properties conf/metrics.properties ... So ignore all ```sh```/```properties```/```conf```/```xml``` files. You can merge this pull request into a Git repository by running: $ git pull https://github.com/scwf/spark patch-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2395.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2395 commit 3c2986fbfa5b8f2d1a4573ae678ceffa306f0083 Author: wangfei wangf...@huawei.com Date: 2014-09-15T08:54:46Z ignore all config files
[GitHub] spark pull request: [Minor]ignore all config files in conf
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2395#issuecomment-55566939 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3531][SQL]select null from table would ...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/2396 [SPARK-3531][SQL]select null from table would throw a MatchError You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark selectnull Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2396.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2396 commit 0981c4246d267eedf90a232d2b7b8e3aab6b642d Author: Daoyuan Wang daoyuan.w...@intel.com Date: 2014-09-15T09:16:37Z fix select null from table
[GitHub] spark pull request: [SPARK-3527] [SQL] Strip the string message
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2392#issuecomment-55568137 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20330/consoleFull) for PR 2392 at commit [`e52024f`](https://github.com/apache/spark/commit/e52024fc1a093d2464d694546757a988c75b629f). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3531][SQL]select null from table would ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2396#issuecomment-55568780 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20333/consoleFull) for PR 2396 at commit [`0981c42`](https://github.com/apache/spark/commit/0981c4246d267eedf90a232d2b7b8e3aab6b642d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3529] [SQL] Delete the temp files after...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2393#issuecomment-55570548 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20332/consoleFull) for PR 2393 at commit [`4ecc9d4`](https://github.com/apache/spark/commit/4ecc9d49a83082806b9f713ee49565aecf5df764). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3437][BUILD] Support crossbuilding in m...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2357#issuecomment-55570721 - If you bind a plugin to the `install` phase and declare it before `maven-install-plugin`, will it happen to respect the ordering? - This is arguably something that could happen before `install`, in the `package` phase? There's a `prepare-package` phase as well.
[GitHub] spark pull request: [SPARK-3529] [SQL] Delete the temp files after...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2393#discussion_r17533627 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TestHive.scala --- @@ -41,7 +49,27 @@ import org.apache.spark.sql.SQLConf import scala.collection.JavaConversions._ object TestHive - extends TestHiveContext(new SparkContext("local[2]", "TestSQLContext", new SparkConf())) + extends TestHiveContext(new SparkContext("local[2]", "TestSQLContext", new SparkConf())) { + + Signal.handle(new Signal("INT"), new SignalHandler() { --- End diff -- Yikes, this seems a whole lot more heavy-handed than just implementing test lifecycle methods with annotations. Elsewhere in the test framework, temp files are reliably deleted by: - Invoking the standard method to get a temp dir - ... which calls `deleteOnExit()` - ... which also cleans up the declared test dir in an annotated cleanup method I would really avoid use of `Signal`! It does not seem required and is inconsistent with other tests.
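The cleanup pattern srowen describes can be sketched in a few lines. This is a Python analogue under assumed names, not Spark's actual Scala test utility: one standard helper creates the temp dir and registers its removal at interpreter exit, so no test needs a signal handler.

```python
import atexit
import shutil
import tempfile

def create_test_dir():
    """Hand a test a temp dir that is cleaned up automatically at exit.

    Hypothetical helper illustrating the pattern above: the directory is
    created through one standard method, which also registers cleanup.
    """
    path = tempfile.mkdtemp(prefix="spark-test-")
    # Rough equivalent of Java's File.deleteOnExit(): remove the directory
    # when the interpreter shuts down, ignoring any already-deleted races.
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    return path
```

Because cleanup is attached at creation time, every caller gets it for free, which is why a dedicated `Signal` handler is redundant here.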
[GitHub] spark pull request: cycle of deleting history log
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2391#issuecomment-55571126 Can you explain this patch? What problem does it solve and why? There is no JIRA here either.
[GitHub] spark pull request: [SPARK-3437][BUILD] Support crossbuilding in m...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2357#issuecomment-55571483 Hi @srowen, like I said, we can run a plugin before maven install - no question about that. But since maven install gets a copy of its STATE (somehow via Guice), altering it in another maven plugin does not help much. So as of now, I do not see a way around modifying the maven install plugin.
[GitHub] spark pull request: [SPARK-3437][BUILD] Support crossbuilding in m...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2357#issuecomment-55572928 I see, I thought you mentioned above that running a plugin before `install` would work. It sounds like there is some internal state of the plugin you need to modify, OK. I don't know, maybe it's useful to elaborate on this and see if anyone else can see a workaround. It is worth keeping track of alternatives when evaluating just how far to hack this. For example: Scala 2.11 support could be bound up with Spark 2.x support. Or it could live in a branch that is maintained over a few minor versions, which is not such a big deal if the delta is just flipping 2.10 to 2.11 in many places.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
GitHub user ravipesala opened a pull request: https://github.com/apache/spark/pull/2397 [SPARK-2594][SQL] Add CACHE TABLE name AS SELECT ... This feature allows users to cache a table from a select query. Example: CACHE TABLE tableName AS SELECT * FROM TEST_TABLE. Spark treats this type of SQL as a command and caches eagerly. It can be executed from SQLContext and HiveContext. Recreated the pull request after rebasing with master, and fixed all the comments raised in the previous pull requests: https://github.com/apache/spark/pull/2381 https://github.com/apache/spark/pull/2390 Author : ravipesala ravindra.pes...@huawei.com You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/spark SPARK-2594 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2397.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2397 commit b803fc80efec026784b87c468b2597e5efbb6708 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-11T10:23:45Z Add CACHE TABLE name AS SELECT ... This feature allows user to add cache table from the select query. Example : ADD CACHE TABLE tableName AS SELECT * FROM TEST_TABLE. Spark takes this type of SQL as command and it does eager caching. It can be executed from SQLContext and HiveContext. 
Signed-off-by: ravipesala ravindra.pes...@huawei.com commit 4e858d83b0020a1701ed65eac7047ee2978329db Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-13T12:36:49Z Updated parser to support add cache table command commit 13c8e27c33e8934bbd6fb458536675e97c3d8798 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-13T17:15:10Z Updated parser to support add cache table command commit 7459ce36775126f4c0636585c1d29f30ab35fd06 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-13T17:39:28Z Added comment commit 6758f808d14ec7a3da0953f7720f7f5b9a4e8a85 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-13T18:07:25Z Changed style commit eebc0c17f039d5a281aa4fef07d255daca3b8862 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-11T10:23:45Z Add CACHE TABLE name AS SELECT ... This feature allows user to add cache table from the select query. Example : ADD CACHE TABLE tableName AS SELECT * FROM TEST_TABLE. Spark takes this type of SQL as command and it does eager caching. It can be executed from SQLContext and HiveContext. Signed-off-by: ravipesala ravindra.pes...@huawei.com commit b5276b22c8e0c271e98f445079ea2e3cf61db6dc Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-13T12:36:49Z Updated parser to support add cache table command commit dc3389557d3c14ccbc713a745fcb1a0c97bf8726 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-13T17:15:10Z Updated parser to support add cache table command commit aaf5b59ea71a9ccdc33a8cda7ee33c3341020c4d Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-13T17:39:28Z Added comment commit 724b9db63258936bf0d00cda44ca4d4ea4ff2dc5 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-13T18:07:25Z Changed style commit e3265d0773515821b1a908bb94025ac79807e325 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-14T21:46:09Z Updated the code as per the comments by Admin in pull request. 
commit bc0bffc994857b94831941d3626fdb22edb43c68 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-14T23:30:06Z Merge remote-tracking branch 'ravipesala/Add-Cache-table-as' into Add-Cache-table-as commit d8b37b25cb893bf130d403011425161ae89dd187 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-15T06:02:55Z Updated as per the comments by Admin commit 8c9993cb2786a5c23bdb2328eb46a28823e1f9c6 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-15T06:08:24Z Changed the style commit fb1759bc4f4db17a321041c2167d86d431b0132e Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-15T06:26:54Z Updated as per Admin comments commit 394d5ca28fd39a5785b6eca7f6c476701df31702 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-15T06:30:30Z Changed style commit c18aa3878de86039b09b79a7c0844eafba447462 Author: ravipesala ravindra.pes...@huawei.com Date: 2014-09-15T10:14:14Z Merge remote-tracking branch 'remotes/ravipesala/Add-Cache-table-as' into SPARK-2594
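The eager caching this PR describes can be modeled in miniature. All names below are illustrative (the real work happens inside Spark's SQL parser and execution layer): the point is that running the command materializes the SELECT result immediately and registers it under the table name, rather than deferring the cache until first use.

```python
class CacheTableAsSelectCommand:
    """Toy model of an eager CACHE TABLE ... AS SELECT command.

    Hypothetical class, not Spark's implementation: select_fn is a
    zero-arg callable standing in for the SELECT, and catalog is a plain
    dict standing in for the session catalog.
    """

    def __init__(self, table_name, select_fn, catalog):
        self.table_name = table_name
        self.select_fn = select_fn
        self.catalog = catalog

    def run(self):
        # Eager: evaluate the query now, so later reads hit cached rows.
        self.catalog[self.table_name] = list(self.select_fn())
        return self.catalog[self.table_name]
```

For example, `CacheTableAsSelectCommand("t", lambda: [1, 2, 3], {}).run()` returns the materialized rows immediately, mirroring the "Spark takes this type of SQL as command and it does eager caching" behavior described above.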
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2381#issuecomment-55574730 As there was some confusion in rebasing, I have created a new pull request, https://github.com/apache/spark/pull/2397, rebased with master, and also fixed the review comments raised here.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2390#issuecomment-55574756 As there was some confusion in rebasing, I have created a new pull request, https://github.com/apache/spark/pull/2397, rebased with master, and also fixed the review comments raised here.
[GitHub] spark pull request: [SPARK-2594][SQL] Add CACHE TABLE name AS SE...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2397#issuecomment-55574757 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3531][SQL]select null from table would ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2396#issuecomment-55575850 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20333/consoleFull) for PR 2396 at commit [`0981c42`](https://github.com/apache/spark/commit/0981c4246d267eedf90a232d2b7b8e3aab6b642d). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3485][SQL] should check parameter type ...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/2355#issuecomment-55576959 I have changed the way the problem is fixed here, keeping most of the original logic. @marmbrus
[GitHub] spark pull request: [SPARK-3485][SQL] should check parameter type ...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/2355#issuecomment-55577018 retest this please.
[GitHub] spark pull request: [SPARK-3485][SQL] should check parameter type ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2355#issuecomment-55577211 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20334/consoleFull) for PR 2355 at commit [`0142696`](https://github.com/apache/spark/commit/01426963e85f33147c1074748cab820126c82cc5). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2951] [PySpark] support unpickle array....
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2365#issuecomment-55582578 thanks, +1, lgtm
[GitHub] spark pull request: [SPARK-3485][SQL] should check parameter type ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2355#issuecomment-55582984 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20334/consoleFull) for PR 2355 at commit [`0142696`](https://github.com/apache/spark/commit/01426963e85f33147c1074748cab820126c82cc5). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]LDA based on Graphx
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2388#issuecomment-55586887 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20335/consoleFull) for PR 2388 at commit [`dc7ef13`](https://github.com/apache/spark/commit/dc7ef13c9b5b58cb7b0e12f586432e3140644b10). * This patch merges cleanly.
[GitHub] spark pull request: Added support for accessing secured HDFS
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2320#issuecomment-55588750 yes that is how it works in standalone mode and it will. The master, workers, and all the applications/clients/drivers need to have the same shared secret. It will do authentication before being able to fetch the file that was added. I think this is fine to support as long as we make it very clear exactly what is supported and what is not supported.
[GitHub] spark pull request: [SPARK-3485][SQL] should check parameter type ...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/2355#issuecomment-55590580 Sorry for the missing pieces here...
[GitHub] spark pull request: [SPARK-3519] add distinct(n) to PySpark
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2383#discussion_r17541864 --- Diff: python/pyspark/tests.py --- @@ -586,6 +586,17 @@ def test_repartitionAndSortWithinPartitions(self): self.assertEquals(partitions[0], [(0, 5), (0, 8), (2, 6)]) self.assertEquals(partitions[1], [(1, 3), (3, 8), (3, 8)]) +def test_distinct(self): +rdd = self.sc.parallelize((1, 2, 3)*10).distinct() +self.assertEquals(rdd.count(), 3) + +def test_distinct_numPartitions(self): --- End diff -- can i have a pass? it looks like the python tests could use some attention during the test speed increase effort, but i'd rather wait for a big speedup recommendation before altering these cases. though, if this is important to you, i'll do it
[GitHub] spark pull request: [SPARK-3519] add distinct(n) to PySpark
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2383#discussion_r17542165 --- Diff: python/pyspark/rdd.py --- @@ -353,7 +353,7 @@ def func(iterator): return ifilter(f, iterator) return self.mapPartitions(func, True) -def distinct(self): +def distinct(self, numPartitions=None): --- End diff -- i can do that. fyi, i ran into some problems initially...
```
from pyspark import sql
ssc = sql.SQLContext(sc)
rdd = sc.parallelize(['{"a": 1}', '{"b": 2}', '{"c": 3}']*10)
srdd = ssc.jsonRDD(rdd)
srdd.distinct(10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/sql.py", line 1703, in distinct
    rdd = self._jschema_rdd.distinct(numPartitions)
  File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 304, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o32.distinct. Trace:
py4j.Py4JException: Method distinct([class java.lang.Integer]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
    at py4j.Gateway.invoke(Gateway.java:252)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
```
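For readers following the `distinct(numPartitions)` change under review, PySpark's `RDD.distinct` is built from a map → reduceByKey → map pipeline, and `numPartitions` simply flows into the shuffle. A minimal local mimic of that pipeline (a hedged sketch in plain Python, with a dict standing in for the shuffle that `numPartitions` would control):

```python
def distinct_local(items):
    # Mimic of the distinct pipeline: map each element to (x, None),
    # reduce by key keeping one value per key, then take the keys.
    reduced = {}
    for x in items:
        reduced[x] = None  # reduceByKey(lambda a, b: a) collapses duplicates
    return list(reduced)
```

For example, `sorted(distinct_local([1, 2, 3] * 10))` yields `[1, 2, 3]`, matching the `rdd.count() == 3` assertion in the test above.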
[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2283#issuecomment-55592767 +1 looks good. Thanks @sarutak
[GitHub] spark pull request: [SPARK-3410] The priority of shutdownhook for ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2283
[GitHub] spark pull request: [SPARK-3396][MLLIB] Use SquaredL2Updater in Lo...
GitHub user BigCrunsh opened a pull request: https://github.com/apache/spark/pull/2398 [SPARK-3396][MLLIB] Use SquaredL2Updater in LogisticRegressionWithSGD SimpleUpdater ignores the regularizer, which leads to an unregularized LogReg. To enable the common L2 regularizer (and the corresponding regularization parameter) for logistic regression the SquaredL2Updater has to be used in SGD (see, e.g., [SVMWithSGD]) You can merge this pull request into a Git repository by running: $ git pull https://github.com/soundcloud/spark fix-regparam-logreg Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2398.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2398 commit 0820c04bf26be840d0137b730e497ce4305938b1 Author: Christoph Sawade christ...@sawade.me Date: 2014-09-15T14:00:02Z Use SquaredL2Updater in LogisticRegressionWithSGD SimpleUpdater ignores the regularizer, which leads to an unregularized LogReg. To enable the common L2 regularizer (and the corresponding regularization parameter) for logistic regression the SquaredL2Updater has to be used in SGD (see, e.g., [SVMWithSGD])
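The fix above hinges on how the updater folds the regularizer into each SGD step: minimizing loss + (regParam / 2) * ||w||² adds regParam · w to the gradient. A simplified sketch of that update (plain Python; hedged: MLlib's actual SquaredL2Updater also rescales the step size per iteration, omitted here):

```python
def squared_l2_step(weights, gradient, step_size, reg_param):
    # One SGD step with an L2 penalty folded into the gradient:
    #   w <- w - step_size * (gradient + reg_param * w)
    # With reg_param = 0 this degenerates to SimpleUpdater's plain step,
    # which is exactly why LogisticRegressionWithSGD was unregularized.
    return [w - step_size * (g + reg_param * w)
            for w, g in zip(weights, gradient)]
```

A nonzero `reg_param` shrinks each weight toward zero on every step in addition to following the loss gradient, which is the behavior the PR enables.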
[GitHub] spark pull request: [SPARK-3519] add distinct(n) to PySpark
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2383#issuecomment-55594346 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20336/consoleFull) for PR 2383 at commit [`6bc4a2c`](https://github.com/apache/spark/commit/6bc4a2c8a184f2c88a2d2d65bf74bb7ead980aab). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3396][MLLIB] Use SquaredL2Updater in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2231#issuecomment-55594378 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20337/consoleFull) for PR 2231 at commit [`0820c04`](https://github.com/apache/spark/commit/0820c04bf26be840d0137b730e497ce4305938b1). * This patch **does not** merge cleanly!
[GitHub] spark pull request: [SPARK-3396][MLLIB] Use SquaredL2Updater in Lo...
Github user BigCrunsh closed the pull request at: https://github.com/apache/spark/pull/2231
[GitHub] spark pull request: [SPARK-3396][MLLIB] Use SquaredL2Updater in Lo...
Github user BigCrunsh commented on the pull request: https://github.com/apache/spark/pull/2231#issuecomment-55594501 Changed target to master (https://github.com/apache/spark/pull/2398)
[GitHub] spark pull request: [SPARK-3485][SQL] should check parameter type ...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/2355#issuecomment-55594998 @chenghao-intel This is not so complex, since it is not a GenericUDF but a simple UDF with limited types. So we do not need to call those here.
[GitHub] spark pull request: [SPARK-3396][MLLIB] Use SquaredL2Updater in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2398#issuecomment-55595028 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20338/consoleFull) for PR 2398 at commit [`0820c04`](https://github.com/apache/spark/commit/0820c04bf26be840d0137b730e497ce4305938b1). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3485][SQL] should check parameter type ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2355#issuecomment-55595044 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20339/consoleFull) for PR 2355 at commit [`5f25ca5`](https://github.com/apache/spark/commit/5f25ca564b2805f0b50e835ad74863a77d739198). * This patch merges cleanly.
[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]LDA based on Graphx
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2388#issuecomment-55595533 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20335/consoleFull) for PR 2388 at commit [`dc7ef13`](https://github.com/apache/spark/commit/dc7ef13c9b5b58cb7b0e12f586432e3140644b10). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class TopicModeling(@transient val tokens: RDD[(TopicModeling.WordId, TopicModeling.DocId)],`
[GitHub] spark pull request: [SPARK-3531][SQL]select null from table would ...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/2396#issuecomment-55595820 This follows how Hive handles immediate null values in queries.
[GitHub] spark pull request: SPARK-3069 [DOCS] Build instructions in README...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2014#issuecomment-55598514 @andrewor14 @nchammas @pwendell Humble ping on this one; I think it's good to go, and it will probably help head off some build questions going forward.
[GitHub] spark pull request: [SPARK-927] detect numpy at time of use
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2313#issuecomment-55601210 thanks @erikerlandson. @davies @JoshRosen how would you guys like to proceed?
[GitHub] spark pull request: SPARK-3069 [DOCS] Build instructions in README...
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/2014#discussion_r17546110 --- Diff: docs/building-spark.md --- @@ -159,4 +160,13 @@ then ship it over to the cluster. We are investigating the exact cause for this. The assembly jar produced by `mvn package` will, by default, include all of Spark's dependencies, including Hadoop and some of its ecosystem projects. On YARN deployments, this causes multiple versions of these to appear on executor classpaths: the version packaged in the Spark assembly and the version on each node, included with yarn.application.classpath. The `hadoop-provided` profile builds the assembly without including Hadoop-ecosystem projects, like ZooKeeper and Hadoop itself. +# Building with SBT +Maven is the official recommendation for packaging Spark, and is the build of reference. +But SBT is supported for day-to-day development since it can provide much faster iterative +compilation. More advanced developers may wish to use SBT. + +The SBT build is derived from the Maven POM files, and so the same Maven profiles and variables +can be set to control the SBT build. For example: + +sbt -Pyarn -Phadoop-2.3 compile --- End diff -- Do we need to add a bit more color here about how to use `sbt`, to match what used to be in the GitHub README? Or is this sufficient?
[GitHub] spark pull request: SPARK-3069 [DOCS] Build instructions in README...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2014#discussion_r17546414 --- Diff: docs/building-spark.md --- @@ -159,4 +160,13 @@ then ship it over to the cluster. We are investigating the exact cause for this. The assembly jar produced by `mvn package` will, by default, include all of Spark's dependencies, including Hadoop and some of its ecosystem projects. On YARN deployments, this causes multiple versions of these to appear on executor classpaths: the version packaged in the Spark assembly and the version on each node, included with yarn.application.classpath. The `hadoop-provided` profile builds the assembly without including Hadoop-ecosystem projects, like ZooKeeper and Hadoop itself. +# Building with SBT +Maven is the official recommendation for packaging Spark, and is the build of reference. +But SBT is supported for day-to-day development since it can provide much faster iterative +compilation. More advanced developers may wish to use SBT. + +The SBT build is derived from the Maven POM files, and so the same Maven profiles and variables +can be set to control the SBT build. For example: + +sbt -Pyarn -Phadoop-2.3 compile --- End diff -- I think the goal here is just a taste, assuming the advanced developer will understand and figure out the rest if needed. Happy to make further edits though, like, should we still suggest `./sbt/sbt` instead of a local `sbt`?
[GitHub] spark pull request: SPARK-3069 [DOCS] Build instructions in README...
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/2014#discussion_r17547220 --- Diff: docs/building-spark.md --- @@ -159,4 +160,13 @@ then ship it over to the cluster. We are investigating the exact cause for this. The assembly jar produced by `mvn package` will, by default, include all of Spark's dependencies, including Hadoop and some of its ecosystem projects. On YARN deployments, this causes multiple versions of these to appear on executor classpaths: the version packaged in the Spark assembly and the version on each node, included with yarn.application.classpath. The `hadoop-provided` profile builds the assembly without including Hadoop-ecosystem projects, like ZooKeeper and Hadoop itself. +# Building with SBT +Maven is the official recommendation for packaging Spark, and is the build of reference. +But SBT is supported for day-to-day development since it can provide much faster iterative +compilation. More advanced developers may wish to use SBT. + +The SBT build is derived from the Maven POM files, and so the same Maven profiles and variables +can be set to control the SBT build. For example: + +sbt -Pyarn -Phadoop-2.3 compile --- End diff -- Hmm, I don't know enough to make a recommendation; I'll leave that to others. Just wanted to call out the fact that we'd have less info on using `sbt` than before. Maybe that's a good thing.
[GitHub] spark pull request: [SPARK-3519] add distinct(n) to PySpark
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2383#issuecomment-55604353 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20336/consoleFull) for PR 2383 at commit [`6bc4a2c`](https://github.com/apache/spark/commit/6bc4a2c8a184f2c88a2d2d65bf74bb7ead980aab). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-911] allow efficient queries for a rang...
Github user aaronjosephs commented on the pull request: https://github.com/apache/spark/pull/1381#issuecomment-55604431 @JoshRosen this isn't necessarily specified on the ticket, but it's related. Since most of the time something will be range-partitioned because you called sortByKey on it, this could actually be even more efficient (if cached) on smaller data sets if you glommed the partition and did a binary search on the array. I'm not sure whether the glomming overhead would outweigh the benefits of the binary search; I'd like to know if you have any opinions on this.
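The glom-plus-binary-search idea above can be sketched outside Spark in plain Python. This is only an illustration of the lookup step, not the PR's implementation: `range_lookup` and the sample data are hypothetical, and the list stands in for the array that `glom()` would produce from a partition sorted by `sortByKey`.

```python
from bisect import bisect_left, bisect_right

def range_lookup(sorted_partition, lo, hi):
    """Return all (key, value) pairs with lo <= key <= hi.

    sorted_partition simulates the array produced by glom() on one
    partition of a key-sorted RDD; two bisections find the range
    bounds in O(log n) instead of a linear filter over the partition.
    """
    keys = [k for k, _ in sorted_partition]
    start = bisect_left(keys, lo)   # first index with key >= lo
    end = bisect_right(keys, hi)    # one past the last index with key <= hi
    return sorted_partition[start:end]

part = [(1, 'a'), (3, 'b'), (5, 'c'), (7, 'd'), (9, 'e')]
print(range_lookup(part, 3, 7))  # [(3, 'b'), (5, 'c'), (7, 'd')]
```

Whether this wins in practice depends on the trade-off raised above: the one-time cost of materializing each partition as an array versus the per-query saving of binary search over a linear scan.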
[GitHub] spark pull request: [SPARK-3396][MLLIB] Use SquaredL2Updater in Lo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2398#issuecomment-55604710 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20338/consoleFull) for PR 2398 at commit [`0820c04`](https://github.com/apache/spark/commit/0820c04bf26be840d0137b730e497ce4305938b1). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.