[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

2017-06-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17865
  
Also cc @ueshin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18268: [SPARK-21054] [SQL] Reset Command support reset specific...

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18268
  
**[Test build #77918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77918/testReport)** for PR 18268 at commit [`8745848`](https://github.com/apache/spark/commit/874584800aabd07dfd16c8c9632bb070cc7a8210).





[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16422
  
**[Test build #77919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77919/testReport)** for PR 16422 at commit [`0d3c7bf`](https://github.com/apache/spark/commit/0d3c7bf094a3e89f38b3abae0cf530e9c634594a).





[GitHub] spark pull request #18268: [SPARK-21054] [SQL] Reset Command support reset s...

2017-06-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18268#discussion_r121320104
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/SetCommand.scala ---
@@ -149,13 +149,30 @@ object SetCommand {
 /**
  * This command is for resetting SQLConf to the default values. Command 
that runs
  * {{{
+ *   reset key;
  *   reset;
  * }}}
  */
-case object ResetCommand extends RunnableCommand with Logging {
+case class ResetCommand(key: Option[String]) extends RunnableCommand with 
Logging {
+
+  private val runFunc: (SparkSession => Unit) = key match {
+
+case None =>
+  val runFunc = (sparkSession: SparkSession) => {
+sparkSession.sessionState.conf.clear()
+  }
+  runFunc
+
+// (In Hive, "RESET key" clear a specific property.)
+case Some(key) =>
+  val runFunc = (sparkSession: SparkSession) => {
+sparkSession.conf.unset(key)
+  }
+  runFunc
+  }
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
-sparkSession.sessionState.conf.clear()
+runFunc(sparkSession)
--- End diff --

Nit: Just use a few lines to implement the logic here; no need to add the extra function.
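The inlined version the reviewer is suggesting can be sketched outside Spark. This is a self-contained illustration, not Spark's actual implementation: a mutable `Map` stands in for `SQLConf`, and only the class name mirrors the diff above.

```scala
import scala.collection.mutable

// Hypothetical stand-in: the match lives directly in run(), with no
// private runFunc value. A mutable Map plays the role of SQLConf.
case class ResetCommand(key: Option[String]) {
  def run(conf: mutable.Map[String, String]): Unit = key match {
    case Some(k) => conf.remove(k) // RESET key: unset one property
    case None    => conf.clear()   // RESET: restore all defaults
  }
}

val conf = mutable.Map("spark.a" -> "1", "spark.b" -> "2")
ResetCommand(Some("spark.a")).run(conf)
println(conf.keys.toList.sorted) // List(spark.b)
ResetCommand(None).run(conf)
println(conf.isEmpty) // true
```

The pattern match replaces the two locally defined closures with two plain branches, which is all the command needs.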





[GitHub] spark pull request #18268: [SPARK-21054] [SQL] Reset Command support reset s...

2017-06-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18268#discussion_r121319888
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/SetCommand.scala ---
@@ -149,13 +149,30 @@ object SetCommand {
 /**
  * This command is for resetting SQLConf to the default values. Command 
that runs
  * {{{
+ *   reset key;
  *   reset;
  * }}}
  */
-case object ResetCommand extends RunnableCommand with Logging {
+case class ResetCommand(key: Option[String]) extends RunnableCommand with 
Logging {
+
+  private val runFunc: (SparkSession => Unit) = key match {
+
--- End diff --

Nit: remove this space





[GitHub] spark pull request #18268: [SPARK-21054] [SQL] Reset Command support reset s...

2017-06-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18268#discussion_r121319856
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -86,7 +86,12 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder(conf) {
*/
   override def visitResetConfiguration(
   ctx: ResetConfigurationContext): LogicalPlan = withOrigin(ctx) {
-ResetCommand
+val raw = remainder(ctx.RESET.getSymbol)
+if (raw.nonEmpty) {
+  ResetCommand(Some(raw.trim))
+} else {
+  ResetCommand(None)
+}
--- End diff --

You can use `map` to shorten it to a single line.
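One hedged reading of that suggestion, with a plain string standing in for the parser output (`remainder` and the surrounding ANTLR context are Spark-specific and omitted here):

```scala
// Collapse the if/else into one expression: keep the raw text only when
// it is non-empty, then trim it. Helper name is illustrative.
def parseResetKey(raw: String): Option[String] =
  Some(raw).filter(_.nonEmpty).map(_.trim)

println(parseResetKey("spark.sql.shuffle.partitions ")) // Some(spark.sql.shuffle.partitions)
println(parseResetKey("")) // None
```

The call site then becomes a single line, e.g. `ResetCommand(Some(raw).filter(_.nonEmpty).map(_.trim))`, with the same semantics as the original branching.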





[GitHub] spark issue #18268: [SPARK-21054] [SQL] Reset Command support reset specific...

2017-06-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18268
  
ok to test





[GitHub] spark pull request #18268: [SPARK-21054] [SQL] Reset Command support reset s...

2017-06-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18268#discussion_r121319194
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/SetCommand.scala ---
@@ -149,13 +149,30 @@ object SetCommand {
 /**
  * This command is for resetting SQLConf to the default values. Command 
that runs
  * {{{
+ *   reset key;
  *   reset;
  * }}}
  */
-case object ResetCommand extends RunnableCommand with Logging {
+case class ResetCommand(key: Option[String]) extends RunnableCommand with 
Logging {
+
+  private val runFunc: (SparkSession => Unit) = key match {
+
+case None =>
+  val runFunc = (sparkSession: SparkSession) => {
+sparkSession.sessionState.conf.clear()
+  }
+  runFunc
+
+// (In Hive, "RESET key" clear a specific property.)
--- End diff --

No need to mention Hive here. Just need to explain the semantics. 





[GitHub] spark issue #18238: [SPARK-21016][core]Improve code fault tolerance for conv...

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18238
  
**[Test build #77917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77917/testReport)** for PR 18238 at commit [`04307c6`](https://github.com/apache/spark/commit/04307c611811d2cc207793488f004d74eb2c6b25).





[GitHub] spark issue #18238: [SPARK-21016][core]Improve code fault tolerance for conv...

2017-06-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18238
  
ok to test





[GitHub] spark issue #18238: [SPARK-21016][core]Improve code fault tolerance for conv...

2017-06-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18238
  
Although it is not a common user error, it does not hurt to add an extra 
`trim`. 





[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18260
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77912/
Test FAILed.





[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18260
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18260
  
**[Test build #77912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77912/testReport)** for PR 18260 at commit [`e6e60e0`](https://github.com/apache/spark/commit/e6e60e0905dbc8693d840bf2c5e901488a97).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18258: [SPARK-21051][SQL] Add hash map metrics to aggreg...

2017-06-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18258#discussion_r121317516
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -313,7 +316,7 @@ case class HashAggregateExec(
   TaskContext.get().taskMemoryManager(),
   1024 * 16, // initial capacity
   TaskContext.get().taskMemoryManager().pageSizeBytes,
-  false // disable tracking of performance metrics
+  true // tracking of performance metrics
--- End diff --

Yeah, based on the benchmark, it seems the performance degradation is not an issue. We can remove this parameter completely.





[GitHub] spark issue #18272: [DOCS] Fix error: ambiguous reference to overloaded defi...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18272
  
Can one of the admins verify this patch?





[GitHub] spark pull request #18272: [DOCS] Fix error: ambiguous reference to overload...

2017-06-11 Thread ZiyueHuang
GitHub user ZiyueHuang opened a pull request:

https://github.com/apache/spark/pull/18272

[DOCS] Fix error: ambiguous reference to overloaded definition

## What changes were proposed in this pull request?

`df.groupBy.count()` should be `df.groupBy().count()`; otherwise there is an error:

ambiguous reference to overloaded definition, both method groupBy in class Dataset of type (col1: String, cols: String*) and method groupBy in class Dataset of type (cols: org.apache.spark.sql.Column*)

## How was this patch tested?

```scala
val df = spark.readStream.schema(...).json(...)
val dfCounts = df.groupBy().count()
```
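The ambiguity is easy to reproduce outside Spark with two overloads shaped like `Dataset.groupBy`. This toy sketch (hypothetical classes, not Spark's API) only compiles with the explicit empty argument list:

```scala
// Toy overloads modeled on Dataset.groupBy(col1: String, cols: String*)
// and Dataset.groupBy(cols: Column*); all names here are illustrative.
class Grouped { def count(): Long = 42L }
class Col
class Toy {
  def groupBy(col1: String, cols: String*): Grouped = new Grouped
  def groupBy(cols: Col*): Grouped = new Grouped
}

val t = new Toy
// t.groupBy.count()       // does not compile: ambiguous reference to overloaded definition
println(t.groupBy().count()) // 42
```

With `()`, only the pure-varargs overload accepts zero arguments, so resolution is unambiguous, which is exactly why the docs need the parentheses.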


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ZiyueHuang/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18272.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18272


commit 67dd3c75bce72bd320124b3b1a7585af40d18fb2
Author: Ziyue Huang 
Date:   2017-06-12T06:10:09Z

Fix error: ambiguous reference to overloaded definition, both method 
groupBy in class Dataset of type (col1: String, cols: String*) and (cols: 
org.apache.spark.sql.Column*)







[GitHub] spark pull request #18258: [SPARK-21051][SQL] Add hash map metrics to aggreg...

2017-06-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18258#discussion_r121317053
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -313,7 +316,7 @@ case class HashAggregateExec(
   TaskContext.get().taskMemoryManager(),
   1024 * 16, // initial capacity
   TaskContext.get().taskMemoryManager().pageSizeBytes,
-  false // disable tracking of performance metrics
+  true // tracking of performance metrics
--- End diff --

Always turn it on?

If we decide to always turn it on, why do we still keep this param?





[GitHub] spark issue #18265: [SPARK-21050][ML] Word2vec persistence overflow bug fix

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18265
  
**[Test build #77916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77916/testReport)** for PR 18265 at commit [`6bcf66f`](https://github.com/apache/spark/commit/6bcf66f58c6333c1d0d965596ad59b49d8e9f28a).





[GitHub] spark issue #18265: [SPARK-21050][ML] Word2vec persistence overflow bug fix

2017-06-11 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/18265
  
Yep, someone hit the bug!





[GitHub] spark pull request #18269: [SPARK-21056][SQL] Use at most one spark job to l...

2017-06-11 Thread bbossy
Github user bbossy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18269#discussion_r121316195
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala ---
@@ -248,60 +245,93 @@ object InMemoryFileIndex extends Logging {
* @return all children of path that match the specified filter.
*/
   private def listLeafFiles(
-  path: Path,
+  paths: Seq[Path],
   hadoopConf: Configuration,
   filter: PathFilter,
-  sessionOpt: Option[SparkSession]): Seq[FileStatus] = {
-logTrace(s"Listing $path")
-val fs = path.getFileSystem(hadoopConf)
+  sessionOpt: Option[SparkSession]): Seq[(Path, Seq[FileStatus])] = {
+logTrace(s"Listing ${paths.mkString(", ")}")
 
 // [SPARK-17599] Prevent InMemoryFileIndex from failing if path 
doesn't exist
 // Note that statuses only include FileStatus for the files and dirs 
directly under path,
 // and does not include anything else recursively.
-val statuses = try fs.listStatus(path) catch {
-  case _: FileNotFoundException =>
-logWarning(s"The directory $path was not found. Was it deleted 
very recently?")
-Array.empty[FileStatus]
+val statuses = paths.flatMap { path =>
+  try {
+val fs = path.getFileSystem(hadoopConf)
--- End diff --

Thanks! Fixed.





[GitHub] spark pull request #18258: [SPARK-21051][SQL] Add hash map metrics to aggreg...

2017-06-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18258#discussion_r121316076
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
@@ -103,13 +103,43 @@ object SQLMetrics {
   }
 
   /**
+   * Create a metric to report the average information (including min, 
med, max) like
+   * avg hashmap probe. Because `SQLMetric` stores long values, we take 
the ceil of the average
+   * values before storing them. This metric is used to record an average 
value computed in the
+   * end of a task. It should be set once. The initial values (zeros) of 
this metrics will be
+   * excluded after.
+   */
+  def createAverageMetric(sc: SparkContext, name: String): SQLMetric = {
+// The final result of this metric in physical operator UI may looks 
like:
+// probe avg (min, med, max):
+// (1, 6, 2)
--- End diff --

oh. right. :)





[GitHub] spark pull request #18258: [SPARK-21051][SQL] Add hash map metrics to aggreg...

2017-06-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18258#discussion_r121315825
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
@@ -103,13 +103,43 @@ object SQLMetrics {
   }
 
   /**
+   * Create a metric to report the average information (including min, 
med, max) like
+   * avg hashmap probe. Because `SQLMetric` stores long values, we take 
the ceil of the average
+   * values before storing them. This metric is used to record an average 
value computed in the
+   * end of a task. It should be set once. The initial values (zeros) of 
this metrics will be
+   * excluded after.
+   */
+  def createAverageMetric(sc: SparkContext, name: String): SQLMetric = {
+// The final result of this metric in physical operator UI may looks 
like:
+// probe avg (min, med, max):
+// (1, 6, 2)
--- End diff --

Is `med` the median? Why 6?





[GitHub] spark pull request #18265: [SPARK-21050][ML] Word2vec persistence overflow b...

2017-06-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/18265#discussion_r121315755
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala ---
@@ -188,6 +188,15 @@ class Word2VecSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defaul
 assert(math.abs(similarity(5) - similarityLarger(5) / similarity(5)) > 
1E-5)
   }
 
+  test("Word2Vec read/write numPartitions calculation") {
--- End diff --

Good point; I'll do that.





[GitHub] spark pull request #18265: [SPARK-21050][ML] Word2vec persistence overflow b...

2017-06-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/18265#discussion_r121315677
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/Word2VecSuite.scala ---
@@ -188,6 +188,15 @@ class Word2VecSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defaul
 assert(math.abs(similarity(5) - similarityLarger(5) / similarity(5)) > 
1E-5)
   }
 
+  test("Word2Vec read/write numPartitions calculation") {
+val tinyModelNumPartitions = 
Word2VecModel.Word2VecModelWriter.calculateNumberOfPartitions(
+  sc, numWords = 10, vectorSize = 5)
+assert(tinyModelNumPartitions === 1)
+val mediumModelNumPartitions = 
Word2VecModel.Word2VecModelWriter.calculateNumberOfPartitions(
+  sc, numWords = 100, vectorSize = 5000)
+assert(mediumModelNumPartitions > 1)
--- End diff --

The "medium" one did cause an overflow.





[GitHub] spark pull request #18265: [SPARK-21050][ML] Word2vec persistence overflow b...

2017-06-11 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/18265#discussion_r121315648
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala ---
@@ -355,9 +364,12 @@ object Word2VecModel extends MLReadable[Word2VecModel] 
{
   // Calculate the approximate size of the model.
   // Assuming an average word size of 15 bytes, the formula is:
   // (floatSize * vectorSize + 15) * numWords
-  val numWords = instance.wordVectors.wordIndex.size
-  val approximateSizeInBytes = (floatSize * instance.getVectorSize + 
averageWordSize) * numWords
-  ((approximateSizeInBytes / bufferSizeInBytes) + 1).toInt
+  val approximateSizeInBytes = (floatSize * vectorSize + 
averageWordSize) * numWords
+  val numPartitions = (approximateSizeInBytes / bufferSizeInBytes) + 1
+  require(numPartitions < 10e8, s"Word2VecModel calculated that it 
needs $numPartitions " +
--- End diff --

I'm pretty sure it is necessary.  If we cap it at Int.MAX and the user hits 
that cap, then it means that we'll fail when trying to write the partitions.





[GitHub] spark issue #18257: [SPARK-21041][SQL] SparkSession.range should be consiste...

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18257
  
**[Test build #77915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77915/testReport)** for PR 18257 at commit [`46f60f0`](https://github.com/apache/spark/commit/46f60f0fd9981b52de5b4c719ce51d0de9a97805).





[GitHub] spark issue #18236: [SPARK-21015] Check field name is not null and empty in ...

2017-06-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18236
  
How about just updating the comment of this function to explain the 
behavior we have now? 





[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...

2017-06-11 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17758
  
ok, I'll also check again. Thanks!





[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18266
  
**[Test build #77908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77908/testReport)** for PR 18266 at commit [`0444c4d`](https://github.com/apache/spark/commit/0444c4d25d5943408a3ea11f84395dd38246e2f8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18266
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77908/
Test FAILed.





[GitHub] spark issue #17645: [SPARK-20348] [ML] Support squared hinge loss (L2 loss) ...

2017-06-11 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/17645
  
OK. I'll close it for now and try to merge it with 
https://github.com/apache/spark/pull/17862.
Thanks for the comment from @yanboliang 





[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18266
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #17645: [SPARK-20348] [ML] Support squared hinge loss (L2...

2017-06-11 Thread hhbyyh
Github user hhbyyh closed the pull request at:

https://github.com/apache/spark/pull/17645





[GitHub] spark pull request #18257: [SPARK-21041][SQL] SparkSession.range should be c...

2017-06-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/18257#discussion_r121313742
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala ---
@@ -191,6 +191,17 @@ class DataFrameRangeSuite extends QueryTest with SharedSQLContext with Eventually
       checkAnswer(sql("SELECT * FROM range(3)"), Row(0) :: Row(1) :: Row(2) :: Nil)
     }
   }
+
+  test("SPARK-21041 SparkSession.range()'s behavior is inconsistent with SparkContext.range()") {
+    val start = java.lang.Long.MAX_VALUE - 3
+    val end = java.lang.Long.MIN_VALUE + 2
+    Seq("false", "true").foreach { value =>
+      withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> value) {
+        assert(spark.sparkContext.range(start, end, 1).collect.length == 0)
+        assert(spark.range(start, end, 1).collect.length == 0)
--- End diff --

Sure.





[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...

2017-06-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18266#discussion_r121313654
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Metadata.scala ---
@@ -273,6 +273,9 @@ class MetadataBuilder {
   /** Puts a [[Metadata]] array. */
   def putMetadataArray(key: String, value: Array[Metadata]): this.type = put(key, value)
 
+  /** Puts a name. */
+  def putName(name: String): this.type = put("name", name)
--- End diff --

This interface change is not desired. See the PR 
https://github.com/apache/spark/pull/16209

You can further enhance our parser by supporting the data types that are 
not natively supported by Spark.





[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17251
  
**[Test build #77914 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77914/testReport)**
 for PR 17251 at commit 
[`08596f5`](https://github.com/apache/spark/commit/08596f54e62b26c4411207912121d6a14bdb0133).





[GitHub] spark issue #17251: [SPARK-19910][SQL] `stack` should not reject NULL values...

2017-06-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17251
  
retest this please





[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...

2017-06-11 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/17758
  
Hi @maropu , I just did a quick search and found many other places that also deal with duplicate columns, e.g. `InsertIntoHadoopFsRelationCommand`, `PartitioningUtils.normalizePartitionSpec`, `SessionCatalog.alterTableSchema`. Can you do a more comprehensive search to see whether any other places were missed? Let's make all of them as consistent as we can.





[GitHub] spark pull request #17583: [SPARK-20271]Add FuncTransformer to simplify cust...

2017-06-11 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/17583#discussion_r121312651
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/FuncTransformer.scala ---
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml
+
+import java.io.{ByteArrayInputStream, ByteArrayOutputStream, 
ObjectInputStream, ObjectOutputStream}
+
+import scala.reflect.runtime.universe.{typeOf, TypeTag}
+
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.annotation.{DeveloperApi, Since}
+import org.apache.spark.ml.FuncTransformer.FuncTransformerWriter
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
+import org.apache.spark.sql.types.DataType
+
+/**
+ * :: DeveloperApi ::
+ * A wrapper to allow easily creation of simple data manipulation for 
DataFrame.
+ * Note that FuncTransformer supports serialization via scala 
ObjectOutputStream and may not
+ * guarantee save/load compatibility between different scala version.
+ */
+@DeveloperApi
+@Since("2.3.0")
+class FuncTransformer [IN, OUT: TypeTag] @Since("2.3.0") (
+@Since("2.3.0") override val uid: String,
+@Since("2.3.0") val func: IN => OUT,
+@Since("2.3.0") val outputDataType: DataType
+  ) extends UnaryTransformer[IN, OUT, FuncTransformer[IN, OUT]] with 
DefaultParamsWritable {
+
+  @Since("2.3.0")
+  def this(fx: IN => OUT, outputDataType: DataType) =
+this(Identifiable.randomUID("FuncTransformer"), fx, outputDataType)
+
+  @Since("2.3.0")
+  def this(fx: IN => OUT) =
+this(Identifiable.randomUID("FuncTransformer"), fx,
+  
CatalystSqlParser.parseDataType(typeOf[OUT].typeSymbol.name.decodedName.toString))
--- End diff --

Thanks Nick, updated with the exception message, and we now use the same type inference code as in `createDataFrame`.
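The class doc in the diff above notes that FuncTransformer persists its transform function via ObjectOutputStream, with no compatibility guarantee across Scala versions. A minimal Java sketch of that save/load round trip for a serializable function (class and helper names here are illustrative, not part of the patch):

```java
import java.io.*;
import java.util.function.Function;

public class FuncSerDemo {
    // A function type that is also Serializable, so a lambda targeting it
    // can be written with ObjectOutputStream and read back.
    interface SerFunc<A, B> extends Function<A, B>, Serializable {}

    // Serialize the function object to bytes.
    static byte[] save(SerFunc<?, ?> f) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(f);
        }
        return bos.toByteArray();
    }

    // Deserialize the function object; the defining class must be present,
    // which is why cross-version compatibility is not guaranteed.
    @SuppressWarnings("unchecked")
    static <A, B> SerFunc<A, B> load(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (SerFunc<A, B>) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SerFunc<Integer, Integer> doubler = x -> x * 2;
        SerFunc<Integer, Integer> restored = load(save(doubler));
        System.out.println(restored.apply(21)); // 42
    }
}
```

The Scala original is analogous: the function closure is captured as bytes at save time and reconstructed at load time, so both sides must agree on the serialized form.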





[GitHub] spark pull request #18082: [SPARK-20665][SQL][FOLLOW-UP]Move test case to Ma...

2017-06-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18082





[GitHub] spark pull request #17583: [SPARK-20271]Add FuncTransformer to simplify cust...

2017-06-11 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/17583#discussion_r121312459
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/FuncTransformer.scala ---
@@ -0,0 +1,150 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
+
+import scala.reflect.runtime.universe.TypeTag
+
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.annotation.{DeveloperApi, Since}
+import org.apache.spark.ml.UnaryTransformer
+import org.apache.spark.ml.feature.FuncTransformer.FuncTransformerWriter
+import org.apache.spark.ml.param.ParamMap
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.ScalaReflection
+import org.apache.spark.sql.types.DataType
+
+/**
+ * :: DeveloperApi ::
+ * FuncTransformer allows easy creation of a custom feature transformer for DataFrame, such as
+ * conditional conversion (if...else...), type conversion, array indexing and many string ops.
+ * Note that FuncTransformer supports serialization via Scala ObjectOutputStream and may not
+ * guarantee save/load compatibility between different Scala versions.
+ */
+@DeveloperApi
+@Since("2.3.0")
+class FuncTransformer[IN: TypeTag, OUT: TypeTag] @Since("2.3.0") (
+    @Since("2.3.0") override val uid: String,
+    @Since("2.3.0") val func: IN => OUT,
+    @Since("2.3.0") val outputDataType: DataType
+  ) extends UnaryTransformer[IN, OUT, FuncTransformer[IN, OUT]] with DefaultParamsWritable {
+
+  /**
+   * Create a FuncTransformer with a specific function and output data type.
+   * @param fx function which converts an input object to an output object.
+   * @param outputDataType specific output data type
+   */
+  @Since("2.3.0")
+  def this(fx: IN => OUT, outputDataType: DataType) =
+    this(Identifiable.randomUID("FuncTransformer"), fx, outputDataType)
+
+  /**
+   * Create a FuncTransformer with a specific function and automatically infer the output data
+   * type. If the output data type cannot be automatically inferred, an exception will be thrown.
+   * @param fx function which converts an input object to an output object.
+   */
+  @Since("2.3.0")
+  def this(fx: IN => OUT) = this(Identifiable.randomUID("FuncTransformer"), fx,
+    try {
+      ScalaReflection.schemaFor[OUT].dataType
+    } catch {
+      case _: UnsupportedOperationException => throw new UnsupportedOperationException(
+        s"FuncTransformer outputDataType cannot be automatically inferred, please try" +
+          s" the constructor with specific outputDataType")
+    }
+  )
+
+  setDefault(inputCol -> "input", outputCol -> "output")
+
+  @Since("2.3.0")
+  override def createTransformFunc: IN => OUT = func
+
+  @Since("2.3.0")
+  override def write: MLWriter = new FuncTransformerWriter(
+    this.asInstanceOf[FuncTransformer[Nothing, Nothing]])
+
+  @Since("2.3.0")
+  override def copy(extra: ParamMap): FuncTransformer[IN, OUT] = {
+    copyValues(new FuncTransformer(uid, func, outputDataType), extra)
+  }
+
+  override protected def validateInputType(inputType: DataType): Unit = {
+    try {
+      val funcINType = ScalaReflection.schemaFor[IN].dataType
+      require(inputType.equals(funcINType),
+        s"$uid only accepts input type $funcINType but got $inputType.")
+    } catch {
+      case _: UnsupportedOperationException =>
+        // cannot infer the input data type, log a warning but do not block transform
+        logWarning(s"FuncTransformer input type cannot be automatically inferred," +
+          s" type check omitted for $uid")
+    }
+  }
+}
+
+/**
+ * :: DeveloperApi ::
+ * Companion object for FuncTransformer with save a

[GitHub] spark pull request #18257: [SPARK-21041][SQL] SparkSession.range should be c...

2017-06-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18257#discussion_r121312457
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala ---
@@ -191,6 +191,17 @@ class DataFrameRangeSuite extends QueryTest with SharedSQLContext with Eventually
       checkAnswer(sql("SELECT * FROM range(3)"), Row(0) :: Row(1) :: Row(2) :: Nil)
     }
   }
+
+  test("SPARK-21041 SparkSession.range()'s behavior is inconsistent with SparkContext.range()") {
+    val start = java.lang.Long.MAX_VALUE - 3
+    val end = java.lang.Long.MIN_VALUE + 2
+    Seq("false", "true").foreach { value =>
+      withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> value) {
+        assert(spark.sparkContext.range(start, end, 1).collect.length == 0)
+        assert(spark.range(start, end, 1).collect.length == 0)
--- End diff --

Shall we also test the case `start == end`?
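Both edge cases under discussion (start far above end so that `end - start` overflows, and `start == end`) come down to computing the range's element count without 64-bit overflow. A hedged Java sketch of one way to do it, using arbitrary-precision arithmetic; the class and helper name are hypothetical, not Spark's actual implementation:

```java
import java.math.BigInteger;

public class RangeCount {
    // Compute how many elements range(start, end, step) contains without
    // overflowing long arithmetic. Assumes step != 0.
    static long safeCount(long start, long end, long step) {
        BigInteger span = BigInteger.valueOf(end).subtract(BigInteger.valueOf(start));
        BigInteger s = BigInteger.valueOf(step);
        // Empty when start == end or when step points away from end.
        if (span.signum() * s.signum() <= 0) return 0;
        // ceil(span / step): quotient plus one if there is a remainder.
        BigInteger[] qr = span.divideAndRemainder(s);
        BigInteger n = qr[0];
        if (qr[1].signum() != 0) n = n.add(BigInteger.ONE);
        return n.longValueExact();
    }

    public static void main(String[] args) {
        // The test case above: start near Long.MAX_VALUE, end near Long.MIN_VALUE.
        // Naive (end - start) overflows, but the range is empty with step 1.
        System.out.println(safeCount(Long.MAX_VALUE - 3, Long.MIN_VALUE + 2, 1)); // 0
        System.out.println(safeCount(5, 5, 1));  // 0: start == end
        System.out.println(safeCount(0, 10, 3)); // 4: elements 0, 3, 6, 9
    }
}
```

The `signum` check handles both empty cases at once, which is why testing `start == end` alongside the overflow case is worthwhile.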





[GitHub] spark issue #18230: [SPARK-19688] [STREAMING] Not to read `spark.yarn.creden...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18230
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18230: [SPARK-19688] [STREAMING] Not to read `spark.yarn.creden...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18230
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77910/
Test PASSed.





[GitHub] spark issue #18082: [SPARK-20665][SQL][FOLLOW-UP]Move test case to MathExpre...

2017-06-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18082
  
Thanks! Merging to master.





[GitHub] spark issue #18230: [SPARK-19688] [STREAMING] Not to read `spark.yarn.creden...

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18230
  
**[Test build #77910 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77910/testReport)**
 for PR 18230 at commit 
[`2c50858`](https://github.com/apache/spark/commit/2c50858be9fd422d32d5648b651a4b26ba3f8728).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-11 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18260
  
LGTM except for two comments.





[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18270
  
**[Test build #77913 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77913/testReport)**
 for PR 18270 at commit 
[`f532d9f`](https://github.com/apache/spark/commit/f532d9ff2b6bd8722ce215ed1c372bd991193a0f).





[GitHub] spark pull request #18260: [SPARK-21046][SQL] simplify the array offset and ...

2017-06-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18260#discussion_r121311872
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
@@ -518,19 +519,13 @@ private void throwUnsupportedException(int requiredCapacity, Throwable cause) {
   public abstract double getDouble(int rowId);
 
   /**
-   * Puts a byte array that already exists in this column.
-   */
-  public abstract void putArray(int rowId, int offset, int length);
-
-  /**
-   * Returns the length of the array at rowid.
+   * After writing array elements to the child column vector, call this method to set the offset
+   * and size of the written array.
    */
-  public abstract int getArrayLength(int rowId);
-
-  /**
-   * Returns the offset of the array at rowid.
-   */
-  public abstract int getArrayOffset(int rowId);
+  public void putArrayOffsetAndSize(int rowId, int offset, int size) {
+    long offsetAndSize = (offset << 32) | size;
--- End diff --

`offset` should be converted to `long` before shifting?
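This pitfall is real in Java: the shift distance for an `int` operand is taken modulo 32, so `offset << 32` leaves `offset` unchanged and the packed offset is silently lost. A small demo of the bug and the fix (class and method names here are illustrative only, not from the patch):

```java
public class PackDemo {
    // Buggy packing: `offset` is an int, so `offset << 32` shifts by
    // 32 % 32 == 0 bits and the offset never reaches the high word.
    static long packBuggy(int offset, int size) {
        return (offset << 32) | size;
    }

    // Fixed packing: widen to long before shifting, and mask the size so a
    // negative int cannot sign-extend into the high word.
    static long packFixed(int offset, int size) {
        return (((long) offset) << 32) | (size & 0xFFFFFFFFL);
    }

    static int offsetOf(long packed) { return (int) (packed >>> 32); }
    static int sizeOf(long packed)   { return (int) packed; }

    public static void main(String[] args) {
        System.out.println(offsetOf(packBuggy(7, 3))); // 0 -> offset lost
        System.out.println(offsetOf(packFixed(7, 3))); // 7
        System.out.println(sizeOf(packFixed(7, 3)));   // 3
    }
}
```

The `& 0xFFFFFFFFL` mask matters for the same reason as the widening cast: `|` promotes its int operand with sign extension, which would corrupt the high 32 bits whenever `size` had its top bit set.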





[GitHub] spark issue #18257: [SPARK-21041][SQL] SparkSession.range should be consiste...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18257
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18257: [SPARK-21041][SQL] SparkSession.range should be consiste...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18257
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77906/
Test PASSed.





[GitHub] spark issue #18257: [SPARK-21041][SQL] SparkSession.range should be consiste...

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18257
  
**[Test build #77906 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77906/testReport)**
 for PR 18257 at commit 
[`89dd7ad`](https://github.com/apache/spark/commit/89dd7ada850c1fb02fba32bc955f2de3a7ae3679).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18260: [SPARK-21046][SQL] simplify the array offset and ...

2017-06-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18260#discussion_r121311563
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java ---
@@ -43,14 +43,12 @@
   private byte[] byteData;
   private short[] shortData;
   private int[] intData;
+  // This is not only used to store data for int column vector, but also can store offsets and
--- End diff --

int column vector -> long column vector.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18271: [MINOR][DOCS] Improve Running R Tests docs

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18271
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18271: [MINOR][DOCS] Improve Running R Tests docs

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18271
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77911/
Test PASSed.





[GitHub] spark issue #18271: [MINOR][DOCS] Improve Running R Tests docs

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18271
  
**[Test build #77911 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77911/testReport)**
 for PR 18271 at commit 
[`d715ae8`](https://github.com/apache/spark/commit/d715ae89fba24bb56a2d2ca7fd0e0c1d438851af).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-11 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18260
  
@viirya @kiszk Good catch! Fixed by using a long to store the offset and size.





[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18260
  
**[Test build #77912 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77912/testReport)**
 for PR 18260 at commit 
[`e6e60e0`](https://github.com/apache/spark/commit/e6e60e0905dbc8693d840bf2c5e901488a97).





[GitHub] spark issue #18271: [MINOR][DOCS] Improve Running R Tests docs

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18271
  
**[Test build #77911 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77911/testReport)**
 for PR 18271 at commit 
[`d715ae8`](https://github.com/apache/spark/commit/d715ae89fba24bb56a2d2ca7fd0e0c1d438851af).





[GitHub] spark issue #18128: [SPARK-20906][SparkR]:Constrained Logistic Regression fo...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18128
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18128: [SPARK-20906][SparkR]:Constrained Logistic Regression fo...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18128
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77909/
Test PASSed.





[GitHub] spark issue #18128: [SPARK-20906][SparkR]:Constrained Logistic Regression fo...

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18128
  
**[Test build #77909 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77909/testReport)**
 for PR 18128 at commit 
[`c3190b5`](https://github.com/apache/spark/commit/c3190b5b4701afeceec17bbaa7c4ef6f0239b2c8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18260
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77905/
Test PASSed.





[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18260
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18260
  
**[Test build #77905 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77905/testReport)**
 for PR 18260 at commit 
[`1dae660`](https://github.com/apache/spark/commit/1dae6604c0613a0b9e2a2a0dbc53a709cc232d09).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18260
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18260
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77900/
Test FAILed.





[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18260
  
**[Test build #77900 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77900/testReport)**
 for PR 18260 at commit 
[`a61ba71`](https://github.com/apache/spark/commit/a61ba71ec6bf8245fcce423958d83cbc86b27adc).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18082: [SPARK-20665][SQL][FOLLOW-UP]Move test case to MathExpre...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18082
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18082: [SPARK-20665][SQL][FOLLOW-UP]Move test case to MathExpre...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18082
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77904/
Test PASSed.





[GitHub] spark issue #18082: [SPARK-20665][SQL][FOLLOW-UP]Move test case to MathExpre...

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18082
  
**[Test build #77904 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77904/testReport)**
 for PR 18082 at commit 
[`4e20839`](https://github.com/apache/spark/commit/4e20839de8b67645252338006dc411bbc0c31173).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18260: [SPARK-21046][SQL] simplify the array offset and ...

2017-06-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18260#discussion_r121307879
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
 ---
@@ -42,15 +42,13 @@
   // Array for each type. Only 1 is populated for any type.
   private byte[] byteData;
   private short[] shortData;
+  // This is not only used to store data for int column vector, but also 
can store offsets and
+  // lengths for array column vector.
   private int[] intData;
--- End diff --

@kiszk Do you mean we store a pair of offset/length together as an element
in `longData`?





[GitHub] spark issue #18230: [SPARK-19688] [STREAMING] Not to read `spark.yarn.creden...

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18230
  
**[Test build #77910 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77910/testReport)**
 for PR 18230 at commit 
[`2c50858`](https://github.com/apache/spark/commit/2c50858be9fd422d32d5648b651a4b26ba3f8728).





[GitHub] spark pull request #18260: [SPARK-21046][SQL] simplify the array offset and ...

2017-06-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18260#discussion_r121307570
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
 ---
@@ -366,55 +364,22 @@ public double getDouble(int rowId) {
 }
   }
 
-  //
-  // APIs dealing with Arrays
-  //
-
-  @Override
-  public int getArrayLength(int rowId) {
-return arrayLengths[rowId];
-  }
-  @Override
-  public int getArrayOffset(int rowId) {
-return arrayOffsets[rowId];
-  }
-
-  @Override
-  public void putArray(int rowId, int offset, int length) {
-arrayOffsets[rowId] = offset;
-arrayLengths[rowId] = length;
-  }
-
   @Override
   public void loadBytes(ColumnVector.Array array) {
 array.byteArray = byteData;
 array.byteArrayOffset = array.offset;
   }
 
-  //
-  // APIs dealing with Byte Arrays
-  //
-
-  @Override
-  public int putByteArray(int rowId, byte[] value, int offset, int length) 
{
-int result = arrayData().appendBytes(length, value, offset);
-arrayOffsets[rowId] = result;
-arrayLengths[rowId] = length;
-return result;
-  }
-
   // Spilt this function out since it is the slow path.
   @Override
   protected void reserveInternal(int newCapacity) {
 if (this.resultArray != null || 
DecimalType.isByteArrayDecimalType(type)) {
-  int[] newLengths = new int[newCapacity];
-  int[] newOffsets = new int[newCapacity];
-  if (this.arrayLengths != null) {
-System.arraycopy(this.arrayLengths, 0, newLengths, 0, capacity);
-System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, capacity);
+  // need 2 ints as offset and length for each array.
+  if (intData == null || intData.length < newCapacity * 2) {
+int[] newData = new int[newCapacity * 2];
--- End diff --

`newCapacity` here can be `MAX_CAPACITY` at most. When `newCapacity` is
more than `MAX_CAPACITY / 2`, it seems this allocation would overflow `int` and cause a problem?
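The concern can be illustrated with a small standalone sketch (the capacity value here is hypothetical, standing in for a `newCapacity` close to `ColumnVector`'s limit; it is not taken from the Spark source):

```java
public class CapacityOverflowSketch {
    public static void main(String[] args) {
        // Hypothetical capacity just above half of Integer.MAX_VALUE.
        int newCapacity = Integer.MAX_VALUE / 2 + 1;

        // The multiplication is evaluated in 32-bit int arithmetic, so it
        // wraps around to a negative number before any array is allocated.
        int doubled = newCapacity * 2;
        System.out.println(doubled); // -2147483648 (wrapped)

        // `new int[doubled]` would throw NegativeArraySizeException here.
        // Widening one operand to long shows the intended value instead:
        long intended = (long) newCapacity * 2;
        System.out.println(intended); // 2147483648
    }
}
```

This is why a `newCapacity * 2`-sized allocation only stays safe while `newCapacity` is at most `Integer.MAX_VALUE / 2`.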





[GitHub] spark issue #18230: [SPARK-19688] [STREAMING] Not to read `spark.yarn.creden...

2017-06-11 Thread saturday-shi
Github user saturday-shi commented on the issue:

https://github.com/apache/spark/pull/18230
  
No, I don't mean to insist on my opinion. I'm just curious to know the
reason for the change (as it looks like another point fix).





[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18260
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18260
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77903/
Test PASSed.





[GitHub] spark pull request #18260: [SPARK-21046][SQL] simplify the array offset and ...

2017-06-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18260#discussion_r121306993
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
 ---
@@ -42,15 +42,13 @@
   // Array for each type. Only 1 is populated for any type.
   private byte[] byteData;
   private short[] shortData;
+  // This is not only used to store data for int column vector, but also 
can store offsets and
+  // lengths for array column vector.
   private int[] intData;
--- End diff --

Oh. I see. We only check the limit of `MAX_CAPACITY` before actually going 
into `reserveInternal`.





[GitHub] spark pull request #18260: [SPARK-21046][SQL] simplify the array offset and ...

2017-06-11 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18260#discussion_r121306979
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
 ---
@@ -42,15 +42,13 @@
   // Array for each type. Only 1 is populated for any type.
   private byte[] byteData;
   private short[] shortData;
+  // This is not only used to store data for int column vector, but also 
can store offsets and
+  // lengths for array column vector.
   private int[] intData;
--- End diff --

Good catch. Is it possible to use `longData`, storing a pair of 32-bit
offset and length in each element, to keep the `MAX_CAPACITY` array length?
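A minimal sketch of the packing being suggested (the names `pack`, `offsetOf`, and `lengthOf` are hypothetical helpers for illustration, not Spark API): each `longData` element holds the offset in the high 32 bits and the length in the low 32 bits, so one slot per array row preserves the full `MAX_CAPACITY` row count.

```java
public class OffsetLengthPacking {
    // Pack a non-negative 32-bit offset and length into one long slot:
    // offset in the high 32 bits, length in the low 32 bits.
    static long pack(int offset, int length) {
        return ((long) offset << 32) | (length & 0xFFFFFFFFL);
    }

    static int offsetOf(long packed) {
        return (int) (packed >>> 32); // high 32 bits
    }

    static int lengthOf(long packed) {
        return (int) packed;          // low 32 bits
    }

    public static void main(String[] args) {
        long slot = pack(123_456_789, 42);
        System.out.println(offsetOf(slot)); // 123456789
        System.out.println(lengthOf(slot)); // 42
    }
}
```

The `& 0xFFFFFFFFL` mask matters: it zero-extends the length before the bitwise OR, so a length with its sign bit set cannot corrupt the offset half.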





[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18260
  
**[Test build #77903 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77903/testReport)**
 for PR 18260 at commit 
[`368c346`](https://github.com/apache/spark/commit/368c3462667bdb6822be01ecc95dfcdb04ce747a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18260: [SPARK-21046][SQL] simplify the array offset and ...

2017-06-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18260#discussion_r121306584
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
 ---
@@ -42,15 +42,13 @@
   // Array for each type. Only 1 is populated for any type.
   private byte[] byteData;
   private short[] shortData;
+  // This is not only used to store data for int column vector, but also 
can store offsets and
+  // lengths for array column vector.
   private int[] intData;
--- End diff --

One question I have is, the capacity of `ColumnVector` is bounded by
`MAX_CAPACITY`. Previously we stored offset and length individually, so we could
have `MAX_CAPACITY` arrays at most. Now that we store offset and length together in
`intData`, which is itself bounded by `MAX_CAPACITY`, doesn't that mean we can only
have `MAX_CAPACITY / 2` arrays at most?





[GitHub] spark issue #18128: [SPARK-20906][SparkR]:Constrained Logistic Regression fo...

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18128
  
**[Test build #77909 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77909/testReport)**
 for PR 18128 at commit 
[`c3190b5`](https://github.com/apache/spark/commit/c3190b5b4701afeceec17bbaa7c4ef6f0239b2c8).





[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18266
  
**[Test build #77908 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77908/testReport)**
 for PR 18266 at commit 
[`0444c4d`](https://github.com/apache/spark/commit/0444c4d25d5943408a3ea11f84395dd38246e2f8).





[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-11 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18260
  
LGTM





[GitHub] spark pull request #18271: [MINOR][DOCS] Improve Running R Tests docs

2017-06-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18271#discussion_r121305976
  
--- Diff: docs/building-spark.md ---
@@ -219,7 +219,7 @@ The run-tests script also can be limited to a specific 
Python version or a speci
 ## Running R Tests
 
 To run the SparkR tests you will need to install the R package `testthat`
-(run `install.packages(testthat)` from R shell).  You can run just the 
SparkR tests using
+(run `install.packages("testthat")` from R shell).  You can run just the 
SparkR tests using
--- End diff --

Mind updating this content to be consistent with
https://github.com/apache/spark/blob/7e0cd1d9b168286386f15e9b55988733476ae2bb/R/README.md#examples-unit-tests
 if that makes sense to you?

That version also includes specifying the mirror from which to download the
package.





[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema

2017-06-11 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/18266
  
Jenkins, retest this please





[GitHub] spark issue #18271: [MINOR][DOCS] Improve docs to Running R Tests

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18271
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18271: [MINOR][DOCS] Improve docs to Running R Tests

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18271
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77907/
Test PASSed.





[GitHub] spark issue #18271: [MINOR][DOCS] Improve docs to Running R Tests

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18271
  
**[Test build #77907 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77907/testReport)**
 for PR 18271 at commit 
[`5643ab9`](https://github.com/apache/spark/commit/5643ab9e652b3d20c335a6d4e7545da0f115d774).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18260
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77902/
Test PASSed.





[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18260
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18260: [SPARK-21046][SQL] simplify the array offset and length ...

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18260
  
**[Test build #77902 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77902/testReport)** for PR 18260 at commit [`2b78043`](https://github.com/apache/spark/commit/2b780432173cc3e2027117e446d66899a097fe67).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema

2017-06-11 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/18266
  
retest please.





[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18199
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77901/
Test PASSed.





[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...

2017-06-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18199
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18199
  
**[Test build #77901 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77901/testReport)** for PR 18199 at commit [`1d8454d`](https://github.com/apache/spark/commit/1d8454db916853e783792411704f7a49314ee9eb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18271: [MINOR][DOCS] Improve docs to Running R Tests

2017-06-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18271
  
**[Test build #77907 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77907/testReport)** for PR 18271 at commit [`5643ab9`](https://github.com/apache/spark/commit/5643ab9e652b3d20c335a6d4e7545da0f115d774).





[GitHub] spark pull request #18271: [MINOR][DOCS] Improve docs to Running R Tests

2017-06-11 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/18271

[MINOR][DOCS] Improve docs to Running R Tests

## What changes were proposed in this pull request?

`install.packages(testthat)` should be `install.packages("testthat")`, otherwise:
```
> install.packages(testthat)
Error in install.packages(testthat) : object 'testthat' not found
```

## How was this patch tested?
```
> install.packages("testthat")
Installing package into ‘/usr/lib64/R/library’
(as ‘lib’ is unspecified)
trying URL 'https://mirror.lzu.edu.cn/CRAN/src/contrib/testthat_1.0.2.tar.gz'
```


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark building-spark

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18271.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18271


commit 5643ab9e652b3d20c335a6d4e7545da0f115d774
Author: Yuming Wang 
Date:   2017-06-12T02:54:11Z

Improve docs







[GitHub] spark pull request #18260: [SPARK-21046][SQL] simplify the array offset and ...

2017-06-11 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18260#discussion_r121303857
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java ---
@@ -438,10 +401,8 @@ public void loadBytes(ColumnVector.Array array) {
   protected void reserveInternal(int newCapacity) {
     int oldCapacity = (this.data == 0L) ? 0 : capacity;
     if (this.resultArray != null) {
-      this.lengthData =
-          Platform.reallocateMemory(lengthData, oldCapacity * 4, newCapacity * 4);
-      this.offsetData =
-          Platform.reallocateMemory(offsetData, oldCapacity * 4, newCapacity * 4);
+      // need 2 ints as offset and length for each array.
--- End diff --

Oh, this is quite ambiguous. Never mind, `for each array` is good.
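The change under discussion replaces two parallel int buffers (`lengthData` and `offsetData`) with a single buffer holding two ints per array element, so one reallocation grows both. A minimal sketch of that interleaved layout (an illustration only, not Spark's actual `OffHeapColumnVector` implementation):

```java
// Sketch: one buffer stores 2 ints per array element -- the offset at
// index 2*i and the length at 2*i + 1 -- so growing to newCapacity
// reallocates a single region of newCapacity * 2 ints.
public class OffsetAndLength {
    private int[] data;   // [offset0, length0, offset1, length1, ...]

    public OffsetAndLength(int capacity) {
        data = new int[capacity * 2];
    }

    // Grow the single combined buffer, preserving existing pairs.
    public void reserve(int newCapacity) {
        int[] grown = new int[newCapacity * 2];
        System.arraycopy(data, 0, grown, 0, data.length);
        data = grown;
    }

    public void put(int i, int offset, int length) {
        data[2 * i] = offset;
        data[2 * i + 1] = length;
    }

    public int getOffset(int i) { return data[2 * i]; }
    public int getLength(int i) { return data[2 * i + 1]; }

    public static void main(String[] args) {
        OffsetAndLength v = new OffsetAndLength(2);
        v.put(0, 0, 5);
        v.put(1, 5, 7);
        v.reserve(4);  // one reallocation covers both offsets and lengths
        System.out.println(v.getOffset(1) + " " + v.getLength(1));  // prints "5 7"
    }
}
```

With two separate buffers, the same growth path would need two `reallocateMemory` calls; the interleaved layout collapses them into one.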




