[GitHub] spark issue #17242: [SPARK-19902][SQL] Add optimization rule to simplify exp...

2017-03-16 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17242
  
retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17330: [SPARK-19993][SQL] Caching logical plans containi...

2017-03-16 Thread dilipbiswal
GitHub user dilipbiswal opened a pull request:

https://github.com/apache/spark/pull/17330

[SPARK-19993][SQL] Caching logical plans containing subquery expressions 
does not work.

## What changes were proposed in this pull request?
The sameResult() method does not work when the logical plan contains 
subquery expressions.

Before the fix
```SQL
scala> val ds = spark.sql("select * from s1 where s1.c1 in (select s2.c1 
from s2 where s1.c1 = s2.c1)")
ds: org.apache.spark.sql.DataFrame = [c1: int]

scala> ds.cache
res13: ds.type = [c1: int]

scala> spark.sql("select * from s1 where s1.c1 in (select s2.c1 from s2 
where s1.c1 = s2.c1)").explain(true)
== Analyzed Logical Plan ==
c1: int
Project [c1#86]
+- Filter c1#86 IN (list#78 [c1#86])
   :  +- Project [c1#87]
   : +- Filter (outer(c1#86) = c1#87)
   :+- SubqueryAlias s2
   :   +- Relation[c1#87] parquet
   +- SubqueryAlias s1
  +- Relation[c1#86] parquet

== Optimized Logical Plan ==
Join LeftSemi, ((c1#86 = c1#87) && (c1#86 = c1#87))
:- Relation[c1#86] parquet
+- Relation[c1#87] parquet
```
Plan after fix
```SQL
== Analyzed Logical Plan ==
c1: int
Project [c1#22]
+- Filter c1#22 IN (list#14 [c1#22])
   :  +- Project [c1#23]
   : +- Filter (outer(c1#22) = c1#23)
   :+- SubqueryAlias s2
   :   +- Relation[c1#23] parquet
   +- SubqueryAlias s1
  +- Relation[c1#22] parquet

== Optimized Logical Plan ==
InMemoryRelation [c1#22], true, 1, StorageLevel(disk, memory, 
deserialized, 1 replicas)
   +- *BroadcastHashJoin [c1#1, c1#1], [c1#2, c1#2], LeftSemi, BuildRight
  :- *FileScan parquet default.s1[c1#1] Batched: true, Format: Parquet, 
Location: 
InMemoryFileIndex[file:/Users/dbiswal/mygit/apache/spark/bin/spark-warehouse/s1],
 PartitionFilters: [], PushedFilters: [], ReadSchema: struct
  +- BroadcastExchange 
HashedRelationBroadcastMode(List((shiftleft(cast(input[0, int, true] as 
bigint), 32) | (cast(input[0, int, true] as bigint) & 4294967295
 +- *FileScan parquet default.s2[c1#2] Batched: true, Format: 
Parquet, Location: 
InMemoryFileIndex[file:/Users/dbiswal/mygit/apache/spark/bin/spark-warehouse/s2],
 PartitionFilters: [], PushedFilters: [], ReadSchema: struct
```
## How was this patch tested?
New tests are added to CachedTableSuite.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dilipbiswal/spark subquery_cache_final

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17330.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17330


commit dfa76dafdfaabee7240adbaaacb101e484418013
Author: Dilip Biswal 
Date:   2017-03-03T10:12:56Z

[SPARK-19993] Caching logical plans containing subquery expressions does 
not work




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17242: [SPARK-19902][SQL] Add optimization rule to simplify exp...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17242
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17242: [SPARK-19902][SQL] Add optimization rule to simplify exp...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17242
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74725/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17242: [SPARK-19902][SQL] Add optimization rule to simplify exp...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17242
  
**[Test build #74725 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74725/testReport)**
 for PR 17242 at commit 
[`93b83ef`](https://github.com/apache/spark/commit/93b83ef0b15c453adddc459f57cccb36269e4e08).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17315: [SPARK-19949][SQL] unify bad record handling in CSV and ...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17315
  
**[Test build #74728 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74728/testReport)**
 for PR 17315 at commit 
[`10e70fe`](https://github.com/apache/spark/commit/10e70fefd13d744592807cbae4ba712f07f5debe).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17308: [SPARK-19968][SS] Use a cached instance of `KafkaProduce...

2017-03-16 Thread ScrapCodes
Github user ScrapCodes commented on the issue:

https://github.com/apache/spark/pull/17308
  
Please take a look, @tcondie @zsxwing !


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17329
  
**[Test build #74727 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74727/testReport)**
 for PR 17329 at commit 
[`e0a3b62`](https://github.com/apache/spark/commit/e0a3b6286f8d1171d921fd1f90d88ca825b12b56).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17320: [SPARK-19967][SQL] Add from_json in FunctionRegistry

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17320
  
**[Test build #74726 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74726/testReport)**
 for PR 17320 at commit 
[`ce39a9d`](https://github.com/apache/spark/commit/ce39a9dae6d322d0b800b260b9a4822d9e0e1f1d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17320: [SPARK-19967][SQL] Add from_json in FunctionRegistry

2017-03-16 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17320
  
Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17295#discussion_r106588563
  
--- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala ---
@@ -73,55 +86,219 @@ private[spark] class DiskStore(conf: SparkConf, 
diskManager: DiskBlockManager) e
   }
 
   def putBytes(blockId: BlockId, bytes: ChunkedByteBuffer): Unit = {
-put(blockId) { fileOutputStream =>
-  val channel = fileOutputStream.getChannel
-  Utils.tryWithSafeFinally {
-bytes.writeFully(channel)
-  } {
-channel.close()
-  }
+put(blockId) { channel =>
+  bytes.writeFully(channel)
 }
   }
 
-  def getBytes(blockId: BlockId): ChunkedByteBuffer = {
+  def getBytes(blockId: BlockId): BlockData = {
 val file = diskManager.getFile(blockId.name)
-val channel = new RandomAccessFile(file, "r").getChannel
-Utils.tryWithSafeFinally {
-  // For small files, directly read rather than memory map
-  if (file.length < minMemoryMapBytes) {
-val buf = ByteBuffer.allocate(file.length.toInt)
-channel.position(0)
-while (buf.remaining() != 0) {
-  if (channel.read(buf) == -1) {
-throw new IOException("Reached EOF before filling buffer\n" +
-  
s"offset=0\nfile=${file.getAbsolutePath}\nbuf.remaining=${buf.remaining}")
+val blockSize = getSize(blockId)
+
+securityManager.getIOEncryptionKey() match {
+  case Some(key) =>
+// Encrypted blocks cannot be memory mapped; return a special 
object that does decryption
+// and provides InputStream / FileRegion implementations for 
reading the data.
+new EncryptedBlockData(file, blockSize, conf, key)
+
+  case _ =>
+val channel = new FileInputStream(file).getChannel()
+if (blockSize < minMemoryMapBytes) {
+  // For small files, directly read rather than memory map.
+  Utils.tryWithSafeFinally {
+val buf = ByteBuffer.allocate(blockSize.toInt)
+while (buf.remaining() > 0) {
+  channel.read(buf)
+}
+buf.flip()
+new ByteBufferBlockData(new ChunkedByteBuffer(buf))
+  } {
+channel.close()
+  }
+} else {
+  Utils.tryWithSafeFinally {
+new ByteBufferBlockData(
+  new ChunkedByteBuffer(channel.map(MapMode.READ_ONLY, 0, 
file.length)))
+  } {
+channel.close()
   }
 }
-buf.flip()
-new ChunkedByteBuffer(buf)
-  } else {
-new ChunkedByteBuffer(channel.map(MapMode.READ_ONLY, 0, 
file.length))
-  }
-} {
-  channel.close()
 }
   }
 
   def remove(blockId: BlockId): Boolean = {
 val file = diskManager.getFile(blockId.name)
-if (file.exists()) {
-  val ret = file.delete()
-  if (!ret) {
-logWarning(s"Error deleting ${file.getPath()}")
+val meta = diskManager.getMetadataFile(blockId)
+
+def delete(f: File): Boolean = {
+  if (f.exists()) {
+val ret = f.delete()
+if (!ret) {
+  logWarning(s"Error deleting ${file.getPath()}")
+}
+
+ret
+  } else {
+false
   }
-  ret
-} else {
-  false
 }
+
+delete(file) & delete(meta)
   }
 
   def contains(blockId: BlockId): Boolean = {
 val file = diskManager.getFile(blockId.name)
 file.exists()
   }
+
+  private def openForWrite(file: File): WritableByteChannel = {
+val out = new FileOutputStream(file).getChannel()
+try {
+  securityManager.getIOEncryptionKey().map { key =>
+CryptoStreamUtils.createWritableChannel(out, conf, key)
+  }.getOrElse(out)
+} catch {
+  case e: Exception =>
+out.close()
+throw e
+}
+  }
+
+}
+
+private class EncryptedBlockData(
+file: File,
+blockSize: Long,
+conf: SparkConf,
+key: Array[Byte]) extends BlockData {
+
+  override def toInputStream(): InputStream = 
Channels.newInputStream(open())
+
+  override def toManagedBuffer(): ManagedBuffer = new 
EncryptedManagedBuffer()
+
+  override def toByteBuffer(allocator: Int => ByteBuffer): 
ChunkedByteBuffer = {
+val source = open()
+try {
+  var remaining = blockSize
+  val chunks = new ListBuffer[ByteBuffer]()
+  while (remaining > 0) {
+val chunkSize = math.min(remaining, Int.MaxValue)
+val chunk = allocator(chun

[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16209
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74721/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16209
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16209
  
**[Test build #74721 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74721/testReport)**
 for PR 16209 at commit 
[`e76b7e0`](https://github.com/apache/spark/commit/e76b7e0b6fab0adf30f2be7ea7be50298196ac72).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17295#discussion_r106587687
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -1235,7 +1251,7 @@ private[spark] class BlockManager(
   peer.port,
   peer.executorId,
   blockId,
-  new NettyManagedBuffer(data.toNetty),
+  new BlockManagerManagedBuffer(blockInfoManager, blockId, 
data.toManagedBuffer()),
--- End diff --

why this change?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17242: [SPARK-19902][SQL] Add optimization rule to simplify exp...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17242
  
**[Test build #74725 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74725/testReport)**
 for PR 17242 at commit 
[`93b83ef`](https://github.com/apache/spark/commit/93b83ef0b15c453adddc459f57cccb36269e4e08).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17329
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74715/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-16 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17170#discussion_r106587332
  
--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -0,0 +1,152 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# mllib_fpm.R: Provides methods for MLlib frequent pattern mining 
algorithms integration
+
+#' S4 class that represents a FPGrowthModel
+#'
+#' @param jobj a Java object reference to the backing Scala FPGrowthModel
+#' @export
+#' @note FPGrowthModel since 2.2.0
+setClass("FPGrowthModel", slots = list(jobj = "jobj"))
+
+#' FPGrowth
+#' 
+#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm 
is described in
+#' Li et al., PFP: Parallel FP-Growth for Query
+#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. PFP 
distributes computation in such a way that each worker executes an
+#' independent group of mining tasks. The FP-Growth algorithm is described 
in
+#' Han et al., Mining frequent patterns without
+#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
+#'
+#' @param data A SparkDataFrame for training.
+#' @param minSupport Minimal support level.
+#' @param minConfidence Minimal confidence level.
+#' @param featuresCol Features column name.
+#' @param predictionCol Prediction column name.
+#' @param numPartitions Number of partitions used for fitting.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
+#' 
+#' @rdname spark.fpGrowth
+#' @name spark.fpGrowth
+#' @aliases spark.fpGrowth,SparkDataFrame-method
+#' @export
+#' @examples
+#' \dontrun{
+#' raw_data <- read.df(
+#'   "data/mllib/sample_fpgrowth.txt",
+#'   source = "csv",
+#'   schema = structType(structField("raw_features", "string")))
+#'
+#' data <- selectExpr(raw_data, "split(raw_features, ' ') as features")
+#' model <- spark.fpGrowth(data)
+#'
+#' # Show frequent itemsets
+#' frequent_itemsets <- spark.freqItemsets(model)
+#' showDF(frequent_itemsets)
+#'
+#' # Show association rules
+#' association_rules <- spark.associationRules(model)
+#' showDF(association_rules)
+#'
+#' # Predict on new data
+#' new_itemsets <- data.frame(features = c("t", "t,s"))
+#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(features, 
',') as features")
+#' predict(model, new_data)
+#'
+#' # Save and load model
+#' path <- "/path/to/model"
+#' write.ml(model, path)
+#' read.ml(path)
+#'
+#' # Optional arguments
+#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(features, 
',') as baskets")
+#' another_model <- spark.fpGrowth(data, minSupport = 0.1, minConfidence = 
0.5
+#' featureCol = "baskets", predictionCol = 
"predicted",
+#' numPartitions = 10)
+#' }
+#' @references \url{http://en.wikipedia.org/wiki/Association_rule_learning}
+#' @note spark.fpGrowth since 2.2.0
+setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
+  function(data, minSupport = 0.3, minConfidence = 0.8,
+   featuresCol = "features", predictionCol = "prediction",
+   numPartitions = -1) {
+if (!is.numeric(minSupport) || minSupport < 0 || minSupport > 
1) {
+  stop("minSupport should be a number [0, 1].")
+}
+if (!is.numeric(minConfidence) || minConfidence < 0 || 
minConfidence > 1) {
+  stop("minConfidence should be a number [0, 1].")
+}
+
+jobj <- callJStatic("org.apache.spark.ml.r.FPGrowthWrapper", 
"fit",
+data@sdf, as.numeric(minSupport), 
as.numeric(minConfidence),
+featuresCol, predictionCol, 
as.integer(numPartitions))
+new("FPGrowthModel", jobj = jobj)
+  })
+
  

[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17329
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-16 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17170#discussion_r106587261
  
--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -0,0 +1,152 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# mllib_fpm.R: Provides methods for MLlib frequent pattern mining 
algorithms integration
+
+#' S4 class that represents a FPGrowthModel
+#'
+#' @param jobj a Java object reference to the backing Scala FPGrowthModel
+#' @export
+#' @note FPGrowthModel since 2.2.0
+setClass("FPGrowthModel", slots = list(jobj = "jobj"))
+
+#' FPGrowth
+#' 
+#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm 
is described in
+#' Li et al., PFP: Parallel FP-Growth for Query
+#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. PFP 
distributes computation in such a way that each worker executes an
+#' independent group of mining tasks. The FP-Growth algorithm is described 
in
+#' Han et al., Mining frequent patterns without
+#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
+#'
+#' @param data A SparkDataFrame for training.
+#' @param minSupport Minimal support level.
+#' @param minConfidence Minimal confidence level.
+#' @param featuresCol Features column name.
+#' @param predictionCol Prediction column name.
+#' @param numPartitions Number of partitions used for fitting.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
+#' 
+#' @rdname spark.fpGrowth
+#' @name spark.fpGrowth
+#' @aliases spark.fpGrowth,SparkDataFrame-method
+#' @export
+#' @examples
+#' \dontrun{
+#' raw_data <- read.df(
+#'   "data/mllib/sample_fpgrowth.txt",
+#'   source = "csv",
+#'   schema = structType(structField("raw_features", "string")))
+#'
+#' data <- selectExpr(raw_data, "split(raw_features, ' ') as features")
+#' model <- spark.fpGrowth(data)
+#'
+#' # Show frequent itemsets
+#' frequent_itemsets <- spark.freqItemsets(model)
+#' showDF(frequent_itemsets)
+#'
+#' # Show association rules
+#' association_rules <- spark.associationRules(model)
+#' showDF(association_rules)
+#'
+#' # Predict on new data
+#' new_itemsets <- data.frame(features = c("t", "t,s"))
+#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(features, 
',') as features")
+#' predict(model, new_data)
+#'
+#' # Save and load model
+#' path <- "/path/to/model"
+#' write.ml(model, path)
+#' read.ml(path)
+#'
+#' # Optional arguments
+#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(features, 
',') as baskets")
+#' another_model <- spark.fpGrowth(data, minSupport = 0.1, minConfidence = 
0.5
+#' featureCol = "baskets", predictionCol = 
"predicted",
+#' numPartitions = 10)
+#' }
+#' @references \url{http://en.wikipedia.org/wiki/Association_rule_learning}
+#' @note spark.fpGrowth since 2.2.0
+setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
+  function(data, minSupport = 0.3, minConfidence = 0.8,
+   featuresCol = "features", predictionCol = "prediction",
+   numPartitions = -1) {
+if (!is.numeric(minSupport) || minSupport < 0 || minSupport > 
1) {
+  stop("minSupport should be a number [0, 1].")
+}
+if (!is.numeric(minConfidence) || minConfidence < 0 || 
minConfidence > 1) {
+  stop("minConfidence should be a number [0, 1].")
+}
+
+jobj <- callJStatic("org.apache.spark.ml.r.FPGrowthWrapper", 
"fit",
+data@sdf, as.numeric(minSupport), 
as.numeric(minConfidence),
+featuresCol, predictionCol, 
as.integer(numPartitions))
+new("FPGrowthModel", jobj = jobj)
+  })
+
  

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-16 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17170#discussion_r106587357
  
--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -0,0 +1,152 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# mllib_fpm.R: Provides methods for MLlib frequent pattern mining 
algorithms integration
+
+#' S4 class that represents a FPGrowthModel
+#'
+#' @param jobj a Java object reference to the backing Scala FPGrowthModel
+#' @export
+#' @note FPGrowthModel since 2.2.0
+setClass("FPGrowthModel", slots = list(jobj = "jobj"))
+
+#' FPGrowth
+#' 
+#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm 
is described in
+#' Li et al., PFP: Parallel FP-Growth for Query
+#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. PFP 
distributes computation in such a way that each worker executes an
+#' independent group of mining tasks. The FP-Growth algorithm is described 
in
+#' Han et al., Mining frequent patterns without
+#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
+#'
+#' @param data A SparkDataFrame for training.
+#' @param minSupport Minimal support level.
+#' @param minConfidence Minimal confidence level.
+#' @param featuresCol Features column name.
+#' @param predictionCol Prediction column name.
+#' @param numPartitions Number of partitions used for fitting.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
+#' 
+#' @rdname spark.fpGrowth
+#' @name spark.fpGrowth
+#' @aliases spark.fpGrowth,SparkDataFrame-method
+#' @export
+#' @examples
+#' \dontrun{
+#' raw_data <- read.df(
+#'   "data/mllib/sample_fpgrowth.txt",
+#'   source = "csv",
+#'   schema = structType(structField("raw_features", "string")))
+#'
+#' data <- selectExpr(raw_data, "split(raw_features, ' ') as features")
+#' model <- spark.fpGrowth(data)
+#'
+#' # Show frequent itemsets
+#' frequent_itemsets <- spark.freqItemsets(model)
+#' showDF(frequent_itemsets)
+#'
+#' # Show association rules
+#' association_rules <- spark.associationRules(model)
+#' showDF(association_rules)
+#'
+#' # Predict on new data
+#' new_itemsets <- data.frame(features = c("t", "t,s"))
+#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(features, 
',') as features")
+#' predict(model, new_data)
+#'
+#' # Save and load model
+#' path <- "/path/to/model"
+#' write.ml(model, path)
+#' read.ml(path)
+#'
+#' # Optional arguments
+#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(features, 
',') as baskets")
+#' another_model <- spark.fpGrowth(data, minSupport = 0.1, minConfidence = 
0.5
+#' featureCol = "baskets", predictionCol = 
"predicted",
+#' numPartitions = 10)
+#' }
+#' @references \url{http://en.wikipedia.org/wiki/Association_rule_learning}
+#' @note spark.fpGrowth since 2.2.0
+setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
+  function(data, minSupport = 0.3, minConfidence = 0.8,
+   featuresCol = "features", predictionCol = "prediction",
+   numPartitions = -1) {
+if (!is.numeric(minSupport) || minSupport < 0 || minSupport > 
1) {
+  stop("minSupport should be a number [0, 1].")
+}
+if (!is.numeric(minConfidence) || minConfidence < 0 || 
minConfidence > 1) {
+  stop("minConfidence should be a number [0, 1].")
+}
+
+jobj <- callJStatic("org.apache.spark.ml.r.FPGrowthWrapper", 
"fit",
+data@sdf, as.numeric(minSupport), 
as.numeric(minConfidence),
+featuresCol, predictionCol, 
as.integer(numPartitions))
+new("FPGrowthModel", jobj = jobj)
+  })
+
  

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-16 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17170#discussion_r106587496
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/FPGrowthWrapper.scala 
---
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.fpm.{FPGrowth, FPGrowthModel}
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class FPGrowthWrapper private (val fpGrowthModel: 
FPGrowthModel) extends MLWritable {
+  def freqItemsets: DataFrame = fpGrowthModel.freqItemsets
+  def associationRules: DataFrame = fpGrowthModel.associationRules
+
+  def transform(dataset: Dataset[_]): DataFrame = {
+fpGrowthModel.transform(dataset)
+  }
+
+  override def write: MLWriter = new 
FPGrowthWrapper.FPGrowthWrapperWriter(this)
+}
+
+private[r] object FPGrowthWrapper extends MLReadable[FPGrowthWrapper] {
+
+  def fit(
+ data: DataFrame,
+ minSupport: Double,
+ minConfidence: Double,
+ featuresCol: String,
+ predictionCol: String,
+ numPartitions: Integer): FPGrowthWrapper = {
+val fpGrowth = new FPGrowth()
+  .setMinSupport(minSupport)
+  .setMinConfidence(minConfidence)
+  .setPredictionCol(predictionCol)
+
+if (numPartitions != null && numPartitions > 0) {
+  fpGrowth.setNumPartitions(numPartitions)
+}
+
+val fpGrowthModel = fpGrowth.fit(data)
+
+new FPGrowthWrapper(fpGrowthModel)
+  }
+
+  override def read: MLReader[FPGrowthWrapper] = new FPGrowthWrapperReader
+
+  class FPGrowthWrapperReader extends MLReader[FPGrowthWrapper] {
+override def load(path: String): FPGrowthWrapper = {
+  val modelPath = new Path(path, "model").toString
+  val fPGrowthModel = FPGrowthModel.load(modelPath)
+
+  new FPGrowthWrapper(fPGrowthModel)
+}
+  }
+
+class FPGrowthWrapperWriter(instance: FPGrowthWrapper) extends 
MLWriter {
--- End diff --

indentation seems incorrect here and above line. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-16 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17170#discussion_r106587413
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/FPGrowthWrapper.scala 
---
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.r
+
+import org.apache.hadoop.fs.Path
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods._
+
+import org.apache.spark.ml.fpm.{FPGrowth, FPGrowthModel}
+import org.apache.spark.ml.util._
+import org.apache.spark.sql.{DataFrame, Dataset}
+
+private[r] class FPGrowthWrapper private (val fpGrowthModel: 
FPGrowthModel) extends MLWritable {
+  def freqItemsets: DataFrame = fpGrowthModel.freqItemsets
+  def associationRules: DataFrame = fpGrowthModel.associationRules
+
+  def transform(dataset: Dataset[_]): DataFrame = {
+fpGrowthModel.transform(dataset)
+  }
+
+  override def write: MLWriter = new 
FPGrowthWrapper.FPGrowthWrapperWriter(this)
+}
+
+private[r] object FPGrowthWrapper extends MLReadable[FPGrowthWrapper] {
+
+  def fit(
+ data: DataFrame,
--- End diff --

alignment


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-16 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17170#discussion_r106587130
  
--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -0,0 +1,152 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# mllib_fpm.R: Provides methods for MLlib frequent pattern mining 
algorithms integration
+
+#' S4 class that represents a FPGrowthModel
+#'
+#' @param jobj a Java object reference to the backing Scala FPGrowthModel
+#' @export
+#' @note FPGrowthModel since 2.2.0
+setClass("FPGrowthModel", slots = list(jobj = "jobj"))
+
+#' FPGrowth
+#' 
+#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm 
is described in
+#' Li et al., PFP: Parallel FP-Growth for Query
+#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. PFP 
distributes computation in such a way that each worker executes an
+#' independent group of mining tasks. The FP-Growth algorithm is described 
in
+#' Han et al., Mining frequent patterns without
+#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
+#'
+#' @param data A SparkDataFrame for training.
+#' @param minSupport Minimal support level.
+#' @param minConfidence Minimal confidence level.
+#' @param featuresCol Features column name.
+#' @param predictionCol Prediction column name.
+#' @param numPartitions Number of partitions used for fitting.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
+#' 
--- End diff --

Other APIs do not have blank line here. I think we should be consistent. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-16 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17170#discussion_r106587054
  
--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -0,0 +1,152 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# mllib_fpm.R: Provides methods for MLlib frequent pattern mining 
algorithms integration
+
+#' S4 class that represents a FPGrowthModel
+#'
+#' @param jobj a Java object reference to the backing Scala FPGrowthModel
+#' @export
+#' @note FPGrowthModel since 2.2.0
+setClass("FPGrowthModel", slots = list(jobj = "jobj"))
+
+#' FPGrowth
+#' 
+#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm 
is described in
+#' Li et al., PFP: Parallel FP-Growth for Query
+#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. PFP 
distributes computation in such a way that each worker executes an
--- End diff --

This line seems exceeding the length limit. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-16 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17170#discussion_r106587315
  
--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -0,0 +1,152 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# mllib_fpm.R: Provides methods for MLlib frequent pattern mining 
algorithms integration
+
+#' S4 class that represents a FPGrowthModel
+#'
+#' @param jobj a Java object reference to the backing Scala FPGrowthModel
+#' @export
+#' @note FPGrowthModel since 2.2.0
+setClass("FPGrowthModel", slots = list(jobj = "jobj"))
+
+#' FPGrowth
+#' 
+#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm 
is described in
+#' Li et al., PFP: Parallel FP-Growth for Query
+#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. PFP 
distributes computation in such a way that each worker executes an
+#' independent group of mining tasks. The FP-Growth algorithm is described 
in
+#' Han et al., Mining frequent patterns without
+#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
+#'
+#' @param data A SparkDataFrame for training.
+#' @param minSupport Minimal support level.
+#' @param minConfidence Minimal confidence level.
+#' @param featuresCol Features column name.
+#' @param predictionCol Prediction column name.
+#' @param numPartitions Number of partitions used for fitting.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
+#' 
+#' @rdname spark.fpGrowth
+#' @name spark.fpGrowth
+#' @aliases spark.fpGrowth,SparkDataFrame-method
+#' @export
+#' @examples
+#' \dontrun{
+#' raw_data <- read.df(
+#'   "data/mllib/sample_fpgrowth.txt",
+#'   source = "csv",
+#'   schema = structType(structField("raw_features", "string")))
+#'
+#' data <- selectExpr(raw_data, "split(raw_features, ' ') as features")
+#' model <- spark.fpGrowth(data)
+#'
+#' # Show frequent itemsets
+#' frequent_itemsets <- spark.freqItemsets(model)
+#' showDF(frequent_itemsets)
+#'
+#' # Show association rules
+#' association_rules <- spark.associationRules(model)
+#' showDF(association_rules)
+#'
+#' # Predict on new data
+#' new_itemsets <- data.frame(features = c("t", "t,s"))
+#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(features, 
',') as features")
+#' predict(model, new_data)
+#'
+#' # Save and load model
+#' path <- "/path/to/model"
+#' write.ml(model, path)
+#' read.ml(path)
+#'
+#' # Optional arguments
+#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(features, 
',') as baskets")
+#' another_model <- spark.fpGrowth(data, minSupport = 0.1, minConfidence = 
0.5
+#' featureCol = "baskets", predictionCol = 
"predicted",
+#' numPartitions = 10)
+#' }
+#' @references \url{http://en.wikipedia.org/wiki/Association_rule_learning}
+#' @note spark.fpGrowth since 2.2.0
+setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
+  function(data, minSupport = 0.3, minConfidence = 0.8,
+   featuresCol = "features", predictionCol = "prediction",
+   numPartitions = -1) {
+if (!is.numeric(minSupport) || minSupport < 0 || minSupport > 
1) {
+  stop("minSupport should be a number [0, 1].")
+}
+if (!is.numeric(minConfidence) || minConfidence < 0 || 
minConfidence > 1) {
+  stop("minConfidence should be a number [0, 1].")
+}
+
+jobj <- callJStatic("org.apache.spark.ml.r.FPGrowthWrapper", 
"fit",
+data@sdf, as.numeric(minSupport), 
as.numeric(minConfidence),
+featuresCol, predictionCol, 
as.integer(numPartitions))
+new("FPGrowthModel", jobj = jobj)
+  })
+
  

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-16 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17170#discussion_r106587292
  
--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -0,0 +1,152 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# mllib_fpm.R: Provides methods for MLlib frequent pattern mining 
algorithms integration
+
+#' S4 class that represents a FPGrowthModel
+#'
+#' @param jobj a Java object reference to the backing Scala FPGrowthModel
+#' @export
+#' @note FPGrowthModel since 2.2.0
+setClass("FPGrowthModel", slots = list(jobj = "jobj"))
+
+#' FPGrowth
+#' 
+#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm 
is described in
+#' Li et al., PFP: Parallel FP-Growth for Query
+#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. PFP 
distributes computation in such a way that each worker executes an
+#' independent group of mining tasks. The FP-Growth algorithm is described 
in
+#' Han et al., Mining frequent patterns without
+#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
+#'
+#' @param data A SparkDataFrame for training.
+#' @param minSupport Minimal support level.
+#' @param minConfidence Minimal confidence level.
+#' @param featuresCol Features column name.
+#' @param predictionCol Prediction column name.
+#' @param numPartitions Number of partitions used for fitting.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
+#' 
+#' @rdname spark.fpGrowth
+#' @name spark.fpGrowth
+#' @aliases spark.fpGrowth,SparkDataFrame-method
+#' @export
+#' @examples
+#' \dontrun{
+#' raw_data <- read.df(
+#'   "data/mllib/sample_fpgrowth.txt",
+#'   source = "csv",
+#'   schema = structType(structField("raw_features", "string")))
+#'
+#' data <- selectExpr(raw_data, "split(raw_features, ' ') as features")
+#' model <- spark.fpGrowth(data)
+#'
+#' # Show frequent itemsets
+#' frequent_itemsets <- spark.freqItemsets(model)
+#' showDF(frequent_itemsets)
+#'
+#' # Show association rules
+#' association_rules <- spark.associationRules(model)
+#' showDF(association_rules)
+#'
+#' # Predict on new data
+#' new_itemsets <- data.frame(features = c("t", "t,s"))
+#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(features, 
',') as features")
+#' predict(model, new_data)
+#'
+#' # Save and load model
+#' path <- "/path/to/model"
+#' write.ml(model, path)
+#' read.ml(path)
+#'
+#' # Optional arguments
+#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(features, 
',') as baskets")
+#' another_model <- spark.fpGrowth(data, minSupport = 0.1, minConfidence = 
0.5
+#' featureCol = "baskets", predictionCol = 
"predicted",
+#' numPartitions = 10)
+#' }
+#' @references \url{http://en.wikipedia.org/wiki/Association_rule_learning}
+#' @note spark.fpGrowth since 2.2.0
+setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
+  function(data, minSupport = 0.3, minConfidence = 0.8,
+   featuresCol = "features", predictionCol = "prediction",
+   numPartitions = -1) {
+if (!is.numeric(minSupport) || minSupport < 0 || minSupport > 
1) {
+  stop("minSupport should be a number [0, 1].")
+}
+if (!is.numeric(minConfidence) || minConfidence < 0 || 
minConfidence > 1) {
+  stop("minConfidence should be a number [0, 1].")
+}
+
+jobj <- callJStatic("org.apache.spark.ml.r.FPGrowthWrapper", 
"fit",
+data@sdf, as.numeric(minSupport), 
as.numeric(minConfidence),
+featuresCol, predictionCol, 
as.integer(numPartitions))
+new("FPGrowthModel", jobj = jobj)
+  })
+
  

[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17329
  
**[Test build #74715 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74715/testReport)**
 for PR 17329 at commit 
[`dc5fd8d`](https://github.com/apache/spark/commit/dc5fd8dda4717b485d4c4b2dfcdc5d115abf811c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17295#discussion_r106587428
  
--- Diff: 
core/src/main/scala/org/apache/spark/security/CryptoStreamUtils.scala ---
@@ -102,4 +150,34 @@ private[spark] object CryptoStreamUtils extends 
Logging {
 }
 iv
   }
+
+  /**
+   * This class is a workaround for CRYPTO-125, that forces all bytes to 
be written to the
+   * underlying channel. Since the callers of this API are using blocking 
I/O, there are no
+   * concerns with regards to CPU usage here.
--- End diff --

is it a separated bug fix?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17295#discussion_r106587322
  
--- Diff: 
core/src/main/scala/org/apache/spark/security/CryptoStreamUtils.scala ---
@@ -63,12 +83,40 @@ private[spark] object CryptoStreamUtils extends Logging 
{
   is: InputStream,
   sparkConf: SparkConf,
   key: Array[Byte]): InputStream = {
-val properties = toCryptoConf(sparkConf)
 val iv = new Array[Byte](IV_LENGTH_IN_BYTES)
-is.read(iv, 0, iv.length)
-val transformationStr = sparkConf.get(IO_CRYPTO_CIPHER_TRANSFORMATION)
-new CryptoInputStream(transformationStr, properties, is,
-  new SecretKeySpec(key, "AES"), new IvParameterSpec(iv))
+var read = 0
+while (read < iv.length) {
--- End diff --

what does this while loop do?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17242: [SPARK-19902][SQL] Add optimization rule to simplify exp...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17242
  
**[Test build #74722 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74722/testReport)**
 for PR 17242 at commit 
[`f4e771d`](https://github.com/apache/spark/commit/f4e771d85a33ff465d793e74ff4401453eaf0f3b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16971
  
**[Test build #74724 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74724/testReport)**
 for PR 16971 at commit 
[`ed6dacd`](https://github.com/apache/spark/commit/ed6dacdb3e3bdfd4e9ccb5c57bf8b4118636b0c6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17192
  
**[Test build #74723 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74723/testReport)**
 for PR 17192 at commit 
[`703a6cb`](https://github.com/apache/spark/commit/703a6cb36ea920e87a3536f16572020c11197345).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17329
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74714/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17329
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17329
  
**[Test build #74714 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74714/testReport)**
 for PR 17329 at commit 
[`abcfc79`](https://github.com/apache/spark/commit/abcfc79991ecd1d5cef2cd1e275b872695ba19d9).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

2017-03-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17192
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17192
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17192
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74718/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17192
  
**[Test build #74718 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74718/testReport)**
 for PR 17192 at commit 
[`8c97406`](https://github.com/apache/spark/commit/8c97406b984ab68b74df2116547c1dbedb675785).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class JsonToStructs(`
  * `case class StructsToJson(`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17088: [SPARK-19753][CORE] Un-register all shuffle output on a ...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17088
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17088: [SPARK-19753][CORE] Un-register all shuffle output on a ...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17088
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74710/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17088: [SPARK-19753][CORE] Un-register all shuffle output on a ...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17088
  
**[Test build #74710 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74710/testReport)**
 for PR 17088 at commit 
[`8787db1`](https://github.com/apache/spark/commit/8787db1679c5b468afa3d2ede64eee53908fa5de).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16971
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16971
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74717/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16971
  
**[Test build #74717 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74717/testReport)**
 for PR 16971 at commit 
[`00d67f7`](https://github.com/apache/spark/commit/00d67f71e8c3eb254eabb63a53efdf675689aeb3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17320: [SPARK-19967][SQL] Add from_json in FunctionRegistry

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17320
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74719/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17320: [SPARK-19967][SQL] Add from_json in FunctionRegistry

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17320
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17320: [SPARK-19967][SQL] Add from_json in FunctionRegistry

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17320
  
**[Test build #74719 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74719/testReport)**
 for PR 17320 at commit 
[`ce39a9d`](https://github.com/apache/spark/commit/ce39a9dae6d322d0b800b260b9a4822d9e0e1f1d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17192
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74720/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17192
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17192
  
**[Test build #74720 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74720/testReport)**
 for PR 17192 at commit 
[`703a6cb`](https://github.com/apache/spark/commit/703a6cb36ea920e87a3536f16572020c11197345).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17302: [SPARK-19959][SQL] Fix to throw NullPointerException in ...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17302
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74716/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17302: [SPARK-19959][SQL] Fix to throw NullPointerException in ...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17302
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17302: [SPARK-19959][SQL] Fix to throw NullPointerException in ...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17302
  
**[Test build #74716 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74716/testReport)**
 for PR 17302 at commit 
[`43678e7`](https://github.com/apache/spark/commit/43678e793148521b44713b4373d89d8db0bb2e66).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17095: [SPARK-19763][SQL]qualified external datasource table lo...

2017-03-16 Thread kayousterhout
Github user kayousterhout commented on the issue:

https://github.com/apache/spark/pull/17095
  
Sounds like this was caused by a different PR (see the comment on the JIRA) 
and is now being fixed, so never mind here!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...

2017-03-16 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17295
  
makes sense. one more question, ideally, shall we also transfer shuffle 
blocks after decryption?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16209: [WIP][SPARK-10849][SQL] Adds option to the JDBC data sou...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16209
  
**[Test build #74721 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74721/testReport)**
 for PR 16209 at commit 
[`e76b7e0`](https://github.com/apache/spark/commit/e76b7e0b6fab0adf30f2be7ea7be50298196ac72).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17191: [SPARK-14471][SQL] Aliases in SELECT could be used in GR...

2017-03-16 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17191
  
okay, I'll recheck the code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17191: [SPARK-14471][SQL] Aliases in SELECT could be used in GR...

2017-03-16 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17191
  
ok makes sense, let's support it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17286: [SPARK-19915][SQL] Exclude cartesian product candidates ...

2017-03-16 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17286
  
LGTM except some minor comments


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17286: [SPARK-19915][SQL] Exclude cartesian product cand...

2017-03-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17286#discussion_r106583389
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinReorderSuite.scala
 ---
@@ -187,6 +220,8 @@ class JoinReorderSuite extends PlanTest with 
StatsEstimationTestBase {
   case (j1: Join, j2: Join) =>
 (sameJoinPlan(j1.left, j2.left) && sameJoinPlan(j1.right, 
j2.right)) ||
   (sameJoinPlan(j1.left, j2.right) && sameJoinPlan(j1.right, 
j2.left))
+  case _ if plan1.children.nonEmpty && plan2.children.nonEmpty =>
--- End diff --

when will we hit this branch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17286: [SPARK-19915][SQL] Exclude cartesian product cand...

2017-03-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17286#discussion_r106583140
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -710,6 +710,14 @@ object SQLConf {
   .intConf
   .createWithDefault(12)
 
+  val JOIN_REORDER_CARD_WEIGHT =
+buildConf("spark.sql.cbo.joinReorder.card.weight")
+  .doc("The weight of cardinality (number of rows) for plan cost 
comparison in join reorder: " +
+"rows * weight + size * (1 - weight).")
+  .doubleConf
+  .checkValue(weight => weight >= 0 && weight <= 1, "The weight value 
must be in [0, 1].")
+  .createWithDefault(0.7)
--- End diff --

it is useful to expose this config? I think most of the users will just 
disable join reordering if they have problems.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17286: [SPARK-19915][SQL] Exclude cartesian product cand...

2017-03-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17286#discussion_r106583013
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
 ---
@@ -203,64 +205,46 @@ object JoinReorderDP extends PredicateHelper {
   private def buildJoin(
   oneJoinPlan: JoinPlan,
   otherJoinPlan: JoinPlan,
-  conf: CatalystConf,
+  conf: SQLConf,
   conditions: Set[Expression],
-  topOutput: AttributeSet): JoinPlan = {
+  topOutput: AttributeSet): Option[JoinPlan] = {
 
 val onePlan = oneJoinPlan.plan
 val otherPlan = otherJoinPlan.plan
-// Now both onePlan and otherPlan become intermediate joins, so the 
cost of the
-// new join should also include their own cardinalities and sizes.
-val newCost = if (isCartesianProduct(onePlan) || 
isCartesianProduct(otherPlan)) {
-  // We consider cartesian product very expensive, thus set a very 
large cost for it.
-  // This enables to plan all the cartesian products at the end, 
because having a cartesian
-  // product as an intermediate join will significantly increase a 
plan's cost, making it
-  // impossible to be selected as the best plan for the items, unless 
there's no other choice.
-  Cost(
-rows = BigInt(Long.MaxValue) * BigInt(Long.MaxValue),
-size = BigInt(Long.MaxValue) * BigInt(Long.MaxValue))
-} else {
-  val onePlanStats = onePlan.stats(conf)
-  val otherPlanStats = otherPlan.stats(conf)
-  Cost(
-rows = oneJoinPlan.cost.rows + onePlanStats.rowCount.get +
-  otherJoinPlan.cost.rows + otherPlanStats.rowCount.get,
-size = oneJoinPlan.cost.size + onePlanStats.sizeInBytes +
-  otherJoinPlan.cost.size + otherPlanStats.sizeInBytes)
-}
-
-// Put the deeper side on the left, tend to build a left-deep tree.
-val (left, right) = if (oneJoinPlan.itemIds.size >= 
otherJoinPlan.itemIds.size) {
-  (onePlan, otherPlan)
-} else {
-  (otherPlan, onePlan)
-}
 val joinConds = conditions
   .filterNot(l => canEvaluate(l, onePlan))
   .filterNot(r => canEvaluate(r, otherPlan))
   .filter(e => e.references.subsetOf(onePlan.outputSet ++ 
otherPlan.outputSet))
-// We use inner join whether join condition is empty or not. Since 
cross join is
-// equivalent to inner join without condition.
-val newJoin = Join(left, right, Inner, joinConds.reduceOption(And))
-val collectedJoinConds = joinConds ++ oneJoinPlan.joinConds ++ 
otherJoinPlan.joinConds
-val remainingConds = conditions -- collectedJoinConds
-val neededAttr = AttributeSet(remainingConds.flatMap(_.references)) ++ 
topOutput
-val neededFromNewJoin = newJoin.outputSet.filter(neededAttr.contains)
-val newPlan =
-  if ((newJoin.outputSet -- neededFromNewJoin).nonEmpty) {
-Project(neededFromNewJoin.toSeq, newJoin)
+if (joinConds.isEmpty) {
+  // Cartesian product is very expensive, so we exclude them from 
candidate plans.
+  // This also significantly reduces the search space.
--- End diff --

great! now we can safely apply this optimization :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17286: [SPARK-19915][SQL] Exclude cartesian product cand...

2017-03-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17286#discussion_r106582845
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
 ---
@@ -272,26 +256,39 @@ object JoinReorderDP extends PredicateHelper {
* @param itemIds Set of item ids participating in this partial plan.
* @param plan The plan tree with the lowest cost for these items found 
so far.
* @param joinConds Join conditions included in the plan.
-   * @param cost The cost of this plan is the sum of costs of all 
intermediate joins.
+   * @param planCost The cost of this plan tree is the sum of costs of all 
intermediate joins.
--- End diff --

I think `cost` is good enough, why rename it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17286: [SPARK-19915][SQL] Exclude cartesian product cand...

2017-03-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17286#discussion_r106582668
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
 ---
@@ -185,11 +184,14 @@ object JoinReorderDP extends PredicateHelper {
   // Should not join two overlapping item sets.
   if 
(oneSidePlan.itemIds.intersect(otherSidePlan.itemIds).isEmpty) {
 val joinPlan = buildJoin(oneSidePlan, otherSidePlan, conf, 
conditions, topOutput)
-// Check if it's the first plan for the item set, or it's a 
better plan than
-// the existing one due to lower cost.
-val existingPlan = nextLevel.get(joinPlan.itemIds)
-if (existingPlan.isEmpty || 
joinPlan.cost.lessThan(existingPlan.get.cost)) {
-  nextLevel.update(joinPlan.itemIds, joinPlan)
+if (joinPlan.isDefined) {
--- End diff --

when will this condition be false?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17286: [SPARK-19915][SQL] Exclude cartesian product cand...

2017-03-16 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17286#discussion_r106582563
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
 ---
@@ -128,38 +131,34 @@ case class CostBasedJoinReorder(conf: CatalystConf) 
extends Rule[LogicalPlan] wi
 object JoinReorderDP extends PredicateHelper {
 
   def search(
-  conf: CatalystConf,
+  conf: SQLConf,
   items: Seq[LogicalPlan],
   conditions: Set[Expression],
-  topOutput: AttributeSet): Option[LogicalPlan] = {
+  topOutput: AttributeSet): LogicalPlan = {
 
 // Level i maintains all found plans for i + 1 items.
 // Create the initial plans: each plan is a single item with zero cost.
-val itemIndex = items.zipWithIndex
+val itemIndex = items.zipWithIndex.map(_.swap).toMap
--- End diff --

looks like an unnecessary change now


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17191: [SPARK-14471][SQL] Aliases in SELECT could be used in GR...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17191
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17191: [SPARK-14471][SQL] Aliases in SELECT could be used in GR...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17191
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74713/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17191: [SPARK-14471][SQL] Aliases in SELECT could be used in GR...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17191
  
**[Test build #74713 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74713/testReport)**
 for PR 17191 at commit 
[`5d8c853`](https://github.com/apache/spark/commit/5d8c8532433fc2ebebdf506d636238e6b644b4ae).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17179: [SPARK-19067][SS] Processing-time-based timeout in MapGr...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17179
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74707/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17179: [SPARK-19067][SS] Processing-time-based timeout in MapGr...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17179
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17192
  
**[Test build #74720 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74720/testReport)**
 for PR 17192 at commit 
[`703a6cb`](https://github.com/apache/spark/commit/703a6cb36ea920e87a3536f16572020c11197345).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17179: [SPARK-19067][SS] Processing-time-based timeout in MapGr...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17179
  
**[Test build #74707 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74707/testReport)**
 for PR 17179 at commit 
[`1d0008c`](https://github.com/apache/spark/commit/1d0008cedd3e37832b31b75451eaf7a67ab832f3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class KeyedStateTimeout `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17320: [SPARK-19967][SQL] Add from_json in FunctionRegistry

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17320
  
**[Test build #74719 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74719/testReport)**
 for PR 17320 at commit 
[`ce39a9d`](https://github.com/apache/spark/commit/ce39a9dae6d322d0b800b260b9a4822d9e0e1f1d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17192: [SPARK-19849][SQL] Support ArrayType in to_json to produ...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17192
  
**[Test build #74718 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74718/testReport)**
 for PR 17192 at commit 
[`8c97406`](https://github.com/apache/spark/commit/8c97406b984ab68b74df2116547c1dbedb675785).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17216: [SPARK-19873][SS] Record num shuffle partitions in offse...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17216
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74708/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17216: [SPARK-19873][SS] Record num shuffle partitions in offse...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17216
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17216: [SPARK-19873][SS] Record num shuffle partitions in offse...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17216
  
**[Test build #74708 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74708/testReport)**
 for PR 17216 at commit 
[`4733b4e`](https://github.com/apache/spark/commit/4733b4e160bff010521319a1aa61e4f7981c65d6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17327: [SPARK-19721][SS][BRANCH-2.1] Good error message for ver...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17327
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17327: [SPARK-19721][SS][BRANCH-2.1] Good error message for ver...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17327
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74704/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17327: [SPARK-19721][SS][BRANCH-2.1] Good error message for ver...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17327
  
**[Test build #74704 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74704/testReport)**
 for PR 17327 at commit 
[`daabb27`](https://github.com/apache/spark/commit/daabb27aa32cb19c157e19081f6d08ff368bb42b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16971
  
**[Test build #74717 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74717/testReport)**
 for PR 16971 at commit 
[`00d67f7`](https://github.com/apache/spark/commit/00d67f71e8c3eb254eabb63a53efdf675689aeb3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17191: [SPARK-14471][SQL] Aliases in SELECT could be used in GR...

2017-03-16 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/17191
  
@cloud-fan In the mixed case, it seems PotgreSQL and MySQL support the 
syntax;
```

// PostgreSQL v9.5
postgres=# \d t2
  Table "public.t2"
 Column |  Type   | Modifiers 
+-+---
 gkey1  | integer | 
 gkey2  | integer | 
 value  | integer | 

postgres=# select gkey1 AS key1, gkey2, count(value) from t2 group by key1, 
2;
 key1 | gkey2 | count 
--+---+---
1 | 1 | 1
(1 row)

// MySQL v5.7.13 
mysql> SHOW COLUMNS FROM t2;
+---+-+--+-+-+---+
| Field | Type| Null | Key | Default | Extra |
+---+-+--+-+-+---+
| gkey1 | int(11) | YES  | | NULL|   |
| gkey2 | int(11) | YES  | | NULL|   |
| value | int(11) | YES  | | NULL|   |
+---+-+--+-+-+---+
3 rows in set (0.00 sec)

mysql> select gkey1 AS key1, gkey2, count(value) from t2 group by key1, 2;
+--+---+--+
| key1 | gkey2 | count(value) |
+--+---+--+
|1 | 1 |1 |
+--+---+--+
1 row in set (0.00 sec)
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17302: [SPARK-19959][SQL] Fix to throw NullPointerException in ...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17302
  
**[Test build #74716 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74716/testReport)**
 for PR 17302 at commit 
[`43678e7`](https://github.com/apache/spark/commit/43678e793148521b44713b4373d89d8db0bb2e66).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17329
  
**[Test build #74715 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74715/testReport)**
 for PR 17329 at commit 
[`dc5fd8d`](https://github.com/apache/spark/commit/dc5fd8dda4717b485d4c4b2dfcdc5d115abf811c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74702/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17307
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74699/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17166
  
**[Test build #74702 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74702/testReport)**
 for PR 17166 at commit 
[`8f7ffb3`](https://github.com/apache/spark/commit/8f7ffb395cae9ae7aa24a14dcdb908aaee30b710).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17307
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17307: [SPARK-13369] Make number of consecutive fetch failures ...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17307
  
**[Test build #74699 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74699/testReport)**
 for PR 17307 at commit 
[`0f95c8b`](https://github.com/apache/spark/commit/0f95c8b1ad260abb1a64d9cbd25d09a1bafeb1d8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17216: [SPARK-19873][SS] Record num shuffle partitions in offse...

2017-03-16 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17216
  
Does this PR mix in some test file?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17329: [SPARK-19991]FileSegmentManagedBuffer performance improv...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17329
  
**[Test build #74714 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74714/testReport)**
 for PR 17329 at commit 
[`abcfc79`](https://github.com/apache/spark/commit/abcfc79991ecd1d5cef2cd1e275b872695ba19d9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17329: [SPARK-19991]FileSegmentManagedBuffer performance...

2017-03-16 Thread witgo
GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/17329

[SPARK-19991]FileSegmentManagedBuffer performance improvement

FileSegmentManagedBuffer performance improvement.


## What changes were proposed in this pull request?

When we do not set the value of the configuration items 
`spark.storage.memoryMapThreshold` and `spark.shuffle.io.lazyFD`, 
each call to the cFileSegmentManagedBuffer.nioByteBuffer or 
FileSegmentManagedBuffer.createInputStream method creates a 
NoSuchElementException instance. This is a more time-consuming operation.

In the use case, this PR can improve the performance of about 3.5%

The test code:

``` scala

(1 to 10).foreach { i =>
  val numPartition = 1
  val rdd = sc.parallelize(0 until 
numPartition).repartition(numPartition).flatMap { t =>
(0 until numPartition).map(r => r * numPartition + t)
  }.repartition(numPartition)
  val serializeStart = System.currentTimeMillis()
  rdd.sum()
  val serializeFinish = System.currentTimeMillis()
  println(f"Test $i: ${(serializeFinish - serializeStart) / 1000D}%1.2f")
}


```

and `spark-defaults.conf` file:

```
spark.master  yarn-client
spark.executor.instances  20
spark.driver.memory   64g
spark.executor.memory 30g
spark.executor.cores  5
spark.default.parallelism 100 
spark.sql.shuffle.partitions  100
spark.serializer  
org.apache.spark.serializer.KryoSerializer
spark.driver.maxResultSize0
spark.ui.enabled  false 
spark.driver.extraJavaOptions -XX:+UseG1GC 
-XX:+UseStringDeduplication -XX:G1HeapRegionSize=16M -XX:MetaspaceSize=512M 
spark.executor.extraJavaOptions   -XX:+UseG1GC 
-XX:+UseStringDeduplication -XX:G1HeapRegionSize=16M -XX:MetaspaceSize=256M 
spark.cleaner.referenceTracking.blocking  true
spark.cleaner.referenceTracking.blocking.shuffle  true

```

The test results are as follows

| [SPARK-19991](https://github.com/witgo/spark/tree/SPARK-19991) 
|https://github.com/apache/spark/commit/68ea290b3aa89b2a539d13ea2c18bdb5a651b2bf|
|---| --- | 
|226.09 s| 235.21 s|

## How was this patch tested?

Existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-19991

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17329.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17329


commit abcfc79991ecd1d5cef2cd1e275b872695ba19d9
Author: Guoqiang Li 
Date:   2017-03-17T03:19:37Z

FileSegmentManagedBuffer performance improvement




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16971
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74705/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16971
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

2017-03-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16971
  
**[Test build #74705 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74705/testReport)**
 for PR 16971 at commit 
[`7bf7db3`](https://github.com/apache/spark/commit/7bf7db3114234dc900a9fd7a9b36615fa0bc1a3f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16028
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74709/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override

2017-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16028
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   4   5   6   >