git commit: [SPARK-3597][Mesos] Implement `killTask`.

2014-10-05 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/master cf1d32e3e -> 32fad4233


[SPARK-3597][Mesos] Implement `killTask`.

The MesosSchedulerBackend did not previously implement `killTask`,
resulting in an exception.
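
A minimal usage sketch (not part of the patch) of how this code path gets
exercised: with the Spark 1.x job-group APIs, cancelling a job group makes the
scheduler call `killTask` on the backend, which previously resulted in an
exception on Mesos. The group name below is illustrative only.

```scala
import org.apache.spark.SparkContext

def cancelExample(sc: SparkContext): Unit = {
  // Tag subsequent jobs so they can be cancelled as a group.
  sc.setJobGroup("my-group", "cancellable work", interruptOnCancel = true)
  // ... kick off jobs asynchronously ...
  // Cancelling now forwards task kills to the Mesos driver instead of failing.
  sc.cancelJobGroup("my-group")
}
```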

Author: Brenden Matthews bren...@diddyinc.com

Closes #2453 from brndnmtthws/implement-killtask and squashes the following 
commits:

23ddcdc [Brenden Matthews] [SPARK-3597][Mesos] Implement `killTask`.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/32fad423
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/32fad423
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/32fad423

Branch: refs/heads/master
Commit: 32fad4233f353814496c84e15ba64326730b7ae7
Parents: cf1d32e
Author: Brenden Matthews bren...@diddyinc.com
Authored: Sun Oct 5 09:49:24 2014 -0700
Committer: Andrew Or andrewo...@gmail.com
Committed: Sun Oct 5 09:49:24 2014 -0700

--
 .../spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala | 7 +++
 1 file changed, 7 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/32fad423/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala
 
b/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala
index b117863..e0f2fd6 100644
--- 
a/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala
+++ 
b/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala
@@ -372,6 +372,13 @@ private[spark] class MesosSchedulerBackend(
     recordSlaveLost(d, slaveId, ExecutorExited(status))
   }
 
+  override def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit = {
+    driver.killTask(
+      TaskID.newBuilder()
+        .setValue(taskId.toString).build()
+    )
+  }
+
   // TODO: query Mesos for number of cores
   override def defaultParallelism() = sc.conf.getInt("spark.default.parallelism", 8)
 





git commit: [SPARK-3597][Mesos] Implement `killTask`.

2014-10-05 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/branch-1.1 e4ddedee6 -> d9cf4d08a


[SPARK-3597][Mesos] Implement `killTask`.

The MesosSchedulerBackend did not previously implement `killTask`,
resulting in an exception.

Author: Brenden Matthews bren...@diddyinc.com

Closes #2453 from brndnmtthws/implement-killtask and squashes the following 
commits:

23ddcdc [Brenden Matthews] [SPARK-3597][Mesos] Implement `killTask`.

(cherry picked from commit 32fad4233f353814496c84e15ba64326730b7ae7)
Signed-off-by: Andrew Or andrewo...@gmail.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d9cf4d08
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d9cf4d08
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d9cf4d08

Branch: refs/heads/branch-1.1
Commit: d9cf4d08ae392cc840fac21ba153fdf9d9219782
Parents: e4ddede
Author: Brenden Matthews bren...@diddyinc.com
Authored: Sun Oct 5 09:49:24 2014 -0700
Committer: Andrew Or andrewo...@gmail.com
Committed: Sun Oct 5 09:49:35 2014 -0700

--
 .../spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala | 7 +++
 1 file changed, 7 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d9cf4d08/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala
 
b/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala
index 06f2c09..8f064bf 100644
--- 
a/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala
+++ 
b/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala
@@ -369,6 +369,13 @@ private[spark] class MesosSchedulerBackend(
     recordSlaveLost(d, slaveId, ExecutorExited(status))
   }
 
+  override def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit = {
+    driver.killTask(
+      TaskID.newBuilder()
+        .setValue(taskId.toString).build()
+    )
+  }
+
   // TODO: query Mesos for number of cores
   override def defaultParallelism() = sc.conf.getInt("spark.default.parallelism", 8)
 }





git commit: SPARK-1656: Fix potential resource leaks

2014-10-05 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/master 32fad4233 -> a7c73130f


SPARK-1656: Fix potential resource leaks

JIRA: https://issues.apache.org/jira/browse/SPARK-1656

Author: zsxwing zsxw...@gmail.com

Closes #577 from zsxwing/SPARK-1656 and squashes the following commits:

c431095 [zsxwing] Add a comment and fix the code style
2de96e5 [zsxwing] Make sure file will be deleted if exception happens
28b90dc [zsxwing] Update to follow the code style
4521d6e [zsxwing] Merge branch 'master' into SPARK-1656
afc3383 [zsxwing] Update to follow the code style
071fdd1 [zsxwing] SPARK-1656: Fix potential resource leaks
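
The diffs below apply the same fix by hand at each call site: open the stream,
do the work in `try`, and close it in `finally`. A minimal generic sketch of
that pattern (the `withCloseable` helper is illustrative, not part of this
patch):

```scala
import java.io.Closeable

// Run `body` against an already-opened resource and always close it,
// even if the body throws.
def withCloseable[C <: Closeable, A](resource: C)(body: C => A): A = {
  try {
    body(resource)
  } finally {
    resource.close()
  }
}

// Example usage (hypothetical file name):
// withCloseable(new java.io.FileOutputStream("/tmp/out.bin")) { out =>
//   out.write(Array[Byte](1, 2, 3))
// }
```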


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a7c73130
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a7c73130
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a7c73130

Branch: refs/heads/master
Commit: a7c73130f1b6b0b8b19a7b0a0de5c713b673cd7b
Parents: 32fad42
Author: zsxwing zsxw...@gmail.com
Authored: Sun Oct 5 09:55:17 2014 -0700
Committer: Andrew Or andrewo...@gmail.com
Committed: Sun Oct 5 09:56:23 2014 -0700

--
 .../apache/spark/broadcast/HttpBroadcast.scala  | 25 
 .../master/FileSystemPersistenceEngine.scala| 14 +++
 .../org/apache/spark/storage/DiskStore.scala| 16 -
 3 files changed, 40 insertions(+), 15 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a7c73130/core/src/main/scala/org/apache/spark/broadcast/HttpBroadcast.scala
--
diff --git a/core/src/main/scala/org/apache/spark/broadcast/HttpBroadcast.scala 
b/core/src/main/scala/org/apache/spark/broadcast/HttpBroadcast.scala
index 942dc7d..4cd4f4f 100644
--- a/core/src/main/scala/org/apache/spark/broadcast/HttpBroadcast.scala
+++ b/core/src/main/scala/org/apache/spark/broadcast/HttpBroadcast.scala
@@ -163,18 +163,23 @@ private[broadcast] object HttpBroadcast extends Logging {
 
   private def write(id: Long, value: Any) {
 val file = getFile(id)
-val out: OutputStream = {
-  if (compress) {
-compressionCodec.compressedOutputStream(new FileOutputStream(file))
-  } else {
-new BufferedOutputStream(new FileOutputStream(file), bufferSize)
+val fileOutputStream = new FileOutputStream(file)
+try {
+  val out: OutputStream = {
+if (compress) {
+  compressionCodec.compressedOutputStream(fileOutputStream)
+} else {
+  new BufferedOutputStream(fileOutputStream, bufferSize)
+}
   }
+  val ser = SparkEnv.get.serializer.newInstance()
+  val serOut = ser.serializeStream(out)
+  serOut.writeObject(value)
+  serOut.close()
+  files += file
+} finally {
+  fileOutputStream.close()
 }
-val ser = SparkEnv.get.serializer.newInstance()
-val serOut = ser.serializeStream(out)
-serOut.writeObject(value)
-serOut.close()
-files += file
   }
 
   private def read[T: ClassTag](id: Long): T = {

http://git-wip-us.apache.org/repos/asf/spark/blob/a7c73130/core/src/main/scala/org/apache/spark/deploy/master/FileSystemPersistenceEngine.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/deploy/master/FileSystemPersistenceEngine.scala
 
b/core/src/main/scala/org/apache/spark/deploy/master/FileSystemPersistenceEngine.scala
index aa85aa0..08a99bb 100644
--- 
a/core/src/main/scala/org/apache/spark/deploy/master/FileSystemPersistenceEngine.scala
+++ 
b/core/src/main/scala/org/apache/spark/deploy/master/FileSystemPersistenceEngine.scala
@@ -83,15 +83,21 @@ private[spark] class FileSystemPersistenceEngine(
 val serialized = serializer.toBinary(value)
 
 val out = new FileOutputStream(file)
-out.write(serialized)
-out.close()
+try {
+  out.write(serialized)
+} finally {
+  out.close()
+}
   }
 
   def deserializeFromFile[T](file: File)(implicit m: Manifest[T]): T = {
 val fileData = new Array[Byte](file.length().asInstanceOf[Int])
 val dis = new DataInputStream(new FileInputStream(file))
-dis.readFully(fileData)
-dis.close()
+try {
+  dis.readFully(fileData)
+} finally {
+  dis.close()
+}
 
 val clazz = m.runtimeClass.asInstanceOf[Class[T]]
 val serializer = serialization.serializerFor(clazz)

http://git-wip-us.apache.org/repos/asf/spark/blob/a7c73130/core/src/main/scala/org/apache/spark/storage/DiskStore.scala
--
diff --git a/core/src/main/scala/org/apache/spark/storage/DiskStore.scala 
b/core/src/main/scala/org/apache/spark/storage/DiskStore.scala
index e9304f6..bac459e 100644
--- a/core/src/main/scala/org/apache/spark/storage/DiskStore.scala
+++ 

git commit: SPARK-1656: Fix potential resource leaks

2014-10-05 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/branch-1.1 d9cf4d08a -> c068d9084


SPARK-1656: Fix potential resource leaks

JIRA: https://issues.apache.org/jira/browse/SPARK-1656

Author: zsxwing zsxw...@gmail.com

Closes #577 from zsxwing/SPARK-1656 and squashes the following commits:

c431095 [zsxwing] Add a comment and fix the code style
2de96e5 [zsxwing] Make sure file will be deleted if exception happens
28b90dc [zsxwing] Update to follow the code style
4521d6e [zsxwing] Merge branch 'master' into SPARK-1656
afc3383 [zsxwing] Update to follow the code style
071fdd1 [zsxwing] SPARK-1656: Fix potential resource leaks

(cherry picked from commit a7c73130f1b6b0b8b19a7b0a0de5c713b673cd7b)
Signed-off-by: Andrew Or andrewo...@gmail.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c068d908
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c068d908
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c068d908

Branch: refs/heads/branch-1.1
Commit: c068d9084c94cdd1aeee2c6ad6f55148bc4527ce
Parents: d9cf4d0
Author: zsxwing zsxw...@gmail.com
Authored: Sun Oct 5 09:55:17 2014 -0700
Committer: Andrew Or andrewo...@gmail.com
Committed: Sun Oct 5 09:56:32 2014 -0700

--
 .../apache/spark/broadcast/HttpBroadcast.scala  | 25 
 .../master/FileSystemPersistenceEngine.scala| 14 +++
 .../org/apache/spark/storage/DiskStore.scala| 16 -
 3 files changed, 40 insertions(+), 15 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c068d908/core/src/main/scala/org/apache/spark/broadcast/HttpBroadcast.scala
--
diff --git a/core/src/main/scala/org/apache/spark/broadcast/HttpBroadcast.scala 
b/core/src/main/scala/org/apache/spark/broadcast/HttpBroadcast.scala
index 942dc7d..4cd4f4f 100644
--- a/core/src/main/scala/org/apache/spark/broadcast/HttpBroadcast.scala
+++ b/core/src/main/scala/org/apache/spark/broadcast/HttpBroadcast.scala
@@ -163,18 +163,23 @@ private[broadcast] object HttpBroadcast extends Logging {
 
   private def write(id: Long, value: Any) {
 val file = getFile(id)
-val out: OutputStream = {
-  if (compress) {
-compressionCodec.compressedOutputStream(new FileOutputStream(file))
-  } else {
-new BufferedOutputStream(new FileOutputStream(file), bufferSize)
+val fileOutputStream = new FileOutputStream(file)
+try {
+  val out: OutputStream = {
+if (compress) {
+  compressionCodec.compressedOutputStream(fileOutputStream)
+} else {
+  new BufferedOutputStream(fileOutputStream, bufferSize)
+}
   }
+  val ser = SparkEnv.get.serializer.newInstance()
+  val serOut = ser.serializeStream(out)
+  serOut.writeObject(value)
+  serOut.close()
+  files += file
+} finally {
+  fileOutputStream.close()
 }
-val ser = SparkEnv.get.serializer.newInstance()
-val serOut = ser.serializeStream(out)
-serOut.writeObject(value)
-serOut.close()
-files += file
   }
 
   private def read[T: ClassTag](id: Long): T = {

http://git-wip-us.apache.org/repos/asf/spark/blob/c068d908/core/src/main/scala/org/apache/spark/deploy/master/FileSystemPersistenceEngine.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/deploy/master/FileSystemPersistenceEngine.scala
 
b/core/src/main/scala/org/apache/spark/deploy/master/FileSystemPersistenceEngine.scala
index aa85aa0..08a99bb 100644
--- 
a/core/src/main/scala/org/apache/spark/deploy/master/FileSystemPersistenceEngine.scala
+++ 
b/core/src/main/scala/org/apache/spark/deploy/master/FileSystemPersistenceEngine.scala
@@ -83,15 +83,21 @@ private[spark] class FileSystemPersistenceEngine(
 val serialized = serializer.toBinary(value)
 
 val out = new FileOutputStream(file)
-out.write(serialized)
-out.close()
+try {
+  out.write(serialized)
+} finally {
+  out.close()
+}
   }
 
   def deserializeFromFile[T](file: File)(implicit m: Manifest[T]): T = {
 val fileData = new Array[Byte](file.length().asInstanceOf[Int])
 val dis = new DataInputStream(new FileInputStream(file))
-dis.readFully(fileData)
-dis.close()
+try {
+  dis.readFully(fileData)
+} finally {
+  dis.close()
+}
 
 val clazz = m.runtimeClass.asInstanceOf[Class[T]]
 val serializer = serialization.serializerFor(clazz)

http://git-wip-us.apache.org/repos/asf/spark/blob/c068d908/core/src/main/scala/org/apache/spark/storage/DiskStore.scala
--
diff --git a/core/src/main/scala/org/apache/spark/storage/DiskStore.scala 

git commit: [SPARK-3007][SQL] Fixes dynamic partitioning support for lower Hadoop versions

2014-10-05 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master a7c73130f -> 1b97a941a


[SPARK-3007][SQL] Fixes dynamic partitioning support for lower Hadoop versions

This is a follow up of #2226 and #2616 to fix Jenkins master SBT build failures 
for lower Hadoop versions (1.0.x and 2.0.x).

The root cause is the semantics difference of `FileSystem.globStatus()` between 
different versions of Hadoop, as illustrated by the following test code:

```scala
object GlobExperiments extends App {
  val conf = new Configuration()
  val fs = FileSystem.getLocal(conf)
  fs.globStatus(new Path("/tmp/wh/*/*/*")).foreach { status =>
println(status.getPath)
  }
}
```

Target directory structure:

```
/tmp/wh
├── dir0
│   ├── dir1
│   │   └── level2
│   └── level1
└── level0
```

Hadoop 2.4.1 result:

```
file:/tmp/wh/dir0/dir1/level2
```

Hadoop 1.0.4 result:

```
file:/tmp/wh/dir0/dir1/level2
file:/tmp/wh/dir0/level1
file:/tmp/wh/level0
```

In #2226 and #2616, we call `FileOutputCommitter.commitJob()` at the end of the
job, and the `_SUCCESS` marker file is written. When working with lower Hadoop
versions, due to the `globStatus()` semantics issue, `_SUCCESS` is picked up as
a separate partition data file by `Hive.loadDynamicPartitions()`, and fails
partition spec checking. The fix introduced in this PR is kind of a hack: when
inserting data with dynamic partitioning, we intentionally avoid writing the
`_SUCCESS` marker to work around this issue.

Hive doesn't suffer from this issue because `FileSinkOperator` doesn't call
`FileOutputCommitter.commitJob()`; instead, it calls
`Utilities.mvFileToFinalPath()` to clean up the output directory and then loads
it into the Hive warehouse with
`loadDynamicPartitions()`/`loadPartition()`/`loadTable()`. This approach is
better because it handles failed jobs and speculative tasks properly. We should
add this step to `InsertIntoHiveTable` in another PR.
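
A minimal sketch of the workaround's essence (assuming a Hadoop `JobConf`; the
configuration key is the one referenced in the diff below as
`SUCCESSFUL_JOB_OUTPUT_DIR_MARKER`): disable the `_SUCCESS` marker so
`Hive.loadDynamicPartitions()` never sees it as a bogus partition file.

```scala
import org.apache.hadoop.mapred.JobConf

def disableSuccessMarker(jobConf: JobConf): Unit = {
  // Tell FileOutputCommitter not to write the _SUCCESS marker file.
  jobConf.setBoolean("mapreduce.fileoutputcommitter.marksuccessfuljobs", false)
}
```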

Author: Cheng Lian lian.cs@gmail.com

Closes #2663 from liancheng/dp-hadoop-1-fix and squashes the following commits:

0177dae [Cheng Lian] Fixes dynamic partitioning support for lower Hadoop 
versions


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1b97a941
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1b97a941
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1b97a941

Branch: refs/heads/master
Commit: 1b97a941a09a2f63d442f435c1b444d857cd6956
Parents: a7c7313
Author: Cheng Lian lian.cs@gmail.com
Authored: Sun Oct 5 11:19:17 2014 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Sun Oct 5 11:19:17 2014 -0700

--
 .../spark/sql/hive/hiveWriterContainers.scala   | 26 +---
 1 file changed, 22 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1b97a941/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveWriterContainers.scala
--
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveWriterContainers.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveWriterContainers.scala
index ac5c7a8..6ccbc22 100644
--- 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveWriterContainers.scala
+++ 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveWriterContainers.scala
@@ -55,8 +55,8 @@ private[hive] class SparkHiveWriterContainer(
   private var taID: SerializableWritable[TaskAttemptID] = null
 
   @transient private var writer: FileSinkOperator.RecordWriter = null
-  @transient private lazy val committer = conf.value.getOutputCommitter
-  @transient private lazy val jobContext = newJobContext(conf.value, jID.value)
+  @transient protected lazy val committer = conf.value.getOutputCommitter
+  @transient protected lazy val jobContext = newJobContext(conf.value, 
jID.value)
   @transient private lazy val taskContext = newTaskAttemptContext(conf.value, 
taID.value)
   @transient private lazy val outputFormat =
 conf.value.getOutputFormat.asInstanceOf[HiveOutputFormat[AnyRef,Writable]]
@@ -122,8 +122,6 @@ private[hive] class SparkHiveWriterContainer(
 }
   }
 
-  // * Private Functions *
-
   private def setIDs(jobId: Int, splitId: Int, attemptId: Int) {
 jobID = jobId
 splitID = splitId
@@ -157,12 +155,18 @@ private[hive] object SparkHiveWriterContainer {
   }
 }
 
+private[spark] object SparkHiveDynamicPartitionWriterContainer {
+  val SUCCESSFUL_JOB_OUTPUT_DIR_MARKER = "mapreduce.fileoutputcommitter.marksuccessfuljobs"
+}
+
 private[spark] class SparkHiveDynamicPartitionWriterContainer(
 @transient jobConf: JobConf,
 fileSinkConf: FileSinkDesc,
 dynamicPartColNames: Array[String])
   extends SparkHiveWriterContainer(jobConf, fileSinkConf) {
 
+  import 

git commit: HOTFIX: Fix unicode error in merge script.

2014-10-05 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/master 1b97a941a -> e21e2


HOTFIX: Fix unicode error in merge script.

The merge script builds up a big command array and sometimes
this contains both unicode and ascii strings. This doesn't work
if you try to join them into a single string. Longer term a solution
is to go and make sure the source of all strings is unicode.

This patch provides a simpler solution... just print the array
rather than joining. I actually prefer printing an array here
anyways since joining on spaces is lossy in the case of arguments
that themselves contain spaces.

Author: Patrick Wendell pwend...@gmail.com

Closes #2645 from pwendell/merge-script and squashes the following commits:

167b792 [Patrick Wendell] HOTFIX: Fix unicode error in merge script.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e21e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e21e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e21e

Branch: refs/heads/master
Commit: e21e24c122300bbde6d5ec4002a7c42b2e24
Parents: 1b97a94
Author: Patrick Wendell pwend...@gmail.com
Authored: Sun Oct 5 13:22:40 2014 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Sun Oct 5 13:22:40 2014 -0700

--
 dev/merge_spark_pr.py | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e21e/dev/merge_spark_pr.py
--
diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index a8e92e3..02ac209 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -73,11 +73,10 @@ def fail(msg):
 
 
 def run_cmd(cmd):
+    print cmd
     if isinstance(cmd, list):
-        print " ".join(cmd)
         return subprocess.check_output(cmd)
     else:
-        print cmd
         return subprocess.check_output(cmd.split(" "))
 
 





git commit: [SPARK-3792][SQL] Enable JavaHiveQLSuite

2014-10-05 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 79b2108de -> 58f5361ca


[SPARK-3792][SQL] Enable JavaHiveQLSuite

Do not use TestSQLContext in JavaHiveQLSuite, as that may lead to two
SparkContexts in one JVM, and enable JavaHiveQLSuite.

Author: scwf wangf...@huawei.com

Closes #2652 from scwf/fix-JavaHiveQLSuite and squashes the following commits:

be35c91 [scwf] enable JavaHiveQLSuite


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/58f5361c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/58f5361c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/58f5361c

Branch: refs/heads/master
Commit: 58f5361caaa2f898e38ae4b3794167881e20a818
Parents: 79b2108
Author: scwf wangf...@huawei.com
Authored: Sun Oct 5 17:47:20 2014 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Sun Oct 5 17:49:41 2014 -0700

--
 .../sql/hive/api/java/JavaHiveQLSuite.scala | 27 +++-
 1 file changed, 9 insertions(+), 18 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/58f5361c/sql/hive/src/test/scala/org/apache/spark/sql/hive/api/java/JavaHiveQLSuite.scala
--
diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/api/java/JavaHiveQLSuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/api/java/JavaHiveQLSuite.scala
index 9644b70..46b11b5 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/api/java/JavaHiveQLSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/api/java/JavaHiveQLSuite.scala
@@ -25,34 +25,30 @@ import org.apache.spark.api.java.JavaSparkContext
 import org.apache.spark.sql.api.java.JavaSchemaRDD
 import org.apache.spark.sql.execution.ExplainCommand
 import org.apache.spark.sql.hive.test.TestHive
-import org.apache.spark.sql.test.TestSQLContext
 
 // Implicits
 import scala.collection.JavaConversions._
 
 class JavaHiveQLSuite extends FunSuite {
-  lazy val javaCtx = new JavaSparkContext(TestSQLContext.sparkContext)
+  lazy val javaCtx = new JavaSparkContext(TestHive.sparkContext)
 
   // There is a little trickery here to avoid instantiating two HiveContexts 
in the same JVM
   lazy val javaHiveCtx = new JavaHiveContext(javaCtx) {
 override val sqlContext = TestHive
   }
 
-  ignore("SELECT * FROM src") {
+  test("SELECT * FROM src") {
     assert(
       javaHiveCtx.sql("SELECT * FROM src").collect().map(_.getInt(0)) ===
         TestHive.sql("SELECT * FROM src").collect().map(_.getInt(0)).toSeq)
   }
 
-  private val explainCommandClassName =
-    classOf[ExplainCommand].getSimpleName.stripSuffix("$")
-
   def isExplanation(result: JavaSchemaRDD) = {
     val explanation = result.collect().map(_.getString(0))
-    explanation.size > 1 && explanation.head.startsWith(explainCommandClassName)
+    explanation.size > 1 && explanation.head.startsWith("== Physical Plan ==")
   }
 
-  ignore("Query Hive native command execution result") {
+  test("Query Hive native command execution result") {
     val tableName = "test_native_commands"
 
 assertResult(0) {
@@ -63,23 +59,18 @@ class JavaHiveQLSuite extends FunSuite {
       javaHiveCtx.sql(s"CREATE TABLE $tableName(key INT, value STRING)").count()
 }
 
-    javaHiveCtx.sql("SHOW TABLES").registerTempTable("show_tables")
-
     assert(
       javaHiveCtx
-        .sql("SELECT result FROM show_tables")
+        .sql("SHOW TABLES")
         .collect()
         .map(_.getString(0))
         .contains(tableName))
 
-    assertResult(Array(Array("key", "int", None), Array("value", "string", None))) {
-      javaHiveCtx.sql(s"DESCRIBE $tableName").registerTempTable("describe_table")
-
-
+    assertResult(Array(Array("key", "int"), Array("value", "string"))) {
       javaHiveCtx
-        .sql("SELECT result FROM describe_table")
+        .sql(s"describe $tableName")
         .collect()
-        .map(_.getString(0).split("\t").map(_.trim))
+        .map(row => Array(row.get(0).asInstanceOf[String], row.get(1).asInstanceOf[String]))
 .toArray
 }
 
@@ -89,7 +80,7 @@ class JavaHiveQLSuite extends FunSuite {
 TestHive.reset()
   }
 
-  ignore("Exactly once semantics for DDL and command statements") {
+  test("Exactly once semantics for DDL and command statements") {
     val tableName = "test_exactly_once"
     val q0 = javaHiveCtx.sql(s"CREATE TABLE $tableName(key INT, value STRING)")
 





git commit: [SPARK-3792][SQL] Enable JavaHiveQLSuite

2014-10-05 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/branch-1.1 c068d9084 -> 964e3aa48


[SPARK-3792][SQL] Enable JavaHiveQLSuite

Do not use TestSQLContext in JavaHiveQLSuite, as that may lead to two
SparkContexts in one JVM, and enable JavaHiveQLSuite.

Author: scwf wangf...@huawei.com

Closes #2652 from scwf/fix-JavaHiveQLSuite and squashes the following commits:

be35c91 [scwf] enable JavaHiveQLSuite

(cherry picked from commit 58f5361caaa2f898e38ae4b3794167881e20a818)
Signed-off-by: Michael Armbrust mich...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/964e3aa4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/964e3aa4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/964e3aa4

Branch: refs/heads/branch-1.1
Commit: 964e3aa4800a31037e00b533f965b0f162d29e67
Parents: c068d90
Author: scwf wangf...@huawei.com
Authored: Sun Oct 5 17:47:20 2014 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Sun Oct 5 17:50:11 2014 -0700

--
 .../sql/hive/api/java/JavaHiveQLSuite.scala | 27 +++-
 1 file changed, 9 insertions(+), 18 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/964e3aa4/sql/hive/src/test/scala/org/apache/spark/sql/hive/api/java/JavaHiveQLSuite.scala
--
diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/api/java/JavaHiveQLSuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/api/java/JavaHiveQLSuite.scala
index 9644b70..46b11b5 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/api/java/JavaHiveQLSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/api/java/JavaHiveQLSuite.scala
@@ -25,34 +25,30 @@ import org.apache.spark.api.java.JavaSparkContext
 import org.apache.spark.sql.api.java.JavaSchemaRDD
 import org.apache.spark.sql.execution.ExplainCommand
 import org.apache.spark.sql.hive.test.TestHive
-import org.apache.spark.sql.test.TestSQLContext
 
 // Implicits
 import scala.collection.JavaConversions._
 
 class JavaHiveQLSuite extends FunSuite {
-  lazy val javaCtx = new JavaSparkContext(TestSQLContext.sparkContext)
+  lazy val javaCtx = new JavaSparkContext(TestHive.sparkContext)
 
   // There is a little trickery here to avoid instantiating two HiveContexts 
in the same JVM
   lazy val javaHiveCtx = new JavaHiveContext(javaCtx) {
 override val sqlContext = TestHive
   }
 
-  ignore("SELECT * FROM src") {
+  test("SELECT * FROM src") {
     assert(
       javaHiveCtx.sql("SELECT * FROM src").collect().map(_.getInt(0)) ===
         TestHive.sql("SELECT * FROM src").collect().map(_.getInt(0)).toSeq)
   }
 
-  private val explainCommandClassName =
-    classOf[ExplainCommand].getSimpleName.stripSuffix("$")
-
   def isExplanation(result: JavaSchemaRDD) = {
     val explanation = result.collect().map(_.getString(0))
-    explanation.size > 1 && explanation.head.startsWith(explainCommandClassName)
+    explanation.size > 1 && explanation.head.startsWith("== Physical Plan ==")
   }
 
-  ignore("Query Hive native command execution result") {
+  test("Query Hive native command execution result") {
     val tableName = "test_native_commands"
 
 assertResult(0) {
@@ -63,23 +59,18 @@ class JavaHiveQLSuite extends FunSuite {
       javaHiveCtx.sql(s"CREATE TABLE $tableName(key INT, value STRING)").count()
 }
 
-    javaHiveCtx.sql("SHOW TABLES").registerTempTable("show_tables")
-
     assert(
       javaHiveCtx
-        .sql("SELECT result FROM show_tables")
+        .sql("SHOW TABLES")
         .collect()
         .map(_.getString(0))
         .contains(tableName))
 
-    assertResult(Array(Array("key", "int", None), Array("value", "string", None))) {
-      javaHiveCtx.sql(s"DESCRIBE $tableName").registerTempTable("describe_table")
-
-
+    assertResult(Array(Array("key", "int"), Array("value", "string"))) {
       javaHiveCtx
-        .sql("SELECT result FROM describe_table")
+        .sql(s"describe $tableName")
         .collect()
-        .map(_.getString(0).split("\t").map(_.trim))
+        .map(row => Array(row.get(0).asInstanceOf[String], row.get(1).asInstanceOf[String]))
 .toArray
 }
 
@@ -89,7 +80,7 @@ class JavaHiveQLSuite extends FunSuite {
 TestHive.reset()
   }
 
-  ignore("Exactly once semantics for DDL and command statements") {
+  test("Exactly once semantics for DDL and command statements") {
     val tableName = "test_exactly_once"
     val q0 = javaHiveCtx.sql(s"CREATE TABLE $tableName(key INT, value STRING)")
 





git commit: [SPARK-3645][SQL] Makes table caching eager by default and adds syntax for lazy caching

2014-10-05 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 58f5361ca -> 34b97a067


[SPARK-3645][SQL] Makes table caching eager by default and adds syntax for lazy 
caching

Although lazy caching for in-memory tables seems consistent with the
`RDD.cache()` API, it's relatively confusing for users who mainly work with SQL
and are not familiar with Spark internals. The `CACHE TABLE t; SELECT COUNT(*)
FROM t;` pattern is also commonly seen, just to ensure predictable performance.

This PR makes both the `CACHE TABLE t [AS SELECT ...]` statement and the 
`SQLContext.cacheTable()` API eager by default, and adds a new `CACHE LAZY 
TABLE t [AS SELECT ...]` syntax to provide lazy in-memory table caching.
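
A usage sketch of the resulting behavior, against a `SQLContext`/`HiveContext`
(the table names here are hypothetical):

```scala
import org.apache.spark.sql.SQLContext

def cachingExamples(sqlContext: SQLContext): Unit = {
  sqlContext.sql("CACHE TABLE logs")                              // eager: materialized right away
  sqlContext.sql("CACHE LAZY TABLE recent AS SELECT * FROM logs") // lazy: materialized on first use
  sqlContext.cacheTable("logs")                                   // also eager after this change
  sqlContext.sql("UNCACHE TABLE logs")
}
```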

Also took the chance to do some refactoring: `CacheCommand` and
`CacheTableAsSelectCommand` are now merged and renamed to `CacheTableCommand`,
since the former is strictly a special case of the latter. A new
`UncacheTableCommand` is added for the `UNCACHE TABLE t` statement.

Author: Cheng Lian lian.cs@gmail.com

Closes #2513 from liancheng/eager-caching and squashes the following commits:

fe92287 [Cheng Lian] Makes table caching eager by default and adds syntax for 
lazy caching


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/34b97a06
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/34b97a06
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/34b97a06

Branch: refs/heads/master
Commit: 34b97a067d1b370fbed8ecafab2f48501a35d783
Parents: 58f5361
Author: Cheng Lian lian.cs@gmail.com
Authored: Sun Oct 5 17:51:59 2014 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Sun Oct 5 17:51:59 2014 -0700

--
 .../apache/spark/sql/catalyst/SqlParser.scala   |  45 +++---
 .../spark/sql/catalyst/analysis/Catalog.scala   |   2 +-
 .../sql/catalyst/plans/logical/commands.scala   |  15 +-
 .../org/apache/spark/sql/CacheManager.scala |   9 +-
 .../columnar/InMemoryColumnarTableScan.scala|   2 +-
 .../spark/sql/execution/SparkStrategies.scala   |   8 +-
 .../apache/spark/sql/execution/commands.scala   |  47 +++---
 .../org/apache/spark/sql/CachedTableSuite.scala | 145 +--
 .../spark/sql/hive/ExtendedHiveQlParser.scala   |  66 -
 .../org/apache/spark/sql/hive/TestHive.scala|   6 +-
 .../spark/sql/hive/CachedTableSuite.scala   |  78 +++---
 11 files changed, 265 insertions(+), 158 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/34b97a06/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
index 2633633..854b5b4 100755
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
@@ -67,11 +67,12 @@ class SqlParser extends StandardTokenParsers with 
PackratParsers {
   protected implicit def asParser(k: Keyword): Parser[String] =
     lexical.allCaseVersions(k.str).map(x => x : Parser[String]).reduce(_ | _)
 
+  protected val ABS = Keyword("ABS")
   protected val ALL = Keyword("ALL")
   protected val AND = Keyword("AND")
+  protected val APPROXIMATE = Keyword("APPROXIMATE")
   protected val AS = Keyword("AS")
   protected val ASC = Keyword("ASC")
-  protected val APPROXIMATE = Keyword("APPROXIMATE")
   protected val AVG = Keyword("AVG")
   protected val BETWEEN = Keyword("BETWEEN")
   protected val BY = Keyword("BY")
@@ -80,9 +81,9 @@ class SqlParser extends StandardTokenParsers with 
PackratParsers {
   protected val COUNT = Keyword("COUNT")
   protected val DESC = Keyword("DESC")
   protected val DISTINCT = Keyword("DISTINCT")
+  protected val EXCEPT = Keyword("EXCEPT")
   protected val FALSE = Keyword("FALSE")
   protected val FIRST = Keyword("FIRST")
-  protected val LAST = Keyword("LAST")
   protected val FROM = Keyword("FROM")
   protected val FULL = Keyword("FULL")
   protected val GROUP = Keyword("GROUP")
@@ -91,42 +92,42 @@ class SqlParser extends StandardTokenParsers with 
PackratParsers {
   protected val IN = Keyword("IN")
   protected val INNER = Keyword("INNER")
   protected val INSERT = Keyword("INSERT")
+  protected val INTERSECT = Keyword("INTERSECT")
   protected val INTO = Keyword("INTO")
   protected val IS = Keyword("IS")
   protected val JOIN = Keyword("JOIN")
+  protected val LAST = Keyword("LAST")
+  protected val LAZY = Keyword("LAZY")
   protected val LEFT = Keyword("LEFT")
+  protected val LIKE = Keyword("LIKE")
   protected val LIMIT = Keyword("LIMIT")
+  protected val LOWER = Keyword("LOWER")
   protected val MAX = Keyword("MAX")
   protected val MIN = Keyword("MIN")
   protected val NOT = Keyword("NOT")
   protected val NULL = 

git commit: [SPARK-3776][SQL] Wrong conversion to Catalyst for Option[Product]

2014-10-05 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 34b97a067 -> 90897ea5f


[SPARK-3776][SQL] Wrong conversion to Catalyst for Option[Product]

Author: Renat Yusupov re.yusu...@2gis.ru

Closes #2641 from r3natko/feature/catalyst_option and squashes the following 
commits:

55d0c06 [Renat Yusupov] [SQL] SPARK-3776: Wrong conversion to Catalyst for 
Option[Product]
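
A minimal standalone sketch (with a hypothetical `convert`, mirroring the shape
of `ScalaReflection.convertToCatalyst` in the diff below) of why the recursive
`map` matters for `Option[Product]`: with the old `o.orNull`, the wrapped case
class leaked through unconverted; after the fix it is converted like any other
`Product`.

```scala
object OptionConversionSketch extends App {
  case class Point(x: Int, y: Int)

  def convert(a: Any): Any = a match {
    case o: Option[_] => o.map(convert).orNull           // the fix: recurse into the wrapped value
    case p: Product   => p.productIterator.map(convert).toList
    case other        => other
  }

  println(convert(Some(Point(1, 2))))  // List(1, 2); with plain `o.orNull` it would stay Point(1,2)
}
```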


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/90897ea5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/90897ea5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/90897ea5

Branch: refs/heads/master
Commit: 90897ea5f24b03c9f3455a62c7f68b3d3f0435ad
Parents: 34b97a0
Author: Renat Yusupov re.yusu...@2gis.ru
Authored: Sun Oct 5 17:56:24 2014 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Sun Oct 5 17:56:34 2014 -0700

--
 .../spark/sql/catalyst/ScalaReflection.scala|  2 +-
 .../sql/catalyst/ScalaReflectionSuite.scala | 21 +---
 2 files changed, 19 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/90897ea5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
index 88a8fa7..b3ae8e6 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
@@ -33,7 +33,7 @@ object ScalaReflection {
 
   /** Converts Scala objects to catalyst rows / types */
   def convertToCatalyst(a: Any): Any = a match {
-    case o: Option[_] => o.orNull
+    case o: Option[_] => o.map(convertToCatalyst).orNull
     case s: Seq[_] => s.map(convertToCatalyst)
     case m: Map[_, _] => m.map { case (k, v) => convertToCatalyst(k) -> convertToCatalyst(v) }
     case p: Product => new GenericRow(p.productIterator.map(convertToCatalyst).toArray)

http://git-wip-us.apache.org/repos/asf/spark/blob/90897ea5/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala
--
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala
index 428607d..488e373 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala
@@ -53,7 +53,8 @@ case class OptionalData(
 floatField: Option[Float],
 shortField: Option[Short],
 byteField: Option[Byte],
-booleanField: Option[Boolean])
+booleanField: Option[Boolean],
+structField: Option[PrimitiveData])
 
 case class ComplexData(
 arrayField: Seq[Int],
@@ -100,7 +101,7 @@ class ScalaReflectionSuite extends FunSuite {
   nullable = true))
   }
 
-  test("optinal data") {
+  test("optional data") {
 val schema = schemaFor[OptionalData]
 assert(schema === Schema(
   StructType(Seq(
@@ -110,7 +111,8 @@ class ScalaReflectionSuite extends FunSuite {
         StructField("floatField", FloatType, nullable = true),
         StructField("shortField", ShortType, nullable = true),
         StructField("byteField", ByteType, nullable = true),
-        StructField("booleanField", BooleanType, nullable = true))),
+        StructField("booleanField", BooleanType, nullable = true),
+        StructField("structField", schemaFor[PrimitiveData].dataType, nullable = true))),
   nullable = true))
   }
 
@@ -228,4 +230,17 @@ class ScalaReflectionSuite extends FunSuite {
 assert(ArrayType(IntegerType) === typeOfObject3(Seq(1, 2, 3)))
     assert(ArrayType(ArrayType(IntegerType)) === typeOfObject3(Seq(Seq(1,2,3))))
   }
+
+  test("convert PrimitiveData to catalyst") {
+    val data = PrimitiveData(1, 1, 1, 1, 1, 1, true)
+    val convertedData = Seq(1, 1.toLong, 1.toDouble, 1.toFloat, 1.toShort, 1.toByte, true)
+    assert(convertToCatalyst(data) === convertedData)
+  }
+
+  test("convert Option[Product] to catalyst") {
+    val primitiveData = PrimitiveData(1, 1, 1, 1, 1, 1, true)
+    val data = OptionalData(Some(1), Some(1), Some(1), Some(1), Some(1), Some(1), Some(true), Some(primitiveData))
+    val convertedData = Seq(1, 1.toLong, 1.toDouble, 1.toFloat, 1.toShort, 1.toByte, true, convertToCatalyst(primitiveData))
+    assert(convertToCatalyst(data) === convertedData)
+  }
 }



git commit: SPARK-3794 [CORE] Building spark core fails due to inadvertent dependency on Commons IO

2014-10-05 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 90897ea5f -> 8d22dbb5e


SPARK-3794 [CORE] Building spark core fails due to inadvertent dependency on 
Commons IO

Remove references to Commons IO FileUtils and replace with pure Java version, 
which doesn't need to traverse the whole directory tree first.

I think this method could be refined further if it would be alright to rename 
it and its args and break it down into two methods. I'm starting with a simple 
recursive rendition.
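
A hedged sketch of the idea (standalone, not the patched method itself; the
`Option(...)` null-guard is an addition for the sketch): recurse over
`File.listFiles()` and let `exists` short-circuit on the first new file, rather
than materializing the whole directory tree up front the way
`FileUtils.listFilesAndDirs` does.

```scala
import java.io.File

def containsNewFile(dir: File, cutoffMillis: Long): Boolean = {
  // listFiles() can return null for non-directories or I/O errors; guard for the sketch.
  val children = Option(dir.listFiles()).getOrElse(Array.empty[File])
  children.exists(_.lastModified() > cutoffMillis) ||
    children.filter(_.isDirectory).exists(containsNewFile(_, cutoffMillis))
}
```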

Author: Sean Owen so...@cloudera.com

Closes #2662 from srowen/SPARK-3794 and squashes the following commits:

4cd172f [Sean Owen] Remove references to Commons IO FileUtils and replace with 
pure Java version, which doesn't need to traverse the whole directory tree first


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8d22dbb5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8d22dbb5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8d22dbb5

Branch: refs/heads/master
Commit: 8d22dbb5ec7a0727afdfebbbc2c57ffdb384dd0b
Parents: 90897ea
Author: Sean Owen so...@cloudera.com
Authored: Sun Oct 5 18:44:12 2014 -0700
Committer: Michael Armbrust mich...@databricks.com
Committed: Sun Oct 5 18:44:12 2014 -0700

--
 .../org/apache/spark/deploy/worker/Worker.scala |  1 -
 .../scala/org/apache/spark/util/Utils.scala | 20 ++--
 2 files changed, 10 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8d22dbb5/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala 
b/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
index 3b13f43..9b52cb0 100755
--- a/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
@@ -29,7 +29,6 @@ import scala.language.postfixOps
 
 import akka.actor._
 import akka.remote.{DisassociatedEvent, RemotingLifecycleEvent}
-import org.apache.commons.io.FileUtils
 
 import org.apache.spark.{Logging, SecurityManager, SparkConf, SparkException}
 import org.apache.spark.deploy.{ExecutorDescription, ExecutorState}

http://git-wip-us.apache.org/repos/asf/spark/blob/8d22dbb5/core/src/main/scala/org/apache/spark/util/Utils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/util/Utils.scala 
b/core/src/main/scala/org/apache/spark/util/Utils.scala
index a671241..3d307b3 100644
--- a/core/src/main/scala/org/apache/spark/util/Utils.scala
+++ b/core/src/main/scala/org/apache/spark/util/Utils.scala
@@ -35,8 +35,6 @@ import scala.util.control.{ControlThrowable, NonFatal}
 
 import com.google.common.io.Files
 import com.google.common.util.concurrent.ThreadFactoryBuilder
-import org.apache.commons.io.FileUtils
-import org.apache.commons.io.filefilter.TrueFileFilter
 import org.apache.commons.lang3.SystemUtils
 import org.apache.hadoop.conf.Configuration
 import org.apache.log4j.PropertyConfigurator
@@ -710,18 +708,20 @@ private[spark] object Utils extends Logging {
* Determines if a directory contains any files newer than cutoff seconds.
* 
* @param dir must be the path to a directory, or IllegalArgumentException 
is thrown
-   * @param cutoff measured in seconds. Returns true if there are any files in 
dir newer than this.
+   * @param cutoff measured in seconds. Returns true if there are any files or 
directories in the
+   *   given directory whose last modified time is later than this 
many seconds ago
*/
   def doesDirectoryContainAnyNewFiles(dir: File, cutoff: Long): Boolean = {
-    val currentTimeMillis = System.currentTimeMillis
     if (!dir.isDirectory) {
-      throw new IllegalArgumentException (dir + " is not a directory!")
-    } else {
-      val files = FileUtils.listFilesAndDirs(dir, TrueFileFilter.TRUE, TrueFileFilter.TRUE)
-      val cutoffTimeInMillis = (currentTimeMillis - (cutoff * 1000))
-      val newFiles = files.filter { _.lastModified > cutoffTimeInMillis }
-      newFiles.nonEmpty
+      throw new IllegalArgumentException(s"$dir is not a directory!")
     }
+    val filesAndDirs = dir.listFiles()
+    val cutoffTimeInMillis = System.currentTimeMillis - (cutoff * 1000)
+
+    filesAndDirs.exists(_.lastModified() > cutoffTimeInMillis) ||
+    filesAndDirs.filter(_.isDirectory).exists(
+      subdir => doesDirectoryContainAnyNewFiles(subdir, cutoff)
+    )
   }
 
   /**





git commit: Rectify generic parameter names between SparkContext and AccumulablePa...

2014-10-05 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/master 8d22dbb5e -> fd7b15539


Rectify generic parameter names between SparkContext and AccumulablePa...

AccumulableParam gave its generic parameters as 'R, T', whereas SparkContext 
labeled them 'T, R'.

Trivial, but really confusing.

I resolved this in favor of AccumulableParam, because it seemed to have some 
logic for its names.  I also extended this minimal, but at least present, 
justification into the SparkContext comments.
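
A usage sketch showing why `R` (result) and `T` (added element) read naturally
in this order, assuming the Spark 1.x `AccumulableParam` API (the set-of-strings
accumulator below is illustrative):

```scala
import org.apache.spark.{AccumulableParam, SparkContext}

// R = Set[String] is what the driver reads back; T = String is what tasks add.
object StringSetParam extends AccumulableParam[Set[String], String] {
  def addAccumulator(r: Set[String], t: String): Set[String] = r + t
  def addInPlace(r1: Set[String], r2: Set[String]): Set[String] = r1 ++ r2
  def zero(initial: Set[String]): Set[String] = Set.empty[String]
}

def distinctKeys(sc: SparkContext): Set[String] = {
  val seen = sc.accumulable(Set.empty[String])(StringSetParam)
  sc.parallelize(Seq("a", "b", "a")).foreach(key => seen += key)  // tasks add Ts
  seen.value                                                      // driver reads the R
}
```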

Author: Nathan Kronenfeld nkronenf...@oculusinfo.com

Closes #2637 from nkronenfeld/accumulators and squashes the following commits:

98d6b74 [Nathan Kronenfeld] Rectify generic parameter names between 
SparkContext and AccumulableParam


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fd7b1553
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fd7b1553
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fd7b1553

Branch: refs/heads/master
Commit: fd7b15539669b14996a51610d6724ca0811f9d65
Parents: 8d22dbb
Author: Nathan Kronenfeld nkronenf...@oculusinfo.com
Authored: Sun Oct 5 21:03:48 2014 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Sun Oct 5 21:03:48 2014 -0700

--
 core/src/main/scala/org/apache/spark/SparkContext.scala | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/fd7b1553/core/src/main/scala/org/apache/spark/SparkContext.scala
--
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala 
b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 97109b9..396cdd1 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -779,20 +779,20 @@ class SparkContext(config: SparkConf) extends Logging {
   /**
* Create an [[org.apache.spark.Accumulable]] shared variable, to which 
tasks can add values
* with `+=`. Only the driver can access the accumuable's `value`.
-   * @tparam T accumulator type
-   * @tparam R type that can be added to the accumulator
+   * @tparam R accumulator result type
+   * @tparam T type that can be added to the accumulator
*/
-  def accumulable[T, R](initialValue: T)(implicit param: AccumulableParam[T, 
R]) =
+  def accumulable[R, T](initialValue: R)(implicit param: AccumulableParam[R, 
T]) =
 new Accumulable(initialValue, param)
 
   /**
* Create an [[org.apache.spark.Accumulable]] shared variable, with a name 
for display in the
* Spark UI. Tasks can add values to the accumuable using the `+=` operator. 
Only the driver can
* access the accumuable's `value`.
-   * @tparam T accumulator type
-   * @tparam R type that can be added to the accumulator
+   * @tparam R accumulator result type
+   * @tparam T type that can be added to the accumulator
*/
-  def accumulable[T, R](initialValue: T, name: String)(implicit param: 
AccumulableParam[T, R]) =
+  def accumulable[R, T](initialValue: R, name: String)(implicit param: 
AccumulableParam[R, T]) =
 new Accumulable(initialValue, param, Some(name))
 
   /**





git commit: [SPARK-3765][Doc] Add test information to sbt build docs

2014-10-05 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/master fd7b15539 -> c9ae79fba


[SPARK-3765][Doc] Add test information to sbt build docs

Add testing with sbt to doc ```building-spark.md```

Author: scwf wangf...@huawei.com

Closes #2629 from scwf/sbt-doc and squashes the following commits:

fd9cf29 [scwf] add testing with sbt to docs


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c9ae79fb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c9ae79fb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c9ae79fb

Branch: refs/heads/master
Commit: c9ae79fba25cd49ca70ca398bc75434202d26a97
Parents: fd7b155
Author: scwf wangf...@huawei.com
Authored: Sun Oct 5 21:36:20 2014 -0700
Committer: Patrick Wendell pwend...@gmail.com
Committed: Sun Oct 5 21:36:28 2014 -0700

--
 docs/building-spark.md | 15 +++
 1 file changed, 15 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c9ae79fb/docs/building-spark.md
--
diff --git a/docs/building-spark.md b/docs/building-spark.md
index 901c157..b2940ee 100644
--- a/docs/building-spark.md
+++ b/docs/building-spark.md
@@ -171,6 +171,21 @@ can be set to control the SBT build. For example:
 
 sbt/sbt -Pyarn -Phadoop-2.3 assembly
 
+# Testing with SBT
+
+Some of the tests require Spark to be packaged first, so always run `sbt/sbt 
assembly` the first time.  The following is an example of a correct (build, 
test) sequence:
+
+sbt/sbt -Pyarn -Phadoop-2.3 -Phive assembly
+sbt/sbt -Pyarn -Phadoop-2.3 -Phive test
+
+To run only a specific test suite as follows:
+
+    sbt/sbt -Pyarn -Phadoop-2.3 -Phive "test-only org.apache.spark.repl.ReplSuite"
+
+To run test suites of a specific sub project as follows:
+
+sbt/sbt -Pyarn -Phadoop-2.3 -Phive core/test
+
 # Speeding up Compilation with Zinc
 
 [Zinc](https://github.com/typesafehub/zinc) is a long-running server version 
of SBT's incremental

