date:20160315

[GitHub] spark pull request: [SPARK-13920][BUILD] MIMA checks should apply ...

2016-03-15 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/11751#issuecomment-197164762
  
LGTM assuming that MiMa checks pass.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13921] Store serialized blocks as multi...

2016-03-15 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/11748#discussion_r56284643
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala ---
@@ -81,8 +81,9 @@ private[spark] class TaskResultGetter(sparkEnv: SparkEnv, 
scheduler: TaskSchedul
   taskSetManager, tid, TaskState.FINISHED, TaskResultLost)
 return
   }
+  // TODO(josh): assumption that there is only one chunk here 
is a hack
   val deserializedResult = 
serializer.get().deserialize[DirectTaskResult[_]](
-serializedTaskResult.get)
+serializedTaskResult.get.getChunks().head)
--- End diff --

This is a messy temporary hack caused by some interface mismatching; I'll 
see about fixing this so that we don't make any assumptions about there only 
being one chunk here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13281][CORE] Switch broadcast of RDD to...

2016-03-15 Thread breakdawn

Github user breakdawn commented on the pull request:

https://github.com/apache/spark/pull/11735#issuecomment-197163774
  
Is it ok to test, or is there any action i can follow up?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13316][Streaming]add check to avoid reg...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11753#issuecomment-197163615
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53275/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13316][Streaming]add check to avoid reg...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11753#issuecomment-197163614
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13316][Streaming]add check to avoid reg...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11753#issuecomment-197163501
  
**[Test build #53275 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53275/consoleFull)**
 for PR 11753 at commit 
[`51a0399`](https://github.com/apache/spark/commit/51a03994ae3f66fa80b41a488f27b3fe94abe0ff).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13921] Store serialized blocks as multi...

2016-03-15 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/11748#discussion_r56284364
  
--- Diff: 
core/src/test/scala/org/apache/spark/io/ChunkedByteBufferSuite.scala ---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.io
+
+import java.nio.ByteBuffer
+
+import com.google.common.io.ByteStreams
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.network.util.ByteArrayWritableChannel
+import org.apache.spark.util.io.ChunkedByteBuffer
+
+class ChunkedByteBufferSuite extends SparkFunSuite {
+
+  test("must have at least one chunk") {
--- End diff --

I think it marginally simplified some initialization code.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13921] Store serialized blocks as multi...

2016-03-15 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/11748#discussion_r56284334
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.io
+
+import java.io.InputStream
+import java.nio.ByteBuffer
+import java.nio.channels.WritableByteChannel
+
+import com.google.common.primitives.UnsignedBytes
+import io.netty.buffer.{ByteBuf, Unpooled}
+
+import org.apache.spark.network.util.ByteArrayWritableChannel
+import org.apache.spark.storage.BlockManager
+
+private[spark] class ChunkedByteBuffer(var chunks: Array[ByteBuffer]) {
+  require(chunks != null, "chunks must not be null")
+  require(chunks.nonEmpty, "Cannot create a ChunkedByteBuffer with no 
chunks")
+  require(chunks.forall(_.limit() > 0), "chunks must be non-empty")
+  require(chunks.forall(_.position() == 0), "chunks' positions must be 0")
+
+  val limit: Long = chunks.map(_.limit().asInstanceOf[Long]).sum
+
+  def this(byteBuffer: ByteBuffer) = {
+this(Array(byteBuffer))
+  }
+
+  def writeFully(channel: WritableByteChannel): Unit = {
+for (bytes <- getChunks()) {
+  while (bytes.remaining > 0) {
+channel.write(bytes)
+  }
+}
+  }
+
+  def toNetty: ByteBuf = {
+Unpooled.wrappedBuffer(getChunks(): _*)
+  }
+
+  def toArray: Array[Byte] = {
+if (limit >= Integer.MAX_VALUE) {
+  throw new UnsupportedOperationException(
+s"cannot call toArray because buffer size ($limit bytes) exceeds 
maximum array size")
+}
+val byteChannel = new ByteArrayWritableChannel(limit.toInt)
+writeFully(byteChannel)
+byteChannel.close()
+byteChannel.getData
+  }
+
+  def toByteBuffer: ByteBuffer = {
+if (chunks.length == 1) {
+  chunks.head
+} else {
+  ByteBuffer.wrap(toArray)
+}
+  }
+
+  def toInputStream(dispose: Boolean = false): InputStream = {
+new ChunkedByteBufferInputStream(this, dispose)
+  }
+
+  def getChunks(): Array[ByteBuffer] = {
+chunks.map(_.duplicate())
+  }
+
+  def copy(): ChunkedByteBuffer = {
+val copiedChunks = getChunks().map { chunk =>
+  // TODO: accept an allocator in this copy method to integrate with 
mem. accounting systems
+  val newChunk = ByteBuffer.allocate(chunk.limit())
+  newChunk.put(chunk)
+  newChunk.flip()
+  newChunk
+}
+new ChunkedByteBuffer(copiedChunks)
+  }
+
+  /**
+   * Attempt to clean up a ByteBuffer if it is memory-mapped. This uses an 
*unsafe* Sun API that
+   * might cause errors if one attempts to read from the unmapped buffer, 
but it's better than
+   * waiting for the GC to find it because that could lead to huge numbers 
of open files. There's
+   * unfortunately no standard API to do this.
+   */
+  def dispose(): Unit = {
+chunks.foreach(BlockManager.dispose)
+  }
+}
+
+/**
+ * Reads data from a ChunkedByteBuffer, and optionally cleans it up using 
BlockManager.dispose()
+ * at the end of the stream (e.g. to close a memory-mapped file).
+ */
+private class ChunkedByteBufferInputStream(
+var chunkedByteBuffer: ChunkedByteBuffer,
+dispose: Boolean)
--- End diff --

Yep, will do. The semantics of `dispose()` are a bit confusing given some 
other weird semantics surrounding its use elsewhere in the BlockManager.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To

[GitHub] spark pull request: [SPARK-13920][BUILD] MIMA checks should apply ...

2016-03-15 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/11751#issuecomment-197162819
  
@JoshRosen , I removed the legacy code and related comments now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13921] Store serialized blocks as multi...

2016-03-15 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11748#discussion_r56284289
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala ---
@@ -81,8 +81,9 @@ private[spark] class TaskResultGetter(sparkEnv: SparkEnv, 
scheduler: TaskSchedul
   taskSetManager, tid, TaskState.FINISHED, TaskResultLost)
 return
   }
+  // TODO(josh): assumption that there is only one chunk here 
is a hack
   val deserializedResult = 
serializer.get().deserialize[DirectTaskResult[_]](
-serializedTaskResult.get)
+serializedTaskResult.get.getChunks().head)
--- End diff --

i guess it is because on the write side we enforced it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13921] Store serialized blocks as multi...

2016-03-15 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11748#discussion_r56284273
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala ---
@@ -81,8 +81,9 @@ private[spark] class TaskResultGetter(sparkEnv: SparkEnv, 
scheduler: TaskSchedul
   taskSetManager, tid, TaskState.FINISHED, TaskResultLost)
 return
   }
+  // TODO(josh): assumption that there is only one chunk here 
is a hack
   val deserializedResult = 
serializer.get().deserialize[DirectTaskResult[_]](
-serializedTaskResult.get)
+serializedTaskResult.get.getChunks().head)
--- End diff --

this is not safe to do is it?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13921] Store serialized blocks as multi...

2016-03-15 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/11748#discussion_r56284277
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.io
+
+import java.io.InputStream
+import java.nio.ByteBuffer
+import java.nio.channels.WritableByteChannel
+
+import com.google.common.primitives.UnsignedBytes
+import io.netty.buffer.{ByteBuf, Unpooled}
+
+import org.apache.spark.network.util.ByteArrayWritableChannel
+import org.apache.spark.storage.BlockManager
+
+private[spark] class ChunkedByteBuffer(var chunks: Array[ByteBuffer]) {
+  require(chunks != null, "chunks must not be null")
+  require(chunks.nonEmpty, "Cannot create a ChunkedByteBuffer with no 
chunks")
+  require(chunks.forall(_.limit() > 0), "chunks must be non-empty")
+  require(chunks.forall(_.position() == 0), "chunks' positions must be 0")
+
+  val limit: Long = chunks.map(_.limit().asInstanceOf[Long]).sum
+
+  def this(byteBuffer: ByteBuffer) = {
+this(Array(byteBuffer))
+  }
+
+  def writeFully(channel: WritableByteChannel): Unit = {
+for (bytes <- getChunks()) {
+  while (bytes.remaining > 0) {
+channel.write(bytes)
+  }
+}
+  }
+
+  def toNetty: ByteBuf = {
+Unpooled.wrappedBuffer(getChunks(): _*)
+  }
+
+  def toArray: Array[Byte] = {
+if (limit >= Integer.MAX_VALUE) {
+  throw new UnsupportedOperationException(
+s"cannot call toArray because buffer size ($limit bytes) exceeds 
maximum array size")
+}
+val byteChannel = new ByteArrayWritableChannel(limit.toInt)
+writeFully(byteChannel)
+byteChannel.close()
+byteChannel.getData
+  }
+
+  def toByteBuffer: ByteBuffer = {
+if (chunks.length == 1) {
+  chunks.head
+} else {
+  ByteBuffer.wrap(toArray)
+}
+  }
+
+  def toInputStream(dispose: Boolean = false): InputStream = {
+new ChunkedByteBufferInputStream(this, dispose)
+  }
+
+  def getChunks(): Array[ByteBuffer] = {
+chunks.map(_.duplicate())
+  }
+
+  def copy(): ChunkedByteBuffer = {
+val copiedChunks = getChunks().map { chunk =>
+  // TODO: accept an allocator in this copy method to integrate with 
mem. accounting systems
+  val newChunk = ByteBuffer.allocate(chunk.limit())
+  newChunk.put(chunk)
+  newChunk.flip()
+  newChunk
+}
+new ChunkedByteBuffer(copiedChunks)
+  }
+
+  /**
+   * Attempt to clean up a ByteBuffer if it is memory-mapped. This uses an 
*unsafe* Sun API that
+   * might cause errors if one attempts to read from the unmapped buffer, 
but it's better than
+   * waiting for the GC to find it because that could lead to huge numbers 
of open files. There's
+   * unfortunately no standard API to do this.
+   */
+  def dispose(): Unit = {
+chunks.foreach(BlockManager.dispose)
--- End diff --

Yeah, we should definitely move it into a different companion object or 
utilities class.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13885][YARN] Fix attempt id regression ...

2016-03-15 Thread jerryshao

Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/11721#discussion_r56284175
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -374,12 +374,6 @@ class SparkContext(config: SparkConf) extends Logging 
with ExecutorAllocationCli
   throw new SparkException("An application name must be set in your 
configuration")
 }
 
-// System property spark.yarn.app.id must be set if user code ran by 
AM on a YARN cluster
-if (master == "yarn" && deployMode == "cluster" && 
!_conf.contains("spark.yarn.app.id")) {
--- End diff --

I see, removing this will still get other error messages if trying to run 
cluster mode with such way, but maybe a little odd as you mentioned.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13920][BUILD] MIMA checks should apply ...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11751#issuecomment-197161777
  
**[Test build #53279 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53279/consoleFull)**
 for PR 11751 at commit 
[`16121b0`](https://github.com/apache/spark/commit/16121b04daff0e4db86bcd2ff2f1b904b067f23d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13921] Store serialized blocks as multi...

2016-03-15 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11748#discussion_r56284073
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.io
+
+import java.io.InputStream
+import java.nio.ByteBuffer
+import java.nio.channels.WritableByteChannel
+
+import com.google.common.primitives.UnsignedBytes
+import io.netty.buffer.{ByteBuf, Unpooled}
+
+import org.apache.spark.network.util.ByteArrayWritableChannel
+import org.apache.spark.storage.BlockManager
+
+private[spark] class ChunkedByteBuffer(var chunks: Array[ByteBuffer]) {
+  require(chunks != null, "chunks must not be null")
+  require(chunks.nonEmpty, "Cannot create a ChunkedByteBuffer with no 
chunks")
+  require(chunks.forall(_.limit() > 0), "chunks must be non-empty")
+  require(chunks.forall(_.position() == 0), "chunks' positions must be 0")
+
+  val limit: Long = chunks.map(_.limit().asInstanceOf[Long]).sum
+
+  def this(byteBuffer: ByteBuffer) = {
+this(Array(byteBuffer))
+  }
+
+  def writeFully(channel: WritableByteChannel): Unit = {
+for (bytes <- getChunks()) {
+  while (bytes.remaining > 0) {
+channel.write(bytes)
+  }
+}
+  }
+
+  def toNetty: ByteBuf = {
+Unpooled.wrappedBuffer(getChunks(): _*)
+  }
+
+  def toArray: Array[Byte] = {
+if (limit >= Integer.MAX_VALUE) {
+  throw new UnsupportedOperationException(
+s"cannot call toArray because buffer size ($limit bytes) exceeds 
maximum array size")
+}
+val byteChannel = new ByteArrayWritableChannel(limit.toInt)
+writeFully(byteChannel)
+byteChannel.close()
+byteChannel.getData
+  }
+
+  def toByteBuffer: ByteBuffer = {
+if (chunks.length == 1) {
+  chunks.head
+} else {
+  ByteBuffer.wrap(toArray)
+}
+  }
+
+  def toInputStream(dispose: Boolean = false): InputStream = {
+new ChunkedByteBufferInputStream(this, dispose)
+  }
+
+  def getChunks(): Array[ByteBuffer] = {
+chunks.map(_.duplicate())
+  }
+
+  def copy(): ChunkedByteBuffer = {
+val copiedChunks = getChunks().map { chunk =>
+  // TODO: accept an allocator in this copy method to integrate with 
mem. accounting systems
+  val newChunk = ByteBuffer.allocate(chunk.limit())
+  newChunk.put(chunk)
+  newChunk.flip()
+  newChunk
+}
+new ChunkedByteBuffer(copiedChunks)
+  }
+
+  /**
+   * Attempt to clean up a ByteBuffer if it is memory-mapped. This uses an 
*unsafe* Sun API that
+   * might cause errors if one attempts to read from the unmapped buffer, 
but it's better than
+   * waiting for the GC to find it because that could lead to huge numbers 
of open files. There's
+   * unfortunately no standard API to do this.
+   */
+  def dispose(): Unit = {
+chunks.foreach(BlockManager.dispose)
--- End diff --

maybe move BlockManager.dispose into an util and call it there? we reduces 
some coupling this way.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13921] Store serialized blocks as multi...

2016-03-15 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11748#discussion_r56284010
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.io
+
+import java.io.InputStream
+import java.nio.ByteBuffer
+import java.nio.channels.WritableByteChannel
+
+import com.google.common.primitives.UnsignedBytes
+import io.netty.buffer.{ByteBuf, Unpooled}
+
+import org.apache.spark.network.util.ByteArrayWritableChannel
+import org.apache.spark.storage.BlockManager
+
+private[spark] class ChunkedByteBuffer(var chunks: Array[ByteBuffer]) {
+  require(chunks != null, "chunks must not be null")
+  require(chunks.nonEmpty, "Cannot create a ChunkedByteBuffer with no 
chunks")
+  require(chunks.forall(_.limit() > 0), "chunks must be non-empty")
+  require(chunks.forall(_.position() == 0), "chunks' positions must be 0")
+
+  val limit: Long = chunks.map(_.limit().asInstanceOf[Long]).sum
+
+  def this(byteBuffer: ByteBuffer) = {
+this(Array(byteBuffer))
+  }
+
+  def writeFully(channel: WritableByteChannel): Unit = {
+for (bytes <- getChunks()) {
+  while (bytes.remaining > 0) {
+channel.write(bytes)
+  }
+}
+  }
+
+  def toNetty: ByteBuf = {
+Unpooled.wrappedBuffer(getChunks(): _*)
+  }
+
+  def toArray: Array[Byte] = {
+if (limit >= Integer.MAX_VALUE) {
+  throw new UnsupportedOperationException(
+s"cannot call toArray because buffer size ($limit bytes) exceeds 
maximum array size")
+}
+val byteChannel = new ByteArrayWritableChannel(limit.toInt)
+writeFully(byteChannel)
+byteChannel.close()
+byteChannel.getData
+  }
+
+  def toByteBuffer: ByteBuffer = {
+if (chunks.length == 1) {
+  chunks.head
+} else {
+  ByteBuffer.wrap(toArray)
+}
+  }
+
+  def toInputStream(dispose: Boolean = false): InputStream = {
+new ChunkedByteBufferInputStream(this, dispose)
+  }
+
+  def getChunks(): Array[ByteBuffer] = {
+chunks.map(_.duplicate())
+  }
+
+  def copy(): ChunkedByteBuffer = {
+val copiedChunks = getChunks().map { chunk =>
+  // TODO: accept an allocator in this copy method to integrate with 
mem. accounting systems
+  val newChunk = ByteBuffer.allocate(chunk.limit())
+  newChunk.put(chunk)
+  newChunk.flip()
+  newChunk
+}
+new ChunkedByteBuffer(copiedChunks)
+  }
+
+  /**
+   * Attempt to clean up a ByteBuffer if it is memory-mapped. This uses an 
*unsafe* Sun API that
+   * might cause errors if one attempts to read from the unmapped buffer, 
but it's better than
+   * waiting for the GC to find it because that could lead to huge numbers 
of open files. There's
+   * unfortunately no standard API to do this.
+   */
+  def dispose(): Unit = {
+chunks.foreach(BlockManager.dispose)
+  }
+}
+
+/**
+ * Reads data from a ChunkedByteBuffer, and optionally cleans it up using 
BlockManager.dispose()
+ * at the end of the stream (e.g. to close a memory-mapped file).
+ */
+private class ChunkedByteBufferInputStream(
+var chunkedByteBuffer: ChunkedByteBuffer,
+dispose: Boolean)
--- End diff --

never mind the close part makes sense. just document dispose.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands,

[GitHub] spark pull request: [SPARK-13921] Store serialized blocks as multi...

2016-03-15 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11748#discussion_r56283999
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.io
+
+import java.io.InputStream
+import java.nio.ByteBuffer
+import java.nio.channels.WritableByteChannel
+
+import com.google.common.primitives.UnsignedBytes
+import io.netty.buffer.{ByteBuf, Unpooled}
+
+import org.apache.spark.network.util.ByteArrayWritableChannel
+import org.apache.spark.storage.BlockManager
+
+private[spark] class ChunkedByteBuffer(var chunks: Array[ByteBuffer]) {
+  require(chunks != null, "chunks must not be null")
+  require(chunks.nonEmpty, "Cannot create a ChunkedByteBuffer with no 
chunks")
+  require(chunks.forall(_.limit() > 0), "chunks must be non-empty")
+  require(chunks.forall(_.position() == 0), "chunks' positions must be 0")
+
+  val limit: Long = chunks.map(_.limit().asInstanceOf[Long]).sum
+
+  def this(byteBuffer: ByteBuffer) = {
+this(Array(byteBuffer))
+  }
+
+  def writeFully(channel: WritableByteChannel): Unit = {
+for (bytes <- getChunks()) {
+  while (bytes.remaining > 0) {
+channel.write(bytes)
+  }
+}
+  }
+
+  def toNetty: ByteBuf = {
+Unpooled.wrappedBuffer(getChunks(): _*)
+  }
+
+  def toArray: Array[Byte] = {
--- End diff --

same for toByteBuffer


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13921] Store serialized blocks as multi...

2016-03-15 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11748#discussion_r56283991
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.io
+
+import java.io.InputStream
+import java.nio.ByteBuffer
+import java.nio.channels.WritableByteChannel
+
+import com.google.common.primitives.UnsignedBytes
+import io.netty.buffer.{ByteBuf, Unpooled}
+
+import org.apache.spark.network.util.ByteArrayWritableChannel
+import org.apache.spark.storage.BlockManager
+
+private[spark] class ChunkedByteBuffer(var chunks: Array[ByteBuffer]) {
+  require(chunks != null, "chunks must not be null")
+  require(chunks.nonEmpty, "Cannot create a ChunkedByteBuffer with no 
chunks")
+  require(chunks.forall(_.limit() > 0), "chunks must be non-empty")
+  require(chunks.forall(_.position() == 0), "chunks' positions must be 0")
+
+  val limit: Long = chunks.map(_.limit().asInstanceOf[Long]).sum
+
+  def this(byteBuffer: ByteBuffer) = {
+this(Array(byteBuffer))
+  }
+
+  def writeFully(channel: WritableByteChannel): Unit = {
+for (bytes <- getChunks()) {
+  while (bytes.remaining > 0) {
+channel.write(bytes)
+  }
+}
+  }
+
+  def toNetty: ByteBuf = {
+Unpooled.wrappedBuffer(getChunks(): _*)
+  }
+
+  def toArray: Array[Byte] = {
--- End diff --

should document this throws exceptions if size doesn't fit. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13894][SQL] SqlContext.range return typ...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11730#issuecomment-197161220
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53266/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13921] Store serialized blocks as multi...

2016-03-15 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11748#discussion_r56283895
  
--- Diff: 
core/src/test/scala/org/apache/spark/io/ChunkedByteBufferSuite.scala ---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.io
+
+import java.nio.ByteBuffer
+
+import com.google.common.io.ByteStreams
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.network.util.ByteArrayWritableChannel
+import org.apache.spark.util.io.ChunkedByteBuffer
+
+class ChunkedByteBufferSuite extends SparkFunSuite {
+
+  test("must have at least one chunk") {
--- End diff --

any reason we want to enforce this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13894][SQL] SqlContext.range return typ...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11730#issuecomment-197161216
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13921] Store serialized blocks as multi...

2016-03-15 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11748#discussion_r56283941
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.io
+
+import java.io.InputStream
+import java.nio.ByteBuffer
+import java.nio.channels.WritableByteChannel
+
+import com.google.common.primitives.UnsignedBytes
+import io.netty.buffer.{ByteBuf, Unpooled}
+
+import org.apache.spark.network.util.ByteArrayWritableChannel
+import org.apache.spark.storage.BlockManager
+
+private[spark] class ChunkedByteBuffer(var chunks: Array[ByteBuffer]) {
+  require(chunks != null, "chunks must not be null")
+  require(chunks.nonEmpty, "Cannot create a ChunkedByteBuffer with no 
chunks")
+  require(chunks.forall(_.limit() > 0), "chunks must be non-empty")
+  require(chunks.forall(_.position() == 0), "chunks' positions must be 0")
+
+  val limit: Long = chunks.map(_.limit().asInstanceOf[Long]).sum
+
+  def this(byteBuffer: ByteBuffer) = {
+this(Array(byteBuffer))
+  }
+
+  def writeFully(channel: WritableByteChannel): Unit = {
+for (bytes <- getChunks()) {
+  while (bytes.remaining > 0) {
+channel.write(bytes)
+  }
+}
+  }
+
+  def toNetty: ByteBuf = {
+Unpooled.wrappedBuffer(getChunks(): _*)
+  }
+
+  def toArray: Array[Byte] = {
+if (limit >= Integer.MAX_VALUE) {
+  throw new UnsupportedOperationException(
+s"cannot call toArray because buffer size ($limit bytes) exceeds 
maximum array size")
+}
+val byteChannel = new ByteArrayWritableChannel(limit.toInt)
+writeFully(byteChannel)
+byteChannel.close()
+byteChannel.getData
+  }
+
+  def toByteBuffer: ByteBuffer = {
+if (chunks.length == 1) {
+  chunks.head
+} else {
+  ByteBuffer.wrap(toArray)
+}
+  }
+
+  def toInputStream(dispose: Boolean = false): InputStream = {
+new ChunkedByteBufferInputStream(this, dispose)
+  }
+
+  def getChunks(): Array[ByteBuffer] = {
+chunks.map(_.duplicate())
+  }
+
+  def copy(): ChunkedByteBuffer = {
+val copiedChunks = getChunks().map { chunk =>
+  // TODO: accept an allocator in this copy method to integrate with 
mem. accounting systems
+  val newChunk = ByteBuffer.allocate(chunk.limit())
+  newChunk.put(chunk)
+  newChunk.flip()
+  newChunk
+}
+new ChunkedByteBuffer(copiedChunks)
+  }
+
+  /**
+   * Attempt to clean up a ByteBuffer if it is memory-mapped. This uses an 
*unsafe* Sun API that
+   * might cause errors if one attempts to read from the unmapped buffer, 
but it's better than
+   * waiting for the GC to find it because that could lead to huge numbers 
of open files. There's
+   * unfortunately no standard API to do this.
+   */
+  def dispose(): Unit = {
+chunks.foreach(BlockManager.dispose)
+  }
+}
+
+/**
+ * Reads data from a ChunkedByteBuffer, and optionally cleans it up using 
BlockManager.dispose()
+ * at the end of the stream (e.g. to close a memory-mapped file).
+ */
+private class ChunkedByteBufferInputStream(
+var chunkedByteBuffer: ChunkedByteBuffer,
+dispose: Boolean)
--- End diff --

we should document what dispose means. does it mean releasing the buffer 
upon close? why would we ever want to do that?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail:

[GitHub] spark pull request: [SPARK-13894][SQL] SqlContext.range return typ...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11730#issuecomment-197160978
  
**[Test build #53266 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53266/consoleFull)**
 for PR 11730 at commit 
[`31967ac`](https://github.com/apache/spark/commit/31967ac4a97174c11706930e09a509b08cdc7d09).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9837] [ML] R-like summary statistics fo...

2016-03-15 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11694


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...

2016-03-15 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11723#issuecomment-197159617
  
Hmm I'm not sure if it makes sense to create a public API for this at this 
point. This is directly exposing something that is very internal to the current 
implementation of Spark (SchedulerBackend), and these are by no means stable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9837] [ML] R-like summary statistics fo...

2016-03-15 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/11694#issuecomment-197159610
  
LGTM. Merged into master. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13920][BUILD] MIMA checks should apply ...

2016-03-15 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/11751#discussion_r56283381
  
--- Diff: 
tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala ---
@@ -39,12 +39,14 @@ import org.clapper.classutil.ClassFinder
 object GenerateMIMAIgnore {
   private val classLoader = Thread.currentThread().getContextClassLoader
   private val mirror = runtimeMirror(classLoader)
+  // SPARK-13920: MIMA checks should apply to @Experimental and 
@DeveloperAPI APIs
+  private val ignoreExpDevApi = false
--- End diff --

Oh, I see. I'll remove all then. Thank you for fast review!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [WIP][SPARK-13809][SQL] State store for stream...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11645#issuecomment-197157889
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [WIP][SPARK-13809][SQL] State store for stream...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11645#issuecomment-197157899
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53268/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13917] [SQL] generate broadcast semi jo...

2016-03-15 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11742


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [WIP][SPARK-13809][SQL] State store for stream...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11645#issuecomment-197157762
  
**[Test build #53268 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53268/consoleFull)**
 for PR 11645 at commit 
[`8123818`](https://github.com/apache/spark/commit/81238189f560f5df846b36e5cb5e50532f6f82a3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13917] [SQL] generate broadcast semi jo...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11742#issuecomment-197157622
  
**[Test build #53278 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53278/consoleFull)**
 for PR 11742 at commit 
[`f1ca6f2`](https://github.com/apache/spark/commit/f1ca6f250d85ee9ba9fcdbb308486588a527eb87).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13918] [SQL] Merge SortMergeJoin and So...

2016-03-15 Thread davies

Github user davies commented on the pull request:

https://github.com/apache/spark/pull/11743#issuecomment-197157236
  
Will do that when implement codegen support.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13920][BUILD] MIMA checks should apply ...

2016-03-15 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/11751#discussion_r56282812
  
--- Diff: 
tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala ---
@@ -39,12 +39,14 @@ import org.clapper.classutil.ClassFinder
 object GenerateMIMAIgnore {
   private val classLoader = Thread.currentThread().getContextClassLoader
   private val mirror = runtimeMirror(classLoader)
+  // SPARK-13920: MIMA checks should apply to @Experimental and 
@DeveloperAPI APIs
+  private val ignoreExpDevApi = false
--- End diff --

Rather than adding an on/off switch, I'd just go ahead and completely 
remove these methods.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13885][YARN] Fix attempt id regression ...

2016-03-15 Thread jerryshao

Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/11721#issuecomment-197156547
  
I think `SchedulerExtensionService` can still be worked, since I still 
maintain the full attempt id in `YarnScheduler`, only change to the simple 
counter for Spark scheduler.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-529] [core] [yarn] Add type-safe config...

2016-03-15 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10205#issuecomment-197156120
  
@vanzin I finally had some time to look over this change. I like the 
direction this is going to apply more semantics for configs, but I find the 
pre-existing SQLConf much easier to understand as its class structure is much 
simpler.

A few questions/comments:

- What's the difference between OptionalConfigEntry and something that has 
default value null? I only find OptionalConfigEntry used in the setter, but 
nothing specific to a getter.

- Why is the builder pattern better than just a ctor with default argument 
values? 

- Function naming is inconsistent. E.g. "withDefault" and "doc" vs "withDoc"

- The convention is for state mutation functions (e.g. internal) to have 
parentheses. Right now they look like some variable that's being accessed.

- Do we really need all these classes? Even if we use the builder pattern, 
I'd imagine we can live with only two classes, a builder and a config. 




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13903][SQL] Modify output nullability w...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11722#issuecomment-197155618
  
**[Test build #53277 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53277/consoleFull)**
 for PR 11722 at commit 
[`93a73b7`](https://github.com/apache/spark/commit/93a73b79fbeee9c1bd722f5aa66257409d3c2512).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11723#issuecomment-197154731
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11723#issuecomment-197154732
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53264/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11723#issuecomment-197154650
  
**[Test build #53264 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53264/consoleFull)**
 for PR 11723 at commit 
[`800834f`](https://github.com/apache/spark/commit/800834f24ad1f0c4a68d8d49f600db6570d100ef).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait ExternalClusterManager `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13902][SCHEDULER] Make DAGScheduler.get...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11720#issuecomment-197154174
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13902][SCHEDULER] Make DAGScheduler.get...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11720#issuecomment-197154175
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53265/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13902][SCHEDULER] Make DAGScheduler.get...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11720#issuecomment-197154079
  
**[Test build #53265 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53265/consoleFull)**
 for PR 11720 at commit 
[`0ea3fc8`](https://github.com/apache/spark/commit/0ea3fc838f689729794b6ea3aaf0b88a339ec20c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13852][YARN]handle the InterruptedExcep...

2016-03-15 Thread jerryshao

Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/11692#issuecomment-197154066
  
I see your point, so the real issue should only be the log issue.

But marking the state as `FINISHED` with `SUCCEED` should be open to 
question, since here we don't know the real state of application, if it is 
finished with failure, say job is aborted due to continuous stage failure, 
should we still mark this as `SUCCEED`? So here I'm conservative to this 
change, because:

1. This issue is happened rarely and does no harm to the result of 
application.
2. We cannot get the real exit state of application, so marking as 
`SUCCEED` should be open to question.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13924][SQL] officially support multi-in...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11754#issuecomment-197153176
  
**[Test build #53274 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53274/consoleFull)**
 for PR 11754 at commit 
[`1ff3dec`](https://github.com/apache/spark/commit/1ff3dec5f73e53aa5a4fb5ab86a9d55e62edc202).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13889][YARN] Fix integer overflow when ...

2016-03-15 Thread sarutak

Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/11713#issuecomment-197153189
  
LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13903][SQL] Modify output nullability w...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11722#issuecomment-197153188
  
**[Test build #53276 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53276/consoleFull)**
 for PR 11722 at commit 
[`5bf4b4b`](https://github.com/apache/spark/commit/5bf4b4b544ef2aa25d93c974e94f8314a6626ef7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13316][Streaming]add check to avoid reg...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11753#issuecomment-197153177
  
**[Test build #53275 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53275/consoleFull)**
 for PR 11753 at commit 
[`51a0399`](https://github.com/apache/spark/commit/51a03994ae3f66fa80b41a488f27b3fe94abe0ff).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13924][SQL] officially support multi-in...

2016-03-15 Thread cloud-fan

Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/11754#issuecomment-197152664
  
cc @hvanhovell 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3308][SQL][FOLLOW-UP] Parse JSON rows h...

2016-03-15 Thread HyukjinKwon

Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/11752#issuecomment-197152647
  
cc @yhuai 

(Since the JIRA is pretty old one, I was confused if I should make a 
follow-up like this but I just made this since it is a follow-up for the 
support of rows wrapped with an array).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13924][SQL] officially support multi-in...

2016-03-15 Thread cloud-fan

GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/11754

[SPARK-13924][SQL] officially support multi-insert

## What changes were proposed in this pull request?

There is a feature of hive SQL called multi-insert. For example:
```
FROM src
INSERT OVERWRITE TABLE dest1
SELECT key + 1
INSERT OVERWRITE TABLE dest2
SELECT key WHERE key > 2
INSERT OVERWRITE TABLE dest3
SELECT col EXPLODE(arr) exp AS col
...
```

We partially support it currently, with some limitations: 1) WHERE can't 
reference columns produced by LATERAL VIEW. 2) It's not executed eagerly, i.e. 
`sql("...multi-insert clause...")` won't take place right away like other 
commands, e.g. CREATE TABLE.

This PR removes these limitations and make us fully support multi-insert.

## How was this patch tested?

new tests in `SQLQuerySuite`



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark lateral-view

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11754.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11754


commit 1ff3dec5f73e53aa5a4fb5ab86a9d55e62edc202
Author: Wenchen Fan 
Date:   2016-03-16T04:35:10Z

officially support multi-insert




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3308][SQL][FOLLOW-UP] Parse JSON rows h...

2016-03-15 Thread HyukjinKwon

Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/11752#issuecomment-197152653
  
Just to make sure, I am doing this partly due to 
[SPARK-13764](https://issues.apache.org/jira/browse/SPARK-13764), which deals 
with parse modes just like in CSV data sources. So, the behaviour for failed 
records should be consistent.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9837] [ML] R-like summary statistics fo...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11694#issuecomment-197152626
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53269/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9837] [ML] R-like summary statistics fo...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11694#issuecomment-197152624
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9837] [ML] R-like summary statistics fo...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11694#issuecomment-197152557
  
**[Test build #53269 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53269/consoleFull)**
 for PR 11694 at commit 
[`f89cdf0`](https://github.com/apache/spark/commit/f89cdf01d94faab7d6f9372df35033afe695f8aa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13889][YARN] Fix integer overflow when ...

2016-03-15 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11713#issuecomment-197152507
  
cc @andrewor14 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3308][SQL][FOLLOW-UP] Parse JSON rows h...

2016-03-15 Thread HyukjinKwon

Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/11752#issuecomment-197152532
  
Just to make sure, I am doing this partly due to 
[SPARK-13764](https://issues.apache.org/jira/browse/SPARK-13764), which deals 
with parse modes just like in CSV data sources. So, the behaviour for failed 
records should be consistent.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13316][Streaming]add check to avoid reg...

2016-03-15 Thread mwws

GitHub user mwws opened a pull request:

https://github.com/apache/spark/pull/11753

[SPARK-13316][Streaming]add check to avoid registering new DStream when 
recovering from CP

When creating a recoverable streaming job, it must be no new DStream 
registered after a StreamingContext has been recreated from checkpoint. Or you 
will see following exception:
```
org.apache.spark.SparkException: 
org.apache.spark.streaming.dstream.ConstantInputDStream@724797ab has not been 
initialized
  at 
org.apache.spark.streaming.dstream.DStream.isTimeValid(DStream.scala:311)
  at 
org.apache.spark.streaming.dstream.InputDStream.isTimeValid(InputDStream.scala:89)
  at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:332)
  at 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:332)
  at scala.Option.orElse(Option.scala:289)
  at 
org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:329)
  at 
org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:48)
  at 
org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:117)
  at 
org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:116)
```

This PR is to add meaningful error message to make it obvious at first. 

I manually tested the PR with following repo code
```scala
def createStreamingContext(): StreamingContext = {
val ssc = new StreamingContext(sparkConf, Duration(1000))
ssc.checkpoint(checkpointDir)
ssc
}
val ssc = StreamingContext.getOrCreate(checkpointDir), 
createStreamingContext)

val socketStream = ssc.socketTextStream(...)
socketStream.checkpoint(Seconds(1))
socketStream.foreachRDD(...)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mwws/spark SPARK-13316

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11753.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11753


commit 51a03994ae3f66fa80b41a488f27b3fe94abe0ff
Author: mwws 
Date:   2016-03-16T04:20:43Z

add check to avoid registering new DStream when recovering from Checkpoint




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3308][SQL][FOLLOW-UP] Parse JSON rows h...

2016-03-15 Thread HyukjinKwon

Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/11752#issuecomment-197152181
  
cc @yhuai 

(Since the JIRA is pretty old one, I was confused if I should make a 
follow-up like this but I just made this since it is a follow-up for the 
support of rows wrapped with an array).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3308][SQL][FOLLOW-UP] Parse JSON rows h...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11752#issuecomment-197152249
  
**[Test build #53273 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53273/consoleFull)**
 for PR 11752 at commit 
[`4ab1e80`](https://github.com/apache/spark/commit/4ab1e80e8bca868fc2387cf74004cc44b6e4664a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MINOR][TEST][SQL] Remove wrong "expected" par...

2016-03-15 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11718


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3308][SQL][FOLLOW-UP] Parse JSON rows h...

2016-03-15 Thread HyukjinKwon

GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/11752

[SPARK-3308][SQL][FOLLOW-UP] Parse JSON rows having an array type and a 
struct type in the same fieild

## What changes were proposed in this pull request?

This https://github.com/apache/spark/pull/2400 added the support to parse 
JSON rows wrapped with an array. However, when the given data contains array 
data and struct data in the same field as below:

```json
{"a": {"b": 1}}
{"a": []}
```

and the schema is given as below:

```scala
val schema =
  StructType(
StructField("a", StructType(
  StructField("b", StringType) :: Nil
)) :: Nil)
```

This throws an exception below:
```java
Exception in thread "main" org.apache.spark.SparkException: Job aborted due 
to stage failure: Task 7 in stage 0.0 failed 4 times, most recent failure: Lost 
task 7.3 in stage 0.0 (TID 10, 192.168.1.170): java.lang.ClassCastException: 
org.apache.spark.sql.types.GenericArrayData cannot be cast to 
org.apache.spark.sql.catalyst.InternalRow
at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getStruct(rows.scala:50)
at 
org.apache.spark.sql.catalyst.expressions.GenericMutableRow.getStruct(rows.scala:247)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown
 Source)
...
```

For other data types, in this case it converts the given values are `null` 
but only this case emits an exception. 

This PR makes the support for wrapped rows applied only at the top level.

## How was this patch tested?

Unit tests were used and `./dev/run_tests` for code style tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-3308-follow-up

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11752.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11752


commit 4ab1e80e8bca868fc2387cf74004cc44b6e4664a
Author: hyukjinkwon 
Date:   2016-03-16T04:36:56Z

Parse JSON rows having an array type and a struct type in the same field.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MINOR][TEST][SQL] Remove wrong "expected" par...

2016-03-15 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11718#issuecomment-197151890
  
Thanks - merging in master.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MINOR][TEST][SQL] Remove wrong "expected" par...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11718#issuecomment-197151328
  
**[Test build #2643 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2643/consoleFull)**
 for PR 11718 at commit 
[`0bd1bbc`](https://github.com/apache/spark/commit/0bd1bbc9fa267e3c506d1f51495efa58a1b88796).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13898][SQL] Merge DatasetHolder and Dat...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11737#issuecomment-197151199
  
**[Test build #53272 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53272/consoleFull)**
 for PR 11737 at commit 
[`e422f52`](https://github.com/apache/spark/commit/e422f529e94506905563d0b487851a891b146605).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13903][SQL] Modify output nullability w...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11722#issuecomment-197149789
  
**[Test build #53271 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53271/consoleFull)**
 for PR 11722 at commit 
[`aef73d5`](https://github.com/apache/spark/commit/aef73d5e71b9c997ac98a7172036c7c79f9e9b1c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13920][BUILD] MIMA checks should apply ...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11751#issuecomment-197149790
  
**[Test build #53270 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53270/consoleFull)**
 for PR 11751 at commit 
[`b5e24a4`](https://github.com/apache/spark/commit/b5e24a41cd2b2bf021ca1edf115f26ad8b467ca6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13920][BUILD] MIMA checks should apply ...

2016-03-15 Thread dongjoon-hyun

GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/11751

[SPARK-13920][BUILD] MIMA checks should apply to @Experimental and 
@DeveloperAPI APIs

## What changes were proposed in this pull request?

We are able to change `@Experimental` and `@DeveloperAPI` API freely but 
also should monitor and manage those API carefully. This PR for 
[SPARK-13920](https://issues.apache.org/jira/browse/SPARK-13920) enables MiMa 
check for  by the followings.

- Adds a on/off switch flag code for `@Experimental` and `@Developer` API.
- Adds MiMa filters for `@Experimental` and `@Developer` API.

Note that this PR does not delete the existing functionality. We can turn 
off this feature if needed.

## How was this patch tested?

Pass the Jenkins tests (including MiMa).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-13920

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11751.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11751


commit b5e24a41cd2b2bf021ca1edf115f26ad8b467ca6
Author: Dongjoon Hyun 
Date:   2016-03-16T04:19:11Z

[SPARK-13920][BUILD] MIMA checks should apply to @Experimental and 
@DeveloperAPI APIs




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13801][SQL] DataFrame.col should return...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11632#issuecomment-197144342
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53261/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13801][SQL] DataFrame.col should return...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11632#issuecomment-197144341
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13801][SQL] DataFrame.col should return...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11632#issuecomment-197144257
  
**[Test build #53261 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53261/consoleFull)**
 for PR 11632 at commit 
[`6faf40e`](https://github.com/apache/spark/commit/6faf40e554d28ef625d3d818328254fe16849a52).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9837] [ML] R-like summary statistics fo...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11694#issuecomment-197143000
  
**[Test build #53269 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53269/consoleFull)**
 for PR 11694 at commit 
[`f89cdf0`](https://github.com/apache/spark/commit/f89cdf01d94faab7d6f9372df35033afe695f8aa).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-529] [sql] Modify SQLConf to use new co...

2016-03-15 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11570#issuecomment-197142803
  
@vanzin I don't think anybody gets a ping message unless you explicitly cc 
them. Will take a look.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13600] [MLlib] Use approxQuantile from ...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11553#issuecomment-197140007
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13600] [MLlib] Use approxQuantile from ...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11553#issuecomment-197140010
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53267/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13600] [MLlib] Use approxQuantile from ...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11553#issuecomment-197139782
  
**[Test build #53267 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53267/consoleFull)**
 for PR 11553 at commit 
[`18a3ec6`](https://github.com/apache/spark/commit/18a3ec6b8e9357c0055340cc4a4860c450999366).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [WIP][SPARK-13809][SQL] State store for stream...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11645#issuecomment-197138848
  
**[Test build #53268 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53268/consoleFull)**
 for PR 11645 at commit 
[`8123818`](https://github.com/apache/spark/commit/81238189f560f5df846b36e5cb5e50532f6f82a3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13922][SQL] Filter rows with null attri...

2016-03-15 Thread nongli

Github user nongli commented on the pull request:

https://github.com/apache/spark/pull/11749#issuecomment-197135193
  
Can you add some test cases to columnarbatchsuite that exercises this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13852][YARN]handle the InterruptedExcep...

2016-03-15 Thread WangTaoTheTonic

Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/11692#issuecomment-197135484
  
the application is finished successfully(RM UI also show success state) but 
log shows it failed, that's the problem i think.

yeah you're right sleep method can throw InterruptedException. this pr is 
trying to fix the problem we find in RM HA switching. 

what i am trying to say is that interrupting a monitor thread should not 
print the failed message in log.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13922][SQL] Filter rows with null attri...

2016-03-15 Thread nongli

Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/11749#discussion_r56277661
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java
 ---
@@ -58,6 +58,9 @@
   // True if the row is filtered.
   private final boolean[] filteredRows;
 
+  // Column indices that cannot have null values.
+  private final Vector nullFilteredColumns;
--- End diff --

change this to a set. I think it's more accurate and makes this easier to 
use.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [WIP][SPARK-13883][SQL] Parquet Implementation...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11709#issuecomment-197134877
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [WIP][SPARK-13883][SQL] Parquet Implementation...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11709#issuecomment-197134881
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53262/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [WIP][SPARK-13883][SQL] Parquet Implementation...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11709#issuecomment-197134624
  
**[Test build #53262 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53262/consoleFull)**
 for PR 11709 at commit 
[`122f572`](https://github.com/apache/spark/commit/122f572a9bfb7ebdda66fae0fd70d92c9d4d6524).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13922][SQL] Filter rows with null attri...

2016-03-15 Thread nongli

Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/11749#discussion_r56277607
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java
 ---
@@ -284,11 +287,23 @@ public void reset() {
   }
 
   /**
-   * Sets the number of rows that are valid.
+   * Sets the number of rows that are valid. Additionally, marks all rows 
as "filtered" if one or
+   * more of their attributes are part of a non-nullable column.
*/
   public void setNumRows(int numRows) {
-assert(numRows <= this.capacity);
+assert (numRows <= this.capacity);
--- End diff --

did you mean to add a space?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13922][SQL] Filter rows with null attri...

2016-03-15 Thread nongli

Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/11749#discussion_r56277494
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadBenchmark.scala
 ---
@@ -299,10 +299,112 @@ object ParquetReadBenchmark {
 }
   }
 
+  def stringWithNullsScanBenchmark(values: Int, fractionOfNulls: Double): 
Unit = {
+withTempPath { dir =>
+  withTempTable("t1", "tempTable") {
+sqlContext.range(values).registerTempTable("t1")
+sqlContext.sql(s"select IF(rand(1) < $fractionOfNulls, NULL, 
cast(id as STRING)) as c1, " +
+  s"IF(rand(2) < $fractionOfNulls, NULL, cast(id as STRING)) as c2 
from t1")
+  .write.parquet(dir.getCanonicalPath)
+
sqlContext.read.parquet(dir.getCanonicalPath).registerTempTable("tempTable")
+
+val benchmark = new Benchmark("String with Nulls Scan", values)
+
+benchmark.addCase("SQL Parquet Vectorized") { iter =>
+  sqlContext.sql("select sum(length(c2)) from tempTable where c1 
is " +
+"not NULL and c2 is not NULL").collect()
+}
+
+val files = 
SpecificParquetRecordReaderBase.listDirectory(dir).toArray
+benchmark.addCase("PR Vectorized") { num =>
+  var sum = 0
+  files.map(_.asInstanceOf[String]).foreach { p =>
+val reader = new UnsafeRowParquetRecordReader
+try {
+  reader.initialize(p, ("c1" :: "c2" :: Nil).asJava)
+  val batch = reader.resultBatch()
+  while (reader.nextBatch()) {
+val rowIterator = batch.rowIterator()
+while (rowIterator.hasNext) {
+  val row = rowIterator.next()
+  val value = row.getUTF8String(0)
+  if (!row.isNullAt(0) && !row.isNullAt(1)) sum += 
value.numBytes()
+}
+  }
+} finally {
+  reader.close()
+}
+  }
+}
+
+benchmark.addCase("PR Vectorized (Null Filtering)") { num =>
+  var sum = 0L
+  files.map(_.asInstanceOf[String]).foreach { p =>
+val reader = new UnsafeRowParquetRecordReader
+try {
+  reader.initialize(p, ("c1" :: "c2" :: Nil).asJava)
+  val batch = reader.resultBatch()
+  batch.filterNullsInColumn(0)
+  batch.filterNullsInColumn(1)
+  while (reader.nextBatch()) {
+val rowIterator = batch.rowIterator()
+while (rowIterator.hasNext) {
+  sum += rowIterator.next().getUTF8String(0).numBytes()
+}
+  }
+} finally {
+  reader.close()
+}
+  }
+}
+
+/*
+===
+Fraction of NULLs: 0
+===
+
+Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
+String with Nulls Scan: Best/Avg Time(ms)Rate(M/s) 
  Per Row(ns)   Relative
+
---
+SQL Parquet Vectorized   1164 / 1333  9.0  
   111.0   1.0X
+PR Vectorized 809 /  882 13.0  
77.1   1.4X
+PR Vectorized (Null Filtering)723 /  800 14.5  
69.0   1.6X
+
+===
+Fraction of NULLs: 0.5
+===
+
+Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
+String with Nulls Scan: Best/Avg Time(ms)Rate(M/s) 
  Per Row(ns)   Relative
+
---
+SQL Parquet Vectorized983 / 1001 10.7  
93.8   1.0X
+PR Vectorized 699 /  728 15.0  
66.7   1.4X
+PR Vectorized (Null Filtering)722 /  746 14.5  
68.9   1.4X
+
+===
--- End diff --

Instead of commenting it this way, can you put the fraction in the 
benchmark name?

e.g. String with Nulls Scan (95%)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-13917] [SQL] generate broadcast semi jo...

2016-03-15 Thread nongli

Github user nongli commented on the pull request:

https://github.com/apache/spark/pull/11742#issuecomment-197133107
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13917] [SQL] generate broadcast semi jo...

2016-03-15 Thread nongli

Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/11742#discussion_r56277373
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala
 ---
@@ -322,4 +327,70 @@ case class BroadcastHashJoin(
""".stripMargin
 }
   }
+
+  /**
+   * Generates the code for left semi join.
+   */
+  private def codegenSemi(ctx: CodegenContext, input: Seq[ExprCode]): 
String = {
+val (broadcastRelation, relationTerm) = prepareBroadcast(ctx)
+val (keyEv, anyNull) = genStreamSideJoinKey(ctx, input)
+val matched = ctx.freshName("matched")
+val buildVars = genBuildSideVars(ctx, matched)
+val numOutput = metricTerm(ctx, "numOutputRows")
+
+val checkCondition = if (condition.isDefined) {
+  val expr = condition.get
+  // evaluate the variables from build side that used by condition
+  val eval = evaluateRequiredVariables(buildPlan.output, buildVars, 
expr.references)
+  // filter the output via condition
+  ctx.currentVars = input ++ buildVars
+  val ev = BindReferences.bindReference(expr, streamedPlan.output ++ 
buildPlan.output).gen(ctx)
+  s"""
+ |$eval
+ |${ev.code}
+ |if (${ev.isNull} || !${ev.value}) continue;
+   """.stripMargin
+} else {
+  ""
+}
+
+if (broadcastRelation.value.isInstanceOf[UniqueHashedRelation]) {
+  s"""
+ |// generate join key for stream side
+ |${keyEv.code}
+ |// find matches from HashedRelation
+ |UnsafeRow $matched = $anyNull ? null: 
(UnsafeRow)$relationTerm.getValue(${keyEv.value});
+ |if ($matched == null) continue;
+ |$checkCondition
+ |$numOutput.add(1);
+ |${consume(ctx, input)}
+   """.stripMargin
+
+} else {
+  val matches = ctx.freshName("matches")
+  val bufferType = classOf[CompactBuffer[UnsafeRow]].getName
+  val i = ctx.freshName("i")
+  val size = ctx.freshName("size")
+  val found = ctx.freshName("found")
+  s"""
+ |// generate join key for stream side
+ |${keyEv.code}
+ |// find matches from HashRelation
+ |$bufferType $matches = $anyNull ? null : 
($bufferType)$relationTerm.get(${keyEv.value});
+ |if ($matches == null) continue;
+ |int $size = $matches.size();
+ |boolean $found = false;
+ |for (int $i = 0; $i < $size; $i++) {
+ |  UnsafeRow $matched = (UnsafeRow) $matches.apply($i);
+ |  $checkCondition
+ |  $found = true;
+ |  break;
+ |}
+ |if (!$found) continue;
+ |$numOutput.add(1);
+ |${consume(ctx, input)}
+   """.stripMargin
+
--- End diff --

and here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13917] [SQL] generate broadcast semi jo...

2016-03-15 Thread nongli

Github user nongli commented on a diff in the pull request:

https://github.com/apache/spark/pull/11742#discussion_r56277324
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala
 ---
@@ -322,4 +327,70 @@ case class BroadcastHashJoin(
""".stripMargin
 }
   }
+
+  /**
+   * Generates the code for left semi join.
+   */
+  private def codegenSemi(ctx: CodegenContext, input: Seq[ExprCode]): 
String = {
+val (broadcastRelation, relationTerm) = prepareBroadcast(ctx)
+val (keyEv, anyNull) = genStreamSideJoinKey(ctx, input)
+val matched = ctx.freshName("matched")
+val buildVars = genBuildSideVars(ctx, matched)
+val numOutput = metricTerm(ctx, "numOutputRows")
+
+val checkCondition = if (condition.isDefined) {
+  val expr = condition.get
+  // evaluate the variables from build side that used by condition
+  val eval = evaluateRequiredVariables(buildPlan.output, buildVars, 
expr.references)
+  // filter the output via condition
+  ctx.currentVars = input ++ buildVars
+  val ev = BindReferences.bindReference(expr, streamedPlan.output ++ 
buildPlan.output).gen(ctx)
+  s"""
+ |$eval
+ |${ev.code}
+ |if (${ev.isNull} || !${ev.value}) continue;
+   """.stripMargin
+} else {
+  ""
+}
+
+if (broadcastRelation.value.isInstanceOf[UniqueHashedRelation]) {
+  s"""
+ |// generate join key for stream side
+ |${keyEv.code}
+ |// find matches from HashedRelation
+ |UnsafeRow $matched = $anyNull ? null: 
(UnsafeRow)$relationTerm.getValue(${keyEv.value});
+ |if ($matched == null) continue;
+ |$checkCondition
+ |$numOutput.add(1);
+ |${consume(ctx, input)}
+   """.stripMargin
+
--- End diff --

extra new line


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13923] [SQL] Implement SessionCatalog

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11750#issuecomment-197131127
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53263/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13923] [SQL] Implement SessionCatalog

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11750#issuecomment-197131125
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13923] [SQL] Implement SessionCatalog

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11750#issuecomment-197131108
  
**[Test build #53263 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53263/consoleFull)**
 for PR 11750 at commit 
[`3b2e48a`](https://github.com/apache/spark/commit/3b2e48a4efa81bb886331ce5818e2d44c7c2f7da).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13889][YARN] Fix integer overflow when ...

2016-03-15 Thread chenghao-intel

Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/11713#issuecomment-197131091
  
cc @rxin @JoshRosen 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13923] [SQL] Implement SessionCatalog

2016-03-15 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11750#discussion_r56276611
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -0,0 +1,484 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.catalog
+
+import java.util.concurrent.ConcurrentHashMap
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.{FunctionIdentifier, TableIdentifier}
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, 
SubqueryAlias}
+
+
+/**
+ * An internal catalog that is used by a Spark Session. This internal 
catalog serves as a
+ * proxy to the underlying metastore (e.g. Hive Metastore) and it also 
manages temporary
+ * tables and functions of the Spark Session that it belongs to.
+ */
+class SessionCatalog(externalCatalog: ExternalCatalog) {
+  import ExternalCatalog._
+
+  private[this] val tempTables = new ConcurrentHashMap[String, LogicalPlan]
+
+  private[this] val tempFunctions = new ConcurrentHashMap[String, 
CatalogFunction]
+
+  // 

+  // Databases
+  // 

+  // All methods in this category interact directly with the underlying 
catalog.
+  // 

+
+  def createDatabase(dbDefinition: CatalogDatabase, ignoreIfExists: 
Boolean): Unit = {
+externalCatalog.createDatabase(dbDefinition, ignoreIfExists)
+  }
+
+  def dropDatabase(db: String, ignoreIfNotExists: Boolean, cascade: 
Boolean): Unit = {
+externalCatalog.dropDatabase(db, ignoreIfNotExists, cascade)
+  }
+
+  def alterDatabase(dbDefinition: CatalogDatabase): Unit = {
+externalCatalog.alterDatabase(dbDefinition)
+  }
+
+  def getDatabase(db: String): CatalogDatabase = {
+externalCatalog.getDatabase(db)
+  }
+
+  def databaseExists(db: String): Boolean = {
+externalCatalog.databaseExists(db)
+  }
+
+  def listDatabases(): Seq[String] = {
+externalCatalog.listDatabases()
+  }
+
+  def listDatabases(pattern: String): Seq[String] = {
+externalCatalog.listDatabases(pattern)
+  }
+
+  // 

+  // Tables
+  // 

+  // There are two kinds of tables, temporary tables and metastore tables.
+  // Temporary tables are isolated across sessions and do not belong to any
+  // particular database. Metastore tables can be used across multiple
+  // sessions as their metadata is persisted in the underlying catalog.
+  // 

+
+  // 
+  // | Methods that interact with metastore tables only |
+  // 
+
+  /**
+   * Create a metastore table in the database specified in 
`tableDefinition`.
+   * If no such database is specified, create it in the current database.
+   */
+  def createTable(
+  currentDb: String,
--- End diff --

it is somewhat strange to pass in currentDb and then rely on some table 
definition's database. Have you thought about just figuring out that part in 
the caller outside the session catalog? i.e. the catalog itself doesn't need to 
handle currentDb.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at

[GitHub] spark pull request: [SPARK-13852][YARN]handle the InterruptedExcep...

2016-03-15 Thread jerryshao

Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/11692#issuecomment-197129360
  
But from my understanding, this exception does no harm to your application, 
since your application is about to finish itself, also this may happen 
occasionally.

Also does it relate to RM HA, from my understanding, this 
`InterruptedException` will be thrown in any case where the code run into 
`sleep`, no matter in HA or not. Is there any special thing for RM HA?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13894][SQL] SqlContext.range return typ...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11730#issuecomment-197128907
  
**[Test build #53266 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53266/consoleFull)**
 for PR 11730 at commit 
[`31967ac`](https://github.com/apache/spark/commit/31967ac4a97174c11706930e09a509b08cdc7d09).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-13600] [MLlib] Use approxQuantile from ...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11553#issuecomment-197128917
  
**[Test build #53267 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53267/consoleFull)**
 for PR 11553 at commit 
[`18a3ec6`](https://github.com/apache/spark/commit/18a3ec6b8e9357c0055340cc4a4860c450999366).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [WIP][SPARK-13809][SQL] State store for stream...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11645#issuecomment-197128812
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MINOR][TEST][SQL] Remove wrong "expected" par...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11718#issuecomment-197128775
  
**[Test build #2643 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2643/consoleFull)**
 for PR 11718 at commit 
[`0bd1bbc`](https://github.com/apache/spark/commit/0bd1bbc9fa267e3c506d1f51495efa58a1b88796).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [WIP][SPARK-13809][SQL] State store for stream...

2016-03-15 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11645#issuecomment-197128814
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53258/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [WIP][SPARK-13809][SQL] State store for stream...

2016-03-15 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11645#issuecomment-197128647
  
**[Test build #53258 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53258/consoleFull)**
 for PR 11645 at commit 
[`76dd988`](https://github.com/apache/spark/commit/76dd988fcac508609be3f32754284bc07524e481).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 3 4 5 6 7 8 9 >

1 - 100 of 875 matches

Mail list logo