date:20170804

[GitHub] spark pull request #18779: [SPARK-21580][SQL]Integers in aggregation express...

2017-08-04 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18779


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18828
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18828
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80279/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18828
  
**[Test build #80279 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80279/testReport)**
 for PR 18828 at commit 
[`84e76b9`](https://github.com/apache/spark/commit/84e76b938517363de7d20adb9f2b87486293edac).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-04 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18779
  
Thanks! Merging to master/2.2


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-04 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18779
  
Really appreciate your patience and your work! This is how we work in Spark 
SQL. Thanks!



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18668: [SPARK-21637][SPARK-21451][SQL]get `spark.hadoop....

2017-08-04 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18668#discussion_r131515429
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala 
---
@@ -99,18 +100,36 @@ class SparkHadoopUtil extends Logging {
   hadoopConf.set("fs.s3a.session.token", sessionToken)
 }
   }
-  // Copy any "spark.hadoop.foo=bar" system properties into conf as 
"foo=bar"
-  conf.getAll.foreach { case (key, value) =>
-if (key.startsWith("spark.hadoop.")) {
-  hadoopConf.set(key.substring("spark.hadoop.".length), value)
-}
-  }
+  appendSparkHadoopConfigs(conf, hadoopConf)
   val bufferSize = conf.get("spark.buffer.size", "65536")
   hadoopConf.set("io.file.buffer.size", bufferSize)
 }
   }
 
   /**
+   * Appends spark.hadoop.* configurations from a [[SparkConf]] to a Hadoop
+   * configuration without the spark.hadoop. prefix.
+   */
+  def appendSparkHadoopConfigs(conf: SparkConf, hadoopConf: 
Configuration): Unit = {
+// Copy any "spark.hadoop.foo=bar" spark properties into conf as 
"foo=bar"
+conf.getAll.foreach { case (key, value) if 
key.startsWith("spark.hadoop.") =>
+  hadoopConf.set(key.substring("spark.hadoop.".length), value)
+}
+  }
+
+  /**
+   * Appends spark.hadoop.* configurations from a Map to another without 
the spark.hadoop. prefix.
+   */
+  def appendSparkHadoopConfigs(
+  srcMap: Map[String, String],
+  destMap: HashMap[String, String]): Unit = {
+// Copy any "spark.hadoop.foo=bar" system properties into destMap as 
"foo=bar"
+srcMap.foreach { case (key, value) if key.startsWith("spark.hadoop.") 
=>
+  destMap.put(key.substring("spark.hadoop.".length), value)
+}
--- End diff --

Your solution requires another `case _ => `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18668: [SPARK-21637][SPARK-21451][SQL]get `spark.hadoop....

2017-08-04 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18668#discussion_r131515413
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala 
---
@@ -99,18 +100,36 @@ class SparkHadoopUtil extends Logging {
   hadoopConf.set("fs.s3a.session.token", sessionToken)
 }
   }
-  // Copy any "spark.hadoop.foo=bar" system properties into conf as 
"foo=bar"
-  conf.getAll.foreach { case (key, value) =>
-if (key.startsWith("spark.hadoop.")) {
-  hadoopConf.set(key.substring("spark.hadoop.".length), value)
-}
-  }
+  appendSparkHadoopConfigs(conf, hadoopConf)
   val bufferSize = conf.get("spark.buffer.size", "65536")
   hadoopConf.set("io.file.buffer.size", bufferSize)
 }
   }
 
   /**
+   * Appends spark.hadoop.* configurations from a [[SparkConf]] to a Hadoop
+   * configuration without the spark.hadoop. prefix.
+   */
+  def appendSparkHadoopConfigs(conf: SparkConf, hadoopConf: 
Configuration): Unit = {
+// Copy any "spark.hadoop.foo=bar" spark properties into conf as 
"foo=bar"
+conf.getAll.foreach { case (key, value) if 
key.startsWith("spark.hadoop.") =>
+  hadoopConf.set(key.substring("spark.hadoop.".length), value)
+}
+  }
+
+  /**
+   * Appends spark.hadoop.* configurations from a Map to another without 
the spark.hadoop. prefix.
+   */
+  def appendSparkHadoopConfigs(
+  srcMap: Map[String, String],
+  destMap: HashMap[String, String]): Unit = {
+// Copy any "spark.hadoop.foo=bar" system properties into destMap as 
"foo=bar"
+srcMap.foreach { case (key, value) if key.startsWith("spark.hadoop.") 
=>
+  destMap.put(key.substring("spark.hadoop.".length), value)
+}
--- End diff --

Your change is different from what I posted before
```Scala
for ((key, value) <- conf if key.startsWith("spark.hadoop.")) {
  propMap.put(key.substring("spark.hadoop.".length), value)
}
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18623: [SPARK-21374][CORE] Fix reading globbed paths fro...

2017-08-04 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18623


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18848: [SPARK-21374][CORE] Fix reading globbed paths fro...

2017-08-04 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18848


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18848: [SPARK-21374][CORE] Fix reading globbed paths from S3 in...

2017-08-04 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18848
  
Thanks! Merging to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18779
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80277/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18779
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18779
  
**[Test build #80277 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80277/testReport)**
 for PR 18779 at commit 
[`2bf42b1`](https://github.com/apache/spark/commit/2bf42b11cec794d19ed8f2fd000a9cd7aeb159bb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18317: [SPARK-21113][CORE] Read ahead input stream to am...

2017-08-04 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18317#discussion_r131514782
  
--- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java 
---
@@ -0,0 +1,279 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.io;
+
+import com.google.common.base.Preconditions;
+import org.apache.spark.storage.StorageUtils;
+
+import javax.annotation.concurrent.GuardedBy;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.locks.Condition;
+import java.util.concurrent.locks.Lock;
+import java.util.concurrent.locks.ReentrantLock;
+
+/**
+ * {@link InputStream} implementation which asynchronously reads ahead 
from the underlying input
+ * stream when specified amount of data has been read from the current 
buffer. It does it by maintaining
+ * two buffer - active buffer and read ahead buffer. Active buffer 
contains data which should be returned
+ * when a read() call is issued. The read ahead buffer is used to 
asynchronously read from the underlying
+ * input stream and once the current active buffer is exhausted, we flip 
the two buffers so that we can
+ * start reading from the read ahead buffer without being blocked in disk 
I/O.
+ */
+public class ReadAheadInputStream extends InputStream {
+
+  private Lock stateChangeLock = new ReentrantLock();
+
+  @GuardedBy("stateChangeLock")
+  private ByteBuffer activeBuffer;
+
+  @GuardedBy("stateChangeLock")
+  private ByteBuffer readAheadBuffer;
+
+  @GuardedBy("stateChangeLock")
+  private boolean endOfStream;
+
+  @GuardedBy("stateChangeLock")
+  // true if async read is in progress
+  private boolean isReadInProgress;
+
+  @GuardedBy("stateChangeLock")
+  // true if read is aborted due to an exception in reading from 
underlying input stream.
+  private boolean isReadAborted;
+
+  @GuardedBy("stateChangeLock")
+  private Exception readException;
+
+  // If the remaining data size in the current buffer is below this 
threshold,
+  // we issue an async read from the underlying input stream.
+  private final int readAheadThresholdInBytes;
+
+  private final InputStream underlyingInputStream;
+
+  private final ExecutorService executorService = 
Executors.newSingleThreadExecutor();
+
+  private final Condition asyncReadComplete = 
stateChangeLock.newCondition();
+
+  private final byte[] oneByte = new byte[1];
+
+  public ReadAheadInputStream(InputStream inputStream, int 
bufferSizeInBytes, int readAheadThresholdInBytes) {
+Preconditions.checkArgument(bufferSizeInBytes > 0, "bufferSizeInBytes 
should be greater than 0");
+Preconditions.checkArgument(readAheadThresholdInBytes > 0 && 
readAheadThresholdInBytes < bufferSizeInBytes,
+"readAheadThresholdInBytes should be 
greater than 0 and less than bufferSizeInBytes" );
+activeBuffer = ByteBuffer.allocate(bufferSizeInBytes);
+readAheadBuffer = ByteBuffer.allocate(bufferSizeInBytes);
+this.readAheadThresholdInBytes = readAheadThresholdInBytes;
+this.underlyingInputStream = inputStream;
+activeBuffer.flip();
+readAheadBuffer.flip();
+  }
+
+  private boolean isEndOfStream() {
+if(activeBuffer.remaining() == 0 && readAheadBuffer.remaining() == 0 
&& endOfStream) {
+  return true;
+}
+return  false;
+  }
+
+
+  private void readAsync(final ByteBuffer byteBuffer) throws IOException {
+stateChangeLock.lock();
+if (endOfStream || isReadInProgress) {
+  stateChangeLock.unlock();
+  return;
+}
+byteBuffer.position(0);
+byteBuffer.flip();
+isReadInProgress = true;
+stateChangeLock.unlock();
+executorService.execute(() -> {
+  byte[] arr;
+  stateChangeLock.lock();
+  arr = byteBuffer.array();
+  stateChangeLock.unlock();
+  // Please note that it is safe to release the lock and read into the 
read ahead buffer

[GitHub] spark issue #18668: [SPARK-21637][SPARK-21451][SQL]get `spark.hadoop.*` prop...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18668
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80278/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18668: [SPARK-21637][SPARK-21451][SQL]get `spark.hadoop.*` prop...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18668
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18668: [SPARK-21637][SPARK-21451][SQL]get `spark.hadoop.*` prop...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18668
  
**[Test build #80278 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80278/testReport)**
 for PR 18668 at commit 
[`55729fa`](https://github.com/apache/spark/commit/55729fa343e285f3711154dc0adcb4585a5e6f1f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18828
  
**[Test build #80280 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80280/testReport)**
 for PR 18828 at commit 
[`861f255`](https://github.com/apache/spark/commit/861f255638e0b301e3a21b5a0bc491fbfc9537d4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18828
  
**[Test build #80279 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80279/testReport)**
 for PR 18828 at commit 
[`84e76b9`](https://github.com/apache/spark/commit/84e76b938517363de7d20adb9f2b87486293edac).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18317
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80272/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18317
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18317
  
**[Test build #80272 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80272/testReport)**
 for PR 18317 at commit 
[`89cbac8`](https://github.com/apache/spark/commit/89cbac877eefe0e0dc5d5956331c228be0382f72).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18749
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18749
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80274/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18749
  
**[Test build #80274 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80274/testReport)**
 for PR 18749 at commit 
[`974eab2`](https://github.com/apache/spark/commit/974eab27d77169c7bc7205595e3702b32865).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18668: [SPARK-21637][SPARK-21451][SQL]get `spark.hadoop.*` prop...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18668
  
**[Test build #80278 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80278/testReport)**
 for PR 18668 at commit 
[`55729fa`](https://github.com/apache/spark/commit/55729fa343e285f3711154dc0adcb4585a5e6f1f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18668: [SPARK-21637][SPARK-21451][SQL]get `spark.hadoop....

2017-08-04 Thread yaooqinn

Github user yaooqinn commented on a diff in the pull request:

https://github.com/apache/spark/pull/18668#discussion_r131513659
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala 
---
@@ -99,17 +100,30 @@ class SparkHadoopUtil extends Logging {
   hadoopConf.set("fs.s3a.session.token", sessionToken)
 }
   }
-  // Copy any "spark.hadoop.foo=bar" system properties into conf as 
"foo=bar"
-  conf.getAll.foreach { case (key, value) =>
-if (key.startsWith("spark.hadoop.")) {
-  hadoopConf.set(key.substring("spark.hadoop.".length), value)
-}
-  }
+  appendSparkHadoopConfigs(conf, hadoopConf)
   val bufferSize = conf.get("spark.buffer.size", "65536")
   hadoopConf.set("io.file.buffer.size", bufferSize)
 }
   }
 
+  def appendSparkHadoopConfigs(conf: SparkConf, hadoopConf: 
Configuration): Unit = {
+// Copy any "spark.hadoop.foo=bar" system properties into conf as 
"foo=bar"
+conf.getAll.foreach { case (key, value) =>
+  if (key.startsWith("spark.hadoop.")) {
+hadoopConf.set(key.substring("spark.hadoop.".length), value)
+  }
+}
+  }
+
+  def appendSparkHadoopConfigs(propMap: HashMap[String, String]): Unit = {
+// Copy any "spark.hadoop.foo=bar" system properties into conf as 
"foo=bar"
+sys.props.foreach { case (key, value) =>
+  if (key.startsWith("spark.hadoop.")) {
+propMap.put(key.substring("spark.hadoop.".length), value)
+  }
--- End diff --

ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18668: [SPARK-21637][SPARK-21451][SQL]get `spark.hadoop....

2017-08-04 Thread yaooqinn

Github user yaooqinn commented on a diff in the pull request:

https://github.com/apache/spark/pull/18668#discussion_r131513657
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala 
---
@@ -404,6 +405,8 @@ private[spark] object HiveUtils extends Logging {
 propMap.put(ConfVars.METASTORE_EVENT_LISTENERS.varname, "")
 propMap.put(ConfVars.METASTORE_END_FUNCTION_LISTENERS.varname, "")
 
+SparkHadoopUtil.get.appendSparkHadoopConfigs(propMap)
--- End diff --

ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18317
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18317
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80273/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18317
  
**[Test build #80273 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80273/testReport)**
 for PR 18317 at commit 
[`d84269d`](https://github.com/apache/spark/commit/d84269d7208ccacb20795d5a46fcca52584bcbef).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18851: [SPARK-21644][SQL] LocalLimit.maxRows is defined incorre...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18851
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80275/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18851: [SPARK-21644][SQL] LocalLimit.maxRows is defined incorre...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18851
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18851: [SPARK-21644][SQL] LocalLimit.maxRows is defined incorre...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18851
  
**[Test build #80275 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80275/testReport)**
 for PR 18851 at commit 
[`9911176`](https://github.com/apache/spark/commit/9911176d901a6bdfadec3ef68b28e6dd2c82be9e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18828
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18828
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80276/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18828
  
**[Test build #80276 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80276/testReport)**
 for PR 18828 at commit 
[`8f2f91d`](https://github.com/apache/spark/commit/8f2f91d9dfd0e19e0f60fc70f3f10ef73a2d0019).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-04 Thread 10110346

Github user 10110346 commented on the issue:

https://github.com/apache/spark/pull/18779
  
I learned a lot from you, thanks all.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18779
  
**[Test build #80277 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80277/testReport)**
 for PR 18779 at commit 
[`2bf42b1`](https://github.com/apache/spark/commit/2bf42b11cec794d19ed8f2fd000a9cd7aeb159bb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18841: [SPARK-21635][SQL] ACOS(2) and ASIN(2) should be ...

2017-08-04 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18841#discussion_r131512180
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
 ---
@@ -170,29 +193,29 @@ case class Pi() extends LeafMathExpression(math.Pi, 
"PI")
 
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(expr) - Returns the inverse cosine (a.k.a. arccosine) of 
`expr` if -1<=`expr`<=1 or NaN otherwise.",
+  usage = "_FUNC_(expr) - Returns the inverse cosine (a.k.a. arccosine) of 
`expr` if -1<=`expr`<=1 or NULL otherwise.",
   extended = """
 Examples:
   > SELECT _FUNC_(1);
0.0
   > SELECT _FUNC_(2);
-   NaN
+   NULL
--- End diff --

I checked the Hive patch. I saw there is a comment that questions about NaN 
in Hive. In SparkSQL, I think NaN is used in some expressions, e.g., IsNaN, 
NaNvl. Maybe we should keep the existing behavior?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18731: [SPARK-20990][SQL] Read all JSON documents in files when...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18731
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80271/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18731: [SPARK-20990][SQL] Read all JSON documents in files when...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18731
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18731: [SPARK-20990][SQL] Read all JSON documents in files when...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18731
  
**[Test build #80271 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80271/testReport)**
 for PR 18731 at commit 
[`6071daf`](https://github.com/apache/spark/commit/6071dafb1269f54c9246ebe1cdccaad721365971).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18849: [SPARK-21617][SQL] Store correct table metadata when alt...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18849
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80270/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18849: [SPARK-21617][SQL] Store correct table metadata when alt...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18849
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18849: [SPARK-21617][SQL] Store correct table metadata when alt...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18849
  
**[Test build #80270 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80270/testReport)**
 for PR 18849 at commit 
[`7ccf474`](https://github.com/apache/spark/commit/7ccf4743024a8a447a4b05369f6ebf237cf88c4f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18841: [SPARK-21635][SQL] ACOS(2) and ASIN(2) should be ...

2017-08-04 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18841#discussion_r131511957
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
 ---
@@ -170,29 +193,29 @@ case class Pi() extends LeafMathExpression(math.Pi, 
"PI")
 
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(expr) - Returns the inverse cosine (a.k.a. arccosine) of 
`expr` if -1<=`expr`<=1 or NaN otherwise.",
+  usage = "_FUNC_(expr) - Returns the inverse cosine (a.k.a. arccosine) of 
`expr` if -1<=`expr`<=1 or NULL otherwise.",
   extended = """
 Examples:
   > SELECT _FUNC_(1);
0.0
   > SELECT _FUNC_(2);
-   NaN
+   NULL
--- End diff --

As we already explicitly define the result value (NaN), I'm not sure we 
should change it like this. Compatibility issue might be arisen.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18828: [SPARK-21619][SQL] Fail the execution of canonicalized p...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18828
  
**[Test build #80276 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80276/testReport)**
 for PR 18828 at commit 
[`8f2f91d`](https://github.com/apache/spark/commit/8f2f91d9dfd0e19e0f60fc70f3f10ef73a2d0019).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18851: [SPARK-21644][SQL] LocalLimit.maxRows is defined incorre...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18851
  
**[Test build #80275 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80275/testReport)**
 for PR 18851 at commit 
[`9911176`](https://github.com/apache/spark/commit/9911176d901a6bdfadec3ef68b28e6dd2c82be9e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18851: [SPARK-21644][SQL] LocalLimit.maxRows is defined incorre...

2017-08-04 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18851
  
cc @JoshRosen 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18851: [SPARK-21644][SQL] LocalLimit.maxRows is defined ...

2017-08-04 Thread rxin

GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/18851

[SPARK-21644][SQL] LocalLimit.maxRows is defined incorrectly

## What changes were proposed in this pull request?
The definition of `maxRows` in `LocalLimit` operator was simply wrong. This 
patch introduces a new `maxRowsPerPartition` method and uses that in pruning. 
The patch also adds more documentation on why we need local limit vs global 
limit.

Note that this previously has never been a bug because the way the code is 
structured, but future use of the maxRows could lead to bugs.

## How was this patch tested?
Should be covered by existing test cases.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-21644

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18851.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18851


commit 9911176d901a6bdfadec3ef68b28e6dd2c82be9e
Author: Reynold Xin 
Date:   2017-08-05T01:19:26Z

[SPARK-21644][SQL] LocalLimit.maxRows is defined incorrectly




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18844: [SPARK-21640] Add errorifexists as a valid string for Er...

2017-08-04 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18844
  
Actually why do we need this? Can't you just add Error?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18749
  
**[Test build #80274 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80274/testReport)**
 for PR 18749 at commit 
[`974eab2`](https://github.com/apache/spark/commit/974eab27d77169c7bc7205595e3702b32865).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18830: [SPARK-21621][Core] Reset numRecordsWritten after DiskBl...

2017-08-04 Thread ConeyLiu

Github user ConeyLiu commented on the issue:

https://github.com/apache/spark/pull/18830
  
Thanks for reviewing. Hi @jiangxb1987, seems the test didn't triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-04 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18749
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...

2017-08-04 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18749
  
I think of opening a discussion in the mailing list if you are suggesting 
something. I don't think we should talk about this here.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18317
  
**[Test build #80273 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80273/testReport)**
 for PR 18317 at commit 
[`d84269d`](https://github.com/apache/spark/commit/d84269d7208ccacb20795d5a46fcca52584bcbef).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18317
  
**[Test build #80272 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80272/testReport)**
 for PR 18317 at commit 
[`89cbac8`](https://github.com/apache/spark/commit/89cbac877eefe0e0dc5d5956331c228be0382f72).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18659: [SPARK-21404][PYSPARK][WIP] Simple Python Vectorized UDF...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18659
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18659: [SPARK-21404][PYSPARK][WIP] Simple Python Vectorized UDF...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18659
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80265/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18659: [SPARK-21404][PYSPARK][WIP] Simple Python Vectorized UDF...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18659
  
**[Test build #80265 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80265/testReport)**
 for PR 18659 at commit 
[`a01a2d3`](https://github.com/apache/spark/commit/a01a2d3a338083b043e7183957f416e7aefee527).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18844: [SPARK-21640] Add errorifexists as a valid string for Er...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18844
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18844: [SPARK-21640] Add errorifexists as a valid string for Er...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18844
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80266/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18844: [SPARK-21640] Add errorifexists as a valid string for Er...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18844
  
**[Test build #80266 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80266/testReport)**
 for PR 18844 at commit 
[`f8bcf47`](https://github.com/apache/spark/commit/f8bcf477303b48f5a9746ca8cd2a05f1ab6852ea).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18742
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18742
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80269/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18742
  
**[Test build #80269 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80269/testReport)**
 for PR 18742 at commit 
[`e8c3041`](https://github.com/apache/spark/commit/e8c3041ce38c707b854a661b751ed3350e8f0371).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18317
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18317
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80268/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18317
  
**[Test build #80268 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80268/testReport)**
 for PR 18317 at commit 
[`a83a3d2`](https://github.com/apache/spark/commit/a83a3d20694126a1d1b4c4b9c28e8362713b60f8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18317
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18848: [SPARK-21374][CORE] Fix reading globbed paths from S3 in...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18848
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18848: [SPARK-21374][CORE] Fix reading globbed paths from S3 in...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18848
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80263/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18317
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80267/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18317
  
**[Test build #80267 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80267/testReport)**
 for PR 18317 at commit 
[`42740be`](https://github.com/apache/spark/commit/42740be3b174c7b6978e77105084e0d8d70fdaa3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18848: [SPARK-21374][CORE] Fix reading globbed paths from S3 in...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18848
  
**[Test build #80263 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80263/testReport)**
 for PR 18848 at commit 
[`9d9b57a`](https://github.com/apache/spark/commit/9d9b57a68d6d0504f4b58fd3fc99d6f678e4f213).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Tim...

2017-08-04 Thread BryanCutler

Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/18664#discussion_r131506100
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3036,6 +3052,9 @@ def test_toPandas_arrow_toggle(self):
 pdf = df.toPandas()
 self.spark.conf.set("spark.sql.execution.arrow.enable", "true")
 pdf_arrow = df.toPandas()
+# need to remove timezone for comparison
+pdf_arrow["7_timestamp_t"] = \
+pdf_arrow["7_timestamp_t"].apply(lambda ts: 
ts.tz_localize(None))
--- End diff --

Which way do you mean, with or without Arrow?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18731: [SPARK-20990][SQL] Read all JSON documents in files when...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18731
  
**[Test build #80271 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80271/testReport)**
 for PR 18731 at commit 
[`6071daf`](https://github.com/apache/spark/commit/6071dafb1269f54c9246ebe1cdccaad721365971).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18850: Remove dropping bus logic.

2017-08-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18850
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18850: Remove dropping bus logic.

2017-08-04 Thread dirkraft

GitHub user dirkraft opened a pull request:

https://github.com/apache/spark/pull/18850

Remove dropping bus logic.

Log warning when event uptake takes too long, but block anyways. Since 
dropping events is no longer possible, removed that code.

## What changes were proposed in this pull request?

LiveListenerBus events are now never dropped. I believe that critical 
components communicate through this bus. I will also add the argument that a 
buggy spark UI (because of dropped events) is just as useless, and so I am 
unaware of any kind of event which can be lossy.

## How was this patch tested?

Basically we ran it in our production environment. Our past couple spark 
runs now reliably complete with this change since crucial events never get 
dropped, but it is unclear how much lag this change might be contributing to 
the overall run time.

We have partition counts in the thousands and executors in the hundreds.



Perhaps the events can be tagged as critical (`queue.put`) or not 
(`queue.offer`), but this small change is meant to get spark stable again. We 
have turned up the eventqueue size to 1,000,000 (and fiddled with all available 
settings), but it still isn't enough. With some probability, enough crucial 
events are dropped and leaves the spark job hung indefinitely unable to recover 
(sometimes it does). The large eventqueue also maxed out our driver process 
with 64GB of memory, so that's pretty much untenable. 

With this change and over tens (hundreds?) of millions of events flowing 
through this bus, only 1100 triggered the slow warning, usually around 20ms 
with a max of 100ms. The current working hypothesis is that the events tend to 
arrive in bursts and so quickly overwhelm the queue and then quickly empty out.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dirkraft/spark dont-drop-events-ever

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18850.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18850


commit 56c4c1d87716b6f629be83fff8197b36a607a9ae
Author: Jason Dunkelberger 
Date:   2017-08-04T23:21:02Z

Remove droppy logic. Log warning when event uptake takes too long, but 
block anyways.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18849: [SPARK-21617][SQL] Store correct table metadata w...

2017-08-04 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18849#discussion_r131504834
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -908,7 +909,13 @@ private[hive] object HiveClientImpl {
 }
 // after SPARK-19279, it is not allowed to create a hive table with an 
empty schema,
 // so here we should not add a default col schema
-if (schema.isEmpty && DDLUtils.isDatasourceTable(table)) {
+//
+// Because HiveExternalCatalog sometimes writes back "raw" tables that 
have not been
+// completely translated to Spark's view, the provider information 
needs to be looked
+// up in two places.
+val provider = table.provider.orElse(
--- End diff --

This change would have fixed the second exception in the bug (about storing 
an empty schema); but the code was just ending up in that situation because of 
the other problems this PR is fixing. This change shouldn't be needed for the 
fix, but I included it for further correctness.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18849: [SPARK-21617][SQL] Store correct table metadata when alt...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18849
  
**[Test build #80270 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80270/testReport)**
 for PR 18849 at commit 
[`7ccf474`](https://github.com/apache/spark/commit/7ccf4743024a8a447a4b05369f6ebf237cf88c4f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18849: [SPARK-21617][SQL] Store correct table metadata when alt...

2017-08-04 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/18849
  
This is a corrected version of #18824 after I tracked the actual failure 
and looked at the suggested code paths in the original review.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18849: [SPARK-21617][SQL] Store correct table metadata w...

2017-08-04 Thread vanzin

GitHub user vanzin opened a pull request:

https://github.com/apache/spark/pull/18849

[SPARK-21617][SQL] Store correct table metadata when altering schema in 
Hive metastore.

HiveExternalCatalog.alterTableSchema takes a shortcut by modifying the raw
Hive table metadata instead of the full Spark view; that means it needs to
be aware of whether the table is Hive-compatible or not.

For compatible tables, the current "replace the schema" code is the correct
path, except that an exception in that path should result in an error, and
not in retrying in a different way.

For non-compatible tables, Spark should just update the table properties,
and leave the schema stored in the raw table untouched.

Because Spark doesn't explicitly store metadata about whether a table is
Hive-compatible or not, a new property was added just to make that explicit.
The code tries to detect old DS tables that don't have the property and do
the right thing.

These changes also uncovered a problem with the way case-sensitive DS tables
were being saved to the Hive metastore; the metastore is case-insensitive,
and the code was treating these tables as Hive-compatible if the data source
had a Hive counterpart (e.g. for parquet). In this scenario, the schema
could be corrupted when being updated from Spark if conflicting columns 
existed
ignoring case. The change fixes this by making case-sensitive DS-tables not
Hive-compatible.

Tested with existing and added unit tests (plus internal tests with a 2.1 
metastore).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vanzin/spark SPARK-21617

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18849.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18849


commit aae3abd673adc7ff939d842e49d566fa722403a3
Author: Marcelo Vanzin 
Date:   2017-08-02T21:47:34Z

[SPARK-21617][SQL] Store correct metadata in Hive for altered DS table.

This change fixes two issues:
- when loading table metadata from Hive, restore the "provider" field of
  CatalogTable so DS tables can be identified.
- when altering a DS table in the Hive metastore, make sure to not alter
  the table's schema, since the DS table's schema is stored as a table
  property in those cases.

Also added a new unit test for this issue which fails without this change.

commit 2350b105a599dde849e44bde50aa6d13812e4f83
Author: Marcelo Vanzin 
Date:   2017-08-04T22:49:31Z

Fix 2.1 DDL suite to not use SparkSession.

commit 7ccf4743024a8a447a4b05369f6ebf237cf88c4f
Author: Marcelo Vanzin 
Date:   2017-08-04T22:57:44Z

Proper fix.

HiveExternalCatalog.alterTableSchema takes a shortcut by modifying the raw
Hive table metadata instead of the full Spark view; that means it needs to
be aware of whether the table is Hive-compatible or not.

For compatible tables, the current "replace the schema" code is the correct
path, except that an exception in that path should result in an error, and
not in retrying in a different way.

For non-compatible tables, Spark should just update the table properties,
and leave the schema stored in the raw table untouched.

Because Spark doesn't explicitly store metadata about whether a table is
Hive-compatible or not, a new property was added just to make that explicit.
The code tries to detect old DS tables that don't have the property and do
the right thing.

These changes also uncovered a problem with the way case-sensitive DS tables
were being saved to the Hive metastore; the metastore is case-insensitive,
and the code was treating these tables as Hive-compatible if the data source
had a Hive counterpart (e.g. for parquet). In this scenario, the schema
could be corrupted when being updated from Spark if conflicting columns 
existed
ignoring case. The change fixes this by making case-sensitive DS-tables not
Hive-compatible.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/18640
  
I just checked the dependency size. They look pretty reasonable, roughly 2 
MBs in total (although I do worry in the future whether ORC would bring in a 
lot more jars).

cc @omalley any guidance on this topic?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18742: [Spark-21542][ML][Python]Python persistence helper funct...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18742
  
**[Test build #80269 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80269/testReport)**
 for PR 18742 at commit 
[`e8c3041`](https://github.com/apache/spark/commit/e8c3041ce38c707b854a661b751ed3350e8f0371).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Tim...

2017-08-04 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18664#discussion_r131503699
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3036,6 +3052,9 @@ def test_toPandas_arrow_toggle(self):
 pdf = df.toPandas()
 self.spark.conf.set("spark.sql.execution.arrow.enable", "true")
 pdf_arrow = df.toPandas()
+# need to remove timezone for comparison
+pdf_arrow["7_timestamp_t"] = \
+pdf_arrow["7_timestamp_t"].apply(lambda ts: 
ts.tz_localize(None))
--- End diff --

Actually, I have a question. How does Pandas DataFrame consume the time 
zone we create? Will our output impact them?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18317: [SPARK-21113][CORE] Read ahead input stream to am...

2017-08-04 Thread mridulm

Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/18317#discussion_r131502049
  
--- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java 
---
@@ -0,0 +1,289 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.io;
+
+import com.google.common.base.Preconditions;
+import com.google.common.primitives.Ints;
+import org.apache.spark.storage.StorageUtils;
+
+import javax.annotation.concurrent.GuardedBy;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.locks.Condition;
+import java.util.concurrent.locks.ReentrantLock;
+
+/**
+ * {@link InputStream} implementation which asynchronously reads ahead 
from the underlying input
+ * stream when specified amount of data has been read from the current 
buffer. It does it by maintaining
+ * two buffer - active buffer and read ahead buffer. Active buffer 
contains data which should be returned
+ * when a read() call is issued. The read ahead buffer is used to 
asynchronously read from the underlying
+ * input stream and once the current active buffer is exhausted, we flip 
the two buffers so that we can
+ * start reading from the read ahead buffer without being blocked in disk 
I/O.
+ */
+public class ReadAheadInputStream extends InputStream {
+
+  private ReentrantLock stateChangeLock = new ReentrantLock();
+
+  @GuardedBy("stateChangeLock")
+  private ByteBuffer activeBuffer;
+
+  @GuardedBy("stateChangeLock")
+  private ByteBuffer readAheadBuffer;
+
+  @GuardedBy("stateChangeLock")
+  private boolean endOfStream;
+
+  @GuardedBy("stateChangeLock")
+  // true if async read is in progress
+  private boolean isReadInProgress;
+
+  @GuardedBy("stateChangeLock")
+  // true if read is aborted due to an exception in reading from 
underlying input stream.
+  private boolean isReadAborted;
+
+  @GuardedBy("stateChangeLock")
+  private Exception readException;
+
+  // If the remaining data size in the current buffer is below this 
threshold,
+  // we issue an async read from the underlying input stream.
+  private final int readAheadThresholdInBytes;
+
+  private final InputStream underlyingInputStream;
+
+  private final ExecutorService executorService = 
Executors.newSingleThreadExecutor();
+
+  private final Condition asyncReadComplete = 
stateChangeLock.newCondition();
+
+  private final byte[] oneByte = new byte[1];
+
+  /**
+   * Creates a ReadAheadInputStream with the specified buffer 
size and read-ahead
+   * threshold
+   *
+   * @param   inputStream The underlying input stream.
+   * @param   bufferSizeInBytes   The buffer size.
+   * @param   readAheadThresholdInBytes   If the active buffer has less 
data than the read-ahead
+   *  threshold, an async read is 
triggered.
+   */
+  public ReadAheadInputStream(InputStream inputStream, int 
bufferSizeInBytes, int readAheadThresholdInBytes) {
+Preconditions.checkArgument(bufferSizeInBytes > 0, "bufferSizeInBytes 
should be greater than 0");
+Preconditions.checkArgument(readAheadThresholdInBytes > 0 && 
readAheadThresholdInBytes < bufferSizeInBytes,
+"readAheadThresholdInBytes should be 
greater than 0 and less than bufferSizeInBytes" );
+activeBuffer = ByteBuffer.allocate(bufferSizeInBytes);
+readAheadBuffer = ByteBuffer.allocate(bufferSizeInBytes);
+this.readAheadThresholdInBytes = readAheadThresholdInBytes;
+this.underlyingInputStream = inputStream;
+activeBuffer.flip();
+readAheadBuffer.flip();
+  }
+
+  private boolean isEndOfStream() {
+if(activeBuffer.remaining() == 0 && readAheadBuffer.remaining() == 0 
&& endOfStream) {
+  return true;
+}
+return  false;
+  }
+
+
+  private void readAsync(final ByteBuffer byteBuffer) throws IOException {
+stateChangeLock.lock();
+if (endOfStream ||

[GitHub] spark issue #18519: [SPARK-16742] Mesos Kerberos Support

2017-08-04 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/18519
  
I've been pretty busy, probably can get to this sometime next week.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18848: [SPARK-21374][CORE] Fix reading globbed paths from S3 in...

2017-08-04 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18848
  
LGTM pending Jenkins


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Tim...

2017-08-04 Thread BryanCutler

Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/18664#discussion_r131496971
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3036,6 +3052,9 @@ def test_toPandas_arrow_toggle(self):
 pdf = df.toPandas()
 self.spark.conf.set("spark.sql.execution.arrow.enable", "true")
 pdf_arrow = df.toPandas()
+# need to remove timezone for comparison
+pdf_arrow["7_timestamp_t"] = \
+pdf_arrow["7_timestamp_t"].apply(lambda ts: 
ts.tz_localize(None))
--- End diff --

Yes, this test is verify behavior is the same with/without Arrow - that is 
why I made it - and I think we all agree that is the goal.  I'm trying to 
understand if you are suggesting that this new conf, only if enabled, will add 
the session local timezone to the Pandas DataFrame made in 
 `toPandas()` with and without Arrow enabled?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/18640
  
Until now, I think ORC is the same with most of other data sources(CSV, 
JDBC, JSON, PARQUET, TEXT) which live inside `sql/core` now. If that is an 
architectural plan of Apache Spark 2.3, I will. Are we going to move out all 
data sources into separate modules, e.g., `datasources/parquet`, in timeframe 
of Spark 2.3?

Or, is there any other reason I don't catch here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18841: [SPARK-21635][SQL] ACOS(2) and ASIN(2) should be ...

2017-08-04 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18841#discussion_r131495498
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
 ---
@@ -170,29 +193,29 @@ case class Pi() extends LeafMathExpression(math.Pi, 
"PI")
 
 // scalastyle:off line.size.limit
 @ExpressionDescription(
-  usage = "_FUNC_(expr) - Returns the inverse cosine (a.k.a. arccosine) of 
`expr` if -1<=`expr`<=1 or NaN otherwise.",
+  usage = "_FUNC_(expr) - Returns the inverse cosine (a.k.a. arccosine) of 
`expr` if -1<=`expr`<=1 or NULL otherwise.",
--- End diff --

I did check Oracle. It output an error message.
```
ORA-01428: argument '3' is out of range
```

Each database has different behaviors. Maybe we can keep the existing one 
unchanged. What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-08-04 Thread sitalkedia

Github user sitalkedia commented on the issue:

https://github.com/apache/spark/pull/18317
  
@jiangxb1987, @kiszk Addressed review comments, lmk what you guys think.

BTW, this idea can be applied to other places when we block on reading the 
input stream like HDFS reading. What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18317
  
**[Test build #80268 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80268/testReport)**
 for PR 18317 at commit 
[`a83a3d2`](https://github.com/apache/spark/commit/a83a3d20694126a1d1b4c4b9c28e8362713b60f8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18317: [SPARK-21113][CORE] Read ahead input stream to am...

2017-08-04 Thread sitalkedia

Github user sitalkedia commented on a diff in the pull request:

https://github.com/apache/spark/pull/18317#discussion_r131494491
  
--- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java 
---
@@ -0,0 +1,279 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.io;
+
+import com.google.common.base.Preconditions;
+import org.apache.spark.storage.StorageUtils;
+
+import javax.annotation.concurrent.GuardedBy;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.locks.Condition;
+import java.util.concurrent.locks.Lock;
+import java.util.concurrent.locks.ReentrantLock;
+
+/**
+ * {@link InputStream} implementation which asynchronously reads ahead 
from the underlying input
+ * stream when specified amount of data has been read from the current 
buffer. It does it by maintaining
+ * two buffer - active buffer and read ahead buffer. Active buffer 
contains data which should be returned
+ * when a read() call is issued. The read ahead buffer is used to 
asynchronously read from the underlying
+ * input stream and once the current active buffer is exhausted, we flip 
the two buffers so that we can
+ * start reading from the read ahead buffer without being blocked in disk 
I/O.
+ */
+public class ReadAheadInputStream extends InputStream {
+
+  private Lock stateChangeLock = new ReentrantLock();
+
+  @GuardedBy("stateChangeLock")
+  private ByteBuffer activeBuffer;
+
+  @GuardedBy("stateChangeLock")
+  private ByteBuffer readAheadBuffer;
+
+  @GuardedBy("stateChangeLock")
+  private boolean endOfStream;
+
+  @GuardedBy("stateChangeLock")
+  // true if async read is in progress
+  private boolean isReadInProgress;
+
+  @GuardedBy("stateChangeLock")
+  // true if read is aborted due to an exception in reading from 
underlying input stream.
+  private boolean isReadAborted;
+
+  @GuardedBy("stateChangeLock")
+  private Exception readException;
+
+  // If the remaining data size in the current buffer is below this 
threshold,
+  // we issue an async read from the underlying input stream.
+  private final int readAheadThresholdInBytes;
+
+  private final InputStream underlyingInputStream;
+
+  private final ExecutorService executorService = 
Executors.newSingleThreadExecutor();
+
+  private final Condition asyncReadComplete = 
stateChangeLock.newCondition();
+
+  private final byte[] oneByte = new byte[1];
+
+  public ReadAheadInputStream(InputStream inputStream, int 
bufferSizeInBytes, int readAheadThresholdInBytes) {
--- End diff --

done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18317: [SPARK-21113][CORE] Read ahead input stream to am...

2017-08-04 Thread sitalkedia

Github user sitalkedia commented on a diff in the pull request:

https://github.com/apache/spark/pull/18317#discussion_r131494505
  
--- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java 
---
@@ -0,0 +1,279 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.io;
+
+import com.google.common.base.Preconditions;
+import org.apache.spark.storage.StorageUtils;
+
+import javax.annotation.concurrent.GuardedBy;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.locks.Condition;
+import java.util.concurrent.locks.Lock;
+import java.util.concurrent.locks.ReentrantLock;
+
+/**
+ * {@link InputStream} implementation which asynchronously reads ahead 
from the underlying input
+ * stream when specified amount of data has been read from the current 
buffer. It does it by maintaining
+ * two buffer - active buffer and read ahead buffer. Active buffer 
contains data which should be returned
+ * when a read() call is issued. The read ahead buffer is used to 
asynchronously read from the underlying
+ * input stream and once the current active buffer is exhausted, we flip 
the two buffers so that we can
+ * start reading from the read ahead buffer without being blocked in disk 
I/O.
+ */
+public class ReadAheadInputStream extends InputStream {
+
+  private Lock stateChangeLock = new ReentrantLock();
+
+  @GuardedBy("stateChangeLock")
+  private ByteBuffer activeBuffer;
+
+  @GuardedBy("stateChangeLock")
+  private ByteBuffer readAheadBuffer;
+
+  @GuardedBy("stateChangeLock")
+  private boolean endOfStream;
+
+  @GuardedBy("stateChangeLock")
+  // true if async read is in progress
+  private boolean isReadInProgress;
+
+  @GuardedBy("stateChangeLock")
+  // true if read is aborted due to an exception in reading from 
underlying input stream.
+  private boolean isReadAborted;
+
+  @GuardedBy("stateChangeLock")
+  private Exception readException;
+
+  // If the remaining data size in the current buffer is below this 
threshold,
+  // we issue an async read from the underlying input stream.
+  private final int readAheadThresholdInBytes;
+
+  private final InputStream underlyingInputStream;
+
+  private final ExecutorService executorService = 
Executors.newSingleThreadExecutor();
+
+  private final Condition asyncReadComplete = 
stateChangeLock.newCondition();
+
+  private final byte[] oneByte = new byte[1];
+
+  public ReadAheadInputStream(InputStream inputStream, int 
bufferSizeInBytes, int readAheadThresholdInBytes) {
+Preconditions.checkArgument(bufferSizeInBytes > 0, "bufferSizeInBytes 
should be greater than 0");
+Preconditions.checkArgument(readAheadThresholdInBytes > 0 && 
readAheadThresholdInBytes < bufferSizeInBytes,
+"readAheadThresholdInBytes should be 
greater than 0 and less than bufferSizeInBytes" );
+activeBuffer = ByteBuffer.allocate(bufferSizeInBytes);
+readAheadBuffer = ByteBuffer.allocate(bufferSizeInBytes);
+this.readAheadThresholdInBytes = readAheadThresholdInBytes;
+this.underlyingInputStream = inputStream;
+activeBuffer.flip();
+readAheadBuffer.flip();
+  }
+
+  private boolean isEndOfStream() {
+if(activeBuffer.remaining() == 0 && readAheadBuffer.remaining() == 0 
&& endOfStream) {
+  return true;
+}
+return  false;
+  }
+
+
+  private void readAsync(final ByteBuffer byteBuffer) throws IOException {
+stateChangeLock.lock();
+if (endOfStream || isReadInProgress) {
+  stateChangeLock.unlock();
+  return;
+}
+byteBuffer.position(0);
+byteBuffer.flip();
+isReadInProgress = true;
+stateChangeLock.unlock();
+executorService.execute(() -> {
+  byte[] arr;
+  stateChangeLock.lock();
+  arr = byteBuffer.array();
+  stateChangeLock.unlock();
+  // Please note that it is safe to release the lock and read into the 
read ahead buffer

[GitHub] spark pull request #18317: [SPARK-21113][CORE] Read ahead input stream to am...

2017-08-04 Thread sitalkedia

Github user sitalkedia commented on a diff in the pull request:

https://github.com/apache/spark/pull/18317#discussion_r131494464
  
--- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java 
---
@@ -0,0 +1,279 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.io;
+
+import com.google.common.base.Preconditions;
+import org.apache.spark.storage.StorageUtils;
+
+import javax.annotation.concurrent.GuardedBy;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.locks.Condition;
+import java.util.concurrent.locks.Lock;
+import java.util.concurrent.locks.ReentrantLock;
+
+/**
+ * {@link InputStream} implementation which asynchronously reads ahead 
from the underlying input
+ * stream when specified amount of data has been read from the current 
buffer. It does it by maintaining
+ * two buffer - active buffer and read ahead buffer. Active buffer 
contains data which should be returned
+ * when a read() call is issued. The read ahead buffer is used to 
asynchronously read from the underlying
+ * input stream and once the current active buffer is exhausted, we flip 
the two buffers so that we can
+ * start reading from the read ahead buffer without being blocked in disk 
I/O.
+ */
+public class ReadAheadInputStream extends InputStream {
+
+  private Lock stateChangeLock = new ReentrantLock();
+
+  @GuardedBy("stateChangeLock")
+  private ByteBuffer activeBuffer;
+
+  @GuardedBy("stateChangeLock")
+  private ByteBuffer readAheadBuffer;
+
+  @GuardedBy("stateChangeLock")
+  private boolean endOfStream;
+
+  @GuardedBy("stateChangeLock")
+  // true if async read is in progress
+  private boolean isReadInProgress;
+
+  @GuardedBy("stateChangeLock")
+  // true if read is aborted due to an exception in reading from 
underlying input stream.
+  private boolean isReadAborted;
+
+  @GuardedBy("stateChangeLock")
+  private Exception readException;
+
+  // If the remaining data size in the current buffer is below this 
threshold,
+  // we issue an async read from the underlying input stream.
+  private final int readAheadThresholdInBytes;
+
+  private final InputStream underlyingInputStream;
+
+  private final ExecutorService executorService = 
Executors.newSingleThreadExecutor();
+
+  private final Condition asyncReadComplete = 
stateChangeLock.newCondition();
+
+  private final byte[] oneByte = new byte[1];
+
+  public ReadAheadInputStream(InputStream inputStream, int 
bufferSizeInBytes, int readAheadThresholdInBytes) {
+Preconditions.checkArgument(bufferSizeInBytes > 0, "bufferSizeInBytes 
should be greater than 0");
+Preconditions.checkArgument(readAheadThresholdInBytes > 0 && 
readAheadThresholdInBytes < bufferSizeInBytes,
+"readAheadThresholdInBytes should be 
greater than 0 and less than bufferSizeInBytes" );
+activeBuffer = ByteBuffer.allocate(bufferSizeInBytes);
+readAheadBuffer = ByteBuffer.allocate(bufferSizeInBytes);
+this.readAheadThresholdInBytes = readAheadThresholdInBytes;
+this.underlyingInputStream = inputStream;
+activeBuffer.flip();
+readAheadBuffer.flip();
+  }
+
+  private boolean isEndOfStream() {
+if(activeBuffer.remaining() == 0 && readAheadBuffer.remaining() == 0 
&& endOfStream) {
+  return true;
+}
+return  false;
+  }
+
+
+  private void readAsync(final ByteBuffer byteBuffer) throws IOException {
+stateChangeLock.lock();
+if (endOfStream || isReadInProgress) {
+  stateChangeLock.unlock();
+  return;
+}
+byteBuffer.position(0);
+byteBuffer.flip();
+isReadInProgress = true;
+stateChangeLock.unlock();
+executorService.execute(() -> {
+  byte[] arr;
+  stateChangeLock.lock();
+  arr = byteBuffer.array();
+  stateChangeLock.unlock();
+  // Please note that it is safe to release the lock and read into the 
read ahead buffer

[GitHub] spark issue #18844: [SPARK-21640] Add errorifexists as a valid string for Er...

2017-08-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18844
  
**[Test build #80266 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80266/testReport)**
 for PR 18844 at commit 
[`f8bcf47`](https://github.com/apache/spark/commit/f8bcf477303b48f5a9746ca8cd2a05f1ab6852ea).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 3 4 >

1 - 100 of 393 matches

Mail list logo