[GitHub] spark issue #16248: [SPARK-18810][SPARKR] SparkR install.spark does not work...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16248 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16248: [SPARK-18810][SPARKR] SparkR install.spark does not work...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16248 **[Test build #70009 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70009/consoleFull)** for PR 16248 at commit [`3e5034d`](https://github.com/apache/spark/commit/3e5034d18aa1edfe77310a8b52bccd2cd30ef130).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16248: [SPARK-18810][SPARKR] SparkR install.spark does not work...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16248 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70009/
[GitHub] spark issue #16104: [SPARK-18675][SQL] CTAS for hive serde table should work...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16104 Merged build finished. Test PASSed.
[GitHub] spark issue #16104: [SPARK-18675][SQL] CTAS for hive serde table should work...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16104 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70006/
[GitHub] spark issue #16104: [SPARK-18675][SQL] CTAS for hive serde table should work...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16104 **[Test build #70006 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70006/consoleFull)** for PR 16104 at commit [`8607425`](https://github.com/apache/spark/commit/8607425d025944204ae38c38679a9204ffd1c144).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16219: [SPARK-18790][SS] Keep a general offset history o...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16219
[GitHub] spark issue #16219: [SPARK-18790][SS] Keep a general offset history of strea...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16219 Thanks! Merging to master and 2.1.
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13909 **[Test build #70010 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70010/consoleFull)** for PR 13909 at commit [`f418062`](https://github.com/apache/spark/commit/f418062e8c54732c4b78716d27b8c699ac9df980).
[GitHub] spark issue #16219: [SPARK-18790][SS] Keep a general offset history of strea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16219 **[Test build #3493 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3493/consoleFull)** for PR 16219 at commit [`0830349`](https://github.com/apache/spark/commit/083034925d068c1c7c9123d97fc3e647da4faee4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16220: [SPARK-18796][SS]StreamingQueryManager should not block ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16220 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70003/
[GitHub] spark issue #16220: [SPARK-18796][SS]StreamingQueryManager should not block ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16220 Merged build finished. Test PASSed.
[GitHub] spark issue #16220: [SPARK-18796][SS]StreamingQueryManager should not block ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16220 **[Test build #70003 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70003/consoleFull)** for PR 16220 at commit [`4be4149`](https://github.com/apache/spark/commit/4be4149d81d9860445ce4b53ae5951c1467632f4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class CartesianDeserializer(Serializer):`
  * `class PairDeserializer(Serializer):`
  * `case class FileStreamSourceOffset(logOffset: Long) extends Offset`
[GitHub] spark issue #16251: [SPARK-18826][SS]Add 'newestFirst' option to FileStreamS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16251 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70002/
[GitHub] spark issue #16251: [SPARK-18826][SS]Add 'newestFirst' option to FileStreamS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16251 Merged build finished. Test PASSed.
[GitHub] spark issue #16251: [SPARK-18826][SS]Add 'newestFirst' option to FileStreamS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16251 **[Test build #70002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70002/consoleFull)** for PR 16251 at commit [`58a57d4`](https://github.com/apache/spark/commit/58a57d4004c45ff2290b95ec8c70ef95828d379b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15915#discussion_r91893984

--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +332,7 @@ private[spark] class MemoryStore(
     var unrollMemoryUsedByThisBlock = 0L
     // Underlying buffer for unrolling the block
     val redirectableStream = new RedirectableOutputStream
-    val bbos = new ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+    val bbos = new ChunkedByteBufferOutputStream(chunkSize, allocator)
--- End diff --

Don't we need to add a check for the size? It is still exposed to overflow when converting `pageSizeBytes` from long to int, right?
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15915#discussion_r91892123

--- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -78,6 +80,7 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: T, id: Long)
   }
   // Note: use getSizeAsKb (not bytes) to maintain compatibility if no units are provided
   blockSize = conf.getSizeAsKb("spark.broadcast.blockSize", "4m").toInt * 1024
--- End diff --

`spark.broadcast.blockSize` has a special meaning. I don't think we should replace it with `pageSizeBytes`.
[GitHub] spark issue #16248: [SPARK-18810][SPARKR] SparkR install.spark does not work...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16248 **[Test build #70009 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70009/consoleFull)** for PR 16248 at commit [`3e5034d`](https://github.com/apache/spark/commit/3e5034d18aa1edfe77310a8b52bccd2cd30ef130).
[GitHub] spark pull request #16248: [SPARK-18810][SPARKR] SparkR install.spark does n...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16248#discussion_r91891490

--- Diff: R/pkg/R/utils.R ---
@@ -851,3 +851,12 @@ rbindRaws <- function(inputData){
   out[!rawcolumns] <- lapply(out[!rawcolumns], unlist)
   out
 }
+
+# Get basename without extension from URL
+basenameSansExtFromUrl <- function(url) {
--- End diff --

My concern was bringing in another dependency just for this (it's in `tools`). The regex was in fact copy-pasted from `file_path_sans_ext` (hence the name), except for the compression part, which is what you are referring to. I could copy that over as well. Would you prefer `compression` to be TRUE (the default is FALSE) so that `.gz` is removed?

```
> library(tools)
> file_path_sans_ext
function (x, compression = FALSE)
{
    if (compression)
        x <- sub("[.](gz|bz2|xz)$", "", x)
    sub("([^.]+)\\.[[:alnum:]]+$", "\\1", x)
}
```
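For readers less familiar with R's `tools` package, the extension-stripping logic under discussion can be sketched in Java (a hedged translation, not the SparkR code: the helper name `sansExt` is hypothetical, and POSIX `[[:alnum:]]` becomes `\p{Alnum}` in Java regex syntax):

```java
public class SansExt {
    // Mirrors tools::file_path_sans_ext: optionally strip a compression
    // suffix (.gz/.bz2/.xz) first, then drop the final ".<alnum>" extension.
    static String sansExt(String x, boolean compression) {
        if (compression) {
            x = x.replaceFirst("[.](gz|bz2|xz)$", "");
        }
        return x.replaceFirst("([^.]+)\\.\\p{Alnum}+$", "$1");
    }

    public static void main(String[] args) {
        // ".tgz" is a single extension, so only one pass is needed.
        System.out.println(sansExt("spark-2.1.0-bin-hadoop2.7.tgz", false));
        // With compression = true, ".gz" is peeled off before ".csv".
        System.out.println(sansExt("data.csv.gz", true));
    }
}
```

Note that dotted version numbers such as `2.1.0` survive because the pattern is anchored at the end of the string, which is the property the copied regex relies on.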
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16030 the new behavior LGTM, but I'm not sure if we still need to keep the old behavior
[GitHub] spark issue #16219: [SPARK-18790][SS] Keep a general offset history of strea...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16219 LGTM
[GitHub] spark pull request #16142: [SPARK-18716][CORE] Restrict the disk usage of sp...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16142#discussion_r91889826

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -90,6 +91,10 @@ private[spark] class EventLoggingListener(
    * Creates the log file in the configured log directory.
    */
   def start() {
+    val statusList = Option(fileSystem.listStatus(new Path(logBaseDir))).map(_.toSeq)
+      .getOrElse(Seq[FileStatus]())
+    EventLoggingListener.cleanRedundantLogFiles(sparkConf, fileSystem, statusList)
--- End diff --

Makes sense. I will revert the related changes first.
[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14638 Do you mean replacing the whole of the current `TableReader.scala`, which was introduced in SPARK-1251? I guess Spark chose this direct-access approach at the time for performance reasons. Yes, this option targets only `TextInputFormat`s. For non-file-based Hive tables like Orc/Parquet, this option is ignored.

```scala
val isTextInputFormatTable = classOf[TextInputFormat].isAssignableFrom(hiveTable.getInputFormatClass)
```
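The `isAssignableFrom` check quoted above asks whether the table's input-format class is `TextInputFormat` or a subclass of it. A minimal sketch with stand-in classes (the class names here are hypothetical placeholders, not the real Hadoop types):

```java
public class AssignabilityDemo {
    // Hypothetical stand-ins for Hadoop's input-format hierarchy.
    static class InputFormat {}
    static class TextInputFormat extends InputFormat {}
    static class SequenceFileInputFormat extends InputFormat {}

    public static void main(String[] args) {
        // True: a TextInputFormat table (or any subclass) would pass the check.
        System.out.println(
            TextInputFormat.class.isAssignableFrom(TextInputFormat.class));
        // False: sibling formats fail the check, so the option would be skipped.
        System.out.println(
            TextInputFormat.class.isAssignableFrom(SequenceFileInputFormat.class));
    }
}
```

`A.isAssignableFrom(B)` reads "a `B` instance can be assigned to a variable of type `A`", which is why the target class goes on the left.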
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/15915#discussion_r9169

--- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -78,6 +80,7 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: T, id: Long)
   }
   // Note: use getSizeAsKb (not bytes) to maintain compatibility if no units are provided
   blockSize = conf.getSizeAsKb("spark.broadcast.blockSize", "4m").toInt * 1024
+  chunkSize = SparkEnv.get.memoryManager.pageSizeBytes.toInt
   checksumEnabled = conf.getBoolean("spark.broadcast.checksum", true)
--- End diff --

@JoshRosen We use `SparkEnv.get.memoryManager.pageSizeBytes` as the chunk size. Since `pageSizeBytes` returns a `Long`, there is still a potential integer overflow, isn't there? Besides, users will never know the low-level details, nor the effect on chunk size when they modify `pageSizeBytes`.
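The truncation the reviewers are worried about is easy to reproduce on the JVM; a minimal sketch in Java, whose narrowing cast has the same semantics as Scala's `.toInt`:

```java
public class ChunkSizeOverflow {
    public static void main(String[] args) {
        // Any long value at or above 2^31 wraps when narrowed to int.
        long pageSizeBytes = 1L << 31;        // 2147483648 bytes, i.e. 2 GiB
        int chunkSize = (int) pageSizeBytes;  // silent narrowing, like Scala's .toInt
        System.out.println(chunkSize);        // prints -2147483648

        // Math.toIntExact is one way to make the overflow loud instead of silent.
        try {
            Math.toIntExact(pageSizeBytes);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

A negative chunk size passed to a buffer constructor would typically fail much later and far from the cause, which is why the review asks for an explicit size check at the conversion site.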
[GitHub] spark issue #15915: [SPARK-18485][CORE] Underlying integer overflow when cre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15915 **[Test build #70008 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70008/consoleFull)** for PR 15915 at commit [`8551892`](https://github.com/apache/spark/commit/85518921494bb4e24fcd913bafba45025da126cd).
[GitHub] spark issue #15915: [SPARK-18485][CORE] Underlying integer overflow when cre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15915 **[Test build #70007 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70007/consoleFull)** for PR 15915 at commit [`45aeddb`](https://github.com/apache/spark/commit/45aeddb95984fb9e3940bea3e6227977f44033e8).
[GitHub] spark issue #16245: [SPARK-18824][SQL] Add optimizer rule to reorder Filter ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16245 Merged build finished. Test PASSed.
[GitHub] spark issue #16245: [SPARK-18824][SQL] Add optimizer rule to reorder Filter ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16245 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/6/
[GitHub] spark issue #16146: [SPARK-18091] [SQL] [BACKPORT-1.6] Deep if expressions c...
Github user kapilsingh5050 commented on the issue: https://github.com/apache/spark/pull/16146 Yes, I'll do that, but the test failures here are different. I still have to figure out the root cause.
[GitHub] spark issue #16245: [SPARK-18824][SQL] Add optimizer rule to reorder Filter ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16245 **[Test build #6 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/6/consoleFull)** for PR 16245 at commit [`66a3d98`](https://github.com/apache/spark/commit/66a3d983d978b902858a34dde992640a489f5351).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16104: [SPARK-18675][SQL] CTAS for hive serde table should work...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16104 **[Test build #70006 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70006/consoleFull)** for PR 16104 at commit [`8607425`](https://github.com/apache/spark/commit/8607425d025944204ae38c38679a9204ffd1c144).
[GitHub] spark issue #16146: [SPARK-18091] [SQL] [BACKPORT-1.6] Deep if expressions c...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16146 hi @kapilsingh5050, can you also include https://github.com/apache/spark/pull/16244? It does fix the Maven tests.
[GitHub] spark issue #16104: [SPARK-18675][SQL] CTAS for hive serde table should work...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16104 LGTM
[GitHub] spark pull request #16104: [SPARK-18675][SQL] CTAS for hive serde table shou...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16104#discussion_r91886939

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -121,21 +121,61 @@ case class InsertIntoHiveTable(
     return dir
   }

-  private def getExternalScratchDir(extURI: URI, hadoopConf: Configuration): Path = {
-    getStagingDir(new Path(extURI.getScheme, extURI.getAuthority, extURI.getPath), hadoopConf)
+  private def getExternalScratchDir(extURI: URI): Path = {
+    getStagingDir(new Path(extURI.getScheme, extURI.getAuthority, extURI.getPath))
   }

-  def getExternalTmpPath(path: Path, hadoopConf: Configuration): Path = {
+  def getExternalTmpPath(path: Path): Path = {
+    val hiveVersion = externalCatalog.asInstanceOf[HiveExternalCatalog].client.version.fullVersion
+    if (hiveVersion.startsWith("0.12") ||
+        hiveVersion.startsWith("0.13") ||
+        hiveVersion.startsWith("0.14") ||
+        hiveVersion.startsWith("1.0")) {
+      oldStyleExternalTempPath(path)
+    } else if (hiveVersion.startsWith("1.1") || hiveVersion.startsWith("1.2")) {
+      newStyleExternalTempPath(path)
+    } else {
+      throw new IllegalStateException("Unsupported hive version: " + hiveVersion)
--- End diff --

uh, I see. Thanks!
[GitHub] spark issue #16245: [SPARK-18824][SQL] Add optimizer rule to reorder Filter ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16245 **[Test build #70005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70005/consoleFull)** for PR 16245 at commit [`63c50b8`](https://github.com/apache/spark/commit/63c50b8066a77506c6751710d5b5b5edb77ca933). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16104: [SPARK-18675][SQL] CTAS for hive serde table shou...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16104#discussion_r91886458

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -121,21 +121,61 @@ case class InsertIntoHiveTable(
     return dir
   }

-  private def getExternalScratchDir(extURI: URI, hadoopConf: Configuration): Path = {
-    getStagingDir(new Path(extURI.getScheme, extURI.getAuthority, extURI.getPath), hadoopConf)
+  private def getExternalScratchDir(extURI: URI): Path = {
+    getStagingDir(new Path(extURI.getScheme, extURI.getAuthority, extURI.getPath))
   }

-  def getExternalTmpPath(path: Path, hadoopConf: Configuration): Path = {
+  def getExternalTmpPath(path: Path): Path = {
+    val hiveVersion = externalCatalog.asInstanceOf[HiveExternalCatalog].client.version.fullVersion
+    if (hiveVersion.startsWith("0.12") ||
+        hiveVersion.startsWith("0.13") ||
+        hiveVersion.startsWith("0.14") ||
+        hiveVersion.startsWith("1.0")) {
+      oldStyleExternalTempPath(path)
+    } else if (hiveVersion.startsWith("1.1") || hiveVersion.startsWith("1.2")) {
+      newStyleExternalTempPath(path)
+    } else {
+      throw new IllegalStateException("Unsupported hive version: " + hiveVersion)
--- End diff --

We will fail in other places anyway, e.g. `IsolatedClientLoader.hiveVersion`
[GitHub] spark pull request #16189: [SPARK-18761][CORE] Introduce "task reaper" to ov...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r91886381

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -432,6 +458,78 @@ private[spark] class Executor(
   }

   /**
+   * Supervises the killing / cancellation of a task by sending the interrupted flag, optionally
+   * sending a Thread.interrupt(), and monitoring the task until it finishes.
+   */
+  private class TaskReaper(
+      taskRunner: TaskRunner,
+      val interruptThread: Boolean)
+    extends Runnable {
+
+    private[this] val taskId: Long = taskRunner.taskId
+
+    private[this] val killPollingFrequencyMs: Long =
+      conf.getTimeAsMs("spark.task.killPollingFrequency", "10s")
--- End diff --

+1 on the naming suggestion; I'll do this tomorrow.
[GitHub] spark issue #16252: [SPARK-18827][Core] Fix cannot read broadcast on disk
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16252 **[Test build #70004 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70004/consoleFull)** for PR 16252 at commit [`58acc06`](https://github.com/apache/spark/commit/58acc06148e243420ab12ced77749be1767c4bc0).
[GitHub] spark issue #16189: [SPARK-18761][CORE] Introduce "task reaper" to oversee t...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/16189 @lins05, I'll see if there's a way to get a nicer executor exit status to be reported back to the driver.
[GitHub] spark pull request #16189: [SPARK-18761][CORE] Introduce "task reaper" to ov...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r91886287

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -432,6 +458,78 @@ private[spark] class Executor(
   }

   /**
+   * Supervises the killing / cancellation of a task by sending the interrupted flag, optionally
+   * sending a Thread.interrupt(), and monitoring the task until it finishes.
+   */
+  private class TaskReaper(
+      taskRunner: TaskRunner,
+      val interruptThread: Boolean)
+    extends Runnable {
+
+    private[this] val taskId: Long = taskRunner.taskId
+
+    private[this] val killPollingFrequencyMs: Long =
+      conf.getTimeAsMs("spark.task.killPollingFrequency", "10s")
+
+    private[this] val killTimeoutMs: Long = conf.getTimeAsMs("spark.task.killTimeout", "2m")
+
+    private[this] val takeThreadDump: Boolean =
+      conf.getBoolean("spark.task.threadDumpKilledTasks", true)
+
+    override def run(): Unit = {
+      val startTimeMs = System.currentTimeMillis()
+      def elapsedTimeMs = System.currentTimeMillis() - startTimeMs
+      try {
+        while (!taskRunner.isFinished && (elapsedTimeMs < killTimeoutMs || killTimeoutMs <= 0)) {
+          taskRunner.kill(interruptThread = interruptThread)
--- End diff --

That's a good point. I'll update this tomorrow to only interrupt once.
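The change JoshRosen agrees to above — re-sending the idempotent kill flag on every poll but delivering the thread interrupt only once — can be sketched outside Spark. Below is a hedged, language-agnostic illustration in Python; `ToyTask` and `reap` are invented names for illustration, not the actual `TaskReaper` code:

```python
import threading
import time

class ToyTask:
    """Stand-in for a running task that stops when asked to."""
    def __init__(self):
        self.finished = False
        self.interrupts = 0           # counts interrupt deliveries
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        self._stop.wait()             # simulate work until told to stop
        self.finished = True

    def request_kill(self):
        self._stop.set()              # idempotent: safe to call repeatedly

    def interrupt(self):
        self.interrupts += 1          # analogous to Thread.interrupt()

def reap(task, poll_interval_s=0.01, timeout_s=2.0):
    """Ask the task to die, interrupt it once, then poll until done or timeout."""
    deadline = time.monotonic() + timeout_s
    task.request_kill()
    task.interrupt()                  # delivered exactly once, outside the loop
    while not task.finished and time.monotonic() < deadline:
        task.request_kill()           # re-setting the kill flag is harmless
        time.sleep(poll_interval_s)
    return task.finished
```

The design point under discussion is visible here: the kill flag is idempotent and can be re-sent on every poll, whereas the interrupt is a side-effecting signal that should be delivered exactly once before the polling loop begins.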
[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14638 Is there any high-level API that we can use to read hive tables? The current hive table reader is so low-level that we have to support features like `skip.header.line.count`, and I think it doesn't work well with non-file based hive tables.
[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16245#discussion_r91886133

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -514,6 +514,25 @@ case class OptimizeCodegen(conf: CatalystConf) extends Rule[LogicalPlan] {

 /**
+ * Reorders the predicates in `Filter` so more expensive expressions like UDF can evaluate later.
+ */
+object ReorderPredicatesInFilter extends Rule[LogicalPlan] with PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case Filter(pred, child) =>
+      // Reverses the expressions to get the suffix deterministic expressions in the predicate.
+      // E.g., the original expressions are 'a > 1, rand(0), 'b > 2, 'c > 3.
+      // The reversed expressions are 'c > 3, 'b > 2, rand(0), 'a > 1.
+      // The suffix deterministic expressions are 'c > 3, 'b > 2.
+      val (deterministicExprs, others) = splitConjunctivePredicates(pred).reverse
--- End diff --

The split is widely used in the optimizer. I think I may rewrite this reverse and span to alleviate the performance concern.
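For readers following the thread: the reverse-then-span trick discussed above isolates the trailing run of deterministic conjuncts, which is the only part of an ANDed predicate that can be safely reordered without changing how often a non-deterministic expression (like `rand(0)`) evaluates. A minimal sketch of just that splitting logic, in Python with hypothetical names (not Catalyst code):

```python
def split_trailing_deterministic(conjuncts, is_deterministic):
    """Split ANDed predicates into (untouchable prefix, trailing deterministic run).

    Walk the conjuncts from the end; the run of deterministic predicates at
    the tail is the portion a reordering rule may rearrange by cost.
    """
    n = 0
    for p in reversed(conjuncts):
        if is_deterministic(p):
            n += 1
        else:
            break                     # stop at the last non-deterministic predicate
    k = len(conjuncts) - n
    return conjuncts[:k], conjuncts[k:]
```

Using the example from the code comment — predicates `'a > 1, rand(0), 'b > 2, 'c > 3` — the prefix is `'a > 1, rand(0)` and the reorderable suffix is `'b > 2, 'c > 3`, matching the comment's "suffix deterministic expressions".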
[GitHub] spark pull request #16252: [SPARK-18827][Core] Fix cannot read broadcast on ...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/16252

[SPARK-18827][Core] Fix cannot read broadcast on disk

## What changes were proposed in this pull request?

Fix a bug where a broadcast variable could not be read back from disk.

## How was this patch tested?

Added a unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-18827

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16252.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16252

commit 58acc06148e243420ab12ced77749be1767c4bc0
Author: Yuming Wang
Date: 2016-12-12T05:44:20Z

    Fix cannot read broadcast on disk
[GitHub] spark issue #15915: [SPARK-18485][CORE] Underlying integer overflow when cre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15915 Merged build finished. Test FAILed.
[GitHub] spark issue #15915: [SPARK-18485][CORE] Underlying integer overflow when cre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15915 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70001/ Test FAILed.
[GitHub] spark issue #15915: [SPARK-18485][CORE] Underlying integer overflow when cre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15915 **[Test build #70001 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70001/consoleFull)** for PR 15915 at commit [`b021557`](https://github.com/apache/spark/commit/b02155798061255ef04cf61a911a0ff6467a6a7a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16135: [SPARK-18700][SQL] Add StripedLock for each table's rela...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16135 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69998/ Test PASSed.
[GitHub] spark issue #16135: [SPARK-18700][SQL] Add StripedLock for each table's rela...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16135 Merged build finished. Test PASSed.
[GitHub] spark issue #16135: [SPARK-18700][SQL] Add StripedLock for each table's rela...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16135 **[Test build #69998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69998/consoleFull)** for PR 16135 at commit [`16c47c5`](https://github.com/apache/spark/commit/16c47c5e5ada1ec17555e679fa424d5b93e082c0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16219: [SPARK-18790][SS] Keep a general offset history of strea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16219 **[Test build #3493 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3493/consoleFull)** for PR 16219 at commit [`0830349`](https://github.com/apache/spark/commit/083034925d068c1c7c9123d97fc3e647da4faee4).
[GitHub] spark issue #16220: [SPARK-18796][SS]StreamingQueryManager should not block ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16220 **[Test build #70003 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70003/consoleFull)** for PR 16220 at commit [`4be4149`](https://github.com/apache/spark/commit/4be4149d81d9860445ce4b53ae5951c1467632f4).
[GitHub] spark issue #16220: [SPARK-18796][SS]StreamingQueryManager should not block ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16220 retest this please
[GitHub] spark issue #16251: [SPARK-18826][SS]Add 'newestFirst' option to FileStreamS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16251 **[Test build #70002 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70002/consoleFull)** for PR 16251 at commit [`58a57d4`](https://github.com/apache/spark/commit/58a57d4004c45ff2290b95ec8c70ef95828d379b).
[GitHub] spark pull request #16251: [SPARK-18826][SS]Add 'newestFirst' option to File...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/16251

[SPARK-18826][SS] Add 'newestFirst' option to FileStreamSource

## What changes were proposed in this pull request?

When starting a stream with a lot of backfill and `maxFilesPerTrigger`, the user often wants to process the most recent files first. This keeps latency low for recent data while historical data is slowly backfilled. This PR adds a new option `newestFirst` to control this behavior: when it is true, `FileStreamSource` sorts the files by modification time from newest to oldest, and takes the first `maxFilesPerTrigger` files as a new batch.

## How was this patch tested?

The added test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark newest-first

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16251.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16251

commit 58a57d4004c45ff2290b95ec8c70ef95828d379b
Author: Shixiong Zhu
Date: 2016-12-12T05:15:10Z

    Add 'newestFirst' option to FileStreamSource
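The batch-selection behavior the PR description outlines — sort files by modification time, optionally newest first, then take at most `maxFilesPerTrigger` — can be sketched independently of Spark. A hedged illustration in Python with invented names, not the actual `FileStreamSource` code:

```python
def select_batch(files_with_mtime, max_files, newest_first):
    """Pick the next batch of files from (name, mtime) pairs.

    With newest_first=True, recent files are served first (low latency for
    fresh data); with False, files are processed oldest-first (plain backfill).
    """
    ordered = sorted(files_with_mtime, key=lambda f: f[1], reverse=newest_first)
    return [name for name, _ in ordered[:max_files]]
```

With three files modified at times 1, 3, and 2, `newest_first=True` and a limit of 2 yields the files modified at times 3 and 2, while `newest_first=False` yields those modified at times 1 and 2.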
[GitHub] spark pull request #16248: [SPARK-18810][SPARKR] SparkR install.spark does n...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/16248#discussion_r91883327

--- Diff: R/pkg/R/utils.R ---
@@ -851,3 +851,12 @@ rbindRaws <- function(inputData){
   out[!rawcolumns] <- lapply(out[!rawcolumns], unlist)
   out
 }
+
+# Get basename without extension from URL
+basenameSansExtFromUrl <- function(url) {
--- End diff --

can we use `file_path_sans_ext` [1] for removing the extension? I worry we might publish it as `.tar.gz` someday, and then removing just the last `.` will be insufficient.

[1] https://stat.ethz.ch/R-manual/R-patched/library/tools/html/fileutils.html
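The concern above is that stripping only the text after the last `.` breaks on compound extensions such as `.tar.gz` (and can also eat part of a version number like `2.7`). A small Python sketch of extension stripping that handles compound extensions — a hypothetical helper for illustration, not the SparkR implementation:

```python
import posixpath

def basename_sans_ext(url):
    """Return the URL's basename with any known archive extension removed.

    Compound extensions are listed before their suffixes (".tar.gz" before
    ".gz") so the longest match wins.
    """
    name = posixpath.basename(url)
    for ext in (".tar.gz", ".tar.bz2", ".tgz", ".tar", ".zip", ".gz"):
        if name.endswith(ext):
            return name[: -len(ext)]
    return name
```

Note how a naive "split on the last dot" would turn `spark-2.1.0-bin-hadoop2.7.tgz` into `spark-2.1.0-bin-hadoop2` if the file were published without an extension match, and would leave `.tar` behind on a `.tar.gz` — the whitelist-of-extensions approach (which is also what R's `tools::file_path_sans_ext(compression = TRUE)` addresses) avoids both.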
[GitHub] spark pull request #16248: [SPARK-18810][SPARKR] SparkR install.spark does n...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/16248#discussion_r91883219

--- Diff: R/pkg/R/install.R ---
@@ -104,7 +113,12 @@ install.spark <- function(hadoopVersion = "2.7", mirrorUrl = NULL,
   if (tarExists && !overwrite) {
     message("tar file found.")
   } else {
-    robustDownloadTar(mirrorUrl, version, hadoopVersion, packageName, packageLocalPath)
+    if (releaseUrl != "") {
+      message("Downloading from alternate URL:\n- ", releaseUrl)
+      downloadUrl(releaseUrl, packageLocalPath, paste0("Fetch failed from ", mirrorUrl))
--- End diff --

this should be `releaseUrl` instead of `mirrorUrl` in the `paste0`?
[GitHub] spark issue #16249: [SPARKR] Refactor scripts for R
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16249 Can we open a JIRA for this? It's good to track this change.
[GitHub] spark pull request #16214: [SPARK-18325][SPARKR] Add example for using nativ...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16214#discussion_r91881854

--- Diff: examples/src/main/r/native-r-package.R ---
@@ -0,0 +1,68 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# This example illustrates how to install third-party R packages to executors
+# in your SparkR jobs distributed by "spark.lapply".
+#
+# Note: This example will install packages to a temporary directory on your machine.
+# The directory will be removed automatically when the example exits.
+# Your environment should be connected to the internet to run this example;
+# otherwise, you should change "repos" to your private repository URL.
+# The environment also needs the necessary tools, such as gcc, to compile
+# and install the R package "e1071".
+#
+# To run this example use
+#   ./bin/spark-submit examples/src/main/r/native-r-package.R
+
+# Load SparkR library into your R session
+library(SparkR)
+
+# Initialize SparkSession
+sparkR.session(appName = "SparkR-native-r-package-example")
+
+# $example on$
+# The directory where the third-party R packages are installed.
+libDir <- paste0(tempdir(), "/", "Rlib")
+dir.create(libDir)
+
+# Download the e1071 package source code to a directory
+packagesDir <- paste0(tempdir(), "/", "packages")
+dir.create(packagesDir)
+download.packages("e1071", packagesDir, repos = "https://cran.r-project.org")
+filename <- list.files(packagesDir, "^e1071")
+packagesPath <- file.path(packagesDir, filename)
+# Add the third-party R package to be downloaded with this Spark job on every node.
+spark.addFile(packagesPath)
+
+path <- spark.getSparkFiles(filename)
+costs <- exp(seq(from = log(1), to = log(1000), length.out = 5))
+train <- function(cost) {
+  if ("e1071" %in% rownames(installed.packages(lib = libDir)) == FALSE) {
+    install.packages(path, repos = NULL, type = "source")
--- End diff --

Yeah, we have the package content, but it's a source package rather than a binary package, so we cannot use `library` to load the package directly. This is the pain point for this example. If we illustrated this example with a binary package, we would have to provide scripts for different OS versions, and it would require all nodes in the user's cluster to have the same architecture. So I use a source package; I think it's a more universal example.
[GitHub] spark issue #16214: [SPARK-18325][SPARKR] Add example for using native R pac...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16214 Merged build finished. Test PASSed.
[GitHub] spark issue #16214: [SPARK-18325][SPARKR] Add example for using native R pac...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16214 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/7/ Test PASSed.
[GitHub] spark issue #16214: [SPARK-18325][SPARKR] Add example for using native R pac...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16214 **[Test build #7 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/7/consoleFull)** for PR 16214 at commit [`d3ec5fa`](https://github.com/apache/spark/commit/d3ec5fabf686c4a96a5032d716e5ef1eff7fb8c1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16245#discussion_r91881557

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -514,6 +514,25 @@ case class OptimizeCodegen(conf: CatalystConf) extends Rule[LogicalPlan] {

 /**
+ * Reorders the predicates in `Filter` so more expensive expressions like UDF can evaluate later.
+ */
+object ReorderPredicatesInFilter extends Rule[LogicalPlan] with PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case Filter(pred, child) =>
+      // Reverses the expressions to get the suffix deterministic expressions in the predicate.
+      // E.g., the original expressions are 'a > 1, rand(0), 'b > 2, 'c > 3.
+      // The reversed expressions are 'c > 3, 'b > 2, rand(0), 'a > 1.
+      // The suffix deterministic expressions are 'c > 3, 'b > 2.
+      val (deterministicExprs, others) = splitConjunctivePredicates(pred).reverse
--- End diff --

how is the performance of this split, reverse, and span?
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r91881499

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java ---
@@ -57,6 +60,12 @@ public BufferHolder(UnsafeRow row, int initialSize) {
     this.row.pointTo(buffer, buffer.length);
   }

+  public BufferHolder(int initialSizeInBytes) {
--- End diff --

This is a special use of `BufferHolder`. Better to add a few comments explaining it.
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r91881279

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -56,33 +58,93 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
   }

   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-    val arrayClass = classOf[GenericArrayData].getName
-    val values = ctx.freshName("values")
-    ctx.addMutableState("Object[]", values, s"this.$values = null;")
+    val array = ctx.freshName("array")

-    ev.copy(code = s"""
-      this.$values = new Object[${children.size}];""" +
+    val et = dataType.elementType
+    val evals = children.map(e => e.genCode(ctx))
+    val isPrimitiveArray = ctx.isPrimitiveType(et)
+    val primitiveTypeName = if (isPrimitiveArray) ctx.primitiveTypeName(et) else ""
+    val (preprocess, arrayData, arrayWriter) =
+      genArrayData.getCodeArrayData(ctx, et, children.size, isPrimitiveArray, array)
+
+    ev.copy(code =
+      preprocess +
       ctx.splitExpressions(
         ctx.INPUT_ROW,
-        children.zipWithIndex.map { case (e, i) =>
-          val eval = e.genCode(ctx)
-          eval.code + s"""
-            if (${eval.isNull}) {
-              $values[$i] = null;
+        evals.zipWithIndex.map { case (eval, i) =>
+          eval.code +
+            (if (isPrimitiveArray) {
+              (if (!children(i).nullable) {
+                s"\n$arrayWriter.write($i, ${eval.value});"
+              } else {
+                s"""
+                if (${eval.isNull}) {
+                  $arrayWriter.setNull$primitiveTypeName($i);
+                } else {
+                  $arrayWriter.write($i, ${eval.value});
+                }
+                """
+              })
             } else {
-              $values[$i] = ${eval.value};
-            }
-          """
+              s"""
+              if (${eval.isNull}) {
+                $array[$i] = null;
+              } else {
+                $array[$i] = ${eval.value};
+              }
+              """
+            })
         }) +
-      s"""
-        final ArrayData ${ev.value} = new $arrayClass($values);
-        this.$values = null;
-      """, isNull = "false")
+      s"\nfinal ArrayData ${ev.value} = $arrayData;\n",
+      isNull = "false")
   }

   override def prettyName: String = "array"
 }

+private [sql] object genArrayData {
--- End diff --

Name convention: genArrayData -> GenArrayData.
[GitHub] spark issue #16195: [Spark-18765] [CORE] Make values for spark.yarn.{am|driv...
Github user daisukebe commented on the issue: https://github.com/apache/spark/pull/16195 @vanzin , 2.0 already has this capability per https://issues.apache.org/jira/browse/SPARK-529, thus my patch targets 1.6.
[GitHub] spark pull request #16214: [SPARK-18325][SPARKR] Add example for using nativ...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16214#discussion_r91881006 --- Diff: docs/sparkr.md --- @@ -472,21 +472,17 @@ should fit in a single machine. If that is not the case they can do something li `dapply` -{% highlight r %} -# Perform distributed training of multiple models with spark.lapply. Here, we pass -# a read-only list of arguments which specifies family the generalized linear model should be. --- End diff -- Sounds good, updated.
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r91880888 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java --- @@ -18,8 +18,11 @@ package org.apache.spark.sql.catalyst.expressions.codegen; import org.apache.spark.sql.catalyst.expressions.UnsafeRow; +import org.apache.spark.unsafe.array.ByteArrayMethods; --- End diff -- and this.
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r91880870 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java --- @@ -18,8 +18,11 @@ package org.apache.spark.sql.catalyst.expressions.codegen; import org.apache.spark.sql.catalyst.expressions.UnsafeRow; +import org.apache.spark.unsafe.array.ByteArrayMethods; import org.apache.spark.unsafe.Platform; +import static org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.calculateHeaderPortionInBytes; --- End diff -- Unnecessary import?
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/15915#discussion_r91880781 --- Diff: core/src/main/scala/org/apache/spark/memory/MemoryManager.scala --- @@ -223,8 +222,10 @@ private[spark] abstract class MemoryManager( case MemoryMode.OFF_HEAP => offHeapExecutionMemoryPool.poolSize } val size = ByteArrayMethods.nextPowerOf2(maxTungstenMemory / cores / safetyFactor) -val default = math.min(maxPageSize, math.max(minPageSize, size)) -conf.getSizeAsBytes("spark.buffer.pageSize", default) +val maxPageSize = math.min(64L * minPageSize, math.max(minPageSize, size)) +val userSetting = conf.getSizeAsBytes("spark.buffer.pageSize") +// In case of too large page size. +math.min(userSetting, maxPageSize) } --- End diff -- @JoshRosen The `SparkEnv.memoryManager.pageSizeBytes` returns `Long`; if we reuse it as the chunk size, there is still a potential integer overflow, isn't there? Here, I restricted the upper limit of the page size in case it is too large.
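The clamping logic discussed in this comment can be sketched in isolation. This is a hypothetical standalone sketch (the object and method names are illustrative, not the actual `MemoryManager` code):

```scala
// Hypothetical sketch of the clamping discussed above: the user-configured
// page size (a Long) is capped so that code which later treats the page
// size as a chunk length cannot receive an unreasonably large value.
object PageSizeSketch {
  def clampPageSize(userSetting: Long, minPageSize: Long, computed: Long): Long = {
    // Upper bound: at most 64x the minimum page size, but never below the minimum.
    val maxPageSize = math.min(64L * minPageSize, math.max(minPageSize, computed))
    // Respect the user setting only up to the computed upper bound.
    math.min(userSetting, maxPageSize)
  }
}
```

With a 1 MB minimum and a 4 MB computed size, an oversized user setting is clamped to 4 MB, while a smaller user setting passes through unchanged.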
[GitHub] spark issue #16245: [SQL][WIP] Add optimizer rule to reorder Filter predicat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16245 **[Test build #6 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/6/consoleFull)** for PR 16245 at commit [`66a3d98`](https://github.com/apache/spark/commit/66a3d983d978b902858a34dde992640a489f5351).
[GitHub] spark issue #16214: [SPARK-18325][SPARKR] Add example for using native R pac...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16214 **[Test build #7 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/7/consoleFull)** for PR 16214 at commit [`d3ec5fa`](https://github.com/apache/spark/commit/d3ec5fabf686c4a96a5032d716e5ef1eff7fb8c1).
[GitHub] spark issue #15915: [SPARK-18485][CORE] Underlying integer overflow when cre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15915 **[Test build #70001 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70001/consoleFull)** for PR 15915 at commit [`b021557`](https://github.com/apache/spark/commit/b02155798061255ef04cf61a911a0ff6467a6a7a).
[GitHub] spark issue #16135: [SPARK-18700][SQL] Add StripedLock for each table's rela...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16135 **[Test build #69998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69998/consoleFull)** for PR 16135 at commit [`16c47c5`](https://github.com/apache/spark/commit/16c47c5e5ada1ec17555e679fa424d5b93e082c0).
[GitHub] spark pull request #16135: [SPARK-18700][SQL] Add StripedLock for each table...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/16135#discussion_r91879183 --- Diff: core/src/main/scala/org/apache/spark/metrics/source/StaticSources.scala --- @@ -105,6 +111,7 @@ object HiveCatalogMetrics extends Source { METRIC_FILE_CACHE_HITS.dec(METRIC_FILE_CACHE_HITS.getCount()) METRIC_HIVE_CLIENT_CALLS.dec(METRIC_HIVE_CLIENT_CALLS.getCount()) METRIC_PARALLEL_LISTING_JOB_COUNT.dec(METRIC_PARALLEL_LISTING_JOB_COUNT.getCount()) + METRIC_DATASOUCE_TABLE_CACHE_HITS.dec(METRIC_DATASOUCE_TABLE_CACHE_HITS.getCount()) --- End diff -- Sorry, this newly added metric will be deleted in the next patch, as mentioned in the earlier comment.
[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14638 Thank you so much, @jamartinh , @srowen , @HyukjinKwon , and @gatorsmile . We can distinguish the two existing problems here. First, **a)** Spark returns an incorrect result for an existing Hive table that already has the `skip.header.line.count` table property. This is the most common use case, which this issue aimed to solve. Second, more ridiculously, **b)** Spark can create a table with the `skip.header.line.count` table property, and only Hive returns the correct result from that table.

**SPARK (Current master branch)**
```scala
scala> sql("CREATE TABLE t2 (id INT, value VARCHAR(10)) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' TBLPROPERTIES('skip.header.line.count'='1')")
scala> sql("LOAD DATA LOCAL INPATH '/data/test.csv' OVERWRITE INTO TABLE t2")
scala> sql("SELECT * FROM t2").show
+----+-----+
|  id|value|
+----+-----+
|null|   c2|
|   1|    a|
|   2|    b|
+----+-----+
```

**Hive**
```
hive> select * from t2;
OK
1	a
2	b
```

@gatorsmile . I totally agree on the Apache Spark development direction. But, IMO, `TBLPROPERTIES` vs. `OPTION` is not the proper issue for this PR, because this PR only updates `TableReader.scala` to support the existing table property, case **a)**. For `TBLPROPERTIES`, I simply used it because it is already supported in Spark. I can update the PR description to focus on **a)** instead of **b)**. Someday, Apache Spark may delete (or block) the `TBLPROPERTIES` SQL syntax in favor of the `OPTION` syntax. That is okay; it would just be a kind of intentional regression. No problem at all. However, even in that case, we had better read a Hive table with `skip.header.line.count` correctly.
[GitHub] spark issue #16086: [SPARK-18653][SQL] Fix incorrect space padding for unico...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/16086 I am thinking about a simpler approach. However, it is fine to close this for now.
[GitHub] spark issue #15297: [SPARK-9862]Handling data skew
Github user YuhuWang2002 commented on the issue: https://github.com/apache/spark/pull/15297 retest this please
[GitHub] spark issue #15620: [SPARK-18091] [SQL] Deep if expressions cause Generated ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15620 the test is good now https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.0-test-maven-hadoop-2.2/
[GitHub] spark issue #11045: [SPARK-8321][SQL][WIP] Authorization Support(on all oper...
Github user winningsix commented on the issue: https://github.com/apache/spark/pull/11045 @yaooqinn yes, the validation is working on the server side.
[GitHub] spark issue #15915: [SPARK-18485][CORE] Underlying integer overflow when cre...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/15915 @srowen Sorry for the delay, I will update it as soon as possible based on the comments.
[GitHub] spark issue #15904: [SPARK-18470][STREAMING][WIP] Provide Spark Streaming Mo...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/15904 @vanzin Sorry for the delay, I will update it as soon as possible based on your comments.
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 Possible optimization: instead of conversions using `to`, we can use `Builder`s, which would eliminate the conversion overhead. This would require adding a new codegen method that operates similarly to `MapObjects` but uses a provided `Builder` to build the collection directly. I will wait for a response to this PR before attempting any more modifications.
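As a rough illustration of the suggestion in plain Scala (not the Catalyst codegen itself; the names here are illustrative), building a collection through its `Builder` avoids materializing an intermediate collection and then converting it with `to`:

```scala
// Sketch: constructing a Seq directly through a Builder instead of
// building an intermediate collection and converting it with `to`.
object BuilderSketch {
  def buildSeq(n: Int): Seq[Int] = {
    val builder = Seq.newBuilder[Int]
    builder.sizeHint(n) // pre-size where the underlying collection supports it
    var i = 0
    while (i < n) {
      builder += i
      i += 1
    }
    builder.result() // the final collection is built exactly once
  }
}
```

A codegen variant of this would emit the `+=` calls per element, much like `MapObjects` emits per-element assignments today.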
[GitHub] spark issue #16249: [SPARKR] Refactor scripts for R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16249 Merged build finished. Test PASSed.
[GitHub] spark issue #16249: [SPARKR] Refactor scripts for R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16249 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69996/ Test PASSed.
[GitHub] spark issue #16249: [SPARKR] Refactor scripts for R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16249 **[Test build #69996 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69996/consoleFull)** for PR 16249 at commit [`d237526`](https://github.com/apache/spark/commit/d237526a2aec8f2e5f57172f9933c8c2d1963d39). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16180: [DOCS][MINOR] Clarify Where AccumulatorV2s are Displayed
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16180 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69997/ Test PASSed.
[GitHub] spark issue #16180: [DOCS][MINOR] Clarify Where AccumulatorV2s are Displayed
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16180 Merged build finished. Test PASSed.
[GitHub] spark issue #16180: [DOCS][MINOR] Clarify Where AccumulatorV2s are Displayed
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16180 **[Test build #69997 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69997/consoleFull)** for PR 16180 at commit [`0b31d6c`](https://github.com/apache/spark/commit/0b31d6cc2bc245f5270b0de5f33cc5a66ad9f135). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13909 Merged build finished. Test PASSed.
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13909 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69995/ Test PASSed.
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13909 **[Test build #69995 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69995/consoleFull)** for PR 13909 at commit [`438944b`](https://github.com/apache/spark/commit/438944b0cc79d824898d44032674cb77395b59fb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16180: [DOCS][MINOR] Clarify Where AccumulatorV2s are Displayed
Github user anabranch commented on the issue: https://github.com/apache/spark/pull/16180 @srowen completed.
[GitHub] spark issue #16180: [DOCS][MINOR] Clarify Where AccumulatorV2s are Displayed
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16180 **[Test build #69997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69997/consoleFull)** for PR 16180 at commit [`0b31d6c`](https://github.com/apache/spark/commit/0b31d6cc2bc245f5270b0de5f33cc5a66ad9f135).
[GitHub] spark pull request #16180: [DOCS][MINOR] Clarify Where AccumulatorV2s are Di...
Github user anabranch commented on a diff in the pull request: https://github.com/apache/spark/pull/16180#discussion_r91865764 --- Diff: docs/programming-guide.md --- @@ -1345,14 +1345,17 @@ therefore be efficiently supported in parallel. They can be used to implement co MapReduce) or sums. Spark natively supports accumulators of numeric types, and programmers can add support for new types. -If accumulators are created with a name, they will be -displayed in Spark's UI. This can be useful for understanding the progress of -running stages (NOTE: this is not yet supported in Python). +As a user, you can create `Accumulators` that are both named and unnamed. Named accumulators will display in Spark's UI along with their running totals during execution. As seen in the image below, a named accumulator (in this instance `counter`) will display --- End diff -- Made these clarifications. I think it is important, however, to call out that they can be named or unnamed, so I just rephrased that. Saying only that something can have a name does not make it clear enough, to me, that it can also be unnamed.
[GitHub] spark issue #16249: [SPARKR] Refactor scripts for R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16249 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69994/ Test PASSed.
[GitHub] spark issue #16249: [SPARKR] Refactor scripts for R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16249 Merged build finished. Test PASSed.
[GitHub] spark issue #16249: [SPARKR] Refactor scripts for R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16249 **[Test build #69994 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69994/consoleFull)** for PR 16249 at commit [`550eaa9`](https://github.com/apache/spark/commit/550eaa9e551f171cd4bbda3cf4ff7bb1c70a61fd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16250: [CORE][MINOR] Stylistic changes in DAGScheduler (to ease...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16250 Merged build finished. Test PASSed.
[GitHub] spark issue #16250: [CORE][MINOR] Stylistic changes in DAGScheduler (to ease...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16250 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69991/ Test PASSed.