[GitHub] spark issue #14625: [SPARK-17045] [SQL] Build/move Join-related test cases i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14625 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64120/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14625: [SPARK-17045] [SQL] Build/move Join-related test cases i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14625 Merged build finished. Test PASSed.
[GitHub] spark issue #14625: [SPARK-17045] [SQL] Build/move Join-related test cases i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14625 **[Test build #64120 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64120/consoleFull)** for PR 14625 at commit [`bf55624`](https://github.com/apache/spark/commit/bf556240e0f01cdd12f53a9407d8811ec30380d4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14635: [SPARK-17052] [SQL] Remove Duplicate Test Cases auto_joi...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14635 cc @cloud-fan @rxin Could you check whether this PR is reasonable? Thanks!
[GitHub] spark issue #14727: [SPARK-17166] [SQL] Store Table Properties in CTAS that ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14727 **[Test build #64125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64125/consoleFull)** for PR 14727 at commit [`bffc412`](https://github.com/apache/spark/commit/bffc412b4ce50ffc63da0f6b05d82f7dd52a97fd).
[GitHub] spark issue #14727: [SPARK-17166] [SQL] Store Table Properties in CTAS that ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14727 cc @cloud-fan @yhuai This is what we discussed in another PR. Could you please review whether this is the right fix? Thanks!
[GitHub] spark issue #14577: [SPARK-16986][WEB UI] Make 'Started' time, 'Completed' t...
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14577 Hi, @srowen Thanks a lot for the comments. Sorry for the late reply. You are right. I will check how other pages format the date.
[GitHub] spark pull request #14727: [SPARK-17166] [SQL] Store Table Properties Specif...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/14727 [SPARK-17166] [SQL] Store Table Properties Specified in CTAS after Conversion to Data Source Tables ## What changes were proposed in this pull request? CTAS lost table properties after conversion to data source tables. For example, ```SQL CREATE TABLE t TBLPROPERTIES('prop1' = 'c', 'prop2' = 'd') AS SELECT 1 as a, 1 as b ``` The output of `DESC FORMATTED t` does not have the related properties. ``` |Table Parameters: | | | | rawDataSize |-1 | | | numFiles |1 | | | transient_lastDdlTime |1471670983 | | | totalSize |496 | | | spark.sql.sources.provider|parquet | | | EXTERNAL |FALSE | | | COLUMN_STATS_ACCURATE |false | | | numRows |-1 | | || | | |# Storage Information | | | |SerDe Library: |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | | |InputFormat: |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat | | |OutputFormat: |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat | | |Compressed: |No | | |Storage Desc Parameters:| | | | serialization.format |1 | | | path |file:/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/warehouse-f3aa2927-6464-4a35-a715-1300dde6c614/t| | ``` After the fix, the properties specified by users are stored as serde properties, since the table properties are used for storing table schemas and system generated properties. ``` |Table Parameters: | | | | rawDataSize |-1 | | | numFiles |1 | | | transient_lastDdlTime |1471672182 | | | totalSize |496 | | | spark.sql.sources.provider|parquet | | | EXTERNAL |FALSE | | | COLUMN_STATS_ACCURATE |false | | | numRows |-1 | | ||
[GitHub] spark issue #14682: [SPARK-17104][SQL] LogicalRelation.newInstance should fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14682 **[Test build #64124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64124/consoleFull)** for PR 14682 at commit [`e7fe68b`](https://github.com/apache/spark/commit/e7fe68b002594a294b199317be3e2d8fc250eb4e).
[GitHub] spark issue #14697: [SPARK-17124][SQL] RelationalGroupedDataset.agg should p...
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/14697 I updated the description.
[GitHub] spark pull request #14682: [SPARK-17104][SQL] LogicalRelation.newInstance sh...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14682#discussion_r75573056 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala --- @@ -79,11 +79,18 @@ case class LogicalRelation( /** Used to lookup original attribute capitalization */ val attributeMap: AttributeMap[AttributeReference] = AttributeMap(output.map(o => (o, o))) - def newInstance(): this.type = + /** + * Returns a new instance of this LogicalRelation. According to the semantics of + * MultiInstanceRelation, this method should returns a copy of this object with + * unique expression ids. Thus we don't respect the `expectedOutputAttributes` and --- End diff -- Done.
[GitHub] spark pull request #14726: [SPARK-16862] Configurable buffer size in `Unsafe...
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14726#discussion_r75573049 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java --- @@ -22,15 +22,21 @@ import com.google.common.io.ByteStreams; import com.google.common.io.Closeables; +import org.apache.spark.SparkEnv; import org.apache.spark.serializer.SerializerManager; import org.apache.spark.storage.BlockId; import org.apache.spark.unsafe.Platform; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; /** * Reads spill files written by {@link UnsafeSorterSpillWriter} (see that class for a description * of the file format). */ public final class UnsafeSorterSpillReader extends UnsafeSorterIterator implements Closeable { + private static final Logger logger = LoggerFactory.getLogger(UnsafeSorterSpillReader.class); + private static final int DEFAULT_BUFFER_SIZE_BYTES = 1024 * 1024; // 1 MB --- End diff -- @rxin : In response to [0], I have changed to 1 MB. As per my experiments, 1 MB gave good perf and we are using it as default for all prod jobs. One concern / proposal: With the change, UnsafeSorterSpillReader would consume more memory than before as the buffer would increase from 8k to 1 MB. Overall per UnsafeSorterSpillReader object footprint would grow from 2.5 MB to 3.6 MB (I have profiled to the number. See [1]). In case of job(s) which spill a lot, there would be lot of these spill readers created (in the screenshot, there were 400+ readers). Current merging approach is to open all the spill files at the same time and merge them all at once using a priority queue. Having lots of these objects in memory can lead to OOMs as there is no accounting for buffers allocated inside UnsafeSorterSpillReader (even without this change, snappy already had its own buffers for compressed and uncompressed data). 
Also, from disk point of view, having lots of files open at the same time would lead to random seeks and won't play well with OS's cache for disk reads. One might say that users should tune the job so that the spills are lesser but it might not be obvious for people who do not understand the system internals. Also, for pipelines the data changes everyday and one setting may not work everytime. Should we add some kinda hierarchical merging wherein spill files are iteratively merged in batches ? It could be turned on when there are say more than 100 spill files to be merged. AFAIK, Hadoop has this. [0] : https://github.com/apache/spark/pull/14475#discussion_r75440822 [1] : https://postimg.org/image/cs5zr6lyx/
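The hierarchical merge proposed above can be sketched in plain Java. This is a minimal illustration of the idea, not Spark's actual spill-merge code: each inner list stands in for a sorted spill file, `mergeOnce` mirrors today's merge-everything-at-once priority-queue approach, and `hierarchicalMerge` folds the runs together in batches of `fanIn` so at most `fanIn` "files" are open at a time, at the cost of materializing intermediate runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class BatchedMerge {
    // Merge a batch of sorted runs into one sorted run with a priority queue,
    // keyed by each run's current head element (heap entries: {value, runIdx, pos}).
    static List<Integer> mergeOnce(List<List<Integer>> runs) {
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) heap.add(new int[]{runs.get(i).get(0), i, 0});
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] e = heap.poll();
            out.add(e[0]);
            int next = e[2] + 1;
            if (next < runs.get(e[1]).size()) {
                heap.add(new int[]{runs.get(e[1]).get(next), e[1], next});
            }
        }
        return out;
    }

    // Iteratively merge runs in batches of `fanIn`, bounding how many
    // runs (and hence read buffers) are live in any single merge pass.
    static List<Integer> hierarchicalMerge(List<List<Integer>> runs, int fanIn) {
        List<List<Integer>> current = new ArrayList<>(runs);
        while (current.size() > 1) {
            List<List<Integer>> next = new ArrayList<>();
            for (int i = 0; i < current.size(); i += fanIn) {
                next.add(mergeOnce(current.subList(i, Math.min(i + fanIn, current.size()))));
            }
            current = next;
        }
        return current.isEmpty() ? new ArrayList<>() : current.get(0);
    }
}
```

The trade-off the comment raises is visible here: batching caps peak memory and open file handles, but intermediate runs get rewritten, so each element may be read and written multiple times.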
[GitHub] spark issue #14697: [SPARK-17124][SQL] RelationalGroupedDataset.agg should p...
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/14697 For example, run both count and sum for a column. Let me update the description.
[GitHub] spark issue #14475: [SPARK-16862] Configurable buffer size in `UnsafeSorterS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14475 Merged build finished. Test PASSed.
[GitHub] spark issue #14475: [SPARK-16862] Configurable buffer size in `UnsafeSorterS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14475 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64118/ Test PASSed.
[GitHub] spark issue #14475: [SPARK-16862] Configurable buffer size in `UnsafeSorterS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14475 **[Test build #64118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64118/consoleFull)** for PR 14475 at commit [`950bb21`](https://github.com/apache/spark/commit/950bb21d1f8f3e98b6a8ef00606c9b6c3e30f659). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14709: [SPARK-17150][SQL] Support SQL generation for inl...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14709
[GitHub] spark issue #14709: [SPARK-17150][SQL] Support SQL generation for inline tab...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14709 thanks, merging to master/2.0
[GitHub] spark issue #14475: [SPARK-16862] Configurable buffer size in `UnsafeSorterS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14475 Merged build finished. Test PASSed.
[GitHub] spark issue #14475: [SPARK-16862] Configurable buffer size in `UnsafeSorterS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14475 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64117/ Test PASSed.
[GitHub] spark issue #14697: [SPARK-17124][SQL] RelationalGroupedDataset.agg should p...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14697 what do you mean by `allow multiple aggregates per column` in the title?
[GitHub] spark issue #14475: [SPARK-16862] Configurable buffer size in `UnsafeSorterS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14475 **[Test build #64117 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64117/consoleFull)** for PR 14475 at commit [`6b8fc48`](https://github.com/apache/spark/commit/6b8fc487dd5324ae589d75d271da18c54110cf4a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14692: [SPARK-17115] [SQL] decrease the threshold when s...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14692#discussion_r75572921 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -584,15 +584,18 @@ class CodegenContext { * @param expressions the codes to evaluate expressions. */ def splitExpressions(row: String, expressions: Seq[String]): String = { -if (row == null) { +if (row == null || currentVars != null) { --- End diff -- When will `row == null`? I understand `currentVars != null` means we are in whole stage codegen.
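For background on the diff above: `splitExpressions` exists because the JVM caps a single method at 64KB of bytecode, so generated code for many expressions is split across helper methods; under whole-stage codegen the expressions reference local variables (`currentVars`) that would not be visible from a separate method, hence the early-out. A rough Java sketch of the splitting idea (illustrative only, not Spark's actual codegen; the `InternalRow` parameter type and `apply_N` naming are assumptions):

```java
import java.util.List;

public class ExprSplitter {
    // Chunk expression snippets and wrap each chunk in a helper method that
    // takes the row as a parameter, so no single generated method grows too large.
    // Returns the call sites followed by the helper method definitions.
    static String split(String row, List<String> expressions, int chunkSize) {
        StringBuilder calls = new StringBuilder();
        StringBuilder helpers = new StringBuilder();
        for (int i = 0; i < expressions.size(); i += chunkSize) {
            String name = "apply_" + (i / chunkSize);
            calls.append(name).append("(").append(row).append(");\n");
            helpers.append("private void ").append(name)
                   .append("(InternalRow ").append(row).append(") {\n");
            for (String e : expressions.subList(i, Math.min(i + chunkSize, expressions.size()))) {
                helpers.append("  ").append(e).append("\n");
            }
            helpers.append("}\n");
        }
        return calls + "\n" + helpers;
    }
}
```

This only works when every expression reads its inputs from the row object being passed along; values held in local variables of the caller (the whole-stage-codegen case) cannot cross the method boundary this way.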
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS
Github user Sherry302 commented on the issue: https://github.com/apache/spark/pull/14659 Hi, @srowen . Thank you so much for the review. Sorry for the test failure and late update. The failure reasons are that "jobID" was none or there was no "spark.app.name" in sparkConf. I have updated the PR to set default values for "jobID" and "spark.app.name". When a real application runs on Spark, it will always have "jobID" and "spark.app.name". What's the use case for this? When users run Spark applications on Yarn on HDFS, Spark's caller contexts will be written into hdfs-audit.log. The Spark caller contexts are JobID_stageID_stageAttemptId_taskID_attemptNumber and the applications' names. The caller context can help users better diagnose and understand how specific applications impact parts of the Hadoop system and the potential problems they may be creating (e.g. overloading NN). As HDFS mentioned in HDFS-9184, for a given HDFS operation, it's very helpful to track which upper level job issues it.
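A caller context of the shape described above can be assembled as a plain string. This is a hypothetical sketch: the prefix, delimiters, and field names below are illustrative assumptions, not the exact format the PR writes into hdfs-audit.log.

```java
public class SparkCallerContext {
    // Builds a task-level caller context string carrying the fields named above:
    // jobID, stageID, stageAttemptId, taskID, attemptNumber, plus the app name.
    // The concrete layout here (prefix and separators) is invented for illustration.
    static String taskContext(String appName, int jobId, int stageId,
                              int stageAttemptId, long taskId, int attemptNumber) {
        return String.format("SPARK_%s_JId_%d_SId_%d_%d_TId_%d_%d",
            appName, jobId, stageId, stageAttemptId, taskId, attemptNumber);
    }
}
```

The point of such a string is that the HDFS NameNode logs it alongside each audit entry, so an operator can map a hot HDFS operation back to the exact Spark job, stage, and task attempt that issued it.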
[GitHub] spark issue #14726: [SPARK-16862] Configurable buffer size in `UnsafeSorterS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14726 **[Test build #64123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64123/consoleFull)** for PR 14726 at commit [`c4f37b6`](https://github.com/apache/spark/commit/c4f37b6c8d3f1a8a565b1f215f55a501edece778).
[GitHub] spark issue #14475: [SPARK-16862] Configurable buffer size in `UnsafeSorterS...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14475 Continuing to https://github.com/apache/spark/pull/14726
[GitHub] spark pull request #14726: [SPARK-16862] Configurable buffer size in `Unsafe...
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/14726 [SPARK-16862] Configurable buffer size in `UnsafeSorterSpillReader` ## What changes were proposed in this pull request? Jira: https://issues.apache.org/jira/browse/SPARK-16862 `BufferedInputStream` used in `UnsafeSorterSpillReader` uses the default 8k buffer to read data off disk. This PR makes it configurable to improve on disk reads. I have made the default value to be 1 MB as with that value I observed improved performance. ## How was this patch tested? I am relying on the existing unit tests. ## Performance After deploying this change to prod and setting the config to 1 MB, there was a 12% reduction in the CPU time and 19.5% reduction in CPU reservation time. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tejasapatil/spark spill_buffer_2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14726.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14726 commit c4f37b6c8d3f1a8a565b1f215f55a501edece778 Author: Tejas Patil Date: 2016-08-20T05:06:03Z [SPARK-16862] Configurable buffer size in `UnsafeSorterSpillReader`
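The core of the change described above can be sketched as follows: wrap the spill file's stream in a `BufferedInputStream` whose size comes from configuration, falling back to a default and clamping out-of-range values. This is a minimal sketch assuming a plain `Map` stands in for `SparkConf`; the config key name and the clamping bounds are illustrative, not necessarily the PR's exact choices.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;

public class SpillReaderBuffer {
    static final int DEFAULT_BUFFER_SIZE_BYTES = 1024 * 1024;      // 1 MB default
    static final int MAX_BUFFER_SIZE_BYTES = 16 * 1024 * 1024;     // cap misconfigured values

    // Resolve the read-buffer size from config, falling back to the default
    // when the key is absent or the value is out of a sane range.
    static int bufferSize(Map<String, String> conf) {
        String raw = conf.get("spark.unsafe.sorter.spill.reader.buffer.size");
        int size = (raw == null) ? DEFAULT_BUFFER_SIZE_BYTES : Integer.parseInt(raw);
        if (size <= 0 || size > MAX_BUFFER_SIZE_BYTES) {
            return DEFAULT_BUFFER_SIZE_BYTES;
        }
        return size;
    }

    // Open a spill file with the configured buffer size instead of the 8k default.
    static InputStream openSpill(File spillFile, Map<String, String> conf) throws IOException {
        return new BufferedInputStream(new FileInputStream(spillFile), bufferSize(conf));
    }
}
```

A larger buffer turns many small random reads into fewer sequential ones, which is where the reported CPU savings come from; the memory-accounting concern raised in the review discussion is the flip side, since each open spill reader now pins a full buffer.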
[GitHub] spark issue #14155: [SPARK-16498][SQL] move hive hack for data source table ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14155 LGTM
[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14712 a high-level question: Looks like the current design depends on some features of hive metastore, e.g. the `STATS_GENERATED_VIA_STATS_TASK` flag. Is it possible that we just treat hive metastore as a persistence layer? So that the statistics can still work if Spark SQL has its own metastore in the future.
[GitHub] spark issue #14155: [SPARK-16498][SQL] move hive hack for data source table ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14155 **[Test build #64122 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64122/consoleFull)** for PR 14155 at commit [`38b838a`](https://github.com/apache/spark/commit/38b838a9d27d5e11bad5f5e7040fe2d6d2e56216).
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r75572799 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala --- @@ -88,14 +90,30 @@ case class AnalyzeTableCommand(tableName: String) extends RunnableCommand { } }.getOrElse(0L) -// Update the Hive metastore if the total size of the table is different than the size +val needUpdate = new mutable.HashMap[String, String]() +if (newTotalSize > 0 && newTotalSize != oldTotalSize) { + needUpdate += (AnalyzeTableCommand.TOTAL_SIZE_FIELD -> newTotalSize.toString) +} +if (!noscan) { + val oldRowCount = tableParameters.get(AnalyzeTableCommand.ROW_COUNT).map(_.toLong) +.getOrElse(-1L) + val newRowCount = sparkSession.table(tableName).count() + + if (newRowCount >= 0 && newRowCount != oldRowCount) { +needUpdate += (AnalyzeTableCommand.ROW_COUNT -> newRowCount.toString) + } +} +// Update the Hive metastore if the above parameters of the table is different than those // recorded in the Hive metastore. // This logic is based on org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats(). -if (newTotalSize > 0 && newTotalSize != oldTotalSize) { +if (needUpdate.nonEmpty) { + // need to set this parameter so that we can store other parameters like "numRows" into + // Hive metastore + needUpdate.put(AnalyzeTableCommand.STATS_GENERATED_VIA_STATS_TASK, --- End diff -- @viirya yeah, thanks for the advice
[GitHub] spark pull request #14682: [SPARK-17104][SQL] LogicalRelation.newInstance sh...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14682#discussion_r75572798 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala --- @@ -79,11 +79,18 @@ case class LogicalRelation( /** Used to lookup original attribute capitalization */ val attributeMap: AttributeMap[AttributeReference] = AttributeMap(output.map(o => (o, o))) - def newInstance(): this.type = + /** + * Returns a new instance of this LogicalRelation. According to the semantics of + * MultiInstanceRelation, this method should returns a copy of this object with + * unique expression ids. Thus we don't respect the `expectedOutputAttributes` and --- End diff -- update the doc?
[GitHub] spark pull request #14724: [SPARK-17162] Range does not support SQL generati...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14724#discussion_r75572751 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala --- @@ -205,6 +205,9 @@ class SQLBuilder private ( case p: ScriptTransformation => scriptTransformationToSQL(p) +case Range(start, end, step, numPartitions, output) => + s"SELECT id AS `${output.head.name}` FROM range($start, $end, $step, $numPartitions)" --- End diff -- while you are at it, can you move this into a toSQL function in Range?
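For readers following the review, here is a hedged sketch of what the suggestion above might look like: moving the string construction out of `SQLBuilder` and into a `toSQL` method on the `Range` operator itself. The field names come from the quoted pattern match; the class signature and the delegating case are assumptions, not the actual patch.

```scala
// Hypothetical sketch: Range renders its own SQL, and SQLBuilder delegates to it.
case class Range(
    start: Long,
    end: Long,
    step: Long,
    numPartitions: Int,
    output: Seq[Attribute]) extends LeafNode {

  // The same string SQLBuilder produced inline, now owned by the operator.
  def toSQL(): String =
    s"SELECT id AS `${output.head.name}` FROM range($start, $end, $step, $numPartitions)"
}

// In SQLBuilder, the case then collapses to:
//   case r: Range => r.toSQL()
```

This keeps SQL generation next to the operator it describes, so `SQLBuilder` does not need to know `Range`'s field layout.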
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75572741 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -175,7 +127,8 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log } else { val qualifiedTable = MetastoreRelation( - qualifiedTableName.database, qualifiedTableName.name)(table, client, sparkSession) + qualifiedTableName.database, qualifiedTableName.name)( + table.copy(provider = Some("hive")), client, sparkSession) --- End diff -- Then we will restore table metadata from table properties twice. As this class will be removed soon, I don't want to change too much.
[GitHub] spark pull request #14155: [SPARK-16498][SQL] move hive hack for data source...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14155#discussion_r75572713 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -200,22 +375,77 @@ private[spark] class HiveExternalCatalog(client: HiveClient, hadoopConf: Configu * Alter a table whose name that matches the one specified in `tableDefinition`, * assuming the table exists. * - * Note: As of now, this only supports altering table properties, serde properties, - * and num buckets! + * Note: As of now, this doesn't support altering table schema, partition column names and bucket + * specification. We will ignore them even if users do specify different values for these fields. */ override def alterTable(tableDefinition: CatalogTable): Unit = withClient { assert(tableDefinition.identifier.database.isDefined) val db = tableDefinition.identifier.database.get requireTableExists(db, tableDefinition.identifier.table) -client.alterTable(tableDefinition) +verifyTableProperties(tableDefinition) + +if (tableDefinition.provider == Some("hive") || tableDefinition.tableType == VIEW) { + client.alterTable(tableDefinition) +} else { + val oldDef = client.getTable(db, tableDefinition.identifier.table) + // Sets the `schema`, `partitionColumnNames` and `bucketSpec` from the old table definition, + // to retain the spark specific format if it is. + // Also add table meta properties to table properties, to retain the data source table format. 
+ val newDef = tableDefinition.copy( +schema = oldDef.schema, +partitionColumnNames = oldDef.partitionColumnNames, +bucketSpec = oldDef.bucketSpec, +properties = tableMetadataToProperties(tableDefinition) ++ tableDefinition.properties) + + client.alterTable(newDef) +} } override def getTable(db: String, table: String): CatalogTable = withClient { -client.getTable(db, table) +restoreTableMetadata(client.getTable(db, table)) } override def getTableOption(db: String, table: String): Option[CatalogTable] = withClient { -client.getTableOption(db, table) +client.getTableOption(db, table).map(restoreTableMetadata) + } + + /** + * Restores table metadata from the table properties if it's a datasouce table. This method is + * kind of a opposite version of [[createTable]]. + * + * It reads table schema, provider, partition column names and bucket specification from table + * properties, and filter out these special entries from table properties. + */ + private def restoreTableMetadata(table: CatalogTable): CatalogTable = { +if (table.tableType == VIEW) { + table +} else { + getProviderFromTableProperties(table).map { provider => +// SPARK-15269: Persisted data source tables always store the location URI as a storage +// property named "path" instead of standard Hive `dataLocation`, because Hive only +// allows directory paths as location URIs while Spark SQL data source tables also +// allows file paths. So the standard Hive `dataLocation` is meaningless for Spark SQL +// data source tables. +// Spark SQL may also save external data source in Hive compatible format when +// possible, so that these tables can be directly accessed by Hive. For these tables, +// `dataLocation` is still necessary. Here we also check for input format because only +// these Hive compatible tables set this field. 
+val storage = if (table.tableType == EXTERNAL && table.storage.inputFormat.isEmpty) { + table.storage.copy(locationUri = None) +} else { + table.storage +} +table.copy( + storage = storage, + schema = getSchemaFromTableProperties(table), + provider = Some(provider), + partitionColumnNames = getPartitionColumnsFromTableProperties(table), + bucketSpec = getBucketSpecFromTableProperties(table), + properties = getOriginalTableProperties(table)) --- End diff -- The previous code also stored options in serde properties. I'm not going to fix everything in this PR, and I'm not sure if it's a real problem, but let's continue the discussion in a follow-up.
[GitHub] spark pull request #11502: [SPARK-13659] Refactor BlockStore put*() APIs to ...
Github user pzz2011 commented on a diff in the pull request: https://github.com/apache/spark/pull/11502#discussion_r75572673 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -432,98 +432,105 @@ private[spark] class BlockManager( logDebug(s"Block $blockId was not found") None case Some(info) => -val level = info.level -logDebug(s"Level for block $blockId is $level") - -// Look for the block in memory -if (level.useMemory) { - logDebug(s"Getting block $blockId from memory") - val result = if (asBlockResult) { -memoryStore.getValues(blockId).map { iter => - val ci = CompletionIterator[Any, Iterator[Any]](iter, releaseLock(blockId)) - new BlockResult(ci, DataReadMethod.Memory, info.size) -} - } else { -memoryStore.getBytes(blockId) - } - result match { -case Some(values) => - return result -case None => - logDebug(s"Block $blockId not found in memory") - } +doGetLocal(blockId, info, asBlockResult) +} + } + + private def doGetLocal( + blockId: BlockId, + info: BlockInfo, + asBlockResult: Boolean): Option[Any] = { +val level = info.level +logDebug(s"Level for block $blockId is $level") + +// Look for the block in memory +if (level.useMemory) { + logDebug(s"Getting block $blockId from memory") + val result = if (asBlockResult) { --- End diff -- Excuse me @JoshRosen, may I ask a question (it may be a little stupid)? What does `asBlockResult` mean, and why is it used here?
[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13680 **[Test build #64121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64121/consoleFull)** for PR 13680 at commit [`f418f9c`](https://github.com/apache/spark/commit/f418f9cf7a35ef8c2c8ed93cf73487aac275e772).
[GitHub] spark issue #14625: [SPARK-17045] [SQL] Build/move Join-related test cases i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14625 **[Test build #64120 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64120/consoleFull)** for PR 14625 at commit [`bf55624`](https://github.com/apache/spark/commit/bf556240e0f01cdd12f53a9407d8811ec30380d4).
[GitHub] spark issue #14683: [SPARK-16968]Document additional options in jdbc Writer
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14683 Merged build finished. Test PASSed.
[GitHub] spark issue #14683: [SPARK-16968]Document additional options in jdbc Writer
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14683 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64119/ Test PASSed.
[GitHub] spark issue #14683: [SPARK-16968]Document additional options in jdbc Writer
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14683 **[Test build #64119 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64119/consoleFull)** for PR 14683 at commit [`8595ece`](https://github.com/apache/spark/commit/8595ece40d18611b003b70f4e62d90c615349abd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14705: [SPARK-16508][SparkR] Fix CRAN undocumented/duplicated a...
Github user junyangq commented on the issue: https://github.com/apache/spark/pull/14705 Thanks @felixcheung!
[GitHub] spark issue #14683: [SPARK-16968]Document additional options in jdbc Writer
Github user GraceH commented on the issue: https://github.com/apache/spark/pull/14683 @srowen I have updated the patch accordingly. Please take a look and let me know if anything is missing.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r75572040 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -168,6 +170,57 @@ class StatisticsSuite extends QueryTest with TestHiveSingleton with SQLTestUtils TableIdentifier("tempTable"), ignoreIfNotExists = true, purge = false) } + test("generate table-level statistics") { +def checkTableStats( +statsSeq: Seq[Statistics], +sizeInBytes: Int, +estimatedSize: Int, +rowCount: Int): Unit = { + assert(statsSeq.size === 1) + assert(statsSeq.head.sizeInBytes === BigInt(sizeInBytes)) + assert(statsSeq.head.estimatedSize === Some(BigInt(estimatedSize))) + assert(statsSeq.head.rowCount === Some(BigInt(rowCount))) +} + +sql("CREATE TABLE analyzeTable (key STRING, value STRING)").collect() +sql("CREATE TABLE parquetTable (key STRING, value STRING) STORED AS PARQUET").collect() +sql("CREATE TABLE orcTable (key STRING, value STRING) STORED AS ORC").collect() + +sql("INSERT INTO TABLE analyzeTable SELECT * FROM src").collect() +sql("INSERT INTO TABLE parquetTable SELECT * FROM src").collect() +sql("INSERT INTO TABLE orcTable SELECT * FROM src").collect() +sql("INSERT INTO TABLE orcTable SELECT * FROM src").collect() + +sql("ANALYZE TABLE analyzeTable COMPUTE STATISTICS") +sql("ANALYZE TABLE parquetTable COMPUTE STATISTICS") +sql("ANALYZE TABLE orcTable COMPUTE STATISTICS") + +var df = sql("SELECT * FROM analyzeTable") +var stats = df.queryExecution.analyzed.collect { case mr: MetastoreRelation => + mr.statistics +} +checkTableStats(stats, 5812, 5812, 500) + +// test statistics of LogicalRelation inherited from MetastoreRelation +df = sql("SELECT * FROM parquetTable") +stats = df.queryExecution.analyzed.collect { case rel: LogicalRelation => + rel.statistics +} +checkTableStats(stats, 4236, 4236, 500) + +sql("SET spark.sql.hive.convertMetastoreOrc=true").collect() --- End diff -- Please use `withSQLConf` . 
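As a hedged illustration of the `withSQLConf` suggestion above: the test-utility form below scopes the config change to a block and restores the previous value on exit, instead of mutating the session with a `SET` statement. The expected statistics values are omitted because the quoted diff truncates before showing them.

```scala
// Sketch: scope the config change instead of issuing a session-wide SET.
// withSQLConf restores the previous value of the key when the block exits,
// so later tests in the suite are not affected by the override.
withSQLConf("spark.sql.hive.convertMetastoreOrc" -> "true") {
  val df = sql("SELECT * FROM orcTable")
  val stats = df.queryExecution.analyzed.collect { case rel: LogicalRelation =>
    rel.statistics
  }
  // checkTableStats(stats, /* expected sizes and row count elided in the diff */)
}
```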
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14719 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64116/ Test FAILed.
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14719 Merged build finished. Test FAILed.
[GitHub] spark issue #14683: [SPARK-16968]Document additional options in jdbc Writer
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14683 **[Test build #64119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64119/consoleFull)** for PR 14683 at commit [`8595ece`](https://github.com/apache/spark/commit/8595ece40d18611b003b70f4e62d90c615349abd).
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14719 **[Test build #64116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64116/consoleFull)** for PR 14719 at commit [`91cb915`](https://github.com/apache/spark/commit/91cb915b4e6c3c4d24fab3f1e772e7e361d4c088). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r75571934 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala --- @@ -342,7 +342,9 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log logicalRelation } -result.copy(expectedOutputAttributes = Some(metastoreRelation.output)) +val logicalRel = result.copy(expectedOutputAttributes = Some(metastoreRelation.output)) +logicalRel.inheritedStats = Some(metastoreRelation.statistics) --- End diff -- I agree with @cloud-fan. This looks hacky.
[GitHub] spark issue #14682: [SPARK-17104][SQL] LogicalRelation.newInstance should fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14682 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64114/ Test PASSed.
[GitHub] spark issue #14682: [SPARK-17104][SQL] LogicalRelation.newInstance should fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14682 Merged build finished. Test PASSed.
[GitHub] spark issue #14682: [SPARK-17104][SQL] LogicalRelation.newInstance should fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14682 **[Test build #64114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64114/consoleFull)** for PR 14682 at commit [`e243323`](https://github.com/apache/spark/commit/e243323cb04880c20fb40e1aed8b4a28022d5540). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14475: [SPARK-16862] Configurable buffer size in `Unsafe...
Github user tejasapatil closed the pull request at: https://github.com/apache/spark/pull/14475
[GitHub] spark issue #14475: [SPARK-16862] Configurable buffer size in `UnsafeSorterS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14475 **[Test build #64118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64118/consoleFull)** for PR 14475 at commit [`950bb21`](https://github.com/apache/spark/commit/950bb21d1f8f3e98b6a8ef00606c9b6c3e30f659).
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r75571723 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala --- @@ -88,14 +90,30 @@ case class AnalyzeTableCommand(tableName: String) extends RunnableCommand { } }.getOrElse(0L) -// Update the Hive metastore if the total size of the table is different than the size +val needUpdate = new mutable.HashMap[String, String]() +if (newTotalSize > 0 && newTotalSize != oldTotalSize) { + needUpdate += (AnalyzeTableCommand.TOTAL_SIZE_FIELD -> newTotalSize.toString) +} +if (!noscan) { + val oldRowCount = tableParameters.get(AnalyzeTableCommand.ROW_COUNT).map(_.toLong) +.getOrElse(-1L) + val newRowCount = sparkSession.table(tableName).count() + + if (newRowCount >= 0 && newRowCount != oldRowCount) { +needUpdate += (AnalyzeTableCommand.ROW_COUNT -> newRowCount.toString) + } +} +// Update the Hive metastore if the above parameters of the table is different than those // recorded in the Hive metastore. // This logic is based on org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats(). -if (newTotalSize > 0 && newTotalSize != oldTotalSize) { +if (needUpdate.nonEmpty) { + // need to set this parameter so that we can store other parameters like "numRows" into + // Hive metastore + needUpdate.put(AnalyzeTableCommand.STATS_GENERATED_VIA_STATS_TASK, --- End diff -- It looks like `STATS_GENERATED_VIA_STATS_TASK` only needs to be set when `noscan` is `false`. Would it be better to move this into the `if (!noscan)` block above?
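A hedged sketch of the restructuring proposed above, for concreteness. The value written for `STATS_GENERATED_VIA_STATS_TASK` is truncated in the quoted diff, so the `"true"` below is an assumption, and whether the marker is really only needed on this path is still under discussion in the thread.

```scala
// Sketch: set the marker only on the path that actually writes the row count.
if (!noscan) {
  val oldRowCount = tableParameters.get(AnalyzeTableCommand.ROW_COUNT)
    .map(_.toLong).getOrElse(-1L)
  val newRowCount = sparkSession.table(tableName).count()

  if (newRowCount >= 0 && newRowCount != oldRowCount) {
    needUpdate += (AnalyzeTableCommand.ROW_COUNT -> newRowCount.toString)
    // The marker is only required so that "numRows" can be stored in the
    // Hive metastore, so set it alongside the row count it guards.
    needUpdate.put(AnalyzeTableCommand.STATS_GENERATED_VIA_STATS_TASK, "true") // assumed value
  }
}
```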
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r75571695 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala --- @@ -88,14 +90,30 @@ case class AnalyzeTableCommand(tableName: String) extends RunnableCommand { } }.getOrElse(0L) -// Update the Hive metastore if the total size of the table is different than the size +val needUpdate = new mutable.HashMap[String, String]() +if (newTotalSize > 0 && newTotalSize != oldTotalSize) { + needUpdate += (AnalyzeTableCommand.TOTAL_SIZE_FIELD -> newTotalSize.toString) +} +if (!noscan) { + val oldRowCount = tableParameters.get(AnalyzeTableCommand.ROW_COUNT).map(_.toLong) +.getOrElse(-1L) + val newRowCount = sparkSession.table(tableName).count() + + if (newRowCount >= 0 && newRowCount != oldRowCount) { +needUpdate += (AnalyzeTableCommand.ROW_COUNT -> newRowCount.toString) + } +} +// Update the Hive metastore if the above parameters of the table is different than those // recorded in the Hive metastore. // This logic is based on org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats(). -if (newTotalSize > 0 && newTotalSize != oldTotalSize) { +if (needUpdate.nonEmpty) { + // need to set this parameter so that we can store other parameters like "numRows" into + // Hive metastore + needUpdate.put(AnalyzeTableCommand.STATS_GENERATED_VIA_STATS_TASK, --- End diff -- I saw it. The value of `numRows` is `-1`, as shown in https://github.com/apache/spark/pull/14712#discussion_r75540560
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r75571686 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala --- @@ -88,14 +90,30 @@ case class AnalyzeTableCommand(tableName: String) extends RunnableCommand { } }.getOrElse(0L) -// Update the Hive metastore if the total size of the table is different than the size +val needUpdate = new mutable.HashMap[String, String]() +if (newTotalSize > 0 && newTotalSize != oldTotalSize) { + needUpdate += (AnalyzeTableCommand.TOTAL_SIZE_FIELD -> newTotalSize.toString) +} +if (!noscan) { + val oldRowCount = tableParameters.get(AnalyzeTableCommand.ROW_COUNT).map(_.toLong) +.getOrElse(-1L) + val newRowCount = sparkSession.table(tableName).count() + + if (newRowCount >= 0 && newRowCount != oldRowCount) { +needUpdate += (AnalyzeTableCommand.ROW_COUNT -> newRowCount.toString) + } +} +// Update the Hive metastore if the above parameters of the table is different than those // recorded in the Hive metastore. // This logic is based on org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats(). -if (newTotalSize > 0 && newTotalSize != oldTotalSize) { +if (needUpdate.nonEmpty) { + // need to set this parameter so that we can store other parameters like "numRows" into + // Hive metastore + needUpdate.put(AnalyzeTableCommand.STATS_GENERATED_VIA_STATS_TASK, --- End diff -- The code comment could be improved a little; the comment above reads better than the one in the current code change.
[GitHub] spark issue #14475: [SPARK-16862] Configurable buffer size in `UnsafeSorterS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14475 **[Test build #64117 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64117/consoleFull)** for PR 14475 at commit [`6b8fc48`](https://github.com/apache/spark/commit/6b8fc487dd5324ae589d75d271da18c54110cf4a).
[GitHub] spark pull request #14721: [SPARK-17158][SQL] Change error message for out o...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14721
[GitHub] spark issue #14635: [SPARK-17052] [SQL] Remove Duplicate Test Cases auto_joi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14635 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64115/ Test PASSed.
[GitHub] spark issue #14635: [SPARK-17052] [SQL] Remove Duplicate Test Cases auto_joi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14635 Merged build finished. Test PASSed.
[GitHub] spark issue #14635: [SPARK-17052] [SQL] Remove Duplicate Test Cases auto_joi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14635 **[Test build #64115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64115/consoleFull)** for PR 14635 at commit [`8b8725c`](https://github.com/apache/spark/commit/8b8725cb28f8f4564f4ee0b168363282df09564f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14721: [SPARK-17158][SQL] Change error message for out of range...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14721 Merging in master/2.0. Thanks!
[GitHub] spark issue #13428: [SPARK-12666][CORE] SparkSubmit packages fix for when 'd...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/13428 LGTM; I'll merge when I get home tonight.
[GitHub] spark issue #14475: [SPARK-16862] Configurable buffer size in `UnsafeSorterS...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14475 Yeah, I have been stuck with other things, so I could not clean it up. Will try again. In the worst case, I'll close this PR and send a new one.
[GitHub] spark issue #14721: [SPARK-17158][SQL] Change error message for out of range...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14721 Merged build finished. Test PASSed.
[GitHub] spark issue #14721: [SPARK-17158][SQL] Change error message for out of range...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14721 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64112/ Test PASSed.
[GitHub] spark issue #14721: [SPARK-17158][SQL] Change error message for out of range...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14721 **[Test build #3230 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3230/consoleFull)** for PR 14721 at commit [`19582ff`](https://github.com/apache/spark/commit/19582ff633932c3ec0a6804bec9314a3390a6404). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14721: [SPARK-17158][SQL] Change error message for out of range...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14721 **[Test build #64112 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64112/consoleFull)** for PR 14721 at commit [`19582ff`](https://github.com/apache/spark/commit/19582ff633932c3ec0a6804bec9314a3390a6404). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14719 **[Test build #64116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64116/consoleFull)** for PR 14719 at commit [`91cb915`](https://github.com/apache/spark/commit/91cb915b4e6c3c4d24fab3f1e772e7e361d4c088).
[GitHub] spark issue #14635: [SPARK-17052] [SQL] Remove Duplicate Test Cases auto_joi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14635 **[Test build #64115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64115/consoleFull)** for PR 14635 at commit [`8b8725c`](https://github.com/apache/spark/commit/8b8725cb28f8f4564f4ee0b168363282df09564f).
[GitHub] spark issue #14682: [SPARK-17104][SQL] LogicalRelation.newInstance should fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14682 **[Test build #64114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64114/consoleFull)** for PR 14682 at commit [`e243323`](https://github.com/apache/spark/commit/e243323cb04880c20fb40e1aed8b4a28022d5540).
[GitHub] spark issue #14682: [SPARK-17104][SQL] LogicalRelation.newInstance should fo...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14682 retest this please.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r75569715 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -168,6 +170,57 @@ class StatisticsSuite extends QueryTest with TestHiveSingleton with SQLTestUtils TableIdentifier("tempTable"), ignoreIfNotExists = true, purge = false) } + test("generate table-level statistics") { +def checkTableStats( +statsSeq: Seq[Statistics], +sizeInBytes: Int, +estimatedSize: Int, +rowCount: Int): Unit = { + assert(statsSeq.size === 1) + assert(statsSeq.head.sizeInBytes === BigInt(sizeInBytes)) + assert(statsSeq.head.estimatedSize === Some(BigInt(estimatedSize))) + assert(statsSeq.head.rowCount === Some(BigInt(rowCount))) +} + +sql("CREATE TABLE analyzeTable (key STRING, value STRING)").collect() --- End diff -- I'll modify unit tests based on your comments, thanks!
[GitHub] spark issue #14724: [SPARK-17162] Range does not support SQL generation
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14724 Can you make the options for logical plan optional?
[GitHub] spark pull request #14708: [SPARK-17149][SQL] array.sql for testing array re...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14708
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r75569609 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/MetastoreRelation.scala --- @@ -141,7 +142,16 @@ private[hive] case class MetastoreRelation( sparkSession.sessionState.conf.defaultSizeInBytes }) } - ) +val tableParams = hiveQlTable.getParameters +val rowCount = tableParams.get(AnalyzeTableCommand.ROW_COUNT) +if (rowCount != null && rowCount.toLong >=0) { --- End diff -- OK, thanks, I'll fix this.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r75569605 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala --- @@ -108,4 +126,8 @@ case class AnalyzeTableCommand(tableName: String) extends RunnableCommand { object AnalyzeTableCommand { val TOTAL_SIZE_FIELD = "totalSize" + // same as org.apache.hadoop.hive.common.StatsSetupConst + val ROW_COUNT = "numRows" + val STATS_GENERATED_VIA_STATS_TASK = "STATS_GENERATED_VIA_STATS_TASK" + val TRUE = "true" --- End diff -- OK, I'll remove this. I just copied it from StatsSetupConst :)
[GitHub] spark issue #14708: [SPARK-17149][SQL] array.sql for testing array related f...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14708 Merging in master/2.0.
[GitHub] spark issue #14517: [SPARK-16931][PYTHON] PySpark APIS for bucketBy and sort...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14517 **[Test build #64113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64113/consoleFull)** for PR 14517 at commit [`dfef36b`](https://github.com/apache/spark/commit/dfef36b6fafd24369b94a492285a48e7b4aad12c). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14517: [SPARK-16931][PYTHON] PySpark APIS for bucketBy and sort...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14517 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64113/ Test FAILed.
[GitHub] spark issue #14517: [SPARK-16931][PYTHON] PySpark APIS for bucketBy and sort...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14517 Merged build finished. Test FAILed.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r75569351 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala --- @@ -33,7 +34,7 @@ import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable} * Right now, it only supports Hive tables and it only updates the size of a Hive table * in the Hive metastore. */ -case class AnalyzeTableCommand(tableName: String) extends RunnableCommand { +case class AnalyzeTableCommand(tableName: String, noscan: Boolean = true) extends RunnableCommand { --- End diff -- Recalculation incurs a high cost, so it should be triggered by users such as DBAs. We can add a mechanism to incrementally update the stats in the future, but that will need some well-designed algorithms (especially for histograms) and a definition of the confidence interval.
[GitHub] spark issue #14724: [SPARK-17162] Range does not support SQL generation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14724 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64110/ Test PASSed.
[GitHub] spark issue #14724: [SPARK-17162] Range does not support SQL generation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14724 Merged build finished. Test PASSed.
[GitHub] spark issue #14724: [SPARK-17162] Range does not support SQL generation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14724 **[Test build #64110 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64110/consoleFull)** for PR 14724 at commit [`e0e12e3`](https://github.com/apache/spark/commit/e0e12e36de949cb2715e1aad893b3eeb0007b0f0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r75569106 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala --- @@ -88,14 +90,30 @@ case class AnalyzeTableCommand(tableName: String) extends RunnableCommand { } }.getOrElse(0L) -// Update the Hive metastore if the total size of the table is different than the size +val needUpdate = new mutable.HashMap[String, String]() +if (newTotalSize > 0 && newTotalSize != oldTotalSize) { + needUpdate += (AnalyzeTableCommand.TOTAL_SIZE_FIELD -> newTotalSize.toString) +} +if (!noscan) { + val oldRowCount = tableParameters.get(AnalyzeTableCommand.ROW_COUNT).map(_.toLong) +.getOrElse(-1L) + val newRowCount = sparkSession.table(tableName).count() + + if (newRowCount >= 0 && newRowCount != oldRowCount) { +needUpdate += (AnalyzeTableCommand.ROW_COUNT -> newRowCount.toString) + } +} +// Update the Hive metastore if the above parameters of the table is different than those // recorded in the Hive metastore. // This logic is based on org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats(). -if (newTotalSize > 0 && newTotalSize != oldTotalSize) { +if (needUpdate.nonEmpty) { + // need to set this parameter so that we can store other parameters like "numRows" into + // Hive metastore + needUpdate.put(AnalyzeTableCommand.STATS_GENERATED_VIA_STATS_TASK, --- End diff -- If we don't set this parameter, "numRows" cannot be stored in the Hive metastore. We need to do this in Spark so that we can persist our statistics in the metastore.
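The update-only-when-changed flow in the diff above can be sketched as follows. This is a hedged illustration in Python, not Spark's actual Scala implementation; the function name `stats_to_update` and the plain-dict representation of metastore parameters are invented for the example. The constant names mirror those defined in `AnalyzeTableCommand`.

```python
# Hypothetical sketch of the stats-update logic: collect changed
# statistics into a dict, and only when something actually changed,
# add the marker parameter that allows extra stats (like "numRows")
# to be persisted in the metastore.
TOTAL_SIZE = "totalSize"
ROW_COUNT = "numRows"
STATS_MARKER = "STATS_GENERATED_VIA_STATS_TASK"


def stats_to_update(old_params, new_total_size, new_row_count=None):
    """Return the metastore parameters that need updating, if any.

    new_row_count is None when the command ran with NOSCAN, i.e. no
    full table scan (and hence no row count) was performed.
    """
    need_update = {}
    old_total = int(old_params.get(TOTAL_SIZE, -1))
    if new_total_size > 0 and new_total_size != old_total:
        need_update[TOTAL_SIZE] = str(new_total_size)
    if new_row_count is not None:
        old_rows = int(old_params.get(ROW_COUNT, -1))
        if new_row_count >= 0 and new_row_count != old_rows:
            need_update[ROW_COUNT] = str(new_row_count)
    if need_update:
        # Marker so stats written outside a Hive StatsTask are accepted.
        need_update[STATS_MARKER] = "true"
    return need_update
```

Note that the marker is added only when there is something to write, matching the `needUpdate.nonEmpty` guard in the diff.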
[GitHub] spark issue #14517: [SPARK-16931][PYTHON] PySpark APIS for bucketBy and sort...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14517 **[Test build #64113 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64113/consoleFull)** for PR 14517 at commit [`dfef36b`](https://github.com/apache/spark/commit/dfef36b6fafd24369b94a492285a48e7b4aad12c).
[GitHub] spark pull request #14712: [SPARK-17072] [SQL] support table-level statistic...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/14712#discussion_r75568863 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -99,9 +99,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { ctx.identifier.getText.toLowerCase == "noscan") { --- End diff -- @cloud-fan noscan won't scan files; it only collects statistics like total size. Without noscan, we will collect other stats like row count and column-level stats.
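The NOSCAN branching described above can be illustrated with a small sketch. This is a hypothetical Python stand-in for the parser decision, not `SparkSqlAstBuilder` itself; the function `parse_analyze` and its return shape are invented for illustration. Only the literal identifier `noscan` (case-insensitive) is accepted, matching the quoted check.

```python
# Hypothetical sketch: an ANALYZE TABLE command may carry an optional
# trailing identifier. Only "noscan" is recognized; it decides whether
# a full scan (and thus a row count) is performed.
def parse_analyze(trailing_identifier):
    if trailing_identifier is None:
        # Full scan: collect size stats and row count.
        return {"command": "AnalyzeTable", "noscan": False}
    if trailing_identifier.lower() == "noscan":
        # Cheap path: collect only size-like stats, no scan.
        return {"command": "AnalyzeTable", "noscan": True}
    raise ValueError("Unsupported ANALYZE option: %s" % trailing_identifier)
```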
[GitHub] spark issue #14475: [SPARK-16862] Configurable buffer size in `UnsafeSorterS...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14475 Looks like the diff is messed up?
[GitHub] spark issue #14721: [SPARK-17158][SQL] Change error message for out of range...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14721 **[Test build #3230 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3230/consoleFull)** for PR 14721 at commit [`19582ff`](https://github.com/apache/spark/commit/19582ff633932c3ec0a6804bec9314a3390a6404).
[GitHub] spark issue #14721: [SPARK-17158][SQL] Change error message for out of range...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14721 **[Test build #64112 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64112/consoleFull)** for PR 14721 at commit [`19582ff`](https://github.com/apache/spark/commit/19582ff633932c3ec0a6804bec9314a3390a6404).
[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14426 Merged build finished. Test PASSed.
[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14426 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64108/ Test PASSed.
[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14426 **[Test build #64108 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64108/consoleFull)** for PR 14426 at commit [`71954e2`](https://github.com/apache/spark/commit/71954e21ba63dc019103a060f7a4ba63a69ce0c9). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Hint(name: String, parameters: Seq[String], child: LogicalPlan) extends UnaryNode `
[GitHub] spark pull request #14579: [SPARK-16921][PYSPARK] RDD/DataFrame persist()/ca...
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/14579#discussion_r75567101
--- Diff: python/pyspark/rdd.py ---
@@ -188,6 +188,12 @@ def __init__(self, jrdd, ctx, jrdd_deserializer=AutoBatchedSerializer(PickleSeri
         self._id = jrdd.id()
         self.partitioner = None
+    def __enter__(self):
--- End diff --
yes, also known as the "If you don't know what to do, raise an Error" approach :p
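The context-manager idea under review here can be sketched roughly as follows. This is a minimal standalone illustration, not the actual PySpark implementation: the `Dataset` class and its `persisted` flag are stand-ins invented for the example, replacing the real RDD/DataFrame machinery.

```python
# Sketch of the persist()/cache() context-manager pattern: entering the
# `with` block persists the dataset, leaving it unpersists, even if the
# body raises. "Dataset" is a hypothetical stand-in, NOT the PySpark API.

class Dataset:
    def __init__(self, data):
        self.data = data
        self.persisted = False

    def persist(self):
        self.persisted = True
        return self  # returning self lets persist() be used in `with`

    def unpersist(self):
        self.persisted = False
        return self

    # Context-manager protocol.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always release the cache on exit; returning False propagates
        # any exception raised inside the block.
        self.unpersist()
        return False


ds = Dataset([1, 2, 3])
with ds.persist():
    assert ds.persisted      # cached inside the block
assert not ds.persisted      # automatically unpersisted on exit
```

The design question raised in the thread is what `__enter__` should do when called on an object that was never persisted; the "raise an Error" approach would make `__enter__` fail loudly in that case rather than guess.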
[GitHub] spark issue #14637: [WIP] [SPARK-16967] move mesos to module
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/14637 mima seems to be upset about my removal of MESOS_REGEX from `SparkMasterRegex`, but I don't understand why, as it's a private class. Should I add an entry to MimaExcludes?
[GitHub] spark issue #14725: [SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14725 Merged build finished. Test FAILed.
[GitHub] spark issue #14725: [SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14725 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64111/ Test FAILed.
[GitHub] spark issue #14725: [SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14725 **[Test build #64111 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64111/consoleFull)** for PR 14725 at commit [`f9672bf`](https://github.com/apache/spark/commit/f9672bfe34b1b5f5ea14700d2aaaee055f5323f8). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.