[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16938
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16938
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73405/
Test FAILed.





[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16938
  
**[Test build #73405 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73405/testReport)**
 for PR 16938 at commit 
[`1f2ce17`](https://github.com/apache/spark/commit/1f2ce17e3d2eca92bc01b6a22e908bd8fd1d9592).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17035: [SPARK-19705][SQL] Preferred location supporting HDFS ca...

2017-02-23 Thread highfei2011
Github user highfei2011 commented on the issue:

https://github.com/apache/spark/pull/17035
  
The PreferredLocation calculation is more complex; which part of the code reflects it?





[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistices to ...

2017-02-23 Thread watermen
Github user watermen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16677#discussion_r102892814
  
--- Diff: core/src/main/scala/org/apache/spark/MapOutputStatistics.scala ---
@@ -23,5 +23,9 @@ package org.apache.spark
  * @param shuffleId ID of the shuffle
  * @param bytesByPartitionId approximate number of output bytes for each 
map output partition
  *   (may be inexact due to use of compressed map statuses)
+ * @param numberOfOutput number of outputs for each map output partition
  */
-private[spark] class MapOutputStatistics(val shuffleId: Int, val 
bytesByPartitionId: Array[Long])
+private[spark] class MapOutputStatistics(
+val shuffleId: Int,
+val bytesByPartitionId: Array[Long],
+val numberOfOutput: Array[Int])
--- End diff --

Here, maybe Long is better.





[GitHub] spark issue #17000: [SPARK-18946][ML] sliceAggregate which is a new aggregat...

2017-02-23 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/17000
  
cc @yanboliang - it actually seems similar in effect to the VL-BFGS work 
with RDD-based coefficients?





[GitHub] spark issue #17056: [SPARK-17495] [SQL] Support Decimal type in Hive-hash

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17056
  
**[Test build #73410 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73410/testReport)**
 for PR 17056 at commit 
[`a378b3e`](https://github.com/apache/spark/commit/a378b3ef08cead4c915096f11de5bd371a405fef).





[GitHub] spark issue #17056: [SPARK-17495] [SQL] Support Decimal type in Hive-hash

2017-02-23 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/17056
  
ok to test





[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistices to ...

2017-02-23 Thread witgo
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/16677#discussion_r102891969
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -39,16 +40,18 @@ private[spark] sealed trait MapStatus {
* necessary for correctness, since block fetchers are allowed to skip 
zero-size blocks.
*/
   def getSizeForBlock(reduceId: Int): Long
+
+  def numberOfOutput: Int
--- End diff --

Could the number of outputs be greater than 2G, overflowing an `Int`?
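To make the concern concrete, here is a minimal standalone sketch (plain Java, not the actual Spark classes) showing how a per-partition record count silently wraps around once it exceeds `Int.MaxValue` (~2G):

```java
// Hypothetical illustration of the reviewer's concern: a record count stored
// as an int overflows (wraps negative) once it passes 2^31 - 1, while a long
// holds it without trouble.
public class CountOverflowDemo {
    public static void main(String[] args) {
        long records = 3L * 1024 * 1024 * 1024;  // ~3 billion records
        int asInt = (int) records;               // narrowing conversion silently wraps
        System.out.println(asInt);               // -1073741824: the count went negative
        System.out.println(records);             // 3221225472: fits fine in a long
    }
}
```

This is why the discussion leans toward `Long` for output counts in shuffle statistics.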





[GitHub] spark pull request #17056: [SPARK-17495] [SQL] Support Decimal type in Hive-...

2017-02-23 Thread tejasapatil
GitHub user tejasapatil opened a pull request:

https://github.com/apache/spark/pull/17056

[SPARK-17495] [SQL] Support Decimal type in Hive-hash

## What changes were proposed in this pull request?

This adds Decimal datatype support to Hive hash. [Hive internally normalises 
decimals](https://github.com/apache/hive/blob/4ba713ccd85c3706d195aeef9476e6e6363f1c21/storage-api/src/java/org/apache/hadoop/hive/common/type/HiveDecimalV1.java#L307)
 and I have ported that logic as-is to HiveHash.

Generated code (in case any reviewer wants to examine):

```
/* 031 */   protected void processNext() throws java.io.IOException {
/* 032 */ while (inputadapter_input.hasNext() && !stopEarly()) {
/* 033 */   InternalRow inputadapter_row = (InternalRow) 
inputadapter_input.next();
/* 034 */   project_value = 0;
/* 035 */
/* 036 */   boolean inputadapter_isNull = inputadapter_row.isNullAt(0);
/* 037 */   Decimal inputadapter_value = inputadapter_isNull ? null : 
(inputadapter_row.getDecimal(0, 38, 0));
/* 038 */   if (!inputadapter_isNull) {
/* 039 */ project_childHash = 
org.apache.spark.sql.catalyst.expressions.HiveHashFunction.normalizeDecimal(
/* 040 */   inputadapter_value.toJavaBigDecimal(), true).hashCode();
/* 041 */   }
/* 042 */   project_value = (31 * project_value) + project_childHash;
/* 043 */   project_childHash = 0;
/* 044 */   project_rowWriter.write(0, project_value);
/* 045 */   append(project_result);
/* 046 */   if (shouldStop()) return;
/* 047 */ }
/* 048 */   }
```
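For intuition, here is a hedged standalone sketch (plain `java.math.BigDecimal`, not the actual Hive or Spark classes, and not the exact ported logic) of why decimals must be normalised before hashing:

```java
import java.math.BigDecimal;

// Rough illustration (not the exact Hive normalisation) of the problem:
// numerically equal decimals with different scales hash differently until
// trailing zeros are stripped.
public class DecimalNormalizeDemo {
    static BigDecimal normalize(BigDecimal d) {
        // Strip trailing zeros so 1.10 and 1.1 normalise to the same value.
        BigDecimal stripped = d.stripTrailingZeros();
        // Guard the zero case so 0.00 and 0 also agree.
        return stripped.compareTo(BigDecimal.ZERO) == 0 ? BigDecimal.ZERO : stripped;
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1.10");
        BigDecimal b = new BigDecimal("1.1");
        System.out.println(a.hashCode() == b.hashCode());                       // false
        System.out.println(normalize(a).hashCode() == normalize(b).hashCode()); // true
    }
}
```

Without such normalisation, equal decimal values could land in different hash buckets.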

## How was this patch tested?

Added unit tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tejasapatil/spark SPARK-17495_decimal

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17056.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17056


commit a378b3ef08cead4c915096f11de5bd371a405fef
Author: Tejas Patil 
Date:   2017-02-24T07:35:16Z

[SPARK-17495] [SQL] Support Decimal type







[GitHub] spark issue #17050: [SPARK-19722] [SQL] [MINOR] Clean up the usage of Elimin...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17050
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17053: [SPARK-18939][SQL] Timezone support in partition values.

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17053
  
**[Test build #73409 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73409/testReport)**
 for PR 17053 at commit 
[`c563a9a`](https://github.com/apache/spark/commit/c563a9a91e5ce872e10c7bfa528e9ea4688e333b).





[GitHub] spark issue #17055: [SPARK-19723][SQL]create datasource table with an non-ex...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17055
  
**[Test build #73408 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73408/testReport)**
 for PR 17055 at commit 
[`89eb03a`](https://github.com/apache/spark/commit/89eb03ad763538ec84cdd447cb51079881b4f9ac).





[GitHub] spark issue #17050: [SPARK-19722] [SQL] [MINOR] Clean up the usage of Elimin...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17050
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73397/
Test PASSed.





[GitHub] spark issue #17050: [SPARK-19722] [SQL] [MINOR] Clean up the usage of Elimin...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17050
  
**[Test build #73397 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73397/testReport)**
 for PR 17050 at commit 
[`3cde705`](https://github.com/apache/spark/commit/3cde705c6baa1e4a869149f3ca289a5c1e3a3000).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16938
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73400/
Test FAILed.





[GitHub] spark pull request #17055: [SPARK-19723][SQL]create datasource table with an...

2017-02-23 Thread windpiger
GitHub user windpiger opened a pull request:

https://github.com/apache/spark/pull/17055

[SPARK-19723][SQL] Create datasource table with a non-existent location 
should work

## What changes were proposed in this pull request?

This JIRA is a follow up work after SPARK-19583

As we discussed in that [PR](https://github.com/apache/spark/pull/16938), 
the following DDL for a datasource table with a non-existent location should 
work:
```
CREATE TABLE ... (PARTITIONED BY ...) LOCATION path
```
Currently it throws an exception because the path does not exist.

## How was this patch tested?
Unit test added.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/windpiger/spark CTDataSourcePathNotExists

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17055.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17055


commit 89eb03ad763538ec84cdd447cb51079881b4f9ac
Author: windpiger 
Date:   2017-02-24T07:33:23Z

[SPARK-19723][SQL]create datasource table with an non-existent location 
should work







[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16938
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16938
  
**[Test build #73400 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73400/testReport)**
 for PR 16938 at commit 
[`afa1313`](https://github.com/apache/spark/commit/afa13136d6d24313c8f18bb7ed175bf45079476a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17053: [SPARK-18939][SQL] Timezone support in partition ...

2017-02-23 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17053#discussion_r102890922
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
 ---
@@ -251,7 +251,8 @@ abstract class ExternalCatalog {
   def listPartitionsByFilter(
   db: String,
   table: String,
-  predicates: Seq[Expression]): Seq[CatalogTablePartition]
+  predicates: Seq[Expression],
+  defaultTimeZoneId: String): Seq[CatalogTablePartition]
--- End diff --

Thank you, I'll add it.





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17001
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17001
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73396/
Test PASSed.





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17001
  
**[Test build #73396 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73396/testReport)**
 for PR 17001 at commit 
[`9c0773b`](https://github.com/apache/spark/commit/9c0773b1d477d39f29ec44f2dcfe34d129706efe).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17054: Refactored the code to remove redundancy of count operat...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17054
  
Can one of the admins verify this patch?





[GitHub] spark pull request #17054: Refactored the code to remove redundancy of count...

2017-02-23 Thread HarshSharma8
GitHub user HarshSharma8 opened a pull request:

https://github.com/apache/spark/pull/17054

Refactored the code to remove redundancy of a count operation

## What changes were proposed in this pull request?

Removed a redundant count operation that computed the same result twice when 
it only needed to run once.
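As a hedged illustration (hypothetical example, not the actual code touched by this PR), the general pattern of this kind of refactor is to compute a count once and reuse it rather than invoking the potentially expensive operation twice:

```java
import java.util.List;

// Hypothetical before/after sketch of removing a duplicated count operation.
public class ReuseCountDemo {
    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4);

        // Before: the same count is computed twice.
        if (data.stream().filter(x -> x % 2 == 0).count() > 0) {
            System.out.println(data.stream().filter(x -> x % 2 == 0).count());
        }

        // After: computed once and reused.
        long evens = data.stream().filter(x -> x % 2 == 0).count();
        if (evens > 0) {
            System.out.println(evens); // prints 2
        }
    }
}
```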

## How was this patch tested?

It removes a duplicate of an operation that is already performed and tested, so the existing tests cover the change.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HarshSharma8/spark remove/redundency

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17054.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17054


commit 14785f52e5f4048ea687e97e7044b3de00716d89
Author: Harsh Sharma 
Date:   2017-02-24T07:15:14Z

Refactored the code to remove redundency of count operation







[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17052
  
**[Test build #73407 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73407/testReport)**
 for PR 17052 at commit 
[`e8a24e1`](https://github.com/apache/spark/commit/e8a24e1cc5f1a638ca23b00adbbcd909db28549d).





[GitHub] spark pull request #17053: [SPARK-18939][SQL] Timezone support in partition ...

2017-02-23 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17053#discussion_r102889140
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
 ---
@@ -251,7 +251,8 @@ abstract class ExternalCatalog {
   def listPartitionsByFilter(
   db: String,
   table: String,
-  predicates: Seq[Expression]): Seq[CatalogTablePartition]
+  predicates: Seq[Expression],
+  defaultTimeZoneId: String): Seq[CatalogTablePartition]
--- End diff --

we need to document what a timezone id is here.






[GitHub] spark pull request #17051: [SPARK-17075][SQL] Follow up: fix file line endin...

2017-02-23 Thread lins05
Github user lins05 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17051#discussion_r10239
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala
 ---
@@ -398,6 +398,27 @@ class FilterEstimationSuite extends 
StatsEstimationTestBase {
 // For all other SQL types, we compare the entire object directly.
 assert(filteredStats.attributeStats(ar) == expectedColStats)
 }
-  }
 
+// If the filter has a binary operator (including those nested inside
+// AND/OR/NOT), swap the sides of the attribute and the literal, reverse the
+// operator, and then check again.
+val rewrittenFilter = filterNode transformExpressionsDown {
+  case op @ EqualTo(ar: AttributeReference, l: Literal) =>
--- End diff --

Hmm, we not only swap the sides of the attribute and the literal, but also 
reverse the operator, e.g. `LessThan` becomes `GreaterThan`. So I guess we 
can't use `withNewChildren` here.
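To illustrate the point, here is a minimal standalone sketch (a hypothetical mini-AST, not Catalyst's `Expression` classes) showing that flipping the sides of a comparison must also reverse the operator, which reattaching swapped children alone would not do:

```java
// Hypothetical mini-AST: swapping operands of a comparison requires reversing
// the operator (LessThan -> GreaterThan) to preserve meaning, so a rewrite
// rule must build a new node rather than just swap children.
public class SwapComparisonDemo {
    enum Op { EQUAL_TO, LESS_THAN, GREATER_THAN }

    record Comparison(Op op, String left, String right) {
        Comparison swapSides() {
            Op reversed = switch (op) {
                case EQUAL_TO -> Op.EQUAL_TO;        // symmetric: unchanged
                case LESS_THAN -> Op.GREATER_THAN;   // a < 5  becomes  5 > a
                case GREATER_THAN -> Op.LESS_THAN;   // a > 5  becomes  5 < a
            };
            return new Comparison(reversed, right, left);
        }
    }

    public static void main(String[] args) {
        Comparison c = new Comparison(Op.LESS_THAN, "attr", "5");
        System.out.println(c.swapSides()); // Comparison[op=GREATER_THAN, left=5, right=attr]
    }
}
```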





[GitHub] spark pull request #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.di...

2017-02-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16996





[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...

2017-02-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16996
  
thanks, merging to master!





[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16594
  
LGTM, pending test





[GitHub] spark pull request #17051: [SPARK-17075][SQL] Follow up: fix file line endin...

2017-02-23 Thread lins05
Github user lins05 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17051#discussion_r102888024
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala
 ---
@@ -398,6 +398,27 @@ class FilterEstimationSuite extends 
StatsEstimationTestBase {
 // For all other SQL types, we compare the entire object directly.
 assert(filteredStats.attributeStats(ar) == expectedColStats)
 }
-  }
 
+// If the filter has a binary operator (including those nested inside
+// AND/OR/NOT), swap the sides of the attribute and the literal, reverse the
+// operator, and then check again.
+val rewrittenFilter = filterNode transformExpressionsDown {
+  case op @ EqualTo(ar: AttributeReference, l: Literal) =>
--- End diff --

👍 

I tried to find something like this but failed to, so I resorted to the 
current code.  Thanks for the tip!





[GitHub] spark issue #17053: [SPARK-18939][SQL] Timezone support in partition values.

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17053
  
**[Test build #73406 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73406/testReport)**
 for PR 17053 at commit 
[`49da287`](https://github.com/apache/spark/commit/49da287e174cf20e78c3ff0ef122d2ae0c34).





[GitHub] spark pull request #17051: [SPARK-17075][SQL] Follow up: fix file line endin...

2017-02-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17051#discussion_r102887733
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala
 ---
@@ -398,6 +398,27 @@ class FilterEstimationSuite extends 
StatsEstimationTestBase {
 // For all other SQL types, we compare the entire object directly.
 assert(filteredStats.attributeStats(ar) == expectedColStats)
 }
-  }
 
+// If the filter has a binary operator (including those nested inside
+// AND/OR/NOT), swap the sides of the attribute and the literal, reverse the
+// operator, and then check again.
+val rewrittenFilter = filterNode transformExpressionsDown {
+  case op @ EqualTo(ar: AttributeReference, l: Literal) =>
--- End diff --

nit: `case b @ BinaryComparison(ar: AttributeReference, l: Literal) => b.withNewChildren(l, ar)`

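The nit above relies on the fact that swapping the sides of a binary comparison while mirroring the operator preserves its meaning. Here is a small Python sketch of that property (illustration only; `BinaryComparison` and `withNewChildren` are Spark's Scala APIs and are not used here):

```python
import operator

# Each comparison paired with its mirror: a <op> b  ==  b <mirror(op)> a.
MIRROR = {
    operator.eq: operator.eq,
    operator.lt: operator.gt,
    operator.le: operator.ge,
    operator.gt: operator.lt,
    operator.ge: operator.le,
}

def swap_sides(op, attr_value, literal):
    """Evaluate `literal <mirror(op)> attr` instead of `attr <op> literal`."""
    return MIRROR[op](literal, attr_value)

# The rewritten predicate must agree with the original for every operator.
for op in MIRROR:
    for v in (1, 5, 9):
        assert op(v, 5) == swap_sides(op, v, 5)
print("all comparisons agree after the swap")
```

This is exactly why the test can swap operands, mirror the operator, and expect the same estimated statistics.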




[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16938
  
**[Test build #73405 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73405/testReport)**
 for PR 16938 at commit 
[`1f2ce17`](https://github.com/apache/spark/commit/1f2ce17e3d2eca92bc01b6a22e908bd8fd1d9592).





[GitHub] spark issue #17051: [SPARK-17075][SQL] Follow up: fix file line ending and i...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17051
  
**[Test build #73404 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73404/testReport)**
 for PR 17051 at commit 
[`8881d58`](https://github.com/apache/spark/commit/8881d58ad65fb7f32a74610561230e3e800611a9).





[GitHub] spark pull request #17053: [SPARK-18939][SQL] Timezone support in partition ...

2017-02-23 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/17053

[SPARK-18939][SQL] Timezone support in partition values.

## What changes were proposed in this pull request?

This is a follow-up pr of #16308 and #16750.

This pr enables timezone support in partition values.

We should use the `timeZone` option introduced in #16750 to parse/format partition values of `TimestampType`.

For example, suppose the timestamp `"2016-01-01 00:00:00"` in `GMT` is used as a partition value. With the default timezone option, which here is `"GMT"` because that is the session local timezone, the values are written as:

```scala
scala> spark.conf.set("spark.sql.session.timeZone", "GMT")

scala> val df = Seq((1, new java.sql.Timestamp(1451606400000L))).toDF("i", "ts")
df: org.apache.spark.sql.DataFrame = [i: int, ts: timestamp]

scala> df.show()
+---+-------------------+
|  i|                 ts|
+---+-------------------+
|  1|2016-01-01 00:00:00|
+---+-------------------+

scala> df.write.partitionBy("ts").save("/path/to/gmtpartition")
```

```sh
$ ls /path/to/gmtpartition/
_SUCCESS  ts=2016-01-01 00%3A00%3A00
```

whereas setting the option to `"PST"`, they are:

```scala
scala> df.write.option("timeZone", "PST").partitionBy("ts").save("/path/to/pstpartition")
```

```sh
$ ls /path/to/pstpartition/
_SUCCESS  ts=2015-12-31 16%3A00%3A00
```

We can properly read the partition values if the session local timezone and 
the timezone of the partition values are the same:

```scala
scala> spark.read.load("/path/to/gmtpartition").show()
+---+-------------------+
|  i|                 ts|
+---+-------------------+
|  1|2016-01-01 00:00:00|
+---+-------------------+
```

Even if the timezones are different, we can still read the values properly by setting the matching timezone option:

```scala
// wrong result
scala> spark.read.load("/path/to/pstpartition").show()
+---+-------------------+
|  i|                 ts|
+---+-------------------+
|  1|2015-12-31 16:00:00|
+---+-------------------+

// correct result
scala> spark.read.option("timeZone", "PST").load("/path/to/pstpartition").show()
+---+-------------------+
|  i|                 ts|
+---+-------------------+
|  1|2016-01-01 00:00:00|
+---+-------------------+
```
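The timezone-dependent partition directory names can be reproduced outside Spark. The sketch below is a minimal illustration of the assumed behavior, not Spark's actual code: the same instant is rendered in the formatting timezone and then path-escaped (`:` becomes `%3A`). PST is modeled as a fixed UTC-8 offset, which is valid on Dec 31 (standard time).

```python
from datetime import datetime, timezone, timedelta
from urllib.parse import quote

# 2016-01-01 00:00:00 GMT, the instant used in the example above.
ts = datetime(2016, 1, 1, 0, 0, 0, tzinfo=timezone.utc)

def partition_dir(col, dt, tz):
    # Render the instant in the formatting timezone, then escape characters
    # such as ':' that are special in partition directory names.
    rendered = dt.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S")
    return f"{col}={quote(rendered, safe=' -')}"

print(partition_dir("ts", ts, timezone.utc))                   # ts=2016-01-01 00%3A00%3A00
print(partition_dir("ts", ts, timezone(timedelta(hours=-8))))  # ts=2015-12-31 16%3A00%3A00
```

Reading the values back correctly then amounts to parsing the rendered string with the same timezone it was written with, which is what the `timeZone` read option supplies.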

## How was this patch tested?

Existing tests, plus some newly added tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-18939

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17053.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17053


commit 54e33690093d97d33de48f9665020a6296a8a909
Author: Takuya UESHIN 
Date:   2017-02-14T07:12:10Z

Modify FileFormatWriter to use timezone option.

commit 2f0ca106cf60d57389e1725f3f61a784dbe98f70
Author: Takuya UESHIN 
Date:   2017-02-14T09:13:22Z

Use timeZone option for PartitioningAwareFileIndex.

commit 0e70ce6fe3b28c9448834f0dbb0c30f6a39669a2
Author: Takuya UESHIN 
Date:   2017-02-17T09:26:52Z

Use stringSchema to make tests more explicitly.

commit dae7eba86f3e1e3cb38c8b56c1f684374b9355f1
Author: Takuya UESHIN 
Date:   2017-02-20T07:57:09Z

Use correct timezone for partition values for OptimizeMetadataOnlyQuery..

commit 49da287e174cf20e78c3ff0ef122d2ae0c34
Author: Takuya UESHIN 
Date:   2017-02-23T03:07:54Z

Use correct timezone for partition values.







[GitHub] spark pull request #17051: [SPARK-17075][SQL] Follow up: fix file line endin...

2017-02-23 Thread lins05
Github user lins05 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17051#discussion_r102887655
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
 ---
@@ -1,511 +1,509 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.catalyst.plans.logical.statsEstimation
-
-import java.sql.{Date, Timestamp}
-
-import scala.collection.immutable.{HashSet, Map}
-import scala.collection.mutable
-
-import org.apache.spark.internal.Logging
-import org.apache.spark.sql.catalyst.CatalystConf
-import org.apache.spark.sql.catalyst.expressions._
-import org.apache.spark.sql.catalyst.plans.logical._
-import org.apache.spark.sql.catalyst.util.DateTimeUtils
-import org.apache.spark.sql.types._
-
-case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) 
extends Logging {
-
-  /**
-   * We use a mutable colStats because we need to update the corresponding 
ColumnStat
-   * for a column after we apply a predicate condition.  For example, 
column c has
-   * [min, max] value as [0, 100].  In a range condition such as (c > 40 
AND c <= 50),
-   * we need to set the column's [min, max] value to [40, 100] after we 
evaluate the
-   * first condition c > 40.  We need to set the column's [min, max] value 
to [40, 50]
-   * after we evaluate the second condition c <= 50.
-   */
-  private var mutableColStats: mutable.Map[ExprId, ColumnStat] = 
mutable.Map.empty
-
-  /**
-   * Returns an option of Statistics for a Filter logical plan node.
-   * For a given compound expression condition, this method computes 
filter selectivity
-   * (or the percentage of rows meeting the filter condition), which
-   * is used to compute row count, size in bytes, and the updated 
statistics after a given
-   * predicated is applied.
-   *
-   * @return Option[Statistics] When there is no statistics collected, it 
returns None.
-   */
-  def estimate: Option[Statistics] = {
-// We first copy child node's statistics and then modify it based on 
filter selectivity.
-val stats: Statistics = plan.child.stats(catalystConf)
-if (stats.rowCount.isEmpty) return None
-
-// save a mutable copy of colStats so that we can later change it 
recursively
-mutableColStats = mutable.Map(stats.attributeStats.map(kv => 
(kv._1.exprId, kv._2)).toSeq: _*)
-
-// estimate selectivity of this filter predicate
-val filterSelectivity: Double = 
calculateFilterSelectivity(plan.condition) match {
-  case Some(percent) => percent
-  // for not-supported condition, set filter selectivity to a 
conservative estimate 100%
-  case None => 1.0
-}
-
-// attributeStats has mapping Attribute-to-ColumnStat.
-// mutableColStats has mapping ExprId-to-ColumnStat.
-// We use an ExprId-to-Attribute map to facilitate the mapping 
Attribute-to-ColumnStat
-val expridToAttrMap: Map[ExprId, Attribute] =
-  stats.attributeStats.map(kv => (kv._1.exprId, kv._1))
-// copy mutableColStats contents to an immutable AttributeMap.
-val mutableAttributeStats: mutable.Map[Attribute, ColumnStat] =
-  mutableColStats.map(kv => expridToAttrMap(kv._1) -> kv._2)
-val newColStats = AttributeMap(mutableAttributeStats.toSeq)
-
-val filteredRowCount: BigInt =
-  EstimationUtils.ceil(BigDecimal(stats.rowCount.get) * 
filterSelectivity)
-val filteredSizeInBytes =
-  EstimationUtils.getOutputSize(plan.output, filteredRowCount, 
newColStats)
-
-Some(stats.copy(sizeInBytes = filteredSizeInBytes, rowCount = 
Some(filteredRowCount),
-  attributeStats = newColStats))
-  }
-
-  /**
-   * Returns a percentage of rows meeting a compound condition in Filter 
node.
-   * A compound condition is decomposed into multiple single conditions 
linked with AND, OR, NOT.
-   * For 
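
The arithmetic in the quoted `estimate` method and the `mutableColStats` docstring can be sketched outside Spark. This is a rough illustration under assumed simplifications, not Spark's actual implementation:

```python
import math

def estimate_rows(child_row_count, selectivity):
    """Scale the child's row count by filter selectivity. None models an
    unsupported predicate, estimated conservatively as 100% selectivity."""
    if selectivity is None:
        selectivity = 1.0
    return math.ceil(child_row_count * selectivity)

def apply_range(col_min, col_max, op, literal):
    """Narrow a column's [min, max] after a range predicate, as in the
    docstring's example: [0, 100] with (c > 40 AND c <= 50) -> [40, 50]."""
    if op == ">":
        return (max(col_min, literal), col_max)
    if op == "<=":
        return (col_min, min(col_max, literal))
    return (col_min, col_max)

# Unsupported predicates keep all rows; supported ones scale the count.
print(estimate_rows(1000, None))   # 1000
print(estimate_rows(1000, 0.25))   # 250

# Start from [0, 100], apply c > 40, then c <= 50.
lo, hi = apply_range(0, 100, ">", 40)
lo, hi = apply_range(lo, hi, "<=", 50)
print((lo, hi))                    # (40, 50)
```

The narrowed bounds are what the mutable column-stats map carries between conditions, so later predicates are estimated against the already-filtered range.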

[GitHub] spark pull request #17051: [SPARK-17075][SQL] Follow up: fix file line endin...

2017-02-23 Thread lins05
Github user lins05 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17051#discussion_r102887355
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
 ---

[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17052
  
**[Test build #73403 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73403/testReport)**
 for PR 17052 at commit 
[`9eb57b7`](https://github.com/apache/spark/commit/9eb57b7294f2636e370be86cf975509917fdd861).





[GitHub] spark pull request #17052: [SPARK-19690][SS] Join a streaming DataFrame with...

2017-02-23 Thread uncleGen
GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/17052

[SPARK-19690][SS] Join a streaming DataFrame with a batch DataFrame which 
has an aggregation may not work

## What changes were proposed in this pull request?

`StatefulAggregationStrategy` should check whether the logical plan is streaming or not.

Test code:

```scala
case class Record(key: Int, value: String)
val df = spark.createDataFrame((1 to 100).map(i => Record(i, s"value_$i"))).groupBy("value").count
val lines = spark.readStream.format("socket").option("host", "localhost").option("port", "").load
val words = lines.as[String].flatMap(_.split(" "))
val result = words.join(df, "value")
```

before pr:

```
== Physical Plan ==
*Project [value#13, count#19L]
+- *BroadcastHashJoin [value#13], [value#1], Inner, BuildRight
   :- *Filter isnotnull(value#13)
   :  +- *SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true) AS value#13]
   :     +- MapPartitions , obj#12: java.lang.String
   :        +- DeserializeToObject value#5.toString, obj#11: java.lang.String
   :           +- StreamingRelation textSocket, [value#5]
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
      +- *HashAggregate(keys=[value#1], functions=[count(1)])
         +- StateStoreSave [value#1], OperatorStateId(,0,0), Append, 0
            +- *HashAggregate(keys=[value#1], functions=[merge_count(1)])
               +- StateStoreRestore [value#1], OperatorStateId(,0,0)
                  +- *HashAggregate(keys=[value#1], functions=[merge_count(1)])
                     +- Exchange hashpartitioning(value#1, 200)
                        +- *HashAggregate(keys=[value#1], functions=[partial_count(1)])
                           +- *Project [value#1]
                              +- *Filter isnotnull(value#1)
                                 +- LocalTableScan [key#0, value#1]
```

after pr:

```
== Physical Plan ==
*Project [value#13, count#19L]
+- *BroadcastHashJoin [value#13], [value#1], Inner, BuildRight
   :- *Filter isnotnull(value#13)
   :  +- *SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true) AS value#13]
   :     +- MapPartitions , obj#12: java.lang.String
   :        +- DeserializeToObject value#5.toString, obj#11: java.lang.String
   :           +- StreamingRelation textSocket, [value#5]
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
      +- *HashAggregate(keys=[value#1], functions=[count(1)])
         +- Exchange hashpartitioning(value#1, 200)
            +- *HashAggregate(keys=[value#1], functions=[partial_count(1)])
               +- *Project [value#1]
                  +- *Filter isnotnull(value#1)
                     +- LocalTableScan [key#0, value#1]
```
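The difference between the two plans above is that the state store operators disappear from the batch side of the join. The fix boils down to the stateful-aggregation strategy firing only for streaming plans; a hypothetical Python sketch of that dispatch (not Spark's planner API):

```python
# Hypothetical model of the planner dispatch this PR fixes: state store
# operators are only inserted when the aggregate's plan is streaming.
class Aggregate:
    def __init__(self, is_streaming):
        self.is_streaming = is_streaming

def plan_aggregate(agg):
    if agg.is_streaming:
        # Streaming: partial/merge aggregates wrapped in state store ops.
        return ["HashAggregate", "StateStoreSave", "HashAggregate",
                "StateStoreRestore", "HashAggregate", "Exchange",
                "PartialHashAggregate"]
    # Batch: a plain two-phase aggregate, no state store involved.
    return ["HashAggregate", "Exchange", "PartialHashAggregate"]

batch_plan = plan_aggregate(Aggregate(is_streaming=False))
print("StateStoreSave" in batch_plan)  # False
```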

## How was this patch tested?

Added a new unit test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-19690

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17052.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17052


commit e45b06e2495e09c6d7e7a50ee509044b526bf8d0
Author: uncleGen 
Date:   2017-02-22T10:18:31Z

Join a streaming DataFrame with a batch DataFrame which has an aggregation 
may not work

commit 9eb57b7294f2636e370be86cf975509917fdd861
Author: uncleGen 
Date:   2017-02-24T06:38:41Z

code clean







[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...

2017-02-23 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/16696
  
@cloud-fan @gatorsmile I've updated this pr and also added test cases, 
please review.





[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-23 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102887155
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -794,6 +795,7 @@ EXPLAIN: 'EXPLAIN';
 FORMAT: 'FORMAT';
 LOGICAL: 'LOGICAL';
 CODEGEN: 'CODEGEN';
+COST: 'COST';
--- End diff --

Thanks! Updated.





[GitHub] spark pull request #17051: [SPARK-17075][SQL] Follow up: fix file line endin...

2017-02-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17051#discussion_r102887105
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala
 ---

[GitHub] spark issue #16944: [SPARK-19611][SQL] Introduce configurable table schema i...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16944
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73393/
Test PASSed.





[GitHub] spark issue #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16594
  
**[Test build #73402 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73402/testReport)**
 for PR 16594 at commit 
[`6e10f84`](https://github.com/apache/spark/commit/6e10f840fed50b7e48898e73967bc35a29a6e23b).





[GitHub] spark issue #16944: [SPARK-19611][SQL] Introduce configurable table schema i...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16944
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16944: [SPARK-19611][SQL] Introduce configurable table schema i...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16944
  
**[Test build #73393 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73393/testReport)**
 for PR 16944 at commit 
[`9b0b2bb`](https://github.com/apache/spark/commit/9b0b2bb3fbc7db9e71b3342014b729568290dffd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17051: [SPARK-17075][SQL] Follow up: fix file line ending and i...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17051
  
**[Test build #73401 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73401/testReport)**
 for PR 17051 at commit 
[`0f56d0f`](https://github.com/apache/spark/commit/0f56d0f1003268e4945ec5a427bbcc4bb7061a49).





[GitHub] spark pull request #17051: [SPARK-17075][SQL] Follow up: fix file line endin...

2017-02-23 Thread lins05
GitHub user lins05 opened a pull request:

https://github.com/apache/spark/pull/17051

[SPARK-17075][SQL] Follow up: fix file line ending and improve the tests

## What changes were proposed in this pull request?

Fixed the line ending of `FilterEstimation.scala`. Also improved the tests 
to cover more cases.

## How was this patch tested?

Existing unit tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lins05/spark fix-cbo-filter-file-encoding

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17051.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17051


commit ee6d9915b26254db176a5aa34c1d59e304e201e0
Author: Shuai Lin 
Date:   2017-02-24T05:59:41Z

[SPARK-17075][SQL] Follow up: fix file line ending and improve the tests.

commit 0f56d0f1003268e4945ec5a427bbcc4bb7061a49
Author: Shuai Lin 
Date:   2017-02-24T05:58:37Z

Use transformExpressionsDown to rewrite the filter.







[GitHub] spark issue #17051: [SPARK-17075][SQL] Follow up: fix file line ending and i...

2017-02-23 Thread lins05
Github user lins05 commented on the issue:

https://github.com/apache/spark/pull/17051
  
cc @ron8hu @cloud-fan 





[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16938
  
**[Test build #73400 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73400/testReport)**
 for PR 16938 at commit 
[`afa1313`](https://github.com/apache/spark/commit/afa13136d6d24313c8f18bb7ed175bf45079476a).





[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16938
  
**[Test build #73399 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73399/testReport)**
 for PR 16938 at commit 
[`8559e4e`](https://github.com/apache/spark/commit/8559e4e8f9b8e8f773f4d336866a01ff15c9fc5e).





[GitHub] spark issue #17049: [SPARK-17495] [SQL] Add more tests for hive hash

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17049
  
**[Test build #73398 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73398/testReport)**
 for PR 17049 at commit 
[`c31b2b0`](https://github.com/apache/spark/commit/c31b2b068a945ef8ca39532292989e7c205b9951).





[GitHub] spark issue #17049: [SPARK-17495] [SQL] Add more tests for hive hash

2017-02-23 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/17049
  
Jenkins retest this please





[GitHub] spark pull request #17049: [SPARK-17495] [SQL] Add more tests for hive hash

2017-02-23 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/17049#discussion_r102882775
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
@@ -71,6 +75,242 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
 checkConsistencyBetweenInterpretedAndCodegen(Crc32, BinaryType)
   }
 
+
+  def checkHiveHash(value: Any, dataType: DataType, expected: Long): Unit = {
+// Note : All expected hashes need to be computed using Hive 1.2.1
+val actual = HiveHashFunction.hash(value, dataType, seed = 0)
+assert(actual == expected)
--- End diff --

Added clue





[GitHub] spark pull request #17049: [SPARK-17495] [SQL] Add more tests for hive hash

2017-02-23 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/17049#discussion_r102882772
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
@@ -781,12 +780,12 @@ object HiveHashFunction extends InterpretedHashFunction {
 var i = 0
 val length = struct.numFields
 while (i < length) {
-  result = (31 * result) + hash(struct.get(i, types(i)), types(i), seed + 1).toInt
+  result = (31 * result) + hash(struct.get(i, types(i)), types(i), 0).toInt
--- End diff --

The `seed` is something used in the murmur3 hash; hive-hash does not need 
it. See the original impl in the Hive codebase: 
https://github.com/apache/hive/blob/4ba713ccd85c3706d195aeef9476e6e6363f1c21/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java#L638

Since the hashing-related methods in Spark already took a `seed`, I had to 
add it to hive-hash as well. When I compute the hash, I always need to set 
`seed` to 0, which is what is done here.





[GitHub] spark pull request #17049: [SPARK-17495] [SQL] Add more tests for hive hash

2017-02-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17049#discussion_r102881875
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
@@ -781,12 +780,12 @@ object HiveHashFunction extends InterpretedHashFunction {
 var i = 0
 val length = struct.numFields
 while (i < length) {
-  result = (31 * result) + hash(struct.get(i, types(i)), types(i), seed + 1).toInt
+  result = (31 * result) + hash(struct.get(i, types(i)), types(i), 0).toInt
--- End diff --

Could you explain the reason?





[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16696
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73391/
Test PASSed.





[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16696
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16696: [SPARK-19350] [SQL] Cardinality estimation of Limit and ...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16696
  
**[Test build #73391 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73391/testReport)**
 for PR 16696 at commit 
[`5692939`](https://github.com/apache/spark/commit/56929391719053e72791abe127b10a3316b51141).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17050: [SPARK-19722] [SQL] [MINOR] Clean up the usage of Elimin...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17050
  
**[Test build #73397 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73397/testReport)**
 for PR 17050 at commit 
[`3cde705`](https://github.com/apache/spark/commit/3cde705c6baa1e4a869149f3ca289a5c1e3a3000).





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15125
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73390/
Test PASSed.





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15125
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15125
  
**[Test build #73390 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73390/testReport)**
 for PR 15125 at commit 
[`11bc349`](https://github.com/apache/spark/commit/11bc349e55eaa5f687d376d1a05f3509459dbecd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17050: [SPARK-19722] [SQL] [MINOR] Clean up the usage of...

2017-02-23 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/17050

[SPARK-19722] [SQL] [MINOR] Clean up the usage of EliminateSubqueryAliases

### What changes were proposed in this pull request?
In the PR https://github.com/apache/spark/pull/11403, we introduced the 
function `canonicalized` for eliminating useless subqueries. We can simply 
replace the call of the rule `EliminateSubqueryAliases` with the function 
`canonicalized`. 

After we changed view resolution and management, the current reason given 
for keeping `EliminateSubqueryAliases` in the optimizer is out of date. 
Thus, this PR also updates the reason to `eager analysis of Dataset`. 

### How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark eliminateSubquery

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17050.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17050


commit d75b94c1abb4b60444a4191319c787a50a061bf9
Author: Xiao Li 
Date:   2017-02-24T05:08:02Z

fix.

commit 3cde705c6baa1e4a869149f3ca289a5c1e3a3000
Author: Xiao Li 
Date:   2017-02-24T05:23:05Z

clean







[GitHub] spark issue #17049: [SPARK-17495] [SQL] Add more tests for hive hash

2017-02-23 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17049
  
Looks good except that comment.






[GitHub] spark pull request #17049: [SPARK-17495] [SQL] Add more tests for hive hash

2017-02-23 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17049#discussion_r102881054
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
@@ -71,6 +75,242 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
 checkConsistencyBetweenInterpretedAndCodegen(Crc32, BinaryType)
   }
 
+
+  def checkHiveHash(value: Any, dataType: DataType, expected: Long): Unit = {
+// Note : All expected hashes need to be computed using Hive 1.2.1
+val actual = HiveHashFunction.hash(value, dataType, seed = 0)
+assert(actual == expected)
--- End diff --

We should add a clue; otherwise we will never be able to tell what's going 
on if the tests fail on those randomized values.

```
withClue(s"value is $value") {
  assert(..
}
```
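For illustration, here is a plain-Scala stand-in for ScalaTest's `withClue` (a hypothetical helper, assuming only that the clue should be prepended to the assertion failure message; ScalaTest's real implementation differs):

```scala
object WithClueSketch {
  // Run body; if an assertion inside it fails, rethrow with the clue
  // prepended so the failure message identifies the offending input.
  def withClue[T](clue: String)(body: => T): T =
    try body
    catch {
      case e: AssertionError =>
        throw new AssertionError(s"$clue: ${e.getMessage}")
    }

  def main(args: Array[String]): Unit = {
    // Passing assertion: nothing is thrown, the body's value is returned.
    withClue("value is 42") { assert(42 == 42) }
    // Failing assertion: the clue shows up in the failure message.
    val msg =
      try { withClue("value is 7") { assert(7 == 8) }; "no failure" }
      catch { case e: AssertionError => e.getMessage }
    println(msg) // prints: value is 7: assertion failed
  }
}
```

With randomized test values, the clue is the only way to know which input triggered the failure, which is the point of the suggestion above.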





[GitHub] spark issue #15047: [SPARK-17495] [SQL] Add Hash capability semantically equ...

2017-02-23 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/15047
  
@gatorsmile + @rxin : I had made a note of your comments but was not able 
to get to them at the time because I had other time-critical projects to 
work on. I have put out a PR which improves the unit test coverage: 
https://github.com/apache/spark/pull/17049





[GitHub] spark issue #17049: [SPARK-17495] [SQL] Add more tests for hive hash

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17049
  
**[Test build #73395 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73395/testReport)**
 for PR 17049 at commit 
[`c589350`](https://github.com/apache/spark/commit/c5893502f52d073f30344a9fa8c4e11287207959).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17049: [SPARK-17495] [SQL] Add more tests for hive hash

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17049
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73395/
Test FAILed.





[GitHub] spark issue #17049: [SPARK-17495] [SQL] Add more tests for hive hash

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17049
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17049: [SPARK-17495] [SQL] Add more tests for hive hash

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17049
  
**[Test build #73395 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73395/testReport)**
 for PR 17049 at commit 
[`c589350`](https://github.com/apache/spark/commit/c5893502f52d073f30344a9fa8c4e11287207959).





[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17001
  
**[Test build #73396 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73396/testReport)**
 for PR 17001 at commit 
[`9c0773b`](https://github.com/apache/spark/commit/9c0773b1d477d39f29ec44f2dcfe34d129706efe).





[GitHub] spark issue #17049: [SPARK-17495] Add more tests for hive hash

2017-02-23 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/17049
  
ok to test





[GitHub] spark pull request #17049: [SPARK-17495] Add more tests for hive hash

2017-02-23 Thread tejasapatil
GitHub user tejasapatil opened a pull request:

https://github.com/apache/spark/pull/17049

[SPARK-17495] Add more tests for hive hash

## What changes were proposed in this pull request?

This PR adds tests for hive-hash by comparing the generated outputs against 
Hive 1.2.1. The following datatypes are covered by this PR:
- null
- boolean
- byte
- short
- int
- long
- float
- double
- string
- array
- map
- struct

Datatypes that I have _NOT_ covered but I will work on separately are:
- Decimal
- Calendar

## How was this patch tested?

NA

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tejasapatil/spark SPARK-17495_remaining_types

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17049.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17049


commit c5893502f52d073f30344a9fa8c4e11287207959
Author: Tejas Patil 
Date:   2016-10-24T04:17:07Z

Add more tests for hive hash







[GitHub] spark issue #17001: [SPARK-19667][SQL]create table with hiveenabled in defau...

2017-02-23 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/17001
  
Yes, it is for HiveExternalCatalog.
When I worked on this [PR](https://github.com/apache/spark/pull/16996), I 
found the following logic.

>The hive.metastore.warehouse.dir in sparkConf still takes effect in Spark; 
it is not useless.
  The reason is that:
  1. when we run Spark with Hive enabled, it creates a SharedState
  2. when creating the SharedState, it creates a HiveExternalCatalog

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L85
3. when creating the HiveExternalCatalog, it creates a HiveClientImpl via 
HiveUtils

https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L65
4. when creating the HiveClientImpl, it calls SessionState.start(state),
and SessionState.start(state) then creates the default database using the 
hive.metastore.warehouse.dir from the hiveConf built in HiveClientImpl:
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L189
That hiveConf is built from hadoopConf and sparkConf, and sparkConf 
overwrites the value of any key that also exists in hadoopConf. So it is 
actually the hive.metastore.warehouse.dir in sparkConf that is used to 
create the default database; if we did not overwrite the value in sparkConf 
in SharedState, the database location would not be the warehouse path we 
expect. So sparkContext.conf.set("hive.metastore.warehouse.dir", 
sparkWarehouseDir) should be retained.

**We can also see that the default database is not created in SharedState: 
the condition there is false, so the create-database logic is never hit. It 
has already been created when we init the HiveClientImpl:

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L96**
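The override behavior described here can be sketched as follows. This is a minimal sketch, not Spark's actual conf-merging code; `mergedHiveConf` is a hypothetical stand-in for how HiveClientImpl layers sparkConf entries over hadoopConf entries, and the paths are made-up examples:

```scala
object WarehouseConfSketch {
  // Later map wins on key collisions, modeling sparkConf entries
  // overwriting same-keyed entries from hadoopConf.
  def mergedHiveConf(
      hadoopConf: Map[String, String],
      sparkConf: Map[String, String]): Map[String, String] =
    hadoopConf ++ sparkConf

  def main(args: Array[String]): Unit = {
    val hadoopConf = Map("hive.metastore.warehouse.dir" -> "/user/hive/warehouse")
    val sparkConf  = Map("hive.metastore.warehouse.dir" -> "/path/to/spark-warehouse")
    // The sparkConf value is what default-database creation ends up seeing.
    println(mergedHiveConf(hadoopConf, sparkConf)("hive.metastore.warehouse.dir"))
    // prints: /path/to/spark-warehouse
  }
}
```

This is why retaining the `sparkContext.conf.set("hive.metastore.warehouse.dir", sparkWarehouseDir)` call matters: without it, the default database would be created at the hadoopConf location instead of the Spark warehouse path.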





[GitHub] spark issue #17048: [SPARK-14772][PYTHON][ML] Fixed Params.copy method to ma...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17048
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73394/
Test PASSed.





[GitHub] spark issue #17048: [SPARK-14772][PYTHON][ML] Fixed Params.copy method to ma...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17048
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17048: [SPARK-14772][PYTHON][ML] Fixed Params.copy method to ma...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17048
  
**[Test build #73394 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73394/testReport)**
 for PR 17048 at commit 
[`adeb5b7`](https://github.com/apache/spark/commit/adeb5b7ea313662a6ab0803acbda1ec8b88bac9f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17038: [SPARK-19707][Core] Improve the invalid path check for s...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17038
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17038: [SPARK-19707][Core] Improve the invalid path check for s...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17038
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73387/
Test PASSed.





[GitHub] spark issue #17038: [SPARK-19707][Core] Improve the invalid path check for s...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17038
  
**[Test build #73387 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73387/testReport)**
 for PR 17038 at commit 
[`db5c287`](https://github.com/apache/spark/commit/db5c287e1223522de9c17391c3ea3025c938158e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17048: [SPARK-14772][PYTHON][ML] Fixed Params.copy method to ma...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17048
  
**[Test build #73394 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73394/testReport)**
 for PR 17048 at commit 
[`adeb5b7`](https://github.com/apache/spark/commit/adeb5b7ea313662a6ab0803acbda1ec8b88bac9f).





[GitHub] spark issue #17048: [SPARK-14772][PYTHON][ML] Fixed Params.copy method to ma...

2017-02-23 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/17048
  
ping @jkbradley , backport for branch-2.1





[GitHub] spark pull request #17048: [SPARK-14772][PYTHON][ML] Fixed Params.copy metho...

2017-02-23 Thread BryanCutler
GitHub user BryanCutler opened a pull request:

https://github.com/apache/spark/pull/17048

[SPARK-14772][PYTHON][ML] Fixed Params.copy method to match Scala 
implementation

## What changes were proposed in this pull request?
Fixed the PySpark Params.copy method to behave like the Scala 
implementation.  The main issue was that it did not account for the 
_defaultParamMap; it must be merged into the explicitly created param map.

## How was this patch tested?
Added new unit test to verify the copy method behaves correctly for copying 
uid, explicitly created params, and default params.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BryanCutler/spark 
pyspark-ml-param_copy-Scala_sync-SPARK-14772-2_1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17048.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17048


commit adeb5b7ea313662a6ab0803acbda1ec8b88bac9f
Author: Bryan Cutler 
Date:   2017-02-01T23:19:57Z

fixed Params.copy method to account for _defaultParamMap and match Scala 
implementation

modified test case to include an explicitly set param

reworked test to be Python 2.6 compatible







[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...

2017-02-23 Thread kunalkhamar
Github user kunalkhamar commented on the issue:

https://github.com/apache/spark/pull/16826
  
jenkins test this please





[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16996
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73389/
Test PASSed.





[GitHub] spark issue #17047: [SPARK-19720][SPARK SUBMIT] Redact sensitive information...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17047
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73385/
Test PASSed.





[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16996
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17047: [SPARK-19720][SPARK SUBMIT] Redact sensitive information...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17047
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16996
  
**[Test build #73389 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73389/testReport)**
 for PR 16996 at commit 
[`86deb62`](https://github.com/apache/spark/commit/86deb6233faa3b64c999786741a0b0cf3cbbe457).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17047: [SPARK-19720][SPARK SUBMIT] Redact sensitive information...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17047
  
**[Test build #73385 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73385/testReport)**
 for PR 17047 at commit 
[`000efb1`](https://github.com/apache/spark/commit/000efb1e3152f837e01ce1f80ae108c596f9baa5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...

2017-02-23 Thread kunalkhamar
Github user kunalkhamar commented on the issue:

https://github.com/apache/spark/pull/16826
  
jenkins retest this please





[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16826
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...

2017-02-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16826
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73388/
Test PASSed.





[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16826
  
**[Test build #73388 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73388/testReport)**
 for PR 16826 at commit 
[`16824f9`](https://github.com/apache/spark/commit/16824f916e87fd90706f9dfd7b7dd81d87b732dd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16944: [SPARK-19611][SQL] Introduce configurable table schema i...

2017-02-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16944
  
**[Test build #73393 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73393/testReport)**
 for PR 16944 at commit 
[`9b0b2bb`](https://github.com/apache/spark/commit/9b0b2bb3fbc7db9e71b3342014b729568290dffd).





[GitHub] spark issue #17036: [SPARK-19706][pyspark] add Column.contains in pyspark

2017-02-23 Thread davies
Github user davies commented on the issue:

https://github.com/apache/spark/pull/17036
  
lgtm





[GitHub] spark pull request #16395: [SPARK-17075][SQL] implemented filter estimation

2017-02-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16395




