[GitHub] spark pull request #16016: Branch 2.1

2016-11-25 Thread horo90
Github user horo90 closed the pull request at:

https://github.com/apache/spark/pull/16016





[GitHub] spark pull request #16016: Branch 2.1

2016-11-25 Thread horo90
GitHub user horo90 reopened a pull request:

https://github.com/apache/spark/pull/16016

Branch 2.1


[GitHub] spark pull request #16016: Branch 2.1

2016-11-25 Thread horo90
Github user horo90 closed the pull request at:

https://github.com/apache/spark/pull/16016





[GitHub] spark issue #16003: [SPARK-18482][SQL] make sure Spark can access the table ...

2016-11-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16003
  
What about the case where Spark 2.1 alters table metadata created by Spark 2.0?





[GitHub] spark issue #15975: [SPARK-18538] [SQL] Fix Concurrent Table Fetching Using ...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15975
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15975: [SPARK-18538] [SQL] Fix Concurrent Table Fetching Using ...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15975
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69180/
Test PASSed.





[GitHub] spark issue #15975: [SPARK-18538] [SQL] Fix Concurrent Table Fetching Using ...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15975
  
**[Test build #69180 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69180/consoleFull)**
 for PR 15975 at commit 
[`1b0caea`](https://github.com/apache/spark/commit/1b0caea20bd233ffda5113c11234d8fd57f6faa3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16015: [SPARK-17251][SQL] Improve `OuterReference` to be `Named...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16015
  
**[Test build #69181 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69181/consoleFull)**
 for PR 16015 at commit 
[`9d965e7`](https://github.com/apache/spark/commit/9d965e74be85dcb1ae75ee102ee63a15c411a4d8).





[GitHub] spark issue #16015: [SPARK-17251][SQL] Improve `OuterReference` to be `Named...

2016-11-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/16015
  
Retest this please.





[GitHub] spark issue #16015: [SPARK-17251][SQL] Improve `OuterReference` to be `Named...

2016-11-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/16015
  
The only test failure is unrelated to this PR.
```
[info] - set spark.sql.warehouse.dir *** FAILED *** (5 minutes, 0 seconds)
[info]   Timeout of './bin/spark-submit' '--class' 
'org.apache.spark.sql.hive.SetWarehouseLocationTest' '--name' 
'SetSparkWarehouseLocationTest' '--master' 'local-cluster[2,1,1024]' '--conf' 
'spark.ui.enabled=false' '--conf' 'spark.master.rest.enabled=false' 
'--driver-java-options' '-Dderby.system.durability=test' 
'file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-27a1c717-99bc-44c6-8af7-710c8440c14d/testJar-1480135147576.jar'
 See the log4j logs for more detail.
```





[GitHub] spark issue #15975: [SPARK-18538] [SQL] Fix Concurrent Table Fetching Using ...

2016-11-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15975
  
@gatorsmile NP. Thank you for letting me know.





[GitHub] spark issue #16016: Branch 2.1

2016-11-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16016
  
Could you please close this PR?





[GitHub] spark issue #16016: Branch 2.1

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16016
  
Can one of the admins verify this patch?





[GitHub] spark pull request #16016: Branch 2.1

2016-11-25 Thread horo90
GitHub user horo90 opened a pull request:

https://github.com/apache/spark/pull/16016

Branch 2.1

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-2.1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16016.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16016


commit 39d2fdb51233ed9b1aaf3adaa3267853f5e58c0f
Author: frreiss 
Date:   2016-11-02T06:00:17Z

[SPARK-17475][STREAMING] Delete CRC files if the filesystem doesn't use 
checksum files

## What changes were proposed in this pull request?

When the metadata logs for various parts of Structured Streaming are stored 
on non-HDFS filesystems such as NFS or ext4, the HDFSMetadataLog class leaves 
hidden HDFS-style checksum (CRC) files in the log directory, one file per 
batch. This PR modifies HDFSMetadataLog so that it detects the use of a 
filesystem that doesn't use CRC files and removes the CRC files.
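
For illustration, a minimal sketch of the idea against Hadoop's FileSystem API; the helper and its detection logic are assumptions, not the actual HDFSMetadataLog code:

```scala
import org.apache.hadoop.fs.{ChecksumFileSystem, FileSystem, Path}

// Hypothetical helper: if the filesystem is not checksum-based, remove the
// stray ".<name>.crc" file left next to a committed batch file.
def deleteStrayCrc(fs: FileSystem, batchFile: Path): Unit = fs match {
  case _: ChecksumFileSystem => () // this filesystem manages its own CRC files
  case _ =>
    val crc = new Path(batchFile.getParent, s".${batchFile.getName}.crc")
    if (fs.exists(crc)) fs.delete(crc, false) // best-effort, non-recursive delete
}
```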
## How was this patch tested?

Modified an existing test case in HDFSMetadataLogSuite to check whether 
HDFSMetadataLog correctly removes CRC files on the local POSIX filesystem.  Ran 
the entire regression suite.

Author: frreiss 

Closes #15027 from frreiss/fred-17475.

(cherry picked from commit 620da3b4828b3580c7ed7339b2a07938e6be1bb1)
Signed-off-by: Reynold Xin 

commit e6509c2459e7ece3c3c6bcd143b8cc71f8f4d5c8
Author: Eric Liang 
Date:   2016-11-02T06:15:10Z

[SPARK-18183][SPARK-18184] Fix INSERT [INTO|OVERWRITE] TABLE ... PARTITION 
for Datasource tables

There are a couple of issues with the current 2.1 behavior when inserting into Datasource tables with partitions managed by Hive:

(1) OVERWRITE TABLE ... PARTITION will actually overwrite the entire table 
instead of just the specified partition.
(2) INSERT|OVERWRITE does not work with partitions that have custom 
locations.

This PR fixes both of these issues for Datasource tables managed by Hive. 
The behavior for legacy tables or when `manageFilesourcePartitions = false` is 
unchanged.
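
For illustration, a minimal sketch of the two affected statements (hypothetical table and location, assuming a SparkSession `spark` with Hive support):

```scala
spark.sql("CREATE TABLE t (a INT, p INT) USING parquet PARTITIONED BY (p)")

// (1) Previously overwrote the entire table rather than only partition p = 1:
spark.sql("INSERT OVERWRITE TABLE t PARTITION (p = 1) SELECT 10")

// (2) Previously failed for partitions with custom locations:
spark.sql("ALTER TABLE t ADD PARTITION (p = 2) LOCATION '/tmp/custom_p2'")
spark.sql("INSERT INTO TABLE t PARTITION (p = 2) SELECT 20")
```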

There is one other issue: INSERT OVERWRITE with dynamic partitions still overwrites the entire table instead of just the updated partitions. The correct behavior is fairly complicated to implement for Datasource tables, so we should address it in a future release.

Unit tests.

Author: Eric Liang 

Closes #15705 from ericl/sc-4942.

(cherry picked from commit abefe2ec428dc24a4112c623fb6fbe4b2ca60a2b)
Signed-off-by: Reynold Xin 

commit 85dd073743946383438aabb9f1281e6075f25cc5
Author: Reynold Xin 
Date:   2016-11-02T06:37:03Z

[SPARK-18192] Support all file formats in structured streaming

## What changes were proposed in this pull request?
This patch adds support for all file formats in structured streaming sinks. 
This is actually a very small change thanks to all the previous refactoring 
done using the new internal commit protocol API.
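
As a sketch (hypothetical paths), writing a stream with a non-parquet file sink now works like any other format:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("file-sink-demo").getOrCreate()

// Hypothetical input directory; each new file arrives as a micro-batch.
val input = spark.readStream.text("/tmp/demo-input")

val query = input.writeStream
  .format("json") // json, text, and parquet are the formats exercised by the updated suite
  .option("checkpointLocation", "/tmp/demo-checkpoint") // hypothetical path
  .start("/tmp/demo-output")                            // hypothetical path
```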

## How was this patch tested?
Updated FileStreamSinkSuite to add test cases for json, text, and parquet.

Author: Reynold Xin 

Closes #15711 from rxin/SPARK-18192.

(cherry picked from commit a36653c5b7b2719f8bfddf4ddfc6e1b828ac9af1)
Signed-off-by: Reynold Xin 

commit 4c4bf87acf2516a72b59f4e760413f80640dca1e
Author: CodingCat 
Date:   2016-11-02T06:39:53Z

[SPARK-18144][SQL] logging StreamingQueryListener$QueryStartedEvent

## What changes were proposed in this pull request?

This PR fixes a bug where QueryStartedEvent is never logged.

postToAll() in the original code actually calls StreamingQueryListenerBus.postToAll(), which has no listeners at all. We should post via sparkListenerBus.postToAll(s) and this.postToAll() to trigger the local listeners as well as the listeners registered in LiveListenerBus.
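
A hedged sketch of that routing (member names are illustrative, taken from the comment rather than the exact Spark source):

```scala
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.QueryStartedEvent

trait QueryEventRouting {
  // Abstract stand-ins for the two buses involved; names are assumptions.
  def postToSparkListenerBus(event: StreamingQueryListener.Event): Unit // LiveListenerBus side
  def postToAll(event: StreamingQueryListener.Event): Unit              // locally registered listeners

  def post(event: StreamingQueryListener.Event): Unit = event match {
    case s: QueryStartedEvent =>
      postToSparkListenerBus(s) // the call that was missing, so the event was never logged
      postToAll(s)              // also fire listeners attached directly to this bus
    case other =>
      postToSparkListenerBus(other)
  }
}
```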

zsxwing
## How was this patch tested?

The following snapshot shows that QueryStartedEvent has been logged 
correctly


![image](https://cloud.githubusercontent.com/assets/678008/19821553/007a7d28-9d2d-11e6-9f13-49851559cdaa.png)
 

[GitHub] spark issue #15662: [SPARK-18141][SQL] Fix to quote column names in the pred...

2016-11-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15662
  
@sureshthalamati Could you resolve the conflict? Thanks!





[GitHub] spark pull request #15662: [SPARK-18141][SQL] Fix to quote column names in t...

2016-11-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15662#discussion_r89666271
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala 
---
@@ -172,7 +172,7 @@ class JDBCSuite extends SparkFunSuite
   """.stripMargin.replaceAll("\n", " "))
 
 conn.prepareStatement(
-  "create table test.emp(name TEXT(32) NOT NULL," +
+  "create table test.emp(\"Name\" TEXT(32) NOT NULL," +
--- End diff --

This is an unnecessary change, right?





[GitHub] spark issue #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for unidoc...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16013
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69179/
Test FAILed.





[GitHub] spark issue #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for unidoc...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16013
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for unidoc...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16013
  
**[Test build #69179 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69179/consoleFull)**
 for PR 16013 at commit 
[`73fcd35`](https://github.com/apache/spark/commit/73fcd355a565c5ea433b1f8ca11e08ee6c3f2a9e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15975: [SPARK-18538] [SQL] Fix Concurrent Table Fetching Using ...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15975
  
**[Test build #69180 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69180/consoleFull)**
 for PR 15975 at commit 
[`1b0caea`](https://github.com/apache/spark/commit/1b0caea20bd233ffda5113c11234d8fd57f6faa3).





[GitHub] spark issue #15975: [SPARK-18538] [SQL] Fix Concurrent Table Fetching Using ...

2016-11-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15975
  
@dongjoon-hyun I will not add test cases for the write path in this PR, because that requires changes to the source code.





[GitHub] spark issue #16015: [SPARK-17251][SQL] Improve `OuterReference` to be `Named...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16015
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16015: [SPARK-17251][SQL] Improve `OuterReference` to be `Named...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16015
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69178/
Test FAILed.





[GitHub] spark issue #16015: [SPARK-17251][SQL] Improve `OuterReference` to be `Named...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16015
  
**[Test build #69178 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69178/consoleFull)**
 for PR 16015 at commit 
[`9d965e7`](https://github.com/apache/spark/commit/9d965e74be85dcb1ae75ee102ee63a15c411a4d8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class OuterReference(e: NamedExpression)`





[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...

2016-11-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16013#discussion_r89665558
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala 
---
@@ -155,7 +155,7 @@ class DoubleRDDFunctions(self: RDD[Double]) extends 
Logging with Serializable {
* to the right except for the last which is closed
*  e.g. for the array
*  [1, 10, 20, 50] the buckets are [1, 10) [10, 20) [20, 50]
-   *  e.g 1<=x<10 , 10<=x<20, 20<=x<=50
+   *  e.g 1&lt;=x&lt;10 , 10&lt;=x&lt;20, 20&lt;=x&lt;=50
--- End diff --

This originally gave an error as below:

```
[error] .../java/org/apache/spark/rdd/DoubleRDDFunctions.java:73: error: 
malformed HTML
[error]*  e.g 1<=x<10, 10<=x<20, 20<=x<=50
[error]^
[error] .../java/org/apache/spark/rdd/DoubleRDDFunctions.java:73: error: 
malformed HTML
[error]*  e.g 1<=x<10, 10<=x<20, 20<=x<=50
[error]   ^
[error] .../java/org/apache/spark/rdd/DoubleRDDFunctions.java:73: error: 
malformed HTML
[error]*  e.g 1<=x<10, 10<=x<20, 20<=x<=50
[error]  ^
...
```

However, after fixing it as above, the escaped entities are printed as-is in the javadoc (not in the scaladoc):

![image](https://cloud.githubusercontent.com/assets/6477701/20638079/e17d0742-b3de-11e6-820d-d2ac85d09947.png)

It seems we should find another approach to deal with this.





[GitHub] spark pull request #16007: [SPARK-18583][SQL] Fix nullability of InputFileNa...

2016-11-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16007





[GitHub] spark issue #16007: [SPARK-18583][SQL] Fix nullability of InputFileName.

2016-11-25 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16007
  
Merging in master/branch-2.1. Thanks.






[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...

2016-11-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16013#discussion_r89665095
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -2063,6 +2063,7 @@ class SparkContext(config: SparkConf) extends Logging 
{
* @param jobId the job ID to cancel
* @throws InterruptedException if the cancel message cannot be sent
--- End diff --

It seems fine

- Scala

  ![image](https://cloud.githubusercontent.com/assets/6477701/20637897/1a78be2a-b3d9-11e6-939a-47c202a50037.png)

- Java

  ![image](https://cloud.githubusercontent.com/assets/6477701/20637898/1eded54e-b3d9-11e6-90a5-5b9c34ec0831.png)






[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...

2016-11-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16013#discussion_r89664964
  
--- Diff: core/src/main/scala/org/apache/spark/Accumulator.scala ---
@@ -26,7 +26,7 @@ package org.apache.spark
  *
  * An accumulator is created from an initial value `v` by calling
  * [[SparkContext#accumulator SparkContext.accumulator]].
- * Tasks running on the cluster can then add to it using the 
[[Accumulable#+= +=]] operator.
+ * Tasks running on the cluster can then add to it using the `+=` operator.
--- End diff --

After this PR it still prints the same.

- Scala

  ![image](https://cloud.githubusercontent.com/assets/6477701/20637848/2d670926-b3d7-11e6-8665-a9f3852545c2.png)

- Java

  ![image](https://cloud.githubusercontent.com/assets/6477701/20637849/322b675e-b3d7-11e6-925b-a9160f06bbc8.png)






[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...

2016-11-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16013#discussion_r89664921
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -262,8 +262,9 @@ class SparkConf(loadDefaults: Boolean) extends 
Cloneable with Logging with Seria
   /**
* Get a time parameter as seconds; throws a NoSuchElementException if 
it's not set. If no
* suffix is provided then seconds are assumed.
-   * @throws NoSuchElementException
+   * @throws java.util.NoSuchElementException
--- End diff --

This is interesting. Using `@throws NoSuchElementException` complains as 
below:

```
[error]   location: class VectorIndexerModel
[error] .../java/org/apache/spark/SparkConf.java:226: error: reference not 
found
[error]* @throws NoSuchElementException
[error]  ^
```
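
A minimal illustration (hypothetical class) of the workaround: fully qualifying the exception in `@throws` lets the genjavadoc output resolve under javadoc 8.

```scala
class Conf(settings: Map[String, String]) {
  /**
   * Returns the configured value for `key`.
   *
   * @throws java.util.NoSuchElementException if `key` is not set
   */
  def get(key: String): String =
    settings.getOrElse(key, throw new NoSuchElementException(key))
}
```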





[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...

2016-11-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16013#discussion_r89664911
  
--- Diff: core/src/main/scala/org/apache/spark/Accumulator.scala ---
@@ -26,7 +26,7 @@ package org.apache.spark
  *
  * An accumulator is created from an initial value `v` by calling
  * [[SparkContext#accumulator SparkContext.accumulator]].
- * Tasks running on the cluster can then add to it using the 
[[Accumulable#+= +=]] operator.
+ * Tasks running on the cluster can then add to it using the `+=` operator.
--- End diff --

I just decided to keep the original format rather than trying to make this pretty.

The original was as below:

- Scala
  ![image](https://cloud.githubusercontent.com/assets/6477701/20637823/6f1c8914-b3d6-11e6-83f4-87355205d4c1.png)

- Java
  ![image](https://cloud.githubusercontent.com/assets/6477701/20637824/6f1cfce6-b3d6-11e6-93d7-2bae071f5753.png)






[GitHub] spark issue #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for unidoc...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16013
  
**[Test build #69179 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69179/consoleFull)**
 for PR 16013 at commit 
[`73fcd35`](https://github.com/apache/spark/commit/73fcd355a565c5ea433b1f8ca11e08ee6c3f2a9e).





[GitHub] spark issue #15916: [SPARK-18487][SQL] Add completion listener to HashAggreg...

2016-11-25 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15916
  
Forgot to say: of course, this example will throw the exception only when running in "test".

Other developers may encounter this when writing test code in the future. If we provide more info in this error message, we can save them time investigating it.

What do you think?






[GitHub] spark issue #16015: [SPARK-17251][SQL] Improve `OuterReference` to be `Named...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16015
  
**[Test build #69178 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69178/consoleFull)**
 for PR 16015 at commit 
[`9d965e7`](https://github.com/apache/spark/commit/9d965e74be85dcb1ae75ee102ee63a15c411a4d8).





[GitHub] spark issue #16012: [SPARK-17251][SQL] Support `OuterReference` in projectio...

2016-11-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/16012
  
Thank you for the review, @hvanhovell and @nsyca. I agree with you; we need enough time for this. So option one for 2.1 is spun off into #16015.





[GitHub] spark pull request #16015: [SPARK-17251][SQL] Improve `OuterReference` to be...

2016-11-25 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/16015

[SPARK-17251][SQL] Improve `OuterReference` to be `NamedExpression`

## What changes were proposed in this pull request?

Currently, `OuterReference` is not a `NamedExpression`, so it raises a `ClassCastException` when used in the projection list of an IN correlated subquery. This PR makes `OuterReference` a `NamedExpression` so that correct error messages are shown.

```scala
scala> sql("CREATE TEMPORARY VIEW t1 AS SELECT * FROM VALUES 1, 2 AS t1(a)")
scala> sql("CREATE TEMPORARY VIEW t2 AS SELECT * FROM VALUES 1 AS t2(b)")
scala> sql("SELECT a FROM t1 WHERE a IN (SELECT a FROM t2)").show
java.lang.ClassCastException: 
org.apache.spark.sql.catalyst.expressions.OuterReference cannot be cast to 
org.apache.spark.sql.catalyst.expressions.NamedExpression
```

## How was this patch tested?

Pass the Jenkins test with new test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-17251-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16015.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16015


commit 9d965e74be85dcb1ae75ee102ee63a15c411a4d8
Author: Dongjoon Hyun 
Date:   2016-11-26T03:24:29Z

[SPARK-17251][SQL] Improve `OuterReference` to be `NamedExpression`







[GitHub] spark issue #16007: [SPARK-18583][SQL] Fix nullability of InputFileName.

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16007
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69177/
Test PASSed.





[GitHub] spark issue #16007: [SPARK-18583][SQL] Fix nullability of InputFileName.

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16007
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16007: [SPARK-18583][SQL] Fix nullability of InputFileName.

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16007
  
**[Test build #69177 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69177/consoleFull)**
 for PR 16007 at commit 
[`2657d95`](https://github.com/apache/spark/commit/2657d955741299431f708c99584514e999ef90c4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15975: [SPARK-18538] [SQL] Fix Concurrent Table Fetching Using ...

2016-11-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15975
  
Will update it tonight. 





[GitHub] spark issue #15916: [SPARK-18487][SQL] Add completion listener to HashAggreg...

2016-11-25 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15916
  
The test case I added in this PR:

```scala
// Inside a Spark test suite where `spark` and `sparkContext` are provided.
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, explode, struct}
import org.apache.spark.sql.types.{DataTypes, StructField, StructType}

val rng = new scala.util.Random(42)
val data = sparkContext.parallelize(Seq.tabulate(100) { i =>
  Row(Array.fill(10)(rng.nextInt(10)))
})
val schema = StructType(Seq(
  StructField("arr", DataTypes.createArrayType(DataTypes.IntegerType))
))
val df = spark.createDataFrame(data, schema)
val exploded = df.select(struct(col("*")).as("star"), explode(col("arr")).as("a"))
val joined = exploded.join(exploded, "a").drop("a").distinct()
joined.show()
```

would throw an exception like this:

```
[info] - SPARK-18487: Consume all elements for show/take to avoid memory leak *** FAILED *** (1 second, 73 milliseconds)
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 179.0 failed 1 times, most recent failure: Lost task 0.0 in stage 179.0 (TID 501, localhost, executor driver): org.apache.spark.SparkException: Managed memory leak detected; size = 33816576 bytes, TID = 501
[info]  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:295)
[info]  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[info]  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[info]  at java.lang.Thread.run(Thread.java:745)
```
I submitted this PR because @sethah encountered this exception during his test, and other developers might hit it in the future. If they don't know this part, the error message would lead them to think a real memory leak happened. To avoid that and provide more useful info, I'd like to modify this error message too.







[GitHub] spark pull request #16012: [SPARK-17251][SQL] Support `OuterReference` in pr...

2016-11-25 Thread nsyca
Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16012#discussion_r89664100
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -989,7 +989,7 @@ class Analyzer(
   withPosition(u) {
 try {
   outer.resolve(nameParts, resolver) match {
-case Some(outerAttr) => OuterReference(outerAttr)
+case Some(outerAttr) => OuterReference(outerAttr)()
--- End diff --

Another interesting case to consider:

```sql
select ...
from   t1
where  t1.c1 in (select sum(t1.c2) from t2)
```

If we support correlated columns in the SELECT clause, do we build the Aggregate on T2 or T1?





[GitHub] spark issue #15358: [SPARK-17783] [SQL] Hide Credentials in CREATE and DESC ...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15358
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69176/
Test PASSed.





[GitHub] spark issue #15358: [SPARK-17783] [SQL] Hide Credentials in CREATE and DESC ...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15358
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15358: [SPARK-17783] [SQL] Hide Credentials in CREATE and DESC ...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15358
  
**[Test build #69176 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69176/consoleFull)**
 for PR 15358 at commit 
[`45e0ee3`](https://github.com/apache/spark/commit/45e0ee31347752b8a5f5bbf325a536b0aae1a3e7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16014: [SPARK-18590][SPARKR] build R source package when making...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16014
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16014: [SPARK-18590][SPARKR] build R source package when making...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16014
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69175/
Test PASSed.





[GitHub] spark issue #16014: [SPARK-18590][SPARKR] build R source package when making...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16014
  
**[Test build #69175 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69175/consoleFull)**
 for PR 16014 at commit 
[`7977139`](https://github.com/apache/spark/commit/79771392f7a8c7fe4ed90b20aec05e5e65304975).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16007: [SPARK-18583][SQL] Fix nullability of InputFileName.

2016-11-25 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16007
  
Thanks - LGTM.






[GitHub] spark issue #16007: [SPARK-18583][SQL] Fix nullability of InputFileName.

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16007
  
**[Test build #69177 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69177/consoleFull)**
 for PR 16007 at commit 
[`2657d95`](https://github.com/apache/spark/commit/2657d955741299431f708c99584514e999ef90c4).





[GitHub] spark issue #16007: [SPARK-18583][SQL] Fix nullability of InputFileName.

2016-11-25 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/16007
  
I see, thanks!





[GitHub] spark issue #16007: [SPARK-18583][SQL] Fix nullability of InputFileName.

2016-11-25 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16007
  
Yes! That's what I meant -- change it to `false`, add some documentation, and add one `require` to enforce that contract.






[GitHub] spark issue #16014: [SPARK-18590][SPARKR] build R source package when making...

2016-11-25 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16014
  
@shivaram






[GitHub] spark issue #15916: [SPARK-18487][SQL] Add completion listener to HashAggreg...

2016-11-25 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15916
  
Can you show an example of a leak that would happen in Executor but not in 
the callback? Thanks.






[GitHub] spark issue #15916: [SPARK-18487][SQL] Add completion listener to HashAggreg...

2016-11-25 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15916
  
@rxin BTW, I see you merged #15989 to downgrade error message level in 
TaskMemoryManager.

I'd like to modify the error message in Executor too, because the current one is a little confusing to developers who don't know this part well; they might think a memory leak has happened.

What do you think? If it is OK with you, I'll submit a PR for it.

Thanks.





[GitHub] spark issue #16007: [SPARK-18583][SQL] Fix nullability of InputFileName.

2016-11-25 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/16007
  
@rxin Sorry, but can we finally change the nullable value to `false`?





[GitHub] spark issue #15358: [SPARK-17783] [SQL] Hide Credentials in CREATE and DESC ...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15358
  
**[Test build #69176 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69176/consoleFull)**
 for PR 15358 at commit 
[`45e0ee3`](https://github.com/apache/spark/commit/45e0ee31347752b8a5f5bbf325a536b0aae1a3e7).





[GitHub] spark issue #15975: [SPARK-18538] [SQL] Fix Concurrent Table Fetching Using ...

2016-11-25 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15975
  
@gatorsmile did you update this?






[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-25 Thread a-roberts
Github user a-roberts commented on the issue:

https://github.com/apache/spark/pull/15736
  
I've conducted a lot of performance tests and gathered .hcd files so I can investigate this next week. It looks like either the first commit is the best for performance, or my current configuration with this benchmark leaves us unable to infer whether our changes really make a difference.

Sharing some raw data; the format is as follows.

Benchmark name, date, time, data size in bytes (the same for each run), elapsed time in seconds, and throughput (bytes per second).

**With the above suggestions for Partitioned*Buffer**
```
ScalaSparkPagerank 2016-11-25 18:49:23 259928115 49.577 5242917
ScalaSparkPagerank 2016-11-25 18:56:55 259928115 49.946 5204182
ScalaSparkPagerank 2016-11-25 19:00:04 259928115 46.510 5588650
ScalaSparkPagerank 2016-11-25 19:02:23 259928115 49.018 5302707
ScalaSparkPagerank 2016-11-25 19:05:25 259928115 49.270 5275585
```

**Vanilla, no changes at all**
```
ScalaSparkPagerank 2016-11-25 19:08:45 259928115 48.068 5407508
ScalaSparkPagerank 2016-11-25 19:11:20 259928115 47.712 5447856
ScalaSparkPagerank 2016-11-25 19:13:50 259928115 44.517 5838850
ScalaSparkPagerank 2016-11-25 19:16:07 259928115 49.942 5204599
ScalaSparkPagerank 2016-11-25 19:19:08 259928115 48.521 5357023
```

**Original commit**
```
ScalaSparkPagerank 2016-11-25 19:47:59 259928115 45.486 5714464
ScalaSparkPagerank 2016-11-25 19:50:48 259928115 48.507 5358569
ScalaSparkPagerank 2016-11-25 19:53:09 259928115 47.063 5522982
ScalaSparkPagerank 2016-11-25 19:56:58 259928115 46.154 5631757
ScalaSparkPagerank 2016-11-25 20:00:01 259928115 48.935 5311701
```
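
(For reference, the throughput column is just the data size divided by the elapsed time, e.g. 259928115 / 49.577 ≈ 5.24 million bytes per second for the first run above.)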

In Healthcenter I do see that these methods are still great candidates for optimisation, as they are all very commonly used.

I'm open to more suggestions; I have exclusive access to lots of hardware, can easily churn out more custom builds, and have lots of profiling software we can use. I'll be committing code for the SizeEstimator soon, as that's a good candidate for optimisation here as well.





[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...

2016-11-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15994#discussion_r89662083
  
--- Diff: project/MimaExcludes.scala ---
@@ -529,6 +529,7 @@ object MimaExcludes {
   
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.mllib.evaluation.MulticlassMetrics.this"),
   
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.mllib.evaluation.RegressionMetrics.this"),
   
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.DataFrameNaFunctions.this"),
+  
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.DataFrameNaFunctions.fill"),
--- End diff --

The thing is that they are not backward compatible at the bytecode level, so applications compiled against the old signature will break if they are not rebuilt.
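
For illustration, a hedged sketch of the general hazard (the classes below are made up; this is not the actual `DataFrameNaFunctions.fill` change):

```scala
// v1: callers compile against the descriptor fill(J)V (a Long parameter).
class NaFunctionsV1 {
  def fill(value: Long): Unit = println(s"fill with $value")
}

// v2: the same call site still compiles from source, but an application
// compiled against v1 looks up fill(J)V at runtime, which no longer exists,
// and fails with NoSuchMethodError unless it is rebuilt.
class NaFunctionsV2 {
  def fill(value: Any): Unit = println(s"fill with $value") // fill(Ljava/lang/Object;)V
}

object BinaryCompatDemo extends App {
  new NaFunctionsV1().fill(0L)
  new NaFunctionsV2().fill(0L) // the Long is boxed here: a different descriptor
}
```

This is why MiMa flags such changes even when every caller compiles cleanly from source.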






[GitHub] spark issue #16007: [SPARK-18583][SQL] Fix nullability of InputFileName.

2016-11-25 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16007
  
Alright I looked more into this -- I think your approach might be better 
actually.

Can you add a require in `InputFileNameHolder.setInputFileName` to verify the input is not null, and then document in InputFileNameHolder that the returned value should never be null, and is the empty string if it is unknown?

Then we can change the nullable value to true for this expression.
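
A minimal sketch of that contract, assuming a thread-local holder and using plain `String` rather than Spark's `UTF8String` for brevity:

```scala
object InputFileNameHolder {
  // "" means "unknown"; the getter is documented to never return null.
  private val inputFileName = new InheritableThreadLocal[String] {
    override def initialValue(): String = ""
  }

  /** Returns the name of the file being read, or "" if unknown; never null. */
  def getInputFileName(): String = inputFileName.get()

  /** The require enforces the contract described above. */
  def setInputFileName(file: String): Unit = {
    require(file != null, "The input file name cannot be null")
    inputFileName.set(file)
  }

  def unsetInputFileName(): Unit = inputFileName.remove()
}
```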







[GitHub] spark issue #16007: [SPARK-18583][SQL] Fix nullability of InputFileName.

2016-11-25 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/16007
  
@rxin And we should also modify the generated code to check whether the value is null, shouldn't we?





[GitHub] spark pull request #16008: [SPARK-18585][SQL] Use `ev.isNull = "false"` if p...

2016-11-25 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16008#discussion_r89661757
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -61,7 +61,6 @@ case class CreateArray(children: Seq[Expression]) extends 
Expression {
 ctx.addMutableState("Object[]", values, s"this.$values = null;")
 
 ev.copy(code = s"""
-  final boolean ${ev.isNull} = false;
--- End diff --

Can you explain how this change improves the code? I'd think it is a no-op, but maybe that's not the case?






[GitHub] spark pull request #15916: [SPARK-18487][SQL] Add completion listener to Has...

2016-11-25 Thread viirya
Github user viirya closed the pull request at:

https://github.com/apache/spark/pull/15916





[GitHub] spark issue #15916: [SPARK-18487][SQL] Add completion listener to HashAggreg...

2016-11-25 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15916
  
@rxin Thanks, I appreciate your feedback. I'll close this now.





[GitHub] spark issue #16007: [SPARK-18583][SQL] Fix nullability of InputFileName.

2016-11-25 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/16007
  
I see, I'll revert this and add the comment. Thanks.





[GitHub] spark pull request #16009: [SPARK-18318][ML] ML, Graph 2.1 QA: API: New Scal...

2016-11-25 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16009#discussion_r89661577
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala ---
@@ -49,15 +49,13 @@ private[feature] trait ChiSqSelectorParams extends 
Params
*
* @group param
*/
-  @Since("1.6.0")
--- End diff --

Why were the `@since` annotations removed, btw?





[GitHub] spark issue #16007: [SPARK-18583][SQL] Fix nullability of InputFileName.

2016-11-25 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16007
  
I wouldn't change the default, as it might break compatibility. That said, I don't think it is safe to just set this to non-nullable, because that is a very implicit assumption, and setting it to nullable is never "wrong". I'd add a comment explaining why it is nullable (e.g. "It depends on the semantics of the callers of InputFileNameHolder, and there is no guarantee that the value won't be null").





[GitHub] spark pull request #16014: [SPARK-18590][SPARKR] build R source package when...

2016-11-25 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16014#discussion_r89661428
  
--- Diff: dev/create-release/release-build.sh ---
@@ -189,6 +189,9 @@ if [[ "$1" == "package" ]]; then
   SHA512 $PYTHON_DIST_NAME > \
   $PYTHON_DIST_NAME.sha
 
+echo "Copying R source package"
+cp spark-$SPARK_VERSION-bin-$NAME/R/SparkR_$SPARK_VERSION.tar.gz .
--- End diff --

this is the source package we should release to CRAN





[GitHub] spark pull request #16014: [SPARK-18590][SPARKR] build R source package when...

2016-11-25 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16014#discussion_r89661414
  
--- Diff: R/pkg/NAMESPACE ---
@@ -3,7 +3,7 @@
 importFrom("methods", "setGeneric", "setMethod", "setOldClass")
 importFrom("methods", "is", "new", "signature", "show")
 importFrom("stats", "gaussian", "setNames")
-importFrom("utils", "download.file", "object.size", "packageVersion", 
"untar")
+importFrom("utils", "download.file", "object.size", "packageVersion", 
"tail", "untar")
--- End diff --

This regressed in a recent commit. check-cran.sh actually flags this in an existing NOTE, but we only check the number of NOTEs (which is still 1), so this went in undetected.






[GitHub] spark pull request #16014: [SPARK-18590][SPARKR] build R source package when...

2016-11-25 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16014#discussion_r89661364
  
--- Diff: R/pkg/DESCRIPTION ---
@@ -1,28 +1,27 @@
 Package: SparkR
 Type: Package
-Title: R Frontend for Apache Spark
 Version: 2.1.0
-Date: 2016-11-06
--- End diff --

This is removed - I tried but haven't found a way to update it automatically (I guess this could go in the [release-tag](https://github.com/apache/spark/blob/master/dev/create-release/release-tag.sh) script, though).
More importantly, it seems many (most?) packages do not have this in their DESCRIPTION.

In any case, the release date is stamped when releasing to CRAN.





[GitHub] spark issue #16014: [SPARK-18590][SPARKR] build R source package when making...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16014
  
**[Test build #69175 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69175/consoleFull)**
 for PR 16014 at commit 
[`7977139`](https://github.com/apache/spark/commit/79771392f7a8c7fe4ed90b20aec05e5e65304975).





[GitHub] spark pull request #16014: [SPARK-18590][SPARKR] build R source package when...

2016-11-25 Thread felixcheung
GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/16014

[SPARK-18590][SPARKR] build R source package when making distribution

## What changes were proposed in this pull request?

We should include the built SparkR source package in the Spark distribution. This will enable help and vignettes when the package is used. This source package is also what we would release to CRAN.

### more details

These are the additional steps in make-distribution; please see [here](https://github.com/apache/spark/blob/master/R/CRAN_RELEASE.md) for what goes into a CRAN release, which is now run during make-distribution.sh.
1. the package needs to be installed because the first code block in the vignettes is `library(SparkR)` without a lib path
2. `R CMD build` will build the vignettes
3. `R CMD check` on the source package will install the package and build the vignettes again (this time from the source package)
 (tests are skipped here, but they will need to pass for the CRAN release process to succeed - ideally, during release signoff we should install from the R package and run the tests)
4. `R CMD INSTALL` on the source package (this is the only way to generate the doc/vignettes rds files correctly, not in step #1)
 (the output of this step is what we package into the Spark dist and sparkr.zip)

Alternatively, `R CMD build` should already be installing the package in a temp directory, though we might just need to find that location and set it as the lib.loc parameter; another approach would be to try calling `R CMD INSTALL --build pkg` instead.
In any case, despite installing the package multiple times, this is relatively fast.
Building the vignettes takes a while, though.



## How was this patch tested?

Manually, CI.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixcheung/spark rdist

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16014.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16014


commit 79771392f7a8c7fe4ed90b20aec05e5e65304975
Author: Felix Cheung 
Date:   2016-11-25T23:00:25Z

build source package in make-distribution, and take that as a part of the 
distribution







[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15998
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15998
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69174/
Test PASSed.





[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15998
  
**[Test build #69174 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69174/consoleFull)**
 for PR 15998 at commit 
[`4e03c3e`](https://github.com/apache/spark/commit/4e03c3e46d22e5fe1b1fbc01ea57ef15d2723b9b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16007: [SPARK-18583][SQL] Fix nullability of InputFileName.

2016-11-25 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/16007
  
@rxin For now, the default value is `""` (`UTF8String.fromString("")`) if the input file name is not set.

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/InputFileNameHolder.scala#L32
Should we change the default value to `null`?





[GitHub] spark issue #16012: [SPARK-17251][SQL] Support `OuterReference` in projectio...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16012
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69173/
Test PASSed.





[GitHub] spark issue #16012: [SPARK-17251][SQL] Support `OuterReference` in projectio...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16012
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16012: [SPARK-17251][SQL] Support `OuterReference` in projectio...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16012
  
**[Test build #69173 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69173/consoleFull)**
 for PR 16012 at commit 
[`3de9419`](https://github.com/apache/spark/commit/3de9419a30790020fb4d562625941dbc5e1772d2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class OuterReference(e: NamedExpression)(`





[GitHub] spark pull request #16012: [SPARK-17251][SQL] Support `OuterReference` in pr...

2016-11-25 Thread nsyca
Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16012#discussion_r89657974
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -989,7 +989,7 @@ class Analyzer(
   withPosition(u) {
 try {
   outer.resolve(nameParts, resolver) match {
-case Some(outerAttr) => OuterReference(outerAttr)
+case Some(outerAttr) => OuterReference(outerAttr)()
--- End diff --

I have not looked at the code changes closely, but I got a general idea of what the originally reported problem is. I second @hvanhovell's suggestion not to support outer references in the SELECT clause of a subquery in 2.1. Just fix the named expression first.

An IN subquery might be okay, as it more or less reflects inner-join semantics. A NOT IN subquery is converted to a special case of an anti-join with extra logic for the null value.

```sql
select *
from   tbl_a
where  tbl_a.c1 not in (select tbl_a.c2 from tbl_b)
```


Does the LeftAnti with effectively no join predicate, i.e.,

`(isnull(tbl_a.c1 = tbl_a.c2) || (tbl_a.c1 = tbl_a.c2))`

work correctly today? And if it returns a correct result, is it by design or by chance?
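
For context, the NULL sensitivity of NOT IN is easy to reproduce; a self-contained sketch (the data and session setup are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession

object NotInNullDemo extends App {
  val spark = SparkSession.builder().master("local[1]").appName("not-in-demo").getOrCreate()
  import spark.implicits._

  Seq(1, 2).toDF("c1").createOrReplaceTempView("tbl_a")
  Seq[Option[Int]](Some(1), None).toDF("c2").createOrReplaceTempView("tbl_b")

  // Because tbl_b.c2 contains a NULL, "c1 NOT IN (...)" is never TRUE
  // (FALSE for a match, UNKNOWN otherwise), so this returns zero rows.
  // The isnull(...) disjunct in the anti-join predicate encodes exactly that.
  spark.sql("SELECT * FROM tbl_a WHERE c1 NOT IN (SELECT c2 FROM tbl_b)").show()

  spark.stop()
}
```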





[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-11-25 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15998#discussion_r89656487
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala
 ---
@@ -482,6 +482,19 @@ class InMemoryCatalog(
 }
   }
 
+  override def listPartitionNames(
+  db: String,
+  table: String,
+  partialSpec: Option[TablePartitionSpec] = None): Seq[String] = 
synchronized {
+val partitionColumnNames = getTable(db, table).partitionColumnNames
+
+listPartitions(db, table, partialSpec).map { partition =>
+  partitionColumnNames.map { name =>
+name + "=" + partition.spec(name)
--- End diff --

Does this need escaping, as provided by escapePathName?
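
To make the concern concrete, a hedged sketch of why escaping matters; `escapeSpecial` below only mimics what an escapePathName-style helper does and is not the real implementation:

```scala
object PartitionNameDemo extends App {
  // Percent-encode characters that would corrupt a "col=value" path fragment.
  def escapeSpecial(s: String): String = s.flatMap {
    case c if "\"#%'*/:=?\\".contains(c) => f"%%${c.toInt}%02X"
    case c => c.toString
  }

  // A partition value containing '/' or '=' must not split into extra path
  // segments or extra spec entries when the name is parsed back.
  val spec = Map("ds" -> "2016/11/25", "kind" -> "a=b")

  println(spec.map { case (k, v) => k + "=" + v }.mkString("/"))
  // ds=2016/11/25/kind=a=b        <- ambiguous without escaping
  println(spec.map { case (k, v) => k + "=" + escapeSpecial(v) }.mkString("/"))
  // ds=2016%2F11%2F25/kind=a%3Db  <- round-trips unambiguously
}
```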





[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-11-25 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15998#discussion_r89656749
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -922,6 +923,29 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
   /**
* Returns the partition names from hive metastore for a given table in 
a database.
*/
+  override def listPartitionNames(
+  db: String,
+  table: String,
+  partialSpec: Option[TablePartitionSpec] = None): Seq[String] = 
withClient {
+val actualPartColNames = getTable(db, table).partitionColumnNames
+val clientPartitionNames =
+  client.getPartitionNames(db, table, 
partialSpec.map(lowerCasePartitionSpec))
+
+if (actualPartColNames.exists(partColName => partColName != 
partColName.toLowerCase)) {
+  clientPartitionNames.map { partName =>
+val partSpec = PartitioningUtils.parsePathFragmentAsSeq(partName)
--- End diff --

Is the (un)escaping here correct? It would be nice to have a unit test to 
verify these edge cases.





[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-11-25 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15998#discussion_r89656509
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
 ---
@@ -189,6 +189,21 @@ abstract class ExternalCatalog {
   spec: TablePartitionSpec): Option[CatalogTablePartition]
 
   /**
+   * List the names of all partitions that belong to the specified table, 
assuming it exists.
+   *
+   * A partial partition spec may optionally be provided to filter the 
partitions returned.
+   * For instance, if there exist partitions (a='1', b='2'), (a='1', 
b='3') and (a='2', b='4'),
+   * then a partial spec of (a='1') will return the first two only.
--- End diff --

nit: newline here





[GitHub] spark pull request #15998: [SPARK-18572][SQL] Add a method `listPartitionNam...

2016-11-25 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15998#discussion_r89656787
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -922,6 +923,29 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
   /**
* Returns the partition names from hive metastore for a given table in 
a database.
*/
+  override def listPartitionNames(
+  db: String,
+  table: String,
+  partialSpec: Option[TablePartitionSpec] = None): Seq[String] = 
withClient {
+val actualPartColNames = getTable(db, table).partitionColumnNames
+val clientPartitionNames =
+  client.getPartitionNames(db, table, 
partialSpec.map(lowerCasePartitionSpec))
+
+if (actualPartColNames.exists(partColName => partColName != 
partColName.toLowerCase)) {
+  clientPartitionNames.map { partName =>
+val partSpec = PartitioningUtils.parsePathFragmentAsSeq(partName)
+partSpec.map { case (partName, partValue) =>
+  actualPartColNames.find(_.equalsIgnoreCase(partName)).get + "=" 
+ partValue
+}.mkString("/")
+  }
+} else {
+  clientPartitionNames
--- End diff --

Consider not having this optimization to avoid two different code paths 
here.





[GitHub] spark pull request #16012: [SPARK-17251][SQL] Support `OuterReference` in pr...

2016-11-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/16012#discussion_r89656824
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -989,7 +989,7 @@ class Analyzer(
   withPosition(u) {
 try {
   outer.resolve(nameParts, resolver) match {
-case Some(outerAttr) => OuterReference(outerAttr)
+case Some(outerAttr) => OuterReference(outerAttr)()
--- End diff --

Hmm. Correct.  I'll check that again. 

BTW, what about the predicates? It seems they are handled the same way.





[GitHub] spark pull request #16012: [SPARK-17251][SQL] Support `OuterReference` in pr...

2016-11-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/16012#discussion_r89656679
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
 ---
@@ -356,10 +356,17 @@ case class PrettyAttribute(
  * A place holder used to hold a reference that has been resolved to a 
field outside of the current
  * plan. This is used for correlated subqueries.
  */
-case class OuterReference(e: NamedExpression) extends LeafExpression with 
Unevaluable {
+case class OuterReference(e: NamedExpression)(
+  val exprId: ExprId = NamedExpression.newExprId)
--- End diff --

Is it okay? I thought it worked like `Alias`. Anyway, no problem. I'll update it like that.





[GitHub] spark pull request #15977: [SPARK-18436][SQL] isin causing SQL syntax error ...

2016-11-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15977





[GitHub] spark issue #15977: [SPARK-18436][SQL] isin causing SQL syntax error with JD...

2016-11-25 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/15977
  
LGTM. Merging to master/2.1. Thanks!





[GitHub] spark pull request #16012: [SPARK-17251][SQL] Support `OuterReference` in pr...

2016-11-25 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16012#discussion_r89656204
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -989,7 +989,7 @@ class Analyzer(
   withPosition(u) {
 try {
   outer.resolve(nameParts, resolver) match {
-case Some(outerAttr) => OuterReference(outerAttr)
+case Some(outerAttr) => OuterReference(outerAttr)()
--- End diff --

I am not sure the analyzer change has the desired effect. This just removes the outer reference from the tree, and that won't work if we use the attribute anywhere in the tree. For example:
```sql
select *
from   tbl_a
where id in (select x
 from (select tbl_b.id,
  tbl_a.id + 1 as x,
  tbl_a.id + tbl_b.id as y
   from   tbl_b)
  where y > 0)
```

I think we need to break this down into two steps:
1. Do not support this for now and just fix the named expression. That 
would be my goal for 2.1.
2. Try to see if we can rewrite the tree in such a way that we can extract 
the value. That would be my goal for 2.2. I am not sure how well we can make 
this work. In the end I think we need a dedicated subquery operator.

cc @nsyca what do you think.





[GitHub] spark pull request #16012: [SPARK-17251][SQL] Support `OuterReference` in pr...

2016-11-25 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16012#discussion_r89655153
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
 ---
@@ -356,10 +356,17 @@ case class PrettyAttribute(
  * A place holder used to hold a reference that has been resolved to a 
field outside of the current
  * plan. This is used for correlated subqueries.
  */
-case class OuterReference(e: NamedExpression) extends LeafExpression with 
Unevaluable {
+case class OuterReference(e: NamedExpression)(
+  val exprId: ExprId = NamedExpression.newExprId)
--- End diff --

Use the `exprId` of the `NamedExpression`.





[GitHub] spark pull request #16012: [SPARK-17251][SQL] Support `OuterReference` in pr...

2016-11-25 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16012#discussion_r89655174
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
 ---
@@ -356,10 +356,17 @@ case class PrettyAttribute(
  * A place holder used to hold a reference that has been resolved to a 
field outside of the current
  * plan. This is used for correlated subqueries.
  */
-case class OuterReference(e: NamedExpression) extends LeafExpression with 
Unevaluable {
+case class OuterReference(e: NamedExpression)(
+  val exprId: ExprId = NamedExpression.newExprId)
+  extends LeafExpression with NamedExpression with Unevaluable {
   override def dataType: DataType = e.dataType
   override def nullable: Boolean = e.nullable
   override def prettyName: String = "outer"
+
+  override def name: String = e.name
+  override def qualifier: Option[String] = e.qualifier
+  override def toAttribute: Attribute = e.toAttribute
+  override def newInstance(): NamedExpression = OuterReference(e)()
--- End diff --

`OuterReference(e.newInstance())()`?
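
Putting the two suggestions together, a hedged sketch of the shape being proposed (assumed, not necessarily the code as merged):

```scala
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types.DataType

// Delegates naming, the expression id, and copying to the wrapped
// expression, so the outer reference behaves like the attribute it wraps.
case class OuterReference(e: NamedExpression)
  extends LeafExpression with NamedExpression with Unevaluable {
  override def dataType: DataType = e.dataType
  override def nullable: Boolean = e.nullable
  override def prettyName: String = "outer"

  override def name: String = e.name
  override def exprId: ExprId = e.exprId
  override def qualifier: Option[String] = e.qualifier
  override def toAttribute: Attribute = e.toAttribute
  override def newInstance(): NamedExpression = OuterReference(e.newInstance())
}
```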





[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15998
  
**[Test build #69174 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69174/consoleFull)**
 for PR 15998 at commit 
[`4e03c3e`](https://github.com/apache/spark/commit/4e03c3e46d22e5fe1b1fbc01ea57ef15d2723b9b).





[GitHub] spark issue #15998: [SPARK-18572][SQL] Add a method `listPartitionNames` to ...

2016-11-25 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/15998
  
CC @ericl @cloud-fan 





[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15736
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69172/
Test FAILed.





[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15736
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15736
  
**[Test build #69172 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69172/consoleFull)**
 for PR 15736 at commit 
[`53ed170`](https://github.com/apache/spark/commit/53ed1708112fbf66b04fe89502e534ca3270d15c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14136: [SPARK-16282][SQL] Implement percentile SQL funct...

2016-11-25 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/14136#discussion_r89652990
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala
 ---
@@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import 
org.apache.spark.sql.catalyst.expressions.aggregate.Percentile.Countings
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.Platform.BYTE_ARRAY_OFFSET
+import org.apache.spark.util.collection.OpenHashMap
+
+
+/**
+ * The Percentile aggregate function returns the exact percentile(s) of 
numeric column `expr` at
+ * the given percentage(s) with value range in [0.0, 1.0].
+ *
+ * The operator is bound to the slower sort based aggregation path because 
the number of elements
+ * and their partial order cannot be determined in advance. Therefore we 
have to store all the
+ * elements in memory, and too many elements can cause GC pauses and
eventually OutOfMemoryErrors.
+ *
+ * @param child child expression that produces a numeric column value with
`child.eval(inputRow)`
+ * @param percentageExpression Expression that represents a single 
percentage value or an array of
+ * percentage values. Each percentage value 
must be in the range
+ * [0.0, 1.0].
+ */
+@ExpressionDescription(
+  usage =
+"""
+  _FUNC_(col, percentage) - Returns the exact percentile value of 
numeric column `col` at the
+  given percentage. The value of percentage must be between 0.0 and 
1.0.
+
+  _FUNC_(col, array(percentage1 [, percentage2]...)) - Returns the 
exact percentile value array
+  of numeric column `col` at the given percentage(s). Each value of 
the percentage array must
+  be between 0.0 and 1.0.
+""")
+case class Percentile(
+  child: Expression,
+  percentageExpression: Expression,
+  mutableAggBufferOffset: Int = 0,
+  inputAggBufferOffset: Int = 0) extends 
TypedImperativeAggregate[Countings] {
+
+  def this(child: Expression, percentageExpression: Expression) = {
+this(child, percentageExpression, 0, 0)
+  }
+
+  override def prettyName: String = "percentile"
+
+  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: 
Int): Percentile =
+copy(mutableAggBufferOffset = newMutableAggBufferOffset)
+
+  override def withNewInputAggBufferOffset(newInputAggBufferOffset: Int): 
Percentile =
+copy(inputAggBufferOffset = newInputAggBufferOffset)
+
+  // Mark as lazy so that percentageExpression is not evaluated during 
tree transformation.
+  private lazy val (returnPercentileArray: Boolean, percentages: 
Seq[Number]) =
+evalPercentages(percentageExpression)
+
+  override def children: Seq[Expression] = child :: percentageExpression 
:: Nil
+
+  // Returns null for empty inputs
+  override def nullable: Boolean = true
+
+  override def dataType: DataType =
+if (returnPercentileArray) ArrayType(DoubleType) else DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(NumericType, TypeCollection(NumericType, ArrayType))
+
+  override def checkInputDataTypes(): TypeCheckResult =
+TypeUtils.checkForNumericExpr(child.dataType, "function percentile")
+
+  override def createAggregationBuffer(): Countings = {
+// Initialize new Countings instance here.
+Countings()
+  }
+
+  private def evalPercentages(expr: Expression): (Boolean, Seq[Number]) = {
+val (isArrayType, values) = 

[GitHub] spark pull request #14136: [SPARK-16282][SQL] Implement percentile SQL funct...

2016-11-25 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/14136#discussion_r89647985
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala
 ---
@@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import 
org.apache.spark.sql.catalyst.expressions.aggregate.Percentile.Countings
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.Platform.BYTE_ARRAY_OFFSET
+import org.apache.spark.util.collection.OpenHashMap
+
+
+/**
+ * The Percentile aggregate function returns the exact percentile(s) of 
numeric column `expr` at
+ * the given percentage(s) with value range in [0.0, 1.0].
+ *
+ * The operator is bound to the slower sort based aggregation path because 
the number of elements
+ * and their partial order cannot be determined in advance. Therefore we 
have to store all the
+ * elements in memory, and too many elements can cause GC pauses and
eventually OutOfMemoryErrors.
+ *
+ * @param child child expression that produces a numeric column value with
`child.eval(inputRow)`
+ * @param percentageExpression Expression that represents a single 
percentage value or an array of
+ * percentage values. Each percentage value 
must be in the range
+ * [0.0, 1.0].
+ */
+@ExpressionDescription(
+  usage =
+"""
+  _FUNC_(col, percentage) - Returns the exact percentile value of 
numeric column `col` at the
+  given percentage. The value of percentage must be between 0.0 and 
1.0.
+
+  _FUNC_(col, array(percentage1 [, percentage2]...)) - Returns the 
exact percentile value array
+  of numeric column `col` at the given percentage(s). Each value of 
the percentage array must
+  be between 0.0 and 1.0.
+""")
+case class Percentile(
+  child: Expression,
+  percentageExpression: Expression,
+  mutableAggBufferOffset: Int = 0,
+  inputAggBufferOffset: Int = 0) extends TypedImperativeAggregate[Countings] {
+
+  def this(child: Expression, percentageExpression: Expression) = {
+    this(child, percentageExpression, 0, 0)
+  }
+
+  override def prettyName: String = "percentile"
+
+  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): Percentile =
+    copy(mutableAggBufferOffset = newMutableAggBufferOffset)
+
+  override def withNewInputAggBufferOffset(newInputAggBufferOffset: Int): Percentile =
+    copy(inputAggBufferOffset = newInputAggBufferOffset)
+
+  // Mark as lazy so that percentageExpression is not evaluated during tree transformation.
+  private lazy val (returnPercentileArray: Boolean, percentages: Seq[Number]) =
+    evalPercentages(percentageExpression)
+
+  override def children: Seq[Expression] = child :: percentageExpression :: Nil
+
+  // Returns null for empty inputs
+  override def nullable: Boolean = true
+
+  override def dataType: DataType =
+    if (returnPercentileArray) ArrayType(DoubleType) else DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] =
+    Seq(NumericType, TypeCollection(NumericType, ArrayType))
+
+  override def checkInputDataTypes(): TypeCheckResult =
+    TypeUtils.checkForNumericExpr(child.dataType, "function percentile")
+
+  override def createAggregationBuffer(): Countings = {
+    // Initialize new Countings instance here.
+    Countings()
+  }
+
+  private def evalPercentages(expr: Expression): (Boolean, Seq[Number]) = {
+val (isArrayType, values) = 

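For reference, the _FUNC_ usage string in the diff above corresponds to SQL
calls like the following. This is a minimal sketch assuming a local
SparkSession and a hypothetical single-column table "nums"; "percentile" is
the function this patch proposes to add:

    import org.apache.spark.sql.SparkSession

    object PercentileUsageSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("percentile-demo")
          .getOrCreate()
        import spark.implicits._

        // Hypothetical input: a single numeric column named "col".
        Seq(1, 2, 3, 4, 5).toDF("col").createOrReplaceTempView("nums")

        // Scalar form: one exact percentile, returned as a double.
        spark.sql("SELECT percentile(col, 0.5) FROM nums").show()

        // Array form: several exact percentiles at once, returned as array<double>.
        spark.sql("SELECT percentile(col, array(0.25, 0.5, 0.75)) FROM nums").show()

        spark.stop()
      }
    }
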
[GitHub] spark pull request #14136: [SPARK-16282][SQL] Implement percentile SQL funct...

2016-11-25 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/14136#discussion_r89646058
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala ---
@@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate.Percentile.Countings
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.Platform.BYTE_ARRAY_OFFSET
+import org.apache.spark.util.collection.OpenHashMap
+
+
+/**
+ * The Percentile aggregate function returns the exact percentile(s) of numeric column `expr` at
+ * the given percentage(s), each of which must be in the range [0.0, 1.0].
+ *
+ * The operator is bound to the slower sort-based aggregation path because the number of elements
+ * and their partial order cannot be determined in advance. We therefore have to store all the
+ * elements in memory, and too many elements can cause GC pauses and eventually OutOfMemoryErrors.
+ *
+ * @param child child expression that produces a numeric column value with `child.eval(inputRow)`
+ * @param percentageExpression Expression that represents a single percentage value or an array of
+ *                             percentage values. Each percentage value must be in the range
+ *                             [0.0, 1.0].
+ */
+@ExpressionDescription(
+  usage =
+    """
+      _FUNC_(col, percentage) - Returns the exact percentile value of numeric column `col` at
+      the given percentage. The value of percentage must be between 0.0 and 1.0.
+
+      _FUNC_(col, array(percentage1 [, percentage2]...)) - Returns the exact percentile value
+      array of numeric column `col` at the given percentage(s). Each value of the percentage
+      array must be between 0.0 and 1.0.
+    """)
+case class Percentile(
+  child: Expression,
+  percentageExpression: Expression,
+  mutableAggBufferOffset: Int = 0,
+  inputAggBufferOffset: Int = 0) extends TypedImperativeAggregate[Countings] {
+
+  def this(child: Expression, percentageExpression: Expression) = {
+    this(child, percentageExpression, 0, 0)
+  }
+
+  override def prettyName: String = "percentile"
+
+  override def withNewMutableAggBufferOffset(newMutableAggBufferOffset: Int): Percentile =
+    copy(mutableAggBufferOffset = newMutableAggBufferOffset)
+
+  override def withNewInputAggBufferOffset(newInputAggBufferOffset: Int): Percentile =
+    copy(inputAggBufferOffset = newInputAggBufferOffset)
+
+  // Mark as lazy so that percentageExpression is not evaluated during tree transformation.
+  private lazy val (returnPercentileArray: Boolean, percentages: Seq[Number]) =
+    evalPercentages(percentageExpression)
+
+  override def children: Seq[Expression] = child :: percentageExpression :: Nil
+
+  // Returns null for empty inputs
+  override def nullable: Boolean = true
+
+  override def dataType: DataType =
+    if (returnPercentileArray) ArrayType(DoubleType) else DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] =
+    Seq(NumericType, TypeCollection(NumericType, ArrayType))
+
+  override def checkInputDataTypes(): TypeCheckResult =
+    TypeUtils.checkForNumericExpr(child.dataType, "function percentile")
+
+  override def createAggregationBuffer(): Countings = {
+    // Initialize new Countings instance here.
+    Countings()
+  }
+
+  private def evalPercentages(expr: Expression): (Boolean, Seq[Number]) = {
+val (isArrayType, values) = 

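The scaladoc notes that all elements must be buffered in memory, and the
imports pull in OpenHashMap, which suggests the aggregation buffer reduces to
a value-to-count map. The following is a minimal, standalone sketch of
computing an exact percentile from such counts. It is illustrative only, not
the PR's Countings implementation, and it returns the nearest ranked value
rather than interpolating between neighbors:

    object ExactPercentileSketch {
      // Given per-value counts and a percentage in [0.0, 1.0], sort the
      // distinct values, walk the cumulative counts, and return the value
      // at the target rank.
      def percentile(counts: Map[Long, Long], percentage: Double): Double = {
        require(counts.nonEmpty, "empty input")
        require(percentage >= 0.0 && percentage <= 1.0, "percentage out of range")
        val sorted = counts.toSeq.sortBy(_._1)
        val total = sorted.map(_._2).sum
        val targetRank = math.max(1L, math.ceil(percentage * total).toLong)
        var cumulative = 0L
        for ((value, count) <- sorted) {
          cumulative += count
          if (cumulative >= targetRank) return value.toDouble
        }
        sorted.last._1.toDouble
      }

      def main(args: Array[String]): Unit = {
        val counts = Map(1L -> 1L, 2L -> 1L, 3L -> 2L, 10L -> 1L) // data: 1, 2, 3, 3, 10
        println(percentile(counts, 0.5))  // exact median: 3.0
        println(percentile(counts, 0.75)) // 3.0 (rank ceil(0.75 * 5) = 4)
      }
    }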