date:20160518

Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/12719#issuecomment-220233909
  
@cloud-fan, it's ready again.
Could you merge this PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-15114][SQL] Column name generated by ty...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13045#issuecomment-220233152
  
**[Test build #58850 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58850/consoleFull)**
 for PR 13045 at commit 
[`9eb6f40`](https://github.com/apache/spark/commit/9eb6f4063adaf7cda79cdf0bf2ac11414ca5c1d2).



[GitHub] spark pull request: [SPARK-14670][SQL][WIP] allow updating driver ...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13189#issuecomment-220233150
  
**[Test build #58848 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58848/consoleFull)**
 for PR 13189 at commit 
[`8db358f`](https://github.com/apache/spark/commit/8db358f801f3dbd9f5eacf20dc10ef773c0d7ccb).



[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13186#issuecomment-220233159
  
**[Test build #58849 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58849/consoleFull)**
 for PR 13186 at commit 
[`ac3aa33`](https://github.com/apache/spark/commit/ac3aa334b59d430ea7c239c706ed7e490af5f0b2).



[GitHub] spark pull request: [SPARK-14346][SQL] Lists unsupported Hive feat...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13173#issuecomment-220232881
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-14346][SQL] Lists unsupported Hive feat...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13173#issuecomment-220232882
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58845/
Test FAILed.



[GitHub] spark pull request: [SPARK-14346][SQL] Lists unsupported Hive feat...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13173#issuecomment-220232874
  
**[Test build #58845 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58845/consoleFull)**
 for PR 13173 at commit 
[`ca22d71`](https://github.com/apache/spark/commit/ca22d7102537bd7411f37aa957f877802ebd6d17).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14670][SQL][WIP] allow updating driver ...

Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/13189#issuecomment-220232797
  
cc @andrewor14 @davies 



[GitHub] spark pull request: [SPARK-14670][SQL][WIP] allow updating driver ...

GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/13189

[SPARK-14670][SQL][WIP] allow updating driver side sql metrics

## What changes were proposed in this pull request?

On the SparkUI right now we have this SQLTab that displays accumulator 
values per operator. However, it only displays metrics updated on the 
executors, not on the driver. It is useful to also include driver metrics, e.g. 
broadcast time.

This is a different version from 
https://github.com/apache/spark/pull/12427. This PR sends driver-side 
accumulator updates right after each update happens, not at the end of 
execution. But it has some drawbacks:

1. If there is no update, we won't send zero-value updates, so in the web UI 
the operator will be empty, with no metrics info displayed.
2. We need to trigger the event explicitly, which is not as simple as just 
updating the accumulator.
3. It may be hard to use inside whole-stage codegen.
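
The first two drawbacks can be illustrated with a minimal, self-contained sketch. This is not the code in this PR and does not use any Spark classes: `MetricUpdate`, `UiStore`, and `DriverMetric` are hypothetical stand-ins, and the operator and metric names are made up for illustration. It only assumes that the UI learns about a driver-side metric when an update event is posted for it.

```scala
import scala.collection.mutable

// Hypothetical stand-ins (not Spark classes): an update event and a UI-side store
// that only knows about metrics for which an event has been posted.
final case class MetricUpdate(operator: String, metric: String, value: Long)

final class UiStore {
  private val values = mutable.Map.empty[(String, String), Long]
  def onUpdate(u: MetricUpdate): Unit = values((u.operator, u.metric)) = u.value
  def display(operator: String, metric: String): Option[Long] = values.get((operator, metric))
}

final class DriverMetric(operator: String, name: String, post: MetricUpdate => Unit) {
  private var value = 0L
  // Updating the value alone is invisible to the UI ...
  def add(v: Long): Unit = value += v
  // ... the caller must also trigger the event explicitly (drawback 2).
  def publish(): Unit = post(MetricUpdate(operator, name, value))
}

object DriverMetricsSketch {
  def main(args: Array[String]): Unit = {
    val ui = new UiStore
    val broadcastTime = new DriverMetric("BroadcastExchange", "broadcastTime", ui.onUpdate)

    broadcastTime.add(42L)
    broadcastTime.publish()

    // A metric that was updated and published is visible.
    println(ui.display("BroadcastExchange", "broadcastTime"))
    // A metric that never received an update has no entry at all (drawback 1).
    println(ui.display("BroadcastExchange", "collectTime"))
  }
}
```

In this toy model, a metric that is never published is simply absent from the store rather than shown as zero, which mirrors the empty operator described in item 1.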

## How was this patch tested?

TODO





You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark metrics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13189.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13189


commit 8db358f801f3dbd9f5eacf20dc10ef773c0d7ccb
Author: Wenchen Fan 
Date:   2016-05-19T05:36:34Z

allow updating driver side sql metrics





[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13186#issuecomment-220232622
  
**[Test build #58847 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58847/consoleFull)**
 for PR 13186 at commit 
[`5bcef84`](https://github.com/apache/spark/commit/5bcef84700bd4ec51097e58bea099ded54334a59).



[GitHub] spark pull request: [SPARK-14346][SQL] Lists unsupported Hive feat...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13173#issuecomment-220232037
  
**[Test build #58845 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58845/consoleFull)**
 for PR 13173 at commit 
[`ca22d71`](https://github.com/apache/spark/commit/ca22d7102537bd7411f37aa957f877802ebd6d17).



[GitHub] spark pull request: [SPARK-15308][SQL] RowEncoder should preserve ...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13090#issuecomment-220232042
  
**[Test build #58846 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58846/consoleFull)**
 for PR 13090 at commit 
[`698c261`](https://github.com/apache/spark/commit/698c2619dc71650ef0faac278014b539387fb273).



[GitHub] spark pull request: [SPARK-14346][SQL] Lists unsupported Hive feat...

2016-05-18 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/13173#issuecomment-220231882
  
Doesn't seem to be a valid MiMA check failure. Actually the tool crashed.



[GitHub] spark pull request: [SPARK-14346][SQL] Lists unsupported Hive feat...

2016-05-18 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/13173#issuecomment-220231892
  
retest this please



[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13139#issuecomment-220231390
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58843/
Test PASSed.



[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13139#issuecomment-220231329
  
**[Test build #58843 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58843/consoleFull)**
 for PR 13139 at commit 
[`e0079d0`](https://github.com/apache/spark/commit/e0079d03f279dc68eb19faed6d5cb6823802051a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13139#issuecomment-220231389
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13122#issuecomment-220231132
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58841/
Test PASSed.



[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13122#issuecomment-220231131
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13122#issuecomment-220230990
  
**[Test build #58841 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58841/consoleFull)**
 for PR 13122 at commit 
[`84aa14a`](https://github.com/apache/spark/commit/84aa14a5deda14083520e8e23f83cdb7f5bbb2bc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-15322][SQL][FOLLOW-UP] Update deprecate...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13187#issuecomment-220230863
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58840/
Test PASSed.



[GitHub] spark pull request: [SPARK-15322][SQL][FOLLOW-UP] Update deprecate...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13187#issuecomment-220230862
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13188#issuecomment-220230908
  
**[Test build #58844 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58844/consoleFull)**
 for PR 13188 at commit 
[`e584575`](https://github.com/apache/spark/commit/e584575bb786e77b7ea1d6de3f80ec556011d291).



[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...

2016-05-18 Thread sameeragarwal

GitHub user sameeragarwal opened a pull request:

https://github.com/apache/spark/pull/13188

[SPARK-15078][SQL] Add all TPCDS 1.4 benchmark queries for SparkSQL

## What changes were proposed in this pull request?

Now that SparkSQL supports all TPC-DS queries, this patch adds all 99 
benchmark queries inside SparkSQL.

## How was this patch tested?

Benchmark only

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sameeragarwal/spark tpcds-all

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13188.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13188


commit e584575bb786e77b7ea1d6de3f80ec556011d291
Author: Sameer Agarwal 
Date:   2016-05-03T00:28:12Z

Add all TPCDS queries





[GitHub] spark pull request: [SPARK-15322][SQL][FOLLOW-UP] Update deprecate...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13187#issuecomment-220230733
  
**[Test build #58840 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58840/consoleFull)**
 for PR 13187 at commit 
[`9b07d09`](https://github.com/apache/spark/commit/9b07d09301e9c6695e3586e06852f679594d988d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [WIP][SPARK-15078] Add all TPCDS 1.4 benchmark...

2016-05-18 Thread sameeragarwal

Github user sameeragarwal closed the pull request at:

https://github.com/apache/spark/pull/12854



[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13122#issuecomment-220230272
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58839/
Test PASSed.



[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13122#issuecomment-220230271
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13139#issuecomment-220230302
  
**[Test build #58843 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58843/consoleFull)**
 for PR 13139 at commit 
[`e0079d0`](https://github.com/apache/spark/commit/e0079d03f279dc68eb19faed6d5cb6823802051a).



[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13122#issuecomment-220230146
  
**[Test build #58839 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58839/consoleFull)**
 for PR 13122 at commit 
[`0702178`](https://github.com/apache/spark/commit/0702178a3c485aa316d5b03b3aefb2ea4a228cc2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13139#issuecomment-220229725
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13139#issuecomment-220229727
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58842/
Test PASSed.



[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13139#issuecomment-220229641
  
**[Test build #58842 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58842/consoleFull)**
 for PR 13139 at commit 
[`ce7c55e`](https://github.com/apache/spark/commit/ce7c55e14a76dc85bca51a2563d770e3eac3a2a2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-18 Thread frreiss

Github user frreiss commented on a diff in the pull request:

https://github.com/apache/spark/pull/13155#discussion_r63823201
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -1648,16 +1648,56 @@ object RewriteCorrelatedScalarSubquery extends Rule[LogicalPlan] {
   }
 
   /**
+   * Statically evaluate an expression containing one or more aggregates on an empty input.
+   */
+  private def evalOnZeroTups(expr : Expression) : Option[Any] = {
+    // AggregateExpressions are Unevaluable, so we need to replace all aggregates
+    // in the expression with the value they would return for zero input tuples.
+    val rewrittenExpr = expr transform {
+      case a @ AggregateExpression(aggFunc, _, _, resultId) =>
+        val resultLit = aggFunc.defaultResult match {
+          case Some(lit) => lit
+          case None => Literal.default(NullType)
+        }
+        Alias(resultLit, "aggVal") (exprId = resultId)
+    }
+    Option(rewrittenExpr.eval())
+  }
+
+  /**
    * Construct a new child plan by left joining the given subqueries to a base plan.
    */
   private def constructLeftJoins(
       child: LogicalPlan,
       subqueries: ArrayBuffer[ScalarSubquery]): LogicalPlan = {
     subqueries.foldLeft(child) {
       case (currentChild, ScalarSubquery(query, conditions, _)) =>
+        val aggOutputExpr = query.asInstanceOf[Aggregate].aggregateExpressions.head
--- End diff --

Sorry, didn't see your reply before I posted mine. I must not have 
refreshed my browser. Thanks for the info on the possible cases. I'm testing 
the updated static evaluation code now. 
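
For readers following the quoted diff, the idea behind static evaluation on zero tuples can be sketched on a toy expression tree. This is only an illustration under stated assumptions: `Expr`, `Lit`, `Add`, and `Agg` are hypothetical stand-ins rather than Catalyst classes, values are restricted to `Long`, and `None` models SQL NULL. The sketch replaces each aggregate with the result it would produce on empty input and then evaluates what remains.

```scala
// Toy expression tree; hypothetical stand-ins, not Catalyst classes.
sealed trait Expr
final case class Lit(value: Option[Long]) extends Expr // None models SQL NULL
final case class Add(left: Expr, right: Expr) extends Expr
final case class Agg(name: String, defaultResult: Option[Long]) extends Expr

object EmptyInputEval {
  // Replace every aggregate with the value it yields for zero input rows.
  def rewrite(e: Expr): Expr = e match {
    case Agg(_, default) => Lit(default)
    case Add(l, r)       => Add(rewrite(l), rewrite(r))
    case lit: Lit        => lit
  }

  // Evaluate an aggregate-free expression; NULL propagates through Add.
  def eval(e: Expr): Option[Long] = e match {
    case Lit(v)    => v
    case Add(l, r) => for (a <- eval(l); b <- eval(r)) yield a + b
    case _: Agg    => sys.error("aggregates must be rewritten before evaluation")
  }

  def evalOnZeroTups(e: Expr): Option[Long] = eval(rewrite(e))

  def main(args: Array[String]): Unit = {
    val count = Agg("count", Some(0L)) // COUNT over no rows is 0
    val sum   = Agg("sum", None)       // SUM over no rows is NULL
    println(evalOnZeroTups(Add(count, Lit(Some(1L))))) // models count(*) + 1
    println(evalOnZeroTups(Add(sum, Lit(Some(1L)))))   // models sum(x) + 1
  }
}
```

The distinction matters for the left-join rewrite because an expression such as `count(*) + 1` is non-NULL on empty input, whereas the outer join would otherwise produce NULL for unmatched rows.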



[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13139#issuecomment-220228793
  
**[Test build #58842 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58842/consoleFull)**
 for PR 13139 at commit 
[`ce7c55e`](https://github.com/apache/spark/commit/ce7c55e14a76dc85bca51a2563d770e3eac3a2a2).



[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...

2016-05-18 Thread sethah

Github user sethah commented on the pull request:

https://github.com/apache/spark/pull/13139#issuecomment-220228661
  
@yanboliang @MLnick Thanks for the feedback. For now, I've just addressed 
the comment about the optimization section. I'll address the other comments in 
my next commit (very soon!).



[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...

2016-05-18 Thread sethah

Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/13139#discussion_r63823104
  
--- Diff: docs/ml-classification-regression.md ---
@@ -374,6 +374,197 @@ regression model and extracting model summary statistics.
 
 
 
+## Generalized linear regression
+
+When working with data that has a relatively small number of features (< 4096), Spark's GeneralizedLinearRegression interface
+allows for flexible specification of [generalized linear models](https://en.wikipedia.org/wiki/Generalized_linear_model) (GLMs) which can be used for various types of
+prediction problems including linear regression, Poisson regression, logistic regression, and others.
+
+Contrasted with linear regression where the output is assumed to follow a Gaussian
+distribution, GLMs are specifications of linear models where the response variable $Y_i$ may take on _any_
+distribution from the [exponential family of distributions](https://en.wikipedia.org/wiki/Exponential_family).
+
+$$
+Y_i \sim f\left(\cdot|\theta_i, \phi, w_i\right)
+$$
+
+An exponential family distribution is any probability distribution of the form
+
+$$
+f\left(y|\theta, \phi, w\right) = \exp{\left(\frac{y\theta - b(\theta)}{\phi/w} - c(y, \phi)\right)}
+$$
+
+where the parameter of interest $\theta_i$ is related to the expected value of the response variable
+$\mu_i$ by
+
+$$
+\theta_i = h(\mu_i)
+$$
+
+Here, $h(\mu_i)$ is defined by the form of the exponential family distribution used. GLMs also allow specification
+of a link function, which defines the relationship between the expected value of the response variable $\mu_i$
+and the so-called _linear predictor_ $\eta_i$:
+
+$$
+g(\mu_i) = \eta_i = \vec{x_i}^T \cdot \vec{\beta}
+$$
+
+Often, the link function is chosen such that $h(\mu) = g(\mu)$, which yields a simplified relationship
+between the parameter of interest $\theta$ and the linear predictor $\eta$. In this case, the link
+function $g(\mu)$ is said to be the "canonical" link function.
+
+$$
+\theta_i = h(g^{-1}(\eta_i)) = \eta_i
+$$
+
+A GLM finds the regression coefficients $\vec{\beta}$ which maximize the likelihood function.
+
+$$
+\max_{\vec{\beta}} \mathcal{L}(\vec{\theta}|\vec{y},X) =
+\prod_{i=1}^{N} \exp{\left(\frac{y_i\theta_i - b(\theta_i)}{\phi/w_i} - c(y_i, \phi)\right)}
+$$
+
+where the parameter of interest $\theta_i$ is related to the regression coefficients $\vec{\beta}$
+by
+
+$$
+\theta_i = h(g^{-1}(\vec{x_i}^T \cdot \vec{\beta}))
+$$
+
+Spark's generalized linear regression interface also provides summary statistics for diagnosing the
+fit of GLM models, including residuals, p-values, deviances, the Akaike information criterion, and
+others.
+
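+### Available families
+
+<table>
+  <thead>
+    <tr>
+      <th>Family</th>
+      <th>PDF</th>
+      <th>Response Type</th>
+      <th>Supported Links</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <td>Gaussian</td>
+      <td>$\frac{1}{\sigma \sqrt{2\pi}} \exp \left( -\frac{(x - \mu)^2}{2\sigma^2}\right)$</td>
+      <td>Continuous</td>
+      <td>Identity*, Log, Inverse</td>
+    </tr>
+    <tr>
+      <td>Binomial</td>
+      <td>$\binom{n}{k}p^k (1-p)^{n-k}$</td>
+      <td>Binary</td>
+      <td>Logit*, Probit, CLogLog</td>
+    </tr>
+    <tr>
+      <td>Poisson</td>
+      <td>$\frac{\lambda^k e^{-\lambda}}{k!}$</td>
+      <td>Count</td>
+      <td>Log*, Identity, Sqrt</td>
+    </tr>
+    <tr>
+      <td>Gamma</td>
+      <td>$\frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}$</td>
+      <td>Continuous</td>
+      <td>Inverse*, Identity, Log</td>
+    </tr>
+    <tr>
+      <td colspan="4">* Canonical Link</td>
+    </tr>
+  </tbody>
+</table>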
+
+### Optimization
--- End diff --

So, I went ahead and added some more detail on the optimization routine. I 
made an effort to stress the limitations on numFeatures and to give some 
explanation as to why. Could you take a look at it? I didn't generate the docs 
to make sure it looks alright just yet, but I wanted to get that up so it could 
be reviewed.
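
To make the canonical-link relationship in the quoted documentation concrete, here is a small numeric sketch for the Poisson family with a log link. For Poisson, the natural parameter is $\theta = \log \mu$, so $h(\mu) = \log \mu$, and choosing $g(\mu) = \log \mu$ should give $\theta_i = \eta_i$. The object and method names are assumptions made for this example; none of this is Spark's `GeneralizedLinearRegression` API.

```scala
// Hypothetical helper for illustration only; not part of Spark.
object CanonicalLinkSketch {
  // Poisson family: natural parameter theta = h(mu) = log(mu).
  def h(mu: Double): Double = math.log(mu)

  // Log link g(mu) = log(mu) and its inverse g^{-1}(eta) = exp(eta).
  def g(mu: Double): Double = math.log(mu)
  def gInverse(eta: Double): Double = math.exp(eta)

  // Linear predictor eta = x^T beta.
  def linearPredictor(x: Array[Double], beta: Array[Double]): Double = {
    require(x.length == beta.length, "x and beta must have the same length")
    x.zip(beta).map { case (xi, bi) => xi * bi }.sum
  }

  // theta = h(g^{-1}(x^T beta)); with the canonical link this collapses to eta.
  def theta(x: Array[Double], beta: Array[Double]): Double =
    h(gInverse(linearPredictor(x, beta)))
}
```

With `x = Array(1.0, 2.0)` and `beta = Array(0.5, 0.25)`, the linear predictor is 1.0, and `theta` returns the same value up to floating-point error. With a non-canonical link such as the identity, $\theta_i = \log \eta_i$ instead.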



[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13186#issuecomment-220228251
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13186#issuecomment-220228252
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58835/
Test FAILed.



[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13186#issuecomment-220228173
  
**[Test build #58835 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58835/consoleFull)** for PR 13186 at commit [`23b43d4`](https://github.com/apache/spark/commit/23b43d4c837d762461dd56a62b85cb998919e0ef).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13182#discussion_r63822575
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
@@ -410,9 +410,10 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
 
   private def init(): Unit = {
     if (mm != null) {
+      require(capacity < (512 << 20), "Cannot broadcast more than 512 millions rows")
--- End diff --

Looks like it is.



[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/13182#discussion_r63822450
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
@@ -410,9 +410,10 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
 
   private def init(): Unit = {
     if (mm != null) {
+      require(capacity < (512 << 20), "Cannot broadcast more than 512 millions rows")
--- End diff --

yes



[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-18 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13182#discussion_r63822349
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
@@ -410,9 +410,10 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
 
   private def init(): Unit = {
     if (mm != null) {
+      require(capacity < (512 << 20), "Cannot broadcast more than 512 millions rows")
--- End diff --

Is `capacity` the number of rows?
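
For reference, the bound in the `require` call quoted above can be worked out directly: `512 << 20` is 512 * 2^20 = 2^29 = 536,870,912, so "512 millions" in the message is a rounded figure. The sketch below only reproduces that arithmetic; `BroadcastCapacityBound` and `withinLimit` are hypothetical names, and taking `capacity` as a `Long` is an assumption for the example rather than the signature used in `HashedRelation.scala`.

```scala
// Hypothetical helper reproducing the arithmetic of the quoted check.
object BroadcastCapacityBound {
  // 512 << 20 shifts 512 left by 20 bits, i.e. 512 * 2^20 = 2^29, which still fits in an Int.
  val limit: Int = 512 << 20

  // Same shape as the quoted require(...) condition; the parameter type is an assumption.
  def withinLimit(capacity: Long): Boolean = capacity < limit
}
```

Under this reading, a relation of 100 million rows passes the check, while one of exactly 536,870,912 rows does not, because the comparison is strict.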



[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...

2016-05-18 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13167



[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...

Github user davies commented on the pull request:

https://github.com/apache/spark/pull/13167#issuecomment-220226195
  
Merging this into master and 2.0, thanks!



[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2016-05-18 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10061#discussion_r63822163
  
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -96,6 +100,7 @@ private[spark] object JsonProtocol {
         executorMetricsUpdateToJson(metricsUpdate)
       case blockUpdated: SparkListenerBlockUpdated =>
         throw new MatchError(blockUpdated)  // TODO(ekl) implement this
+      case _ => parse(mapper.writeValueAsString(event))
--- End diff --

> Events are a public API, and they should be carefully crafted, since 
changing them affects user applications (including event logs). If there is 
unnecessary information in the event, then it's a bug in the event definition, 
not here.

Yeah, I totally agree. However, my concern is that having this line here will
make it harder for developers to spot issues during development. Because the
serialization works automatically, reviewing what will be serialized, and which
methods will be called during serialization, is no longer a mandatory step,
which makes auditing much harder. Handling every event explicitly does mean
more work for the developer, but when we review a pull request we can clearly
see what will be serialized and how each event is serialized. What do you think?

btw, if I am missing any context, please let me know :)
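
A minimal sketch of the trade-off described above, assuming a sealed event hierarchy: when each event is matched by name, the serialized fields are visible in the diff, and the compiler warns about a non-exhaustive match if a new subtype is added without a case. The event types, field names, and `ExplicitSerialization` object are hypothetical and are not Spark's listener events or `JsonProtocol`.

```scala
// Hypothetical event types for illustration; these are not Spark's listener events.
sealed trait Event
case class JobStarted(jobId: Int) extends Event
case class JobEnded(jobId: Int, succeeded: Boolean) extends Event

object ExplicitSerialization {
  // Each event is handled by name, so a reviewer sees exactly which fields are written.
  // Because Event is sealed, adding a subtype without a case here triggers a
  // non-exhaustive-match warning at compile time; a `case _ =>` fallback would hide it.
  def toJson(event: Event): String = event match {
    case JobStarted(id) =>
      s"""{"Event":"JobStarted","Job ID":$id}"""
    case JobEnded(id, ok) =>
      s"""{"Event":"JobEnded","Job ID":$id,"Succeeded":$ok}"""
  }
}
```

The cost is the one noted in the comment: every new event needs a hand-written case, whereas a reflective fallback serializes it with no extra code and no review of what is emitted.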



[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13167#issuecomment-220225651
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58836/
Test PASSed.



[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13167#issuecomment-220225648
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12719#issuecomment-220225586
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12719#issuecomment-220225588
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58837/
Test PASSed.



[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13167#issuecomment-220225530
  
**[Test build #58836 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58836/consoleFull)** for PR 13167 at commit [`a97e358`](https://github.com/apache/spark/commit/a97e3586b7b856d5a62981ff459f48da8d1128bb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12719#issuecomment-220225490
  
**[Test build #58837 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58837/consoleFull)** for PR 12719 at commit [`0cb1136`](https://github.com/apache/spark/commit/0cb11361ff70d88ae09a4fd31154999fc9c3efae).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...

2016-05-18 Thread HyukjinKwon

Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/13165#issuecomment-220224055
  
@sun-rui @felixcheung Let me first try to build and run all the R tests on
Windows, and then I will try to correct and add each test one by one. This will
take a bit of time and I might have to ask a lot of questions, but I will try
anyway.



[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back

2016-05-18 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13156#discussion_r63820600
  
--- Diff: sql/hivecontext-compatibility/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -58,4 +58,16 @@ class HiveContext private[hive](
     sparkSession.sharedState.asInstanceOf[HiveSharedState]
   }
 
+  /**
+   * Invalidate and refresh all the cached metadata of the given table. For performance reasons,
+   * Spark SQL or the external data source library it uses might cache certain metadata about a
+   * table, such as the location of blocks. When those change outside of Spark SQL, users should
+   * call this function to invalidate the cache.
+   *
+   * @since 1.3.0
+   */
+  def refreshTable(tableName: String): Unit = {
--- End diff --

+1



[GitHub] spark pull request: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13135#issuecomment-220223044
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58838/
Test PASSed.



[GitHub] spark pull request: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13135#issuecomment-220223043
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13135#issuecomment-220222980
  
**[Test build #58838 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58838/consoleFull)** for PR 13135 at commit [`9ec58e6`](https://github.com/apache/spark/commit/9ec58e6368d848b90b94145a1bb1354587898d82).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: Revert "[SPARK-10216][SQL] Avoid creating empt...

2016-05-18 Thread HyukjinKwon

Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/13181#issuecomment-220222603
  
Hi @marmbrus , it seems okay!



[GitHub] spark pull request: [SPARK-15322][SQL][FOLLOW-UP] Update deprecate...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13187#issuecomment-220222494
  
**[Test build #58840 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58840/consoleFull)** for PR 13187 at commit [`9b07d09`](https://github.com/apache/spark/commit/9b07d09301e9c6695e3586e06852f679594d988d).



[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13122#issuecomment-220222493
  
**[Test build #58841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58841/consoleFull)** for PR 13122 at commit [`84aa14a`](https://github.com/apache/spark/commit/84aa14a5deda14083520e8e23f83cdb7f5bbb2bc).



[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back

2016-05-18 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13156#discussion_r63820108
  
--- Diff: sql/hivecontext-compatibility/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -58,4 +58,16 @@ class HiveContext private[hive](
     sparkSession.sharedState.asInstanceOf[HiveSharedState]
   }
 
+  /**
+   * Invalidate and refresh all the cached metadata of the given table. For performance reasons,
+   * Spark SQL or the external data source library it uses might cache certain metadata about a
+   * table, such as the location of blocks. When those change outside of Spark SQL, users should
+   * call this function to invalidate the cache.
+   *
+   * @since 1.3.0
+   */
+  def refreshTable(tableName: String): Unit = {
--- End diff --

This class exists for compatibility purposes. Let's leave it as is.



[GitHub] spark pull request: [SPARK-15322][SQL][FOLLOW-UP] Update deprecate...

2016-05-18 Thread HyukjinKwon

GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/13187

[SPARK-15322][SQL][FOLLOW-UP] Update deprecated accumulator usage into 
accumulatorV2

## What changes were proposed in this pull request?

This PR corrects another case that still uses the deprecated `accumulableCollection`,
switching it to `listAccumulator`; the previous PR seems to have missed it.

Since `ArrayBuffer[InternalRow]` is `java.util.List[InternalRow]`, it seems 
reasonable to replace the usage.

## How was this patch tested?

Related existing tests `InMemoryColumnarQuerySuite` and `CachedTableSuite`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-15322

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13187.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13187


commit 9b07d09301e9c6695e3586e06852f679594d988d
Author: hyukjinkwon 
Date:   2016-05-19T03:50:37Z

Use list accumulator





[GitHub] spark pull request: [SPARK-15031][EXAMPLES][FOLLOW-UP] Make Python...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13135#issuecomment-220222031
  
**[Test build #58838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58838/consoleFull)** for PR 13135 at commit [`9ec58e6`](https://github.com/apache/spark/commit/9ec58e6368d848b90b94145a1bb1354587898d82).



[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13122#issuecomment-220222027
  
**[Test build #58839 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58839/consoleFull)** for PR 13122 at commit [`0702178`](https://github.com/apache/spark/commit/0702178a3c485aa316d5b03b3aefb2ea4a228cc2).



[GitHub] spark pull request: [SPARK-15331] [SQL] Disallow All the Unsupport...

2016-05-18 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13122#discussion_r63819835
  
--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala ---
@@ -234,6 +234,13 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging {
     )
   }
 
+  test("unsupported operations") {
--- End diff --

@hvanhovell The latest changes added the test cases for the unsupported 
operations. 



[GitHub] spark pull request: [SPARK-15130][PySpark][ML][DOCS] pyspark expos...

2016-05-18 Thread MLnick

Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/12914#issuecomment-220219840

@jkbradley @yanboliang @holdenk @sethah let's discuss the issue of defaults
in param doc (refer
https://github.com/apache/spark/pull/13148#discussion_r63600571) on this PR
since it is pertinent.

Here, Holden raises 2 issues:
1. The Scaladoc contains default values for many params (sometimes in
shared traits). In addition the Scala `Param` itself has the self-contained
`doc` field (typically not containing defaults, since the built-in doc shows
current and default in `explainParam`).
2. The PyDoc only contains the `Param` `doc` field.

(By the way, (1) implies that in cases where the default param value in the
trait is overridden, the Scaladoc is incorrect, but that is another issue).

The result of (2) is that the HTML API doc doesn't look great, e.g.

https://cloud.githubusercontent.com/assets/1036807/15381231/0a937dde-1d7e-11e6-885c-b120679f84ee.png

Also, nowhere in the PyDoc are the defaults listed, while in the Scaladoc
they are.

I agree that it would be nice to have the defaults listed in the PyDoc in
some way.
1. One solution is the original approach here, where defaults are put in
the Param doc in a standard way, but stripped out during `explainParams`. This
works but IMO is more prone to breaking in future if people forget to do things
in exactly the correct format. It also doesn't directly solve the problem of
the API doc looking ugly;
2. Another solution is the current approach here, where the attributes are
turned into properties with a docstring (possibly including the default) - this
does solve the problem of nice display in the API doc. The downside here is the
potentially fairly large change to make everything a property, and the code
duplication introduced (though kept to a minimum) and extra boilerplate when
adding new params that could be more error-prone;
3. A third solution is what I've done
[here](https://github.com/mlnick/spark/tree/sphinx-doc-params) as a PoC, which
basically adds the built-in doc as the instance docstring for each Python
`Param`. Then we override the `AttributeDocumenter` in Sphinx to handle it. The
result displays nicely in the API doc (the same as the property approach, but
no defaults are added). The other thing that changes is the `__init__`
docstring is brought back (for some reason the current docs are not showing
that), which means that the defaults are essentially documented there for each
class. In a way this seems more "Pythonic" to me (i.e. Python users are
accustomed to seeing the default arg values in the constructor doc, e.g.
scikit-learn).
4. Another option is to do nothing (for now at least), except bring back
the `__init__` docstring. This keeps the ugly-looking `Param` doc, but at least
shows the default args for each class, and is the current behavior. We can do
something like (1) or (3) later (but maybe not (2) during Spark 2.x as it may
be too large a change).
5. A final option is to perhaps document defaults elsewhere (such as the
setter for the param which is usually implemented in the class or a model trait
in Scala).
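
As a rough illustration of option 3 above, here is a minimal sketch in plain
Python. The `Param` and `ExampleEstimator` classes are hypothetical stand-ins,
not the actual `pyspark.ml` classes, and the Sphinx `AttributeDocumenter`
override is not shown. The sketch only assumes that the built-in doc is
attached to each param instance as its docstring, and that defaults stay
visible through the `__init__` signature.

```python
import inspect


class Param:
    """Hypothetical stand-in for a param object (not the pyspark.ml class)."""

    def __init__(self, name, doc):
        self.name = name
        self.doc = doc
        # Attach the built-in doc as the instance docstring so that a
        # documentation tool reading __doc__ can pick it up.
        self.__doc__ = doc


class ExampleEstimator:
    """Hypothetical estimator with a single param."""

    maxIter = Param("maxIter", "max number of iterations (>= 0).")

    def __init__(self, maxIter=100):
        """The default value lives in the constructor signature."""
        self._maxIter = maxIter


# The param text is reachable from the attribute itself ...
param_doc = ExampleEstimator.maxIter.__doc__
# ... while the default is reachable from the constructor signature.
init_default = inspect.signature(ExampleEstimator.__init__).parameters["maxIter"].default
```

With this shape the param doc itself carries no default, which matches the
trade-off described for option 3: the default is only documented through the
constructor.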

Let's decide on an approach and make it consistent across the board.


[GitHub] spark pull request: [SPARK-15395][Core]Use getHostString to create...

2016-05-18 Thread zzcclp

Github user zzcclp commented on the pull request:

https://github.com/apache/spark/pull/13185#issuecomment-220218127
  
@zsxwing will this PR be merged into branch 1.6?



[GitHub] spark pull request: [SPARK-15395][Core]Use getHostString to create...

2016-05-18 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13185



[GitHub] spark pull request: [SPARK-15395][Core]Use getHostString to create...

2016-05-18 Thread zsxwing

Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/13185#issuecomment-220217816
  
Didn't merge to 1.6 due to the conflicts.



[GitHub] spark pull request: [SPARK-15395][Core]Use getHostString to create...

2016-05-18 Thread zsxwing

Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/13185#issuecomment-220217602
  
Thanks. Merging to master, 2.0 and 1.6.



[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...

Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/12719#issuecomment-220217482
  
Oh, amazing. According to the last Jenkins results, the seven test failures 
in `catalyst` are the only failures.
```
[info] *** 7 TESTS FAILED ***
[error] Failed: Total 1656, Failed 7, Errors 0, Passed 1649, Ignored 1
[error] Failed tests:
[error] org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite
[error] org.apache.spark.sql.catalyst.expressions.CastSuite
[error] (catalyst/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 222 s, completed May 18, 2016 8:11:07 PM
```
Anyway, I will handle them in another PR.



[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12719#issuecomment-220217398
  
**[Test build #58837 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58837/consoleFull)**
 for PR 12719 at commit 
[`0cb1136`](https://github.com/apache/spark/commit/0cb11361ff70d88ae09a4fd31154999fc9c3efae).



[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13186#issuecomment-220217381
  
**[Test build #58835 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58835/consoleFull)**
 for PR 13186 at commit 
[`23b43d4`](https://github.com/apache/spark/commit/23b43d4c837d762461dd56a62b85cb998919e0ef).



[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13167#issuecomment-220217395
  
**[Test build #58836 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58836/consoleFull)**
 for PR 13167 at commit 
[`a97e358`](https://github.com/apache/spark/commit/a97e3586b7b856d5a62981ff459f48da8d1128bb).



[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13156#discussion_r63817417
  
--- Diff: 
sql/hivecontext-compatibility/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
 ---
@@ -58,4 +58,16 @@ class HiveContext private[hive](
 sparkSession.sharedState.asInstanceOf[HiveSharedState]
   }
 
+  /**
+   * Invalidate and refresh all the cached the metadata of the given 
table. For performance reasons,
+   * Spark SQL or the external data source library it uses might cache 
certain metadata about a
+   * table, such as the location of blocks. When those change outside of 
Spark SQL, users should
+   * call this function to invalidate the cache.
+   *
+   * @since 1.3.0
+   */
+  def refreshTable(tableName: String): Unit = {
--- End diff --

If `invalidateTable` has a different meaning than `refreshTable`, should we 
also add it to `HiveContext`? cc @yhuai 



[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12719#issuecomment-220217295
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58834/
Test FAILed.



[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12719#issuecomment-220217294
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12719#issuecomment-220217222
  
**[Test build #58834 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58834/consoleFull)**
 for PR 12719 at commit 
[`d8257ee`](https://github.com/apache/spark/commit/d8257eef75433fe25aa4fd9c8c387933f23cfd20).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...

Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/12719#issuecomment-220217246
  
I removed the last test commit.



[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...

2016-05-18 Thread adrian-wang

GitHub user adrian-wang opened a pull request:

https://github.com/apache/spark/pull/13186

[SPARK-15397] [SQL] fix string udf locate as hive

## What changes were proposed in this pull request?

In Hive, `locate("aa", "aaa", 0)` yields 0, `locate("aa", "aaa", 1)` 
yields 1 and `locate("aa", "aaa", 2)` yields 2, while in Spark, 
`locate("aa", "aaa", 0)` yields 1, `locate("aa", "aaa", 1)` yields 2 
and `locate("aa", "aaa", 2)` yields 0. The difference comes from how the 
third parameter of the `locate` UDF is interpreted. It is the starting 
position and is 1-based, so when 0 is passed, the result is always 0.
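
The Hive-style behavior described above can be sketched in a few lines of
Python. This is only an illustration of the 1-based semantics, not the Spark
or Hive implementation; the function name mirrors the UDF for readability, and
returning 0 for negative start positions is an assumption.

```python
def locate(substr, s, pos=1):
    """Return the 1-based position of the first occurrence of substr in s
    at or after the 1-based position pos, or 0 if there is none."""
    if pos < 1:
        # A start position of 0 (or below, by assumption) is before the
        # first valid 1-based index, so nothing can be found.
        return 0
    # str.find is 0-based and returns -1 on a miss, so adding 1 maps a hit
    # to a 1-based position and a miss to 0.
    return s.find(substr, pos - 1) + 1


results = [locate("aa", "aaa", p) for p in (0, 1, 2)]
```

The three calls correspond to the three Hive results quoted in the
description.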


## How was this patch tested?

tested with modified `StringExpressionsSuite` and `StringFunctionsSuite`




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adrian-wang/spark locate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13186.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13186


commit 23b43d4c837d762461dd56a62b85cb998919e0ef
Author: Daoyuan Wang 
Date:   2016-05-18T11:30:07Z

fix string udf locate as hive





[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...

Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/12719#issuecomment-220216995
  
Thank you for understanding. I'll try to handle those test issues in 
another PR.



[GitHub] spark pull request: [SPARK-8603][SPARKR] Incorrect file separator ...

2016-05-18 Thread felixcheung

Github user felixcheung commented on the pull request:

https://github.com/apache/spark/pull/13165#issuecomment-220216386
  
Does this apply to other cases:

https://github.com/apache/spark/blob/d6dc12ef0146ae409834c78737c116050961f350/R/pkg/inst/worker/daemon.R#L22

https://github.com/apache/spark/blob/d6dc12ef0146ae409834c78737c116050961f350/R/pkg/inst/profile/shell.R#L20

https://github.com/apache/spark/blob/6ab4d9e0c76b69b4d6d5f39037a77bdfb042be19/examples/src/main/r/dataframe.R#L37

(last one is an example)




[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...

Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/12719#issuecomment-220216050
  
I'm fine with leaving the `resolved` check in this PR, because the test 
issue is somewhat unrelated. But it would be good to send another PR to 
fix the test issue: it doesn't make sense to test evaluation of an unresolved 
expression, as that will never happen in the real world.



[GitHub] spark pull request: [DOC][MINOR] ml.feature Scala and Python API s...

2016-05-18 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13159



[GitHub] spark pull request: [DOC][MINOR] ml.feature Scala and Python API s...

2016-05-18 Thread MLnick

Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/13159#issuecomment-220214631
  
LGTM, thanks @BryanCutler. Merged to master/branch-2.0



[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/13182#discussion_r63815948
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala
 ---
@@ -72,9 +72,18 @@ case class BroadcastExchangeExec(
 val beforeCollect = System.nanoTime()
 // Note that we use .executeCollect() because we don't want to 
convert data to Scala types
 val input: Array[InternalRow] = child.executeCollect()
+if (input.length >= (512 << 20)) {
+  throw new SparkException(
+s"Cannot broadcast the table with more than 512 millions rows: 
${input.length} rows")
--- End diff --

Yes, it's not, will update them.
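
For reference, a quick check of the threshold in the diff above. This is a
sketch of the arithmetic only, and it is a guess that the mismatch between the
check and the message is what this reply refers to: `512 << 20` is
512 * 2^20, which is not the same as 512 million.

```python
threshold = 512 << 20            # same expression as in the Scala diff
assert threshold == 512 * 2**20 == 2**29

print(threshold)                 # 536870912
print(threshold - 512_000_000)   # 24870912 rows above "512 million"
```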



[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...

Github user davies commented on the pull request:

https://github.com/apache/spark/pull/13167#issuecomment-220214210
  
LGTM



[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/13167#discussion_r63815763
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
@@ -70,7 +94,7 @@ case class DeserializeToObject(
  */
 case class SerializeFromObjectExec(
 serializer: Seq[NamedExpression],
-child: SparkPlan) extends UnaryExecNode with CodegenSupport {
+child: SparkPlan) extends UnaryExecNode with ObjectConsumerExec with 
CodegenSupport {
--- End diff --

minor: ObjectConsumerExec already extends UnaryExecNode



[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/13167#discussion_r63815785
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
@@ -166,10 +187,7 @@ case class MapElementsExec(
 func: AnyRef,
 outputObjAttr: Attribute,
 child: SparkPlan)
-  extends UnaryExecNode with ObjectOperator with CodegenSupport {
-
-  override def output: Seq[Attribute] = outputObjAttr :: Nil
-  override def producedAttributes: AttributeSet = 
AttributeSet(outputObjAttr)
+  extends UnaryExecNode with ObjectProducerExec with ObjectConsumerExec 
with CodegenSupport {
--- End diff --

same here



[GitHub] spark pull request: [SPARK-15381][SQL] physical object operator sh...

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/13167#discussion_r63815797
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala ---
@@ -141,15 +165,12 @@ case class MapPartitionsExec(
 func: Iterator[Any] => Iterator[Any],
 outputObjAttr: Attribute,
 child: SparkPlan)
-  extends UnaryExecNode with ObjectOperator {
-
-  override def output: Seq[Attribute] = outputObjAttr :: Nil
-  override def producedAttributes: AttributeSet = 
AttributeSet(outputObjAttr)
+  extends UnaryExecNode with ObjectProducerExec with ObjectConsumerExec {
--- End diff --

same here



[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...

2016-05-18 Thread wangyang1992

Github user wangyang1992 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13177#discussion_r63815687
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -480,7 +480,7 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 try {
   Option(hive.getFunction(db, name)).map(fromHiveFunction)
 } catch {
-  case CausedBy(ex: NoSuchObjectException) if 
ex.getMessage.contains(name) =>
+  case CausedBy(ex: Exception) if ex.getMessage.contains(s"$name does 
not exist") =>
--- End diff --

The objective here is not to catch all the exceptions but the ones caused 
by the function not existing. In my case, this exception is 
"org.apache.hadoop.hive.ql.metadata.HiveException: 
MetaException(message:NoSuchObjectException(message:Function default.func does 
not exist))" whose root cause is MetaException, but it may vary in different 
situations (not really sure it varies, just conjecture based on previous code. 
See pr #12198 and #12853).
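
The idea of matching on the message of the underlying cause, rather than on
the exception type, can be sketched as a Python analogue. The helper names
`root_cause` and `is_missing_function` are hypothetical, and the sketch
assumes the check is made against the innermost cause in the chain.

```python
def root_cause(exc):
    """Follow the chain of explicit causes to the innermost exception."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


def is_missing_function(exc, name):
    """True if the innermost cause's message says the function does not exist."""
    return f"{name} does not exist" in str(root_cause(exc))
```

A wrapper exception raised with `raise ... from inner` would match as long as
the innermost message contains the text, whatever the concrete exception
classes are; unrelated failures would not match.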



[GitHub] spark pull request: [SPARK-15360][Spark-Submit]Should print spark-...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13163#issuecomment-220213832
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-15360][Spark-Submit]Should print spark-...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13163#issuecomment-220213834
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58832/
Test FAILed.



[GitHub] spark pull request: [SPARK-15360][Spark-Submit]Should print spark-...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13163#issuecomment-220213672
  
**[Test build #58832 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58832/consoleFull)**
 for PR 13163 at commit 
[`b257891`](https://github.com/apache/spark/commit/b257891583865af83559ddefd46d70bf627f88dd).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...

Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/12719#issuecomment-220213423
  
Hmm, @cloud-fan, what about simply keeping the `resolved` check? IMHO, 
it just adds robustness. And, in fact, I'm reluctant to change a test suite 
when adding a new feature.



[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13156#issuecomment-220212872
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13156#issuecomment-220212875
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58831/



[GitHub] spark pull request: [SPARK-15367] [SQL] Add refreshTable back

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13156#issuecomment-220212724
  
**[Test build #58831 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58831/consoleFull)**
 for PR 13156 at commit 
[`2b773b8`](https://github.com/apache/spark/commit/2b773b823672199a685e765f5345ceb6584eb3d8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class HiveContextCompatibilitySuite extends SparkFunSuite with BeforeAndAfterEach`



[GitHub] spark pull request: [SPARK-15395][Core]Use getHostString to create...

2016-05-18 Thread vanzin

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/13185#issuecomment-220212769
  
LGTM.



[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...