[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-06-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-06-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78390/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-06-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #78390 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78390/testReport)**
 for PR 16677 at commit 
[`8f779ac`](https://github.com/apache/spark/commit/8f779ac74233eea7a1e69e5bde654b370e52f16f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-06-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #78390 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78390/testReport)**
 for PR 16677 at commit 
[`8f779ac`](https://github.com/apache/spark/commit/8f779ac74233eea7a1e69e5bde654b370e52f16f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-06-21 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
@jiangxb1987 Sure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-06-21 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/16677
  
@viirya Could you please bring this up to date?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-21 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
@watermen Are you sure `create table t1 using parquet select * from src 
limit 1000;` will invoke `CollectLimitExec.doExecute`?

Seems that the writing is delegated to `FileFormatWriter` which writes the 
rdd from the query to the table. I don't see it calls collect which invokes 
`CollectLimitExec` operation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77079/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #77079 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77079/testReport)**
 for PR 16677 at commit 
[`55ee6b0`](https://github.com/apache/spark/commit/55ee6b0fb3bc9e6998b4098a369c54a15824e414).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #77079 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77079/testReport)**
 for PR 16677 at commit 
[`55ee6b0`](https://github.com/apache/spark/commit/55ee6b0fb3bc9e6998b4098a369c54a15824e414).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-18 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77054/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #77054 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77054/testReport)**
 for PR 16677 at commit 
[`55ee6b0`](https://github.com/apache/spark/commit/55ee6b0fb3bc9e6998b4098a369c54a15824e414).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #77054 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77054/testReport)**
 for PR 16677 at commit 
[`55ee6b0`](https://github.com/apache/spark/commit/55ee6b0fb3bc9e6998b4098a369c54a15824e414).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76809/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #76809 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76809/testReport)**
 for PR 16677 at commit 
[`867a93d`](https://github.com/apache/spark/commit/867a93d01a7f83f4ea0e43a77bebee09727b96e1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #76809 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76809/testReport)**
 for PR 16677 at commit 
[`867a93d`](https://github.com/apache/spark/commit/867a93d01a7f83f4ea0e43a77bebee09727b96e1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-10 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/16677
  
@hvanhovell @cloud-fan We have seen value of this PR in our customer 
scenarios, and that's why we started a discussion in dev list before. And thank 
@viirya to discuss with us and implement it.   
It has been a while (> two months) since your last comments, do you have 
time to review this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-05-02 Thread watermen
Github user watermen commented on the issue:

https://github.com/apache/spark/pull/16677
  
When we create a DataSource table like below
```sql
create table t1 using parquet select * from src limit 1000;
```
It will call `CollectLimitExec.doExecute`, it also use SinglePartition, so 
we should cover this case.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75699/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #75699 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75699/testReport)**
 for PR 16677 at commit 
[`b8a2275`](https://github.com/apache/spark/commit/b8a22755bfdef8f1ab78016aea6914155ada67c1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #75699 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75699/testReport)**
 for PR 16677 at commit 
[`b8a2275`](https://github.com/apache/spark/commit/b8a22755bfdef8f1ab78016aea6914155ada67c1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-11 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75694/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #75694 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75694/testReport)**
 for PR 16677 at commit 
[`b8a2275`](https://github.com/apache/spark/commit/b8a22755bfdef8f1ab78016aea6914155ada67c1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75688/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #75688 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75688/testReport)**
 for PR 16677 at commit 
[`b8a2275`](https://github.com/apache/spark/commit/b8a22755bfdef8f1ab78016aea6914155ada67c1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-04-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #75688 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75688/testReport)**
 for PR 16677 at commit 
[`b8a2275`](https://github.com/apache/spark/commit/b8a22755bfdef8f1ab78016aea6914155ada67c1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
@hvanhovell Thanks for replying! Because this doesn't get feedback from 
committers for a long time, so I have the feeling. It is good to know you will 
take a look in the future. Take your time.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/16677
  
@viirya there is interest. I am just lacking the time, I will try to take a 
pass at some point in the next few weeks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
Although this has good improvement on `Limit` operation, looks like there 
is no much interests from the committers...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74086/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #74086 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74086/testReport)**
 for PR 16677 at commit 
[`2d37598`](https://github.com/apache/spark/commit/2d37598d20f8a2c536eab4bd6a7e4da26f8bdc61).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74082/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #74086 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74086/testReport)**
 for PR 16677 at commit 
[`2d37598`](https://github.com/apache/spark/commit/2d37598d20f8a2c536eab4bd6a7e4da26f8bdc61).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #74082 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74082/testReport)**
 for PR 16677 at commit 
[`1a56252`](https://github.com/apache/spark/commit/1a5625222febf99bb89732d2cedd773923d5f6c6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-03-07 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73662/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #73662 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73662/testReport)**
 for PR 16677 at commit 
[`1a56252`](https://github.com/apache/spark/commit/1a5625222febf99bb89732d2cedd773923d5f6c6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #73662 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73662/testReport)**
 for PR 16677 at commit 
[`1a56252`](https://github.com/apache/spark/commit/1a5625222febf99bb89732d2cedd773923d5f6c6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73655/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #73655 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73655/testReport)**
 for PR 16677 at commit 
[`45a1fcb`](https://github.com/apache/spark/commit/45a1fcb729cd551fbba1c633af713152fd63c24c).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #73655 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73655/testReport)**
 for PR 16677 at commit 
[`45a1fcb`](https://github.com/apache/spark/commit/45a1fcb729cd551fbba1c633af713152fd63c24c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-28 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73518/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #73518 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73518/testReport)**
 for PR 16677 at commit 
[`45a1fcb`](https://github.com/apache/spark/commit/45a1fcb729cd551fbba1c633af713152fd63c24c).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #73518 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73518/testReport)**
 for PR 16677 at commit 
[`45a1fcb`](https://github.com/apache/spark/commit/45a1fcb729cd551fbba1c633af713152fd63c24c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-22 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
ping @cloud-fan @rxin @hvanhovell Can you provide some feedback for this? 
Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-15 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
ping @cloud-fan Any suggestions on this? Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
@waterman But we need to know the number of output in each partition. 
`resultSize` is actually updated at driver with the size of data transmitted 
back to the driver. 

It is doable if we use a collection accumulator for collecting number of 
output. However, as we submit map stage with `submitMapStage`, it is more 
natural to get the `MapOutputStatistics` back from the future object returned.

For the API change, as they are private, I think it should not a big issue.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-08 Thread watermen
Github user watermen commented on the issue:

https://github.com/apache/spark/pull/16677
  
@viirya We'd better don't modify the API and in `TaskMetrics` already has 
`resultSize`, we can add `resultNum` like it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
@watermen Thanks for the review. What is the advantage of adding it in 
`TaskMetrics`  instead of `MapStatus`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-08 Thread watermen
Github user watermen commented on the issue:

https://github.com/apache/spark/pull/16677
  
@viirya LGTM, but can we add the statistics of the number of rows in 
`TaskMetrics` insteads of `MapStatus`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
@sujith71955 Thanks for the test! The test number looks promising!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-06 Thread sujith71955
Github user sujith71955 commented on the issue:

https://github.com/apache/spark/pull/16677
  
@viirya i tested with the above mentioned approach with sample data, it has 
improved the performance almost into 3X
Please find the test report 
Total No of Executers = 3
Total Memory assigned = 66 G
Total Number of cores = 15 core
Number of Partition = 200
Data size = 10745616 (>10 milliion)
Limit value = 1000 (10 million)

query executed:  create destination_table as select * from source_table  
limit 1000;

Time Taken with current implementation with single partition : 383 seconds
**With use map output statistics(spark open source proposed solution) :120 
sec**  which is great!!!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72373/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #72373 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72373/testReport)**
 for PR 16677 at commit 
[`b049cc4`](https://github.com/apache/spark/commit/b049cc4a4c83b308eb9aefa225435471b90a5f02).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #72373 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72373/testReport)**
 for PR 16677 at commit 
[`b049cc4`](https://github.com/apache/spark/commit/b049cc4a4c83b308eb9aefa225435471b90a5f02).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-02-04 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
@cloud-fan Compared with the sizes of each block, we only send back an Int 
for each partition. I think the overhead should be very low.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71977/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #71977 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71977/testReport)**
 for PR 16677 at commit 
[`4e31bb7`](https://github.com/apache/spark/commit/4e31bb7959cb774b51d6d8662f53a3ad96b4dc49).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class LocalPartitioning(orgPartition: Partitioning, 
numPartitions: Int) extends Partitioning `
  * `case class LocalLimitExec(limit: Int, child: SparkPlan) extends 
UnaryExecNode with CodegenSupport `
  * `case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
UnaryExecNode `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #71977 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71977/testReport)**
 for PR 16677 at commit 
[`4e31bb7`](https://github.com/apache/spark/commit/4e31bb7959cb774b51d6d8662f53a3ad96b4dc49).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-25 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71972/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71973/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #71973 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71973/testReport)**
 for PR 16677 at commit 
[`4e31bb7`](https://github.com/apache/spark/commit/4e31bb7959cb774b51d6d8662f53a3ad96b4dc49).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #71972 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71972/testReport)**
 for PR 16677 at commit 
[`def10e6`](https://github.com/apache/spark/commit/def10e696bed279a240b4454154ce4aea713cf47).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71955/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16677
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #71955 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71955/testReport)**
 for PR 16677 at commit 
[`7f89c30`](https://github.com/apache/spark/commit/7f89c305f8ddd595fd752f7a8c238d23ec796895).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class FakePartitioning(orgPartition: Partitioning, numPartitions: 
Int) extends Partitioning `
  * `case class LocalLimitExec(limit: Int, child: SparkPlan) extends 
UnaryExecNode with CodegenSupport `
  * `case class GlobalLimitExec(limit: Int, child: SparkPlan) extends 
UnaryExecNode `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16677
  
**[Test build #71955 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71955/testReport)**
 for PR 16677 at commit 
[`7f89c30`](https://github.com/apache/spark/commit/7f89c305f8ddd595fd752f7a8c238d23ec796895).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...

2017-01-24 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
also cc @cloud-fan and @hvanhovell 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org