[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2017-03-13 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/spark/pull/13599
  
@holdenk Do you have time to review this? Thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2017-03-13 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/spark/pull/13599
  
I created a Google Doc describing how to use it:
https://docs.google.com/document/d/1KB9RYW8_bSeOzwVqZFc_zy_vXqqqctwrU5TROP_16Ds/edit?usp=sharing






[GitHub] spark pull request #17255: [SPARK-19918][SQL] Use TextFileFormat in implemen...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17255#discussion_r105835718
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala ---
@@ -40,18 +40,11 @@ private[sql] object JsonInferSchema {
      json: RDD[T],
      configOptions: JSONOptions,
      createParser: (JsonFactory, T) => JsonParser): StructType = {
-    require(configOptions.samplingRatio > 0,
-      s"samplingRatio (${configOptions.samplingRatio}) should be greater than 0")
     val shouldHandleCorruptRecord = configOptions.permissive
     val columnNameOfCorruptRecord = configOptions.columnNameOfCorruptRecord
-    val schemaData = if (configOptions.samplingRatio > 0.99) {
-      json
-    } else {
-      json.sample(withReplacement = false, configOptions.samplingRatio, 1)
-    }
--- End diff --

Strictly speaking, this may not be directly related to the JIRA. I am willing to 
revert this change, or please let me know if you have a better idea.





[GitHub] spark pull request #17255: [SPARK-19918][SQL] Use TextFileFormat in implemen...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17255#discussion_r105835549
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala ---
@@ -40,18 +40,11 @@ private[sql] object JsonInferSchema {
      json: RDD[T],
      configOptions: JSONOptions,
      createParser: (JsonFactory, T) => JsonParser): StructType = {
-    require(configOptions.samplingRatio > 0,
-      s"samplingRatio (${configOptions.samplingRatio}) should be greater than 0")
     val shouldHandleCorruptRecord = configOptions.permissive
     val columnNameOfCorruptRecord = configOptions.columnNameOfCorruptRecord
-    val schemaData = if (configOptions.samplingRatio > 0.99) {
-      json
-    } else {
-      json.sample(withReplacement = false, configOptions.samplingRatio, 1)
-    }
--- End diff --

Because `JsonInferSchema.infer` takes an `RDD[T]`, which is the actual source 
from which the JSON strings are parsed. In the whole-file case it is an 
`RDD[PortableDataStream]`, whereas in the normal case it is an `RDD[UTF8String]`.

The thing is, there seems to be an advantage to doing the sample operation on a 
`Dataset[String]` rather than on an `RDD`. So the sampling had to be applied to 
the `Dataset[String]` before converting it into an `RDD[UTF8String]`.

In a simple view:

- `TextInputJsonDataSource`:

  ```scala
  val json: Dataset[String] = ...
  val sampled: Dataset[String] = JsonUtils.sample(...)
  val rdd: RDD[UTF8String] = ...
  JsonInferSchema.infer(rdd)
  ```

- `WholeFileJsonDataSource`:

  ```scala
  val json: RDD[PortableDataStream] = ...
  val sampled: RDD[PortableDataStream] = JsonUtils.sample(...)
  JsonInferSchema.infer(sampled)
  ```

I could not find a good way to generalize `JsonInferSchema.infer` to take both 
`Dataset` and `RDD` as the source while keeping the logic here small and clean.

If the question is about why it uses `Dataset.sample` instead of `RDD.sample`, 
that was suggested in 
https://github.com/apache/spark/pull/17255#issuecomment-285960658.

To my knowledge, both use the same sampler, `BernoulliCellSampler`, since 
replacement is disabled, but the `Dataset` variant generates code for it. So I 
thought there might be some benefit.
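As a rough, non-authoritative illustration of that Bernoulli scheme (plain Python with illustrative names, not Spark's actual `BernoulliCellSampler` API): when sampling without replacement, each row is kept independently with probability `fraction`.

```python
import random

def bernoulli_sample(rows, fraction, seed):
    """Keep each row independently with probability `fraction`,
    preserving input order -- the Bernoulli scheme used when
    withReplacement is false."""
    rng = random.Random(seed)  # seeded, so the sample is reproducible
    return [row for row in rows if rng.random() < fraction]

rows = list(range(1000))
sampled = bernoulli_sample(rows, 0.1, seed=1)
# Sample size concentrates around fraction * len(rows), i.e. ~100 here.
```

The seed argument mirrors the fixed seed `1` passed to `json.sample(...)` in the removed code above.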







[GitHub] spark issue #17267: [SPARK-19926][PYSPARK] Make pyspark exception more reada...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17267
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17267: [SPARK-19926][PYSPARK] Make pyspark exception more reada...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17267
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74491/
Test PASSed.





[GitHub] spark issue #17267: [SPARK-19926][PYSPARK] Make pyspark exception more reada...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17267
  
**[Test build #74491 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74491/testReport)** for PR 17267 at commit [`7b96e97`](https://github.com/apache/spark/commit/7b96e97b60b67cab49f3108ad84759ccb0f643e0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17232: [SPARK-18112] [SQL] Support reading data from Hive 2.1 m...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17232
  
**[Test build #74494 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74494/testReport)** for PR 17232 at commit [`850943b`](https://github.com/apache/spark/commit/850943bc8dd558a695e9c1ea6be594aee535c7f2).





[GitHub] spark issue #17178: [SPARK-19828][R] Support array type in from_json in R

2017-03-13 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17178
  
It seems like `object` is the right term in JSON lingo - how about 
`as.json.object`? Otherwise `as.json.array` (the opposite) or 
`as.json.object.array` might be a good option.





[GitHub] spark issue #17130: [SPARK-19791] [ML] Add doc and example for fpgrowth

2017-03-13 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/17130
  
Thanks for the review. I'll wait for 
https://github.com/apache/spark/pull/17283 to be merged first.





[GitHub] spark pull request #17260: [SPARK-19921] [SQL] [TEST] Enable end-to-end test...

2017-03-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17260





[GitHub] spark pull request #17272: [SPARK-19724][SQL]create a managed table with an ...

2017-03-13 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17272#discussion_r105834140
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -1901,7 +1901,7 @@ def test_list_tables(self):
 self.assertEquals(len(tables), 2)
 self.assertEquals(len(tablesSomeDb), 2)
 self.assertEquals(tables[0], Table(
-name="tab1",
+name="t1",
--- End diff --

The Python test failed with:
```
the location('file:/home/jenkins/workspace/SparkPullRequestBuilder/spark-warehouse/tab1') of table('`default`.`tab1`') already exists.;
```
The location of `tab1` was not deleted. I tried to find out which test case 
forgot to delete it (I searched all test cases containing `tab1` and ran them), 
but they all delete the location of `tab1` correctly, so here we change the 
table name to work around it.
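As a defensive pattern against this kind of leak (a plain-Python sketch with a hypothetical helper name, not Spark's actual test harness): pair every created table location with a `finally` cleanup, so a later `CREATE TABLE` with the same name never finds a stale directory.

```python
import os
import shutil
import tempfile

def run_with_table_location(table_name, body):
    """Create a scratch warehouse directory for `table_name`, run the
    test body against it, and always remove it afterwards."""
    warehouse = tempfile.mkdtemp()
    location = os.path.join(warehouse, table_name)
    os.makedirs(location)
    try:
        return body(location)
    finally:
        # Runs even if `body` raises, so no later test hits
        # "location ... already exists".
        shutil.rmtree(warehouse, ignore_errors=True)

# The location exists while the body runs...
first = run_with_table_location("tab1", os.path.isdir)
# ...and a second use of the same table name starts clean.
second = run_with_table_location("tab1", os.path.isdir)
```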






[GitHub] spark issue #17260: [SPARK-19921] [SQL] [TEST] Enable end-to-end testing usi...

2017-03-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17260
  
thanks, merging to master!





[GitHub] spark issue #17272: [SPARK-19724][SQL]create a managed table with an existed...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17272
  
**[Test build #74493 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74493/testReport)** for PR 17272 at commit [`2ac70b4`](https://github.com/apache/spark/commit/2ac70b4674e13a73ddec1b6c54f59b62fa67100a).





[GitHub] spark issue #17285: [SPARK-19944][SQL] Move SQLConf from sql/core to sql/cat...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17285
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17285: [SPARK-19944][SQL] Move SQLConf from sql/core to sql/cat...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17285
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74485/
Test PASSed.





[GitHub] spark issue #15009: [SPARK-17443][SPARK-11035] Stop Spark Application if lau...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15009
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74492/
Test FAILed.





[GitHub] spark issue #15009: [SPARK-17443][SPARK-11035] Stop Spark Application if lau...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15009
  
**[Test build #74492 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74492/testReport)** for PR 15009 at commit [`bc99435`](https://github.com/apache/spark/commit/bc994356987a2e6e321a0f8f23ffad0797de22d5).
 * This patch **fails Scala style tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark issue #15009: [SPARK-17443][SPARK-11035] Stop Spark Application if lau...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15009
  
Build finished. Test FAILed.





[GitHub] spark issue #17285: [SPARK-19944][SQL] Move SQLConf from sql/core to sql/cat...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17285
  
**[Test build #74485 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74485/testReport)** for PR 17285 at commit [`c199469`](https://github.com/apache/spark/commit/c1994696172192f0808cd210ed7f453ec2e7ef7d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class SQLConf extends Serializable with Logging `





[GitHub] spark issue #17285: [SPARK-19944][SQL] Move SQLConf from sql/core to sql/cat...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17285
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17285: [SPARK-19944][SQL] Move SQLConf from sql/core to sql/cat...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17285
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74484/
Test PASSed.





[GitHub] spark issue #17285: [SPARK-19944][SQL] Move SQLConf from sql/core to sql/cat...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17285
  
**[Test build #74484 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74484/testReport)** for PR 17285 at commit [`bbf0211`](https://github.com/apache/spark/commit/bbf02110a9232a545370c05dcdac7840f5b96af7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15009: [SPARK-17443][SPARK-11035] Stop Spark Application if lau...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15009
  
**[Test build #74492 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74492/testReport)** for PR 15009 at commit [`bc99435`](https://github.com/apache/spark/commit/bc994356987a2e6e321a0f8f23ffad0797de22d5).





[GitHub] spark issue #17267: [SPARK-19926][PYSPARK] Make pyspark exception more reada...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17267
  
**[Test build #74491 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74491/testReport)** for PR 17267 at commit [`7b96e97`](https://github.com/apache/spark/commit/7b96e97b60b67cab49f3108ad84759ccb0f643e0).





[GitHub] spark issue #17267: [SPARK-19926][PYSPARK] Make pyspark exception more reada...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17267
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74490/
Test FAILed.





[GitHub] spark issue #17267: [SPARK-19926][PYSPARK] Make pyspark exception more reada...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17267
  
**[Test build #74490 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74490/testReport)** for PR 17267 at commit [`6c55e02`](https://github.com/apache/spark/commit/6c55e022660e56feff882c8feaa7710f0b0aee69).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17267: [SPARK-19926][PYSPARK] Make pyspark exception more reada...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17267
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17267: [SPARK-19926][PYSPARK] Make pyspark exception more reada...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17267
  
**[Test build #74490 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74490/testReport)** for PR 17267 at commit [`6c55e02`](https://github.com/apache/spark/commit/6c55e022660e56feff882c8feaa7710f0b0aee69).





[GitHub] spark issue #17241: [SPARK-19877][SQL] Restrict the nested level of a view

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17241
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74483/
Test PASSed.





[GitHub] spark issue #17241: [SPARK-19877][SQL] Restrict the nested level of a view

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17241
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17241: [SPARK-19877][SQL] Restrict the nested level of a view

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17241
  
**[Test build #74483 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74483/testReport)** for PR 17241 at commit [`5c91ab7`](https://github.com/apache/spark/commit/5c91ab7fb1dede638b246e1fb2d7b7018e0b284f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13116: [SPARK-15324] [SQL] Add the takeSample function to the D...

2017-03-13 Thread burness
Github user burness commented on the issue:

https://github.com/apache/spark/pull/13116
  
@HyukjinKwon It is too hard to solve the OOM; I'm sorry.





[GitHub] spark issue #17250: [SPARK-19911][STREAMING] Add builder interface for Kines...

2017-03-13 Thread budde
Github user budde commented on the issue:

https://github.com/apache/spark/pull/17250
  
@brkyvz I think if we're eliminating the constructor arguments then the 
second approach you've proposed might make more sense. I can't think of 
anything cleaner.





[GitHub] spark pull request #16954: [SPARK-18874][SQL] First phase: Deferring the cor...

2017-03-13 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/16954#discussion_r105831346
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
 ---
@@ -123,19 +123,36 @@ case class Not(child: Expression)
  */
 @ExpressionDescription(
   usage = "expr1 _FUNC_(expr2, expr3, ...) - Returns true if `expr` equals to any valN.")
-case class In(value: Expression, list: Seq[Expression]) extends Predicate
-  with ImplicitCastInputTypes {
+case class In(value: Expression, list: Seq[Expression]) extends Predicate {
 
   require(list != null, "list should not be null")
-
-  override def inputTypes: Seq[AbstractDataType] = value.dataType +: list.map(_.dataType)
-
   override def checkInputDataTypes(): TypeCheckResult = {
-    if (list.exists(l => l.dataType != value.dataType)) {
-      TypeCheckResult.TypeCheckFailure(
-        "Arguments must be same type")
-    } else {
-      TypeCheckResult.TypeCheckSuccess
+    list match {
+      case ListQuery(sub, _, _) :: Nil =>
+        val valExprs = value match {
+          case cns: CreateNamedStruct => cns.valExprs
+          case expr => Seq(expr)
+        }
+        val isTypeMismatched = valExprs.zip(sub.output).exists {
+          case (l, r) => l.dataType != r.dataType
+        }
+        if (isTypeMismatched) {
--- End diff --

@hvanhovell The new error message looks like the following. Does this look okay 
to you?

```
Error in query: cannot resolve '(named_struct('c1', at1.`c1`, 'c2', at1.`c2`) IN (listquery()))' due to data type mismatch:
The data type of one or more elements in the left hand side of an IN subquery
is not compatible with the data type of the output of the subquery
Mismatched columns:
[(at1.`c1`:decimal(10,0), at2.`c1`:timestamp), (at1.`c2`:timestamp, at2.`c2`:decimal(10,0))]
Left side:
[decimal(10,0), timestamp].
Right side:
[timestamp, decimal(10,0)].
```
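
The "Mismatched columns" list in that message comes from pairing each LHS value expression with the corresponding subquery output column and keeping the pairs whose types differ. A hedged Python sketch of that pairwise check follows; types are simplified to plain strings and the function name is hypothetical, so this is not Spark's actual implementation.

```python
def mismatched_columns(lhs, rhs):
    """lhs/rhs are equal-length lists of (name, data_type) tuples.

    Returns the (lhs, rhs) pairs whose data types disagree, in order.
    """
    return [(l, r) for l, r in zip(lhs, rhs) if l[1] != r[1]]

lhs = [("at1.c1", "decimal(10,0)"), ("at1.c2", "timestamp")]
rhs = [("at2.c1", "timestamp"), ("at2.c2", "decimal(10,0)")]
print(mismatched_columns(lhs, rhs))
```

With the example columns from the error message above, both pairs are reported as mismatched, matching the two entries shown in the error text.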





[GitHub] spark issue #17109: [SPARK-19740][MESOS]Add support in Spark to pass arbitra...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17109
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74488/
Test PASSed.





[GitHub] spark issue #17109: [SPARK-19740][MESOS]Add support in Spark to pass arbitra...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17109
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17109: [SPARK-19740][MESOS]Add support in Spark to pass arbitra...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17109
  
**[Test build #74488 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74488/testReport)**
 for PR 17109 at commit 
[`cbb784a`](https://github.com/apache/spark/commit/cbb784a1a278f2d0db5c5122d52c30dfd26fc3db).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class MesosSchedulerBackendUtilSuite extends SparkFunSuite `





[GitHub] spark issue #17277: [SPARK-19887][SQL] dynamic partition keys can be null or...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17277
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17277: [SPARK-19887][SQL] dynamic partition keys can be null or...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17277
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74482/
Test PASSed.





[GitHub] spark pull request #16373: [SPARK-18961][SQL] Support `SHOW TABLE EXTENDED ....

2017-03-13 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16373#discussion_r105831078
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -925,6 +925,26 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
     }
   }
 
+  test("show table extended ... partition") {
--- End diff --

Okay, I'll update that later.





[GitHub] spark issue #17277: [SPARK-19887][SQL] dynamic partition keys can be null or...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17277
  
**[Test build #74482 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74482/testReport)**
 for PR 17277 at commit 
[`8896507`](https://github.com/apache/spark/commit/889650770345d93d520007a39a2f140350c3b104).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16373: [SPARK-18961][SQL] Support `SHOW TABLE EXTENDED ....

2017-03-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16373#discussion_r105830656
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -925,6 +925,26 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
     }
   }
 
+  test("show table extended ... partition") {
--- End diff --

Then, you just need to improve the function `getNormalizedResult` in 
SQLQueryTestSuite to mask it. 
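
The masking idea can be sketched as a regex substitution that replaces absolute `file:` URIs in test output with a stable placeholder, so golden files do not depend on the machine they were generated on. The pattern, placeholder, and function name below are illustrative assumptions, not the actual `getNormalizedResult` code.

```python
import re

def normalize_location(line):
    # Mask absolute file: URIs so golden-file comparisons are
    # machine-independent.
    return re.sub(r"file:/\S+", "<location>", line)

print(normalize_location("Location: file:/home/jenkins/warehouse/t"))
```

Any environment-specific path then compares equal across CI machines and developer laptops.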





[GitHub] spark issue #17277: [SPARK-19887][SQL] dynamic partition keys can be null or...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17277
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74480/
Test PASSed.





[GitHub] spark issue #17277: [SPARK-19887][SQL] dynamic partition keys can be null or...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17277
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17277: [SPARK-19887][SQL] dynamic partition keys can be null or...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17277
  
**[Test build #74480 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74480/testReport)**
 for PR 17277 at commit 
[`a04e7e5`](https://github.com/apache/spark/commit/a04e7e5b22105188d076010bf9c6adffdcfa1f7e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16954: [SPARK-18874][SQL] First phase: Deferring the correlated...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16954
  
**[Test build #74489 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74489/testReport)**
 for PR 16954 at commit 
[`19cdbb0`](https://github.com/apache/spark/commit/19cdbb040ccf2e74e1271ca33e6842607c1e0760).





[GitHub] spark issue #17109: [SPARK-19740][MESOS]Add support in Spark to pass arbitra...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17109
  
**[Test build #74488 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74488/testReport)**
 for PR 17109 at commit 
[`cbb784a`](https://github.com/apache/spark/commit/cbb784a1a278f2d0db5c5122d52c30dfd26fc3db).





[GitHub] spark pull request #16954: [SPARK-18874][SQL] First phase: Deferring the cor...

2017-03-13 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/16954#discussion_r105830367
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -365,17 +368,73 @@ object TypeCoercion {
   }
 
   /**
-   * Convert the value and in list expressions to the common operator type
-   * by looking at all the argument types and finding the closest one that
-   * all the arguments can be cast to. When no common operator type is found
-   * the original expression will be returned and an Analysis Exception will
-   * be raised at type checking phase.
+   * Handles type coercion for both IN expression with subquery and IN
+   * expressions without subquery.
+   * 1. In the first case, find the common type by comparing the left hand side (LHS)
+   *    expression types against corresponding right hand side (RHS) expression derived
+   *    from the subquery expression's plan output. Inject appropriate casts in the
+   *    LHS and RHS side of IN expression.
+   *
+   * 2. In the second case, convert the value and in list expressions to the
+   *    common operator type by looking at all the argument types and finding
+   *    the closest one that all the arguments can be cast to. When no common
+   *    operator type is found the original expression will be returned and an
+   *    Analysis Exception will be raised at the type checking phase.
    */
   object InConversion extends Rule[LogicalPlan] {
     def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions {
       // Skip nodes who's children have not been resolved yet.
       case e if !e.childrenResolved => e
 
+      // Handle type casting required between value expression and subquery output
+      // in IN subquery.
+      case i @ In(a, Seq(ListQuery(sub, children, exprId))) if !i.resolved =>
+        // LHS is the value expression of IN subquery.
+        val lhs = a match {
+          // Multi columns in IN clause is represented as a CreateNamedStruct.
+          // flatten the named struct to get the list of expressions.
+          case cns: CreateNamedStruct => cns.valExprs
+          case expr => Seq(expr)
+        }
+
+        // RHS is the subquery output.
+        val rhs = sub.output
+        require(lhs.length == rhs.length)
+
+        val commonTypes = lhs.zip(rhs).flatMap { case (l, r) =>
+          findCommonTypeForBinaryComparison(l.dataType, r.dataType) match {
+            case d @ Some(_) => d
+            case _ => findTightestCommonType(l.dataType, r.dataType)
+          }
+        }
+
+        // The number of columns/expressions must match between LHS and RHS of an
+        // IN subquery expression.
+        if (commonTypes.length == lhs.length) {
+          val castedRhs = rhs.zip(commonTypes).map {
+            case (e, dt) if e.dataType != dt => Alias(Cast(e, dt), e.name)()
+            case (e, _) => e
+          }
+          val castedLhs = lhs.zip(commonTypes).map {
+            case (e, dt) if e.dataType != dt => Cast(e, dt)
+            case (e, _) => e
+          }
+
+          // Before constructing the In expression, wrap the multi values in LHS
+          // in a CreatedNamedStruct.
+          val newLhs = a match {
--- End diff --

@hvanhovell Thanks a lot. You are right, we don't care about the names. 
This looks much better.





[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

2017-03-13 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/13656
  
Closing this and adding the support to ml.fpm instead: 
https://github.com/apache/spark/pull/17280 





[GitHub] spark pull request #13656: [SPARK-15938]Adding "support" property to MLlib A...

2017-03-13 Thread hhbyyh
Github user hhbyyh closed the pull request at:

https://github.com/apache/spark/pull/13656





[GitHub] spark pull request #17255: [SPARK-19918][SQL] Use TextFileFormat in implemen...

2017-03-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17255#discussion_r105830235
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala
 ---
@@ -40,18 +40,11 @@ private[sql] object JsonInferSchema {
       json: RDD[T],
       configOptions: JSONOptions,
       createParser: (JsonFactory, T) => JsonParser): StructType = {
-    require(configOptions.samplingRatio > 0,
-      s"samplingRatio (${configOptions.samplingRatio}) should be greater than 0")
     val shouldHandleCorruptRecord = configOptions.permissive
     val columnNameOfCorruptRecord = configOptions.columnNameOfCorruptRecord
-    val schemaData = if (configOptions.samplingRatio > 0.99) {
-      json
-    } else {
-      json.sample(withReplacement = false, configOptions.samplingRatio, 1)
-    }
--- End diff --

why move the sample logic out?
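
For reference, the removed hunk implemented the sampling contract sketched below: ratios above 0.99 use the full dataset, lower ratios draw a seeded sample without replacement, and non-positive ratios are rejected. This is a pure-Python stand-in with a hypothetical function name, not the RDD API used in the diff.

```python
import random

def schema_sample(records, sampling_ratio, seed=1):
    if sampling_ratio <= 0:
        raise ValueError(
            f"samplingRatio ({sampling_ratio}) should be greater than 0")
    if sampling_ratio > 0.99:
        # Ratios close to 1 skip sampling entirely and use every record.
        return list(records)
    # Bernoulli sampling without replacement, seeded for reproducibility.
    rng = random.Random(seed)
    return [r for r in records if rng.random() < sampling_ratio]

data = list(range(1000))
print(len(schema_sample(data, 0.1)))
```

The fixed seed mirrors the `json.sample(withReplacement = false, ratio, 1)` call in the removed code, keeping schema inference deterministic across runs.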





[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-13 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16867
  
@squito 
Thanks a lot for the comments. I've refined it. :):)





[GitHub] spark issue #16373: [SPARK-18961][SQL] Support `SHOW TABLE EXTENDED ... PART...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16373
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74481/
Test PASSed.





[GitHub] spark issue #16373: [SPARK-18961][SQL] Support `SHOW TABLE EXTENDED ... PART...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16373
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16373: [SPARK-18961][SQL] Support `SHOW TABLE EXTENDED ... PART...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16373
  
**[Test build #74481 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74481/testReport)**
 for PR 16373 at commit 
[`b46d771`](https://github.com/apache/spark/commit/b46d7717aa823f839d4790b097fd841440d70660).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17279: Added dayofweek function to Functions.scala

2017-03-13 Thread RishikeshTeke
Github user RishikeshTeke closed the pull request at:

https://github.com/apache/spark/pull/17279





[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15628
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74479/
Test FAILed.





[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15628
  
Build finished. Test FAILed.





[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15628
  
**[Test build #74479 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74479/consoleFull)**
 for PR 15628 at commit 
[`254b9fb`](https://github.com/apache/spark/commit/254b9fb07a35d6927fefe1a4abe6f8a24ae81d4a).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark issue #17267: [SPARK-19926][PYSPARK] Make pyspark exception more reada...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17267
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74487/
Test FAILed.





[GitHub] spark issue #17267: [SPARK-19926][PYSPARK] Make pyspark exception more reada...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17267
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17267: [SPARK-19926][PYSPARK] Make pyspark exception more reada...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17267
  
**[Test build #74487 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74487/testReport)**
 for PR 17267 at commit 
[`5bc1d8e`](https://github.com/apache/spark/commit/5bc1d8e75b3690b911cf88bcf2fba561bc63e354).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13116: [SPARK-15324] [SQL] Add the takeSample function to the D...

2017-03-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13116
  
Hi @burness, what's the state of this PR?





[GitHub] spark pull request #17267: [SPARK-19926][PYSPARK] Make pyspark exception mor...

2017-03-13 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17267#discussion_r105827541
  
--- Diff: python/pyspark/sql/utils.py ---
@@ -24,7 +24,7 @@ def __init__(self, desc, stackTrace):
         self.stackTrace = stackTrace
 
     def __str__(self):
-        return repr(self.desc)
+        return str(self.desc)
--- End diff --

based on latest commit:

```
>>> df.select("아")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../spark/python/pyspark/sql/dataframe.py", line 992, in select
jdf = self._jdf.select(self._jcols(*cols))
  File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", 
line 1133, in __call__
  File ".../spark/python/pyspark/sql/utils.py", line 75, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException
: cannot resolve '`아`' given input columns: [age, name];;
'Project ['아]
+- Relation[age#0L,name#1] json
```


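
The readability difference in the output above comes down to `repr` versus `str` on the description string: `repr` adds quoting and, on Python 2, escapes non-ASCII characters as `\uXXXX` sequences, while `str` returns the text verbatim. The class below is a simplified stand-in for the exception in `python/pyspark/sql/utils.py`, not the actual implementation.

```python
class CapturedException(Exception):
    def __init__(self, desc, stack_trace=""):
        self.desc = desc
        self.stackTrace = stack_trace

    def __str__(self):
        # Previously: return repr(self.desc), which wrapped the message in
        # quotes (and escaped non-ASCII characters on Python 2).
        return str(self.desc)

err = CapturedException("cannot resolve '`아`' given input columns: [age, name]")
print(str(err))         # the message, verbatim
print(repr(err.desc))   # the quoted repr form
```

With `str`, the Korean column name prints as typed instead of as an escaped or quoted literal, which is exactly the improvement shown in the traceback above.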



[GitHub] spark issue #17267: [SPARK-19926][PYSPARK] Make pyspark exception more reada...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17267
  
**[Test build #74487 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74487/testReport)**
 for PR 17267 at commit 
[`5bc1d8e`](https://github.com/apache/spark/commit/5bc1d8e75b3690b911cf88bcf2fba561bc63e354).





[GitHub] spark pull request #16373: [SPARK-18961][SQL] Support `SHOW TABLE EXTENDED ....

2017-03-13 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16373#discussion_r105827253
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -925,6 +925,26 @@ class DDLSuite extends QueryTest with SharedSQLContext 
with BeforeAndAfterEach {
 }
   }
 
+  test("show table extended ... partition") {
--- End diff --

Yes, it works, but it outputs the absolute path for `Location`, so the test 
suite would fail in another environment.





[GitHub] spark pull request #16373: [SPARK-18961][SQL] Support `SHOW TABLE EXTENDED ....

2017-03-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16373#discussion_r105826784
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala 
---
@@ -925,6 +925,26 @@ class DDLSuite extends QueryTest with SharedSQLContext 
with BeforeAndAfterEach {
 }
   }
 
+  test("show table extended ... partition") {
--- End diff --

If we change `hiveResultString` to
```
case command @ ExecutedCommandExec(s: ShowTablesCommand) if 
!s.isExtended =>
  command.executeCollect().map(_.getString(1))
```

I did a try. It works. Below is the output.


```

-- !query 22
SHOW TABLE EXTENDED LIKE 'show_t1' PARTITION(c='Ch', d=1)
-- !query 22 schema

struct
-- !query 22 output
showdb  show_t1 false   CatalogPartition(
Partition Values: [c=Ch, d=1]
Storage(Location: 
file:/Users/xiao/IdeaProjects/sparkDelivery/sql/core/spark-warehouse/showdb.db/show_t1/c=Ch/d=1)
Partition Parameters:{})
```






[GitHub] spark issue #15628: [SPARK-17471][ML] Add compressed method to ML matrices

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15628
  
**[Test build #74486 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74486/testReport)**
 for PR 15628 at commit 
[`baa8c9d`](https://github.com/apache/spark/commit/baa8c9daff8e405575c1c733e4001a0c1ccb6796).





[GitHub] spark pull request #17175: [SPARK-19931][SQL] InMemoryTableScanExec should r...

2017-03-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17175#discussion_r105825600
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala
 ---
@@ -41,11 +41,31 @@ case class InMemoryTableScanExec(
 
   override def output: Seq[Attribute] = attributes
 
+  private def updateAttribute(expr: Expression, attrMap: 
AttributeMap[Attribute]): Expression =
--- End diff --

Then, when processing `outputOrdering`, we would create `attrMap` many times.





[GitHub] spark pull request #17175: [SPARK-19931][SQL] InMemoryTableScanExec should r...

2017-03-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17175#discussion_r105825397
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala
 ---
@@ -41,11 +41,31 @@ case class InMemoryTableScanExec(
 
   override def output: Seq[Attribute] = attributes
 
+  private def updateAttribute(expr: Expression, attrMap: 
AttributeMap[Attribute]): Expression =
--- End diff --

we can create the `attrMap` in this method
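The hoisting suggested here (and the repeated map construction pointed out in the reply) can be sketched generically; the names below are toy stand-ins, with a plain dict in place of Spark's `AttributeMap`:

```python
# Toy illustration of the review point: building the lookup map inside the
# per-expression helper rebuilds it on every call, while building it once
# up front and passing it in does the work only once.

build_count = 0

def make_attr_map(attributes):
    global build_count
    build_count += 1
    return {name: name.upper() for name in attributes}

def update_attribute_rebuilding(expr, attributes):
    # Map built inside the helper: rebuilt on every call.
    attr_map = make_attr_map(attributes)
    return attr_map.get(expr, expr)

def update_attribute(expr, attr_map):
    # Map supplied by the caller: built once, reused.
    return attr_map.get(expr, expr)

attrs = ["a", "b", "c"]
for e in attrs:
    update_attribute_rebuilding(e, attrs)
assert build_count == 3          # one rebuild per expression

build_count = 0
attr_map = make_attr_map(attrs)  # built once by the caller
for e in attrs:
    update_attribute(e, attr_map)
assert build_count == 1
```

The trade-off being discussed is exactly where `make_attr_map` runs: inside the helper (simpler signature, repeated work when processing `output` and `outputOrdering`) or at the call site (one construction, reused everywhere).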





[GitHub] spark issue #17285: [SPARK-19944][SQL] Move SQLConf from sql/core to sql/cat...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17285
  
**[Test build #74485 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74485/testReport)**
 for PR 17285 at commit 
[`c199469`](https://github.com/apache/spark/commit/c1994696172192f0808cd210ed7f453ec2e7ef7d).





[GitHub] spark pull request #17265: [SPARK-19924] [SQL] Handle InvocationTargetExcept...

2017-03-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17265





[GitHub] spark issue #17265: [SPARK-19924] [SQL] Handle InvocationTargetException for...

2017-03-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17265
  
LGTM, merging to master!





[GitHub] spark issue #17285: [SPARK-19944][SQL] Move SQLConf from sql/core to sql/cat...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17285
  
**[Test build #74484 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74484/testReport)**
 for PR 17285 at commit 
[`bbf0211`](https://github.com/apache/spark/commit/bbf02110a9232a545370c05dcdac7840f5b96af7).





[GitHub] spark pull request #17285: [SPARK-19944][SQL] Move SQLConf from sql/core to ...

2017-03-13 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/17285

[SPARK-19944][SQL] Move SQLConf from sql/core to sql/catalyst

## What changes were proposed in this pull request?
This patch moves SQLConf from sql/core to sql/catalyst. To minimize the 
changes, the patch uses type aliases to keep CatalystConf (as a type alias 
for SQLConf) and SimpleCatalystConf (as a concrete class that extends SQLConf).

The motivation for the change is that it is awkward to have SQLConf only 
in sql/core and then have to duplicate config options that affect the 
optimizer/analyzer in sql/catalyst via CatalystConf.

## How was this patch tested?
N/A
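The alias trick the description relies on — move the class, keep the old names alive — can be sketched in Python for brevity; `SQLConf`, `CatalystConf`, and `SimpleCatalystConf` below are toy stand-ins, not the real Spark classes:

```python
class SQLConf:
    """Stand-in for the class after the move (hypothetical, not Spark's)."""

    def __init__(self, case_sensitive=True):
        self.case_sensitive = case_sensitive


# The old name survives as a type alias of the moved class, so existing
# call sites written against CatalystConf keep working unchanged.
CatalystConf = SQLConf


class SimpleCatalystConf(SQLConf):
    """Concrete subclass kept for existing tests and callers."""
    pass


conf = SimpleCatalystConf(case_sensitive=False)
assert isinstance(conf, CatalystConf)  # old-name code still works
assert CatalystConf is SQLConf         # alias, not a copy
```

In the actual Scala patch the same effect comes from `type CatalystConf = SQLConf` plus a concrete `SimpleCatalystConf extends SQLConf`, which keeps the sql/catalyst API surface stable while the class itself moves.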

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-19944

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17285.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17285


commit bbf02110a9232a545370c05dcdac7840f5b96af7
Author: Reynold Xin 
Date:   2017-03-14T04:01:17Z

[SPARK-19944][SQL] Move SQLConf from sql/core to sql/catalyst







[GitHub] spark issue #17241: [SPARK-19877][SQL] Restrict the nested level of a view

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17241
  
**[Test build #74483 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74483/testReport)**
 for PR 17241 at commit 
[`5c91ab7`](https://github.com/apache/spark/commit/5c91ab7fb1dede638b246e1fb2d7b7018e0b284f).





[GitHub] spark issue #17270: [SPARK-19929] [SQL] Showing Hive Managed table's LOATION...

2017-03-13 Thread ouyangxiaochen
Github user ouyangxiaochen commented on the issue:

https://github.com/apache/spark/pull/17270
  
cc @gatorsmile, is this reasonable? Thanks!





[GitHub] spark pull request #17179: [SPARK-19067][SS] Processing-time-based timeout i...

2017-03-13 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/17179#discussion_r105822080
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/KeyedState.scala ---
@@ -61,25 +65,50 @@ import 
org.apache.spark.sql.catalyst.plans.logical.LogicalKeyedState
  *  - After that, if `update(newState)` is called, then `exists()` will 
again return `true`,
  *`get()` and `getOption()`will return the updated value.
  *
+ * Important points to note about using `KeyedStateTimeout`.
+ *  - The timeout type is a global param across all the keys (set as 
`timeout` param in
+ *`[map|flatMap]GroupsWithState`), but the exact timeout duration is 
configurable per key
+ *(by calling `setTimeout...()` in `KeyedState`).
+ *  - When the timeout occurs for a key, the function is called with no 
values, and
+ *`KeyedState.isTimingOut()` set to true.
+ *  - The timeout is reset for a key every time the function is called on 
the key, that is,
+ *when the key has new data, or the key has timed out. So the user has 
to set the timeout
+ *duration every time the function is called, otherwise there will not 
be any timeout set.
+ *  - Guarantees provided on processing-time-based timeout of key, when 
timeout duration is D ms:
+ *- Timeout will never be called before real clock time has advanced 
by D ms
+ *- Timeout will be called eventually when there is a trigger in the 
query
+ *  (i.e. after D ms). So there is no strict upper bound on when the 
timeout would occur.
+ *  For example, the trigger interval of the query will affect when 
the timeout is actually hit.
+ *  If there is no data in the stream (for any key) for a while, then 
there will not be
+ *  any trigger and timeout will not be hit until there is data.
+ *
  * Scala example of using KeyedState in `mapGroupsWithState`:
  * {{{
  * // A mapping function that maintains an integer state for string keys 
and returns a string.
--- End diff --

Could you update this comment to describe the timeout behavior of the 
function?
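The per-key timeout contract quoted above can be simulated outside Spark; this toy driver (hypothetical Python, not the Scala `KeyedState` API) shows why the timeout must be re-set on every invocation to stay armed:

```python
# Toy simulation of the quoted contract:
# - the user function runs when a key has data OR when its timeout fires;
# - any previously set timeout is cleared before the call, so the function
#   must call set_timeout_duration() again or no timeout remains set.

class KeyedState:
    def __init__(self):
        self.value = None
        self.exists = False
        self.is_timing_out = False
        self.timeout_at = None                 # absolute deadline in ms

    def update(self, v):
        self.value, self.exists = v, True

    def remove(self):
        self.value, self.exists = None, False

    def set_timeout_duration(self, ms, now):
        self.timeout_at = now + ms

def run_trigger(states, batch, now, func):
    for key, values in batch.items():          # keys with new data
        state = states.setdefault(key, KeyedState())
        state.timeout_at = None                # timeout reset on every call
        func(key, values, state, now)
    for key, state in states.items():          # keys whose timeout fired
        if key not in batch and state.timeout_at is not None \
                and now >= state.timeout_at:
            state.timeout_at = None
            state.is_timing_out = True
            func(key, [], state, now)          # called with no values
            state.is_timing_out = False

def counter(key, values, state, now):
    if state.is_timing_out:
        state.remove()                         # expire the state
    else:
        state.update((state.value or 0) + len(values))
        state.set_timeout_duration(1000, now)  # must re-arm on every call

states = {}
run_trigger(states, {"k": [1, 2]}, now=0, func=counter)
assert states["k"].value == 2
run_trigger(states, {}, now=500, func=counter)   # before deadline: no call
assert states["k"].exists
run_trigger(states, {}, now=1500, func=counter)  # deadline passed: expire
assert not states["k"].exists
```

Note the caveat from the quoted doc also shows up here: the timeout only fires when `run_trigger` runs at all, mirroring "timeout will not be hit until there is a trigger."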





[GitHub] spark pull request #17179: [SPARK-19067][SS] Processing-time-based timeout i...

2017-03-13 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/17179#discussion_r105821698
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala ---
@@ -298,12 +368,14 @@ class KeyValueGroupedDataset[K, V] private[sql](
* @param outputMode The output mode of the function.
*
* See [[Encoder]] for more details on what types are encodable to Spark 
SQL.
-   * @since 2.1.1
+   * @since 2.2.0
*/
   @Experimental
   @InterfaceStability.Evolving
   def flatMapGroupsWithState[S: Encoder, U: Encoder](
-  func: (K, Iterator[V], KeyedState[S]) => Iterator[U], outputMode: 
OutputMode): Dataset[U] = {
+  func: (K, Iterator[V], KeyedState[S]) => Iterator[U],
--- End diff --

Another option here would be to put the function at the end, so that you could do 
this:

```scala
df.flatMapGroupsWithState(Append) { (key, iter, state: KeyedState[Int]) =>
   ...
}
```






[GitHub] spark pull request #17179: [SPARK-19067][SS] Processing-time-based timeout i...

2017-03-13 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/17179#discussion_r105823059
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala
 ---
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.streaming
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.catalyst.expressions.{Ascending, Attribute, 
AttributeReference, Expression, Literal, SortOrder, SpecificInternalRow, 
UnsafeProjection, UnsafeRow}
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalKeyedState, 
ProcessingTimeTimeout}
+import 
org.apache.spark.sql.catalyst.plans.physical.{ClusteredDistribution, 
Distribution, Partitioning}
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.streaming.state._
+import org.apache.spark.sql.streaming.{KeyedStateTimeout, OutputMode}
+import org.apache.spark.sql.types.{BooleanType, IntegerType}
+import org.apache.spark.util.CompletionIterator
+
+/**
+ * Physical operator for executing `FlatMapGroupsWithState.`
+ *
+ * @param func function called on each group
+ * @param keyDeserializer used to extract the key object for each group.
+ * @param valueDeserializer used to extract the items in the iterator from 
an input row.
+ * @param groupingAttributes used to group the data
+ * @param dataAttributes used to read the data
+ * @param outputObjAttr used to define the output object
+ * @param stateEncoder used to serialize/deserialize state before calling 
`func`
+ * @param outputMode the output mode of `func`
+ * @param timeout used to timeout groups that have not received data in a 
while
+ * @param batchTimestampMs processing timestamp of the current batch.
+ */
+case class FlatMapGroupsWithStateExec(
+func: (Any, Iterator[Any], LogicalKeyedState[Any]) => Iterator[Any],
+keyDeserializer: Expression,
+valueDeserializer: Expression,
+groupingAttributes: Seq[Attribute],
+dataAttributes: Seq[Attribute],
+outputObjAttr: Attribute,
+stateId: Option[OperatorStateId],
+stateEncoder: ExpressionEncoder[Any],
+outputMode: OutputMode,
+timeout: KeyedStateTimeout,
+batchTimestampMs: Long,
+child: SparkPlan) extends UnaryExecNode with ObjectProducerExec with 
StateStoreWriter {
+
+  private val isTimeoutEnabled = timeout == ProcessingTimeTimeout
+  private val timestampTimeoutAttribute =
+AttributeReference("timeoutTimestamp", dataType = IntegerType, 
nullable = false)()
+  private val stateExistsAttribute =
+AttributeReference("stateExists", dataType = BooleanType, nullable = 
false)()
+  private val stateAttributes: Seq[Attribute] = {
+val encoderSchemaAttributes = stateEncoder.schema.toAttributes
+if (isTimeoutEnabled) {
+  encoderSchemaAttributes :+ stateExistsAttribute :+ 
timestampTimeoutAttribute
+} else encoderSchemaAttributes
+  }
+
+  import KeyedStateImpl._
+  override def outputPartitioning: Partitioning = child.outputPartitioning
--- End diff --

This is not true, right?  They could be outputting whatever they want.





[GitHub] spark pull request #17179: [SPARK-19067][SS] Processing-time-based timeout i...

2017-03-13 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/17179#discussion_r105822317
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/KeyedState.scala ---
@@ -61,25 +65,50 @@ import 
org.apache.spark.sql.catalyst.plans.logical.LogicalKeyedState
  *  - After that, if `update(newState)` is called, then `exists()` will 
again return `true`,
  *`get()` and `getOption()`will return the updated value.
  *
+ * Important points to note about using `KeyedStateTimeout`.
+ *  - The timeout type is a global param across all the keys (set as 
`timeout` param in
+ *`[map|flatMap]GroupsWithState`), but the exact timeout duration is 
configurable per key
+ *(by calling `setTimeout...()` in `KeyedState`).
+ *  - When the timeout occurs for a key, the function is called with no 
values, and
+ *`KeyedState.isTimingOut()` set to true.
+ *  - The timeout is reset for a key every time the function is called on 
the key, that is,
+ *when the key has new data, or the key has timed out. So the user has 
to set the timeout
+ *duration every time the function is called, otherwise there will not 
be any timeout set.
+ *  - Guarantees provided on processing-time-based timeout of key, when 
timeout duration is D ms:
+ *- Timeout will never be called before real clock time has advanced 
by D ms
+ *- Timeout will be called eventually when there is a trigger in the 
query
+ *  (i.e. after D ms). So there is no strict upper bound on when the 
timeout would occur.
+ *  For example, the trigger interval of the query will affect when 
the timeout is actually hit.
+ *  If there is no data in the stream (for any key) for a while, then 
there will not be
+ *  any trigger and timeout will not be hit until there is data.
--- End diff --

How hard would it be to remove this limitation? It seems like it's very hard 
to build reliable monitoring applications on this API unless we fix this.





[GitHub] spark pull request #17179: [SPARK-19067][SS] Processing-time-based timeout i...

2017-03-13 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/17179#discussion_r105822109
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/KeyedState.scala ---
@@ -61,25 +65,50 @@ import 
org.apache.spark.sql.catalyst.plans.logical.LogicalKeyedState
  *  - After that, if `update(newState)` is called, then `exists()` will 
again return `true`,
  *`get()` and `getOption()`will return the updated value.
  *
+ * Important points to note about using `KeyedStateTimeout`.
+ *  - The timeout type is a global param across all the keys (set as 
`timeout` param in
+ *`[map|flatMap]GroupsWithState`), but the exact timeout duration is 
configurable per key
+ *(by calling `setTimeout...()` in `KeyedState`).
+ *  - When the timeout occurs for a key, the function is called with no 
values, and
+ *`KeyedState.isTimingOut()` set to true.
+ *  - The timeout is reset for a key every time the function is called on 
the key, that is,
+ *when the key has new data, or the key has timed out. So the user has 
to set the timeout
+ *duration every time the function is called, otherwise there will not 
be any timeout set.
+ *  - Guarantees provided on processing-time-based timeout of key, when 
timeout duration is D ms:
+ *- Timeout will never be called before real clock time has advanced 
by D ms
+ *- Timeout will be called eventually when there is a trigger in the 
query
+ *  (i.e. after D ms). So there is no strict upper bound on when the 
timeout would occur.
+ *  For example, the trigger interval of the query will affect when 
the timeout is actually hit.
+ *  If there is no data in the stream (for any key) for a while, then 
there will not be
+ *  any trigger and timeout will not be hit until there is data.
+ *
  * Scala example of using KeyedState in `mapGroupsWithState`:
  * {{{
  * // A mapping function that maintains an integer state for string keys 
and returns a string.
  * def mappingFunction(key: String, value: Iterator[Int], state: 
KeyedState[Int]): String = {
- *   // Check if state exists
- *   if (state.exists) {
- * val existingState = state.get  // Get the existing state
- * val shouldRemove = ... // Decide whether to remove the state
+ *
+ *   if (state.isTimingOut) {// If called when timing out, 
remove the state
+ * state.remove()
+ *
+ *   } else if (state.exists) {  // If state exists, use it 
for processing
+ * val existingState = state.get // Get the existing state
+ * val shouldRemove = ...// Decide whether to remove 
the state
  * if (shouldRemove) {
- *   state.remove() // Remove the state
+ *   state.remove()  // Remove the state
+ *
  * } else {
  *   val newState = ...
- *   state.update(newState)// Set the new state
+ *   state.update(newState)  // Set the new state
  * }
+ *
  *   } else {
  * val initialState = ...
- * state.update(initialState)  // Set the initial state
+ * state.update(initialState)// Set the initial state
  *   }
- *   ... // return something
+ *   state.setTimeoutDuration("1 hour")  // Set the timeout
--- End diff --

Does this set a timeout on a removed state?  What does that do?





[GitHub] spark pull request #17179: [SPARK-19067][SS] Processing-time-based timeout i...

2017-03-13 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/17179#discussion_r105821496
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala ---
@@ -249,6 +250,43 @@ class KeyValueGroupedDataset[K, V] private[sql](
 dataAttributes,
 OutputMode.Update,
 isMapGroupsWithState = true,
+KeyedStateTimeout.none,
+child = logicalPlan))
+  }
+
+  /**
+   * ::Experimental::
+   * (Scala-specific)
+   * Applies the given function to each group of data, while maintaining a 
user-defined per-group
+   * state. The result Dataset will represent the objects returned by the 
function.
+   * For a static batch Dataset, the function will be invoked once per 
group. For a streaming
+   * Dataset, the function will be invoked for each group repeatedly in 
every trigger, and
+   * updates to each group's state will be saved across invocations.
+   * See [[org.apache.spark.sql.streaming.KeyedState]] for more details.
+   *
+   * @tparam S The type of the user-defined state. Must be encodable to 
Spark SQL types.
+   * @tparam U The type of the output objects. Must be encodable to Spark 
SQL types.
+   * @param func Function to be called on every group.
+   * @param timeout Timeout information for groups that do not receive 
data for a while
+   *
+   * See [[Encoder]] for more details on what types are encodable to Spark 
SQL.
+   * @since 2.2.0
+   */
+  @Experimental
+  @InterfaceStability.Evolving
+  def mapGroupsWithState[S: Encoder, U: Encoder](
+  func: (K, Iterator[V], KeyedState[S]) => U,
+  timeout: KeyedStateTimeout): Dataset[U] = {
--- End diff --

`timeoutType`?





[GitHub] spark pull request #17240: [SPARK-19915][SQL] Improve join reorder: simplify...

2017-03-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17240#discussion_r105823209
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala
 ---
@@ -122,46 +119,48 @@ case class CostBasedJoinReorder(conf: CatalystConf) 
extends Rule[LogicalPlan] wi
  * level 3: p({A, B, C, D})
  * where p({A, B, C, D}) is the final output plan.
  *
- * For cost evaluation, since physical costs for operators are not 
available currently, we use
- * cardinalities and sizes to compute costs.
+ * To evaluate cost for a given plan, we calculate the sum of 
cardinalities for all intermediate
+ * joins in the plan.
  */
 object JoinReorderDP extends PredicateHelper {
 
   def search(
   conf: CatalystConf,
   items: Seq[LogicalPlan],
-  conditions: Set[Expression],
-  topOutput: AttributeSet): Option[LogicalPlan] = {
+  conditions: Set[Expression]): Option[LogicalPlan] = {
 
 // Level i maintains all found plans for i + 1 items.
 // Create the initial plans: each plan is a single item with zero cost.
-val itemIndex = items.zipWithIndex
+val itemIndex = items.zipWithIndex.map(_.swap).toMap
 val foundPlans = mutable.Buffer[JoinPlanMap](itemIndex.map {
-  case (item, id) => Set(id) -> JoinPlan(Set(id), item, Set(), Cost(0, 
0))
-}.toMap)
+  case (id, item) => Set(id) -> JoinPlan(Set(id), item, cost = 0)
+})
 
-for (lev <- 1 until items.length) {
+while (foundPlans.size < items.length && foundPlans.last.size > 1) {
   // Build plans for the next level.
-  foundPlans += searchLevel(foundPlans, conf, conditions, topOutput)
+  foundPlans += searchLevel(foundPlans, conf, conditions)
 }
 
-val plansLastLevel = foundPlans(items.length - 1)
-if (plansLastLevel.isEmpty) {
-  // Failed to find a plan, fall back to the original plan
-  None
-} else {
-  // There must be only one plan at the last level, which contains all 
items.
-  assert(plansLastLevel.size == 1 && plansLastLevel.head._1.size == 
items.length)
-  Some(plansLastLevel.head._2.plan)
+// Find the best plan
+assert(foundPlans.last.size <= 1)
+val bestJoinPlan = foundPlans.last.headOption
--- End diff --

And what if the last level has more than one entry? Shall we pick the 
best among them?





[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16867
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74476/
Test PASSed.





[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16867
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16867
  
**[Test build #74476 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74476/testReport)** for PR 16867 at commit [`5aa2fcf`](https://github.com/apache/spark/commit/5aa2fcf8c244e4503302053a98ef12c7d5c80878).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17240: [SPARK-19915][SQL] Improve join reorder: simplify...

2017-03-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17240#discussion_r105822819
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
@@ -122,46 +119,48 @@ case class CostBasedJoinReorder(conf: CatalystConf) extends Rule[LogicalPlan] wi
  * level 3: p({A, B, C, D})
  * where p({A, B, C, D}) is the final output plan.
  *
- * For cost evaluation, since physical costs for operators are not available currently, we use
- * cardinalities and sizes to compute costs.
+ * To evaluate cost for a given plan, we calculate the sum of cardinalities for all intermediate
+ * joins in the plan.
  */
 object JoinReorderDP extends PredicateHelper {
 
   def search(
   conf: CatalystConf,
   items: Seq[LogicalPlan],
-  conditions: Set[Expression],
-  topOutput: AttributeSet): Option[LogicalPlan] = {
+  conditions: Set[Expression]): Option[LogicalPlan] = {
 
 // Level i maintains all found plans for i + 1 items.
 // Create the initial plans: each plan is a single item with zero cost.
-val itemIndex = items.zipWithIndex
+val itemIndex = items.zipWithIndex.map(_.swap).toMap
 val foundPlans = mutable.Buffer[JoinPlanMap](itemIndex.map {
-  case (item, id) => Set(id) -> JoinPlan(Set(id), item, Set(), Cost(0, 0))
-}.toMap)
+  case (id, item) => Set(id) -> JoinPlan(Set(id), item, cost = 0)
+})
 
-for (lev <- 1 until items.length) {
+while (foundPlans.size < items.length && foundPlans.last.size > 1) {
   // Build plans for the next level.
-  foundPlans += searchLevel(foundPlans, conf, conditions, topOutput)
+  foundPlans += searchLevel(foundPlans, conf, conditions)
 }
 
-val plansLastLevel = foundPlans(items.length - 1)
-if (plansLastLevel.isEmpty) {
-  // Failed to find a plan, fall back to the original plan
-  None
-} else {
-  // There must be only one plan at the last level, which contains all items.
-  assert(plansLastLevel.size == 1 && plansLastLevel.head._1.size == items.length)
-  Some(plansLastLevel.head._2.plan)
+// Find the best plan
+assert(foundPlans.last.size <= 1)
+val bestJoinPlan = foundPlans.last.headOption
--- End diff --

what if the last level has no entries but the previous level has some?





[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16867
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74475/
Test PASSed.





[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16867
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16867
  
**[Test build #74475 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74475/testReport)** for PR 16867 at commit [`318a172`](https://github.com/apache/spark/commit/318a172130bd84c0f36494f839a87b86c6750f66).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17240: [SPARK-19915][SQL] Improve join reorder: simplify...

2017-03-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17240#discussion_r105822517
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
@@ -122,46 +119,48 @@ case class CostBasedJoinReorder(conf: CatalystConf) extends Rule[LogicalPlan] wi
  * level 3: p({A, B, C, D})
  * where p({A, B, C, D}) is the final output plan.
  *
- * For cost evaluation, since physical costs for operators are not available currently, we use
- * cardinalities and sizes to compute costs.
+ * To evaluate cost for a given plan, we calculate the sum of cardinalities for all intermediate
+ * joins in the plan.
  */
 object JoinReorderDP extends PredicateHelper {
 
   def search(
   conf: CatalystConf,
   items: Seq[LogicalPlan],
-  conditions: Set[Expression],
-  topOutput: AttributeSet): Option[LogicalPlan] = {
+  conditions: Set[Expression]): Option[LogicalPlan] = {
 
 // Level i maintains all found plans for i + 1 items.
 // Create the initial plans: each plan is a single item with zero cost.
-val itemIndex = items.zipWithIndex
+val itemIndex = items.zipWithIndex.map(_.swap).toMap
 val foundPlans = mutable.Buffer[JoinPlanMap](itemIndex.map {
-  case (item, id) => Set(id) -> JoinPlan(Set(id), item, Set(), Cost(0, 0))
-}.toMap)
+  case (id, item) => Set(id) -> JoinPlan(Set(id), item, cost = 0)
+})
 
-for (lev <- 1 until items.length) {
+while (foundPlans.size < items.length && foundPlans.last.size > 1) {
--- End diff --

add some comments to explain why we can stop once the last level has at most one entry.
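The level-wise search discussed in this thread can be sketched as a small, self-contained toy. This is not Spark's actual `JoinReorderDP`: the `Item` and `JoinPlan` classes, the restriction to left-deep trees, and the cross-product cardinality estimate below are made-up stand-ins, but the cost model (sum of the cardinalities of all joins built so far) follows the doc comment in the diff.

```scala
import scala.collection.mutable

object JoinReorderSketch {
  case class Item(name: String, card: Long)
  // A candidate plan: which items it covers, the order they were joined in,
  // its estimated output cardinality, and the accumulated cost (the sum of
  // cardinalities of all joins built so far).
  case class JoinPlan(itemIds: Set[Int], order: Seq[String], card: Long, cost: Long)

  def search(items: Seq[Item]): Option[JoinPlan] = {
    // Level 0: one single-item plan per relation, with zero cost.
    val level0: Map[Set[Int], JoinPlan] = items.zipWithIndex.map { case (it, id) =>
      Set(id) -> JoinPlan(Set(id), Seq(it.name), it.card, 0L)
    }.toMap
    var levels = Vector(level0)
    while (levels.size < items.length && levels.last.nonEmpty) {
      // Build level k by combining level k-1 plans with single items
      // (left-deep trees only, to keep the sketch short).
      val next = mutable.Map.empty[Set[Int], JoinPlan]
      for {
        partial <- levels.last.values
        single  <- level0.values
        if partial.itemIds.intersect(single.itemIds).isEmpty
      } {
        // Naive cross-product estimate; a real optimizer applies join selectivity.
        val joinedCard = partial.card * single.card
        val cand = JoinPlan(
          partial.itemIds ++ single.itemIds,
          partial.order ++ single.order,
          joinedCard,
          partial.cost + joinedCard)
        // Keep only the cheapest plan found for each covered item set.
        if (next.get(cand.itemIds).forall(_.cost > cand.cost)) next(cand.itemIds) = cand
      }
      levels :+= next.toMap
    }
    // Plans at the last level cover all items; pick the cheapest one.
    levels.last.values.toSeq.sortBy(_.cost).headOption
  }
}
```

With relations of cardinalities 100, 10, and 1000, the search joins the two smallest relations first, because that minimizes the accumulated intermediate cardinality; and like the `while` condition under review, the loop stops as soon as a level runs dry or all items are covered.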





[GitHub] spark issue #17233: [SPARK-11569][ML] Fix StringIndexer to handle null value...

2017-03-13 Thread crackcell
Github user crackcell commented on the issue:

https://github.com/apache/spark/pull/17233
  
@jkbradley Hi, I have made some updates according to your comments, please review it again. :-)





[GitHub] spark pull request #17240: [SPARK-19915][SQL] Improve join reorder: simplify...

2017-03-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17240#discussion_r105821744
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
@@ -87,8 +84,8 @@ case class CostBasedJoinReorder(conf: CatalystConf) extends Rule[LogicalPlan] wi
   val replacedLeft = replaceWithOrderedJoin(left)
   val replacedRight = replaceWithOrderedJoin(right)
   OrderedJoin(j.copy(left = replacedLeft, right = replacedRight))
-case p @ Project(_, join) =>
-  p.copy(child = replaceWithOrderedJoin(join))
+case p @ Project(projectList, j: Join) =>
--- End diff --

now the result of join reordering won't have a Project on top, right?





[GitHub] spark pull request #17278: [SPARK-19933][SQL] Do not change output of a subq...

2017-03-13 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17278#discussion_r105821394
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -140,7 +140,8 @@ abstract class Optimizer(sessionCatalog: SessionCatalog, conf: CatalystConf)
   object OptimizeSubqueries extends Rule[LogicalPlan] {
 def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
   case s: SubqueryExpression =>
-s.withNewPlan(Optimizer.this.execute(s.plan))
+val ReturnAnswer(newPlan) = Optimizer.this.execute(ReturnAnswer(s.plan))
--- End diff --

How about using a case class like `OptimizedSubquery` which extends `SubqueryExpression`? I think it's easier to understand from the name.





[GitHub] spark issue #17277: [SPARK-19887][SQL] dynamic partition keys can be null or...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17277
  
**[Test build #74482 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74482/testReport)** for PR 17277 at commit [`8896507`](https://github.com/apache/spark/commit/889650770345d93d520007a39a2f140350c3b104).





[GitHub] spark pull request #16373: [SPARK-18961][SQL] Support `SHOW TABLE EXTENDED ....

2017-03-13 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16373#discussion_r105820580
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -642,18 +644,34 @@ case class ShowTablesCommand(
 // instead of calling tables in sparkSession.
 val catalog = sparkSession.sessionState.catalog
 val db = databaseName.getOrElse(catalog.getCurrentDatabase)
-val tables =
-  tableIdentifierPattern.map(catalog.listTables(db, _)).getOrElse(catalog.listTables(db))
-tables.map { tableIdent =>
-  val database = tableIdent.database.getOrElse("")
-  val tableName = tableIdent.table
-  val isTemp = catalog.isTemporaryTable(tableIdent)
-  if (isExtended) {
-val information = catalog.getTempViewOrPermanentTableMetadata(tableIdent).toString
-Row(database, tableName, isTemp, s"${information}\n")
-  } else {
-Row(database, tableName, isTemp)
+if (partitionSpec.isEmpty) {
+  // Show the information of tables.
+  val tables =
+tableIdentifierPattern.map(catalog.listTables(db, _)).getOrElse(catalog.listTables(db))
+  tables.map { tableIdent =>
+val database = tableIdent.database.getOrElse("")
--- End diff --

Temporary views have an empty database.





[GitHub] spark issue #16373: [SPARK-18961][SQL] Support `SHOW TABLE EXTENDED ... PART...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16373
  
**[Test build #74481 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74481/testReport)** for PR 16373 at commit [`b46d771`](https://github.com/apache/spark/commit/b46d7717aa823f839d4790b097fd841440d70660).





[GitHub] spark issue #17277: [SPARK-19887][SQL] dynamic partition keys can be null or...

2017-03-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17277
  
**[Test build #74480 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74480/testReport)** for PR 17277 at commit [`a04e7e5`](https://github.com/apache/spark/commit/a04e7e5b22105188d076010bf9c6adffdcfa1f7e).




