[GitHub] spark issue #15229: [SPARK-17654] [SQL] Propagate bucketing information for ...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15229
  
**[Test build #65862 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65862/consoleFull)**
 for PR 15229 at commit 
[`8726cc6`](https://github.com/apache/spark/commit/8726cc6430cbeaf8c2eebd7cef40199a7c563218).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15229: [SPARK-17654] [SQL] Propagate bucketing informati...

2016-09-23 Thread tejasapatil
GitHub user tejasapatil opened a pull request:

https://github.com/apache/spark/pull/15229

[SPARK-17654] [SQL] Propagate bucketing information for Hive tables to / 
from Catalog

## What changes were proposed in this pull request?

Currently Spark does not respect bucketing for Hive tables. This PR 
includes the following changes:

- `HiveClientImpl` now extracts the table's bucketing information
- while writing table info to the metastore, `MetastoreRelation` now populates 
the bucketing information in the Hive `Table` object
- `HiveTableScanExec` now exposes `outputPartitioning` and `outputOrdering` 
as per the bucketing spec
- `InsertIntoHiveTable` now exposes `requiredChildDistribution` and 
`requiredChildOrdering` based on the target table's bucketing spec

TODOs (which will be done in linked PRs and not this one):

- [ ] `ClusteredDistribution` does not guarantee the number of partitions 
generated (which corresponds to the number of output bucket files created). 
Fixing this will require adding strict guarantees to `ClusteredDistribution`. 
I think it needs more thought and is better done incrementally rather than 
packed into this PR.
- [ ] While writing to bucketed files, Hive's hashing function should be 
used. I have a PR open to implement Hive hashing natively in Spark: 
https://github.com/apache/spark/pull/15047
- [ ] Allow creating Hive bucketed tables
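
To illustrate why both the partition count and the hashing function matter for the TODOs above, here is a minimal sketch of how a row could be assigned to a bucket file. It is not Spark or Hive code: `BucketSketch` and `bucketId` are hypothetical names, and the sketch assumes the Hive-style convention of masking the sign bit and taking the hash modulo the number of buckets, with an integer key hashing to its own value.

```scala
// Hypothetical helper, not part of Spark or Hive.
object BucketSketch {
  // Assumption: clear the sign bit so the result is non-negative,
  // then take the remainder by the number of buckets.
  def bucketId(hash: Int, numBuckets: Int): Int = {
    require(numBuckets > 0, "numBuckets must be positive")
    (hash & Int.MaxValue) % numBuckets
  }

  def main(args: Array[String]): Unit = {
    val numBuckets = 8
    // Assumption: an Int key hashes to itself.
    val keys = Seq(0, 7, 8, 15, -1)
    keys.foreach { k =>
      println(s"key=$k -> bucket ${bucketId(k, numBuckets)}")
    }
  }
}
```

If the writer used a different hash function, or produced a number of partitions other than `numBuckets`, the same key could land in a different file than a Hive reader would expect, which is what the first two TODO items are about.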

## How was this patch tested?

Tested with Hive tables created locally. Adding a new test case would 
require implementing bucketed table creation, which is not supported :( 
Suggestions welcome.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tejasapatil/spark SPARK-17654_hive_extract_bucketing

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15229.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15229


commit caef89a198dac2fee4afaad622e2ecc11f200836
Author: Tejas Patil 
Date:   2016-08-23T20:45:00Z

Support bucketing for Hive tables

commit ee79dd2ae1e174ab38fc5f6b10f5a9a2e2721533
Author: Tejas Patil 
Date:   2016-08-23T20:45:00Z

Support bucketing for Hive tables

commit 8726cc6430cbeaf8c2eebd7cef40199a7c563218
Author: Tejas Patil 
Date:   2016-09-24T03:22:07Z

Merge remote-tracking branch 'origin/SPARK-17654_hive_extract_bucketing' 
into SPARK-17654_hive_extract_bucketing_2

# Conflicts:
#   sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala







[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12601
  
Merged build finished. Test PASSed.





[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12601
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65858/
Test PASSed.





[GitHub] spark issue #15168: [SPARK-17612][SQL] Support `DESCRIBE table PARTITION` SQ...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15168
  
**[Test build #65861 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65861/consoleFull)**
 for PR 15168 at commit 
[`ba22975`](https://github.com/apache/spark/commit/ba22975232bd64263ef0b513f11887378e0de43f).





[GitHub] spark issue #15168: [SPARK-17612][SQL] Support `DESCRIBE table PARTITION` SQ...

2016-09-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15168
  
retest this please





[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

2016-09-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/12601
  
Mostly LGTM, except for three minor comments.

Thank you for your hard work, @JustinPihony !





[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

2016-09-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/12601#discussion_r80353253
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -420,62 +420,11 @@ final class DataFrameWriter[T] private[sql](ds: 
Dataset[T]) {
   def jdbc(url: String, table: String, connectionProperties: Properties): 
Unit = {
 assertNotPartitioned("jdbc")
 assertNotBucketed("jdbc")
-
-// to add required options like URL and dbtable
-val params = extraOptions.toMap ++ Map("url" -> url, "dbtable" -> 
table)
-val jdbcOptions = new JDBCOptions(params)
-val jdbcUrl = jdbcOptions.url
-val jdbcTable = jdbcOptions.table
-
-val props = new Properties()
-extraOptions.foreach { case (key, value) =>
-  props.put(key, value)
-}
 // connectionProperties should override settings in extraOptions
-props.putAll(connectionProperties)
-val conn = JdbcUtils.createConnectionFactory(jdbcUrl, props)()
-
-try {
-  var tableExists = JdbcUtils.tableExists(conn, jdbcUrl, jdbcTable)
-
-  if (mode == SaveMode.Ignore && tableExists) {
-return
-  }
-
-  if (mode == SaveMode.ErrorIfExists && tableExists) {
-sys.error(s"Table $jdbcTable already exists.")
-  }
-
-  if (mode == SaveMode.Overwrite && tableExists) {
-if (jdbcOptions.isTruncate &&
-JdbcUtils.isCascadingTruncateTable(jdbcUrl) == Some(false)) {
-  JdbcUtils.truncateTable(conn, jdbcTable)
-} else {
-  JdbcUtils.dropTable(conn, jdbcTable)
-  tableExists = false
-}
-  }
-
-  // Create the table if the table didn't exist.
-  if (!tableExists) {
-val schema = JdbcUtils.schemaString(df, jdbcUrl)
-// To allow certain options to append when create a new table, 
which can be
-// table_options or partition_options.
-// E.g., "CREATE TABLE t (name string) ENGINE=InnoDB DEFAULT 
CHARSET=utf8"
-val createtblOptions = jdbcOptions.createTableOptions
-val sql = s"CREATE TABLE $jdbcTable ($schema) $createtblOptions"
-val statement = conn.createStatement
-try {
-  statement.executeUpdate(sql)
-} finally {
-  statement.close()
-}
-  }
-} finally {
-  conn.close()
-}
-
-JdbcUtils.saveTable(df, jdbcUrl, jdbcTable, props)
+this.extraOptions = this.extraOptions ++ (connectionProperties.asScala)
+// explicit url and dbtable should override all
+this.extraOptions += ("url" -> url, "dbtable" -> table)
+format("jdbc").save
--- End diff --

The omission of parentheses on methods should only be used when the method 
has no side-effects. 

Thus, please change it to `save()`
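
A minimal sketch of the convention referred to above. The `Writer` class is hypothetical and only stands in for `DataFrameWriter`; it is not Spark code.

```scala
object ParensConvention {
  // Hypothetical stand-in for a writer with one side-effecting method.
  final class Writer {
    private var saved = 0

    // Side-effecting: declared and called with parentheses.
    def save(): Unit = { saved += 1 }

    // Pure accessor: declared and called without parentheses.
    def savedCount: Int = saved
  }

  def main(args: Array[String]): Unit = {
    val w = new Writer
    w.save()              // the parentheses signal that something happens here
    println(w.savedCount) // no parentheses: just reads a value
  }
}
```

Reading a call site, `w.save()` makes it clear that state changes, while `w.savedCount` reads like a field access.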





[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

2016-09-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/12601#discussion_r80353203
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala ---
@@ -208,4 +210,84 @@ class JDBCWriteSuite extends SharedSQLContext with 
BeforeAndAfter {
 assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", properties).count())
 assert(2 === spark.read.jdbc(url1, "TEST.PEOPLE1", 
properties).collect()(0).length)
   }
+
+  test("save works for format(\"jdbc\") if url and dbtable are set") {
+val df = sqlContext.createDataFrame(sparkContext.parallelize(arr2x2), 
schema2)
+
+df.write.format("jdbc")
+.options(Map("url" -> url, "dbtable" -> "TEST.SAVETEST"))
+.save
--- End diff --

Nit: `save` -> `save()`
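
For context on why setting `url` and `dbtable` as options is sufficient here: the `jdbc(...)` rewrite in this PR folds its arguments into the same option map, with the comment "explicit url and dbtable should override all". Below is a minimal sketch of that precedence using plain Scala maps. `OptionPrecedence.merge` is a hypothetical helper, and `connectionProperties` is modelled as a `Map` rather than `java.util.Properties` to keep the example self-contained.

```scala
object OptionPrecedence {
  // With `++`, keys from the right-hand map win, so precedence is:
  // explicit url/dbtable > connectionProperties > extraOptions.
  def merge(
      extraOptions: Map[String, String],
      connectionProperties: Map[String, String],
      url: String,
      table: String): Map[String, String] =
    extraOptions ++ connectionProperties ++ Map("url" -> url, "dbtable" -> table)

  def main(args: Array[String]): Unit = {
    val merged = merge(
      extraOptions = Map("url" -> "jdbc:stale", "user" -> "fromOptions"),
      connectionProperties = Map("user" -> "fromProperties"),
      url = "jdbc:h2:mem:testdb",
      table = "TEST.SAVETEST")
    merged.toSeq.sorted.foreach { case (k, v) => println(s"$k=$v") }
  }
}
```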





[GitHub] spark issue #15168: [SPARK-17612][SQL] Support `DESCRIBE table PARTITION` SQ...

2016-09-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15168
  
The failure seems to be unrelated to this PR. Retest this please.
```
[info] - Naive Bayes Multinomial *** FAILED *** (137 milliseconds)
[info]   Expected 0.7 and 0.6494565217391305 to be within 0.05 using 
absolute tolerance.
[info]   validateModelFit:
```
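
For reference, the assertion in the log misses by a small margin: the two values differ by roughly 0.0505, just over the 0.05 tolerance. A minimal sketch of such a check, assuming a strict `<` comparison (`ToleranceSketch` is a hypothetical helper, not the test suite's API):

```scala
object ToleranceSketch {
  // True when the two values differ by less than `eps` in absolute terms.
  def withinAbsTol(expected: Double, actual: Double, eps: Double): Boolean =
    math.abs(expected - actual) < eps

  def main(args: Array[String]): Unit = {
    val expected = 0.7
    val actual = 0.6494565217391305
    println(s"difference = ${math.abs(expected - actual)}")
    println(s"within 0.05? ${withinAbsTol(expected, actual, 0.05)}")
  }
}
```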





[GitHub] spark issue #15228: [SPARK-17654] [SQL] Propagate bucketing information for ...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15228
  
Build finished. Test FAILed.





[GitHub] spark issue #15228: [SPARK-17654] [SQL] Propagate bucketing information for ...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15228
  
**[Test build #65857 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65857/consoleFull)**
 for PR 15228 at commit 
[`caef89a`](https://github.com/apache/spark/commit/caef89a198dac2fee4afaad622e2ecc11f200836).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark issue #15228: [SPARK-17654] [SQL] Propagate bucketing information for ...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15228
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65857/
Test FAILed.





[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

2016-09-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/12601#discussion_r80353010
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1096,13 +1096,17 @@ the Data Sources API. The following options are 
supported:
 
 {% highlight sql %}
 
-CREATE TEMPORARY VIEW jdbcTable
+CREATE TEMPORARY TABLE jdbcTable
--- End diff --

Please change it back. `CREATE TEMPORARY TABLE` is deprecated. You will get 
a parser error:
```
CREATE TEMPORARY TABLE is not supported yet. Please use CREATE TEMPORARY 
VIEW as an alternative.(line 1, pos 0)
```





[GitHub] spark issue #15168: [SPARK-17612][SQL] Support `DESCRIBE table PARTITION` SQ...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15168
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15168: [SPARK-17612][SQL] Support `DESCRIBE table PARTITION` SQ...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15168
  
**[Test build #65859 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65859/consoleFull)**
 for PR 15168 at commit 
[`ba22975`](https://github.com/apache/spark/commit/ba22975232bd64263ef0b513f11887378e0de43f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15168: [SPARK-17612][SQL] Support `DESCRIBE table PARTITION` SQ...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15168
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65859/
Test FAILed.





[GitHub] spark issue #15217: [SPARK-17577][Core] Update SparkContext.addFile to make ...

2016-09-23 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/15217
  
Close this PR. Thanks!





[GitHub] spark pull request #15217: [SPARK-17577][Core] Update SparkContext.addFile t...

2016-09-23 Thread yanboliang
Github user yanboliang closed the pull request at:

https://github.com/apache/spark/pull/15217





[GitHub] spark pull request #15228: [SPARK-17654] [SQL] Propagate bucketing informati...

2016-09-23 Thread tejasapatil
Github user tejasapatil closed the pull request at:

https://github.com/apache/spark/pull/15228





[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

2016-09-23 Thread JustinPihony
Github user JustinPihony commented on a diff in the pull request:

https://github.com/apache/spark/pull/12601#discussion_r80352586
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
 ---
@@ -21,6 +21,7 @@
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;
+import java.util.Properties;
 // $example off:schema_merging$
 
--- End diff --

@HyukjinKwon Yes, that is what I was talking about...just fixed it back





[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12601
  
**[Test build #65860 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65860/consoleFull)**
 for PR 12601 at commit 
[`8fb86b4`](https://github.com/apache/spark/commit/8fb86b482929e321f4ec8865124b8661f1a29bbf).





[GitHub] spark issue #15168: [SPARK-17612][SQL] Support `DESCRIBE table PARTITION` SQ...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15168
  
**[Test build #65859 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65859/consoleFull)**
 for PR 15168 at commit 
[`ba22975`](https://github.com/apache/spark/commit/ba22975232bd64263ef0b513f11887378e0de43f).





[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

2016-09-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/12601
  
Thanks for mentioning me. It looks good to me in my personal view.





[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

2016-09-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12601#discussion_r80352317
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
 ---
@@ -21,6 +21,7 @@
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;
+import java.util.Properties;
 // $example off:schema_merging$
 
--- End diff --

Oh, maybe, my previous comment was not clear. I meant

```java
import java.util.List;
// $example off:schema_merging$
import java.util.Properties;
```

I haven't tried to build the doc against the current state, but I guess we 
won't need this import for Parquet's schema merging example.





[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12601
  
**[Test build #65858 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65858/consoleFull)**
 for PR 12601 at commit 
[`06c1cba`](https://github.com/apache/spark/commit/06c1cba1da5ab140d71c29f41afd608e863bfe1b).





[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

2016-09-23 Thread JustinPihony
Github user JustinPihony commented on the issue:

https://github.com/apache/spark/pull/12601
  
@gatorsmile I added the R and SQL documentation. I took the SQL portion 
from https://github.com/apache/spark/pull/6121/files





[GitHub] spark issue #15228: [SPARK-17654] [SQL] Propagate bucketing information for ...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15228
  
**[Test build #65857 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65857/consoleFull)**
 for PR 15228 at commit 
[`caef89a`](https://github.com/apache/spark/commit/caef89a198dac2fee4afaad622e2ecc11f200836).





[GitHub] spark issue #15071: [SPARK-17517][SQL]Improve generated Code for BroadcastHa...

2016-09-23 Thread yaooqinn
Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/15071
  
cc @davies 





[GitHub] spark pull request #15228: [SPARK-17654] [SQL] Propagate bucketing informati...

2016-09-23 Thread tejasapatil
GitHub user tejasapatil opened a pull request:

https://github.com/apache/spark/pull/15228

[SPARK-17654] [SQL] Propagate bucketing information for Hive tables to / 
from Catalog

## What changes were proposed in this pull request?

Currently Spark does not respect bucketing for Hive tables. This PR 
includes the following changes:

- `HiveClientImpl` now extracts the table's bucketing information
- while writing table info to the metastore, `MetastoreRelation` now populates 
the bucketing information in the Hive `Table` object
- `HiveTableScanExec` now exposes `outputPartitioning` and `outputOrdering` 
as per the bucketing spec
- `InsertIntoHiveTable` now exposes `requiredChildDistribution` and 
`requiredChildOrdering` based on the target table's bucketing spec

TODOs (which will be done in linked PRs and not this one):

- [ ] `ClusteredDistribution` does not guarantee the number of partitions 
generated (which corresponds to the number of output bucket files created). 
Fixing this will require adding strict guarantees to `ClusteredDistribution`. 
I think it needs more thought and is better done incrementally rather than 
packed into this PR.
- [ ] While writing to bucketed files, Hive's hashing function should be 
used. I have a PR open to implement Hive hashing natively in Spark: 
https://github.com/apache/spark/pull/15047
- [ ] Allow creating Hive bucketed tables

## How was this patch tested?

Tested with Hive tables created locally. Adding a new test case would 
require implementing bucketed table creation, which is not supported :( 
Suggestions welcome.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tejasapatil/spark SPARK-17654_hive_extract_bucketing

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15228.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15228









[GitHub] spark issue #15227: [SPARK-17655][SQL]Remove unused variables declarations a...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15227
  
Can one of the admins verify this patch?





[GitHub] spark pull request #15227: [SPARK-17655][SQL]Remove unused variables declara...

2016-09-23 Thread yaooqinn
GitHub user yaooqinn opened a pull request:

https://github.com/apache/spark/pull/15227

[SPARK-17655][SQL]Remove unused variables declarations and definitions in a 
WholeStageCodeGened stage

## What changes were proposed in this pull request?

A WholeStageCodeGened stage with multiple CodegenSupport Operators 
generates unused result rows and their associated buffer holders and row 
writers, which can be removed.


## How was this patch tested?

Existing unit tests.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yaooqinn/spark rm-unused-object

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15227.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15227


commit eabd4a55cbe8fd57c722396c95087a2b6c695587
Author: Kent Yao 
Date:   2016-09-24T01:58:42Z

remove redundant variables declarations and definitions







[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15218
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65856/
Test PASSed.





[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15218
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15218
  
**[Test build #65856 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65856/consoleFull)**
 for PR 15218 at commit 
[`f71f1c0`](https://github.com/apache/spark/commit/f71f1c0f245aa9534330c9b4913ce40a1cfa250e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

2016-09-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12601#discussion_r80350919
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
 ---
@@ -23,6 +23,8 @@
 import java.util.List;
 // $example off:schema_merging$
 
+import java.util.Properties;
+
--- End diff --

Is there any reason not to follow the guideline?





[GitHub] spark issue #15226: [SPARK-17649][CORE] Log how many Spark events got droppe...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15226
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65855/
Test PASSed.





[GitHub] spark issue #15226: [SPARK-17649][CORE] Log how many Spark events got droppe...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15226
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15226: [SPARK-17649][CORE] Log how many Spark events got droppe...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15226
  
**[Test build #65855 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65855/consoleFull)**
 for PR 15226 at commit 
[`0e014b0`](https://github.com/apache/spark/commit/0e014b02d03eeda8373cd8892662ed6ce9de664c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

2016-09-23 Thread JustinPihony
Github user JustinPihony commented on a diff in the pull request:

https://github.com/apache/spark/pull/12601#discussion_r80350755
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
 ---
@@ -23,6 +23,8 @@
 import java.util.List;
 // $example off:schema_merging$
 
+import java.util.Properties;
+
--- End diff --

Should this really be added to the example, though?





[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

2016-09-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12601#discussion_r80350458
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
 ---
@@ -23,6 +23,8 @@
 import java.util.List;
 // $example off:schema_merging$
 
+import java.util.Properties;
+
--- End diff --

I think we should put the `java.util` imports together, without an additional 
blank line.





[GitHub] spark issue #15224: [SPARK-17650] malformed url's throw exceptions before br...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65854/
Test PASSed.





[GitHub] spark issue #15224: [SPARK-17650] malformed url's throw exceptions before br...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15224
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15224: [SPARK-17650] malformed url's throw exceptions before br...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15224
  
**[Test build #65854 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65854/consoleFull)**
 for PR 15224 at commit 
[`49afc56`](https://github.com/apache/spark/commit/49afc5686d7ccf9a7864fc9b9c9eb5217a281086).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #15226: [SPARK-17649][CORE] Log how many Spark events got...

2016-09-23 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/15226#discussion_r80350179
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/AsynchronousListenerBus.scala ---
@@ -117,6 +124,24 @@ private[spark] abstract class 
AsynchronousListenerBus[L <: AnyRef, E](name: Stri
   eventLock.release()
 } else {
   onDropEvent(event)
+  droppedEventsCounter.incrementAndGet()
+}
+
+val droppedEvents = droppedEventsCounter.get
+if (droppedEvents > 0) {
+  // Don't log too frequently
+  if (System.currentTimeMillis() - lastReportTimestamp >= 60 * 1000) {
--- End diff --

Won't `nanoTime` be overkill? Even if there is only a single dropped event, this 
check gets executed on every `post()`, so `currentTimeMillis` (which is 
less costly) is preferable.
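
For readers following the diff, here is a minimal, self-contained sketch of the rate-limited reporting pattern under discussion. It is not the code from the PR: the class `DroppedEventReporter`, its methods, and the injected clock are assumptions made so the idea can be run and tested in isolation. The field names `droppedEventsCounter` and `lastReportTimestamp` are borrowed from the diff.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class DroppedEventReporter {
    private final AtomicLong droppedEventsCounter = new AtomicLong(0);
    private final LongSupplier clockMillis;  // injected clock (hypothetical, for testability)
    private final long minIntervalMillis;
    private long lastReportTimestamp;

    public DroppedEventReporter(LongSupplier clockMillis, long minIntervalMillis) {
        this.clockMillis = clockMillis;
        this.minIntervalMillis = minIntervalMillis;
        this.lastReportTimestamp = clockMillis.getAsLong();
    }

    // Called whenever an event could not be queued.
    public void onDrop() {
        droppedEventsCounter.incrementAndGet();
    }

    // Returns the number of drops reported, or 0 if nothing was logged.
    public synchronized long maybeReport() {
        long dropped = droppedEventsCounter.get();
        if (dropped == 0) {
            return 0;  // cheap path: the clock is not read when nothing was dropped
        }
        long now = clockMillis.getAsLong();
        if (now - lastReportTimestamp < minIntervalMillis) {
            return 0;  // don't log too frequently
        }
        lastReportTimestamp = now;
        droppedEventsCounter.addAndGet(-dropped);  // keep drops that arrived in the meantime
        System.out.println("Dropped " + dropped + " events since the last report");
        return dropped;
    }

    public static void main(String[] args) {
        long[] fakeNow = {0L};
        DroppedEventReporter reporter = new DroppedEventReporter(() -> fakeNow[0], 60_000L);
        reporter.onDrop();
        fakeNow[0] = 1_000L;
        reporter.maybeReport();  // too soon, nothing is logged
        fakeNow[0] = 61_000L;
        reporter.maybeReport();  // logs the one dropped event
    }
}
```

In this sketch the clock is only consulted once at least one drop is pending, which is the per-`post()` cost the comment above is weighing.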





[GitHub] spark issue #15213: [SPARK-17644] [CORE] Do not add failedStages when abortS...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15213
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15213: [SPARK-17644] [CORE] Do not add failedStages when abortS...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15213
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65853/
Test PASSed.





[GitHub] spark issue #15213: [SPARK-17644] [CORE] Do not add failedStages when abortS...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15213
  
**[Test build #65853 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65853/consoleFull)**
 for PR 15213 at commit 
[`1127ca1`](https://github.com/apache/spark/commit/1127ca1538e9a9ded9e91ead65af8c710e99003d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15220
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15220
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65851/
Test PASSed.





[GitHub] spark pull request #15226: [SPARK-17649][CORE] Log how many Spark events got...

2016-09-23 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15226#discussion_r80347195
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/AsynchronousListenerBus.scala ---
@@ -117,6 +124,24 @@ private[spark] abstract class 
AsynchronousListenerBus[L <: AnyRef, E](name: Stri
   eventLock.release()
 } else {
   onDropEvent(event)
+  droppedEventsCounter.incrementAndGet()
+}
+
+val droppedEvents = droppedEventsCounter.get
+if (droppedEvents > 0) {
+  // Don't log too frequently
+  if (System.currentTimeMillis() - lastReportTimestamp >= 60 * 1000) {
--- End diff --

use nanotime
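
As background for this suggestion, a small sketch of the usual `System.nanoTime` idiom for measuring elapsed time. `nanoTime` is not tied to the wall clock, so it is unaffected by clock adjustments, and elapsed time should be computed by subtraction. The class `ElapsedSketch` and the helper `intervalElapsed` are hypothetical names for illustration, not part of the PR.

```java
import java.util.concurrent.TimeUnit;

public class ElapsedSketch {
    static final long REPORT_INTERVAL_NANOS = TimeUnit.SECONDS.toNanos(60);

    // Compare via subtraction (now - start) rather than comparing the raw values:
    // the difference stays correct even if the nanoTime counter wraps around.
    public static boolean intervalElapsed(long startNanos, long nowNanos, long intervalNanos) {
        return nowNanos - startNanos >= intervalNanos;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // Far less than 60 seconds pass between the two readings.
        System.out.println(intervalElapsed(start, System.nanoTime(), REPORT_INTERVAL_NANOS));
    }
}
```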





[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15220
  
**[Test build #65851 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65851/consoleFull)**
 for PR 15220 at commit 
[`77d7ba0`](https://github.com/apache/spark/commit/77d7ba0ad3f2382c52a15a24cabcb02c3c0009f1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15226: [SPARK-17649][CORE] Log how many Spark events got droppe...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15226
  
**[Test build #65855 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65855/consoleFull)**
 for PR 15226 at commit 
[`0e014b0`](https://github.com/apache/spark/commit/0e014b02d03eeda8373cd8892662ed6ce9de664c).





[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15218
  
**[Test build #65856 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65856/consoleFull)**
 for PR 15218 at commit 
[`f71f1c0`](https://github.com/apache/spark/commit/f71f1c0f245aa9534330c9b4913ce40a1cfa250e).





[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...

2016-09-23 Thread scwf
Github user scwf commented on the issue:

https://github.com/apache/spark/pull/15213
  
> actual problem is not in abortStage but rather in improper additions to 
failedStages

Correct, I think a more accurate description for this issue is "do not add 
`failedStages` when abortStage for fetch failure"





[GitHub] spark pull request #15226: [SPARK-17649][CORE] Log how many Spark events got...

2016-09-23 Thread zsxwing
GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/15226

[SPARK-17649][CORE] Log how many Spark events got dropped in 
AsynchronousListenerBus

## What changes were proposed in this pull request?

Backport #15220 to 1.6.

## How was this patch tested?

Jenkins

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark SPARK-17649-branch-1.6

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15226.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15226


commit 0e014b02d03eeda8373cd8892662ed6ce9de664c
Author: Shixiong Zhu 
Date:   2016-09-23T23:57:28Z

[SPARK-17649][CORE] Log how many Spark events got dropped in LiveListenerBus

Log how many Spark events got dropped in LiveListenerBus so that the user 
can get insights on how to set a correct event queue size.

Jenkins

Author: Shixiong Zhu 

Closes #15220 from zsxwing/SPARK-17649.
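
As a rough illustration of the idea in the commit message above, the sketch below counts events that are dropped because a bounded queue is full. It is an assumption-laden toy, not the listener bus implementation: the class `BoundedBusSketch` and its methods are hypothetical, and it only shows how a non-blocking `offer` plus a counter yields the number a user would look at when sizing the event queue.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class BoundedBusSketch<E> {
    private final LinkedBlockingQueue<E> eventQueue;
    private final AtomicLong droppedEvents = new AtomicLong(0);

    public BoundedBusSketch(int capacity) {
        this.eventQueue = new LinkedBlockingQueue<>(capacity);
    }

    // offer() returns false instead of blocking when the queue is full,
    // so a full queue results in a counted drop rather than a stalled caller.
    public boolean post(E event) {
        boolean added = eventQueue.offer(event);
        if (!added) {
            droppedEvents.incrementAndGet();
        }
        return added;
    }

    public long droppedCount() {
        return droppedEvents.get();
    }

    public static void main(String[] args) {
        BoundedBusSketch<String> bus = new BoundedBusSketch<>(2);
        bus.post("a");
        bus.post("b");
        bus.post("c");  // queue is full, so this event is dropped
        System.out.println(bus.droppedCount()); // prints 1
    }
}
```

A steadily growing drop count in a setup like this suggests the queue capacity is too small for the event rate, or that the consumer is too slow.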







[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-09-23 Thread zhzhan
Github user zhzhan commented on the issue:

https://github.com/apache/spark/pull/15218
  
@gatorsmile  Thanks. #65832 is the latest one which does not have the same 
failure.





[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...

2016-09-23 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/15220
  
Thanks! Merging to master / 2.0. I will submit a patch for 1.6.





[GitHub] spark issue #15223: [SPARKR][SPARK-17651] Set R package version number along...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15223
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15223: [SPARKR][SPARK-17651] Set R package version number along...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15223
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65849/
Test PASSed.





[GitHub] spark issue #15223: [SPARKR][SPARK-17651] Set R package version number along...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15223
  
**[Test build #65849 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65849/consoleFull)**
 for PR 15223 at commit 
[`a0122f0`](https://github.com/apache/spark/commit/a0122f0569b9caa8995c65eb27314edb0234a5ff).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...

2016-09-23 Thread markhamstra
Github user markhamstra commented on the issue:

https://github.com/apache/spark/pull/15213
  
Right, but `abortStage` occurs elsewhere.  "When abort stage" seems to 
imply that this fix is necessary for all usages of `abortStage` when the actual 
problem is not in `abortStage` but rather in improper additions to 
`failedStages`.  I've got to go now, but I'll come back to this soon(ish).





[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...

2016-09-23 Thread scwf
Github user scwf commented on the issue:

https://github.com/apache/spark/pull/15213
  
Actually, stages are only added to `failedStages` here in Spark.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...

2016-09-23 Thread markhamstra
Github user markhamstra commented on the issue:

https://github.com/apache/spark/pull/15213
  
@scwf That description would actually be at least as bad since there are 
multiple routes to `abortStage` and this issue of adding to `failedStages` only 
applies to these two.  I'll take another look soon and see if I can come up 
with a clean refactoring and a better description for the commit message.





[GitHub] spark issue #15089: [SPARK-15621] [SQL] Support spilling for Python UDF

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15089
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15089: [SPARK-15621] [SQL] Support spilling for Python UDF

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15089
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65850/
Test PASSed.





[GitHub] spark issue #15089: [SPARK-15621] [SQL] Support spilling for Python UDF

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15089
  
**[Test build #65850 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65850/consoleFull)**
 for PR 15089 at commit 
[`5239042`](https://github.com/apache/spark/commit/52390429fb1f7b20705ddad5621e8267c2aff12b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...

2016-09-23 Thread markhamstra
Github user markhamstra commented on the issue:

https://github.com/apache/spark/pull/15213
  
Ok, that makes better sense.

The `disallowStageRetryForTest` case doesn't worry me too much since it is 
only used in tests.  If we can fix this case, great; else if it remains 
possible to create failing tests that can never happen outside of the tests, 
then that is not all that important (but should at least be noted in comments 
in the test suite.)

Yes, not adding to `failedStages` after going down either of those two 
paths to `abortStage` is a correct fix even if the description of the problem 
wasn't really accurate.  I'll take another look over the weekend to see if the 
logic can be expressed a bit more clearly. 





[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...

2016-09-23 Thread scwf
Github user scwf commented on the issue:

https://github.com/apache/spark/pull/15213
  
Thanks @zsxwing for explaining this. 
@markhamstra the issue happens in the case described in my PR description. It 
usually depends on multi-threaded job submission and the order of fetch 
failures, so I called it a race condition.

If you think that is confusing, how about changing the title to "do not add to 
failedStages when aborting a stage"?





[GitHub] spark issue #15224: [SPARK-17650] malformed url's throw exceptions before br...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65846/
Test PASSed.





[GitHub] spark issue #15224: [SPARK-17650] malformed url's throw exceptions before br...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15224
  
**[Test build #65846 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65846/consoleFull)**
 for PR 15224 at commit 
[`c65f94f`](https://github.com/apache/spark/commit/c65f94f440fd67c1d3b555e647dede95ac71fa25).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15220
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65847/
Test PASSed.





[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15220
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15220
  
**[Test build #65847 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65847/consoleFull)**
 for PR 15220 at commit 
[`2f47c30`](https://github.com/apache/spark/commit/2f47c30bf9b3ad1e929fe9bf0da4b835e7ea13cd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15089: [SPARK-15621] [SQL] Support spilling for Python UDF

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15089
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65848/
Test PASSed.





[GitHub] spark issue #15089: [SPARK-15621] [SQL] Support spilling for Python UDF

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15089
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...

2016-09-23 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/15213
  
@markhamstra I agree this is not a race condition since there is only a 
single thread.

The issue is that the code doesn't handle the following two corner cases:

- `failedStage.failedOnFetchAndShouldAbort(task.stageAttemptId) && 
failedStages.isEmpty` is true
- `disallowStageRetryForTest && failedStages.isEmpty` is true

In the above cases, `ResubmitFailedStages` won't be posted.
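
One way to read the two corner cases is sketched below as a toy model. It is not the DAGScheduler code: the class, the `shouldAbort` flag (standing in for either abort condition), and the `skipFailedStagesOnAbort` switch are hypothetical names used only for illustration. The assumption is that the abort path still adds the stage to `failedStages`, so a later fetch failure sees a non-empty set and never posts `ResubmitFailedStages`.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy model of the corner case; names are illustrative, not Spark's API. */
class ToyFetchFailureHandler {
    final Set<String> failedStages = new LinkedHashSet<>();
    int resubmitEventsPosted = 0;
    // Hypothetical switch: true models "do not add to failedStages when aborting".
    private final boolean skipFailedStagesOnAbort;

    ToyFetchFailureHandler(boolean skipFailedStagesOnAbort) {
        this.skipFailedStagesOnAbort = skipFailedStagesOnAbort;
    }

    void onFetchFailure(String stage, boolean shouldAbort) {
        if (shouldAbort) {
            // Abort path: no resubmit event is posted for this stage.
            if (skipFailedStagesOnAbort) {
                return; // leave failedStages untouched
            }
        } else if (failedStages.isEmpty()) {
            // A resubmit event is posted only when none can be pending.
            resubmitEventsPosted++;
        }
        failedStages.add(stage);
    }

    /** First failure aborts, second failure should be retried. */
    static int resubmitsPosted(boolean skipFailedStagesOnAbort) {
        ToyFetchFailureHandler handler = new ToyFetchFailureHandler(skipFailedStagesOnAbort);
        handler.onFetchFailure("stageA", true);
        handler.onFetchFailure("stageB", false);
        return handler.resubmitEventsPosted;
    }
}
```

In this model, `resubmitsPosted(false)` posts no resubmit event for the second failure, while `resubmitsPosted(true)` posts one.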






[GitHub] spark issue #15089: [SPARK-15621] [SQL] Support spilling for Python UDF

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15089
  
**[Test build #65848 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65848/consoleFull)**
 for PR 15089 at commit 
[`87ecc0d`](https://github.com/apache/spark/commit/87ecc0db2c5c980273e06d37ecb764fd03ad2b65).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15224: [SPARK-17650] malformed url's throw exceptions before br...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15224
  
**[Test build #65854 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65854/consoleFull)**
 for PR 15224 at commit 
[`49afc56`](https://github.com/apache/spark/commit/49afc5686d7ccf9a7864fc9b9c9eb5217a281086).





[GitHub] spark issue #15224: [SPARK-17650] malformed url's throw exceptions before br...

2016-09-23 Thread brkyvz
Github user brkyvz commented on the issue:

https://github.com/apache/spark/pull/15224
  
@zsxwing Thanks for the review. Addressed the nit.





[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...

2016-09-23 Thread markhamstra
Github user markhamstra commented on the issue:

https://github.com/apache/spark/pull/15213
  
This doesn't make sense to me.  The DAGSchedulerEventProcessLoop runs on a 
single thread and processes a single event from its queue at a time.

When the first CompletionEvent is run as a result of a fetch failure, 
failedStages is added to and a ResubmitFailedStages event is queued.  After 
handleTaskCompletion is done, the next event from the queue will be processed.  
As events are sequentially dequeued and handled, either the 
ResubmitFailedStages event will be handled before the CompletionEvent for the 
second fetch failure, or the CompletionEvent will be handled before the 
ResubmitFailedStages event.  If the ResubmitFailedStages is handled first, then 
failedStages will be cleared in resubmitFailedStages, and there will be nothing 
preventing the subsequent CompletionEvent from queueing another 
ResubmitFailedStages event to handle additional fetch failures.  In the 
alternative that the second CompletionEvent is queued and handled before the 
ResubmitFailedStages event, then the additional stages are added to the 
non-empty failedStages, but there is no need to schedule another 
ResubmitFailedStages event because the one from 
the first CompletionEvent is still on the queue and the handling of that 
queued event will also handle the newly added failedStages from the second 
CompletionEvent.  In either ordering, all the failedStages are handled and 
there is no race condition.
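
The two orderings described above can be replayed in a small single-threaded simulation. This is a sketch under stated assumptions, not Spark's event loop: `ToyEventLoop`, the string-encoded events, and the immediate (undelayed) posting of the resubmit event are all hypothetical simplifications.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy single-threaded event loop; names are illustrative, not Spark's classes. */
class ToyEventLoop {
    private static final String PREFIX = "fetchFailed:";
    private final Deque<String> queue = new ArrayDeque<>(); // pending events, FIFO
    private final Set<String> failedStages = new LinkedHashSet<>();
    final List<String> resubmitted = new ArrayList<>();

    void post(String event) {
        queue.addLast(event);
    }

    private void onFetchFailure(String stage) {
        // Queue a resubmit only if failedStages is empty, i.e. none is pending.
        if (failedStages.isEmpty()) {
            post("resubmit");
        }
        failedStages.add(stage);
    }

    private void onResubmit() {
        // Handle every stage collected so far, then clear the set.
        resubmitted.addAll(failedStages);
        failedStages.clear();
    }

    /** Drain the queue on the calling thread, one event at a time. */
    void run() {
        while (!queue.isEmpty()) {
            String event = queue.removeFirst();
            if (event.equals("resubmit")) {
                onResubmit();
            } else {
                onFetchFailure(event.substring(PREFIX.length()));
            }
        }
    }

    /** The second failure is dequeued before the pending resubmit event. */
    static List<String> secondFailureBeforeResubmit() {
        ToyEventLoop loop = new ToyEventLoop();
        loop.post(PREFIX + "stageA");
        loop.post(PREFIX + "stageB");
        loop.run();
        return loop.resubmitted;
    }

    /** The resubmit event is handled before the second failure arrives. */
    static List<String> resubmitBeforeSecondFailure() {
        ToyEventLoop loop = new ToyEventLoop();
        loop.post(PREFIX + "stageA");
        loop.run();
        loop.post(PREFIX + "stageB");
        loop.run();
        return loop.resubmitted;
    }
}
```

Both helper methods end with the same two stages resubmitted, which matches the argument that either ordering handles all failed stages.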





[GitHub] spark issue #15223: [SPARKR][SPARK-17651] Set R package version number along...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15223
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15223: [SPARKR][SPARK-17651] Set R package version number along...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15223
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65844/
Test PASSed.





[GitHub] spark issue #15223: [SPARKR][SPARK-17651] Set R package version number along...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15223
  
**[Test build #65844 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65844/consoleFull)**
 for PR 15223 at commit 
[`742a787`](https://github.com/apache/spark/commit/742a7879865a4b85883337798c36af99c867ccae).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14808: [SPARK-17156][ML][EXAMPLE] Add multiclass logistic regre...

2016-09-23 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/14808
  
I think we should close this. The new example and the user guide should be 
updated against 
[SPARK-17239](https://issues.apache.org/jira/browse/SPARK-17239). 
@jaceklaskowski If you'd still like to do it, please let me know; otherwise I 
am happy to do it. We should try to get this in soon. Thanks!





[GitHub] spark issue #15213: [SPARK-17644] [CORE] Fix the race condition when DAGSche...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15213
  
**[Test build #65853 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65853/consoleFull)**
 for PR 15213 at commit 
[`1127ca1`](https://github.com/apache/spark/commit/1127ca1538e9a9ded9e91ead65af8c710e99003d).





[GitHub] spark issue #15200: Skip building R vignettes if Spark is not built

2016-09-23 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/15200
  
If it's part of the `-Psparkr` profile of the build, it will be regenerated 
by default. If it's changed and not in .gitignore, it should be flagged for 
commit.





[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15220
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65843/
Test PASSed.





[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15220
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15220: [SPARK-17649][Core]Log how many Spark events got dropped...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15220
  
**[Test build #65843 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65843/consoleFull)**
 for PR 15220 at commit 
[`b4f56a0`](https://github.com/apache/spark/commit/b4f56a073ac8f5b76db929a456f18b77b8e8910f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #15225: [SPARK-17652] Fix confusing exception message whi...

2016-09-23 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15225#discussion_r80340526
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java
 ---
@@ -285,19 +285,19 @@ public void reserve(int requiredCapacity) {
 try {
   reserveInternal(newCapacity);
 } catch (OutOfMemoryError outOfMemoryError) {
-  throwUnsupportedException(newCapacity, requiredCapacity, 
outOfMemoryError);
+  throwUnsupportedException(requiredCapacity, outOfMemoryError);
 }
   } else {
-throwUnsupportedException(newCapacity, requiredCapacity, null);
+throwUnsupportedException(requiredCapacity, null);
   }
 }
   }
 
-  private void throwUnsupportedException(int newCapacity, int 
requiredCapacity, Throwable cause) {
-String message = "Cannot reserve more than " + newCapacity +
-" bytes in the vectorized reader (requested = " + requiredCapacity 
+ " bytes). As a" +
-" workaround, you can disable the vectorized reader by setting "
-+ SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + " to false.";
+  private void throwUnsupportedException(int requiredCapacity, Throwable 
cause) {
+String message = "Cannot reserve additional contiguous bytes in the 
vectorized reader " +
+"(requested = " + requiredCapacity + " bytes). As a workaround, 
you can disable the " +
+"vectorized reader by setting " + 
SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() +
+" to false.";
--- End diff --

Oh, I was wondering whether we can explain why the allocation fails 
instead of just saying that we cannot allocate memory.





[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...

2016-09-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/12601
  
Not sure whether you already know this; I just want to share the commands 
for building the docs. 
```bash
SKIP_API=1 jekyll build
SKIP_API=1 jekyll serve
```

After the second command, you can visit the generated document:
```
Server address: http://127.0.0.1:4000/
```





[GitHub] spark issue #15200: Skip building R vignettes if Spark is not built

2016-09-23 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/15200
  
Yeah - so I'm thinking we should just auto-generate this and check the 
html file into git. It's not that big. When somebody updates the vignette, we 
need to remind them to regenerate it as part of the PR, though? 





[GitHub] spark pull request #15225: [SPARK-17652] Fix confusing exception message whi...

2016-09-23 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request:

https://github.com/apache/spark/pull/15225#discussion_r80339515
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java
 ---
@@ -285,19 +285,19 @@ public void reserve(int requiredCapacity) {
 try {
   reserveInternal(newCapacity);
 } catch (OutOfMemoryError outOfMemoryError) {
-  throwUnsupportedException(newCapacity, requiredCapacity, 
outOfMemoryError);
+  throwUnsupportedException(requiredCapacity, outOfMemoryError);
 }
   } else {
-throwUnsupportedException(newCapacity, requiredCapacity, null);
+throwUnsupportedException(requiredCapacity, null);
   }
 }
   }
 
-  private void throwUnsupportedException(int newCapacity, int 
requiredCapacity, Throwable cause) {
-String message = "Cannot reserve more than " + newCapacity +
-" bytes in the vectorized reader (requested = " + requiredCapacity 
+ " bytes). As a" +
-" workaround, you can disable the vectorized reader by setting "
-+ SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + " to false.";
+  private void throwUnsupportedException(int requiredCapacity, Throwable 
cause) {
+String message = "Cannot reserve additional contiguous bytes in the 
vectorized reader " +
+"(requested = " + requiredCapacity + " bytes). As a workaround, 
you can disable the " +
+"vectorized reader by setting " + 
SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() +
+" to false.";
--- End diff --

Shouldn't the first line work: `Cannot reserve additional contiguous bytes 
in the vectorized reader`? Do you have something in mind?





[GitHub] spark issue #15225: [SPARK-17652] Fix confusing exception message while rese...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15225
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65852/
Test FAILed.





[GitHub] spark pull request #15224: [SPARK-17650] malformed url's throw exceptions be...

2016-09-23 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/15224#discussion_r80337786
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -698,6 +698,28 @@ private[spark] object Utils extends Logging {
   }
 
   /**
+   * Validate that a given URI is actually a valid URL as well.
+   * @param uri The URI to validate
+   */
+  @throws[MalformedURLException]("when the URI is an invalid URL")
+  def validateURL(uri: URI): Unit = {
+Option(uri.getScheme).getOrElse("file") match {
+  case "http" | "https" | "ftp" =>
+try {
+  uri.toURL
+} catch {
+  case e: MalformedURLException =>
+val msg = s"URI (${uri.toString}) is not a valid URL."
+logError(msg)
--- End diff --

nit: no need to log it since it's already thrown.
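
As a rough illustration of the nit, here is a Java sketch of the same check that rethrows with the descriptive message and does not log. It is not the code in the PR: the class name and the `isValid` helper are hypothetical, and the rethrow is an assumption, since the quoted diff is cut off after the `logError` call.

```java
import java.net.MalformedURLException;
import java.net.URI;

/** Sketch only; mirrors the quoted Scala diff loosely, names are illustrative. */
class UrlValidationSketch {
    /** Throws if an http, https or ftp URI cannot be converted to a URL. */
    static void validateURL(URI uri) throws MalformedURLException {
        // A missing scheme is treated as "file", as in the quoted diff.
        String scheme = uri.getScheme() == null ? "file" : uri.getScheme();
        switch (scheme) {
            case "http":
            case "https":
            case "ftp":
                try {
                    uri.toURL();
                } catch (MalformedURLException e) {
                    // No logging here: the caller sees the message in the exception.
                    MalformedURLException wrapped =
                        new MalformedURLException("URI (" + uri + ") is not a valid URL.");
                    wrapped.initCause(e);
                    throw wrapped;
                }
                break;
            default:
                // Other schemes are not checked in this sketch.
                break;
        }
    }

    /** Hypothetical convenience wrapper for callers that prefer a boolean. */
    static boolean isValid(URI uri) {
        try {
            validateURL(uri);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }
}
```

The exception already carries the message, so a caller that wants it logged can do so once at the point where the exception is handled.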





[GitHub] spark issue #15225: [SPARK-17652] Fix confusing exception message while rese...

2016-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15225
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15225: [SPARK-17652] Fix confusing exception message while rese...

2016-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15225
  
**[Test build #65852 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65852/consoleFull)** for PR 15225 at commit [`ed87537`](https://github.com/apache/spark/commit/ed8753766e7d3e18603b2408553b624e17edec0b).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-09-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15218
  
See https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65832/testReport/org.apache.spark.streaming.kafka010/DirectKafkaStreamSuite/pattern_based_subscription/history/





[GitHub] spark pull request #15225: [SPARK-17652] Fix confusing exception message whi...

2016-09-23 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/15225#discussion_r80337683
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
@@ -285,19 +285,19 @@ public void reserve(int requiredCapacity) {
         try {
           reserveInternal(newCapacity);
         } catch (OutOfMemoryError outOfMemoryError) {
-          throwUnsupportedException(newCapacity, requiredCapacity, outOfMemoryError);
+          throwUnsupportedException(requiredCapacity, outOfMemoryError);
         }
       } else {
-        throwUnsupportedException(newCapacity, requiredCapacity, null);
+        throwUnsupportedException(requiredCapacity, null);
       }
     }
   }
 
-  private void throwUnsupportedException(int newCapacity, int requiredCapacity, Throwable cause) {
-    String message = "Cannot reserve more than " + newCapacity +
-        " bytes in the vectorized reader (requested = " + requiredCapacity + " bytes). As a" +
-        " workaround, you can disable the vectorized reader by setting "
-        + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + " to false.";
+  private void throwUnsupportedException(int requiredCapacity, Throwable cause) {
+    String message = "Cannot reserve additional contiguous bytes in the vectorized reader " +
+        "(requested = " + requiredCapacity + " bytes). As a workaround, you can disable the " +
+        "vectorized reader by setting " + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() +
+        " to false.";
--- End diff --

Is it possible to also explain the cause of this error in the error message?
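
One possible shape for this, as a hedged Java sketch rather than the PR's actual change: append the underlying `Throwable`, when there is one, to the message text. The class `ReserveErrorMessageSketch`, the method `buildMessage`, and the `confKey` parameter are hypothetical; the sketch takes the configuration key as a plain string so that it does not depend on `SQLConf`.

```java
public final class ReserveErrorMessageSketch {
  private ReserveErrorMessageSketch() {}

  /**
   * Builds the error text and, when an underlying failure is known,
   * names it in the message itself.
   */
  public static String buildMessage(int requiredCapacity, Throwable cause, String confKey) {
    StringBuilder sb = new StringBuilder()
        .append("Cannot reserve additional contiguous bytes in the vectorized reader ")
        .append("(requested = ").append(requiredCapacity).append(" bytes).");
    if (cause != null) {
      // Surface the cause explicitly instead of leaving it only in the stack trace.
      sb.append(" Cause: ").append(cause.getClass().getSimpleName());
      if (cause.getMessage() != null) {
        sb.append(" (").append(cause.getMessage()).append(")");
      }
      sb.append('.');
    }
    sb.append(" As a workaround, you can disable the vectorized reader by setting ")
        .append(confKey).append(" to false.");
    return sb.toString();
  }

  public static void main(String[] args) {
    // The key string below is assumed to match PARQUET_VECTORIZED_READER_ENABLED.
    String key = "spark.sql.parquet.enableVectorizedReader";
    System.out.println(buildMessage(1 << 20, new OutOfMemoryError("Java heap space"), key));
    System.out.println(buildMessage(1 << 20, null, key));
  }
}
```

When `cause` is `null`, the sketch adds nothing, since the diff context does not show what condition leads to that branch.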




