[GitHub] spark issue #14324: [SPARK-16664][SQL] Fix persist call on Data frames with ...

2016-07-22 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/14324
  
@breakdawn it'd be great to do more tests when you open a pull request. As I'm
investigating this too, I found that my same fix works for 201 cols but
fails for 8118 cols. The exact limit is 8117.





[GitHub] spark issue #14324: [SPARK-16664][SQL] Fix persist call on Data frames with ...

2016-07-22 Thread breakdawn
Github user breakdawn commented on the issue:

https://github.com/apache/spark/pull/14324
  
Yes, working on that





[GitHub] spark issue #13756: [SPARK-16041][SQL] Disallow Duplicate Columns in partiti...

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13756
  
**[Test build #62746 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62746/consoleFull)**
 for PR 13756 at commit 
[`08b5374`](https://github.com/apache/spark/commit/08b5374e827f6680b4e4a00ed700ef689dce22ff).





[GitHub] spark issue #14324: [SPARK-16664][SQL] Fix persist call on Data frames with ...

2016-07-22 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14324
  
Can you add a test case?






[GitHub] spark issue #13756: [SPARK-16041][SQL] Disallow Duplicate Columns in partiti...

2016-07-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13756
  
retest this please





[GitHub] spark issue #14296: [SPARK-16639][SQL] The query with having condition that ...

2016-07-22 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14296
  
@cloud-fan any more comments? Thanks!





[GitHub] spark issue #14322: [SPARK-16689] [SQL] FileSourceStrategy: Pruning Partitio...

2016-07-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14322
  
cc @marmbrus @cloud-fan @liancheng After checking the history, most of this
code was written by you. Thanks!





[GitHub] spark issue #14324: [SPARK-16664][SQL] Fix persist call on Data frames with ...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14324
  
Can one of the admins verify this patch?





[GitHub] spark pull request #14324: [SPARK-16664][SQL] Fix persist call on Data frame...

2016-07-22 Thread breakdawn
GitHub user breakdawn opened a pull request:

https://github.com/apache/spark/pull/14324

[SPARK-16664][SQL] Fix persist call on Data frames with more than 200…

## What changes were proposed in this pull request?

Commit f12f11e578169b47e3f8b18b299948c0670ba585 introduced this bug: it missed
a `foreach` that should have been a `map`.
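
For illustration only (a general Scala note, not the actual patch): `map` returns the values it produces, while `foreach` returns `Unit` and discards them, so using `foreach` where a `map` is needed silently drops the results.

```scala
// General illustration of the foreach-vs-map pitfall; not the Spark patch itself.
val columns = Seq("a", "b", "c")

// `map` keeps the produced values.
val kept: Seq[String] = columns.map(name => s"accessor_$name")
// kept == Seq("accessor_a", "accessor_b", "accessor_c")

// `foreach` evaluates the same expressions but returns Unit, so the values are lost.
val lost: Unit = columns.foreach(name => s"accessor_$name")
```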

## How was this patch tested?

Manual tests were done with the following:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

test("test data frame with 201 columns") {
  val sparkConfig = new SparkConf()
  val parallelism = 5
  sparkConfig.set("spark.default.parallelism", s"$parallelism")
  sparkConfig.set("spark.sql.shuffle.partitions", s"$parallelism")

  val sc = new SparkContext("local[3]", "TestNestedJson", sparkConfig)
  val sqlContext = new SQLContext(sc)

  // create a DataFrame with 201 columns and 201 fake values
  val size = 201
  val rdd: RDD[Seq[Long]] = sc.parallelize(Seq(Seq.range(0L, size.toLong)))
  val rowRdd: RDD[Row] = rdd.map(d => Row.fromSeq(d))

  val schemas = List.range(0, size).map(a => StructField("name" + a, LongType, true))
  val testSchema = StructType(schemas)
  val testDf = sqlContext.createDataFrame(rowRdd, testSchema)

  // the 101st column of the first row should survive persist()
  assert(testDf.persist.take(1).apply(0).toSeq(100).asInstanceOf[Long] == 100)
  sc.stop()
}
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/breakdawn/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14324.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14324


commit 7040dc9f45eae56cb706cb44cd48bea16217db1e
Author: Wesley Tang 
Date:   2016-07-23T04:35:48Z

[SPARK-16664][SQL] Fix persist call on Data frames with more than 200 
columns is wiping out the data.







[GitHub] spark issue #14323: [SPARK-16675][SQL] Avoid per-record type dispatch in JDB...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14323
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14323: [SPARK-16675][SQL] Avoid per-record type dispatch in JDB...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14323
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62745/
Test PASSed.





[GitHub] spark issue #14323: [SPARK-16675][SQL] Avoid per-record type dispatch in JDB...

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14323
  
**[Test build #62745 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62745/consoleFull)**
 for PR 14323 at commit 
[`8cac7de`](https://github.com/apache/spark/commit/8cac7dec9b1b999d3c5aa8ecf2086c40078ea4d9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14270: [SPARK-5847][CORE] Allow for configuring MetricsSystem's...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14270
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14270: [SPARK-5847][CORE] Allow for configuring MetricsSystem's...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14270
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62744/
Test PASSed.





[GitHub] spark issue #14270: [SPARK-5847][CORE] Allow for configuring MetricsSystem's...

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14270
  
**[Test build #62744 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62744/consoleFull)**
 for PR 14270 at commit 
[`3c8ea96`](https://github.com/apache/spark/commit/3c8ea966c5d3f356ad9fd4bead3fd2ced236c6bd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14323: [SPARK-16675][SQL] Avoid per-record type dispatch in JDB...

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14323
  
**[Test build #62745 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62745/consoleFull)**
 for PR 14323 at commit 
[`8cac7de`](https://github.com/apache/spark/commit/8cac7dec9b1b999d3c5aa8ecf2086c40078ea4d9).





[GitHub] spark pull request #14323: [SPARK-16675][SQL] Avoid per-record type dispatch...

2016-07-22 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/14323

[SPARK-16675][SQL] Avoid per-record type dispatch in JDBC when writing

## What changes were proposed in this pull request?

Currently, `JdbcUtils.savePartition` does type-based dispatch for each row to
write the appropriate values.

Instead, appropriate setters for `PreparedStatement` can be created once
according to the schema and then applied to each row. This approach is similar
to `CatalystWriteSupport`.

This PR simply creates the setters up front to avoid the per-record dispatch.
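
A minimal sketch of the idea, using hypothetical helper names (`makeSetter`, `writeRows`) rather than the PR's actual code:

```scala
// Sketch only: the type-based dispatch runs once per column, not once per record.
import java.sql.{PreparedStatement, Types}

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

object SetterSketch {
  type ValueSetter = (PreparedStatement, Row, Int) => Unit

  // Resolve a column's setter from its schema data type.
  private def makeSetter(dt: DataType): ValueSetter = dt match {
    case IntegerType => (stmt, row, pos) => stmt.setInt(pos + 1, row.getInt(pos))
    case LongType    => (stmt, row, pos) => stmt.setLong(pos + 1, row.getLong(pos))
    case DoubleType  => (stmt, row, pos) => stmt.setDouble(pos + 1, row.getDouble(pos))
    case StringType  => (stmt, row, pos) => stmt.setString(pos + 1, row.getString(pos))
    case _           => (stmt, row, pos) => stmt.setObject(pos + 1, row.get(pos))
  }

  def writeRows(stmt: PreparedStatement, rows: Iterator[Row], schema: StructType): Unit = {
    // Built once from the schema...
    val setters: Array[ValueSetter] = schema.fields.map(f => makeSetter(f.dataType))
    // ...then reused for every row.
    rows.foreach { row =>
      var i = 0
      while (i < setters.length) {
        // A real implementation would pass the column's JDBC type code to setNull.
        if (row.isNullAt(i)) stmt.setNull(i + 1, Types.NULL) else setters(i)(stmt, row, i)
        i += 1
      }
      stmt.addBatch()
    }
  }
}
```

The `match` on the data type runs once per column when the setter array is built, instead of once per value per row.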

## How was this patch tested?

Existing tests should cover this.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-16675

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14323.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14323


commit 4284d4621104c3badcb743d908e48f283130b186
Author: hyukjinkwon 
Date:   2016-07-23T02:59:25Z

[SPARK-16675][SQL] Avoid per-record type dispatch in JDBC when writing

commit 8cac7dec9b1b999d3c5aa8ecf2086c40078ea4d9
Author: hyukjinkwon 
Date:   2016-07-23T03:01:16Z

Fix comment







[GitHub] spark issue #14322: [SPARK-16689] [SQL] FileSourceStrategy: Pruning Partitio...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14322
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62743/
Test PASSed.





[GitHub] spark issue #14322: [SPARK-16689] [SQL] FileSourceStrategy: Pruning Partitio...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14322
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14322: [SPARK-16689] [SQL] FileSourceStrategy: Pruning Partitio...

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14322
  
**[Test build #62743 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62743/consoleFull)**
 for PR 14322 at commit 
[`c1ff046`](https://github.com/apache/spark/commit/c1ff0465815f6adefb2b29c2973c9bc63aa13623).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #14164: [SPARK-16629] Allow comparisons between UDTs and ...

2016-07-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14164#discussion_r71965649
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -110,6 +110,28 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     )
   }
 
+  test("test filtering with predicates on UDT columns") {
+    val rowRDD = sparkContext.parallelize(Seq(Row(new ExampleMoney(1.0)), Row(new ExampleMoney(2.0)), Row(new ExampleMoney(3.0))))
+    val schema = StructType(Array(StructField("dollar", new ExampleMoneyUDT(), false)))
+    val df = spark.createDataFrame(rowRDD, schema)
+
+    checkAnswer(df.filter(df("dollar") < 2.0), Seq(Row(new ExampleMoney(1.0))))
--- End diff --

cc @mengxr , is UDT designed to work like this?





[GitHub] spark pull request #14296: [SPARK-16639][SQL] The query with having conditio...

2016-07-22 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14296#discussion_r71965620
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1207,6 +1207,12 @@ class Analyzer(
 val alias = Alias(ae, ae.toString)()
 aggregateExpressions += alias
 alias.toAttribute
+  case ne: NamedExpression => ne
+  case e: Expression if grouping.exists(_.semanticEquals(e)) &&
+  !ResolveGroupingAnalytics.hasGroupingFunction(e) =>
--- End diff --

I am not near my laptop, but pushing grouping functions causes test failures
in SQLQuerySuite. I remember there is another rule that takes care of grouping
functions.





[GitHub] spark pull request #14322: [SPARK-16689] [SQL] FileSourceStrategy: Pruning P...

2016-07-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14322#discussion_r71965622
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala ---
@@ -135,9 +135,17 @@ private[sql] object FileSourceStrategy extends Strategy with Logging {
         PUSHED_FILTERS -> pushedDownFilters.mkString("[", ", ", "]"),
         INPUT_PATHS -> fsRelation.location.paths.mkString(", "))
 
+      // If the required attributes do not include any partitioning columns, we do not need
+      // to scan the partitioning columns. If partitioning columns are selected, the column
+      // order of partitionColumns is fixed in the RDD. Thus, we always scan all the
+      // partitioning columns.
+      val scannedColumns = if (requiredAttributes.intersect(partitionSet).nonEmpty) {
+        readDataColumns ++ partitionColumns
+      } else {
+        readDataColumns
+      }
       val scan =
         DataSourceScanExec.create(
-          readDataColumns ++ partitionColumns,
+          scannedColumns,
--- End diff --

This is solution one: it requires few code changes but gets most of the
benefit. Another solution that covers more cases would change the RDD
generation.





[GitHub] spark issue #13761: [SPARK-12197] [SparkCore] Kryo & Avro - Support Schema R...

2016-07-22 Thread RotemShaul
Github user RotemShaul commented on the issue:

https://github.com/apache/spark/pull/13761
  
Indeed it is, but then you lose the GenericAvroSerializer abilities that
already come out of the box with Spark (caching and registering of static
schemas).

As Spark already chose to (partially) support Avro from within Spark Core, to
me it makes sense that it also supports schema repositories, since they are
very commonly used by Avro users to deal with schema evolution.
It was this partial support that sparked the idea of "if they support
registering Avro schemas, why not go all the way?", and that's why I created
the PR in the first place.

Users of Avro GenericRecords with Spark Core will always face the schema
serialization problem: some can solve it with static schemas, and others will
need the dynamic solution. It makes sense for Spark Core to provide a solution
for both use cases or for neither (and leave it to a custom serializer).

Just my opinion. In my current workplace I took your GenericAvroSerializer,
added a few lines of code to it, and used it as a custom serializer. But it
could be generalized, hence the PR.
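
For illustration, a minimal sketch of that kind of repository-backed custom serializer. The `SchemaRepoClient` trait and all class names below are hypothetical; this is not the PR's code and not Spark's built-in GenericAvroSerializer.

```scala
import java.io.ByteArrayOutputStream

import com.esotericsoftware.kryo.{Kryo, Serializer}
import com.esotericsoftware.kryo.io.{Input, Output}
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical schema-repository client; a real one would talk to an external registry.
trait SchemaRepoClient {
  def idFor(schema: Schema): Long
  def schemaFor(id: Long): Schema
}

// Serializes a GenericRecord as a schema id plus the Avro binary payload,
// and resolves the schema from the repository on read.
class RepoBackedAvroSerializer(repo: SchemaRepoClient) extends Serializer[GenericRecord] {
  override def write(kryo: Kryo, output: Output, record: GenericRecord): Unit = {
    val schema = record.getSchema
    output.writeLong(repo.idFor(schema))
    val buffer = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(buffer, null)
    new GenericDatumWriter[GenericRecord](schema).write(record, encoder)
    encoder.flush()
    val bytes = buffer.toByteArray
    output.writeInt(bytes.length)
    output.writeBytes(bytes)
  }

  override def read(kryo: Kryo, input: Input, clazz: Class[GenericRecord]): GenericRecord = {
    val schema = repo.schemaFor(input.readLong())
    val bytes = input.readBytes(input.readInt())
    val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
    new GenericDatumReader[GenericRecord](schema).read(null, decoder)
  }
}

// Hooked in through the standard Kryo registrator mechanism, e.g.
//   conf.set("spark.kryo.registrator", classOf[AvroRepoRegistrator].getName)
class AvroRepoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    val repo: SchemaRepoClient = ??? // wire in a concrete repository client here
    kryo.register(classOf[GenericData.Record], new RepoBackedAvroSerializer(repo))
  }
}
```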




On Sat, Jul 23, 2016 at 5:14 AM, Reynold Xin wrote:

> @RotemShaul is this something doable by implementing a custom serializer
> outside Spark?






[GitHub] spark issue #14259: [SPARK-16622][SQL] Fix NullPointerException when the ret...

2016-07-22 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14259
  
Thanks for reviewing this.





[GitHub] spark issue #14259: [SPARK-16622][SQL] Fix NullPointerException when the ret...

2016-07-22 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14259
  
thanks, merging to master!





[GitHub] spark pull request #14259: [SPARK-16622][SQL] Fix NullPointerException when ...

2016-07-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14259





[GitHub] spark issue #13761: [SPARK-12197] [SparkCore] Kryo & Avro - Support Schema R...

2016-07-22 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13761
  
@RotemShaul is this something doable by implementing a custom serializer 
outside Spark?






[GitHub] spark issue #14270: [SPARK-5847][CORE] Allow for configuring MetricsSystem's...

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14270
  
**[Test build #62744 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62744/consoleFull)**
 for PR 14270 at commit 
[`3c8ea96`](https://github.com/apache/spark/commit/3c8ea966c5d3f356ad9fd4bead3fd2ced236c6bd).





[GitHub] spark issue #14270: [SPARK-5847][CORE] Allow for configuring MetricsSystem's...

2016-07-22 Thread markgrover
Github user markgrover commented on the issue:

https://github.com/apache/spark/pull/14270
  
Ok, I have pushed changes to use the expansion capabilities brought in by 
SPARK-16272. Overall, I think it was a very good call to use that, so thanks 
for the suggestions! Would appreciate a review.





[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14216
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14216
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62742/
Test PASSed.





[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14216
  
**[Test build #62742 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62742/consoleFull)**
 for PR 14216 at commit 
[`25b6fde`](https://github.com/apache/spark/commit/25b6fde3ecfdaae2873552064044fed15e7f7374).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14322: [SPARK-16689] [SQL] FileSourceStrategy: Pruning Partitio...

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14322
  
**[Test build #62743 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62743/consoleFull)**
 for PR 14322 at commit 
[`c1ff046`](https://github.com/apache/spark/commit/c1ff0465815f6adefb2b29c2973c9bc63aa13623).





[GitHub] spark issue #14322: [SPARK-16689] [SQL] FileSourceStrategy: Pruning Partitio...

2016-07-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14322
  
**After the PR changes**, the whole-stage codegen output is like:
```JAVA
== Subtree 1 / 1 ==
*Scan json [value#37L] Format: JSON, InputPaths: file:/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-8ac18be7-053f-4498-bf59-5ed87..., PushedFilters: [], ReadSchema: struct

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
/* 007 */   private org.apache.spark.sql.execution.metric.SQLMetric scan_numOutputRows;
/* 008 */   private scala.collection.Iterator scan_input;
/* 009 */
/* 010 */   public GeneratedIterator(Object[] references) {
/* 011 */ this.references = references;
/* 012 */   }
/* 013 */
/* 014 */   public void init(int index, scala.collection.Iterator inputs[]) {
/* 015 */ partitionIndex = index;
/* 016 */ this.scan_numOutputRows = (org.apache.spark.sql.execution.metric.SQLMetric) references[0];
/* 017 */ scan_input = inputs[0];
/* 018 */   }
/* 019 */
/* 020 */   protected void processNext() throws java.io.IOException {
/* 021 */ while (scan_input.hasNext()) {
/* 022 */   InternalRow scan_row = (InternalRow) scan_input.next();
/* 023 */   scan_numOutputRows.add(1);
/* 024 */   append(scan_row);
/* 025 */   if (shouldStop()) return;
/* 026 */ }
/* 027 */   }
/* 028 */ }
```





[GitHub] spark issue #14322: [SPARK-16689] [SQL] FileSourceStrategy: Pruning Partitio...

2016-07-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14322
  
**Before the PR changes**, the whole-stage codegen output is like:
```JAVA
== Subtree 1 / 1 ==
*Project [value#37L]
+- *Scan json [value#37L,p1#39,p2#40,p3#41] Format: JSON, InputPaths: file:/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-f7a4294a-2e1b-4f44-9ebb-1a5eb..., PushedFilters: [], ReadSchema: struct

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
/* 007 */   private org.apache.spark.sql.execution.metric.SQLMetric scan_numOutputRows;
/* 008 */   private scala.collection.Iterator scan_input;
/* 009 */   private UnsafeRow project_result;
/* 010 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder project_holder;
/* 011 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter project_rowWriter;
/* 012 */
/* 013 */   public GeneratedIterator(Object[] references) {
/* 014 */ this.references = references;
/* 015 */   }
/* 016 */
/* 017 */   public void init(int index, scala.collection.Iterator inputs[]) {
/* 018 */ partitionIndex = index;
/* 019 */ this.scan_numOutputRows = (org.apache.spark.sql.execution.metric.SQLMetric) references[0];
/* 020 */ scan_input = inputs[0];
/* 021 */ project_result = new UnsafeRow(1);
/* 022 */ this.project_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(project_result, 0);
/* 023 */ this.project_rowWriter = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(project_holder, 1);
/* 024 */   }
/* 025 */
/* 026 */   protected void processNext() throws java.io.IOException {
/* 027 */ while (scan_input.hasNext()) {
/* 028 */   InternalRow scan_row = (InternalRow) scan_input.next();
/* 029 */   scan_numOutputRows.add(1);
/* 030 */   boolean scan_isNull4 = scan_row.isNullAt(0);
/* 031 */   long scan_value4 = scan_isNull4 ? -1L : (scan_row.getLong(0));
/* 032 */   project_rowWriter.zeroOutNullBytes();
/* 033 */
/* 034 */   if (scan_isNull4) {
/* 035 */ project_rowWriter.setNullAt(0);
/* 036 */   } else {
/* 037 */ project_rowWriter.write(0, scan_value4);
/* 038 */   }
/* 039 */   append(project_result);
/* 040 */   if (shouldStop()) return;
/* 041 */ }
/* 042 */   }
/* 043 */ }
```






[GitHub] spark pull request #14322: [SPARK-16689] [SQL] FileSourceStrategy: Pruning P...

2016-07-22 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/14322

[SPARK-16689] [SQL] FileSourceStrategy: Pruning Partition Columns When No 
Partition Column Exist in Project

### What changes were proposed in this pull request?
For partitioned file sources, the current implementation always scans all the
partition columns. However, this is not necessary when the projected column
list does not include any partition column. In addition, we can also avoid the
unnecessary Project.

Below is an example:
```scala
spark
  .range(N)
  .selectExpr("id AS value1", "id AS value2", "id AS p1", "id AS p2", "id 
AS p3")
  .toDF("value", "value2", "p1", "p2", "p3").write.format("json")
  .partitionBy("p1", "p2", "p3").save(tempDir)
```
```
spark.read.format("json").load(tempDir).selectExpr("value")
```

**Before the PR changes**, the physical plan is like:
```
== Physical Plan ==
*Project [value#37L]
+- *Scan json [value#37L,p1#39,p2#40,p3#41] Format: JSON, InputPaths: file:/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-f7a4294a-2e1b-4f44-9ebb-1a5eb..., PushedFilters: [], ReadSchema: struct
```

**After the PR changes**, the physical plan becomes:
```
== Physical Plan ==
*Scan json [value#147L] Format: JSON, InputPaths: file:/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-a5bcb14a-46c2-4c20-8f34-9662b..., PushedFilters: [], ReadSchema: struct
```
### How was this patch tested?
Added a test case to verify the results.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark columnPruning

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14322.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14322


commit c1ff0465815f6adefb2b29c2973c9bc63aa13623
Author: gatorsmile 
Date:   2016-07-23T00:59:14Z

solution1







[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/14216
  
@srowen Oh, I missed your comment about the loop brace; it's added now, thanks!





[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14216
  
**[Test build #62742 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62742/consoleFull)**
 for PR 14216 at commit 
[`25b6fde`](https://github.com/apache/spark/commit/25b6fde3ecfdaae2873552064044fed15e7f7374).





[GitHub] spark issue #14086: [SPARK-16463][SQL] Support `truncate` option in Overwrit...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14086
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62741/
Test PASSed.





[GitHub] spark issue #14086: [SPARK-16463][SQL] Support `truncate` option in Overwrit...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14086
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14086: [SPARK-16463][SQL] Support `truncate` option in Overwrit...

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14086
  
**[Test build #62741 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62741/consoleFull)**
 for PR 14086 at commit 
[`e989b3e`](https://github.com/apache/spark/commit/e989b3ea66170f28531652ffcf1bab1af6703329).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14304: [SPARK-16668][TEST] Test parquet reader for row groups c...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14304
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62740/
Test PASSed.





[GitHub] spark issue #14304: [SPARK-16668][TEST] Test parquet reader for row groups c...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14304
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14304: [SPARK-16668][TEST] Test parquet reader for row groups c...

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14304
  
**[Test build #62740 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62740/consoleFull)**
 for PR 14304 at commit 
[`16e6b91`](https://github.com/apache/spark/commit/16e6b91688ad8f73336b7729745189e2bd7f880f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14321: [SPARK-8971][ML] Add stratified sampling to ML CrossVali...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14321
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14321: [SPARK-8971][ML] Add stratified sampling to ML CrossVali...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14321
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62738/
Test PASSed.





[GitHub] spark issue #14321: [SPARK-8971][ML] Add stratified sampling to ML CrossVali...

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14321
  
**[Test build #62738 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62738/consoleFull)**
 for PR 14321 at commit 
[`37be0b5`](https://github.com/apache/spark/commit/37be0b5c6a0a4bd6fcc4a0f59c5f575ef6f623ae).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14319: [SPARK-16635] [WEBUI] [SQL] [WIP] Provide Session suppor...

2016-07-22 Thread nblintao
Github user nblintao commented on the issue:

https://github.com/apache/spark/pull/14319
  
Thanks, @ajbozarth.
Yes, I think the configuration alone is not enough for a new tab. @yhuai 
and I actually plan to do more on this tab. As mentioned in 
[JIRA](https://issues.apache.org/jira/browse/SPARK-16635):
> In Spark 2.0, SparkSession will be the entry point of spark. We can think
> about how to display per-session info (like configurations) and jobs in a good
> way. We can start to experiment a Session tab and see if we can come up with a
> way to just show a single session's info in a nice way. Then, we can see if
> this new tab can replace any existing tab.
So this is not final; please make more suggestions. I've put WIP in the title
to avoid being misleading.
Also, I will try to fix the error soon.






[GitHub] spark issue #14295: [SPARK-16648][SQL] Overrides TreeNode.withNewChildren in...

2016-07-22 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/14295
  
@liancheng  Can you also change `First`? I think that one is also broken 
for this case.





[GitHub] spark issue #14086: [SPARK-16463][SQL] Support `truncate` option in Overwrit...

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14086
  
**[Test build #62741 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62741/consoleFull)**
 for PR 14086 at commit 
[`e989b3e`](https://github.com/apache/spark/commit/e989b3ea66170f28531652ffcf1bab1af6703329).





[GitHub] spark issue #14304: [SPARK-16668][TEST] Test parquet reader for row groups c...

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14304
  
**[Test build #62740 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62740/consoleFull)**
 for PR 14304 at commit 
[`16e6b91`](https://github.com/apache/spark/commit/16e6b91688ad8f73336b7729745189e2bd7f880f).





[GitHub] spark issue #14086: [SPARK-16463][SQL] Support `truncate` option in Overwrit...

2016-07-22 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14086
  
Rebased.





[GitHub] spark issue #14304: [SPARK-16668][TEST] Test parquet reader for row groups c...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14304
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62739/
Test FAILed.





[GitHub] spark issue #14304: [SPARK-16668][TEST] Test parquet reader for row groups c...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14304
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14304: [SPARK-16668][TEST] Test parquet reader for row groups c...

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14304
  
**[Test build #62739 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62739/consoleFull)**
 for PR 14304 at commit 
[`4f98c7f`](https://github.com/apache/spark/commit/4f98c7fc9c91893d22b78ed693d9a8f33bbb1146).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14304: [SPARK-16668][TEST] Test parquet reader for row groups c...

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14304
  
**[Test build #62739 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62739/consoleFull)**
 for PR 14304 at commit 
[`4f98c7f`](https://github.com/apache/spark/commit/4f98c7fc9c91893d22b78ed693d9a8f33bbb1146).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14304: [SPARK-16668][TEST] Test parquet reader for row g...

2016-07-22 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request:

https://github.com/apache/spark/pull/14304#discussion_r71954942
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetEncodingSuite.scala
 ---
@@ -78,4 +78,30 @@ class ParquetEncodingSuite extends 
ParquetCompatibilityTest with SharedSQLContex
   }}
 }
   }
+
+  test("Read row group containing both dictionary and plain encoded 
pages") {
+spark.conf.set("parquet.dictionary.page.size", "2048")
+spark.conf.set("parquet.page.size", "4096")
+
+withTempPath { dir =>
+  // In order to explicitly test for SPARK-14217, we set the parquet 
dictionary and page size
+  // such that the following data spans across 3 pages (within a 
single row group) where the
+  // first page is dictionary encoded and the remaining two are plain 
encoded.
+  val data = (0 until 512).flatMap(i => Seq.fill(3)(i.toString))
+  data.toDF("f").coalesce(1).write.parquet(dir.getCanonicalPath)
+  val file =
+
SpecificParquetRecordReaderBase.listDirectory(dir).toArray.head.asInstanceOf[String]
+
+  val reader = new VectorizedParquetRecordReader
+  reader.initialize(file, null /* set columns to null to project all 
columns */)
+  val column = reader.resultBatch().column(0)
+  assert(reader.nextBatch())
+
+  (0 until 512).foreach { i =>
+assert(column.getUTF8String(3 * i).toString == i.toString)
--- End diff --

Seems like there's no `toInt` function in 
`org.apache.spark.unsafe.types.UTF8String`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14304: [SPARK-16668][TEST] Test parquet reader for row g...

2016-07-22 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request:

https://github.com/apache/spark/pull/14304#discussion_r71954540
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetEncodingSuite.scala
 ---
@@ -78,4 +78,30 @@ class ParquetEncodingSuite extends 
ParquetCompatibilityTest with SharedSQLContex
   }}
 }
   }
+
+  test("Read row group containing both dictionary and plain encoded 
pages") {
+spark.conf.set("parquet.dictionary.page.size", "2048")
+spark.conf.set("parquet.page.size", "4096")
+
+withTempPath { dir =>
+  // In order to explicitly test for SPARK-14217, we set the parquet 
dictionary and page size
+  // such that the following data spans across 3 pages (within a 
single row group) where the
+  // first page is dictionary encoded and the remaining two are plain 
encoded.
+  val data = (0 until 512).flatMap(i => Seq.fill(3)(i.toString))
+  data.toDF("f").coalesce(1).write.parquet(dir.getCanonicalPath)
+  val file =
+
SpecificParquetRecordReaderBase.listDirectory(dir).toArray.head.asInstanceOf[String]
+
+  val reader = new VectorizedParquetRecordReader
+  reader.initialize(file, null /* set columns to null to project all 
columns */)
+  val column = reader.resultBatch().column(0)
+  assert(reader.nextBatch())
+
+  (0 until 512).foreach { i =>
+assert(column.getUTF8String(3 * i).toString == i.toString)
--- End diff --

Ah, gotcha!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14248: [SPARK-16589][PYTHON] Chained cartesian produces incorre...

2016-07-22 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/14248
  
@holdenk Can we move this discussion to JIRA?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14079
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14079
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62737/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14079
  
**[Test build #62737 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62737/consoleFull)**
 for PR 14079 at commit 
[`8a12adf`](https://github.com/apache/spark/commit/8a12adf445b00e8841eb3df071c0b6adee6c16da).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14321: [SPARK-8971][ML] Add stratified sampling to ML CrossVali...

2016-07-22 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/14321
  
cc @MLnick @hhbyyh @mengxr I believe there is still interest in stratified 
sampling methods. Could you provide feedback/review on this patch? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14321: [SPARK-8971][ML] Add stratified sampling to ML CrossVali...

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14321
  
**[Test build #62738 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62738/consoleFull)**
 for PR 14321 at commit 
[`37be0b5`](https://github.com/apache/spark/commit/37be0b5c6a0a4bd6fcc4a0f59c5f575ef6f623ae).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14321: [SPARK-8971][ML] Add stratified sampling to ML Cr...

2016-07-22 Thread sethah
GitHub user sethah opened a pull request:

https://github.com/apache/spark/pull/14321

[SPARK-8971][ML] Add stratified sampling to ML CrossValidator and 
TrainValidationSplit

## What changes were proposed in this pull request?

This patch adds the ability to do stratified sampling in cross validation 
for ML pipelines. This is accomplished by modifying some of the methods in 
`StratifiedSamplingUtils` to support multiple _splits_ instead of a single 
subsample of the data. A method is added to `PairRDDFunctions` to support 
`randomSplitByKey`. Please see the detailed explanation below.


## How was this patch tested?

Unit tests were added to `PairRDDFunctionsSuite`, `MLUtilsSuite`, 
`CrossValidatorSuite`, and `TrainValidationSuite`.

## Algorithm changes

Currently, Spark implements stratified sampling on pair RDDs via the methods 
`sampleByKeyExact` and `sampleByKey`. These methods call a stratified sampling 
routine implemented in `StratifiedSamplingUtils`. The underlying algorithm is 
described [here](http://jmlr.org/proceedings/papers/v28/meng13a.pdf) in the paper by 
Xiangrui Meng. When exact stratified samples are required, the algorithm makes an 
extra pass through the data. Each item is mapped onto the interval [0, 1] (for 
sampling without replacement), and we expect that, say for a 50% sample, we will 
split the interval at 0.5 and accept the items that fall below that threshold. Items 
near 0 are highly likely to be accepted, while items near 1 are highly unlikely to be 
accepted. Items near 0.5 are uncertain, and are added to a waitlist on the first 
pass. The items in the waitlist are then sorted and used to determine the exact split 
point which produces a 50/50 sample. 


![image](https://cloud.githubusercontent.com/assets/7275795/17071720/2f90bd14-5018-11e6-91c2-2c1dc2191213.png)

This patch modifies the routine to produce multiple splits by generating 
multiple waitlists on the first pass. Each waitlist is sorted to determine the 
exact split points and then we can sample as normal. 


![image](https://cloud.githubusercontent.com/assets/7275795/17071744/4f3b6d76-5018-11e6-8591-857eaaf288f9.png)

One potential concern is that if this is used for a large number of splits, 
it may degrade to the point where sorting the entire dataset would be quicker, 
as the waitlists get closer and closer together. It could potentially cause OOM 
errors on the driver if there are too many waitlists collected. Still, before 
this patch there was not a way to actually take a single _split_ of the data, 
as `sampleByKey` does not return the complement of the sample. This patch fixes 
this as well.
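
As a rough illustration of the waitlist idea (a minimal sketch only, not the code in 
this patch; the helper name, seed, and slack band are made up), exact sampling of a 
single stratum could look like:

```scala
import scala.util.Random

// Minimal single-stratum sketch of exact sampling via a waitlist (illustrative only).
// Items whose uniform draw is clearly below the threshold are accepted in one pass;
// the uncertain band around the threshold is kept in a waitlist, sorted, and cut exactly.
def exactSample[T](items: Seq[T], fraction: Double, slack: Double = 0.05): Seq[T] = {
  val rng = new Random(42L)
  val keyed = items.map(x => (rng.nextDouble(), x))
  val accepted = keyed.filter(_._1 < fraction - slack)
  val waitlist = keyed.filter { case (k, _) => k >= fraction - slack && k <= fraction + slack }
  val stillNeeded = math.round(fraction * items.size).toInt - accepted.size
  // Sorting only the (small) waitlist determines the exact split point.
  accepted.map(_._2) ++ waitlist.sortBy(_._1).take(math.max(stillNeeded, 0)).map(_._2)
}
```

This patch generalizes that idea by keeping one waitlist per split, so all split 
points can be determined from the waitlists alone without sorting the full dataset.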

## ML API

This patch also allows users to specify a stratified column in the 
`CrossValidator` and `TrainValidationSplit` estimators. This is done by 
converting the input DataFrame to a pair RDD and calling the `randomSplitByKey` 
method. It is exposed via a `setStratifiedCol` parameter which, if set, will 
use _exact_ stratified splits for cross validation. 
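
A hedged usage sketch of the proposed parameter (only `setStratifiedCol` comes from 
this patch; the surrounding pipeline, column names, and the `trainingDF` DataFrame 
are placeholders):

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// Standard CrossValidator setup; `setStratifiedCol` is the parameter proposed here.
val lr = new LogisticRegression()
val grid = new ParamGridBuilder().addGrid(lr.regParam, Array(0.01, 0.1)).build()

val cv = new CrossValidator()
  .setEstimator(lr)
  .setEvaluator(new BinaryClassificationEvaluator())
  .setEstimatorParamMaps(grid)
  .setNumFolds(3)
  .setStratifiedCol("label")  // proposed in this patch: exact stratified folds

// val cvModel = cv.fit(trainingDF)  // `trainingDF` is assumed to exist
```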

## Future considerations

This can be implemented as a function on DataFrames in the future, if there 
is interest. It is somewhat inconvenient to convert the DataFrame to a pair RDD, 
perform the sampling, and then convert back to a DataFrame. 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sethah/spark Working_on_SPARK-8971

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14321.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14321


commit a058cd8107666cb8bc5dd090fd1c52aadd896304
Author: sethah 
Date:   2015-08-08T00:04:13Z

Adding stratified sampling to cross validation and train validation split 
in ml/tuning

commit 5f244d1cb5bd747e7383a85b54394a2fa9efa32e
Author: sethah 
Date:   2015-08-10T22:26:38Z

Adding some tests and style fixes

commit 67f60027158fae37d3f3973fd22217298097ebd7
Author: sethah 
Date:   2016-04-22T14:51:13Z

Refactor for efficiency when computing multiple waitlists.

commit 37be0b5c6a0a4bd6fcc4a0f59c5f575ef6f623ae
Author: sethah 
Date:   2016-07-22T17:16:31Z

Move some logic back into SSUtils




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14319: [SPARK-16635][WEBUI][SQL] Provide Session support in the...

2016-07-22 Thread ajbozarth
Github user ajbozarth commented on the issue:

https://github.com/apache/spark/pull/14319
  
I've read through your code and didn't catch any issues; I also checked it 
out and it looks good. I think this is a nice feature to add; my only qualm is 
that it adds yet another tab to the Web UI. If everyone is OK with adding another 
tab, I'm OK with it, but could it make sense to just put the table in the env tab? 
Opinions? @srowen @tgravescs 
Also, I think you missed something in your updates to the tests, but I 
couldn't spot it on a first look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...

2016-07-22 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/9766
  
I was looking at some similar stuff as part of 
https://github.com/apache/spark/pull/13571 and I was thinking that (to match 
the Scala API) it would be good to return the UDF object as well, so people can 
use it programmatically with the DataFrame API instead of just being limited to 
using it inside of SQL queries.
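
For reference, the Scala `register` already returns a `UserDefinedFunction` that can 
be used directly with DataFrames. A minimal sketch (assuming a `SparkSession` named 
`spark`, a DataFrame `df` with a string column `name`, and a temp view `people`):

```scala
import org.apache.spark.sql.functions.col

// register() returns the UDF, so it is usable both in SQL and programmatically.
val strLen = spark.udf.register("strLen", (s: String) => s.length)

df.select(strLen(col("name")))                 // DataFrame API
spark.sql("SELECT strLen(name) FROM people")   // SQL
```

Returning the UDF object from the Python `register` call would enable the same 
pattern in PySpark.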


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14233: [SPARK-16490] [Examples] added a python example for chis...

2016-07-22 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/14233
  
Thanks for taking this on! More documentation is always an improvement. 
Looking at the Scala & Java examples, it seems like they are included in 
./docs/mllib-feature-extraction.md; it would probably be good to do the same 
for the Python example as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13932: [SPARK-15354] [CORE] [WIP] Topology aware block replicat...

2016-07-22 Thread shubhamchopra
Github user shubhamchopra commented on the issue:

https://github.com/apache/spark/pull/13932
  
Based on feedback from @rxin, I added a Basic Strategy that replicates HDFS 
behavior as a simpler alternative to the constraint solver. I also ran some 
performance tests on the constraint solver and saw these numbers:

![image](https://cloud.githubusercontent.com/assets/6588487/17070321/372b28be-5029-11e6-97ce-a89ce90b6fa6.png)

The times show the average, min, and max of 50 runs of the optimizer for 50, 
100, ..., 10 peers placed in an appropriate number of racks. When blocks are 
being replicated, the majority of the time is expected to be spent in the actual 
data movement across the network. These numbers show that the performance hit 
from the constraint solver can be expected to be minimal.
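
To make the comparison concrete, here is a minimal sketch of what an HDFS-like 
placement heuristic could look like (illustrative only, not the code in this PR; 
the `Peer` type and helper are made up):

```scala
// Illustrative only: keep one replica on the local rack, spread the next replicas
// across distinct remote racks, then fall back to any remaining remote peers.
case class Peer(host: String, rack: String)

def basicPlacement(candidates: Seq[Peer], localRack: String, numReplicas: Int): Seq[Peer] = {
  val (local, remote) = candidates.partition(_.rack == localRack)
  val onePerRemoteRack = remote.groupBy(_.rack).values.map(_.head).toSeq
  (local.take(1) ++ onePerRemoteRack ++ remote).distinct.take(numReplicas)
}
```

The constraint solver handles more general placement constraints, which is what the 
timing numbers above are measuring.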



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14269: [SPARK-15703] [Scheduler][Core][WebUI] Make ListenerBus ...

2016-07-22 Thread dhruve
Github user dhruve commented on the issue:

https://github.com/apache/spark/pull/14269
  
The failures from [Test build 
#62733](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62733/consoleFull)
 are unrelated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14314: [SPARK-16678] [SPARK-16677] [SQL] Fix two View-re...

2016-07-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14314#discussion_r71939503
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala 
---
@@ -55,6 +54,76 @@ class SQLViewSuite extends QueryTest with SQLTestUtils 
with TestHiveSingleton {
 }
   }
 
+  test("error handling: existing a table with the duplicate name when 
creating/altering a view") {
+withTable("tab1") {
+  sql("CREATE TABLE tab1 (id int)")
+  var e = intercept[AnalysisException] {
+sql("CREATE OR REPLACE VIEW tab1 AS SELECT * FROM jt")
+  }.getMessage
+  assert(e.contains("The following is an existing table, not a view: 
`default`.`tab1`"))
+  e = intercept[AnalysisException] {
+sql("CREATE VIEW tab1 AS SELECT * FROM jt")
--- End diff --

```
hive> CREATE VIEW tab1 AS SELECT * FROM t1;
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask. AlreadyExistsException(message:Table 
tab1 already exists)
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14309: [SPARK-11977][SQL] Support accessing a column contains "...

2016-07-22 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/14309
  
I'm not sure I am a good reviewer for this, as I don't fully understand the 
consequences inside SQL of this change. cc @liancheng @rxin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14314: [SPARK-16678] [SPARK-16677] [SQL] Fix two View-re...

2016-07-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14314#discussion_r71939188
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala 
---
@@ -55,6 +54,76 @@ class SQLViewSuite extends QueryTest with SQLTestUtils 
with TestHiveSingleton {
 }
   }
 
+  test("error handling: existing a table with the duplicate name when 
creating/altering a view") {
+withTable("tab1") {
+  sql("CREATE TABLE tab1 (id int)")
+  var e = intercept[AnalysisException] {
+sql("CREATE OR REPLACE VIEW tab1 AS SELECT * FROM jt")
+  }.getMessage
+  assert(e.contains("The following is an existing table, not a view: 
`default`.`tab1`"))
+  e = intercept[AnalysisException] {
+sql("CREATE VIEW tab1 AS SELECT * FROM jt")
+  }.getMessage
+  assert(e.contains("The following is an existing table, not a view: 
`default`.`tab1`"))
+  e = intercept[AnalysisException] {
+sql("ALTER VIEW tab1 AS SELECT * FROM jt")
+  }.getMessage
+  assert(e.contains("The following is an existing table, not a view: 
`default`.`tab1`"))
+}
+  }
+
+  test("existing a table with the duplicate name when CREATE VIEW IF NOT 
EXISTS") {
+withTable("tab1") {
+  sql("CREATE TABLE tab1 (id int)")
+  sql("CREATE VIEW IF NOT EXISTS tab1 AS SELECT * FROM jt")
+  checkAnswer(sql("select count(*) FROM tab1"), Row(0))
+}
+  }
+
+  test("error handling: insert/load/truncate table commands against a temp 
view") {
+val viewName = "testView"
+withView(viewName) {
--- End diff --

Will fix it after your PR is merged: 
https://github.com/apache/spark/pull/14318

Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14314: [SPARK-16678] [SPARK-16677] [SQL] Fix two View-re...

2016-07-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14314#discussion_r71938799
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLViewSuite.scala 
---
@@ -55,6 +54,76 @@ class SQLViewSuite extends QueryTest with SQLTestUtils 
with TestHiveSingleton {
 }
   }
 
+  test("error handling: existing a table with the duplicate name when 
creating/altering a view") {
+withTable("tab1") {
+  sql("CREATE TABLE tab1 (id int)")
+  var e = intercept[AnalysisException] {
+sql("CREATE OR REPLACE VIEW tab1 AS SELECT * FROM jt")
+  }.getMessage
+  assert(e.contains("The following is an existing table, not a view: 
`default`.`tab1`"))
+  e = intercept[AnalysisException] {
+sql("CREATE VIEW tab1 AS SELECT * FROM jt")
--- End diff --

For this case, it shows `table already existed`. However, when we find that a 
view/table already exists, the error message is wrong:
```
View $tableIdentifier already exists. If you want to update the view definition,
please use ALTER VIEW AS or CREATE OR REPLACE VIEW AS
```
We are unable to alter a table (that is not a view) with these commands: 
`ALTER VIEW AS` or `CREATE OR REPLACE VIEW AS`.

See the [source code](https://github.com/gatorsmile/spark/blob/c64092c6aaec42663278343a27467c1c8c165b92/sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala#L119-L124)



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14240: [SPARK-16594] [SQL] Remove Physical Plan Differen...

2016-07-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14240#discussion_r71938022
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/sources/PrunedScanSuite.scala ---
@@ -114,16 +114,15 @@ class PrunedScanSuite extends DataSourceTest with 
SharedSQLContext {
   testPruning("SELECT * FROM oneToTenPruned", "a", "b")
   testPruning("SELECT a, b FROM oneToTenPruned", "a", "b")
   testPruning("SELECT b, a FROM oneToTenPruned", "b", "a")
-  testPruning("SELECT b, b FROM oneToTenPruned", "b")
+  testPruning("SELECT b, b FROM oneToTenPruned", "b", "b")
+  testPruning("SELECT b as alias_b, b FROM oneToTenPruned", "b")
--- End diff --

Found something interesting in File Source Scan. Will submit a separate PR 
to resolve it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14240: [SPARK-16594] [SQL] Remove Physical Plan Differen...

2016-07-22 Thread gatorsmile
Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/14240


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14240: [SPARK-16594] [SQL] Remove Physical Plan Differences whe...

2016-07-22 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14240
  
Since File Scan is completely different from Data Source Table Scan, Hive 
Table Scan, and In-memory Table Scan, it does not make sense to make all of them 
consistent. Closing it now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #2135: [SPARK-3229] spark.shuffle.safetyFraction and spark.stora...

2016-07-22 Thread hastimal
Github user hastimal commented on the issue:

https://github.com/apache/spark/pull/2135
  
@andrewor14 Thank you for this info.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14079
  
**[Test build #62737 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62737/consoleFull)**
 for PR 14079 at commit 
[`8a12adf`](https://github.com/apache/spark/commit/8a12adf445b00e8841eb3df071c0b6adee6c16da).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-22 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/14079
  
Jenkins, retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14275: [SPARK-16637] Unified containerizer

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14275
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14275: [SPARK-16637] Unified containerizer

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14275
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62735/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14275: [SPARK-16637] Unified containerizer

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14275
  
**[Test build #62735 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62735/consoleFull)**
 for PR 14275 at commit 
[`be145c7`](https://github.com/apache/spark/commit/be145c788806507992402f0ed3b2a9c631374d51).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14288: [SPARK-16651][PYSPARK][DOC] Make `withColumnRenamed/drop...

2016-07-22 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/14288
  
Thank you for review and merging, @srowen and @rxin .


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14079
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14079
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62734/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14079
  
**[Test build #62734 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62734/consoleFull)**
 for PR 14079 at commit 
[`8a12adf`](https://github.com/apache/spark/commit/8a12adf445b00e8841eb3df071c0b6adee6c16da).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14269: [SPARK-15703] [Scheduler][Core][WebUI] Make ListenerBus ...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14269
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14269: [SPARK-15703] [Scheduler][Core][WebUI] Make ListenerBus ...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14269
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62732/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14269: [SPARK-15703] [Scheduler][Core][WebUI] Make ListenerBus ...

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14269
  
**[Test build #62732 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62732/consoleFull)**
 for PR 14269 at commit 
[`889fe66`](https://github.com/apache/spark/commit/889fe66b849ab789878fa3dbfe17c8b0fb4681eb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14320: [SPARK-16416] [Core] force eager creation of loggers to ...

2016-07-22 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14320
  
This doesn't seem to be the change discussed in the JIRA


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14269: [SPARK-15703] [Scheduler][Core][WebUI] Make ListenerBus ...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14269
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14269: [SPARK-15703] [Scheduler][Core][WebUI] Make ListenerBus ...

2016-07-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14269
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62733/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashMapGener...

2016-07-22 Thread sameeragarwal
Github user sameeragarwal commented on the issue:

https://github.com/apache/spark/pull/14174
  
LGTM. cc @ericl


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14269: [SPARK-15703] [Scheduler][Core][WebUI] Make ListenerBus ...

2016-07-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14269
  
**[Test build #62733 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62733/consoleFull)**
 for PR 14269 at commit 
[`889fe66`](https://github.com/apache/spark/commit/889fe66b849ab789878fa3dbfe17c8b0fb4681eb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14292: [SPARK-14131][SQL[STREAMING] Improved fix for avo...

2016-07-22 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/14292#discussion_r71924307
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala ---
@@ -247,6 +248,46 @@ private[sql] trait SQLTestUtils
   }
 }
   }
+
+  /** Run a test on a separate [[UninterruptibleThread]]. */
+  protected def testWithUninterruptibleThread(name: String, quietly: 
Boolean = false)
+(body: => Unit): Unit = {
+val timeoutMillis = 1
+var ex: Throwable = null
+
+def runOnThread(): Unit = {
+  val thread = new UninterruptibleThread(s"Testing thread for test 
$name") {
+override def run(): Unit = {
+  try {
+body
+  } catch {
+case NonFatal(e) =>
+  ex = e
+  }
+}
+  }
+  thread.setDaemon(true)
+  thread.start()
+  thread.join(timeoutMillis)
+  if (thread.isAlive) {
+thread.interrupt()
+// If this interrupt does not work, then this thread is most 
likely running something that
+// is not interruptible. There is not much point to wait for the 
thread to termniate, and
+// we rather let the JVM terminate the thread on exit.
+fail(
+  s"Test '$name' running on o.a.s.util.UninterruptibleThread timed 
out after" +
+s" $timeoutMillis ms")
+  } else if (ex != null) {
+throw ex
+  }
+}
+
+if (quietly) {
--- End diff --

It's more Scala-ish, but slightly non-intuitive to read. Maybe rename `f` to 
`testingFunc`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashM...

2016-07-22 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request:

https://github.com/apache/spark/pull/14174#discussion_r71924045
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatch.java
 ---
@@ -0,0 +1,182 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions;
+
+import java.io.IOException;
+
+import org.apache.spark.memory.MemoryConsumer;
+import org.apache.spark.memory.TaskMemoryManager;
+import org.apache.spark.sql.types.*;
+import org.apache.spark.unsafe.memory.MemoryBlock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+/**
+ * RowBasedKeyValueBatch stores key value pairs in contiguous memory 
region.
+ *
+ * Each key or value is stored as a single UnsafeRow. Each record contains 
one key and one value
+ * and some auxiliary data, which differs based on implementation:
+ * i.e., `FixedLengthRowBasedKeyValueBatch` and 
`VariableLengthRowBasedKeyValueBatch`.
+ *
+ * We use `FixedLengthRowBasedKeyValueBatch` if all fiends in the key and 
the value are fixed-length
+ * data types. Otherwise we use `VariableLengthRowBasedKeyValueBatch`.
+ *
+ * RowBasedKeyValueBatch is backed by a single page / MemoryBlock 
(defaults to 64MB). If the page
+ * is full, the aggregate logic should fallback to a second level, larger 
hash map. We intentionally
+ * use the single-page design because it simplifies memory address 
encoding & decoding for each
+ * key-value pair. Because the maximum capacity for RowBasedKeyValueBatch 
is only 2^16, it is
+ * unlikely we need a second page anyway. Filling the page requires an 
average size for key value
+ * pairs to be larger than 1024 bytes.
+ *
+ */
+public abstract class RowBasedKeyValueBatch extends MemoryConsumer {
+  protected final Logger logger = 
LoggerFactory.getLogger(RowBasedKeyValueBatch.class);
+
+  protected static final int DEFAULT_CAPACITY = 1 << 16;
--- End diff --

This and `DEFAULT_PAGE_SIZE` can probably be private?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14292: [SPARK-14131][SQL[STREAMING] Improved fix for avo...

2016-07-22 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/14292#discussion_r71923960
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala ---
@@ -247,6 +248,46 @@ private[sql] trait SQLTestUtils
   }
 }
   }
+
+  /** Run a test on a separate [[UninterruptibleThread]]. */
+  protected def testWithUninterruptibleThread(name: String, quietly: 
Boolean = false)
+(body: => Unit): Unit = {
+val timeoutMillis = 1
+var ex: Throwable = null
+
+def runOnThread(): Unit = {
+  val thread = new UninterruptibleThread(s"Testing thread for test 
$name") {
+override def run(): Unit = {
+  try {
+body
+  } catch {
+case NonFatal(e) =>
+  ex = e
--- End diff --

My bad. `ex` needs to be `@volatile`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14174: [SPARK-16524][SQL] Add RowBatch and RowBasedHashM...

2016-07-22 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request:

https://github.com/apache/spark/pull/14174#discussion_r71923918
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatch.java
 ---
@@ -0,0 +1,182 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.catalyst.expressions;
+
+import java.io.IOException;
+
+import org.apache.spark.memory.MemoryConsumer;
+import org.apache.spark.memory.TaskMemoryManager;
+import org.apache.spark.sql.types.*;
+import org.apache.spark.unsafe.memory.MemoryBlock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+/**
+ * RowBasedKeyValueBatch stores key value pairs in contiguous memory 
region.
+ *
+ * Each key or value is stored as a single UnsafeRow. Each record contains 
one key and one value
+ * and some auxiliary data, which differs based on implementation:
+ * i.e., `FixedLengthRowBasedKeyValueBatch` and 
`VariableLengthRowBasedKeyValueBatch`.
+ *
+ * We use `FixedLengthRowBasedKeyValueBatch` if all fiends in the key and 
the value are fixed-length
--- End diff --

nit: fields


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


