[GitHub] spark pull request #19095: [SPARK-21886][SQL] Use SparkSession.internalCreat...

2017-08-31 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/19095

[SPARK-21886][SQL] Use SparkSession.internalCreateDataFrame to create…

… Dataset with LogicalRDD logical operator

## What changes were proposed in this pull request?

Reusing `SparkSession.internalCreateDataFrame` wherever possible (to cut duplication).
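
For context, a minimal sketch of the helper being reused (signature per the Spark 2.x sources; the body is abbreviated, so treat it as an illustration rather than the exact code):

```
// SparkSession (sketch): the single place that turns an RDD[InternalRow]
// plus a schema into a Dataset backed by a LogicalRDD logical operator.
private[sql] def internalCreateDataFrame(
    catalystRows: RDD[InternalRow],
    schema: StructType,
    isStreaming: Boolean = false): DataFrame = {
  val logicalPlan = LogicalRDD(schema.toAttributes, catalystRows, isStreaming = isStreaming)(self)
  Dataset.ofRows(self, logicalPlan)
}
```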

## How was this patch tested?

Local build and waiting for Jenkins


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark 
SPARK-21886-internalCreateDataFrame

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19095.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19095


commit fff4b6131cfb2143af465d28ab3cde4fb6af313e
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2017-08-31T10:44:24Z

[SPARK-21886][SQL] Use SparkSession.internalCreateDataFrame to create 
Dataset with LogicalRDD logical operator







[GitHub] spark issue #19089: [SPARK-21728][core] Follow up: fix user config, auth in ...

2017-08-30 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/19089
  
Logs are back with the change. 👍 Thanks (and don't mess it up again while 
fixing STS :))





[GitHub] spark pull request #19056: [SPARK-21765] Check that optimization doesn't aff...

2017-08-30 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19056#discussion_r135989439
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/socket.scala 
---
@@ -130,16 +130,7 @@ class TextSocketSource(host: String, port: Int, includeTimestamp: Boolean, sqlCo
 
 val rdd = sqlContext.sparkContext.parallelize(rawList).map(
 v => InternalRow(UTF8String.fromString(v._1), v._2.getTime()))
--- End diff --

May I ask to replace `v` with `case...` as follows? IMHO that would make 
things easier to read.

```
val rdd = sqlContext.sparkContext.
  parallelize(rawList).
  map { case (v, ts) => InternalRow(UTF8String.fromString(v), ts.getTime) }
```





[GitHub] spark pull request #19056: [SPARK-21765] Check that optimization doesn't aff...

2017-08-28 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19056#discussion_r135610992
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/socket.scala 
---
@@ -126,16 +128,17 @@ class TextSocketSource(host: String, port: Int, includeTimestamp: Boolean, sqlCo
   batches.slice(sliceStart, sliceEnd)
 }
 
-import sqlContext.implicits._
-val rawBatch = sqlContext.createDataset(rawList)
+val rdd = sqlContext.sparkContext.parallelize(rawList).map(
+v => InternalRow(UTF8String.fromString(v._1), v._2.getTime()))
+val rawBatch = sqlContext.internalCreateDataFrame(rdd, schema, isStreaming = true)
 
 // Underlying MemoryStream has schema (String, Timestamp); strip out 
the timestamp
 // if requested.
 if (includeTimestamp) {
   rawBatch.toDF("value", "timestamp")
--- End diff --

I think the schema already determines which fields are included in 
`rawBatch`, so this `if` is no longer necessary.





[GitHub] spark pull request #19056: [SPARK-21765] Check that optimization doesn't aff...

2017-08-28 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19056#discussion_r135610632
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/socket.scala 
---
@@ -126,16 +128,17 @@ class TextSocketSource(host: String, port: Int, includeTimestamp: Boolean, sqlCo
   batches.slice(sliceStart, sliceEnd)
 }
 
-import sqlContext.implicits._
-val rawBatch = sqlContext.createDataset(rawList)
+val rdd = sqlContext.sparkContext.parallelize(rawList).map(
+v => InternalRow(UTF8String.fromString(v._1), v._2.getTime()))
+val rawBatch = sqlContext.internalCreateDataFrame(rdd, schema, isStreaming = true)
 
 // Underlying MemoryStream has schema (String, Timestamp); strip out 
the timestamp
 // if requested.
 if (includeTimestamp) {
   rawBatch.toDF("value", "timestamp")
 } else {
   // Strip out timestamp
-  rawBatch.select("_1").toDF("value")
+  rawBatch.select("value").toDF()
--- End diff --

`toDF` is unnecessary since it's already a DataFrame.





[GitHub] spark pull request #19056: [SPARK-21765] Check that optimization doesn't aff...

2017-08-28 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/19056#discussion_r135610234
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -39,6 +39,16 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
 
   protected def fixedPoint = FixedPoint(SQLConf.get.optimizerMaxIterations)
 
+  override protected def checkInvariants(
+  result: LogicalPlan,
+  original: LogicalPlan,
+  rule: Rule[LogicalPlan]): Unit = {
+assert(
+  result.isStreaming == original.isStreaming,
+  s"Rule ${rule.ruleName} changed isStreaming from original 
${original.isStreaming}:" +
--- End diff --

Add a space before the closing `"`?





[GitHub] spark issue #18642: [MINOR][REFACTORING] KeyValueGroupedDataset.mapGroupsWit...

2017-07-20 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/18642
  
@zsxwing @tdas Could you review the change and let me know what you think? 
I'd appreciate it. Thanks.





[GitHub] spark issue #18642: [MINOR][REFACTORING] KeyValueGroupedDataset.mapGroupsWit...

2017-07-17 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/18642
  
@zsxwing @tdas Your friendly reminder to give the change a review. I'd 
appreciate it. Thanks.





[GitHub] spark pull request #18642: [MINOR][REFACTORING] KeyValueGroupedDataset.mapGr...

2017-07-15 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/18642

[MINOR][REFACTORING] KeyValueGroupedDataset.mapGroupsWithState uses 
flatMapGroupsWithState

## What changes were proposed in this pull request?

Refactored `KeyValueGroupedDataset.mapGroupsWithState` to use 
`flatMapGroupsWithState` explicitly, so it's clear that the former is just a 
special case of the latter (see the sketch below).
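
A hedged sketch of the idea (not necessarily the merged code): wrap the user function in a single-element `Iterator` and delegate.

```
// Inside KeyValueGroupedDataset[K, V] (sketch): mapGroupsWithState expressed
// in terms of flatMapGroupsWithState. Update mode and no timeout match the
// semantics of the no-arg mapGroupsWithState overload.
def mapGroupsWithState[S: Encoder, U: Encoder](
    func: (K, Iterator[V], GroupState[S]) => U): Dataset[U] =
  flatMapGroupsWithState[S, U](OutputMode.Update(), GroupStateTimeout.NoTimeout) {
    (key: K, values: Iterator[V], state: GroupState[S]) =>
      Iterator(func(key, values, state))
  }
```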

## How was this patch tested?

local build

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark mapGroupsWithState

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18642.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18642


commit ce51466411983181b52cac07610d2bece2ffc5e8
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2017-07-15T09:31:55Z

[MINOR][REFACTORING] KeyValueGroupedDataset.mapGroupsWithState uses 
flatMapGroupsWithState







[GitHub] spark issue #18539: [SPARK-21313][SS] ConsoleSink's string representation

2017-07-06 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/18539
  
If you know how to display the `ForeachWriter` that's passed to 
`ForeachSink` nicely, let me know. `getClass.getName` didn't convince me, so 
I left it out. It'd be very helpful to see what the implementation is, though.





[GitHub] spark pull request #18509: [SS][MINOR] Make EventTimeWatermarkExec explicitl...

2017-07-06 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/18509#discussion_r125874046
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/EventTimeWatermarkExec.scala
 ---
@@ -81,7 +81,7 @@ class EventTimeStatsAccum(protected var currentStats: EventTimeStats = EventTime
 case class EventTimeWatermarkExec(
 eventTime: Attribute,
 delay: CalendarInterval,
-child: SparkPlan) extends SparkPlan {
+child: SparkPlan) extends UnaryExecNode {
--- End diff --

I disagree that the simple change requires a JIRA since it's about making 
the code more readable, but here it is anyway: 
https://issues.apache.org/jira/browse/SPARK-21329





[GitHub] spark issue #18539: [SPARK-21313][SS] ConsoleSink's string representation

2017-07-05 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/18539
  
I think that `ConsoleSink` was the only one with this mysterious name. We 
could, however, file another JIRA to _somehow_ unify how options are printed 
out for sources and sinks. I don't think it should be a part of this 
(smallish) change.





[GitHub] spark issue #18523: [SPARK-21285][ML] VectorAssembler reports the column nam...

2017-07-05 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/18523
  
Thanks @facaiy for the changes. I wonder if the code could `collect` all 
the columns with an incorrect type in one go (rather than reporting issues 
column by column until a user has fixed all of them).

Something along these lines:

```
val incorrectColumns = inputColNames
  .map(name => (name, schema(name).dataType))
  .collect {
    // keep only the (name, type) pairs that are neither numeric/boolean nor vector
    case (name, dataType)
        if !(dataType.isInstanceOf[NumericType] ||
             dataType == BooleanType ||
             dataType.isInstanceOf[VectorUDT]) =>
      (name, dataType)
  }
// report all of incorrectColumns in a single error message
```





[GitHub] spark pull request #18539: [SPARK-21313][SS] ConsoleSink's string representa...

2017-07-05 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/18539

[SPARK-21313][SS] ConsoleSink's string representation

## What changes were proposed in this pull request?

Add a `toString` that includes the options of `ConsoleSink` so it shows up 
nicely in query progress.

**BEFORE**

```
  "sink" : {
"description" : 
"org.apache.spark.sql.execution.streaming.ConsoleSink@4b340441"
  }
```

**AFTER**

```
  "sink" : {
"description" : "ConsoleSink[numRows=10, truncate=false]"
  }
```
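
A sketch of the change (option handling per the Spark 2.x `ConsoleSink` source: `numRows`, default 20, and `truncate`, default true; `addBatch` is abbreviated):

```
// ConsoleSink (sketch): the new toString is what produces the AFTER output.
class ConsoleSink(options: Map[String, String]) extends Sink {
  private val numRowsToShow = options.get("numRows").map(_.toInt).getOrElse(20)
  private val isTruncated = options.get("truncate").map(_.toBoolean).getOrElse(true)

  override def addBatch(batchId: Long, data: DataFrame): Unit =
    data.show(numRowsToShow, isTruncated) // abbreviated; the real sink also tracks batch ids

  override def toString: String = s"ConsoleSink[numRows=$numRowsToShow, truncate=$isTruncated]"
}
```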

## How was this patch tested?

Local build

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark 
SPARK-21313-ConsoleSink-toString

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18539.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18539


commit 39a132f58092b81675bf271977cb13ba6b5dc428
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2017-07-05T08:15:37Z

[SPARK-21313][SS] ConsoleSink's string representation







[GitHub] spark pull request #18509: [SS][MINOR] Make EventTimeWatermarkExec explicitl...

2017-07-05 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/18509#discussion_r125571708
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/EventTimeWatermarkExec.scala
 ---
@@ -81,7 +81,7 @@ class EventTimeStatsAccum(protected var currentStats: EventTimeStats = EventTime
 case class EventTimeWatermarkExec(
 eventTime: Attribute,
 delay: CalendarInterval,
-child: SparkPlan) extends SparkPlan {
+child: SparkPlan) extends UnaryExecNode {
--- End diff --

@zsxwing friendly ping...





[GitHub] spark pull request #18523: [SPARK-21285][ML] VectorAssembler reports the col...

2017-07-04 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/18523#discussion_r125397518
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala ---
@@ -113,12 +113,12 @@ class VectorAssembler @Since("1.4.0") (@Since("1.4.0") override val uid: String)
   override def transformSchema(schema: StructType): StructType = {
 val inputColNames = $(inputCols)
 val outputColName = $(outputCol)
-val inputDataTypes = inputColNames.map(name => schema(name).dataType)
+val inputDataTypes = inputColNames.map(name => (name, schema(name).dataType))
 inputDataTypes.foreach {
-  case _: NumericType | BooleanType =>
-  case t if t.isInstanceOf[VectorUDT] =>
-  case other =>
-throw new IllegalArgumentException(s"Data type $other is not supported.")
+  case (_, _: NumericType | BooleanType) =>
+  case (_, t) if t.isInstanceOf[VectorUDT] =>
--- End diff --

I wonder why the line does not use `(_, t: VectorUDT)` instead.
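
For comparison, a sketch of the typed pattern the question suggests (behavior equivalent to the `isInstanceOf` guard in the quoted diff; the error message wording is mine):

```
inputDataTypes.foreach {
  case (_, _: NumericType | BooleanType) => // supported
  case (_, _: VectorUDT) =>                 // supported, no guard needed
  case (name, other) =>
    throw new IllegalArgumentException(s"Data type $other of column $name is not supported.")
}
```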





[GitHub] spark pull request #18509: [SS][MINOR] Make EventTimeWatermarkExec explicitl...

2017-07-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/18509#discussion_r125353689
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/EventTimeWatermarkExec.scala
 ---
@@ -81,7 +81,7 @@ class EventTimeStatsAccum(protected var currentStats: EventTimeStats = EventTime
 case class EventTimeWatermarkExec(
 eventTime: Attribute,
 delay: CalendarInterval,
-child: SparkPlan) extends SparkPlan {
+child: SparkPlan) extends UnaryExecNode {
--- End diff --

I don't see it used in the other physical operators like 
[HashAggregateExec](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L45)
 or 
[CoalesceExec](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala#L574)
 or 
[BroadcastExchangeExec](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L41)
 etc. Why would that matter here?





[GitHub] spark pull request #18509: [SS][MINOR] Make EventTimeWatermarkExec explicitl...

2017-07-03 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/18509

[SS][MINOR] Make EventTimeWatermarkExec explicitly UnaryExecNode

## What changes were proposed in this pull request?

Making `EventTimeWatermarkExec` explicitly a `UnaryExecNode`.

/cc @tdas @zsxwing 

## How was this patch tested?

Local build.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark 
EventTimeWatermarkExec-UnaryExecNode

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18509.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18509


commit 2cc9fdca381f50cb405bf5fbcaa7229652749a83
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2017-07-03T07:07:12Z

[SS][MINOR] Make EventTimeWatermarkExec explicitly UnaryExecNode







[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...

2017-06-18 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/18347#discussion_r122617147
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -465,6 +465,8 @@ case class DataSource(
 providingClass.newInstance() match {
   case dataSource: CreatableRelationProvider =>
 SaveIntoDataSourceCommand(data, dataSource, 
caseInsensitiveOptions, mode)
+  case dataSource: ConsoleSinkProvider =>
+data.show(data.count().toInt, false)
--- End diff --

`ConsoleSink`  [has two 
options](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/console.scala#L27-L30)
 that could be used here -- `numRows` and `truncate`.
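
A sketch of the suggestion (assuming the `caseInsensitiveOptions` map from the surrounding `match`, as quoted in the diff):

```
case _: ConsoleSinkProvider =>
  // honor the sink's own options instead of hard-coding show(count, false)
  val numRows = caseInsensitiveOptions.get("numRows").map(_.toInt).getOrElse(20)
  val truncate = caseInsensitiveOptions.get("truncate").map(_.toBoolean).getOrElse(true)
  data.show(numRows, truncate)
```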





[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...

2017-06-18 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/18347#discussion_r122616876
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -465,6 +465,8 @@ case class DataSource(
 providingClass.newInstance() match {
   case dataSource: CreatableRelationProvider =>
 SaveIntoDataSourceCommand(data, dataSource, 
caseInsensitiveOptions, mode)
+  case dataSource: ConsoleSinkProvider =>
--- End diff --

Underscore `dataSource` since it's not used.





[GitHub] spark issue #18144: [SPARK-20912][SQL] Allow column name in map functions.

2017-05-30 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/18144
  
@cloud-fan If consistency means removing (not adding) the variants, I'm fine 
with that. Either way, consistency is the ultimate goal (I myself run into 
this discrepancy far too often).





[GitHub] spark pull request #18074: [DOCS][MINOR] Scaladoc fixes (aka typo hunting)

2017-05-30 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/18074#discussion_r119168394
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala 
---
@@ -153,12 +153,13 @@ case class WindowExec(
   }
 
   /**
-   * Collection containing an entry for each window frame to process. Each entry contains a frames'
-   * WindowExpressions and factory function for the WindowFrameFunction.
+   * Collection containing an entry for each window frame to process. Each entry contains a frame's
+   * [[WindowExpression]]s and factory function for the WindowFrameFunction.
*/
   private[this] lazy val windowFrameExpressionFactoryPairs = {
 type FrameKey = (String, FrameType, Option[Int], Option[Int])
 type ExpressionBuffer = mutable.Buffer[Expression]
+type WindowExpressionBuffer = mutable.Buffer[WindowExpression]
--- End diff --

Sorry. I reverted the change manually and it looks like this one slipped 
through the cracks. Apologies.





[GitHub] spark issue #18144: [SPARK-20912][SQL] Allow column name in map functions.

2017-05-30 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/18144
  
@cloud-fan I don't understand why that would ever be an issue. The API is 
not consistent and I often run into it.






[GitHub] spark issue #18074: [DOCS][MINOR] Scaladoc fixes (aka typo hunting)

2017-05-29 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/18074
  
Hey @srowen, could you review the changes again and possibly accept? Thanks!





[GitHub] spark pull request #18074: [DOCS][MINOR] Scaladoc fixes (aka typo hunting)

2017-05-26 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/18074#discussion_r118788857
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExec.scala 
---
@@ -153,19 +153,24 @@ case class WindowExec(
   }
 
   /**
-   * Collection containing an entry for each window frame to process. Each entry contains a frames'
-   * WindowExpressions and factory function for the WindowFrameFunction.
+   * Collection containing an entry for each window frame to process. Each entry contains a frame's
+   * [[WindowExpression]]s and factory function for the WindowFrameFunction.
 */
-  private[this] lazy val windowFrameExpressionFactoryPairs = {
+  private[this] lazy val windowFrameExpressionFactoryPairs:
+    Seq[(mutable.Buffer[WindowExpression], InternalRow => WindowFunctionFrame)] =
+  {
 type FrameKey = (String, FrameType, Option[Int], Option[Int])
 type ExpressionBuffer = mutable.Buffer[Expression]
-val framedFunctions = mutable.Map.empty[FrameKey, (ExpressionBuffer, ExpressionBuffer)]
+type WindowExpressionBuffer = mutable.Buffer[WindowExpression]
+val framedFunctions = mutable.Map.empty[FrameKey, (WindowExpressionBuffer, ExpressionBuffer)]
 
 // Add a function and its function to the map for a given frame.
-def collect(tpe: String, fr: SpecifiedWindowFrame, e: Expression, fn: Expression): Unit = {
+def collect(
+  tpe: String, fr: SpecifiedWindowFrame, e: WindowExpression, fn: Expression): Unit =
--- End diff --

I'll revert it so as not to "pollute" the PR. It was much easier to read the 
code when the internals were less generic (the extra genericity serves no 
purpose at all).





[GitHub] spark issue #15575: [SPARK-18038] [SQL] Move output partitioning definition ...

2017-05-26 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/15575
  
I'm late with this, but just leaving it for future code reviewers...

I think the change took the most extreme path: even an `outputPartitioning` 
as simple as the one in `WindowExec` (see below) now has to be part of every 
extension of `UnaryExecNode`, while I think it should stay in `UnaryExecNode` 
itself and be overridden only when the output partitioning is different. 
It's a kind of code duplication.

```
  override def outputPartitioning: Partitioning = child.outputPartitioning
```

The above serves nothing but to say _"I simply don't care what happens 
upstream"_.





[GitHub] spark pull request #18074: [DOCS][MINOR] Scaladoc fixes (aka typo hunting)

2017-05-25 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/18074#discussion_r118436041
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -35,12 +35,13 @@ import org.apache.spark.sql.types.NumericType
 import org.apache.spark.sql.types.StructType
 
 /**
- * A set of methods for aggregations on a `DataFrame`, created by `Dataset.groupBy`.
+ * A set of methods for aggregations on a `DataFrame`, created by [[Dataset#groupBy groupBy]],
+ * [[Dataset#cube cube]] or [[Dataset#rollup rollup]] (and also [[pivot]]).
  *
- * The main method is the agg function, which has multiple variants. This class also contains
- * convenience some first order statistics such as mean, sum for convenience.
+ * The main method is the [[agg]] function, which has multiple variants. This class also contains
--- End diff --

No scaladoc. I was focused on javadoc and it seems I missed it. Fixed.





[GitHub] spark pull request #18074: [DOCS][MINOR] Scaladoc fixes (aka typo hunting)

2017-05-23 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/18074

[DOCS][MINOR] Scaladoc fixes (aka typo hunting)

## What changes were proposed in this pull request?

Minor changes to scaladoc

## How was this patch tested?

Local build

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark scaladoc-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18074.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18074


commit 5a7eac8dbda2fe3f2f9fb03be0548835e329014d
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2017-05-23T14:58:52Z

[DOCS][MINOR] Scaladoc fixes (aka typo hunting)







[GitHub] spark pull request #18026: [SPARK-16202][SQL][DOC] Follow-up to Correct The ...

2017-05-18 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/18026

[SPARK-16202][SQL][DOC] Follow-up to Correct The Description of 
CreatableRelationProvider's createRelation

## What changes were proposed in this pull request?

Follow-up to SPARK-16202:

1. Remove the duplicated description of `SaveMode` (one mode was in fact 
missing, which proves that such duplication can easily become incomplete 
again)

2. Use standard scaladoc tags (a sketch of the idea is below)
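
A hedged sketch of the second point applied to the method in question (the wording is mine, not necessarily the merged scaladoc; the signature is `CreatableRelationProvider.createRelation`'s):

```
/**
 * Saves a DataFrame to a destination (using data source-specific parameters).
 *
 * @param sqlContext the session's SQLContext
 * @param mode what to do if the destination relation already exists
 *             (see `SaveMode`, so the semantics are documented in one place only)
 * @param parameters data source-specific parameters
 * @param data the DataFrame to save
 * @return the relation describing the saved data
 */
def createRelation(
    sqlContext: SQLContext,
    mode: SaveMode,
    parameters: Map[String, String],
    data: DataFrame): BaseRelation
```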

/cc @gatorsmile @rxin @yhuai (as they were involved previously)

## How was this patch tested?

local build

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark 
CreatableRelationProvider-SPARK-16202

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18026.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18026


commit 7bf5d1962aa750079da313cd6d106bf8c616d448
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2017-05-18T07:56:28Z

[SPARK-16202][SQL][DOC] Follow-up to Correct The Description of 
CreatableRelationProvider's createRelation







[GitHub] spark issue #17917: [SPARK-20600][SS] KafkaRelation should be pretty printed...

2017-05-11 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/17917
  
https://cloud.githubusercontent.com/assets/62313/25960541/879096ce-3677-11e7-900f-09bd5f200a00.png






[GitHub] spark issue #16960: [SPARK-19447] Make Range operator generate "recordsRead"...

2017-05-11 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/16960
  
I'll have a look at this this week and send a PR unless you beat me to it 
:) Thanks @ala!





[GitHub] spark pull request #17917: [SPARK-20600][SS] KafkaRelation should be pretty ...

2017-05-10 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17917#discussion_r115711771
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaRelation.scala
 ---
@@ -143,4 +143,6 @@ private[kafka010] class KafkaRelation(
 validateTopicPartitions(partitions, partitionOffsets)
 }
   }
+
+  override def toString: String = "kafka"
 }
--- End diff --

Hehe... I've been thinking about it too, but thought it'd be too much for 
such a simple change.

It merits a JIRA issue though, if only to figure out what to display and 
how. After all, you could read from multiple topics with different Kafka 
brokers and offsets. I was worried it'd be too much to handle as part of 
this issue.

I'm up for working on the other issue, but I don't think we should hold this 
one up. WDYT?
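
A purely hypothetical sketch of the richer rendering deferred to that follow-up (`sourceOptions` is an assumed field name; the option keys are the Kafka source's real ones):

```
override def toString: String = {
  // topics may come from subscribe/subscribePattern/assign; brokers from the
  // required kafka.bootstrap.servers option
  val topics = sourceOptions.getOrElse("subscribe", "<assign/pattern>")
  val brokers = sourceOptions.getOrElse("kafka.bootstrap.servers", "<unknown>")
  s"Kafka[topics=$topics, brokers=$brokers]"
}
```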





[GitHub] spark issue #16960: [SPARK-19447] Make Range operator generate "recordsRead"...

2017-05-10 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/16960
  
I think that the commit left the 
[numGeneratedRows](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala#L344) 
metric out, didn't it? (It was added in #16829.)





[GitHub] spark issue #17904: [SPARK-20630] [Web UI] Fixed column visibility in Execut...

2017-05-09 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/17904
  
WFM. Thanks @ajbozarth!

```
$ git fetch origin pull/17904/head:17904
$ gco 17904
$ ./build/mvn -Phadoop-2.7,yarn,mesos,hive,hive-thriftserver -DskipTests 
clean install
```

The following worked fine (http://localhost:4040/executors/):

```
$ ./bin/spark-shell
$ ./bin/spark-shell --conf spark.ui.threadDumpsEnabled=true
$ ./bin/spark-shell --conf spark.ui.threadDumpsEnabled=false
```





[GitHub] spark pull request #17917: [SPARK-20600][SS] KafkaRelation should be pretty ...

2017-05-09 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/17917

[SPARK-20600][SS] KafkaRelation should be pretty printed in web UI

## What changes were proposed in this pull request?

User-friendly name of `KafkaRelation` in web UI (under Details for Query).

### Before

https://cloud.githubusercontent.com/assets/62313/25841955/74479ac6-34a2-11e7-87fb-d9f62a1356a7.png

### After

https://cloud.githubusercontent.com/assets/62313/25841829/f5335630-34a1-11e7-85a4-afe9b66d73c8.png

## How was this patch tested?

Local build

```
./bin/spark-shell \
  --jars ~/.m2/repository/org/apache/spark/spark-sql-kafka-0-10_2.11/2.3.0-SNAPSHOT/spark-sql-kafka-0-10_2.11-2.3.0-SNAPSHOT.jar \
  --packages org.apache.kafka:kafka-clients:0.10.0.1
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark 
SPARK-20600-KafkaRelation-webUI

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17917.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17917


commit 2ffe4476553cfe50eb6392d8e573545a92fef737
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2017-05-09T08:20:49Z

[SPARK-20600][SQL] KafkaRelation should be pretty printed in web UI 
(Details for Query)







[GitHub] spark pull request #17727: [SQL][MINOR] Remove misleading comment (and tags ...

2017-05-06 Thread jaceklaskowski
Github user jaceklaskowski closed the pull request at:

https://github.com/apache/spark/pull/17727





[GitHub] spark issue #17801: [MINOR][SQL][DOCS] Improve unix_timestamp's scaladoc (an...

2017-04-28 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/17801
  
Are the errors (that led to `fails to generate documentation`) caused by my 
change? They look very weird to me.

```
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/scheduler/BlacklistTracker.java:117: error: ExecutorAllocationClient is not public in org.apache.spark; cannot be accessed from outside package
[error]   public BlacklistTracker (org.apache.spark.scheduler.LiveListenerBus listenerBus, org.apache.spark.SparkConf conf, scala.Option<org.apache.spark.ExecutorAllocationClient> allocationClient, org.apache.spark.util.Clock clock)  { throw new RuntimeException(); }
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/scheduler/BlacklistTracker.java:118: error: ExecutorAllocationClient is not public in org.apache.spark; cannot be accessed from outside package
[error]   public BlacklistTracker (org.apache.spark.SparkContext sc, scala.Option<org.apache.spark.ExecutorAllocationClient> allocationClient)  { throw new RuntimeException(); }
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/SparkConf.java:133: error: ConfigReader is not public in org.apache.spark.internal.config; cannot be accessed from outside package
[error]   private org.apache.spark.internal.config.ConfigReader reader ()  { throw new RuntimeException(); }
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/SparkConf.java:138: error: ConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
[error]   <T> org.apache.spark.SparkConf set (org.apache.spark.internal.config.ConfigEntry<T> entry, T value)  { throw new RuntimeException(); }
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/SparkConf.java:139: error: OptionalConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
[error]   <T> org.apache.spark.SparkConf set (org.apache.spark.internal.config.OptionalConfigEntry<T> entry, T value)  { throw new RuntimeException(); }
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/SparkConf.java:187: error: ConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
[error]   <T> org.apache.spark.SparkConf setIfMissing (org.apache.spark.internal.config.ConfigEntry<T> entry, T value)  { throw new RuntimeException(); }
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/SparkConf.java:188: error: OptionalConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
[error]   <T> org.apache.spark.SparkConf setIfMissing (org.apache.spark.internal.config.OptionalConfigEntry<T> entry, T value)  { throw new RuntimeException(); }
[error] /home/jenkins/workspace/SparkPullRequestBuilder/core/target/java/org/apache/spark/SparkConf.java:208: error: ConfigEntry is not public in org.apache.spark.internal.config; cannot be accessed from outside package
[error]   org.apache.spark.SparkConf remove (org.apache.spark.internal.config.ConfigEntry<?> entry)  { throw new RuntimeException(); }
```





[GitHub] spark pull request #17801: [MINOR][SQL][DOCS] Improve unix_timestamp's scala...

2017-04-28 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/17801

[MINOR][SQL][DOCS] Improve unix_timestamp's scaladoc (and typo hunting)

## What changes were proposed in this pull request?

* Docs are consistent (across different `unix_timestamp` variants and their 
internal expressions)
* typo hunting

## How was this patch tested?

local build


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark unix_timestamp

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17801.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17801


commit 20edec80f943b0fb76d8bee26f47f8c19ca299be
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2017-04-28T08:17:53Z

[MINOR][SQL][DOCS] Improve unix_timestamp's scaladoc (and typo hunting)







[GitHub] spark issue #17727: [SQL][MINOR] Remove misleading comment (and tags do bett...

2017-04-26 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/17727
  
Is the comment correct then? I don't think so. What about improving it? I 
don't mind if we stop discussing it either. It's a tiny change after all 
(and I don't want to drag it along and waste your time).





[GitHub] spark issue #17727: [SQL][MINOR] Remove misleading comment (and tags do bett...

2017-04-22 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/17727
  
Fair enough. Let's do it here. Quoting directly from the code:

> Converts a logical plan into zero or more SparkPlans.  This API is 
exposed for experimenting with the query planner and is not designed to be 
stable across spark releases.  Developers writing libraries should instead 
consider using the stable APIs provided in [[org.apache.spark.sql.sources]]

> Converts a logical plan into zero or more SparkPlans.

It does not. It's a type alias for something that may do so. Moreover, it's 
copied from `SparkStrategy`; no need to repeat it here.

> This API is exposed for experimenting with the query planner and is not 
designed to be stable across spark releases.

That's the purpose of the `@DeveloperApi` and `@InterfaceStability.Unstable` 
annotations, so yes, you're right: it's not misleading, but it is a 
repetition (one that cannot be checked by the Scala compiler).

> Developers writing libraries should instead consider using the stable 
APIs provided in [[org.apache.spark.sql.sources]]

How does `org.apache.spark.sql.sources` help here?! `SparkStrategy` is 
about query planning, not data source IO. They're at different levels and 
may interest different developers. Certainly misleading.

WDYT?
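
For reference, the alias under discussion lives in the `org.apache.spark.sql` package object and (modulo the comment being debated) boils down to:

```
@DeveloperApi
@InterfaceStability.Unstable
type Strategy = SparkStrategy
```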







[GitHub] spark pull request #17727: [SQL][MINOR] Remove misleading comment (and tags ...

2017-04-22 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/17727

[SQL][MINOR] Remove misleading comment (and tags do better)

## What changes were proposed in this pull request?

Misleading comment removed (the tags do a better job of expressing instability)

## How was this patch tested?

Local build

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark sql-package-object

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17727.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17727


commit 69b68925fd7f0465b060c5279159615c39a35176
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2017-04-22T09:03:11Z

[SQL][MINOR] Remove misleading comment (and tags do better)







[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17712#discussion_r112692885
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -47,20 +47,31 @@ case class UserDefinedFunction protected[sql] (
 dataType: DataType,
 inputTypes: Option[Seq[DataType]]) {
 
-  // Optionally used for printing UDF names in EXPLAIN
-  private var nameOption: Option[String] = None
+  protected def name: Option[String] = None
 
-  def withName(name: String): this.type = {
-this.nameOption = Option(name)
-this
-  }
+  // Optionally used for printing UDF names in EXPLAIN
--- End diff --

"Optionally"? I'd assume the old "UDF" is the default unless the name is 
given.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17712#discussion_r112692504
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -47,20 +47,31 @@ case class UserDefinedFunction protected[sql] (
 dataType: DataType,
 inputTypes: Option[Seq[DataType]]) {
 
-  // Optionally used for printing UDF names in EXPLAIN
-  private var nameOption: Option[String] = None
+  protected def name: Option[String] = None
 
-  def withName(name: String): this.type = {
-    this.nameOption = Option(name)
-    this
-  }
+  // Optionally used for printing UDF names in EXPLAIN
+  def withName(name: String): this.type =
+    UserDefinedFunctionWithName(Option(name), f, dataType, inputTypes)
 
   /**
* Returns an expression that invokes the UDF, using the given arguments.
*
* @since 1.3.0
*/
   def apply(exprs: Column*): Column = {
-    Column(ScalaUDF(f, dataType, exprs.map(_.expr), inputTypes.getOrElse(Nil), nameOption))
+    Column(ScalaUDF(f, dataType, exprs.map(_.expr), inputTypes.getOrElse(Nil), name))
   }
 }
+
+/**
+ * A user-defined function with a function name.
+ *
+ * @since 2.3.0
+ */
+@InterfaceStability.Stable
+case class UserDefinedFunctionWithName protected[sql] (
+    override val name: Option[String],
--- End diff --

Can a `UserDefinedFunctionWithName` have **no** name? When?





[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17712#discussion_r112634952
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala ---
@@ -256,10 +256,12 @@ class UDFSuite extends QueryTest with 
SharedSQLContext {
   val sparkPlan = spark.sessionState.executePlan(explain).executedPlan
   
sparkPlan.executeCollect().map(_.getString(0).trim).headOption.getOrElse("")
 }
-    val udf1 = "myUdf1"
-    val udf2 = "myUdf2"
-    spark.udf.register(udf1, (n: Int) => { n + 1 })
-    spark.udf.register(udf2, (n: Int) => { n * 1 })
-    assert(explainStr(sql("SELECT myUdf1(myUdf2(1))")).contains(s"UDF:$udf1(UDF:$udf2(1))"))
+    val udf1Name = "myUdf1"
+    val udf2Name = "myUdf2"
+    val udf1 = spark.udf.register(udf1Name, (n: Int) => { n + 1 })
+    val udf2 = spark.udf.register(udf2Name, (n: Int) => { n * 1 })
+    assert(explainStr(sql("SELECT myUdf1(myUdf2(1))")).contains(s"UDF:$udf1Name(UDF:$udf2Name(1))"))
+    assert(explainStr(spark.range(1).select(udf1(udf2(functions.lit(1)))))
+      .contains(s"UDF:$udf1Name(UDF:$udf2Name(1))"))
--- End diff --

The goal of the change was to make sure that the names are the same for the SQL and Dataset "modes", and the test should verify exactly that. Even though the two asserts above do it indirectly, the last one should rather check the equality of the SQL and Dataset outputs.
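
Something along these lines -- a sketch reusing the test's own `explainStr` helper and the names from the diff above:

```scala
// check that the SQL and Dataset "modes" render the UDF names identically
val sqlExplain     = explainStr(sql("SELECT myUdf1(myUdf2(1))"))
val datasetExplain = explainStr(spark.range(1).select(udf1(udf2(functions.lit(1)))))
assert(sqlExplain === datasetExplain)
```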





[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17712#discussion_r112634044
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -47,12 +47,20 @@ case class UserDefinedFunction protected[sql] (
 dataType: DataType,
 inputTypes: Option[Seq[DataType]]) {
 
+  // Optionally used for printing UDF names in EXPLAIN
+  private var nameOption: Option[String] = None
--- End diff --

Besides @rxin's comment, I'd also add `name` as an input parameter with the default value `None`.
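
That is, something along the lines of this sketch (just the suggested shape, not a final API):

```scala
import org.apache.spark.sql.types.DataType

case class UserDefinedFunction protected[sql] (
    f: AnyRef,
    dataType: DataType,
    inputTypes: Option[Seq[DataType]],
    name: Option[String] = None) {

  // no mutable state: withName becomes a plain copy with the name set
  def withName(name: String): UserDefinedFunction = copy(name = Option(name))
}
```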





[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17712#discussion_r112634273
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala ---
@@ -256,10 +256,12 @@ class UDFSuite extends QueryTest with 
SharedSQLContext {
   val sparkPlan = spark.sessionState.executePlan(explain).executedPlan
   
sparkPlan.executeCollect().map(_.getString(0).trim).headOption.getOrElse("")
 }
-    val udf1 = "myUdf1"
-    val udf2 = "myUdf2"
-    spark.udf.register(udf1, (n: Int) => { n + 1 })
-    spark.udf.register(udf2, (n: Int) => { n * 1 })
-    assert(explainStr(sql("SELECT myUdf1(myUdf2(1))")).contains(s"UDF:$udf1(UDF:$udf2(1))"))
+    val udf1Name = "myUdf1"
+    val udf2Name = "myUdf2"
+    val udf1 = spark.udf.register(udf1Name, (n: Int) => { n + 1 })
--- End diff --

Remove `{` and `}` (as they're not needed)
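
i.e. something like:

```scala
val udf1 = spark.udf.register(udf1Name, (n: Int) => n + 1)
val udf2 = spark.udf.register(udf2Name, (n: Int) => n * 1)
```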





[GitHub] spark issue #17670: [SPARK-20281][SQL] Print the identical Range parameters ...

2017-04-18 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/17670
  
I think the change should rather be in `ResolveTableValuedFunctions`, where the built-in table-valued function `range` is resolved.





[GitHub] spark pull request #17657: [TEST][MINOR] Replace repartitionBy with distribu...

2017-04-17 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/17657

[TEST][MINOR] Replace repartitionBy with distribute in 
CollapseRepartitionSuite

## What changes were proposed in this pull request?

Replace non-existent `repartitionBy` with `distribute` in 
`CollapseRepartitionSuite`.

## How was this patch tested?

local build and `catalyst/testOnly *CollapseRepartitionSuite`

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark CollapseRepartitionSuite

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17657.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17657


commit 427924ab99332f475ed0b46b261eb55eee560c4a
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2017-04-17T08:24:38Z

[TEST][MINOR] Replace repartitionBy with distribute in 
CollapseRepartitionSuite







[GitHub] spark pull request #17417: [DOCS] Docs-only improvements

2017-03-29 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17417#discussion_r10842
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala 
---
@@ -60,7 +60,7 @@ import org.apache.spark.util.Utils
  * The builder can also be used to create a new session:
  *
  * {{{
- *   SparkSession.builder()
+ *   SparkSession.builder
--- End diff --

Consistency (and one of my recommended coding styles).
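
i.e. one form used consistently throughout the scaladoc -- a minimal sketch (the app name and master are illustrative only):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .master("local[*]")
  .appName("My Spark app")
  .getOrCreate()
```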





[GitHub] spark pull request #17417: [DOCS] Docs-only improvements

2017-03-29 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17417#discussion_r108777513
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala
 ---
@@ -26,7 +26,8 @@ import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.CalendarInterval
 
 /**
- * Test basic expression parsing. If a type of expression is supported it 
should be tested here.
+ * Test basic expression parsing.
+ * If the type of an expression is supported it should be tested here.
--- End diff --

Almost. I replaced `a` with `the` and added `an` before `expression`. 





[GitHub] spark pull request #17417: [DOCS] Docs-only improvements

2017-03-29 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17417#discussion_r108777037
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
 ---
@@ -75,7 +75,6 @@ case class WindowSpecDefinition(
   frameSpecification.isInstanceOf[SpecifiedWindowFrame]
 
   override def nullable: Boolean = true
-  override def foldable: Boolean = false
--- End diff --

Correct. Reverting...





[GitHub] spark pull request #17417: [DOCS] Docs-only improvements

2017-03-29 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17417#discussion_r108776915
  
--- Diff: 
core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
 ---
@@ -52,16 +52,15 @@
  * This class implements sort-based shuffle's hash-style shuffle fallback 
path. This write path
  * writes incoming records to separate files, one file per reduce 
partition, then concatenates these
  * per-partition files to form a single output file, regions of which are 
served to reducers.
- * Records are not buffered in memory. This is essentially identical to
- * {@link org.apache.spark.shuffle.hash.HashShuffleWriter}, except that it 
writes output in a format
+ * Records are not buffered in memory. It writes output in a format
--- End diff --

`HashShuffleWriter` is long gone.





[GitHub] spark issue #17417: [DOCS] Docs-only improvements

2017-03-29 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/17417
  
Executed `cd docs && SKIP_PYTHONDOC=1 SKIP_RDOC=1 jekyll serve` to check the changes and they seemed fine. I had to fix some extra javadoc-related places to please jekyll.

@srowen Ready to review the changes once more? Thanks.





[GitHub] spark pull request #17434: [MINOR] Typo fixes

2017-03-29 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17434#discussion_r108665073
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -492,7 +492,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] 
with Logging {
   }
 
   /**
-   * Add an [[Aggregate]] to a logical plan.
+   * Add an [[Aggregate]] or [[GroupingSets]] to a logical plan.
--- End diff --

The package `org.apache.spark.sql.catalyst.parser`, and hence `AstBuilder`, doesn't show up in the javadoc/scaladoc.





[GitHub] spark pull request #17434: [MINOR] Typo fixes

2017-03-29 Thread jaceklaskowski
Github user jaceklaskowski closed the pull request at:

https://github.com/apache/spark/pull/17434





[GitHub] spark issue #17434: [MINOR] Typo fixes

2017-03-29 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/17434
  
Closing as it was merged into https://github.com/apache/spark/pull/17417.





[GitHub] spark pull request #17434: [MINOR] Typo fixes

2017-03-29 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17434#discussion_r108634808
  
--- Diff: core/src/main/java/org/apache/spark/memory/MemoryConsumer.java ---
@@ -60,8 +60,6 @@ protected long getUsed() {
 
   /**
* Force spill during building.
-   *
--- End diff --

It's not "for testing only". I'd even say it's used more often in non-test code than in test code, which made the comment no longer correct. See `ShuffleExternalSorter` and `UnsafeShuffleWriter`, both of which are `ShuffleWriter` instances.





[GitHub] spark pull request #17434: [MINOR] Typo fixes

2017-03-29 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17434#discussion_r108633245
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala 
---
@@ -323,7 +323,7 @@ class SparkSession private(
*  // |-- age: integer (nullable = true)
*
*  dataFrame.createOrReplaceTempView("people")
-   *  sparkSession.sql("select name from people").collect.foreach(println)
+   *  sparkSession.sql("select name from people").show
--- End diff --

I agree. Just for the record, `collect.foreach(println)` should not be "endorsed" in Spark 2's official docs, as it's too RDD-ish.

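For comparison:

```scala
// RDD-ish: collects every row to the driver just to print it
sparkSession.sql("select name from people").collect.foreach(println)
// Dataset-idiomatic: prints a formatted, truncated preview
sparkSession.sql("select name from people").show
```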




[GitHub] spark pull request #17434: [MINOR] Typo fixes

2017-03-29 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17434#discussion_r108632961
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -492,7 +492,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] 
with Logging {
   }
 
   /**
-   * Add an [[Aggregate]] to a logical plan.
+   * Add an [[Aggregate]] or [[GroupingSets]] to a logical plan.
--- End diff --

I didn't check the javadoc, and the change follows `Aggregate`. I'll see how to generate the javadoc and check this out (as I've been meaning to do that anyway for some time). Thanks for the inspiration!





[GitHub] spark pull request #17434: [MINOR] Typo fixes

2017-03-29 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17434#discussion_r108632590
  
--- Diff: 
core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala ---
@@ -82,13 +82,13 @@ private[spark] class SortShuffleManager(conf: 
SparkConf) extends ShuffleManager
   override val shuffleBlockResolver = new IndexShuffleBlockResolver(conf)
 
   /**
-   * Register a shuffle with the manager and obtain a handle for it to 
pass to tasks.
+   * Obtains a [[ShuffleHandle]] to pass to tasks.
--- End diff --

It's a copy from the method this one overrides. Since the method does **not** do what the scaladoc said, I thought I'd bring it up to date. It might have "register(ed) a shuffle with the manager" in the past, but it doesn't today, so the old text was misleading.





[GitHub] spark issue #17417: [SQL][DOC] Use recommended values for row boundaries in ...

2017-03-28 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/17417
  
I'm going to merge the two PRs with your comments applied (i.e. excluding changes that are not strictly doc-only). Thanks a lot for your time, Sean. Much appreciated.





[GitHub] spark issue #17434: [MINOR] Typo fixes

2017-03-27 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/17434
  
You asked, I delivered, @srowen





[GitHub] spark issue #17417: [SQL][DOC] Use recommended values for row boundaries in ...

2017-03-27 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/17417
  
Hey @srowen, I'd appreciate you looking at the changes again and commenting (or merging). Thanks!





[GitHub] spark pull request #17434: [SQL][DOC][MINOR] Squashing a typo in from_json f...

2017-03-26 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/17434

[SQL][DOC][MINOR] Squashing a typo in from_json function

## What changes were proposed in this pull request?

Just squashing a typo in `from_json` function

## How was this patch tested?

local build

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark from_json-typo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17434.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17434


commit 3b385039bf88ec32321c4e888cc17180a37b279e
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2017-03-26T10:13:09Z

[SQL][DOC][MINOR] Squashing a typo







[GitHub] spark pull request #17417: [SQL][DOC] Use recommended values for row boundar...

2017-03-25 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17417#discussion_r108035549
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/Window.scala ---
@@ -113,12 +113,12 @@ object Window {
* Creates a [[WindowSpec]] with the frame boundaries defined,
* from `start` (inclusive) to `end` (inclusive).
*
-   * Both `start` and `end` are relative positions from the current row. 
For example, "0" means
+   * Both `start` and `end` are relative positions to the current row. For 
example, "0" means
* "current row", while "-1" means the row before the current row, and 
"5" means the fifth row
* after the current row.
*
-   * We recommend users use `Window.unboundedPreceding`, 
`Window.unboundedFollowing`,
-   * and `Window.currentRow` to specify special boundary values, rather 
than using integral
+   * We recommend users to use [[Window.unboundedPreceding]], 
[[Window.unboundedFollowing]],
--- End diff --

Leaving 'that' aside, it's incorrect -- see http://dictionary.cambridge.org/dictionary/english/recommend where *to* is even highlighted to make the point.





[GitHub] spark pull request #17417: [SQL][DOC] Use recommended values for row boundar...

2017-03-25 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17417#discussion_r108035498
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/Window.scala ---
@@ -131,9 +131,9 @@ object Window {
*   import org.apache.spark.sql.expressions.Window
*   val df = Seq((1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, 
"b"))
* .toDF("id", "category")
-   *   df.withColumn("sum",
-   *   sum('id) over 
Window.partitionBy('category).orderBy('id).rowsBetween(0,1))
-   * .show()
+   *   val byCategoryOrderedById =
+   * 
Window.partitionBy('category).orderBy('id).rowsBetween(Window.currentRow, 1)
--- End diff --

See the above change, where Spark devs "recommend that users use" the values by their aliases, not their numeric values.





[GitHub] spark pull request #17417: [SQL][DOC] Use recommended values for row boundar...

2017-03-25 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17417#discussion_r108035475
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/Window.scala ---
@@ -113,12 +113,12 @@ object Window {
* Creates a [[WindowSpec]] with the frame boundaries defined,
* from `start` (inclusive) to `end` (inclusive).
*
-   * Both `start` and `end` are relative positions from the current row. 
For example, "0" means
+   * Both `start` and `end` are relative positions to the current row. For 
example, "0" means
--- End diff --

I'll fix it to be more accurate (that's the purpose of this particular change, so the more accurate the merrier). Thanks!





[GitHub] spark pull request #17417: [SQL][DOC] Use recommended values for row boundar...

2017-03-25 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17417#discussion_r108035464
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/Window.scala ---
@@ -22,7 +22,7 @@ import org.apache.spark.sql.Column
 import org.apache.spark.sql.catalyst.expressions._
 
 /**
- * Utility functions for defining window in DataFrames.
+ * Utility functions for defining window in Datasets.
--- End diff --

Sure. I'm going to revert the changes. There's little value in them. I'd rather see the changes approved in general than fight over DataFrame vs Dataset.

p.s. There's no `DataFrame` class in Spark SQL; it's just a type alias of `Dataset[Row]` -- see https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/package.scala#L46.
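
For reference, the alias defined there:

```scala
type DataFrame = Dataset[Row]
```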





[GitHub] spark pull request #17417: [SQL][DOC] Use recommended values for row boundar...

2017-03-24 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/17417

[SQL][DOC] Use recommended values for row boundaries in Window's scal…

…adoc

## What changes were proposed in this pull request?

Use recommended values for row boundaries in Window's scaladoc, i.e. 
`Window.unboundedPreceding`, `Window.unboundedFollowing`, and 
`Window.currentRow` (that were introduced in 2.1.0).
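
For example -- a minimal sketch of the recommended style (the window spec itself is illustrative only):

```scala
import org.apache.spark.sql.expressions.Window

val byCategoryOrderedById = Window
  .partitionBy("category")
  .orderBy("id")
  .rowsBetween(Window.unboundedPreceding, Window.currentRow)
```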

## How was this patch tested?

Local build

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark 
window-expression-scaladoc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17417.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17417


commit efc420bfc0cb19bed5adec919587cc20182293fe
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2017-03-24T21:43:23Z

[SQL][DOC] Use recommended values for row boundaries in Window's scaladoc







[GitHub] spark pull request #17409: [SQL][MINOR] Fix for typo in Analyzer

2017-03-24 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/17409

[SQL][MINOR] Fix for typo in Analyzer

## What changes were proposed in this pull request?

Fix for typo in Analyzer

## How was this patch tested?

local build


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark analyzer-typo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17409.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17409


commit cb28134c911dcaebeb8e2b168fa133f6bbb8ae3f
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2017-03-24T13:57:08Z

[SQL][MINOR] Fix for typo in Analyzer







[GitHub] spark pull request #17337: [SQL][MINOR] Fix scaladoc for UDFRegistration

2017-03-17 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/17337

[SQL][MINOR] Fix scaladoc for UDFRegistration

## What changes were proposed in this pull request?

Fix scaladoc for UDFRegistration

## How was this patch tested?

local build

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark udfregistration-scaladoc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17337.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17337


commit 4480e72cb789212c2cc86cea50c61c20d3faa2d1
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2017-03-18T01:15:50Z

[SQL][MINOR] Fix scaladoc for UDFRegistration







[GitHub] spark pull request #16061: [SPARK-18278] [Scheduler] Support native submissi...

2017-03-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/16061#discussion_r104206250
  
--- Diff: kubernetes/pom.xml ---
@@ -0,0 +1,54 @@
+
+
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.spark</groupId>
+    <artifactId>spark-parent_2.11</artifactId>
+    <version>2.1.0-SNAPSHOT</version>
+    <relativePath>../pom.xml</relativePath>
+  </parent>
+
+  <artifactId>spark-kubernetes_2.11</artifactId>
--- End diff --

Use `${scala.binary.version}` instead of hardcoding `2.11`, i.e. `spark-kubernetes_${scala.binary.version}`.





[GitHub] spark pull request #16061: [SPARK-18278] [Scheduler] Support native submissi...

2017-03-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/16061#discussion_r104206163
  
--- Diff: kubernetes/pom.xml ---
@@ -0,0 +1,54 @@
+
+
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.spark</groupId>
+    <artifactId>spark-parent_2.11</artifactId>
+    <version>2.1.0-SNAPSHOT</version>
--- End diff --

`2.1.0-SNAPSHOT`???





[GitHub] spark pull request #16061: [SPARK-18278] [Scheduler] Support native submissi...

2017-03-03 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/16061#discussion_r104205883
  
--- Diff: kubernetes/README.md ---
@@ -0,0 +1,21 @@
+# Pre-requisites
+* maven, JDK and all other pre-requisites for building Spark.
+
+# Steps to compile
+
+* Clone the fork of spark: https://github.com/foxish/spark/ and switch to 
the k8s-support branch.
+* Build the project
+* ./build/mvn -Pkubernetes -Phadoop-2.4 -Dhadoop.version=2.4.0 
-DskipTests package
--- End diff --

I think `hadoop-2.4` is gone in master.





[GitHub] spark pull request #17042: [CORE][MINOR] Fix scaladoc

2017-02-24 Thread jaceklaskowski
Github user jaceklaskowski closed the pull request at:

https://github.com/apache/spark/pull/17042





[GitHub] spark issue #17042: [CORE][MINOR] Fix scaladoc

2017-02-24 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/17042
  
Makes sense. Thanks @srowen!





[GitHub] spark pull request #17042: [CORE][MINOR] Fix scaladoc

2017-02-23 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/17042

[CORE][MINOR] Fix scaladoc

## What changes were proposed in this pull request?

Minor change to scaladoc of `HeartbeatReceiver` (the method is certainly 
not for tests only)

## How was this patch tested?

local build

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark 
heartbeatreceiver-remove-only-for-test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17042.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17042


commit b34a8295c1559f77237af66069135b90139c460e
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2017-02-23T21:09:01Z

[CORE][MINOR] Fix scaladoc







[GitHub] spark pull request #16812: [SPARK-19465][SQL] Added options for custom boole...

2017-02-07 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/16812#discussion_r99935530
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ---
@@ -139,6 +140,20 @@ class CSVSuite extends QueryTest with SharedSQLContext 
with SQLTestUtils {
 assert(result.schema === expectedSchema)
   }
 
+  test("test inferring booleans with custom values") {
+val result = spark.read
+  .format("csv")
+  .option("header", "true")
--- End diff --

Could you use `true` (boolean literal) instead to promote the more 
type-safe approach?
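
i.e. using the `Boolean` overload of `option` -- a sketch (the input path is illustrative only):

```scala
val result = spark.read
  .format("csv")
  .option("header", true)       // boolean literal instead of the string "true"
  .option("inferSchema", true)
  .load("cars.csv")
```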





[GitHub] spark pull request #16812: [SPARK-19465][SQL] Added options for custom boole...

2017-02-07 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/16812#discussion_r99935349
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchemaSuite.scala
 ---
@@ -73,6 +73,12 @@ class CSVInferSchemaSuite extends SparkFunSuite {
 assert(CSVInferSchema.inferField(LongType, "2015-08 14:49:00", 
options) == StringType)
   }
 
+  test("Boolean field types are inferred correctly from custom value") {
+val options = new CSVOptions(Map("trueValue" -> "yes"))
+assert(CSVInferSchema.inferField(BooleanType, "yES", options) == 
BooleanType)
--- End diff --

What's the reason for `==` not `===` as it's in the other asserts?
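
For the record, ScalaTest's `===` reports both operands when an assertion fails, while a plain `==` only yields a bare `false`:

```scala
assert(CSVInferSchema.inferField(BooleanType, "yES", options) === BooleanType)
```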





[GitHub] spark pull request #16812: [SPARK-19465][SQL] Added options for custom boole...

2017-02-07 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/16812#discussion_r99934599
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala
 ---
@@ -110,7 +110,11 @@ private[csv] class UnivocityParser(
   }
 
 case _: BooleanType => (d: String) =>
-  nullSafeDatum(d, name, nullable, options)(_.toBoolean)
+  nullSafeDatum(d, name, nullable, options) { datum =>
+if (datum.equalsIgnoreCase(options.trueValue)) true
--- End diff --

Is there any reason why you compare `datum` to `options.trueValue` and not 
vice versa as it's in `CSVInferSchema.scala` above?
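
If the intent is the usual constant-first style, it would look like the sketch below (`options.falseValue` comes from this PR; the `else` branch is hypothetical):

```scala
// constant-first tolerates a null datum: options.trueValue is never null
if (options.trueValue.equalsIgnoreCase(datum)) true
else if (options.falseValue.equalsIgnoreCase(datum)) false
else throw new IllegalArgumentException(s"Not a boolean: $datum")
```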





[GitHub] spark pull request #16550: [SPARK-19178][SQL] convert string of large number...

2017-01-12 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/16550#discussion_r95871812
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -835,6 +835,187 @@ public UTF8String translate(Map<Character, Character> 
dict) {
 return fromString(sb.toString());
   }
 
+  private int getDigit(byte b) {
+if (b >= '0' && b <= '9') {
+  return b - '0';
+}
+throw new NumberFormatException(toString());
+  }
+
+  /**
+   * Parses this UTF8String to long.
+   *
+   * Note that, in this method we accumulate the result in negative 
format, and convert it to
+   * positive format at the end, if this string is not started with '-'. 
This is because min value
+   * is bigger than max value in digits, e.g. Integer.MAX_VALUE is 
'2147483647' and
+   * Integer.MIN_VALUE is '-2147483648'.
+   *
+   * These codes are mostly copied from LazyLong.parseLong in Hive.
+   */
+  public long toLong() {
+if (numBytes == 0) {
+  throw new NumberFormatException("Empty string!");
--- End diff --

Could you remove the `!` from the exception message?





[GitHub] spark pull request #16481: [SPARK-19092] [SQL] Save() API of DataFrameWriter...

2017-01-07 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/16481#discussion_r95065596
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -494,8 +500,13 @@ case class DataSource(
 catalogTable = catalogTable,
 fileIndex = fileIndex)
 sparkSession.sessionState.executePlan(plan).toRdd
-// Replace the schema with that of the DataFrame we just wrote out 
to avoid re-inferring it.
-copy(userSpecifiedSchema = 
Some(data.schema.asNullable)).resolveRelation()
+if (isForWriteOnly) {
+  // Exit earlier and return null
--- End diff --

I'd remove "and return null"





[GitHub] spark pull request #16492: [SPARK-19113][SS][Tests]Set UncaughtExceptionHand...

2017-01-07 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/16492#discussion_r95065464
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala ---
@@ -235,7 +235,10 @@ trait StreamTest extends QueryTest with 
SharedSQLContext with Timeouts {
*/
   def testStream(
   _stream: Dataset[_],
-  outputMode: OutputMode = OutputMode.Append)(actions: StreamAction*): 
Unit = {
+  outputMode: OutputMode = OutputMode.Append)(actions: StreamAction*): 
Unit = synchronized {
+// `synchronized` is added to prevent the user from calling 
`testStream` concurrently because
--- End diff --

Can you elaborate on why `StreamingQueryListener` can't be used concurrently? As written, the comment reads as "it does not work because something else does not work". Why? What's wrong with `StreamingQueryListener`?





[GitHub] spark pull request #16492: [SPARK-19113][SS][Tests]Set UncaughtExceptionHand...

2017-01-07 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/16492#discussion_r95065415
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
@@ -238,7 +238,7 @@ class StreamSuite extends StreamTest {
 }
   }
 
-  testQuietly("fatal errors from a source should be sent to the user") {
+  testQuietly("handle fatal errors thrown from the stream thread 
correctly") {
--- End diff --

`correctly` is superfluous and could be removed.





[GitHub] spark issue #16475: [MINOR][CORE] Remove code duplication (so the interface ...

2017-01-06 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/16475
  
Closed as per @rxin's request.





[GitHub] spark pull request #16475: [MINOR][CORE] Remove code duplication (so the int...

2017-01-06 Thread jaceklaskowski
Github user jaceklaskowski closed the pull request at:

https://github.com/apache/spark/pull/16475





[GitHub] spark issue #16475: [MINOR][CORE] Remove code duplication (so the interface ...

2017-01-05 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/16475
  
Partially agree, @srowen. The reason for the change was the `blockId.isShuffle` condition that both methods use for their shuffle-specific handling. The change might not be the most correct one, but I'm sure something's wrong here; I can't nail it down, hence this change to get this part of the shuffle system reviewed.

> "is a distant problem waiting to happen"

I was concerned about that too, but that's what the tests are supposed to cover. If a few are missing, they should be added, shouldn't they?





[GitHub] spark issue #16475: [MINOR][CORE] Remove code duplication (so the interface ...

2017-01-05 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/16475
  
I proposed the changes since they made it easier to understand the role of `getBlockData` vs `getLocalBytes` and, in the end, `ShuffleBlockResolver`. I'm not saying they should be accepted, but I'd like to hear your take on why they should not. Thanks!





[GitHub] spark pull request #16475: [MINOR][CORE] Remove code duplication (so the int...

2017-01-05 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/16475

[MINOR][CORE] Remove code duplication (so the interface is used instead)

## What changes were proposed in this pull request?

Removed code duplication and used the interface instead.

/cc @JoshRosen @rxin

## How was this patch tested?

Local build. Waiting for OK from Jenkins.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark 
blockmanager-code-deduplication

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16475.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16475


commit 0c181ef686243c8c4c8138160fd8bad6fbc37953
Author: Jacek Laskowski <ja...@japila.pl>
Date:   2017-01-05T09:26:51Z

[MINOR][CORE] Remove code duplication (so the interface is used)







[GitHub] spark issue #16309: [WIP][SPARK-18896][TESTS] Update to ScalaTest 3.0.1

2016-12-25 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/16309
  
For reference: [scala-xml releases](https://github.com/scala/scala-xml/releases)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning

2016-12-18 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/16309
  
The reason for the update to `scala-xml_2.11-1.0.5.jar` was that once I updated ScalaTest, Jenkins reported that the dependency list had changed. That's when I was told about `./dev/test-dependencies.sh --replace-manifest` (see
https://github.com/apache/spark/pull/16309#issuecomment-267585832 and 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70249/console).

You may be right that updating the jar could have impacted the XML issue Jenkins is reporting. Let me see if I can _somehow_ relate them by:

1. Reviewing the changes between `scala-xml_2.11-1.0.2.jar` and 
`scala-xml_2.11-1.0.5.jar`
2. Looking into the XML not generated issue in the recent builds.

Thanks for the hint!
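
(Noting the workflow down for future reference -- a sketch from memory, so 
treat the manifest file names as approximate:)

```
# Regenerate the pinned dependency manifests after a dependency bump
./dev/test-dependencies.sh --replace-manifest

# The manifests live under dev/deps/ (names like spark-deps-hadoop-2.7);
# review the diff before committing
git diff dev/deps/
```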




[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning

2016-12-17 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/16309
  
The tests run locally on my laptop finished after... `7431 s`, which is 
about two hours (!)

```
[error] (sql/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] (streaming-kafka-0-10/test:test) sbt.TestsFailedException: Tests 
unsuccessful
[error] Total time: 7431 s, completed Dec 17, 2016 2:56:59 PM
[error] Not a valid command: Pkinesis-asl (similar: itunes-pause)
[error] Not a valid key: Pkinesis-asl
[error] Pkinesis-asl
[error] ^
```

Not sure why I'm getting `Not a valid key: Pkinesis-asl` -- it looks like 
the `-Pkinesis-asl` profile flag was not picked up correctly (?)
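
(A guess at the cause, to be verified: the error reads as if the leading 
dash of `-Pkinesis-asl` got stripped somewhere, so sbt tried to interpret 
`Pkinesis-asl` as a command. If memory serves, Spark's sbt build can also 
pick profiles up from the `SBT_MAVEN_PROFILES` environment variable, so 
something like the following might sidestep the flag parsing entirely -- a 
sketch, assuming `SparkBuild` really does read that variable:)

```
# Hypothetical workaround; assumes SBT_MAVEN_PROFILES is honoured by the build
export SBT_MAVEN_PROFILES="-Pkinesis-asl -Pyarn -Phive -Phive-thriftserver"
./build/sbt test
```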




[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning

2016-12-17 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/16309
  
@srowen Please help as I'm stuck with the `OutOfMemoryError: GC overhead 
limit exceeded` error. Should Jenkins run the tests with 6g?

What's even more interesting is that the tests seem to have all passed 
according to the report in [Test build #70306 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70306/testReport/).

BTW, I'm running the tests on my laptop with the following settings `export 
SBT_OPTS="-Xmx6g -XX:ReservedCodeCacheSize=512m"` (as recommended).

```
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at 
scala.collection.TraversableLike$class.withFilter(TraversableLike.scala:655)
at 
scala.collection.AbstractTraversable.withFilter(Traversable.scala:104)
at 
scala.tools.nsc.backend.jvm.GenASM$JPlainBuilder$scoping$2$.getMerged(GenASM.scala:2067)
at 
scala.tools.nsc.backend.jvm.GenASM$JPlainBuilder.genLocalVariableTable$1(GenASM.scala:2112)
at 
scala.tools.nsc.backend.jvm.GenASM$JPlainBuilder.genCode(GenASM.scala:2753)
at 
scala.tools.nsc.backend.jvm.GenASM$JPlainBuilder.genMethod(GenASM.scala:1471)
```
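
(In case it helps anyone hitting the same wall: a possible local 
workaround, assuming a machine with enough RAM, is to raise the heap 
further and relax the GC overhead check. The numbers below are 
placeholders, not recommendations:)

```
# -XX:-UseGCOverheadLimit disables the "GC overhead limit exceeded" check,
# so excessive GC shows up as slowness (or a plain OOM) instead
export SBT_OPTS="-Xmx8g -XX:ReservedCodeCacheSize=1g -XX:-UseGCOverheadLimit"
```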




[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning

2016-12-17 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/16309
  
Rebasing onto master to re-trigger the tests on Jenkins... (hoping they 
pass this time)




[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning

2016-12-16 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/16309
  
I'm testing the changes locally with the following:

```
export SCALACTIC_FILL_FILE_PATHNAMES=yes
export SBT_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"

sbt -Phadoop-2.3 -Pmesos -Pkinesis-asl -Pyarn -Phive-thriftserver -Phive 
test:package streaming-kafka-0-8-assembly/assembly 
streaming-flume-assembly/assembly

sbt -Phadoop-2.3 -Phive -Pyarn -Pmesos -Phive-thriftserver -Pkinesis-asl 
-Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest
 test
```

It's been running for the past hour and looks fine so far. I hit some 
SQL UDF-related errors that didn't look related to my changes, so I had to 
re-run everything. However, I have no new changes to push, so... how can I 
re-trigger the tests on Jenkins to see whether the failure was transient? 
Is there some magic Jenkins trigger?
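
(Two options I've since heard of -- neither from official documentation, 
so treat them as hearsay: whitelisted reviewers can post a comment like 
"Jenkins, retest this please" on the PR, and anyone can push an empty 
commit to the PR branch:)

```
# Re-trigger Jenkins without touching the code; <pr-branch> is a placeholder
git commit --allow-empty -m "Empty commit to re-trigger Jenkins"
git push origin <pr-branch>
```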




[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning

2016-12-16 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/16309
  
Hey @srowen, any idea about the following error? I'd appreciate any hints 
to help me fix it.

```
[info] - can use a custom recovery mode factory (57 milliseconds)
Exception in thread "Thread-89" [info] - master/worker web ui available 
(225 milliseconds)
[info] - master/worker web ui available with reverseProxy (336 milliseconds)
java.lang.OutOfMemoryError: GC overhead limit exceeded
[info] - basic scheduling - spread out (36 milliseconds)
[info] - basic scheduling - no spread out (34 milliseconds)
at 
java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3054)
[info] - basic scheduling with more memory - spread out (33 milliseconds)
at 
java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2874)
at java.io.ObjectInputStream.readString(ObjectInputStream.java:1639)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1342)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at 
org.scalatest.tools.Framework$ScalaTestRunner$Skeleton$1$React.react(Framework.scala:809)
at 
org.scalatest.tools.Framework$ScalaTestRunner$Skeleton$1.run(Framework.scala:798)
at java.lang.Thread.run(Thread.java:745)
```
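
(An idea I might explore, sketched below in sbt 0.13 syntax: fork the 
tests into their own JVM with a dedicated heap so that test memory 
pressure doesn't compete with the compiler. Spark's build may already set 
some of this up, and the sizes are placeholders:)

```
// Hypothetical additions to the sbt build definition (sbt 0.13 syntax)
fork in Test := true                 // run tests in a separate JVM
javaOptions in Test ++= Seq(         // only applies when tests are forked
  "-Xmx4g",                          // placeholder heap size for the test JVM
  "-XX:ReservedCodeCacheSize=512m"
)
```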




[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning

2016-12-16 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/16309
  
I think the latest test failures are somehow related to:

```
Please set the environment variable SCALACTIC_FILL_FILE_PATHNAMES to yes at 
compile time to enable this feature.
```

Exploring...

BTW, I'm running `sbt -Phadoop-2.3 -Phive -Pyarn -Pmesos -Phive-thriftserver 
-Pkinesis-asl test` locally; the tests are still executing (for the past 
40 minutes or so) and are green so far.




[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning

2016-12-16 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/16309
  
Just learnt about `./dev/test-dependencies.sh --replace-manifest`. 
(Where is all of this described?)




[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning

2016-12-16 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/16309
  
Just learnt about `export SPARK_TESTING=1` to avoid some test failures.
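
(A minimal usage sketch -- the sbt target here is just an example:)

```
export SPARK_TESTING=1
./build/sbt -Phive -Phive-thriftserver sql/test
```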




[GitHub] spark issue #16309: [SPARK-18896][TESTS] Suppress ScalaCheck warning

2016-12-16 Thread jaceklaskowski
Github user jaceklaskowski commented on the issue:

https://github.com/apache/spark/pull/16309
  
Had to change some of the tests due to the "Expired deprecations" in 
[ScalaTest 3.0.0](http://www.scalatest.org/release_notes/3.0.0).
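
(One illustrative example of the kind of change involved -- I'm not 
claiming this exact one is among the failures here. The old `plusOrMinus` 
tolerance syntax was deprecated back in ScalaTest 2.x and, as far as I 
remember, is gone now:)

```
// Before (removed syntax, illustrative):
// result should be (6.9 plusOrMinus 0.2)

// After:
result should be (6.9 +- 0.2)
```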



