[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...

2017-06-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18347


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...

2017-06-21 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/18347#discussion_r123322259
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/console.scala ---
@@ -60,5 +70,23 @@ class ConsoleSinkProvider extends StreamSinkProvider with DataSourceRegister {
     new ConsoleSink(parameters)
   }
 
+  def createRelation(
+      sqlContext: SQLContext,
+      mode: SaveMode,
+      parameters: Map[String, String],
+      data: DataFrame): BaseRelation = {
+    // Number of rows to display, by default 20 rows
+    val numRowsToShow = parameters.get("numRows").map(_.toInt).getOrElse(20)
+
+    // Truncate the displayed data if it is too long, by default it is true
+    val isTruncated = parameters.get("truncate").map(_.toBoolean).getOrElse(true)
+
+    data.sparkSession.createDataFrame(
--- End diff --

You can just call `data.showInternal(numRowsToShow, isTruncated)`. The
`createDataFrame` call is a hack in ConsoleSink to avoid using the wrong
planner. That is not a problem for batch DataFrames.
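
The option handling in the diff above can be tried in isolation. The following is a minimal sketch, assuming the writer options arrive as a plain `Map[String, String]`; the `ConsoleOptions` object and its method names are hypothetical and not part of Spark.

```scala
// Hypothetical helper (not Spark API): mirrors the option parsing in the diff.
object ConsoleOptions {
  // Number of rows to display; falls back to 20 when "numRows" is absent.
  def numRowsToShow(parameters: Map[String, String]): Int =
    parameters.get("numRows").map(_.toInt).getOrElse(20)

  // Whether long cell values are truncated; falls back to true when
  // "truncate" is absent.
  def isTruncated(parameters: Map[String, String]): Boolean =
    parameters.get("truncate").map(_.toBoolean).getOrElse(true)
}
```

With this pattern a malformed value fails fast instead of silently using the default: a non-numeric `numRows` makes `toInt` throw a `NumberFormatException`, and a `truncate` value other than `true` or `false` makes `toBoolean` throw an `IllegalArgumentException`.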





[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...

2017-06-21 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/18347#discussion_r123321846
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/console.scala ---
@@ -51,7 +53,15 @@ class ConsoleSink(options: Map[String, String]) extends Sink with Logging {
   }
 }
 
-class ConsoleSinkProvider extends StreamSinkProvider with DataSourceRegister {
+case class ConsoleRelation(Context: SQLContext, data: DataFrame) extends BaseRelation {
--- End diff --

nit: you can use
```
case class ConsoleRelation(override val sqlContext: SQLContext, data: DataFrame) extends BaseRelation {
  override def schema: StructType = data.schema
}
```
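
The suggestion works because a case-class parameter declared `override val` can implement an abstract `def` of the parent class. The following is a minimal sketch with stand-in types that assumes nothing from Spark: `Ctx`, `Relation`, and `DemoRelation` are hypothetical names that only imitate the shape of `SQLContext`, `BaseRelation`, and `ConsoleRelation`.

```scala
// Stand-in for SQLContext (hypothetical, for illustration only).
final case class Ctx(name: String)

// Stand-in for BaseRelation: two abstract members, as in the suggestion above.
abstract class Relation {
  def context: Ctx
  def schema: Seq[String]
}

// The constructor parameter `context` implements the abstract `def context`,
// so no separate field or forwarding method is needed.
final case class DemoRelation(override val context: Ctx, columns: Seq[String])
    extends Relation {
  override def schema: Seq[String] = columns
}
```

Declaring the member in the parameter list keeps the class to a single line of state plus the one method that has to be derived from the data.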





[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...

2017-06-19 Thread lubozhan
Github user lubozhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18347#discussion_r122639236
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -465,6 +465,8 @@ case class DataSource(
     providingClass.newInstance() match {
       case dataSource: CreatableRelationProvider =>
         SaveIntoDataSourceCommand(data, dataSource, caseInsensitiveOptions, mode)
+      case dataSource: ConsoleSinkProvider =>
+        data.show(data.count().toInt, false)
--- End diff --

Sorry for the late reply.
Yes, it is right to use an underscore since `dataSource` is not used.
Since there is no need to create a new ConsoleSink, and its private
variables are not accessible from here, I will use caseInsensitiveOptions
instead to extract numRows and truncate.
Thanks for your comment.
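
Reading the options from a case-insensitive map means that `numRows`, `numrows`, and `NUMROWS` would all resolve to the same setting. The following is a minimal sketch of that idea, assuming keys are normalised to lower case; `LowerCasedOptions` is a hypothetical stand-in written for illustration and is not Spark's own class.

```scala
import java.util.Locale

// Hypothetical stand-in: stores keys lower-cased and lower-cases lookups too.
final class LowerCasedOptions(original: Map[String, String]) {
  private val lowered: Map[String, String] =
    original.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  def get(key: String): Option[String] =
    lowered.get(key.toLowerCase(Locale.ROOT))
}

object LowerCasedOptionsDemo {
  val options = new LowerCasedOptions(Map("NumRows" -> "3", "truncate" -> "false"))

  // Same defaults as the console options discussed in this thread.
  val numRows: Int = options.get("numRows").map(_.toInt).getOrElse(20)
  val truncate: Boolean = options.get("TRUNCATE").map(_.toBoolean).getOrElse(true)
}
```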





[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...

2017-06-18 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/18347#discussion_r122617147
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -465,6 +465,8 @@ case class DataSource(
     providingClass.newInstance() match {
       case dataSource: CreatableRelationProvider =>
         SaveIntoDataSourceCommand(data, dataSource, caseInsensitiveOptions, mode)
+      case dataSource: ConsoleSinkProvider =>
+        data.show(data.count().toInt, false)
--- End diff --

`ConsoleSink` [has two options](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/console.scala#L27-L30) that could be used here -- `numRows` and `truncate`.





[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...

2017-06-18 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/18347#discussion_r122616876
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -465,6 +465,8 @@ case class DataSource(
     providingClass.newInstance() match {
       case dataSource: CreatableRelationProvider =>
         SaveIntoDataSourceCommand(data, dataSource, caseInsensitiveOptions, mode)
+      case dataSource: ConsoleSinkProvider =>
--- End diff --

Underscore `dataSource` since it's not used.
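
In a Scala type pattern, `_` matches on the type without binding a name, which is what this comment asks for when the matched value is never used. The following is a minimal sketch with hypothetical stand-in types (`Creatable`, `ConsoleLike`) rather than the real Spark providers.

```scala
// Hypothetical stand-ins for the provider types matched in DataSource.scala.
trait Creatable { def create(): String }
final class ConsoleLike

object ProviderMatch {
  def describe(provider: Any): String = provider match {
    // The bound name is used on the right-hand side, so it is kept.
    case dataSource: Creatable => dataSource.create()
    // The value itself is not needed, so `_` replaces the unused name.
    case _: ConsoleLike => "console"
    case _ => "unsupported"
  }
}
```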





[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...

2017-06-18 Thread lubozhan
GitHub user lubozhan opened a pull request:

https://github.com/apache/spark/pull/18347

[SPARK-20599][SS] ConsoleSink should work with (batch)

## What changes were proposed in this pull request?

Currently, reading a batch and writing it to the console sink leads to a
runtime exception.

Changes:

- This PR adds a match case for ConsoleSinkProvider, so that the Dataset is
displayed when the console format is used.

## How was this patch tested?

spark.read.schema().json(path).write.format("console").save


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lubozhan/spark dev

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18347.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18347


commit dfd81b22061ab9bcbe5f7b511b929de5d31b636a
Author: Lubo Zhang 
Date:   2017-06-15T07:01:31Z

support console for write batch



