[GitHub] [spark] HyukjinKwon closed pull request #24724: User friendly dataset, dataframe generation for csv datasources without explicit StructType definitions.

2019-05-27 Thread GitBox
HyukjinKwon closed pull request #24724: User friendly dataset, dataframe 
generation for csv datasources without explicit StructType definitions.
URL: https://github.com/apache/spark/pull/24724
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on issue #24724: User friendly dataset, dataframe generation for csv datasources without explicit StructType definitions.

2019-05-27 Thread GitBox
HyukjinKwon commented on issue #24724: User friendly dataset, dataframe 
generation for csv datasources without explicit StructType definitions.
URL: https://github.com/apache/spark/pull/24724#issuecomment-496375843
 
 
   There's virtually no diff:
   
   ```scala
   case class Person(name: String, age: Long)
   val df = spark.createDataFrame[Person]("/tmp/csv")
   ```
   
   vs 
   
   ```scala
   case class Person(name: String, age: Long)
   spark.read.schema("name string, age long").csv("/tmp/csv").as[Person]
   ```
   
   And it's super confusing that `createDataFrame` takes CSV. How about JSON 
and other formats?





[GitHub] [spark] HyukjinKwon commented on issue #24724: User friendly dataset, dataframe generation for csv datasources without explicit StructType definitions.

2019-05-27 Thread GitBox
HyukjinKwon commented on issue #24724: User friendly dataset, dataframe 
generation for csv datasources without explicit StructType definitions.
URL: https://github.com/apache/spark/pull/24724#issuecomment-496375267
 
 
   The API itself is two lines. Whether it's a one-liner or a two-liner, the 
workaround is easy. I don't think we need this, and I would like to avoid 
introducing more variants like this.





[GitHub] [spark] dongjoon-hyun commented on issue #24711: [SPARK-27859][SS] Use efficient sorting instead of `.sorted.reverse` sequence

2019-05-27 Thread GitBox
dongjoon-hyun commented on issue #24711: [SPARK-27859][SS] Use efficient 
sorting instead of `.sorted.reverse` sequence
URL: https://github.com/apache/spark/pull/24711#issuecomment-496374107
 
 
   You're welcome, @wenxuanguan .





[GitHub] [spark] dongjoon-hyun edited a comment on issue #24724: User friendly dataset, dataframe generation for csv datasources without explicit StructType definitions.

2019-05-27 Thread GitBox
dongjoon-hyun edited a comment on issue #24724: User friendly dataset, 
dataframe generation for csv datasources without explicit StructType 
definitions.
URL: https://github.com/apache/spark/pull/24724#issuecomment-496373474
 
 
   First of all, the following are the most frequent use cases (and the 
recommended way).
   1. HEADER and INFERSCHEMA
   ```scala
   scala> spark.read.option("header", true).option("inferSchema", true).csv("/tmp/csv").as[Person]
   res0: org.apache.spark.sql.Dataset[Person] = [name: string, age: int]
   ```
   
   2. USER-DEFINED SCHEMA or Hive MetaStore
   ```scala
   scala> case class Person(name: String, age: Long)
   scala> spark.read.schema("name string, age long").csv("/tmp/csv").as[Person]
   res0: org.apache.spark.sql.Dataset[Person] = [name: string, age: bigint]
   ```
   
   I believe the above two are more natural.
   
   Anyway, cc @HyukjinKwon and @MaxGekk 





[GitHub] [spark] dongjoon-hyun commented on issue #24724: User friendly dataset, dataframe generation for csv datasources without explicit StructType definitions.

2019-05-27 Thread GitBox
dongjoon-hyun commented on issue #24724: User friendly dataset, dataframe 
generation for csv datasources without explicit StructType definitions.
URL: https://github.com/apache/spark/pull/24724#issuecomment-496373474
 
 
   First of all, the following are the most frequent use cases.
   1. HEADER and INFERSCHEMA
   ```
   scala> spark.read.option("header", true).option("inferSchema", true).csv("/tmp/csv").as[Person]
   res0: org.apache.spark.sql.Dataset[Person] = [name: string, age: int]
   ```
   
   2. USER-DEFINED SCHEMA or Hive MetaStore
   ```
   scala> case class Person(name: String, age: Long)
   scala> spark.read.schema("name string, age long").csv("/tmp/csv").as[Person]
   res0: org.apache.spark.sql.Dataset[Person] = [name: string, age: bigint]
   ```
   
   I believe the above two are more natural.
   
   Anyway, cc @HyukjinKwon and @MaxGekk 





[GitHub] [spark] dongjoon-hyun edited a comment on issue #24724: User friendly dataset, dataframe generation for csv datasources without explicit StructType definitions.

2019-05-27 Thread GitBox
dongjoon-hyun edited a comment on issue #24724: User friendly dataset, 
dataframe generation for csv datasources without explicit StructType 
definitions.
URL: https://github.com/apache/spark/pull/24724#issuecomment-496373474
 
 
   First of all, the following are the most frequent use cases.
   1. HEADER and INFERSCHEMA
   ```scala
   scala> spark.read.option("header", true).option("inferSchema", true).csv("/tmp/csv").as[Person]
   res0: org.apache.spark.sql.Dataset[Person] = [name: string, age: int]
   ```
   
   2. USER-DEFINED SCHEMA or Hive MetaStore
   ```scala
   scala> case class Person(name: String, age: Long)
   scala> spark.read.schema("name string, age long").csv("/tmp/csv").as[Person]
   res0: org.apache.spark.sql.Dataset[Person] = [name: string, age: bigint]
   ```
   
   I believe the above two are more natural.
   
   Anyway, cc @HyukjinKwon and @MaxGekk 





[GitHub] [spark] HyukjinKwon closed pull request #24716: [SPARK-27848][R][BUILD] AppVeyor change to latest R version (3.6.0)

2019-05-27 Thread GitBox
HyukjinKwon closed pull request #24716: [SPARK-27848][R][BUILD] AppVeyor change 
to latest R version (3.6.0)
URL: https://github.com/apache/spark/pull/24716
 
 
   





[GitHub] [spark] HyukjinKwon commented on issue #24716: [SPARK-25944][R][BUILD] AppVeyor change to latest R version (3.6.0)

2019-05-27 Thread GitBox
HyukjinKwon commented on issue #24716: [SPARK-25944][R][BUILD] AppVeyor change 
to latest R version (3.6.0)
URL: https://github.com/apache/spark/pull/24716#issuecomment-496372569
 
 
   Merged to master.





[GitHub] [spark] wenxuanguan commented on issue #24711: [SPARK-27859][SS] Use efficient sorting instead of `.sorted.reverse` sequence

2019-05-27 Thread GitBox
wenxuanguan commented on issue #24711: [SPARK-27859][SS] Use efficient sorting 
instead of `.sorted.reverse` sequence
URL: https://github.com/apache/spark/pull/24711#issuecomment-496371722
 
 
   @srowen @dongjoon-hyun Thank you for the review.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24043: [SPARK-11412][SQL] Support 
merge schema for ORC
URL: https://github.com/apache/spark/pull/24043#issuecomment-496369660
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24043: [SPARK-11412][SQL] Support 
merge schema for ORC
URL: https://github.com/apache/spark/pull/24043#issuecomment-496369663
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105853/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24043: [SPARK-11412][SQL] Support merge 
schema for ORC
URL: https://github.com/apache/spark/pull/24043#issuecomment-496369660
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24043: [SPARK-11412][SQL] Support merge 
schema for ORC
URL: https://github.com/apache/spark/pull/24043#issuecomment-496369663
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105853/
   Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC

2019-05-27 Thread GitBox
SparkQA removed a comment on issue #24043: [SPARK-11412][SQL] Support merge 
schema for ORC
URL: https://github.com/apache/spark/pull/24043#issuecomment-496340435
 
 
   **[Test build #105853 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105853/testReport)**
 for PR 24043 at commit 
[`7d833b0`](https://github.com/apache/spark/commit/7d833b0d37c3cb646810d723651e9ceaa96da1fb).





[GitHub] [spark] SparkQA commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC

2019-05-27 Thread GitBox
SparkQA commented on issue #24043: [SPARK-11412][SQL] Support merge schema for 
ORC
URL: https://github.com/apache/spark/pull/24043#issuecomment-496369350
 
 
   **[Test build #105853 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105853/testReport)**
 for PR 24043 at commit 
[`7d833b0`](https://github.com/apache/spark/commit/7d833b0d37c3cb646810d723651e9ceaa96da1fb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24382: [SPARK-27330][SS] support task abort in foreach writer

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24382: [SPARK-27330][SS] support task 
abort in foreach writer
URL: https://github.com/apache/spark/pull/24382#issuecomment-483678508
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] HeartSaVioR commented on issue #24382: [SPARK-27330][SS] support task abort in foreach writer

2019-05-27 Thread GitBox
HeartSaVioR commented on issue #24382: [SPARK-27330][SS] support task abort in 
foreach writer
URL: https://github.com/apache/spark/pull/24382#issuecomment-496369272
 
 
   test this please





[GitHub] [spark] swapnilushinde edited a comment on issue #24724: User friendly dataset, dataframe generation for csv datasources without explicit StructType definitions.

2019-05-27 Thread GitBox
swapnilushinde edited a comment on issue #24724: User friendly dataset, 
dataframe generation for csv datasources without explicit StructType 
definitions.
URL: https://github.com/apache/spark/pull/24724#issuecomment-496367606
 
 
   Hi @dongjoon-hyun, thanks for the reply. Yes, I use this API sometimes as well. 
Passing the schema as a DDL string is a one-liner, but creating a Dataset still 
requires defining a case class. So creating a Dataset means defining the schema 
twice: once as a DDL string and once as a case class. For instance: 
   ```
   case class A(id: Int, name: String, subject: String, marks: Int, result: Boolean)
   val df = spark.read.schema("id int, name string, subject string, marks int, result boolean").csv("/tmp/csv")
   val ds = df.as[A]
   ```
   This change would require defining the schema just once, as a Product class, 
after which datasets/dataframes can be created easily.
   Furthermore, this API is in line with all other similar APIs for creating 
dataset/dataframe. 
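
For comparison, the existing API can also avoid the duplicated DDL string. The following is a minimal sketch, not the PR's implementation: it derives the `StructType` from the case class with `Encoders.product`. The object name `CsvAsCaseClass` and its `read` helper are hypothetical names introduced only for this illustration, and the CSV files under the given path are assumed to be headerless with columns in the same order as the fields of `A`.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}
import org.apache.spark.sql.types.StructType

case class A(id: Int, name: String, subject: String, marks: Int, result: Boolean)

// Hypothetical helper object, for illustration only.
object CsvAsCaseClass {
  // Derive the StructType from the case class, so the columns are declared only once.
  val schema: StructType = Encoders.product[A].schema

  // Read the CSV files under `path` as a typed Dataset[A] using the derived schema.
  def read(spark: SparkSession, path: String): Dataset[A] = {
    import spark.implicits._
    spark.read.schema(schema).csv(path).as[A]
  }
}
```

In `spark-shell`, where `spark` is already defined, this would be called as `CsvAsCaseClass.read(spark, "/tmp/csv")`.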
   





[GitHub] [spark] HyukjinKwon commented on issue #24716: [SPARK-25944][R][BUILD] AppVeyor change to latest R version (3.6.0)

2019-05-27 Thread GitBox
HyukjinKwon commented on issue #24716: [SPARK-25944][R][BUILD] AppVeyor change 
to latest R version (3.6.0)
URL: https://github.com/apache/spark/pull/24716#issuecomment-496368610
 
 
   Oops, thanks





[GitHub] [spark] swapnilushinde commented on issue #24724: User friendly dataset, dataframe generation for csv datasources without explicit StructType definitions.

2019-05-27 Thread GitBox
swapnilushinde commented on issue #24724: User friendly dataset, dataframe 
generation for csv datasources without explicit StructType definitions.
URL: https://github.com/apache/spark/pull/24724#issuecomment-496367606
 
 
   Hi @dongjoon-hyun, thanks for the reply. Yes, I use this API sometimes as well. 
Passing the schema as a DDL string is a one-liner, but creating a Dataset still 
requires defining a case class. So creating a Dataset means defining the schema 
twice: once as a DDL string and once as a case class. This change would require 
defining the schema just once, as a Product class, after which 
datasets/dataframes can be created easily.
   Furthermore, this API is in line with all other similar APIs for creating 
datasets/dataframes. 





[GitHub] [spark] dongjoon-hyun commented on issue #24716: [SPARK-25944][R][BUILD] AppVeyor change to latest R version (3.6.0)

2019-05-27 Thread GitBox
dongjoon-hyun commented on issue #24716: [SPARK-25944][R][BUILD] AppVeyor 
change to latest R version (3.6.0)
URL: https://github.com/apache/spark/pull/24716#issuecomment-496366021
 
 
   BTW, @HyukjinKwon . Could you fix the PR description?
   > R 3.5.1 is released 2019-04-26. 
   
   It seems to be a typo for `3.6.0` because
   - R version 3.6.0 (Planting of a Tree) was released on 2019-04-26.
   - R version 3.5.3 (Great Truth) was released on 2019-03-11.





[GitHub] [spark] dongjoon-hyun commented on issue #24724: User friendly dataset, dataframe generation for csv datasources without explicit StructType definitions.

2019-05-27 Thread GitBox
dongjoon-hyun commented on issue #24724: User friendly dataset, dataframe 
generation for csv datasources without explicit StructType definitions.
URL: https://github.com/apache/spark/pull/24724#issuecomment-496365550
 
 
   Hi, @swapnilushinde . Thank you for making a PR, but do you know the 
following? It's a one-liner.
   ```scala
   scala> spark.version
   res0: String = 2.4.3
   
   scala> spark.read.schema("id int, name string, subject string, marks int, result boolean").csv("/tmp/csv").printSchema
   root
    |-- id: integer (nullable = true)
    |-- name: string (nullable = true)
    |-- subject: string (nullable = true)
    |-- marks: integer (nullable = true)
    |-- result: boolean (nullable = true)
   ```





[GitHub] [spark] dongjoon-hyun closed pull request #24711: [SPARK-27859][SS] Use efficient sorting instead of `.sorted.reverse` sequence

2019-05-27 Thread GitBox
dongjoon-hyun closed pull request #24711: [SPARK-27859][SS] Use efficient 
sorting instead of `.sorted.reverse` sequence
URL: https://github.com/apache/spark/pull/24711
 
 
   





[GitHub] [spark] dongjoon-hyun commented on issue #24711: [SPARK-27859][SS] Use efficient sorting instead of `.sorted.reverse` sequence

2019-05-27 Thread GitBox
dongjoon-hyun commented on issue #24711: [SPARK-27859][SS] Use efficient 
sorting instead of `.sorted.reverse` sequence
URL: https://github.com/apache/spark/pull/24711#issuecomment-496364119
 
 
   Merged to master.
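
For readers unfamiliar with the pattern named in the PR title, here is a small sketch of the general idea. It does not reproduce the merged diff, which this thread does not show; the object and method names are hypothetical. `.sorted.reverse` sorts ascending and then builds a second, reversed collection, whereas passing a reversed `Ordering` sorts in descending order directly.

```scala
// Hypothetical names, for illustration only.
object DescendingSort {
  // Two steps: sort ascending, then build a reversed copy.
  def viaSortedReverse(xs: Seq[Long]): Seq[Long] = xs.sorted.reverse

  // One step: sort directly with a reversed ordering.
  def viaReverseOrdering(xs: Seq[Long]): Seq[Long] = xs.sorted(Ordering[Long].reverse)
}
```

Both methods return the same elements in descending order; the second simply skips the intermediate ascending collection.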





[GitHub] [spark] dongjoon-hyun commented on issue #24711: [Minor][SS] Use efficient sorting instead of `.sorted.reverse` sequence

2019-05-27 Thread GitBox
dongjoon-hyun commented on issue #24711: [Minor][SS] Use efficient sorting 
instead of `.sorted.reverse` sequence
URL: https://github.com/apache/spark/pull/24711#issuecomment-496363920
 
 
   I'll create one for you.





[GitHub] [spark] dongjoon-hyun commented on issue #24711: [Minor][SS]avoid inefficient sort when getLatest in HDFSMetadataLog

2019-05-27 Thread GitBox
dongjoon-hyun commented on issue #24711: [Minor][SS]avoid inefficient sort when 
getLatest in HDFSMetadataLog
URL: https://github.com/apache/spark/pull/24711#issuecomment-496363597
 
 
   Also, please update the PR title and description. You didn't include the 
changes in `streaming/ui/BatchPage.scala`.





[GitHub] [spark] dongjoon-hyun edited a comment on issue #24711: [Minor][SS]avoid inefficient sort when getLatest in HDFSMetadataLog

2019-05-27 Thread GitBox
dongjoon-hyun edited a comment on issue #24711: [Minor][SS]avoid inefficient 
sort when getLatest in HDFSMetadataLog
URL: https://github.com/apache/spark/pull/24711#issuecomment-496362843
 
 
   Thank you for pinging me, @wenxuanguan . Please make a JIRA issue and use 
the ID in the PR title. This is trivial, but it's worth it.





[GitHub] [spark] dongjoon-hyun commented on issue #24711: [Minor][SS]avoid inefficient sort when getLatest in HDFSMetadataLog

2019-05-27 Thread GitBox
dongjoon-hyun commented on issue #24711: [Minor][SS]avoid inefficient sort when 
getLatest in HDFSMetadataLog
URL: https://github.com/apache/spark/pull/24711#issuecomment-496362843
 
 
   Thank you for pinging me, @wenxuanguan . Please make a JIRA issue and use 
the ID. This is trivial, but it's worth it.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24724: User friendly dataset, dataframe generation for csv datasources without explicit StructType definitions.

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24724: User friendly dataset, 
dataframe generation for csv datasources without explicit StructType 
definitions.
URL: https://github.com/apache/spark/pull/24724#issuecomment-496360880
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on issue #24724: User friendly dataset, dataframe generation for csv datasources without explicit StructType definitions.

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24724: User friendly dataset, dataframe 
generation for csv datasources without explicit StructType definitions.
URL: https://github.com/apache/spark/pull/24724#issuecomment-496361177
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins removed a comment on issue #24724: User friendly dataset, dataframe generation for csv datasources without explicit StructType definitions.

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24724: User friendly dataset, 
dataframe generation for csv datasources without explicit StructType 
definitions.
URL: https://github.com/apache/spark/pull/24724#issuecomment-496360804
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins removed a comment on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24671: 
[SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and 
spark.executor.memoryOverhead.
URL: https://github.com/apache/spark/pull/24671#issuecomment-496360724
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105854/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24671: 
[SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and 
spark.executor.memoryOverhead.
URL: https://github.com/apache/spark/pull/24671#issuecomment-496360721
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24724: User friendly dataset, dataframe generation for csv datasources without explicit StructType definitions.

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24724: User friendly dataset, dataframe 
generation for csv datasources without explicit StructType definitions.
URL: https://github.com/apache/spark/pull/24724#issuecomment-496360880
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] wenxuanguan commented on issue #24711: [Minor][SS]avoid inefficient sort when getLatest in HDFSMetadataLog

2019-05-27 Thread GitBox
wenxuanguan commented on issue #24711: [Minor][SS]avoid inefficient sort when 
getLatest in HDFSMetadataLog
URL: https://github.com/apache/spark/pull/24711#issuecomment-496360902
 
 
   @dongjoon-hyun @HyukjinKwon  Can you please have a look?





[GitHub] [spark] AmplabJenkins commented on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24671: [SPARK-27811][Core][Docs]Improve docs 
about spark.driver.memoryOverhead and spark.executor.memoryOverhead.
URL: https://github.com/apache/spark/pull/24671#issuecomment-496360721
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24724: User friendly dataset, dataframe generation for csv datasources without explicit StructType definitions.

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24724: User friendly dataset, dataframe 
generation for csv datasources without explicit StructType definitions.
URL: https://github.com/apache/spark/pull/24724#issuecomment-496360804
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins commented on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24671: [SPARK-27811][Core][Docs]Improve docs 
about spark.driver.memoryOverhead and spark.executor.memoryOverhead.
URL: https://github.com/apache/spark/pull/24671#issuecomment-496360724
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105854/
   Test PASSed.





[GitHub] [spark] swapnilushinde opened a new pull request #24724: User friendly dataset, dataframe generation for csv datasources without explicit StructType definitions.

2019-05-27 Thread GitBox
swapnilushinde opened a new pull request #24724: User friendly dataset, 
dataframe generation for csv datasources without explicit StructType 
definitions.
URL: https://github.com/apache/spark/pull/24724
 
 
   ## What changes were proposed in this pull request?
   Many users load structured data from CSV data sources. With the current 
APIs, the common pattern is to load the CSV as a DataFrame, which requires the 
schema to be defined as a StructType object, and then convert the DataFrame to 
a Dataset of Product objects (case classes).
   This makes loading CSV files more complex than it needs to be. This change 
simplifies it and makes working with CSV files more user-friendly.
   
   **Input -** 
   ```
   csv file with five columns - {id: Int,
name: String,
subject: String,
marks: Int,
result: Boolean}
   ```
   **Current approach -**
   ```
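   import org.apache.spark.sql.types._
   import spark.implicits._  // needed for .as[A]
   
   // Field names must be string literals; StructType takes a Seq of fields.
   val schema = StructType(Seq(
     StructField("id", IntegerType, nullable = false),
     StructField("name", StringType, nullable = false),
     StructField("subject", StringType, nullable = false),
     StructField("marks", IntegerType, nullable = false),
     StructField("result", BooleanType, nullable = false)))
   
   val df = spark.read.schema(schema).csv()
   case class A(id: Int, name: String, subject: String, marks: Int, result: Boolean)
   val ds = df.as[A]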
   ```
   
   **Proposed change -**
   ```
   case class A(id: Int, name: String, subject: String, marks: Int, result: Boolean)
   val df = spark.createDataframe[A](optionsMap, )
   val ds = spark.createDataset[A](optionsMap, )
   ```
   
   - No explicit schema definition with StructType is needed, as the schema can 
be resolved from Product classes.
   - Applications no longer need redundant code to define verbose StructTypes.
   - The proposed APIs are similar to the current ones, so they are easy to use. 
All current and future CSV options can be used as is, with no changes needed. 
(Exception: inferSchema is disabled internally, as it is useless and confusing 
with this API.)
   - Like the current createDataset/createDataFrame APIs, this would make 
loading CSV files for debugging more convenient.
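
As a point of comparison, existing Spark APIs can already derive the schema from a case class, so the hand-written StructType in the current approach is not strictly required. A minimal sketch, assuming a local SparkSession and the placeholder path `/tmp/csv` (both are illustrative and not part of this PR); `Encoders.product` is an existing Spark API, while the object name `CsvAsDataset` is hypothetical:

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

// Same case class as in the description above.
case class A(id: Int, name: String, subject: String, marks: Int, result: Boolean)

// Hypothetical driver object, for illustration only.
object CsvAsDataset {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")            // assumption: local run
      .appName("csv-as-dataset")
      .getOrCreate()
    import spark.implicits._

    // Derive the StructType from the case class instead of writing it by hand.
    val schema = Encoders.product[A].schema

    // "/tmp/csv" is a placeholder path.
    val ds: Dataset[A] = spark.read.schema(schema).csv("/tmp/csv").as[A]
    ds.printSchema()

    spark.stop()
  }
}
```

Because the schema is an ordinary value here, the same `schema` can be passed to other readers such as `spark.read.schema(schema).json(...)`.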
   
   
   
   ## How was this patch tested?
   This change was tested manually. I didn't see unit tests for the similar 
createDataset/createDataFrame APIs. Please let me know the best place to add 
unit tests for this and the existing similar APIs.
   





[GitHub] [spark] SparkQA commented on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.

2019-05-27 Thread GitBox
SparkQA commented on issue #24671: [SPARK-27811][Core][Docs]Improve docs about 
spark.driver.memoryOverhead and spark.executor.memoryOverhead.
URL: https://github.com/apache/spark/pull/24671#issuecomment-496360449
 
 
   **[Test build #105854 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105854/testReport)**
 for PR 24671 at commit 
[`1f31fc6`](https://github.com/apache/spark/commit/1f31fc6aac694889f1b1450be4f30773deb51ad5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on issue #24671: [SPARK-27811][Core][Docs]Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.

2019-05-27 Thread GitBox
SparkQA removed a comment on issue #24671: [SPARK-27811][Core][Docs]Improve 
docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead.
URL: https://github.com/apache/spark/pull/24671#issuecomment-496341756
 
 
   **[Test build #105854 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105854/testReport)**
 for PR 24671 at commit 
[`1f31fc6`](https://github.com/apache/spark/commit/1f31fc6aac694889f1b1450be4f30773deb51ad5).





[GitHub] [spark] AmplabJenkins removed a comment on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24472: [SPARK-27578][SQL] Support 
INTERVAL ... HOUR TO SECOND syntax
URL: https://github.com/apache/spark/pull/24472#issuecomment-496360440
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24472: [SPARK-27578][SQL] Support 
INTERVAL ... HOUR TO SECOND syntax
URL: https://github.com/apache/spark/pull/24472#issuecomment-496360443
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105852/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL 
... HOUR TO SECOND syntax
URL: https://github.com/apache/spark/pull/24472#issuecomment-496360440
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL 
... HOUR TO SECOND syntax
URL: https://github.com/apache/spark/pull/24472#issuecomment-496360443
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105852/
   Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax

2019-05-27 Thread GitBox
SparkQA removed a comment on issue #24472: [SPARK-27578][SQL] Support INTERVAL 
... HOUR TO SECOND syntax
URL: https://github.com/apache/spark/pull/24472#issuecomment-496340414
 
 
   **[Test build #105852 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105852/testReport)**
 for PR 24472 at commit 
[`29fcc08`](https://github.com/apache/spark/commit/29fcc087fbd10ce4188f228c7ccf11337912f225).





[GitHub] [spark] SparkQA commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax

2019-05-27 Thread GitBox
SparkQA commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR 
TO SECOND syntax
URL: https://github.com/apache/spark/pull/24472#issuecomment-496360207
 
 
   **[Test build #105852 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105852/testReport)**
 for PR 24472 at commit 
[`29fcc08`](https://github.com/apache/spark/commit/29fcc087fbd10ce4188f228c7ccf11337912f225).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] dongjoon-hyun edited a comment on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax

2019-05-27 Thread GitBox
dongjoon-hyun edited a comment on issue #24472: [SPARK-27578][SQL] Support 
INTERVAL ... HOUR TO SECOND syntax
URL: https://github.com/apache/spark/pull/24472#issuecomment-496358371
 
 
   Hi, @gatorsmile and @cloud-fan .
   
   Could you give us some directional advice, please?
   - First, this PR wants to support `INTERVAL ... HOUR TO SECOND` in addition 
to `INTERVAL ... DAY TO SECOND`, as Presto/Teradata do. It looks reasonable to 
me, too.
   - Second, this PR originally added a new pattern and a new function (similar 
to the existing one). To avoid maintaining two similar functions, I recommended 
extending the existing pattern and handling `DAY` and `HOUR` with the same 
function. To sum up, we will additionally support items 2 to 4 below.
   1. SELECT INTERVAL '0 23:59:59.155' DAY TO SECOND (Current Spark)
   2. SELECT INTERVAL '23:59:59.155' HOUR TO SECOND
   3. SELECT INTERVAL '23:59:59.155' DAY TO SECOND
   4. SELECT INTERVAL '1 23:59:59.155' HOUR TO SECOND
   
   If you think these are okay, I want to merge this PR. What do you think about 
this?
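
The four forms listed above can be handled by a single pattern with an optional day part. The following is a minimal Scala sketch of that idea, not Spark's actual parser: the names `IntervalSketch` and `toMicros` are hypothetical, the unit keywords (`DAY`/`HOUR`) are ignored, and no range checks are applied to the fields.

```scala
import scala.util.matching.Regex

// Hypothetical helper for illustration; not part of Spark.
object IntervalSketch {
  // Optional sign, optional "<days> " prefix, then H:M:S with an optional fraction.
  private val dayTime: Regex =
    """^([+-])?(?:(\d+) )?(\d+):(\d+):(\d+)(?:\.(\d{1,9}))?$""".r

  /** Returns the interval in microseconds, or None if the text does not match. */
  def toMicros(text: String): Option[Long] = text match {
    case dayTime(sign, days, h, m, s, frac) =>
      // Unmatched optional groups are null, so wrap them in Option.
      val d = Option(days).map(_.toLong).getOrElse(0L)
      // Right-pad the fraction to nanoseconds, then truncate to microseconds.
      val fracMicros =
        Option(frac).map(f => (f + "000000000").take(9).toLong / 1000L).getOrElse(0L)
      val seconds = ((d * 24 + h.toLong) * 60 + m.toLong) * 60 + s.toLong
      val micros = seconds * 1000000L + fracMicros
      Some(if (sign == "-") -micros else micros)
    case _ => None
  }
}
```

With this shape, `'0 23:59:59.155'` and `'23:59:59.155'` go through the same code path and yield the same value, which is the point of sharing one function between `DAY TO SECOND` and `HOUR TO SECOND`.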





[GitHub] [spark] SparkQA commented on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Handle lookupCatalog function not defined

2019-05-27 Thread GitBox
SparkQA commented on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Handle 
lookupCatalog function not defined
URL: https://github.com/apache/spark/pull/24689#issuecomment-496359179
 
 
   **[Test build #105857 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105857/testReport)**
 for PR 24689 at commit 
[`0f4d9aa`](https://github.com/apache/spark/commit/0f4d9aa403d61790c88cdea027352294abcf340d).





[GitHub] [spark] AmplabJenkins removed a comment on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24700: [SPARK-27834][SQL][R][PYTHON] 
Make separate PySpark/SparkR vectorization configurations
URL: https://github.com/apache/spark/pull/24700#issuecomment-496358953
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24700: [SPARK-27834][SQL][R][PYTHON] 
Make separate PySpark/SparkR vectorization configurations
URL: https://github.com/apache/spark/pull/24700#issuecomment-496358955
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105851/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Handle lookupCatalog function not defined

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24689: [SPARK-26946][SQL][FOLLOWUP] 
Handle lookupCatalog function not defined
URL: https://github.com/apache/spark/pull/24689#issuecomment-496358913
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Handle lookupCatalog function not defined

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24689: [SPARK-26946][SQL][FOLLOWUP] 
Handle lookupCatalog function not defined
URL: https://github.com/apache/spark/pull/24689#issuecomment-496358914
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make 
separate PySpark/SparkR vectorization configurations
URL: https://github.com/apache/spark/pull/24700#issuecomment-496358955
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105851/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make 
separate PySpark/SparkR vectorization configurations
URL: https://github.com/apache/spark/pull/24700#issuecomment-496358953
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Handle lookupCatalog function not defined

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Handle 
lookupCatalog function not defined
URL: https://github.com/apache/spark/pull/24689#issuecomment-496358913
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Handle lookupCatalog function not defined

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24689: [SPARK-26946][SQL][FOLLOWUP] Handle 
lookupCatalog function not defined
URL: https://github.com/apache/spark/pull/24689#issuecomment-496358914
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3/
   Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations

2019-05-27 Thread GitBox
SparkQA removed a comment on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make 
separate PySpark/SparkR vectorization configurations
URL: https://github.com/apache/spark/pull/24700#issuecomment-496328630
 
 
   **[Test build #105851 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105851/testReport)**
 for PR 24700 at commit 
[`9fbc9e1`](https://github.com/apache/spark/commit/9fbc9e12840a44f40cd750b0e841eb2aaab7f67d).





[GitHub] [spark] SparkQA commented on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate PySpark/SparkR vectorization configurations

2019-05-27 Thread GitBox
SparkQA commented on issue #24700: [SPARK-27834][SQL][R][PYTHON] Make separate 
PySpark/SparkR vectorization configurations
URL: https://github.com/apache/spark/pull/24700#issuecomment-496358643
 
 
   **[Test build #105851 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105851/testReport)**
 for PR 24700 at commit 
[`9fbc9e1`](https://github.com/apache/spark/commit/9fbc9e12840a44f40cd750b0e841eb2aaab7f67d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] dongjoon-hyun commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL ... HOUR TO SECOND syntax

2019-05-27 Thread GitBox
dongjoon-hyun commented on issue #24472: [SPARK-27578][SQL] Support INTERVAL 
... HOUR TO SECOND syntax
URL: https://github.com/apache/spark/pull/24472#issuecomment-496358371
 
 
   Hi, @gatorsmile and @cloud-fan .
   
   Could you give us some directional advice, please?
   - First, this PR wants to support `INTERVAL ... HOUR TO SECOND` in addition 
to `INTERVAL ... DAY TO SECOND`, as Presto/Teradata do. It looks reasonable to 
me, too.
   - Second, this PR originally added a new pattern and a new function (similar 
to the existing one). To avoid maintaining two similar functions, I recommended 
extending the existing pattern and handling `DAY` and `HOUR` with the same 
function. So, we will additionally support items 2 to 4 below.
   1. SELECT INTERVAL '0 23:59:59.155' DAY TO SECOND (Current Spark)
   2. SELECT INTERVAL '23:59:59.155' HOUR TO SECOND
   3. SELECT INTERVAL '23:59:59.155' DAY TO SECOND
   4. SELECT INTERVAL '1 23:59:59.155' HOUR TO SECOND
   
   If you think these are okay, I want to merge this PR. What do you think about 
this?





[GitHub] [spark] AmplabJenkins removed a comment on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24717: [SPARK-27847][ML] One-Pass 
MultilabelMetrics & MulticlassMetrics
URL: https://github.com/apache/spark/pull/24717#issuecomment-496356858
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24717: [SPARK-27847][ML] One-Pass 
MultilabelMetrics & MulticlassMetrics
URL: https://github.com/apache/spark/pull/24717#issuecomment-496356864
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105856/
   Test PASSed.





[GitHub] [spark] cloud-fan closed pull request #24569: [SPARK-23191][CORE] Warn rather than terminate when duplicate worker register happens

2019-05-27 Thread GitBox
cloud-fan closed pull request #24569: [SPARK-23191][CORE] Warn rather than 
terminate when duplicate worker register happens
URL: https://github.com/apache/spark/pull/24569
 
 
   





[GitHub] [spark] jzhuge commented on a change in pull request #24689: [SPARK-26946][SQL][FOLLOWUP] Handle lookupCatalog function not defined

2019-05-27 Thread GitBox
jzhuge commented on a change in pull request #24689: 
[SPARK-26946][SQL][FOLLOWUP] Handle lookupCatalog function not defined
URL: https://github.com/apache/spark/pull/24689#discussion_r287922975
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalog/v2/LookupCatalog.scala
 ##
 @@ -26,27 +28,31 @@ import org.apache.spark.sql.catalyst.TableIdentifier
 @Experimental
 trait LookupCatalog {
 
-  def lookupCatalog: Option[(String) => CatalogPlugin] = None
+  def lookupCatalog: Option[String => CatalogPlugin] = None
 
   type CatalogObjectIdentifier = (Option[CatalogPlugin], Identifier)
 
   /**
    * Extract catalog plugin and identifier from a multi-part identifier.
    */
   object CatalogObjectIdentifier {
-    def unapply(parts: Seq[String]): Option[CatalogObjectIdentifier] = lookupCatalog.map { lookup =>
-      parts match {
-        case Seq(name) =>
-          (None, Identifier.of(Array.empty, name))
-        case Seq(catalogName, tail @ _*) =>
-          try {
-            val catalog = lookup(catalogName)
-            (Some(catalog), Identifier.of(tail.init.toArray, tail.last))
-          } catch {
-            case _: CatalogNotFoundException =>
-              (None, Identifier.of(parts.init.toArray, parts.last))
-          }
-      }
+    def unapply(parts: Seq[String]): Option[CatalogObjectIdentifier] = parts match {
+      case Seq(name) =>
+        Some((None, Identifier.of(Array.empty, name)))
+      case Seq(catalogName, tail @ _*) =>
+        lookupCatalog match {
+          case Some(lookup) =>
+            Try(lookup(catalogName)) match {
 
 Review comment:
   Thanks @HyukjinKwon for pointing out the style guide. Back to try/catch. 
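
For readers following the thread, here is a minimal sketch of what a try/catch version of the new extractor might look like. It is not the code from the PR: the diff above is cut off after `Try(lookup(catalogName)) match {`, so the remaining branches are a guess based on the removed lines. `CatalogPlugin`, `Identifier` and `CatalogNotFoundException` are simplified stand-ins defined locally, and `Demo` is a hypothetical catalog used only to exercise the extractor.

```scala
// Simplified stand-ins for the Spark types named in the diff (assumptions, not the real classes).
final case class CatalogPlugin(name: String)
final case class Identifier(namespace: Seq[String], name: String)
final class CatalogNotFoundException(msg: String) extends Exception(msg)

trait LookupCatalog {
  // No lookup function by default, as in the diff.
  def lookupCatalog: Option[String => CatalogPlugin] = None

  type CatalogObjectIdentifier = (Option[CatalogPlugin], Identifier)

  object CatalogObjectIdentifier {
    def unapply(parts: Seq[String]): Option[CatalogObjectIdentifier] = parts match {
      case Seq(name) =>
        // A single part can never name a catalog.
        Some((None, Identifier(Seq.empty, name)))
      case Seq(catalogName, tail @ _*) =>
        lookupCatalog match {
          case Some(lookup) =>
            // Plain try/catch instead of scala.util.Try.
            try {
              Some((Some(lookup(catalogName)), Identifier(tail.init, tail.last)))
            } catch {
              case _: CatalogNotFoundException =>
                Some((None, Identifier(parts.init, parts.last)))
            }
          case None =>
            // Assumed fallback: without a lookup function, treat every part as namespace/name.
            Some((None, Identifier(parts.init, parts.last)))
        }
      case _ =>
        // Empty input does not match.
        None
    }
  }
}

// Hypothetical catalog that only knows one name, "prod".
object Demo extends LookupCatalog {
  override def lookupCatalog: Option[String => CatalogPlugin] = Some { (name: String) =>
    if (name == "prod") CatalogPlugin(name) else throw new CatalogNotFoundException(name)
  }
}
```

In this sketch, only `CatalogNotFoundException` is caught, so any other failure in the lookup function still propagates to the caller, which a blanket `Try(...)` would hide.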





[GitHub] [spark] AmplabJenkins commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24717: [SPARK-27847][ML] One-Pass 
MultilabelMetrics & MulticlassMetrics
URL: https://github.com/apache/spark/pull/24717#issuecomment-496356858
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24717: [SPARK-27847][ML] One-Pass 
MultilabelMetrics & MulticlassMetrics
URL: https://github.com/apache/spark/pull/24717#issuecomment-496356864
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105856/
   Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics

2019-05-27 Thread GitBox
SparkQA removed a comment on issue #24717: [SPARK-27847][ML] One-Pass 
MultilabelMetrics & MulticlassMetrics
URL: https://github.com/apache/spark/pull/24717#issuecomment-496346720
 
 
   **[Test build #105856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105856/testReport)** for PR 24717 at commit [`9261f16`](https://github.com/apache/spark/commit/9261f16ff3ded7e10fc69c50df8131be589cde49).





[GitHub] [spark] SparkQA commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics

2019-05-27 Thread GitBox
SparkQA commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics 
& MulticlassMetrics
URL: https://github.com/apache/spark/pull/24717#issuecomment-496356711
 
 
   **[Test build #105856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105856/testReport)** for PR 24717 at commit [`9261f16`](https://github.com/apache/spark/commit/9261f16ff3ded7e10fc69c50df8131be589cde49).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] cloud-fan commented on issue #24569: [SPARK-23191][CORE] Warn rather than terminate when duplicate worker register happens

2019-05-27 Thread GitBox
cloud-fan commented on issue #24569: [SPARK-23191][CORE] Warn rather than 
terminate when duplicate worker register happens
URL: https://github.com/apache/spark/pull/24569#issuecomment-496356630
 
 
   thanks, merging to master!





[GitHub] [spark] cloud-fan commented on issue #24696: [SPARK-27832][SQL] Don't decompress and create column batch when the task is completed

2019-05-27 Thread GitBox
cloud-fan commented on issue #24696: [SPARK-27832][SQL] Don't decompress and 
create column batch when the task is completed
URL: https://github.com/apache/spark/pull/24696#issuecomment-496355838
 
 
   > At the moment, the returned batch is also immediately closed
   
   I'm a little lost here. Can you give a sequence of events that can cause the error?





[GitHub] [spark] dongjoon-hyun edited a comment on issue #24722: [SPARK-27858][SQL] Fix for avro deserialization on union types with multiple non-null types

2019-05-27 Thread GitBox
dongjoon-hyun edited a comment on issue #24722: [SPARK-27858][SQL] Fix for avro 
deserialization on union types with multiple non-null types
URL: https://github.com/apache/spark/pull/24722#issuecomment-496351652
 
 
   @gcmerz, what is your ID in Apache JIRA? If you don't have one, please create one. Then I can assign that issue to you.
   - https://issues.apache.org/jira/browse/SPARK-27858
   
   And, FYI, in your GitHub personal settings, you can additionally add your Palantir email (used in this PR). Then your commits made with the Palantir ID will also show your GitHub profile.
   - https://github.com/apache/spark/commits/master





[GitHub] [spark] dongjoon-hyun commented on issue #24722: [SPARK-27858][SQL] Fix for avro deserialization on union types with multiple non-null types

2019-05-27 Thread GitBox
dongjoon-hyun commented on issue #24722: [SPARK-27858][SQL] Fix for avro 
deserialization on union types with multiple non-null types
URL: https://github.com/apache/spark/pull/24722#issuecomment-496351652
 
 
   @gcmerz, what is your ID in Apache JIRA? If you don't have one, please create one. Then I can assign that issue to you.
   - https://issues.apache.org/jira/browse/SPARK-27858





[GitHub] [spark] zhengruifeng commented on issue #14325: [SPARK-16692] [ML] Add multi label classification evaluator, DataFrame

2019-05-27 Thread GitBox
zhengruifeng commented on issue #14325: [SPARK-16692] [ML] Add multi label 
classification evaluator, DataFrame
URL: https://github.com/apache/spark/pull/14325#issuecomment-496350955
 
 
   What's the progress now? @liwzhi @WeichenXu123 @srowen 
   If @liwzhi is not working on this, can I take it over?





[GitHub] [spark] dongjoon-hyun closed pull request #24722: [SPARK-27858][SQL] Fix for avro deserialization on union types with multiple non-null types

2019-05-27 Thread GitBox
dongjoon-hyun closed pull request #24722: [SPARK-27858][SQL] Fix for avro 
deserialization on union types with multiple non-null types
URL: https://github.com/apache/spark/pull/24722
 
 
   





[GitHub] [spark] AmplabJenkins removed a comment on issue #24722: [SPARK-27858][SQL] Fix for avro deserialization on union types with multiple non-null types

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24722: [SPARK-27858][SQL] Fix for 
avro deserialization on union types with multiple non-null types
URL: https://github.com/apache/spark/pull/24722#issuecomment-496349036
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24722: [SPARK-27858][SQL] Fix for avro deserialization on union types with multiple non-null types

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24722: [SPARK-27858][SQL] Fix for 
avro deserialization on union types with multiple non-null types
URL: https://github.com/apache/spark/pull/24722#issuecomment-496349037
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105855/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24722: [SPARK-27858][SQL] Fix for avro deserialization on union types with multiple non-null types

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24722: [SPARK-27858][SQL] Fix for avro 
deserialization on union types with multiple non-null types
URL: https://github.com/apache/spark/pull/24722#issuecomment-496349037
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105855/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24722: [SPARK-27858][SQL] Fix for avro deserialization on union types with multiple non-null types

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24722: [SPARK-27858][SQL] Fix for avro 
deserialization on union types with multiple non-null types
URL: https://github.com/apache/spark/pull/24722#issuecomment-496349036
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] dongjoon-hyun commented on issue #24722: [SPARK-27858][SQL] Fix for avro deserialization on union types with multiple non-null types

2019-05-27 Thread GitBox
dongjoon-hyun commented on issue #24722: [SPARK-27858][SQL] Fix for avro 
deserialization on union types with multiple non-null types
URL: https://github.com/apache/spark/pull/24722#issuecomment-496348939
 
 
   Merged to `master` and `branch-2.4`.





[GitHub] [spark] SparkQA removed a comment on issue #24722: [SPARK-27858][SQL] Fix for avro deserialization on union types with multiple non-null types

2019-05-27 Thread GitBox
SparkQA removed a comment on issue #24722: [SPARK-27858][SQL] Fix for avro 
deserialization on union types with multiple non-null types
URL: https://github.com/apache/spark/pull/24722#issuecomment-496344329
 
 
   **[Test build #105855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105855/testReport)** for PR 24722 at commit [`4edbe09`](https://github.com/apache/spark/commit/4edbe093b1a4e6369fe7327675e2f49de62cb934).





[GitHub] [spark] SparkQA commented on issue #24722: [SPARK-27858][SQL] Fix for avro deserialization on union types with multiple non-null types

2019-05-27 Thread GitBox
SparkQA commented on issue #24722: [SPARK-27858][SQL] Fix for avro 
deserialization on union types with multiple non-null types
URL: https://github.com/apache/spark/pull/24722#issuecomment-496348853
 
 
   **[Test build #105855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105855/testReport)** for PR 24722 at commit [`4edbe09`](https://github.com/apache/spark/commit/4edbe093b1a4e6369fe7327675e2f49de62cb934).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] dongjoon-hyun commented on issue #24722: [SPARK-27858][SQL] Fix for avro deserialization on union types with multiple non-null types

2019-05-27 Thread GitBox
dongjoon-hyun commented on issue #24722: [SPARK-27858][SQL] Fix for avro 
deserialization on union types with multiple non-null types
URL: https://github.com/apache/spark/pull/24722#issuecomment-496348246
 
 
   You're welcome. Thank you for the swift update.





[GitHub] [spark] dongjoon-hyun commented on issue #24043: [SPARK-11412][SQL] Support merge schema for ORC

2019-05-27 Thread GitBox
dongjoon-hyun commented on issue #24043: [SPARK-11412][SQL] Support merge 
schema for ORC
URL: https://github.com/apache/spark/pull/24043#issuecomment-496347909
 
 
   Lastly, it would be great if you could add some performance comparisons between Parquet and ORC schema merging to the PR description. This PR aims to add a new feature for ORC/Parquet feature parity, so a significant slowdown in the new code would not be desirable.
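
One way such a comparison could be collected is sketched below. `MergeSchemaTiming` and `bestOfMs` are hypothetical names, not part of Spark, and the workload in `main` is a stand-in so that the sketch runs without a Spark session. The commented-out calls assume a `SparkSession` named `spark`, two directories `parquetDir` and `orcDir`, and that the ORC option mirrors Parquet's `mergeSchema` option, which is what this PR proposes.

```scala
// Hypothetical timing helper, not part of Spark.
object MergeSchemaTiming {
  // Runs `body` `runs` times and returns the best wall-clock time in milliseconds.
  def bestOfMs[A](runs: Int)(body: => A): Double = {
    require(runs > 0, "runs must be positive")
    val bestNanos = (1 to runs).map { _ =>
      val start = System.nanoTime()
      body
      System.nanoTime() - start
    }.min
    bestNanos / 1e6
  }

  def main(args: Array[String]): Unit = {
    // Stand-in workload so the sketch runs on its own.
    val ms = bestOfMs(runs = 3) { (1 to 100000).map(_.toLong).sum }
    println(f"best of 3: $ms%.3f ms")
    // With Spark available, the two readers could be compared along these lines:
    //   bestOfMs(3) { spark.read.option("mergeSchema", "true").parquet(parquetDir).schema }
    //   bestOfMs(3) { spark.read.option("mergeSchema", "true").orc(orcDir).schema }
  }
}
```

Taking the best of several runs reduces the effect of JVM warm-up and file-system caching, though a proper benchmark harness would be preferable for numbers quoted in a PR description.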





[GitHub] [spark] AmplabJenkins removed a comment on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24717: [SPARK-27847][ML] One-Pass 
MultilabelMetrics & MulticlassMetrics
URL: https://github.com/apache/spark/pull/24717#issuecomment-496347648
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24717: [SPARK-27847][ML] One-Pass 
MultilabelMetrics & MulticlassMetrics
URL: https://github.com/apache/spark/pull/24717#issuecomment-496347641
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24717: [SPARK-27847][ML] One-Pass 
MultilabelMetrics & MulticlassMetrics
URL: https://github.com/apache/spark/pull/24717#issuecomment-496347648
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24717: [SPARK-27847][ML] One-Pass 
MultilabelMetrics & MulticlassMetrics
URL: https://github.com/apache/spark/pull/24717#issuecomment-496347641
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics & MulticlassMetrics

2019-05-27 Thread GitBox
SparkQA commented on issue #24717: [SPARK-27847][ML] One-Pass MultilabelMetrics 
& MulticlassMetrics
URL: https://github.com/apache/spark/pull/24717#issuecomment-496346720
 
 
   **[Test build #105856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105856/testReport)** for PR 24717 at commit [`9261f16`](https://github.com/apache/spark/commit/9261f16ff3ded7e10fc69c50df8131be589cde49).





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24043: [SPARK-11412][SQL] Support merge schema for ORC

2019-05-27 Thread GitBox
dongjoon-hyun commented on a change in pull request #24043: [SPARK-11412][SQL] 
Support merge schema for ORC
URL: https://github.com/apache/spark/pull/24043#discussion_r287913435
 
 

 ##
 File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala
 ##
 @@ -101,6 +101,19 @@ private[hive] object OrcFileOperator extends Logging {
 }
   }
 
+  /**
+   * Read single ORC file schema using Hive ORC library
+   */
+  def singleFileSchemaReader(file: String, conf: Configuration, ignoreCorruptFiles: Boolean)
+      : Option[StructType] = {
 
 Review comment:
   ditto. 2-space.





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24043: [SPARK-11412][SQL] Support merge schema for ORC

2019-05-27 Thread GitBox
dongjoon-hyun commented on a change in pull request #24043: [SPARK-11412][SQL] 
Support merge schema for ORC
URL: https://github.com/apache/spark/pull/24043#discussion_r287912775
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
 ##
 @@ -82,14 +83,95 @@ object OrcUtils extends Logging {
       : Option[StructType] = {
     val ignoreCorruptFiles = sparkSession.sessionState.conf.ignoreCorruptFiles
     val conf = sparkSession.sessionState.newHadoopConf()
-    // TODO: We need to support merge schema. Please see SPARK-11412.
     files.toIterator.map(file => readSchema(file.getPath, conf, ignoreCorruptFiles)).collectFirst {
       case Some(schema) =>
         logDebug(s"Reading schema from file $files, got Hive schema string: $schema")
         CatalystSqlParser.parseDataType(schema.toString).asInstanceOf[StructType]
     }
   }
 
+  /**
+   * Read single ORC file schema using native version of ORC
+   */
+  def singleFileSchemaReader(file: String, conf: Configuration, ignoreCorruptFiles: Boolean)
+      : Option[StructType] = {
+    OrcUtils.readSchema(new Path(file), conf, ignoreCorruptFiles)
+      .map(s => CatalystSqlParser.parseDataType(s.toString).asInstanceOf[StructType])
+  }
+
+  /**
+   * Figures out a merged ORC schema with a distributed Spark job.
+   */
+  def mergeSchemasInParallel(
+      sparkSession: SparkSession,
+      files: Seq[FileStatus],
+      singleFileSchemaReader: (String, Configuration, Boolean) => Option[StructType])
+      : Option[StructType] = {
 
 Review comment:
   ditto. 2-space.





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24043: [SPARK-11412][SQL] Support merge schema for ORC

2019-05-27 Thread GitBox
dongjoon-hyun commented on a change in pull request #24043: [SPARK-11412][SQL] 
Support merge schema for ORC
URL: https://github.com/apache/spark/pull/24043#discussion_r287912742
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
 ##
 @@ -82,14 +83,95 @@ object OrcUtils extends Logging {
       : Option[StructType] = {
     val ignoreCorruptFiles = sparkSession.sessionState.conf.ignoreCorruptFiles
     val conf = sparkSession.sessionState.newHadoopConf()
-    // TODO: We need to support merge schema. Please see SPARK-11412.
     files.toIterator.map(file => readSchema(file.getPath, conf, ignoreCorruptFiles)).collectFirst {
       case Some(schema) =>
         logDebug(s"Reading schema from file $files, got Hive schema string: $schema")
         CatalystSqlParser.parseDataType(schema.toString).asInstanceOf[StructType]
     }
   }
 
+  /**
+   * Read single ORC file schema using native version of ORC
+   */
+  def singleFileSchemaReader(file: String, conf: Configuration, ignoreCorruptFiles: Boolean)
+      : Option[StructType] = {
 
 Review comment:
   Unfortunately, the existing code around here follows the wrong indentation rule. Let's use the correct indentation at least in new code. `: Option[StructType]` should have 2-space indentation instead of 4-space.
   ```scala
      def singleFileSchemaReader(file: String, conf: Configuration, ignoreCorruptFiles: Boolean)
   -      : Option[StructType] = {
   +    : Option[StructType] = {
   ```
   





[GitHub] [spark] SparkQA commented on issue #24722: [SPARK-27858][SQL] Fix for avro deserialization on union types with multiple non-null types

2019-05-27 Thread GitBox
SparkQA commented on issue #24722: [SPARK-27858][SQL] Fix for avro 
deserialization on union types with multiple non-null types
URL: https://github.com/apache/spark/pull/24722#issuecomment-496344329
 
 
   **[Test build #105855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105855/testReport)** for PR 24722 at commit [`4edbe09`](https://github.com/apache/spark/commit/4edbe093b1a4e6369fe7327675e2f49de62cb934).





[GitHub] [spark] gcmerz commented on issue #24722: [SPARK-27858][SQL] Fix for avro deserialization on union types with multiple non-null types

2019-05-27 Thread GitBox
gcmerz commented on issue #24722: [SPARK-27858][SQL] Fix for avro 
deserialization on union types with multiple non-null types
URL: https://github.com/apache/spark/pull/24722#issuecomment-496344189
 
 
   Applied the tweaks--thank you so much for the quick review!





[GitHub] [spark] AmplabJenkins removed a comment on issue #24722: [SPARK-27858][SQL] Fix for avro deserialization on union types with multiple non-null types

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24722: [SPARK-27858][SQL] Fix for 
avro deserialization on union types with multiple non-null types
URL: https://github.com/apache/spark/pull/24722#issuecomment-496344048
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24722: [SPARK-27858][SQL] Fix for avro deserialization on union types with multiple non-null types

2019-05-27 Thread GitBox
AmplabJenkins removed a comment on issue #24722: [SPARK-27858][SQL] Fix for 
avro deserialization on union types with multiple non-null types
URL: https://github.com/apache/spark/pull/24722#issuecomment-496344050
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1/
   Test PASSed.





[GitHub] [spark] gcmerz commented on a change in pull request #24722: [SPARK-27858][SQL] Fix for avro deserialization on union types with multiple non-null types

2019-05-27 Thread GitBox
gcmerz commented on a change in pull request #24722: [SPARK-27858][SQL] Fix for 
avro deserialization on union types with multiple non-null types
URL: https://github.com/apache/spark/pull/24722#discussion_r287912404
 
 

 ##
 File path: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
 ##
 @@ -247,6 +247,32 @@ class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
 }
   }
 
+  test("Union type: More than one non-null type") {
 
 Review comment:
   Also done!





[GitHub] [spark] gcmerz commented on a change in pull request #24722: [SPARK-27858][SQL] Fix for avro deserialization on union types with multiple non-null types

2019-05-27 Thread GitBox
gcmerz commented on a change in pull request #24722: [SPARK-27858][SQL] Fix for 
avro deserialization on union types with multiple non-null types
URL: https://github.com/apache/spark/pull/24722#discussion_r287912367
 
 

 ##
 File path: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
 ##
 @@ -247,6 +247,32 @@ class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
 }
   }
 
+  test("Union type: More than one non-null type") {
+    withTempDir { dir =>
+      val complexNullUnionType = Schema.createUnion(
+        List(Schema.create(Type.INT), Schema.create(Type.NULL), Schema.create(Type.STRING)).asJava)
+      val fields = Seq(
+        new Field("field1", complexNullUnionType, "doc", null.asInstanceOf[AnyVal])).asJava
+      val schema = Schema.createRecord("name", "docs", "namespace", false)
+      schema.setFields(fields)
+      val datumWriter = new GenericDatumWriter[GenericRecord](schema)
+      val dataFileWriter = new DataFileWriter[GenericRecord](datumWriter)
+      dataFileWriter.create(schema, new File(s"$dir.avro"))
+      val avroRec = new GenericData.Record(schema)
+      avroRec.put("field1", 42)
+      dataFileWriter.append(avroRec)
+      val avroRec2 = new GenericData.Record(schema)
+      avroRec2.put("field1", "Alice")
+      dataFileWriter.append(avroRec2)
+      dataFileWriter.flush()
+      dataFileWriter.close()
+
+      val df = spark.read.format("avro").load(s"$dir.avro")
+      assertResult(42)(df.selectExpr("field1.member0").take(1)(0).get(0))
+      assertResult("Alice")(df.selectExpr("field1.member1").take(2).drop(1)(0).get(0))
 
 Review comment:
   Done! Agreed this is much cleaner/more robust
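
The `member0`/`member1` names asserted in the test come from how the union's non-null branches are numbered. Below is a minimal sketch of that numbering, assuming null branches are skipped and the remaining branches are indexed in declaration order, as the two assertions imply. `Branch`, `UnionMemberNames`, and `memberFields` are hypothetical names used only for illustration; they are not Spark or Avro APIs.

```scala
// Hypothetical stand-ins for the branch types of an Avro union.
// These are illustration-only and do not mirror Avro's Schema classes.
sealed trait Branch
case object NullBranch extends Branch
case object IntBranch extends Branch
case object StringBranch extends Branch

object UnionMemberNames {
  // Assumption: null branches get no field of their own; the remaining
  // branches are named member0, member1, ... in declaration order.
  def memberFields(branches: Seq[Branch]): Seq[(String, Branch)] =
    branches
      .filter(_ != NullBranch)
      .zipWithIndex
      .map { case (branch, index) => (s"member$index", branch) }
}
```

Under that assumption, `UnionMemberNames.memberFields(Seq(IntBranch, NullBranch, StringBranch))` pairs `member0` with the int branch and `member1` with the string branch, matching the `field1.member0` and `field1.member1` lookups in the test.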





[GitHub] [spark] AmplabJenkins commented on issue #24722: [SPARK-27858][SQL] Fix for avro deserialization on union types with multiple non-null types

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24722: [SPARK-27858][SQL] Fix for avro 
deserialization on union types with multiple non-null types
URL: https://github.com/apache/spark/pull/24722#issuecomment-496344050
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed):
   https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24722: [SPARK-27858][SQL] Fix for avro deserialization on union types with multiple non-null types

2019-05-27 Thread GitBox
AmplabJenkins commented on issue #24722: [SPARK-27858][SQL] Fix for avro 
deserialization on union types with multiple non-null types
URL: https://github.com/apache/spark/pull/24722#issuecomment-496344048
 
 
   Merged build finished. Test PASSed.




