[jira] [Updated] (SPARK-27719) Set maxDisplayLogSize for spark history server

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27719:
--
Flags:   (was: Patch)

> Set maxDisplayLogSize for spark history server
> --
>
> Key: SPARK-27719
> URL: https://issues.apache.org/jira/browse/SPARK-27719
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: hao.li
>Priority: Minor
>
> Sometimes a very large event log may be useless, and parsing it may waste many 
> resources.
> It may be useful to avoid parsing large event logs by adding a configuration 
> such as spark.history.fs.maxDisplayLogSize.
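> A minimal sketch of the proposed behavior, assuming the hypothetical 
> spark.history.fs.maxDisplayLogSize setting from this ticket (the helper below 
> is illustrative, not the actual FsHistoryProvider code):
> {code:scala}
> // Skip event logs larger than the proposed threshold during the listing pass.
> val maxDisplayLogSize: Long = 10L * 1024 * 1024 * 1024  // e.g. 10 GB
> 
> def shouldParse(sizeInBytes: Long): Boolean = sizeInBytes <= maxDisplayLogSize
> 
> val logs = Seq("app-1" -> 512L * 1024 * 1024, "app-2" -> 20L * 1024 * 1024 * 1024)
> logs.filter { case (_, size) => shouldParse(size) }
>     .foreach { case (app, _) => println(s"parsing event log for $app") }
> // Only app-1 is parsed; app-2 exceeds the threshold and is skipped.
> {code}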



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27719) Set maxDisplayLogSize for spark history server

2019-06-14 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864605#comment-16864605
 ] 

Dongjoon Hyun commented on SPARK-27719:
---

After some searching, I found that this usually happens when we do a *backfill*.

> Set maxDisplayLogSize for spark history server
> --
>
> Key: SPARK-27719
> URL: https://issues.apache.org/jira/browse/SPARK-27719
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: hao.li
>Priority: Minor
>
> Sometimes a very large event log may be useless, and parsing it may waste many 
> resources.
> It may be useful to avoid parsing large event logs by adding a configuration 
> such as spark.history.fs.maxDisplayLogSize.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28004) Update jquery to 3.4.1

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28004.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24843
[https://github.com/apache/spark/pull/24843]

> Update jquery to 3.4.1
> --
>
> Key: SPARK-28004
> URL: https://issues.apache.org/jira/browse/SPARK-28004
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Major
> Fix For: 3.0.0
>
>
> We're using an old-ish jQuery, 1.12.4, and should probably update it for Spark 
> 3 to keep up in general, but also to keep up with CVEs. In fact, we know of at 
> least one CVE resolved only in 3.4.0+ 
> (https://nvd.nist.gov/vuln/detail/CVE-2019-11358). They may not affect Spark, 
> but if the update isn't painful, it may be worthwhile in order to make future 
> 3.x updates easier.
> jQuery 1 -> 2 doesn't sound like a breaking change, as 2.0 is supposed to 
> maintain compatibility with 1.9+ 
> (https://blog.jquery.com/2013/04/18/jquery-2-0-released/).
> 2 -> 3 has breaking changes: https://jquery.com/upgrade-guide/3.0/. It's hard 
> to evaluate each one, but the most likely area for problems is ajax(). 
> However, our usage of jQuery (and its plugins) is pretty simple. 
> I've tried updating and testing the UI, and I can't see any warnings, errors, 
> or problematic functionality. This includes the Spark UI, master UI, worker 
> UI, and docs (though I wasn't able to build the R docs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23160) Add window.sql

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-23160:
--
Description: In this ticket, we plan to add the regression test cases of 
https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/window.sql.
  (was: We should also cover the window SQL interface, for example in 
`sql/core/src/test/resources/sql-tests/inputs/window.sql`; it would also be 
interesting to see whether we can generate results for window tests that are 
consistent with other major databases.)
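
For a sense of the cases being ported, a small illustrative window query in the 
style of that file (the table and values below are invented for this sketch):

{code:scala}
// Run in spark-shell (implicits already imported): a window query in the
// style of PostgreSQL's window.sql.
val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("grp", "x")
df.createOrReplaceTempView("t")
spark.sql(
  """SELECT grp, x,
    |       SUM(x) OVER (PARTITION BY grp ORDER BY x) AS running_sum,
    |       RANK() OVER (PARTITION BY grp ORDER BY x) AS rnk
    |FROM t""".stripMargin).show()
{code}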

> Add window.sql
> --
>
> Key: SPARK-23160
> URL: https://issues.apache.org/jira/browse/SPARK-23160
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Xingbo Jiang
>Priority: Minor
>
> In this ticket, we plan to add the regression test cases of 
> https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/window.sql.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23160) Add window.sql

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-23160:
--
Affects Version/s: (was: 2.3.0)
   3.0.0

> Add window.sql
> --
>
> Key: SPARK-23160
> URL: https://issues.apache.org/jira/browse/SPARK-23160
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Xingbo Jiang
>Priority: Minor
>
> We should also cover the window SQL interface, for example in 
> `sql/core/src/test/resources/sql-tests/inputs/window.sql`; it would also be 
> interesting to see whether we can generate results for window tests that are 
> consistent with other major databases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23160) Add window.sql

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-23160:
--
Component/s: Tests

> Add window.sql
> --
>
> Key: SPARK-23160
> URL: https://issues.apache.org/jira/browse/SPARK-23160
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Xingbo Jiang
>Priority: Minor
>
> We should also cover the window SQL interface, for example in 
> `sql/core/src/test/resources/sql-tests/inputs/window.sql`; it would also be 
> interesting to see whether we can generate results for window tests that are 
> consistent with other major databases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23160) Add window.sql

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-23160:
--
Summary: Add window.sql  (was: Add more window sql tests)

> Add window.sql
> --
>
> Key: SPARK-23160
> URL: https://issues.apache.org/jira/browse/SPARK-23160
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xingbo Jiang
>Priority: Minor
>
> We should also cover the window SQL interface, for example in 
> `sql/core/src/test/resources/sql-tests/inputs/window.sql`; it would also be 
> interesting to see whether we can generate results for window tests that are 
> consistent with other major databases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23160) Add more window sql tests

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-23160:
--
Issue Type: Sub-task  (was: Test)
Parent: SPARK-27763

> Add more window sql tests
> -
>
> Key: SPARK-23160
> URL: https://issues.apache.org/jira/browse/SPARK-23160
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xingbo Jiang
>Priority: Minor
>
> We should also cover the window SQL interface, for example in 
> `sql/core/src/test/resources/sql-tests/inputs/window.sql`; it would also be 
> interesting to see whether we can generate results for window tests that are 
> consistent with other major databases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23160) Add more window sql tests

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-23160:
--
Issue Type: Test  (was: Sub-task)
Parent: (was: SPARK-22359)

> Add more window sql tests
> -
>
> Key: SPARK-23160
> URL: https://issues.apache.org/jira/browse/SPARK-23160
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xingbo Jiang
>Priority: Minor
>
> We should also cover the window SQL interface, for example in 
> `sql/core/src/test/resources/sql-tests/inputs/window.sql`; it would also be 
> interesting to see whether we can generate results for window tests that are 
> consistent with other major databases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23160) Add more window sql tests

2019-06-14 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864598#comment-16864598
 ] 

Dongjoon Hyun commented on SPARK-23160:
---

Hi, [~DylanGuedes]. I'll move this to the recently active context (SPARK-27763).
You can port 
`https://github.com/postgres/postgres/blob/master/src/test/regress/sql/window.sql`.
cc [~smilegator]

> Add more window sql tests
> -
>
> Key: SPARK-23160
> URL: https://issues.apache.org/jira/browse/SPARK-23160
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xingbo Jiang
>Priority: Minor
>
> We should also cover the window SQL interface, for example in 
> `sql/core/src/test/resources/sql-tests/inputs/window.sql`; it would also be 
> interesting to see whether we can generate results for window tests that are 
> consistent with other major databases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22360) Add unit test for Window Specifications

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-22360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-22360:
--
Component/s: Tests

> Add unit test for Window Specifications
> ---
>
> Key: SPARK-22360
> URL: https://issues.apache.org/jira/browse/SPARK-22360
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 2.3.0
>Reporter: Xingbo Jiang
>Priority: Major
>
> * different partition clauses (none, one, multiple)
> * different order clauses (none, one, multiple, asc/desc, nulls first/last)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28033) String concatenation should have lower priority than other operators

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28033:
--
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-27764

> String concatenation should have lower priority than other operators
> -
>
> Key: SPARK-28033
> URL: https://issues.apache.org/jira/browse/SPARK-28033
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.3
>Reporter: Yuming Wang
>Priority: Major
>
> Spark SQL:
> {code:sql}
> spark-sql> explain select 'four: ' || 2 + 2;
> == Physical Plan ==
> *(1) Project [null AS (CAST(concat(four: , CAST(2 AS STRING)) AS DOUBLE) + 
> CAST(2 AS DOUBLE))#2]
> +- Scan OneRowRelation[]
> spark-sql> select 'four: ' || 2 + 2;
> NULL
> {code}
> Hive:
> {code:sql}
> hive> select 'four: ' || 2 + 2;
> OK
> four: 4
> {code}
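> The plans above suggest that Spark parses the expression as ('four: ' || 2) + 2 
> and then casts the concatenated string to DOUBLE (hence NULL), while Hive and 
> PostgreSQL bind the addition tighter, as 'four: ' || (2 + 2). A quick, hedged 
> spark-shell check of that reading against current behavior (this is not the fix):
> {code:scala}
> // Forcing the PostgreSQL/Hive grouping by hand yields the expected value today.
> spark.sql("SELECT 'four: ' || (2 + 2)").show()
> // prints a single row containing "four: 4"
> {code}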



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28002) Support WITH clause column aliases

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28002.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24842
[https://github.com/apache/spark/pull/24842]

> Support WITH clause column aliases
> --
>
> Key: SPARK-28002
> URL: https://issues.apache.org/jira/browse/SPARK-28002
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Major
> Fix For: 3.0.0
>
>
> PostgreSQL supports column aliases in a CTE, so this is a valid query:
> {noformat}
> WITH t(x) AS (SELECT 1)
> SELECT * FROM t WHERE x = 1{noformat}
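> For comparison, a minimal equivalent that already works in Spark by aliasing 
> inside the CTE body (illustrative only):
> {code:scala}
> // Pre-existing spelling that avoids WITH-clause column aliases.
> spark.sql("WITH t AS (SELECT 1 AS x) SELECT * FROM t WHERE x = 1").show()
> {code}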



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28002) Support WITH clause column aliases

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-28002:
-

Assignee: Peter Toth

> Support WITH clause column aliases
> --
>
> Key: SPARK-28002
> URL: https://issues.apache.org/jira/browse/SPARK-28002
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Major
>
> PostgreSQL supports column aliases in a CTE, so this is a valid query:
> {noformat}
> WITH t(x) AS (SELECT 1)
> SELECT * FROM t WHERE x = 1{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28051) Exposing JIRA issue component types at GitHub PRs

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-28051:
-

Assignee: Dongjoon Hyun

> Exposing JIRA issue component types at GitHub PRs
> -
>
> Key: SPARK-28051
> URL: https://issues.apache.org/jira/browse/SPARK-28051
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>
> This issue aims to expose JIRA issue component types at GitHub PRs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28051) Exposing JIRA issue component types at GitHub PRs

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28051.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24871
[https://github.com/apache/spark/pull/24871]

> Exposing JIRA issue component types at GitHub PRs
> -
>
> Key: SPARK-28051
> URL: https://issues.apache.org/jira/browse/SPARK-28051
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.0.0
>
>
> This issue aims to expose JIRA issue component types at GitHub PRs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26949) Prevent "purge" to remove needed batch files in CompactibleFileStreamLog

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-26949.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23850
[https://github.com/apache/spark/pull/23850]

> Prevent "purge" to remove needed batch files in CompactibleFileStreamLog
> 
>
> Key: SPARK-26949
> URL: https://issues.apache.org/jira/browse/SPARK-26949
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Minor
> Fix For: 3.0.0
>
>
> I've seen a couple of attempts (in open PRs; I've tried it myself as well) 
> that call purge() in CompactibleFileStreamLog, but after looking at the 
> codebase of CompactibleFileStreamLog, I've realized that purging the latest 
> compaction batch would break the internal state of CompactibleFileStreamLog 
> and throw an IllegalStateException.
> Given that CompactibleFileStreamLog maintains the batches and purges them 
> according to its configuration, it would be safer to just rely on 
> CompactibleFileStreamLog to purge, and to prevent calling `purge` from 
> outside of CompactibleFileStreamLog.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28060) Float type cannot accept some special inputs

2019-06-14 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-28060:
---

 Summary: Float type cannot accept some special inputs
 Key: SPARK-28060
 URL: https://issues.apache.org/jira/browse/SPARK-28060
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang


||Query||Spark SQL||PostgreSQL||
|SELECT float('nan');|NULL|NaN|
|SELECT float('   NAN  ');|NULL|NaN|
|SELECT float('infinity');|NULL|Infinity|
|SELECT float('  -INFINiTY   ');|NULL|-Infinity|
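
A minimal spark-shell reproduction sketch of the Spark SQL column above 
(current behavior on the affected versions; the PostgreSQL results are from 
the table):

{code:scala}
// Only the exact spellings "NaN" and "Infinity" currently parse; the
// lower-case and padded variants from the table come back as NULL.
spark.sql("SELECT CAST('nan' AS FLOAT), CAST('infinity' AS FLOAT)").show()
// both columns are NULL here, where PostgreSQL returns NaN and Infinity
{code}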



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28059) Add int4.sql

2019-06-14 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28059:


Assignee: (was: Apache Spark)

> Add int4.sql
> 
>
> Key: SPARK-28059
> URL: https://issues.apache.org/jira/browse/SPARK-28059
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> In this ticket, we plan to add the regression test cases of 
> [https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/int4.sql].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28059) Add int4.sql

2019-06-14 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28059:


Assignee: Apache Spark

> Add int4.sql
> 
>
> Key: SPARK-28059
> URL: https://issues.apache.org/jira/browse/SPARK-28059
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>
> In this ticket, we plan to add the regression test cases of 
> [https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/int4.sql].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28059) Add int4.sql

2019-06-14 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-28059:
---

 Summary: Add int4.sql
 Key: SPARK-28059
 URL: https://issues.apache.org/jira/browse/SPARK-28059
 Project: Spark
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 3.0.0
Reporter: Yuming Wang


In this ticket, we plan to add the regression test cases of 
[https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/int4.sql].
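
The concrete cases will come from the linked PostgreSQL script; a hedged sketch 
of the flavor of test involved (values invented, assuming the non-ANSI default 
behavior):

{code:scala}
// int4-style boundary checks: INT is a signed 32-bit type.
spark.sql("SELECT CAST('2147483647' AS INT) AS max_int, " +
          "CAST('2147483648' AS INT) AS out_of_range").show()
// max_int parses; the out-of-range literal comes back NULL under the defaults.
{code}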



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28027) Missing some mathematical operators

2019-06-14 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-28027:

Description: 
||Operator||Description||Example||Result||
|{{^}}|exponentiation (associates left to right)|{{2.0 ^ 3.0}}|{{8}}|
|{{\|/}}|square root|{{\|/ 25.0}}|{{5}}|
|{{\|\|/}}|cube root|{{\|\|/ 27.0}}|{{3}}|
|{{\!}}|factorial|{{5 !}}|{{120}}|
|{{\!\!}}|factorial (prefix operator)|{{!! 5}}|{{120}}|
|{{@}}|absolute value|{{@ -5.0}}|{{5}}|
|{{#}}|bitwise XOR|{{17 # 5}}|{{20}}|
|{{<<}}|bitwise shift left|{{1 << 4}}|{{16}}|
|{{>>}}|bitwise shift right|{{8 >> 2}}|{{2}}|

 
 Please note that we have {{^}}, {{\!}} and {{\!\!}}, but they have different 
meanings.

[https://www.postgresql.org/docs/11/functions-math.html]
 
[https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Operators/BitwiseOperators.htm]
 [https://docs.aws.amazon.com/redshift/latest/dg/r_OPERATOR_SYMBOLS.html]

  was:
||Operator||Description||Example||Result||
|{{^}}|exponentiation (associates left to right)|{{2.0 ^ 3.0}}|{{8}}|
|{{\|/}}|square root|{{\|/ 25.0}}|{{5}}|
|{{\|\|/}}|cube root|{{\|\|/ 27.0}}|{{3}}|
|{{!}}|factorial|{{5 !}}|{{120}}|
|{{!!}}|factorial (prefix operator)|{{!! 5}}|{{120}}|
|{{@}}|absolute value|{{@ -5.0}}|{{5}}|
|{{#}}|bitwise XOR|{{17 # 5}}|{{20}}|
|{{<<}}|bitwise shift left|{{1 << 4}}|{{16}}|
|{{>>}}|bitwise shift right|{{8 >> 2}}|{{2}}|

 
 Please note that we have {{^}}, {{!}} and {{!!}}, but they have different 
meanings.

[https://www.postgresql.org/docs/11/functions-math.html]
 
[https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Operators/BitwiseOperators.htm]
 [https://docs.aws.amazon.com/redshift/latest/dg/r_OPERATOR_SYMBOLS.html]


> Missing some mathematical operators
> ---
>
> Key: SPARK-28027
> URL: https://issues.apache.org/jira/browse/SPARK-28027
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Operator||Description||Example||Result||
> |{{^}}|exponentiation (associates left to right)|{{2.0 ^ 3.0}}|{{8}}|
> |{{\|/}}|square root|{{\|/ 25.0}}|{{5}}|
> |{{\|\|/}}|cube root|{{\|\|/ 27.0}}|{{3}}|
> |{{\!}}|factorial|{{5 !}}|{{120}}|
> |{{\!\!}}|factorial (prefix operator)|{{!! 5}}|{{120}}|
> |{{@}}|absolute value|{{@ -5.0}}|{{5}}|
> |{{#}}|bitwise XOR|{{17 # 5}}|{{20}}|
> |{{<<}}|bitwise shift left|{{1 << 4}}|{{16}}|
> |{{>>}}|bitwise shift right|{{8 >> 2}}|{{2}}|
>  
>  Please note that we have {{^}}, {{\!}} and {{\!\!}}, but they have different 
> meanings.
> [https://www.postgresql.org/docs/11/functions-math.html]
>  
> [https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Operators/BitwiseOperators.htm]
>  [https://docs.aws.amazon.com/redshift/latest/dg/r_OPERATOR_SYMBOLS.html]
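>  
> Most of these are already available in Spark as functions rather than 
> operators; a hedged spark-shell comparison (the function names below are 
> existing Spark SQL built-ins, while Spark's own {{^}} means bitwise XOR, not 
> exponentiation):
> {code:scala}
> // PostgreSQL operator -> existing Spark SQL function equivalent.
> spark.sql("""
>   SELECT pow(2.0, 3.0)    AS exponentiation,  -- PostgreSQL: 2.0 ^ 3.0
>          sqrt(25.0)       AS square_root,     -- PostgreSQL: |/ 25.0
>          cbrt(27.0)       AS cube_root,       -- PostgreSQL: ||/ 27.0
>          factorial(5)     AS fact,            -- PostgreSQL: 5 !
>          abs(-5.0)        AS absolute_value,  -- PostgreSQL: @ -5.0
>          shiftleft(1, 4)  AS shl,             -- PostgreSQL: 1 << 4
>          shiftright(8, 2) AS shr              -- PostgreSQL: 8 >> 2
> """).show()
> {code}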



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28027) Missing some mathematical operators

2019-06-14 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-28027:

Description: 
||Operator||Description||Example||Result||
|{{^}}|exponentiation (associates left to right)|{{2.0 ^ 3.0}}|{{8}}|
|{{\|/}}|square root|{{\|/ 25.0}}|{{5}}|
|{{\|\|/}}|cube root|{{\|\|/ 27.0}}|{{3}}|
|{{!}}|factorial|{{5 !}}|{{120}}|
|{{!!}}|factorial (prefix operator)|{{!! 5}}|{{120}}|
|{{@}}|absolute value|{{@ -5.0}}|{{5}}|
|{{#}}|bitwise XOR|{{17 # 5}}|{{20}}|
|{{<<}}|bitwise shift left|{{1 << 4}}|{{16}}|
|{{>>}}|bitwise shift right|{{8 >> 2}}|{{2}}|

 
 Please note that we have {{^}}, {{!}} and {{!!}}, but they have different 
meanings.

[https://www.postgresql.org/docs/11/functions-math.html]
 
[https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Operators/BitwiseOperators.htm]
 [https://docs.aws.amazon.com/redshift/latest/dg/r_OPERATOR_SYMBOLS.html]

  was:
||Operator||Description||Example||Result||
|{{^}}|exponentiation (associates left to right)|{{2.0 ^ 3.0}}|{{8}}|
|{{\|/}}|square root|{{\|/ 25.0}}|{{5}}|
|{{\|\|/}}|cube root|{{\|\|/ 27.0}}|{{3}}|
|{{!}}|factorial|{{5 !}}|{{120}}|
|{{\!\!}}|factorial (prefix operator)|{{!! 5}}|{{120}}|
|{{@}}|absolute value|{{@ -5.0}}|{{5}}|
|{{#}}|bitwise XOR|{{17 # 5}}|{{20}}|
|{{<<}}|bitwise shift left|{{1 << 4}}|{{16}}|
|{{>>}}|bitwise shift right|{{8 >> 2}}|{{2}}|
 

[https://www.postgresql.org/docs/11/functions-math.html]
 
[https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Operators/BitwiseOperators.htm]
 [https://docs.aws.amazon.com/redshift/latest/dg/r_OPERATOR_SYMBOLS.html]


> Missing some mathematical operators
> ---
>
> Key: SPARK-28027
> URL: https://issues.apache.org/jira/browse/SPARK-28027
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Operator||Description||Example||Result||
> |{{^}}|exponentiation (associates left to right)|{{2.0 ^ 3.0}}|{{8}}|
> |{{\|/}}|square root|{{\|/ 25.0}}|{{5}}|
> |{{\|\|/}}|cube root|{{\|\|/ 27.0}}|{{3}}|
> |{{!}}|factorial|{{5 !}}|{{120}}|
> |{{!!}}|factorial (prefix operator)|{{!! 5}}|{{120}}|
> |{{@}}|absolute value|{{@ -5.0}}|{{5}}|
> |{{#}}|bitwise XOR|{{17 # 5}}|{{20}}|
> |{{<<}}|bitwise shift left|{{1 << 4}}|{{16}}|
> |{{>>}}|bitwise shift right|{{8 >> 2}}|{{2}}|
>  
>  Please note that we have {{^}}, {{!}} and {{!!}}, but they have different 
> meanings.
> [https://www.postgresql.org/docs/11/functions-math.html]
>  
> [https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Operators/BitwiseOperators.htm]
>  [https://docs.aws.amazon.com/redshift/latest/dg/r_OPERATOR_SYMBOLS.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28027) Missing some mathematical operators

2019-06-14 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-28027:

Summary: Missing some mathematical operators  (was: Add Bitwise shift 
left/right)

> Missing some mathematical operators
> ---
>
> Key: SPARK-28027
> URL: https://issues.apache.org/jira/browse/SPARK-28027
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Operator||Description||Example||Result||
> |{{^}}|exponentiation (associates left to right)|{{2.0 ^ 3.0}}|{{8}}|
> |{{\|/}}|square root|{{\|/ 25.0}}|{{5}}|
> |{{\|\|/}}|cube root|{{\|\|/ 27.0}}|{{3}}|
> |{{!}}|factorial|{{5 !}}|{{120}}|
> |{{\!\!}}|factorial (prefix operator)|{{!! 5}}|{{120}}|
> |{{@}}|absolute value|{{@ -5.0}}|{{5}}|
> |{{#}}|bitwise XOR|{{17 # 5}}|{{20}}|
> |{{<<}}|bitwise shift left|{{1 << 4}}|{{16}}|
> |{{>>}}|bitwise shift right|{{8 >> 2}}|{{2}}|
>  
> [https://www.postgresql.org/docs/11/functions-math.html]
>  
> [https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Operators/BitwiseOperators.htm]
>  [https://docs.aws.amazon.com/redshift/latest/dg/r_OPERATOR_SYMBOLS.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28027) Add Bitwise shift left/right

2019-06-14 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-28027:

Description: 
||Operator||Description||Example||Result||
|{{^}}|exponentiation (associates left to right)|{{2.0 ^ 3.0}}|{{8}}|
|{{\|/}}|square root|{{\|/ 25.0}}|{{5}}|
|{{\|\|/}}|cube root|{{\|\|/ 27.0}}|{{3}}|
|{{!}}|factorial|{{5 !}}|{{120}}|
|{{\!\!}}|factorial (prefix operator)|{{!! 5}}|{{120}}|
|{{@}}|absolute value|{{@ -5.0}}|{{5}}|
|{{#}}|bitwise XOR|{{17 # 5}}|{{20}}|
|{{<<}}|bitwise shift left|{{1 << 4}}|{{16}}|
|{{>>}}|bitwise shift right|{{8 >> 2}}|{{2}}|
 

[https://www.postgresql.org/docs/11/functions-math.html]
 
[https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Operators/BitwiseOperators.htm]
 [https://docs.aws.amazon.com/redshift/latest/dg/r_OPERATOR_SYMBOLS.html]

  was:
||Operator||Description||Example||Result||
|{{<<}}|bitwise shift left|{{1 << 4}}|{{16}}|
|{{>>}}|bitwise shift right|{{8 >> 2}}|{{2}}|

https://www.postgresql.org/docs/11/functions-math.html
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Operators/BitwiseOperators.htm
https://docs.aws.amazon.com/redshift/latest/dg/r_OPERATOR_SYMBOLS.html



> Add Bitwise shift left/right
> 
>
> Key: SPARK-28027
> URL: https://issues.apache.org/jira/browse/SPARK-28027
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Operator||Description||Example||Result||
> |{{^}}|exponentiation (associates left to right)|{{2.0 ^ 3.0}}|{{8}}|
> |{{\|/}}|square root|{{\|/ 25.0}}|{{5}}|
> |{{\|\|/}}|cube root|{{\|\|/ 27.0}}|{{3}}|
> |{{!}}|factorial|{{5 !}}|{{120}}|
> |{{\!\!}}|factorial (prefix operator)|{{!! 5}}|{{120}}|
> |{{@}}|absolute value|{{@ -5.0}}|{{5}}|
> |{{#}}|bitwise XOR|{{17 # 5}}|{{20}}|
> |{{<<}}|bitwise shift left|{{1 << 4}}|{{16}}|
> |{{>>}}|bitwise shift right|{{8 >> 2}}|{{2}}|
>  
> [https://www.postgresql.org/docs/11/functions-math.html]
>  
> [https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Operators/BitwiseOperators.htm]
>  [https://docs.aws.amazon.com/redshift/latest/dg/r_OPERATOR_SYMBOLS.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26796) Testcases failing with "org.apache.hadoop.fs.ChecksumException" error

2019-06-14 Thread David Mavashev (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864538#comment-16864538
 ] 

David Mavashev commented on SPARK-26796:


I have this exact issue happening on a Windows box but not on a Linux box.

> Testcases failing with "org.apache.hadoop.fs.ChecksumException" error
> -
>
> Key: SPARK-26796
> URL: https://issues.apache.org/jira/browse/SPARK-26796
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.3.2, 2.4.0
> Environment: Ubuntu 16.04 
> Java Version
> openjdk version "1.8.0_192"
>  OpenJDK Runtime Environment (build 1.8.0_192-b12_openj9)
>  Eclipse OpenJ9 VM (build openj9-0.11.0, JRE 1.8.0 Compressed References 
> 20181107_80 (JIT enabled, AOT enabled)
>  OpenJ9 - 090ff9dcd
>  OMR - ea548a66
>  JCL - b5a3affe73 based on jdk8u192-b12)
>  
> Hadoop  Version
> Hadoop 2.7.1
>  Subversion Unknown -r Unknown
>  Compiled by test on 2019-01-29T09:09Z
>  Compiled with protoc 2.5.0
>  From source with checksum 5e94a235f9a71834e2eb73fb36ee873f
>  This command was run using 
> /home/test/hadoop-release-2.7.1/hadoop-dist/target/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1.jar
>  
>  
>  
>Reporter: Anuja Jakhade
>Priority: Major
>
> Observing test case failures due to Checksum error 
> Below is the error log
> [ERROR] checkpointAndComputation(test.org.apache.spark.JavaAPISuite) Time 
> elapsed: 1.232 s <<< ERROR!
> org.apache.spark.SparkException: 
> Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most 
> recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost, executor 
> driver): org.apache.hadoop.fs.ChecksumException: Checksum error: 
> file:/home/test/spark/core/target/tmp/1548319689411-0/fd0ba388-539c-49aa-bf76-e7d50aa2d1fc/rdd-0/part-0
>  at 0 exp: 222499834 got: 1400184476
>  at org.apache.hadoop.fs.FSInputChecker.verifySums(FSInputChecker.java:323)
>  at 
> org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:279)
>  at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:214)
>  at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:232)
>  at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:196)
>  at java.io.DataInputStream.read(DataInputStream.java:149)
>  at 
> java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2769)
>  at 
> java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2785)
>  at 
> java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3262)
>  at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:968)
>  at java.io.ObjectInputStream.<init>(ObjectInputStream.java:390)
>  at 
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:63)
>  at 
> org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:63)
>  at 
> org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:122)
>  at 
> org.apache.spark.rdd.ReliableCheckpointRDD$.readCheckpointFile(ReliableCheckpointRDD.scala:300)
>  at 
> org.apache.spark.rdd.ReliableCheckpointRDD.compute(ReliableCheckpointRDD.scala:100)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:322)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:813)
> Driver stacktrace:
>  at 
> test.org.apache.spark.JavaAPISuite.checkpointAndComputation(JavaAPISuite.java:1243)
> Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error:
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28057) Add method `clone` in catalyst TreeNode

2019-06-14 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-28057.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

> Add method `clone` in catalyst TreeNode
> ---
>
> Key: SPARK-28057
> URL: https://issues.apache.org/jira/browse/SPARK-28057
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maryann Xue
>Assignee: Maryann Xue
>Priority: Minor
> Fix For: 3.0.0
>
>
> Add implementation for {{clone}} method in {{TreeNode}}, for de-duplicating 
> instances in the LogicalPlan tree. This is a prerequisite for SPARK-23128.
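> A toy sketch of the idea (an invented tree type, not the actual catalyst 
> {{TreeNode}} API): cloning rebuilds every node, so instances shared inside 
> the original tree become distinct instances in the copy:
> {code:scala}
> // Toy tree: deep-cloning splits apart node instances that were shared.
> case class Node(name: String, children: Seq[Node]) {
>   def deepClone(): Node = copy(children = children.map(_.deepClone()))
> }
> 
> val shared = Node("leaf", Nil)
> val plan   = Node("root", Seq(shared, shared))    // same instance referenced twice
> val cloned = plan.deepClone()
> assert(cloned.children(0) ne cloned.children(1))  // duplicates are now distinct
> {code}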



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28057) Add method `clone` in catalyst TreeNode

2019-06-14 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reassigned SPARK-28057:
-

Assignee: Maryann Xue

> Add method `clone` in catalyst TreeNode
> ---
>
> Key: SPARK-28057
> URL: https://issues.apache.org/jira/browse/SPARK-28057
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maryann Xue
>Assignee: Maryann Xue
>Priority: Minor
>
> Add implementation for {{clone}} method in {{TreeNode}}, for de-duplicating 
> instances in the LogicalPlan tree. This is a prerequisite for SPARK-23128.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28058) Reading csv with DROPMALFORMED sometimes doesn't drop malformed records

2019-06-14 Thread Stuart White (JIRA)
Stuart White created SPARK-28058:


 Summary: Reading csv with DROPMALFORMED sometimes doesn't drop 
malformed records
 Key: SPARK-28058
 URL: https://issues.apache.org/jira/browse/SPARK-28058
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.3, 2.4.1
Reporter: Stuart White


The Spark SQL CSV reader is not dropping malformed records as expected.

Consider this file (fruit.csv).  Notice it contains a header record, 3 valid 
records, and one malformed record.

{noformat}
fruit,color,price,quantity
apple,red,1,3
banana,yellow,2,4
orange,orange,3,5
xxx
{noformat}

If I read this file using the Spark SQL CSV reader as follows, everything looks 
good.  The malformed record is dropped.

{noformat}
scala> spark.read.option("header", "true").option("mode", 
"DROPMALFORMED").csv("fruit.csv").show(truncate=false)
+--+--+-++  
|fruit |color |price|quantity|
+--+--+-++
|apple |red   |1|3   |
|banana|yellow|2|4   |
|orange|orange|3|5   |
+--+--+-++
{noformat}

However, if I select a subset of the columns, the malformed record is not 
dropped.  The malformed data is placed in the first column, and the remaining 
column(s) are filled with nulls.

{noformat}
scala> spark.read.option("header", "true").option("mode", 
"DROPMALFORMED").csv("fruit.csv").select('fruit).show(truncate=false)
+--+
|fruit |
+--+
|apple |
|banana|
|orange|
|xxx   |
+--+

scala> spark.read.option("header", "true").option("mode", 
"DROPMALFORMED").csv("fruit.csv").select('fruit, 'color).show(truncate=false)
+--+--+
|fruit |color |
+--+--+
|apple |red   |
|banana|yellow|
|orange|orange|
|xxx   |null  |
+--+--+

scala> spark.read.option("header", "true").option("mode", 
"DROPMALFORMED").csv("fruit.csv").select('fruit, 'color, 
'price).show(truncate=false)
+--+--+-+
|fruit |color |price|
+--+--+-+
|apple |red   |1|
|banana|yellow|2|
|orange|orange|3|
|xxx   |null  |null |
+--+--+-+
{noformat}

And finally, if I manually select all of the columns, the malformed record is 
once again dropped.

{noformat}
scala> spark.read.option("header", "true").option("mode", 
"DROPMALFORMED").csv("fruit.csv").select('fruit, 'color, 'price, 
'quantity).show(truncate=false)
+--+--+-++
|fruit |color |price|quantity|
+--+--+-++
|apple |red   |1|3   |
|banana|yellow|2|4   |
|orange|orange|3|5   |
+--+--+-++
{noformat}

I would expect the malformed record(s) to be dropped regardless of which 
columns are being selected from the file.
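
A hedged note on a likely cause and a workaround: CSV column pruning 
(spark.sql.csv.parser.columnPruning.enabled, true by default since 2.4) lets 
the parser read only the selected columns, so a short record no longer looks 
malformed unless every column is requested. Disabling it appears to restore the 
dropping behavior; this is an observation about current behavior, not a fix:

{code:scala}
// Workaround sketch: disable CSV column pruning so malformed-record
// detection sees all columns even when only a subset is selected.
spark.conf.set("spark.sql.csv.parser.columnPruning.enabled", false)
spark.read.option("header", "true").option("mode", "DROPMALFORMED")
  .csv("fruit.csv").select('fruit).show(truncate = false)
// with pruning disabled, the "xxx" record is dropped again
{code}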




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28057) Add method `clone` in catalyst TreeNode

2019-06-14 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28057:


Assignee: (was: Apache Spark)

> Add method `clone` in catalyst TreeNode
> ---
>
> Key: SPARK-28057
> URL: https://issues.apache.org/jira/browse/SPARK-28057
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maryann Xue
>Priority: Minor
>
> Add implementation for {{clone}} method in {{TreeNode}}, for de-duplicating 
> instances in the LogicalPlan tree. This is a prerequisite for SPARK-23128.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28057) Add method `clone` in catalyst TreeNode

2019-06-14 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28057:


Assignee: Apache Spark

> Add method `clone` in catalyst TreeNode
> ---
>
> Key: SPARK-28057
> URL: https://issues.apache.org/jira/browse/SPARK-28057
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maryann Xue
>Assignee: Apache Spark
>Priority: Minor
>
> Add implementation for {{clone}} method in {{TreeNode}}, for de-duplicating 
> instances in the LogicalPlan tree. This is a prerequisite for SPARK-23128.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28057) Add method `clone` in catalyst TreeNode

2019-06-14 Thread Maryann Xue (JIRA)
Maryann Xue created SPARK-28057:
---

 Summary: Add method `clone` in catalyst TreeNode
 Key: SPARK-28057
 URL: https://issues.apache.org/jira/browse/SPARK-28057
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maryann Xue


Add implementation for {{clone}} method in {{TreeNode}}, for de-duplicating 
instances in the LogicalPlan tree. This is a prerequisite for SPARK-23128.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28056) Document SCALAR_ITER Pandas UDF

2019-06-14 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-28056:
-

 Summary: Document SCALAR_ITER Pandas UDF
 Key: SPARK-28056
 URL: https://issues.apache.org/jira/browse/SPARK-28056
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, PySpark
Affects Versions: 3.0.0
Reporter: Xiangrui Meng


After SPARK-26412, we should document the new SCALAR_ITER Pandas UDF so users 
can discover the feature and learn how to use it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26209) Allow for dataframe bucketization without Hive

2019-06-14 Thread Sam hendley (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864355#comment-16864355
 ] 

Sam hendley commented on SPARK-26209:
-

It seems like the proposal in SPARK-27067 goes a long way towards solving this 
problem. 

> Allow for dataframe bucketization without Hive
> --
>
> Key: SPARK-26209
> URL: https://issues.apache.org/jira/browse/SPARK-26209
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output, Java API, SQL
>Affects Versions: 3.0.0
>Reporter: Walt Elder
>Priority: Minor
>
> As a DataFrame author, I can elect to bucketize my output without involving 
> Hive or HMS, so that my hive-less environment can benefit from this 
> query-optimization technique. 
>  
> https://issues.apache.org/jira/browse/SPARK-19256?focusedCommentId=16345397=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16345397
>  identifies this as a shortcoming of the umbrella feature provided via 
> SPARK-19256.
>  
> In short, relying on Hive to store metadata *precludes* environments that 
> don't have/use Hive from making use of bucketing features. 
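>  
> For context, a minimal sketch of the existing writer API whose bucketing 
> metadata is what currently ties the feature to a catalog/HMS (the table and 
> column names here are invented):
> {code:scala}
> // bucketBy/sortBy currently work only together with saveAsTable,
> // i.e. the bucketing spec is stored as table metadata in the catalog.
> val df = Seq((1, "a"), (2, "b")).toDF("user_id", "payload")
> df.write.bucketBy(8, "user_id").sortBy("user_id").saveAsTable("events_bucketed")
> {code}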



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21882) OutputMetrics doesn't count written bytes correctly in the saveAsHadoopDataset function

2019-06-14 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-21882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-21882:
--
Labels:   (was: bulk-closed)

> OutputMetrics doesn't count written bytes correctly in the 
> saveAsHadoopDataset function
> ---
>
> Key: SPARK-21882
> URL: https://issues.apache.org/jira/browse/SPARK-21882
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.2.0
>Reporter: linxiaojun
>Assignee: linxiaojun
>Priority: Minor
> Fix For: 2.3.4, 2.4.4, 3.0.0
>
> Attachments: SPARK-21882.patch
>
>
> The first job called from saveAsHadoopDataset, running in each executor, does 
> not calculate the writtenBytes of OutputMetrics correctly (writtenBytes is 
> 0). The reason is that the callback function used to find the bytes written 
> was not initialized in the right order. The statisticsTable, which records 
> statistics in a FileSystem, must be initialized at the beginning (this is 
> triggered when SparkHadoopWriter is opened). The solution for this issue is 
> to adjust the order of the callback function initialization. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-21882) OutputMetrics doesn't count written bytes correctly in the saveAsHadoopDataset function

2019-06-14 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-21882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reopened SPARK-21882:
---

> OutputMetrics doesn't count written bytes correctly in the 
> saveAsHadoopDataset function
> ---
>
> Key: SPARK-21882
> URL: https://issues.apache.org/jira/browse/SPARK-21882
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.2.0
>Reporter: linxiaojun
>Assignee: linxiaojun
>Priority: Minor
>  Labels: bulk-closed
> Attachments: SPARK-21882.patch
>
>
> The first job called from saveAsHadoopDataset, running in each executor, does 
> not calculate the writtenBytes of OutputMetrics correctly (writtenBytes is 
> 0). The reason is that the callback function used to find the bytes written 
> was not initialized in the right order. The statisticsTable, which records 
> statistics in a FileSystem, must be initialized at the beginning (this is 
> triggered when SparkHadoopWriter is opened). The solution for this issue is 
> to adjust the order of the callback function initialization. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21882) OutputMetrics doesn't count written bytes correctly in the saveAsHadoopDataset function

2019-06-14 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-21882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-21882.
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   2.4.4
   2.3.4

Resolved by https://github.com/apache/spark/pull/24863

> OutputMetrics doesn't count written bytes correctly in the 
> saveAsHadoopDataset function
> ---
>
> Key: SPARK-21882
> URL: https://issues.apache.org/jira/browse/SPARK-21882
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.2.0
>Reporter: linxiaojun
>Assignee: linxiaojun
>Priority: Minor
>  Labels: bulk-closed
> Fix For: 2.3.4, 2.4.4, 3.0.0
>
> Attachments: SPARK-21882.patch
>
>
> The first job called from saveAsHadoopDataset, running in each executor, does 
> not calculate the writtenBytes of OutputMetrics correctly (writtenBytes is 
> 0). The reason is that the callback function used to find the bytes written 
> was not initialized in the right order. The statisticsTable, which records 
> statistics in a FileSystem, must be initialized at the beginning (this is 
> triggered when SparkHadoopWriter is opened). The solution for this issue is 
> to adjust the order of the callback function initialization. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21882) OutputMetrics doesn't count written bytes correctly in the saveAsHadoopDataset function

2019-06-14 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-21882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-21882:
-

Assignee: linxiaojun

> OutputMetrics doesn't count written bytes correctly in the 
> saveAsHadoopDataset function
> ---
>
> Key: SPARK-21882
> URL: https://issues.apache.org/jira/browse/SPARK-21882
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.1, 2.2.0
>Reporter: linxiaojun
>Assignee: linxiaojun
>Priority: Minor
>  Labels: bulk-closed
> Attachments: SPARK-21882.patch
>
>
> The first job called from saveAsHadoopDataset, running in each executor, does 
> not calculate the writtenBytes of OutputMetrics correctly (writtenBytes is 
> 0). The reason is that the callback function used to find the bytes written 
> was not initialized in the right order. The statisticsTable, which records 
> statistics in a FileSystem, must be initialized at the beginning (this is 
> triggered when SparkHadoopWriter is opened). The solution for this issue is 
> to adjust the order of the callback function initialization. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28054) Unable to insert into a partitioned table dynamically when the partition name is upper case

2019-06-14 Thread ChenKai (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ChenKai updated SPARK-28054:

Description: 
{code:java}
-- create sql and column name is upper case
CREATE TABLE src (KEY STRING, VALUE STRING) PARTITIONED BY (DS STRING)

-- insert sql
INSERT INTO TABLE src PARTITION(ds) SELECT 'k' key, 'v' value, '1' ds
{code}

The error is:

{code:java}
Error in query: 
org.apache.hadoop.hive.ql.metadata.Table.ValidationFailureSemanticException: 
Partition spec {ds=, DS=1} contains non-partition columns;
{code}

  was:
{code:java}
-- create sql and column name is upper case
CREATE TABLE src (KEY INT, VALUE STRING) PARTITIONED BY (DS STRING)

-- insert sql
INSERT INTO TABLE src PARTITION(ds) SELECT 'k' key, 'v' value, '1' ds
{code}

The error is:

{code:java}
Error in query: 
org.apache.hadoop.hive.ql.metadata.Table.ValidationFailureSemanticException: 
Partition spec {ds=, DS=1} contains non-partition columns;
{code}


> Unable to insert into a partitioned table dynamically when the partition name 
> is upper case
> 
>
> Key: SPARK-28054
> URL: https://issues.apache.org/jira/browse/SPARK-28054
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: ChenKai
>Priority: Major
>
> {code:java}
> -- create sql and column name is upper case
> CREATE TABLE src (KEY STRING, VALUE STRING) PARTITIONED BY (DS STRING)
> -- insert sql
> INSERT INTO TABLE src PARTITION(ds) SELECT 'k' key, 'v' value, '1' ds
> {code}
> The error is:
> {code:java}
> Error in query: 
> org.apache.hadoop.hive.ql.metadata.Table.ValidationFailureSemanticException: 
> Partition spec {ds=, DS=1} contains non-partition columns;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28039) Add float4.sql

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28039:
--
Component/s: Tests

> Add float4.sql
> --
>
> Key: SPARK-28039
> URL: https://issues.apache.org/jira/browse/SPARK-28039
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> In this ticket, we plan to add the regression test cases of 
> https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/float4.sql.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27934) Add case.sql

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27934:
--
Component/s: Tests

> Add case.sql
> 
>
> Key: SPARK-27934
> URL: https://issues.apache.org/jira/browse/SPARK-27934
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.0.0
>
>
> In this ticket, we plan to add the regression test cases of 
> https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/case.sql.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27988) Add aggregates.sql - Part3

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27988:
--
Component/s: Tests

> Add aggregates.sql - Part3
> --
>
> Key: SPARK-27988
> URL: https://issues.apache.org/jira/browse/SPARK-27988
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> In this ticket, we plan to add the regression test cases of 
> https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/aggregates.sql#L352-L605



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28020) Add date.sql

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28020:
--
Component/s: Tests

> Add date.sql
> 
>
> Key: SPARK-28020
> URL: https://issues.apache.org/jira/browse/SPARK-28020
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> In this ticket, we plan to add the regression test cases of 
> https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/date.sql.






[jira] [Updated] (SPARK-28038) Add text.sql

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28038:
--
Component/s: Tests

> Add text.sql
> 
>
> Key: SPARK-28038
> URL: https://issues.apache.org/jira/browse/SPARK-28038
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> In this ticket, we plan to add the regression test cases of 
> [https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/text.sql].






[jira] [Updated] (SPARK-27883) Add aggregates.sql - Part2

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27883:
--
Component/s: Tests

> Add aggregates.sql - Part2
> --
>
> Key: SPARK-27883
> URL: https://issues.apache.org/jira/browse/SPARK-27883
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.0.0
>
>
> In this ticket, we plan to add the regression test cases of 
> https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/aggregates.sql#L145-L350






[jira] [Updated] (SPARK-28000) Add comments.sql

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28000:
--
Component/s: Tests

> Add comments.sql
> 
>
> Key: SPARK-28000
> URL: https://issues.apache.org/jira/browse/SPARK-28000
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Lantao Jin
>Priority: Major
>
> In this ticket, we plan to add the regression test cases of 
> https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/comments.sql.






[jira] [Updated] (SPARK-28034) Add with.sql

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28034:
--
Component/s: Tests

> Add with.sql
> 
>
> Key: SPARK-28034
> URL: https://issues.apache.org/jira/browse/SPARK-28034
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Peter Toth
>Priority: Major
>
> In this ticket, we plan to add the regression test cases of 
> [https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/with.sql]






[jira] [Updated] (SPARK-27918) Add boolean.sql

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27918:
--
Component/s: Tests

> Add boolean.sql
> ---
>
> Key: SPARK-27918
> URL: https://issues.apache.org/jira/browse/SPARK-27918
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.0.0
>
>
> In this ticket, we plan to add the regression test cases of 
> [https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/boolean.sql].






[jira] [Updated] (SPARK-27770) Add aggregates.sql - Part1

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27770:
--
Component/s: Tests

> Add aggregates.sql - Part1
> --
>
> Key: SPARK-27770
> URL: https://issues.apache.org/jira/browse/SPARK-27770
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Major
> Fix For: 3.0.0
>
>
> In this ticket, we plan to add the regression test cases of 
> https://github.com/postgres/postgres/blob/02ddd499322ab6f2f0d58692955dc9633c2150fc/src/test/regress/sql/aggregates.sql#L1-L143






[jira] [Updated] (SPARK-28029) Add int2.sql

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28029:
--
Component/s: Tests

> Add int2.sql
> 
>
> Key: SPARK-28029
> URL: https://issues.apache.org/jira/browse/SPARK-28029
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> In this ticket, we plan to add the regression test cases of 
> [https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/int2.sql].






[jira] [Commented] (SPARK-28054) Unable to insert partitioned table dynamically when partition name is upper case

2019-06-14 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864189#comment-16864189
 ] 

Liang-Chi Hsieh commented on SPARK-28054:
-

Does this query work on Hive?

> Unable to insert partitioned table dynamically when partition name is upper 
> case
> 
>
> Key: SPARK-28054
> URL: https://issues.apache.org/jira/browse/SPARK-28054
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: ChenKai
>Priority: Major
>
> {code:java}
> -- create sql and column name is upper case
> CREATE TABLE src (KEY INT, VALUE STRING) PARTITIONED BY (DS STRING)
> -- insert sql
> INSERT INTO TABLE src PARTITION(ds) SELECT 'k' key, 'v' value, '1' ds
> {code}
> The error is:
> {code:java}
> Error in query: 
> org.apache.hadoop.hive.ql.metadata.Table.ValidationFailureSemanticException: 
> Partition spec {ds=, DS=1} contains non-partition columns;
> {code}






[jira] [Commented] (SPARK-28043) Reading json with duplicate columns drops the first column value

2019-06-14 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864107#comment-16864107
 ] 

Liang-Chi Hsieh commented on SPARK-28043:
-

To make duplicate JSON keys work, I thought about it and looked at our current 
implementation. One concern is: how do we know which key maps to which Spark 
SQL field?

Suppose we have two duplicate keys "a" as above. We infer the Spark SQL schema 
as "a string, a string". Does the order of keys in the JSON string imply the 
order of fields? In our current implementation, no such mapping exists, which 
means the order of keys can differ in each JSON string.

Isn't that prone to unnoticed errors when reading JSON?

Another option is to forbid duplicate JSON keys, perhaps with a legacy config 
to fall back to the current behavior if we don't want to break existing code.
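
As a rough illustration, here is a hypothetical PySpark sketch (assuming a 
running SparkSession named {{spark}}): two records that order their duplicate 
keys differently are indistinguishable under the inferred schema.
{code:python}
# Hypothetical sketch: both records infer the same schema "a string, a string",
# but order their duplicate "a" keys differently, so there is no stable
# key-to-field mapping per record.
json_rdd = spark.sparkContext.parallelize([
    '{"a": "x", "a": "y"}',  # duplicate key "a", one order
    '{"a": "y", "a": "x"}',  # duplicate key "a", reversed order
])
df = spark.read.json(json_rdd)
df.show()
{code}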





> Reading json with duplicate columns drops the first column value
> 
>
> Key: SPARK-28043
> URL: https://issues.apache.org/jira/browse/SPARK-28043
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Mukul Murthy
>Priority: Major
>
> When reading a JSON blob with duplicate fields, Spark appears to ignore the 
> value of the first one. JSON recommends unique names but does not require it; 
> since JSON and Spark SQL both allow duplicate field names, we should fix the 
> bug where the first column value is getting dropped.
>  
> I'm guessing somewhere when parsing JSON, we're turning it into a Map which 
> is causing the first value to be overridden.
>  
> Repro (Scala, 2.4):
> {code}
> scala> val jsonRDD = spark.sparkContext.parallelize(Seq("[{ \"a\": \"blah\", 
> \"a\": \"blah2\"} ]"))
> jsonRDD: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[1] at 
> parallelize at <console>:23
> scala> val df = spark.read.json(jsonRDD)
> df: org.apache.spark.sql.DataFrame = [a: string, a: string]   
>   
> scala> df.show
> +----+-----+
> |   a|    a|
> +----+-----+
> |null|blah2|
> +----+-----+
> {code}
>  
> The expected response would be:
> {code}
> +----+-----+
> |   a|    a|
> +----+-----+
> |blah|blah2|
> +----+-----+
> {code}






[jira] [Commented] (SPARK-28040) sql() fails to process output of glue::glue_data()

2019-06-14 Thread Michael Chirico (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16864043#comment-16864043
 ] 

Michael Chirico commented on SPARK-28040:
-

glue_data returns a c('glue', 'character') object, which is essentially just 
character (case in point: as.character.glue(x) returns unclass(x), i.e. it 
simply drops the glue class and leaves you with plain character).

I'll try to file a PR on this this weekend. It should be as simple as checking 
is.character on the input (which a glue object would pass).




> sql() fails to process output of glue::glue_data()
> --
>
> Key: SPARK-28040
> URL: https://issues.apache.org/jira/browse/SPARK-28040
> Project: Spark
>  Issue Type: Bug
>  Components: R
>Affects Versions: 2.4.3
>Reporter: Michael Chirico
>Priority: Major
>
> The {{glue}} package is quite natural for sending parameterized queries to 
> Spark from R, very similar to Python's {{format}} for strings. The error is 
> as simple as:
> {code:java}
> library(glue)
> library(sparkR)
> sparkR.session()
> query = glue_data(list(val = 4), 'select {val}')
> sql(query){code}
> Error in writeType(con, serdeType) : 
>   Unsupported type for serialization glue
> {{sql(as.character(query))}} works as expected but this is a bit awkward / 
> post-hoc






[jira] [Assigned] (SPARK-28055) Add delegation token custom AdminClient configurations.

2019-06-14 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28055:


Assignee: (was: Apache Spark)

> Add delegation token custom AdminClient configurations.
> ---
>
> Key: SPARK-28055
> URL: https://issues.apache.org/jira/browse/SPARK-28055
> Project: Spark
>  Issue Type: Improvement
>  Components: DStreams, Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Minor
>







[jira] [Assigned] (SPARK-28055) Add delegation token custom AdminClient configurations.

2019-06-14 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28055:


Assignee: Apache Spark

> Add delegation token custom AdminClient configurations.
> ---
>
> Key: SPARK-28055
> URL: https://issues.apache.org/jira/browse/SPARK-28055
> Project: Spark
>  Issue Type: Improvement
>  Components: DStreams, Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Resolved] (SPARK-28032) DataFrame.saveAsTable() in AVRO format with Timestamps creates bad Hive tables

2019-06-14 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-28032.
--
Resolution: Won't Fix

> DataFrame.saveAsTable() in AVRO format with Timestamps creates bad Hive tables
> 
>
> Key: SPARK-28032
> URL: https://issues.apache.org/jira/browse/SPARK-28032
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
> Environment: Spark 2.4.3
> Hive 1.1.0
>Reporter: Mathew Wicks
>Priority: Major
>
> I am not sure if it's my very old version of Hive (1.1.0), but when I use the 
> following code, I end up with a table which Spark can read, but Hive cannot.
> That is to say, when writing AVRO format tables, they cannot be read in Hive 
> if they contain timestamp types.
> *Hive error:*
> {code:java}
> Error while compiling statement: FAILED: UnsupportedOperationException 
> timestamp is not supported.
> {code}
> *Spark Code:*
> {code:java}
> import java.sql.Timestamp
> import spark.implicits._
> val currentTime = new Timestamp(System.currentTimeMillis())
>  
> val df = Seq(
>  (currentTime)
> ).toDF()
> df.write.mode("overwrite").format("avro").saveAsTable("database.table_name")
> {code}






[jira] [Commented] (SPARK-28032) DataFrame.saveAsTable() in AVRO format with Timestamps creates bad Hive tables

2019-06-14 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863909#comment-16863909
 ] 

Hyukjin Kwon commented on SPARK-28032:
--

Looks like the error message describes the limitation clearly. What's the 
issue? You can upgrade your Hive version to read the table.
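
If upgrading is not an option, a hedged workaround sketch (PySpark; assuming 
the single column is named {{value}} as in the Scala repro below) is to store 
the timestamp as a formatted string that old Hive can read:
{code:python}
# Workaround sketch, not a confirmed fix: serialize the timestamp column as a
# string before writing, so a Hive without Avro timestamp support can read it.
from pyspark.sql.functions import date_format

df.withColumn("value", date_format("value", "yyyy-MM-dd HH:mm:ss")) \
  .write.mode("overwrite").format("avro").saveAsTable("database.table_name")
{code}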

> DataFrame.saveAsTable() in AVRO format with Timestamps creates bad Hive tables
> 
>
> Key: SPARK-28032
> URL: https://issues.apache.org/jira/browse/SPARK-28032
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
> Environment: Spark 2.4.3
> Hive 1.1.0
>Reporter: Mathew Wicks
>Priority: Major
>
> I am not sure if it's my very old version of Hive (1.1.0), but when I use the 
> following code, I end up with a table which Spark can read, but Hive cannot.
> That is to say, when writing AVRO format tables, they cannot be read in Hive 
> if they contain timestamp types.
> *Hive error:*
> {code:java}
> Error while compiling statement: FAILED: UnsupportedOperationException 
> timestamp is not supported.
> {code}
> *Spark Code:*
> {code:java}
> import java.sql.Timestamp
> import spark.implicits._
> val currentTime = new Timestamp(System.currentTimeMillis())
>  
> val df = Seq(
>  (currentTime)
> ).toDF()
> df.write.mode("overwrite").format("avro").saveAsTable("database.table_name")
> {code}






[jira] [Resolved] (SPARK-28035) Test JoinSuite."equi-join is hash-join" is incompatible with its title.

2019-06-14 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-28035.
--
Resolution: Invalid

This sounds like a question. Let's interact with the Spark mailing list before 
filing it as an issue.

> Test JoinSuite."equi-join is hash-join" is incompatible with its title.
> ---
>
> Key: SPARK-28035
> URL: https://issues.apache.org/jira/browse/SPARK-28035
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.4.3
>Reporter: Jiatao Tao
>Priority: Trivial
> Attachments: image-2019-06-13-10-32-06-759.png
>
>







[jira] [Commented] (SPARK-28040) sql() fails to process output of glue::glue_data()

2019-06-14 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863907#comment-16863907
 ] 

Hyukjin Kwon commented on SPARK-28040:
--

What does {{glue_data}} return? {{sql}} takes strings so it looks natural.

> sql() fails to process output of glue::glue_data()
> --
>
> Key: SPARK-28040
> URL: https://issues.apache.org/jira/browse/SPARK-28040
> Project: Spark
>  Issue Type: Bug
>  Components: R
>Affects Versions: 2.4.3
>Reporter: Michael Chirico
>Priority: Major
>
> The {{glue}} package is quite natural for sending parameterized queries to 
> Spark from R, very similar to Python's {{format}} for strings. The error is 
> as simple as:
> {code:java}
> library(glue)
> library(sparkR)
> sparkR.session()
> query = glue_data(list(val = 4), 'select {val}')
> sql(query){code}
> Error in writeType(con, serdeType) : 
>   Unsupported type for serialization glue
> {{sql(as.character(query))}} works as expected but this is a bit awkward / 
> post-hoc






[jira] [Updated] (SPARK-28043) Reading json with duplicate columns drops the first column value

2019-06-14 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28043:
-
Description: 
When reading a JSON blob with duplicate fields, Spark appears to ignore the 
value of the first one. JSON recommends unique names but does not require it; 
since JSON and Spark SQL both allow duplicate field names, we should fix the 
bug where the first column value is getting dropped.

 

I'm guessing somewhere when parsing JSON, we're turning it into a Map which is 
causing the first value to be overridden.

 

Repro (Scala, 2.4):

{code}
scala> val jsonRDD = spark.sparkContext.parallelize(Seq("[{ \"a\": \"blah\", 
\"a\": \"blah2\"} ]"))
jsonRDD: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[1] at 
parallelize at <console>:23
scala> val df = spark.read.json(jsonRDD)
df: org.apache.spark.sql.DataFrame = [a: string, a: string] 
scala> df.show
+----+-----+
|   a|    a|
+----+-----+
|null|blah2|
+----+-----+
{code}

 

The expected response would be:

{code}
+----+-----+
|   a|    a|
+----+-----+
|blah|blah2|
+----+-----+
{code}


  was:
When reading a JSON blob with duplicate fields, Spark appears to ignore the 
value of the first one. JSON recommends unique names but does not require it; 
since JSON and Spark SQL both allow duplicate field names, we should fix the 
bug where the first column value is getting dropped.

 

I'm guessing somewhere when parsing JSON, we're turning it into a Map which is 
causing the first value to be overridden.

 

Repro (Python, 2.4):

{code}
>>> jsonRDD = spark.sparkContext.parallelize(["\\{ \"a\": \"blah\", \"a\": 
>>> \"blah2\"}"])
 >>> df = spark.read.json(jsonRDD)
 >>> df.show()
+----+-----+
|   a|    a|
+----+-----+
|null|blah2|
+----+-----+
{code}

 

The expected response would be:

{code}

+----+-----+
|   a|    a|
+----+-----+
|blah|blah2|
+----+-----+
{code}



> Reading json with duplicate columns drops the first column value
> 
>
> Key: SPARK-28043
> URL: https://issues.apache.org/jira/browse/SPARK-28043
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Mukul Murthy
>Priority: Major
>
> When reading a JSON blob with duplicate fields, Spark appears to ignore the 
> value of the first one. JSON recommends unique names but does not require it; 
> since JSON and Spark SQL both allow duplicate field names, we should fix the 
> bug where the first column value is getting dropped.
>  
> I'm guessing somewhere when parsing JSON, we're turning it into a Map which 
> is causing the first value to be overridden.
>  
> Repro (Scala, 2.4):
> {code}
> scala> val jsonRDD = spark.sparkContext.parallelize(Seq("[{ \"a\": \"blah\", 
> \"a\": \"blah2\"} ]"))
> jsonRDD: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[1] at 
> parallelize at <console>:23
> scala> val df = spark.read.json(jsonRDD)
> df: org.apache.spark.sql.DataFrame = [a: string, a: string]   
>   
> scala> df.show
> +----+-----+
> |   a|    a|
> +----+-----+
> |null|blah2|
> +----+-----+
> {code}
>  
> The expected response would be:
> {code}
> +----+-----+
> |   a|    a|
> +----+-----+
> |blah|blah2|
> +----+-----+
> {code}






[jira] [Updated] (SPARK-28043) Reading json with duplicate columns drops the first column value

2019-06-14 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28043:
-
Description: 
When reading a JSON blob with duplicate fields, Spark appears to ignore the 
value of the first one. JSON recommends unique names but does not require it; 
since JSON and Spark SQL both allow duplicate field names, we should fix the 
bug where the first column value is getting dropped.

 

I'm guessing somewhere when parsing JSON, we're turning it into a Map which is 
causing the first value to be overridden.

 

Repro (Python, 2.4):

{code}
>>> jsonRDD = spark.sparkContext.parallelize(["\\{ \"a\": \"blah\", \"a\": 
>>> \"blah2\"}"])
 >>> df = spark.read.json(jsonRDD)
 >>> df.show()
+----+-----+
|   a|    a|
+----+-----+
|null|blah2|
+----+-----+
{code}

 

The expected response would be:

{code}

+----+-----+
|   a|    a|
+----+-----+
|blah|blah2|
+----+-----+
{code}


  was:
When reading a JSON blob with duplicate fields, Spark appears to ignore the 
value of the first one. JSON recommends unique names but does not require it; 
since JSON and Spark SQL both allow duplicate field names, we should fix the 
bug where the first column value is getting dropped.

 

I'm guessing somewhere when parsing JSON, we're turning it into a Map which is 
causing the first value to be overridden.

 

Repro (Python, 2.4):

>>> jsonRDD = spark.sparkContext.parallelize(["\\{ \"a\": \"blah\", \"a\": 
>>> \"blah2\"}"])
 >>> df = spark.read.json(jsonRDD)
 >>> df.show()
+----+-----+
|   a|    a|
+----+-----+
|null|blah2|
+----+-----+

 

The expected response would be:

+----+-----+
|   a|    a|
+----+-----+
|blah|blah2|
+----+-----+


> Reading json with duplicate columns drops the first column value
> 
>
> Key: SPARK-28043
> URL: https://issues.apache.org/jira/browse/SPARK-28043
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Mukul Murthy
>Priority: Major
>
> When reading a JSON blob with duplicate fields, Spark appears to ignore the 
> value of the first one. JSON recommends unique names but does not require it; 
> since JSON and Spark SQL both allow duplicate field names, we should fix the 
> bug where the first column value is getting dropped.
>  
> I'm guessing somewhere when parsing JSON, we're turning it into a Map which 
> is causing the first value to be overridden.
>  
> Repro (Python, 2.4):
> {code}
> >>> jsonRDD = spark.sparkContext.parallelize(["\\{ \"a\": \"blah\", \"a\": 
> >>> \"blah2\"}"])
>  >>> df = spark.read.json(jsonRDD)
>  >>> df.show()
> +----+-----+
> |   a|    a|
> +----+-----+
> |null|blah2|
> +----+-----+
> {code}
>  
> The expected response would be:
> {code}
> +----+-----+
> |   a|    a|
> +----+-----+
> |blah|blah2|
> +----+-----+
> {code}






[jira] [Updated] (SPARK-28048) pyspark.sql.functions.explode will abandon the row which has an empty list column when applied to the column

2019-06-14 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28048:
-
Shepherd:   (was: Dongjoon Hyun)

> pyspark.sql.functions.explode will abandon the row which has an empty list 
> column when applied to the column
> ---
>
> Key: SPARK-28048
> URL: https://issues.apache.org/jira/browse/SPARK-28048
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1
>Reporter: Ma Xinmin
>Priority: Major
>
> {code}
> from pyspark.sql import Row
> from pyspark.sql.functions import explode
> eDF = spark.createDataFrame([Row(a=1, intlist=[1,2], mapfield={"a": "b"}), 
> Row(a=2, intlist=[], mapfield={"a": "b"})])
> eDF = eDF.withColumn('another', explode(eDF.intlist)).collect()
> eDF
> {code}
> The `a=2` row is missing in the output






[jira] [Resolved] (SPARK-28048) pyspark.sql.functions.explode will abandon the row which has an empty list column when applied to the column

2019-06-14 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-28048.
--
Resolution: Not A Problem

> pyspark.sql.functions.explode will abandon the row which has an empty list 
> column when applied to the column
> ---
>
> Key: SPARK-28048
> URL: https://issues.apache.org/jira/browse/SPARK-28048
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1
>Reporter: Ma Xinmin
>Priority: Major
>
> {code}
> from pyspark.sql import Row
> from pyspark.sql.functions import explode
> eDF = spark.createDataFrame([Row(a=1, intlist=[1,2], mapfield={"a": "b"}), 
> Row(a=2, intlist=[], mapfield={"a": "b"})])
> eDF = eDF.withColumn('another', explode(eDF.intlist)).collect()
> eDF
> {code}
> The `a=2` row is missing in the output






[jira] [Commented] (SPARK-28048) pyspark.sql.functions.explode will abandon the row which has an empty list column when applied to the column

2019-06-14 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863901#comment-16863901
 ] 

Hyukjin Kwon commented on SPARK-28048:
--

Of course it's missing, because there are no rows to explode. What output do 
you expect? I think it works by design.
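
For reference, if the intent is to keep such rows, {{explode_outer}} emits a 
null instead of dropping them (available in PySpark since roughly Spark 2.3; 
a minimal sketch assuming a running SparkSession):
{code:python}
# explode drops rows whose array is empty; explode_outer keeps them with a
# null in the generated column.
from pyspark.sql import Row
from pyspark.sql.functions import explode, explode_outer

eDF = spark.createDataFrame([Row(a=1, intlist=[1, 2]), Row(a=2, intlist=[])])
eDF.withColumn('another', explode(eDF.intlist)).show()        # a=2 dropped
eDF.withColumn('another', explode_outer(eDF.intlist)).show()  # a=2 kept
{code}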

> pyspark.sql.functions.explode will abandon the row which has an empty list 
> column when applied to the column
> ---
>
> Key: SPARK-28048
> URL: https://issues.apache.org/jira/browse/SPARK-28048
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1
>Reporter: Ma Xinmin
>Priority: Major
>
> {code}
> from pyspark.sql import Row
> from pyspark.sql.functions import explode
> eDF = spark.createDataFrame([Row(a=1, intlist=[1,2], mapfield={"a": "b"}), 
> Row(a=2, intlist=[], mapfield={"a": "b"})])
> eDF = eDF.withColumn('another', explode(eDF.intlist)).collect()
> eDF
> {code}
> The `a=2` row is missing in the output






[jira] [Updated] (SPARK-28052) ArrayExists should follow the three-valued boolean logic.

2019-06-14 Thread Takuya Ueshin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-28052:
--
Description: 
Currently {{ArrayExists}} always returns boolean values (if the arguments are 
not null), but it should follow the three-valued boolean logic:
 - {{true}} if the predicate holds at least one {{true}}
 - otherwise, {{null}} if the predicate holds {{null}}
 - otherwise, {{false}}

  was:
Currently {{ArrayExists}} always returns boolean values (if the arguments are 
not null), but it should follow the three-valued boolean logic:
 - {{true}} if the predicate holds at least {{true}}
 - otherwise, {{null}} if the predicate holds {{null}}
 - otherwise, {{false}}


> ArrayExists should follow the three-valued boolean logic.
> -
>
> Key: SPARK-28052
> URL: https://issues.apache.org/jira/browse/SPARK-28052
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: Takuya Ueshin
>Priority: Major
>
> Currently {{ArrayExists}} always returns boolean values (if the arguments are 
> not null), but it should follow the three-valued boolean logic:
>  - {{true}} if the predicate holds at least one {{true}}
>  - otherwise, {{null}} if the predicate holds {{null}}
>  - otherwise, {{false}}
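
A sketch of the intended semantics through the SQL higher-order function 
{{exists}} (assuming a running SparkSession; per this ticket, the current 
behavior returns false where null is expected):
{code:python}
# No element makes the predicate true, but the null element evaluates to null,
# so under three-valued logic the expected answer is null, not false.
spark.sql("SELECT exists(array(1, null, 3), x -> x % 2 = 0)").show()
{code}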






[jira] [Created] (SPARK-28055) Add delegation token custom AdminClient configurations.

2019-06-14 Thread Gabor Somogyi (JIRA)
Gabor Somogyi created SPARK-28055:
-

 Summary: Add delegation token custom AdminClient configurations.
 Key: SPARK-28055
 URL: https://issues.apache.org/jira/browse/SPARK-28055
 Project: Spark
  Issue Type: Improvement
  Components: DStreams, Structured Streaming
Affects Versions: 3.0.0
Reporter: Gabor Somogyi









[jira] [Created] (SPARK-28054) Unable to insert partitioned table dynamically when partition name is upper case

2019-06-14 Thread ChenKai (JIRA)
ChenKai created SPARK-28054:
---

 Summary: Unable to insert partitioned table dynamically when 
partition name is upper case
 Key: SPARK-28054
 URL: https://issues.apache.org/jira/browse/SPARK-28054
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.3
Reporter: ChenKai


{code:java}
-- create sql and column name is upper case
CREATE TABLE src (KEY INT, VALUE STRING) PARTITIONED BY (DS STRING)

-- insert sql
INSERT INTO TABLE src PARTITION(ds) SELECT 'k' key, 'v' value, '1' ds
{code}

The error is:

{code:java}
Error in query: 
org.apache.hadoop.hive.ql.metadata.Table.ValidationFailureSemanticException: 
Partition spec {ds=, DS=1} contains non-partition columns;
{code}
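
A hedged workaround sketch (an assumption based on the error above, not a 
confirmed fix): Hive lower-cases column names in the metastore, so declaring 
the partition column in lower case avoids the mismatched {ds=, DS=1} spec.
{code:python}
# Hypothetical PySpark sketch, assuming a SparkSession with Hive support and
# dynamic partitioning enabled (hive.exec.dynamic.partition.mode=nonstrict).
spark.sql("CREATE TABLE src (key INT, value STRING) PARTITIONED BY (ds STRING)")
spark.sql("INSERT INTO TABLE src PARTITION(ds) SELECT 'k' key, 'v' value, '1' ds")
{code}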






[jira] [Updated] (SPARK-28050) DataFrameWriter support insertInto a specific table partition

2019-06-14 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-28050:

Fix Version/s: (was: 2.4.3)
   (was: 2.3.3)

> DataFrameWriter support insertInto a specific table partition
> -
>
> Key: SPARK-28050
> URL: https://issues.apache.org/jira/browse/SPARK-28050
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.3.3, 2.4.3
>Reporter: Leanken.Lin
>Priority: Minor
>
> {code:java}
> // Some comments here
> val ptTableName = "mc_test_pt_table"
> sql(s"CREATE TABLE ${ptTableName} (name STRING, num BIGINT) PARTITIONED BY 
> (pt1 STRING, pt2 STRING)")
> val df = spark.sparkContext.parallelize(0 to 99, 2)
>   .map(f =>
> {
>   (s"name-$f", f)
> })
>   .toDF("name", "num")
> // if I want to insert df into a specific partition,
> // say pt1='2018', pt2='0601', the current API does not support it;
> // the only workaround is the following
> df.createOrReplaceTempView(s"${ptTableName}_tmp_view")
> sql(s"insert into table ${ptTableName} partition (pt1='2018', pt2='0601') 
> select * from ${ptTableName}_tmp_view")
> {code}
> Propose to have another API in DataFrameWriter that can do something like:
> {code:java}
> df.write.insertInto(ptTableName, "pt1='2018',pt2='0601'")
> {code}
> We have a lot of this kind of scenario in our production environment; 
> providing an API like this would make things much less painful for us.
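
Until such an API exists, a workaround sketch (PySpark; {{df}} and 
{{ptTableName}} as in the repro above; writing all partitions dynamically may 
require hive.exec.dynamic.partition.mode=nonstrict):
{code:python}
# Hedged sketch: pin the constant partition values as literal columns and let
# the existing dynamic-partition insert route the rows.
from pyspark.sql.functions import lit

df.withColumn("pt1", lit("2018")) \
  .withColumn("pt2", lit("0601")) \
  .write.insertInto(ptTableName)
{code}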






[jira] [Updated] (SPARK-28050) DataFrameWriter support insertInto a specific table partition

2019-06-14 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-28050:

Target Version/s:   (was: 2.3.3, 2.4.3)

> DataFrameWriter support insertInto a specific table partition
> -
>
> Key: SPARK-28050
> URL: https://issues.apache.org/jira/browse/SPARK-28050
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.3.3, 2.4.3
>Reporter: Leanken.Lin
>Priority: Minor
> Fix For: 2.3.3, 2.4.3
>
>
> {code:java}
> // Some comments here
> val ptTableName = "mc_test_pt_table"
> sql(s"CREATE TABLE ${ptTableName} (name STRING, num BIGINT) PARTITIONED BY 
> (pt1 STRING, pt2 STRING)")
> val df = spark.sparkContext.parallelize(0 to 99, 2)
>   .map(f =>
> {
>   (s"name-$f", f)
> })
>   .toDF("name", "num")
> // if I want to insert df into a specific partition,
> // say pt1='2018', pt2='0601', the current API does not support it;
> // the only workaround is the following
> df.createOrReplaceTempView(s"${ptTableName}_tmp_view")
> sql(s"insert into table ${ptTableName} partition (pt1='2018', pt2='0601') 
> select * from ${ptTableName}_tmp_view")
> {code}
> Propose to have another API in DataFrameWriter that can do something like:
> {code:java}
> df.write.insertInto(ptTableName, "pt1='2018',pt2='0601'")
> {code}
> We have a lot of this kind of scenario in our production environment; 
> providing an API like this would make things much less painful for us.






[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source

2019-06-14 Thread pin_zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863796#comment-16863796
 ] 

pin_zhang commented on SPARK-21067:
---

We also encountered this issue. Any plan to fix this bug?
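
For anyone hitting this, one thing worth trying (an assumption based on the 
stagingdir discussion below, not a confirmed fix) is keeping the staging 
directory relative, so it resolves under the destination table path and the 
final move is a same-filesystem rename:
{code:python}
# Hypothetical sketch; the same setting can also go in hive-site.xml.
# Hive's default hive.exec.stagingdir is the relative ".hive-staging".
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("hive.exec.stagingdir", ".hive-staging")
         .enableHiveSupport()
         .getOrCreate())
{code}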

> Thrift Server - CTAS fail with Unable to move source
> 
>
> Key: SPARK-21067
> URL: https://issues.apache.org/jira/browse/SPARK-21067
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.0, 2.4.0
> Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>Reporter: Dominic Ricard
>Priority: Major
> Attachments: SPARK-21067.patch
>
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS 
> would fail, sometimes...
> Most of the time, the CTAS would work only once, after starting the thrift 
> server. After that, dropping the table and re-issuing the same CTAS would 
> fail with the following message (sometimes it fails right away; sometimes it 
> works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> We have already found the following Jira 
> (https://issues.apache.org/jira/browse/SPARK-11021) which state that the 
> {{hive.exec.stagingdir}} had to be added in order for Spark to be able to 
> handle CREATE TABLE properly as of 2.0. As you can see in the error, we have 
> ours set to "/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE 
> dricard.test SELECT 1;
> Error: org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0
>  to destination 
> hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; 
> (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our Production 
> Environment but since 2.0+, we haven't been able to CREATE TABLE consistently 
> on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> Thrift server usually fails at INSERT...
> Tried the same procedure in a spark context using spark.sql() and didn't 
> encounter the same issue.
> Full stack Trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
> executing query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source 
> hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0
>  to desti
> nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0;
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
> at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
> at 
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
> at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
> at 

[jira] [Resolved] (SPARK-28053) Handle a corner case where there is no `Link` header

2019-06-14 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-28053.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24874
[https://github.com/apache/spark/pull/24874]

> Handle a corner case where there is no `Link` header
> 
>
> Key: SPARK-28053
> URL: https://issues.apache.org/jira/browse/SPARK-28053
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Trivial
> Fix For: 3.0.0
>
>
> Currently, `github_jira_sync.py` assumes that a `Link` header is always 
> present. However, it will fail when the number of open PRs is less than 100 
> (the default page size). That will not happen in Apache Spark itself, but we 
> had better fix it because it does happen while reviewing the 
> `github_jira_sync.py` script.
> {code}
> Traceback (most recent call last):
>   File "dev/github_jira_sync.py", line 139, in 
> jira_prs = get_jira_prs()
>   File "dev/github_jira_sync.py", line 83, in get_jira_prs
> link_header = filter(lambda k: k.startswith("Link"), 
> page.info().headers)[0]
> IndexError: list index out of range
> {code}






[jira] [Assigned] (SPARK-28053) Handle a corner case where there is no `Link` header

2019-06-14 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-28053:


Assignee: Dongjoon Hyun

> Handle a corner case where there is no `Link` header
> 
>
> Key: SPARK-28053
> URL: https://issues.apache.org/jira/browse/SPARK-28053
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Trivial
>
> Currently, `github_jira_sync.py` assumes that a `Link` header is always 
> present. However, it will fail when the number of open PRs is less than 100 
> (the default page size). That will not happen in Apache Spark itself, but we 
> had better fix it because it does happen while reviewing the 
> `github_jira_sync.py` script.
> {code}
> Traceback (most recent call last):
>   File "dev/github_jira_sync.py", line 139, in 
> jira_prs = get_jira_prs()
>   File "dev/github_jira_sync.py", line 83, in get_jira_prs
> link_header = filter(lambda k: k.startswith("Link"), 
> page.info().headers)[0]
> IndexError: list index out of range
> {code}






[jira] [Assigned] (SPARK-28053) Handle a corner case where there is no `Link` header

2019-06-14 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28053:


Assignee: (was: Apache Spark)

> Handle a corner case where there is no `Link` header
> 
>
> Key: SPARK-28053
> URL: https://issues.apache.org/jira/browse/SPARK-28053
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Trivial
>
> Currently, `github_jira_sync.py` assumes that a `Link` header is always 
> present. However, it will fail when the number of open PRs is less than 100 
> (the default page size). That will not happen in Apache Spark itself, but we 
> had better fix it because it does happen while reviewing the 
> `github_jira_sync.py` script.
> {code}
> Traceback (most recent call last):
>   File "dev/github_jira_sync.py", line 139, in 
> jira_prs = get_jira_prs()
>   File "dev/github_jira_sync.py", line 83, in get_jira_prs
> link_header = filter(lambda k: k.startswith("Link"), 
> page.info().headers)[0]
> IndexError: list index out of range
> {code}






[jira] [Assigned] (SPARK-28053) Handle a corner case where there is no `Link` header

2019-06-14 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28053:


Assignee: Apache Spark

> Handle a corner case where there is no `Link` header
> 
>
> Key: SPARK-28053
> URL: https://issues.apache.org/jira/browse/SPARK-28053
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Trivial
>
> Currently, `github_jira_sync.py` assumes that a `Link` header is always 
> present. However, it will fail when the number of open PRs is less than 100 
> (the default page size). That will not happen in Apache Spark itself, but we 
> had better fix it because it does happen while reviewing the 
> `github_jira_sync.py` script.
> {code}
> Traceback (most recent call last):
>   File "dev/github_jira_sync.py", line 139, in 
> jira_prs = get_jira_prs()
>   File "dev/github_jira_sync.py", line 83, in get_jira_prs
> link_header = filter(lambda k: k.startswith("Link"), 
> page.info().headers)[0]
> IndexError: list index out of range
> {code}






[jira] [Issue Comment Deleted] (SPARK-28023) Trim the string when cast string type to other types

2019-06-14 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-28023:

Comment: was deleted

(was: I'm working on.)

> Trim the string when cast string type to other types
> 
>
> Key: SPARK-28023
> URL: https://issues.apache.org/jira/browse/SPARK-28023
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> For example:
> {code:sql}
> SELECT bool '   f   ';
> select int2 '  21234 ';
> {code}
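
Spark has no {{bool '...'}} / {{int2 '...'}} literal syntax, so the nearest 
equivalent sketch goes through CAST (assuming a running SparkSession):
{code:python}
# PostgreSQL ignores the leading/trailing spaces and returns false and 21234
# for these literals; the ticket proposes trimming the same way before casting.
spark.sql("SELECT CAST('   f   ' AS boolean)").show()
spark.sql("SELECT CAST('  21234 ' AS int)").show()
{code}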






[jira] [Created] (SPARK-28053) Handle a corner case where there is no `Link` header

2019-06-14 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-28053:
-

 Summary: Handle a corner case where there is no `Link` header
 Key: SPARK-28053
 URL: https://issues.apache.org/jira/browse/SPARK-28053
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun


Currently, `github_jira_sync.py` assumes that a `Link` header is always 
present. However, it will fail when the number of open PRs is less than 100 
(the default page size). That will not happen in Apache Spark itself, but we 
had better fix it because it does happen while reviewing the 
`github_jira_sync.py` script.
{code}
Traceback (most recent call last):
  File "dev/github_jira_sync.py", line 139, in 
jira_prs = get_jira_prs()
  File "dev/github_jira_sync.py", line 83, in get_jira_prs
link_header = filter(lambda k: k.startswith("Link"), page.info().headers)[0]
IndexError: list index out of range
{code}
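
A hypothetical sketch of the guard (the actual fix was merged separately): 
treat a missing `Link` header as a single page instead of indexing blindly.
{code:python}
# When fewer than a page of PRs is open, GitHub sends no Link header, so fall
# back to "single page" rather than raising IndexError.
link_headers = [k for k in page.info().headers if k.startswith("Link")]
link_header = link_headers[0] if link_headers else None  # None => one page
{code}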






[jira] [Assigned] (SPARK-28052) ArrayExists should follow the three-valued boolean logic.

2019-06-14 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28052:


Assignee: Apache Spark

> ArrayExists should follow the three-valued boolean logic.
> -
>
> Key: SPARK-28052
> URL: https://issues.apache.org/jira/browse/SPARK-28052
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
> Currently {{ArrayExists}} always returns boolean values (if the arguments are 
> not null), but it should follow the three-valued boolean logic:
>  - {{true}} if the predicate holds at least {{true}}
>  - otherwise, {{null}} if the predicate holds {{null}}
>  - otherwise, {{false}}






[jira] [Assigned] (SPARK-28052) ArrayExists should follow the three-valued boolean logic.

2019-06-14 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28052:


Assignee: (was: Apache Spark)

> ArrayExists should follow the three-valued boolean logic.
> -
>
> Key: SPARK-28052
> URL: https://issues.apache.org/jira/browse/SPARK-28052
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: Takuya Ueshin
>Priority: Major
>
> Currently {{ArrayExists}} always returns boolean values (if the arguments are 
> not null), but it should follow the three-valued boolean logic:
>  - {{true}} if the predicate holds at least {{true}}
>  - otherwise, {{null}} if the predicate holds {{null}}
>  - otherwise, {{false}}






[jira] [Updated] (SPARK-28052) ArrayExists should follow the three-valued boolean logic.

2019-06-14 Thread Takuya Ueshin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-28052:
--
Description: 
Currently {{ArrayExists}} always returns boolean values (if the arguments are 
not null), but it should follow the three-valued boolean logic:
 - {{true}} if the predicate holds at least {{true}}
 - otherwise, {{null}} if the predicate holds {{null}}
 - otherwise, {{false}}

  was:
Currently {{ArrayExists}} always returns boolean values (if the arguments are 
not null), but it should follow the three-valued boolean logic:
 - {{true}} if the predicate holds {{true}}
 - otherwise, {{null}} if the predicate holds {{null}}
 - otherwise, {{false}}


> ArrayExists should follow the three-valued boolean logic.
> -
>
> Key: SPARK-28052
> URL: https://issues.apache.org/jira/browse/SPARK-28052
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: Takuya Ueshin
>Priority: Major
>
> Currently {{ArrayExists}} always returns boolean values (if the arguments are 
> not null), but it should follow the three-valued boolean logic:
>  - {{true}} if the predicate holds at least {{true}}
>  - otherwise, {{null}} if the predicate holds {{null}}
>  - otherwise, {{false}}






[jira] [Assigned] (SPARK-28023) Trim the string when cast string type to other types

2019-06-14 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28023:


Assignee: (was: Apache Spark)

> Trim the string when cast string type to other types
> 
>
> Key: SPARK-28023
> URL: https://issues.apache.org/jira/browse/SPARK-28023
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> For example:
> {code:sql}
> SELECT bool '   f   ';
> select int2 '  21234 ';
> {code}






[jira] [Assigned] (SPARK-28023) Trim the string when cast string type to other types

2019-06-14 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28023:


Assignee: Apache Spark

> Trim the string when cast string type to other types
> 
>
> Key: SPARK-28023
> URL: https://issues.apache.org/jira/browse/SPARK-28023
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>
> For example:
> {code:sql}
> SELECT bool '   f   ';
> select int2 '  21234 ';
> {code}






[jira] [Created] (SPARK-28052) ArrayExists should follow the three-valued boolean logic.

2019-06-14 Thread Takuya Ueshin (JIRA)
Takuya Ueshin created SPARK-28052:
-

 Summary: ArrayExists should follow the three-valued boolean logic.
 Key: SPARK-28052
 URL: https://issues.apache.org/jira/browse/SPARK-28052
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.3
Reporter: Takuya Ueshin


Currently {{ArrayExists}} always returns boolean values (if the arguments are 
not null), but it should follow the three-valued boolean logic:
 - {{true}} if the predicate holds {{true}}
 - otherwise, {{null}} if the predicate holds {{null}}
 - otherwise, {{false}}






[jira] [Assigned] (SPARK-28051) Exposing JIRA issue component types at GitHub PRs

2019-06-14 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28051:


Assignee: (was: Apache Spark)

> Exposing JIRA issue component types at GitHub PRs
> -
>
> Key: SPARK-28051
> URL: https://issues.apache.org/jira/browse/SPARK-28051
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> This issue aims to expose JIRA issue component types at GitHub PRs.






[jira] [Assigned] (SPARK-28051) Exposing JIRA issue component types at GitHub PRs

2019-06-14 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28051:


Assignee: Apache Spark

> Exposing JIRA issue component types at GitHub PRs
> -
>
> Key: SPARK-28051
> URL: https://issues.apache.org/jira/browse/SPARK-28051
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>
> This issue aims to expose JIRA issue component types at GitHub PRs.






[jira] [Updated] (SPARK-28050) DataFrameWriter support insertInto a specific table partition

2019-06-14 Thread Leanken.Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leanken.Lin updated SPARK-28050:

Description: 
{code:java}
// Runs as-is in spark-shell, which auto-imports spark.implicits._ (for toDF)
// and spark.sql.
val ptTableName = "mc_test_pt_table"
sql(s"CREATE TABLE ${ptTableName} (name STRING, num BIGINT) PARTITIONED BY (pt1 STRING, pt2 STRING)")

val df = spark.sparkContext.parallelize(0 to 99, 2)
  .map(f => (s"name-$f", f))
  .toDF("name", "num")

// To insert df into a specific partition, say pt1='2018', pt2='0601',
// the current API offers no direct support; the only workaround is:
df.createOrReplaceTempView(s"${ptTableName}_tmp_view")
sql(s"insert into table ${ptTableName} partition (pt1='2018', pt2='0601') select * from ${ptTableName}_tmp_view")
{code}

Propose to add another API to DataFrameWriter that can do something like:

{code:java}
df.write.insertInto(ptTableName, "pt1='2018',pt2='0601'")
{code}

We hit scenarios like this a lot in our production environment; an API like 
this would make them much less painful.
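
Until such an API exists, a possible workaround that avoids the temporary view 
is to append the partition values as literal columns and rely on dynamic 
partition insertion. A minimal sketch (assuming a Hive-style partitioned table 
and that dynamic partitioning is enabled in the session; the two settings below 
are the usual prerequisites):

{code:java}
import org.apache.spark.sql.functions.lit

// Assumed prerequisites for dynamic partition inserts:
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

// insertInto resolves columns by position, so the partition columns
// (pt1, pt2) must come last, matching the table definition above.
df.withColumn("pt1", lit("2018"))
  .withColumn("pt2", lit("0601"))
  .write
  .insertInto(ptTableName)
{code}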

  was:
```
val ptTableName = "mc_test_pt_table"
sql(s"CREATE TABLE ${ptTableName} (name STRING, num BIGINT) PARTITIONED BY (pt1 STRING, pt2 STRING)")

val df = spark.sparkContext.parallelize(0 to 99, 2)
  .map(f => (s"name-$f", f))
  .toDF("name", "num")

// To insert df into a specific partition, say pt1='2018', pt2='0601',
// the current API offers no direct support; the only workaround is:
df.createOrReplaceTempView(s"${ptTableName}_tmp_view")
sql(s"insert into table ${ptTableName} partition (pt1='2018', pt2='0601') select * from ${ptTableName}_tmp_view")
```

Propose to add another API to DataFrameWriter that can do something like:

```
df.write.insertInto(ptTableName, "pt1='2018',pt2='0601'")
```

We hit scenarios like this a lot in our production environment; an API like 
this would make them much less painful.


> DataFrameWriter support insertInto a specific table partition
> -
>
> Key: SPARK-28050
> URL: https://issues.apache.org/jira/browse/SPARK-28050
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.3.3, 2.4.3
>Reporter: Leanken.Lin
>Priority: Minor
> Fix For: 2.3.3, 2.4.3
>
>
> {code:java}
> // Runs as-is in spark-shell, which auto-imports spark.implicits._ (for toDF)
> // and spark.sql.
> val ptTableName = "mc_test_pt_table"
> sql(s"CREATE TABLE ${ptTableName} (name STRING, num BIGINT) PARTITIONED BY (pt1 STRING, pt2 STRING)")
> val df = spark.sparkContext.parallelize(0 to 99, 2)
>   .map(f => (s"name-$f", f))
>   .toDF("name", "num")
> // To insert df into a specific partition, say pt1='2018', pt2='0601',
> // the current API offers no direct support; the only workaround is:
> df.createOrReplaceTempView(s"${ptTableName}_tmp_view")
> sql(s"insert into table ${ptTableName} partition (pt1='2018', pt2='0601') select * from ${ptTableName}_tmp_view")
> {code}
> Propose to add another API to DataFrameWriter that can do something like:
> {code:java}
> df.write.insertInto(ptTableName, "pt1='2018',pt2='0601'")
> {code}
> We hit scenarios like this a lot in our production environment; an API like 
> this would make them much less painful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28051) Exposing JIRA issue component types at GitHub PRs

2019-06-14 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-28051:
-

 Summary: Exposing JIRA issue component types at GitHub PRs
 Key: SPARK-28051
 URL: https://issues.apache.org/jira/browse/SPARK-28051
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun


This issue aims to expose JIRA issue component types on GitHub PRs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28050) DataFrameWriter support insertInto a specific table partition

2019-06-14 Thread Leanken.Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leanken.Lin updated SPARK-28050:

Description: 
```
val ptTableName = "mc_test_pt_table"
sql(s"CREATE TABLE ${ptTableName} (name STRING, num BIGINT) PARTITIONED BY (pt1 STRING, pt2 STRING)")

val df = spark.sparkContext.parallelize(0 to 99, 2)
  .map(f => (s"name-$f", f))
  .toDF("name", "num")

// To insert df into a specific partition, say pt1='2018', pt2='0601',
// the current API offers no direct support; the only workaround is:
df.createOrReplaceTempView(s"${ptTableName}_tmp_view")
sql(s"insert into table ${ptTableName} partition (pt1='2018', pt2='0601') select * from ${ptTableName}_tmp_view")
```

Propose to add another API to DataFrameWriter that can do something like:

```
df.write.insertInto(ptTableName, "pt1='2018',pt2='0601'")
```

We hit scenarios like this a lot in our production environment; an API like 
this would make them much less painful.

  was:
val ptTableName = "mc_test_pt_table"
sql(s"CREATE TABLE ${ptTableName} (name STRING, num BIGINT) PARTITIONED BY (pt1 STRING, pt2 STRING)")

val df = spark.sparkContext.parallelize(0 to 99, 2)
  .map(f => (s"name-$f", f))
  .toDF("name", "num")

// To insert df into a specific partition, say pt1='2018', pt2='0601',
// the current API offers no direct support; the only workaround is:
df.createOrReplaceTempView(s"${ptTableName}_tmp_view")
sql(s"insert into table ${ptTableName} partition (pt1='2018', pt2='0601') select * from ${ptTableName}_tmp_view")

Propose to add another API to DataFrameWriter that can do something like:
df.write.insertInto(ptTableName, "pt1='2018',pt2='0601'")

We hit scenarios like this a lot in our production environment; an API like 
this would make them much less painful.


> DataFrameWriter support insertInto a specific table partition
> -
>
> Key: SPARK-28050
> URL: https://issues.apache.org/jira/browse/SPARK-28050
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.3.3, 2.4.3
>Reporter: Leanken.Lin
>Priority: Minor
> Fix For: 2.3.3, 2.4.3
>
>
> ```
> val ptTableName = "mc_test_pt_table"
> sql(s"CREATE TABLE ${ptTableName} (name STRING, num BIGINT) PARTITIONED BY (pt1 STRING, pt2 STRING)")
> val df = spark.sparkContext.parallelize(0 to 99, 2)
>   .map(f => (s"name-$f", f))
>   .toDF("name", "num")
> // To insert df into a specific partition, say pt1='2018', pt2='0601',
> // the current API offers no direct support; the only workaround is:
> df.createOrReplaceTempView(s"${ptTableName}_tmp_view")
> sql(s"insert into table ${ptTableName} partition (pt1='2018', pt2='0601') select * from ${ptTableName}_tmp_view")
> ```
> Propose to add another API to DataFrameWriter that can do something like:
> ```
> df.write.insertInto(ptTableName, "pt1='2018',pt2='0601'")
> ```
> We hit scenarios like this a lot in our production environment; an API like 
> this would make them much less painful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28050) DataFrameWriter support insertInto a specific table partition

2019-06-14 Thread Leanken.Lin (JIRA)
Leanken.Lin created SPARK-28050:
---

 Summary: DataFrameWriter support insertInto a specific table 
partition
 Key: SPARK-28050
 URL: https://issues.apache.org/jira/browse/SPARK-28050
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 2.4.3, 2.3.3
Reporter: Leanken.Lin
 Fix For: 2.4.3, 2.3.3


val ptTableName = "mc_test_pt_table"
sql(s"CREATE TABLE ${ptTableName} (name STRING, num BIGINT) PARTITIONED BY (pt1 STRING, pt2 STRING)")

val df = spark.sparkContext.parallelize(0 to 99, 2)
  .map(f => (s"name-$f", f))
  .toDF("name", "num")

// To insert df into a specific partition, say pt1='2018', pt2='0601',
// the current API offers no direct support; the only workaround is:
df.createOrReplaceTempView(s"${ptTableName}_tmp_view")
sql(s"insert into table ${ptTableName} partition (pt1='2018', pt2='0601') select * from ${ptTableName}_tmp_view")

Propose to add another API to DataFrameWriter that can do something like:
df.write.insertInto(ptTableName, "pt1='2018',pt2='0601'")

We hit scenarios like this a lot in our production environment; an API like 
this would make them much less painful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27100) dag-scheduler-event-loop" java.lang.StackOverflowError

2019-06-14 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27100:
--
Component/s: (was: MLlib)
 SQL

> dag-scheduler-event-loop" java.lang.StackOverflowError
> --
>
> Key: SPARK-27100
> URL: https://issues.apache.org/jira/browse/SPARK-27100
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.3, 2.3.3
>Reporter: KaiXu
>Priority: Major
> Attachments: SPARK-27100-Overflow.txt, stderr
>
>
> ALS in Spark MLlib causes StackOverflow:
>  /opt/sparkml/spark213/bin/spark-submit --properties-file /opt/HiBench/report/als/spark/conf/sparkbench/spark.conf 
> --class com.intel.hibench.sparkbench.ml.ALSExample --master yarn-client --num-executors 3 --executor-memory 322g 
> /opt/HiBench/sparkbench/assembly/target/sparkbench-assembly-7.1-SNAPSHOT-dist.jar 
> --numUsers 4 --numProducts 6 --rank 100 --numRecommends 20 --numIterations 100 --kryo false --implicitPrefs true 
> --numProductBlocks -1 --numUserBlocks -1 --lambda 1.0 hdfs://bdw-slave20:8020/HiBench/ALS/Input
>  
> Exception in thread "dag-scheduler-event-loop" java.lang.StackOverflowError
>  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1534)
>  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>  at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>  at scala.collection.immutable.List$SerializationProxy.writeObject(List.scala:468)
>  at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028)
>  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
>  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>  at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>  at scala.collection.immutable.List$SerializationProxy.writeObject(List.scala:468)
>  at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028)
>  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
>  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
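> 
> A common mitigation for StackOverflowErrors caused by deep lineage in 
> iterative jobs such as ALS (a hedged sketch, not a confirmed fix for this 
> report) is to truncate the lineage with checkpointing:
> {code:java}
> import org.apache.spark.ml.recommendation.ALS
> 
> // Checkpointing periodically materializes intermediate results so that the
> // serialized lineage graph stays shallow. (Checkpoint path is illustrative.)
> spark.sparkContext.setCheckpointDir("hdfs:///tmp/als-checkpoints")
> val als = new ALS()
>   .setMaxIter(100)
>   .setRank(100)
>   .setImplicitPrefs(true)
>   .setCheckpointInterval(5) // checkpoint every 5 iterations (default: 10)
> {code}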



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28049) I want to create a first ticket in JIRA

2019-06-14 Thread sanjeet (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sanjeet resolved SPARK-28049.
-
Resolution: Fixed

Code change has been delivered.

> I want to create a first ticket in JIRA
> 
>
> Key: SPARK-28049
> URL: https://issues.apache.org/jira/browse/SPARK-28049
> Project: Spark
>  Issue Type: Test
>  Components: Build
>Affects Versions: 2.2.2, 2.4.3
> Environment: I just want to test in JIRA
>Reporter: sanjeet
>Priority: Minor
>  Labels: test
> Fix For: 2.4.4, 2.4.3
>
>
> I just want to test in JIRA



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28049) I want to create a first ticket in JIRA

2019-06-14 Thread sanjeet (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863742#comment-16863742
 ] 

sanjeet commented on SPARK-28049:
-

2nd comment

> I want to create a first ticket in JIRA
> 
>
> Key: SPARK-28049
> URL: https://issues.apache.org/jira/browse/SPARK-28049
> Project: Spark
>  Issue Type: Test
>  Components: Build
>Affects Versions: 2.2.2, 2.4.3
> Environment: I just want to test in JIRA
>Reporter: sanjeet
>Priority: Minor
>  Labels: test
> Fix For: 2.4.4, 2.4.3
>
>
> I just want to test in JIRA



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28049) I want to create a first ticket in JIRA

2019-06-14 Thread sanjeet (JIRA)
sanjeet created SPARK-28049:
---

 Summary: I want to create a first ticket in JIRA
 Key: SPARK-28049
 URL: https://issues.apache.org/jira/browse/SPARK-28049
 Project: Spark
  Issue Type: Test
  Components: Build
Affects Versions: 2.4.3, 2.2.2
 Environment: I just want to test in JIRA
Reporter: sanjeet
 Fix For: 2.4.4, 2.4.3


I just want to test in JIRA



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org