[jira] [Updated] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2018-08-17 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21187: - Description: This is to track adding the remaining type support in Arrow Converters.

[jira] [Resolved] (SPARK-23555) Add BinaryType support for Arrow in PySpark

2018-08-17 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-23555. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 20725

[jira] [Assigned] (SPARK-23555) Add BinaryType support for Arrow in PySpark

2018-08-17 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-23555: Assignee: Bryan Cutler > Add BinaryType support for Arrow in PySpark >

[jira] [Assigned] (SPARK-25149) Personalized Page Rank raises an error if vertexIDs are > MaxInt

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25149: Assignee: Apache Spark > Personalized Page Rank raises an error if vertexIDs are >

[jira] [Commented] (SPARK-25149) Personalized Page Rank raises an error if vertexIDs are > MaxInt

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584513#comment-16584513 ] Apache Spark commented on SPARK-25149: -- User 'MrBago' has created a pull request for this issue:

[jira] [Assigned] (SPARK-25149) Personalized Page Rank raises an error if vertexIDs are > MaxInt

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25149: Assignee: (was: Apache Spark) > Personalized Page Rank raises an error if vertexIDs

[jira] [Reopened] (SPARK-25126) avoid creating OrcFile.Reader for all orc files

2018-08-17 Thread Rao Fu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rao Fu reopened SPARK-25126: Updated the bug description to accurately describe the issue. > avoid creating OrcFile.Reader for all orc

[jira] [Updated] (SPARK-25126) avoid creating OrcFile.Reader for all orc files

2018-08-17 Thread Rao Fu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rao Fu updated SPARK-25126: --- Priority: Minor (was: Major) Description: We have a spark job that starts by reading orc files

[jira] [Assigned] (SPARK-25151) Apply Apache Commons Pool to KafkaDataConsumer

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25151: Assignee: (was: Apache Spark) > Apply Apache Commons Pool to KafkaDataConsumer >

[jira] [Commented] (SPARK-25151) Apply Apache Commons Pool to KafkaDataConsumer

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584492#comment-16584492 ] Apache Spark commented on SPARK-25151: -- User 'HeartSaVioR' has created a pull request for this

[jira] [Assigned] (SPARK-25151) Apply Apache Commons Pool to KafkaDataConsumer

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25151: Assignee: Apache Spark > Apply Apache Commons Pool to KafkaDataConsumer >

[jira] [Resolved] (SPARK-25143) Support data source name mapping configuration

2018-08-17 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-25143. --- Resolution: Won't Do According to the discussion on the PR, I close this. > Support data

[jira] [Created] (SPARK-25152) Enable Spark on Kubernetes R Integration Tests

2018-08-17 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-25152: -- Summary: Enable Spark on Kubernetes R Integration Tests Key: SPARK-25152 URL: https://issues.apache.org/jira/browse/SPARK-25152 Project: Spark Issue Type: Test

[jira] [Updated] (SPARK-24433) Add Spark R support

2018-08-17 Thread Matt Cheah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Cheah updated SPARK-24433: --- Fix Version/s: 2.4.0 > Add Spark R support > --- > > Key:

[jira] [Resolved] (SPARK-24433) Add Spark R support

2018-08-17 Thread Matt Cheah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Cheah resolved SPARK-24433. Resolution: Fixed > Add Spark R support > --- > > Key:

[jira] [Updated] (SPARK-25151) Apply Apache Commons Pool to KafkaDataConsumer

2018-08-17 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-25151: - Description: KafkaDataConsumer contains its own logic for caching InternalKafkaConsumer which

[jira] [Commented] (SPARK-25151) Apply Apache Commons Pool to KafkaDataConsumer

2018-08-17 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584478#comment-16584478 ] Jungtaek Lim commented on SPARK-25151: -- Working on it. Will provide a patch shortly. > Apply

[jira] [Updated] (SPARK-25151) Apply Apache Commons Pool to KafkaDataConsumer

2018-08-17 Thread Jungtaek Lim (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-25151: - Environment: (was: KafkaDataConsumer contains its own logic for caching

[jira] [Created] (SPARK-25151) Apply Apache Commons Pool to KafkaDataConsumer

2018-08-17 Thread Jungtaek Lim (JIRA)
Jungtaek Lim created SPARK-25151: Summary: Apply Apache Commons Pool to KafkaDataConsumer Key: SPARK-25151 URL: https://issues.apache.org/jira/browse/SPARK-25151 Project: Spark Issue Type:

[jira] [Resolved] (SPARK-25116) Fix the "exit code 1" error when terminating Kafka tests

2018-08-17 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-25116. -- Resolution: Fixed Fix Version/s: 2.4.0 > Fix the "exit code 1" error when terminating

[jira] [Commented] (SPARK-17916) CSV data source treats empty string as null no matter what nullValue option is

2018-08-17 Thread koert kuipers (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584393#comment-16584393 ] koert kuipers commented on SPARK-17916: --- now the particular unit test that broke for us, where

[jira] [Commented] (SPARK-17916) CSV data source treats empty string as null no matter what nullValue option is

2018-08-17 Thread koert kuipers (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584381#comment-16584381 ] koert kuipers commented on SPARK-17916: --- we also use csv format to write files like for example

[jira] [Commented] (SPARK-17916) CSV data source treats empty string as null no matter what nullValue option is

2018-08-17 Thread koert kuipers (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584378#comment-16584378 ] koert kuipers commented on SPARK-17916: --- my first observation is that if i do this: {code:scala}

[jira] [Commented] (SPARK-17916) CSV data source treats empty string as null no matter what nullValue option is

2018-08-17 Thread koert kuipers (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584374#comment-16584374 ] koert kuipers commented on SPARK-17916: --- hi [~maxgekk] i saw your unit test for the old behavior.

[jira] [Commented] (SPARK-17916) CSV data source treats empty string as null no matter what nullValue option is

2018-08-17 Thread Maxim Gekk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584357#comment-16584357 ] Maxim Gekk commented on SPARK-17916: > the default behavior in 2.3.x for csv format is that when i

[jira] [Updated] (SPARK-25149) Personalized Page Rank raises an error if vertexIDs are > MaxInt

2018-08-17 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-25149: Summary: Personalized Page Rank raises an error if vertexIDs are > MaxInt (was:

[jira] [Commented] (SPARK-25083) remove the type erasure hack in data source scan

2018-08-17 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584349#comment-16584349 ] Ryan Blue commented on SPARK-25083: --- [~cloud_fan], how large of a refactor is this? I think this is

[jira] [Comment Edited] (SPARK-24882) data source v2 API improvement

2018-08-17 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584343#comment-16584343 ] Ryan Blue edited comment on SPARK-24882 at 8/17/18 8:21 PM: One more thing:

[jira] [Commented] (SPARK-24882) data source v2 API improvement

2018-08-17 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584343#comment-16584343 ] Ryan Blue commented on SPARK-24882: --- One more thing: I think we should separate BatchOverwriteSupport

[jira] [Comment Edited] (SPARK-17916) CSV data source treats empty string as null no matter what nullValue option is

2018-08-17 Thread koert kuipers (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584324#comment-16584324 ] koert kuipers edited comment on SPARK-17916 at 8/17/18 8:05 PM: the

[jira] [Comment Edited] (SPARK-17916) CSV data source treats empty string as null no matter what nullValue option is

2018-08-17 Thread koert kuipers (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584324#comment-16584324 ] koert kuipers edited comment on SPARK-17916 at 8/17/18 8:03 PM: the

[jira] [Comment Edited] (SPARK-17916) CSV data source treats empty string as null no matter what nullValue option is

2018-08-17 Thread koert kuipers (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584324#comment-16584324 ] koert kuipers edited comment on SPARK-17916 at 8/17/18 7:54 PM: the

[jira] [Commented] (SPARK-17916) CSV data source treats empty string as null no matter what nullValue option is

2018-08-17 Thread koert kuipers (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584324#comment-16584324 ] koert kuipers commented on SPARK-17916: --- the default behavior in 2.3.x for csv format is that when

[jira] [Commented] (SPARK-24882) data source v2 API improvement

2018-08-17 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584305#comment-16584305 ] Ryan Blue commented on SPARK-24882: --- {{BatchOverwriteSupport}} should extend {{BatchWriteSupport}}. If

[jira] [Commented] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-08-17 Thread JIRA
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584279#comment-16584279 ] Tomasz Gawęda commented on SPARK-25150: --- [~nchammas] Maybe it's related to:

[jira] [Assigned] (SPARK-23042) Use OneHotEncoderModel to encode labels in MultilayerPerceptronClassifier

2018-08-17 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai reassigned SPARK-23042: --- Assignee: Liang-Chi Hsieh > Use OneHotEncoderModel to encode labels in

[jira] [Resolved] (SPARK-23042) Use OneHotEncoderModel to encode labels in MultilayerPerceptronClassifier

2018-08-17 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai resolved SPARK-23042. - Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 20232

[jira] [Commented] (SPARK-25124) VectorSizeHint.size is buggy, breaking streaming pipeline

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584251#comment-16584251 ] Apache Spark commented on SPARK-25124: -- User 'huaxingao' has created a pull request for this issue:

[jira] [Assigned] (SPARK-25124) VectorSizeHint.size is buggy, breaking streaming pipeline

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25124: Assignee: (was: Apache Spark) > VectorSizeHint.size is buggy, breaking streaming

[jira] [Assigned] (SPARK-25124) VectorSizeHint.size is buggy, breaking streaming pipeline

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25124: Assignee: Apache Spark > VectorSizeHint.size is buggy, breaking streaming pipeline >

[jira] [Comment Edited] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-08-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584239#comment-16584239 ] Nicholas Chammas edited comment on SPARK-25150 at 8/17/18 6:15 PM: --- I

[jira] [Commented] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-08-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584239#comment-16584239 ] Nicholas Chammas commented on SPARK-25150: -- I know there are a bunch of pending bug fixes in

[jira] [Updated] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-08-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-25150: - Attachment: zombie-analysis.py states.csv persons.csv

[jira] [Created] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-08-17 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-25150: Summary: Joining DataFrames derived from the same source yields confusing/incorrect results Key: SPARK-25150 URL: https://issues.apache.org/jira/browse/SPARK-25150

[jira] [Created] (SPARK-25149) ParallelPersonalizedPageRank raises an error if vertexIDs are > MaxInt

2018-08-17 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-25149: --- Summary: ParallelPersonalizedPageRank raises an error if vertexIDs are > MaxInt Key: SPARK-25149 URL: https://issues.apache.org/jira/browse/SPARK-25149

[jira] [Commented] (SPARK-25124) VectorSizeHint.size is buggy, breaking streaming pipeline

2018-08-17 Thread Huaxin Gao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584215#comment-16584215 ] Huaxin Gao commented on SPARK-25124: I will submit a PR very soon. > VectorSizeHint.size is buggy,

[jira] [Created] (SPARK-25148) Executors launched with Spark on K8s client mode should prefix name with spark.app.name

2018-08-17 Thread Timothy Chen (JIRA)
Timothy Chen created SPARK-25148: Summary: Executors launched with Spark on K8s client mode should prefix name with spark.app.name Key: SPARK-25148 URL: https://issues.apache.org/jira/browse/SPARK-25148

[jira] [Commented] (SPARK-23874) Upgrade apache/arrow to 0.10.0

2018-08-17 Thread shane knapp (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584190#comment-16584190 ] shane knapp commented on SPARK-23874: - confirmed that pyarrow/pyspark-sql tests pass w/arrow 0.10.0

[jira] [Created] (SPARK-25147) GroupedData.apply pandas_udf crashing

2018-08-17 Thread Mike Sukmanowsky (JIRA)
Mike Sukmanowsky created SPARK-25147: Summary: GroupedData.apply pandas_udf crashing Key: SPARK-25147 URL: https://issues.apache.org/jira/browse/SPARK-25147 Project: Spark Issue Type:

[jira] [Commented] (SPARK-25138) Spark Shell should show the Scala prompt after initialization is complete

2018-08-17 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584086#comment-16584086 ] DB Tsai commented on SPARK-25138: - [~smilegator] This regression is introduced by newer version of Scala

[jira] [Commented] (SPARK-25144) distinct on Dataset leads to exception due to Managed memory leak detected

2018-08-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584075#comment-16584075 ] Liang-Chi Hsieh commented on SPARK-25144: - I'm not sure if there is, can you build it? >

[jira] [Commented] (SPARK-25138) Spark Shell should show the Scala prompt after initialization is complete

2018-08-17 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584072#comment-16584072 ] Xiao Li commented on SPARK-25138: - [~mgaido] Thanks! > Spark Shell should show the Scala prompt after

[jira] [Assigned] (SPARK-25093) CodeFormatter could avoid creating regex object again and again

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25093: Assignee: Apache Spark > CodeFormatter could avoid creating regex object again and again

[jira] [Assigned] (SPARK-25093) CodeFormatter could avoid creating regex object again and again

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25093: Assignee: (was: Apache Spark) > CodeFormatter could avoid creating regex object

[jira] [Commented] (SPARK-25093) CodeFormatter could avoid creating regex object again and again

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584060#comment-16584060 ] Apache Spark commented on SPARK-25093: -- User 'mgaido91' has created a pull request for this issue:

[jira] [Commented] (SPARK-25144) distinct on Dataset leads to exception due to Managed memory leak detected

2018-08-17 Thread Ayoub Benali (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584042#comment-16584042 ] Ayoub Benali commented on SPARK-25144: -- [~viirya] I haven't tried on master branch, is there any

[jira] [Commented] (SPARK-25144) distinct on Dataset leads to exception due to Managed memory leak detected

2018-08-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584037#comment-16584037 ] Liang-Chi Hsieh commented on SPARK-25144: - Have you tried on master branch? Tried with

[jira] [Commented] (SPARK-25144) distinct on Dataset leads to exception due to Managed memory leak detected

2018-08-17 Thread Ayoub Benali (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584021#comment-16584021 ] Ayoub Benali commented on SPARK-25144: -- Somehow adding cache before calling "isEmpty" avoids the

[jira] [Updated] (SPARK-25144) distinct on Dataset leads to exception due to Managed memory leak detected

2018-08-17 Thread Ayoub Benali (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayoub Benali updated SPARK-25144: - Description: The following code example: {code} case class Foo(bar: Option[String]) val ds =

[jira] [Commented] (SPARK-24882) data source v2 API improvement

2018-08-17 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584017#comment-16584017 ] Wenchen Fan commented on SPARK-24882: - I'm trying to add it in my PR, but having some problems

[jira] [Commented] (SPARK-25146) avg() returns null on some decimals

2018-08-17 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584016#comment-16584016 ] Marco Gaido commented on SPARK-25146: - No problem, thanks for reporting this anyway. > avg()

[jira] [Commented] (SPARK-25146) avg() returns null on some decimals

2018-08-17 Thread Daniel Darabos (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584015#comment-16584015 ] Daniel Darabos commented on SPARK-25146: Wonderful, thanks! Sorry I missed the fix. > avg()

[jira] [Commented] (SPARK-25144) distinct on Dataset leads to exception due to Managed memory leak detected

2018-08-17 Thread Ayoub Benali (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583990#comment-16583990 ] Ayoub Benali commented on SPARK-25144: -- [~viirya] the same happens with spark 2.2.1 I assume that

[jira] [Updated] (SPARK-25117) Add EXEPT ALL and INTERSECT ALL support in R.

2018-08-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-25117: Issue Type: Improvement (was: Bug) > Add EXEPT ALL and INTERSECT ALL support in R. >

[jira] [Commented] (SPARK-25144) distinct on Dataset leads to exception due to Managed memory leak detected

2018-08-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583966#comment-16583966 ] Liang-Chi Hsieh commented on SPARK-25144: - Hmm, can't reproduce this on master branch, so it is

[jira] [Commented] (SPARK-24787) Events being dropped at an alarming rate due to hsync being slow for eventLogging

2018-08-17 Thread Sanket Reddy (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583964#comment-16583964 ] Sanket Reddy commented on SPARK-24787: -- Thanks [~ste...@apache.org] [~vanzin] [~tgraves] it seems

[jira] [Commented] (SPARK-25146) avg() returns null on some decimals

2018-08-17 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583936#comment-16583936 ] Marco Gaido commented on SPARK-25146: - This has been fixed by SPARK-24957. On the current master

[jira] [Resolved] (SPARK-25146) avg() returns null on some decimals

2018-08-17 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido resolved SPARK-25146. - Resolution: Duplicate > avg() returns null on some decimals >

[jira] [Commented] (SPARK-25145) Buffer size too small on spark.sql query with filterPushdown predicate=True

2018-08-17 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583928#comment-16583928 ] Marco Gaido commented on SPARK-25145: - cc [~dongjoon] > Buffer size too small on spark.sql query

[jira] [Updated] (SPARK-25145) Buffer size too small on spark.sql query with filterPushdown predicate=True

2018-08-17 Thread JIRA
[ https://issues.apache.org/jira/browse/SPARK-25145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bjørnar Jensen updated SPARK-25145: --- Attachment: report.txt > Buffer size too small on spark.sql query with filterPushdown

[jira] [Updated] (SPARK-25146) avg() returns null on some decimals

2018-08-17 Thread Daniel Darabos (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Darabos updated SPARK-25146: --- Description: We compute some 0-10 numbers in a pipeline using Spark SQL. Then we average

[jira] [Updated] (SPARK-25145) Buffer size too small on spark.sql query with filterPushdown predicate=True

2018-08-17 Thread JIRA
[ https://issues.apache.org/jira/browse/SPARK-25145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bjørnar Jensen updated SPARK-25145: --- Attachment: create_bug.py > Buffer size too small on spark.sql query with filterPushdown

[jira] [Created] (SPARK-25146) avg() returns null on some decimals

2018-08-17 Thread Daniel Darabos (JIRA)
Daniel Darabos created SPARK-25146: -- Summary: avg() returns null on some decimals Key: SPARK-25146 URL: https://issues.apache.org/jira/browse/SPARK-25146 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-25145) Buffer size too small on spark.sql query with filterPushdown predicate=True

2018-08-17 Thread JIRA
Bjørnar Jensen created SPARK-25145: -- Summary: Buffer size too small on spark.sql query with filterPushdown predicate=True Key: SPARK-25145 URL: https://issues.apache.org/jira/browse/SPARK-25145

[jira] [Commented] (SPARK-24924) Add mapping for built-in Avro data source

2018-08-17 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583868#comment-16583868 ] Dongjoon Hyun commented on SPARK-24924: --- Hi, All. I created SPARK-25143 as a more general and

[jira] [Created] (SPARK-25144) distinct on Dataset leads to exception due to Managed memory leak detected

2018-08-17 Thread Ayoub Benali (JIRA)
Ayoub Benali created SPARK-25144: Summary: distinct on Dataset leads to exception due to Managed memory leak detected Key: SPARK-25144 URL: https://issues.apache.org/jira/browse/SPARK-25144

[jira] [Commented] (SPARK-25143) Support data source name mapping configuration

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583859#comment-16583859 ] Apache Spark commented on SPARK-25143: -- User 'dongjoon-hyun' has created a pull request for this

[jira] [Assigned] (SPARK-25143) Support data source name mapping configuration

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25143: Assignee: (was: Apache Spark) > Support data source name mapping configuration >

[jira] [Assigned] (SPARK-25143) Support data source name mapping configuration

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25143: Assignee: Apache Spark > Support data source name mapping configuration >

[jira] [Updated] (SPARK-25143) Support data source name mapping configuration

2018-08-17 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-25143: -- Summary: Support data source name mapping configuration (was: Support custom map for data

[jira] [Created] (SPARK-25143) Support custom map for data sources

2018-08-17 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-25143: - Summary: Support custom map for data sources Key: SPARK-25143 URL: https://issues.apache.org/jira/browse/SPARK-25143 Project: Spark Issue Type:

[jira] [Updated] (SPARK-25143) Support data source name mapping configuration

2018-08-17 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-25143: -- Description: Currently, for better UX, Apache Spark provides data source backward

[jira] [Resolved] (SPARK-25066) Provide Spark R image for deploying Spark on kubernetes.

2018-08-17 Thread Prashant Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Sharma resolved SPARK-25066. - Resolution: Duplicate Ahh found it, sorry for the mess. > Provide Spark R image for

[jira] [Updated] (SPARK-25065) Provide a way to add a custom logging configuration file.

2018-08-17 Thread Prashant Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Sharma updated SPARK-25065: Issue Type: Bug (was: Improvement) > Provide a way to add a custom logging configuration

[jira] [Updated] (SPARK-25065) Driver and executors pick the wrong logging configuration file.

2018-08-17 Thread Prashant Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Sharma updated SPARK-25065: Summary: Driver and executors pick the wrong logging configuration file. (was: Provide a

[jira] [Commented] (SPARK-25065) Provide a way to add a custom logging configuration file.

2018-08-17 Thread Prashant Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583789#comment-16583789 ] Prashant Sharma commented on SPARK-25065: - Since this picks up the wrong logging properties file

[jira] [Commented] (SPARK-25066) Provide Spark R image for deploying Spark on kubernetes.

2018-08-17 Thread Prashant Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583788#comment-16583788 ] Prashant Sharma commented on SPARK-25066: - Yes, I was planning to work on R support, is it

[jira] [Commented] (SPARK-25129) Make the mapping of com.databricks.spark.avro to built-in module configurable

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583761#comment-16583761 ] Apache Spark commented on SPARK-25129: -- User 'gengliangwang' has created a pull request for this

[jira] [Assigned] (SPARK-25142) Add error messages when Python worker could not open socket in `_load_from_socket`.

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25142: Assignee: (was: Apache Spark) > Add error messages when Python worker could not open

[jira] [Commented] (SPARK-25142) Add error messages when Python worker could not open socket in `_load_from_socket`.

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583757#comment-16583757 ] Apache Spark commented on SPARK-25142: -- User 'ueshin' has created a pull request for this issue:

[jira] [Assigned] (SPARK-25142) Add error messages when Python worker could not open socket in `_load_from_socket`.

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25142: Assignee: Apache Spark > Add error messages when Python worker could not open socket in

[jira] [Updated] (SPARK-25129) Make the mapping of com.databricks.spark.avro to built-in module configurable

2018-08-17 Thread Gengliang Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-25129: --- Description: In https://issues.apache.org/jira/browse/SPARK-24924, the data source provider 

[jira] [Created] (SPARK-25142) Add error messages when Python worker could not open socket in `_load_from_socket`.

2018-08-17 Thread Takuya Ueshin (JIRA)
Takuya Ueshin created SPARK-25142: - Summary: Add error messages when Python worker could not open socket in `_load_from_socket`. Key: SPARK-25142 URL: https://issues.apache.org/jira/browse/SPARK-25142

[jira] [Commented] (SPARK-25132) Spark returns NULL for a column whose Hive metastore schema and Parquet schema are in different letter cases

2018-08-17 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583688#comment-16583688 ] Yuming Wang commented on SPARK-25132: - Stackoverflow has been asked this question [Spark SQL returns

[jira] [Assigned] (SPARK-25141) Modify tests for higher-order functions to check bind method.

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25141: Assignee: Apache Spark > Modify tests for higher-order functions to check bind method. >

[jira] [Assigned] (SPARK-25141) Modify tests for higher-order functions to check bind method.

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25141: Assignee: (was: Apache Spark) > Modify tests for higher-order functions to check

[jira] [Commented] (SPARK-25141) Modify tests for higher-order functions to check bind method.

2018-08-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583664#comment-16583664 ] Apache Spark commented on SPARK-25141: -- User 'ueshin' has created a pull request for this issue:

[jira] [Created] (SPARK-25141) Modify tests for higher-order functions to check bind method.

2018-08-17 Thread Takuya Ueshin (JIRA)
Takuya Ueshin created SPARK-25141: - Summary: Modify tests for higher-order functions to check bind method. Key: SPARK-25141 URL: https://issues.apache.org/jira/browse/SPARK-25141 Project: Spark

[jira] [Commented] (SPARK-25138) Spark Shell should show the Scala prompt after initialization is complete

2018-08-17 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583585#comment-16583585 ] Marco Gaido commented on SPARK-25138: - [~smilegator] this is caused by SPARK-24418 and it is a

[jira] [Resolved] (SPARK-25138) Spark Shell should show the Scala prompt after initialization is complete

2018-08-17 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marco Gaido resolved SPARK-25138. - Resolution: Duplicate > Spark Shell should show the Scala prompt after initialization is
