[jira] [Updated] (SPARK-23345) Flaky test: FileBasedDataSourceSuite

2018-02-06 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-23345: --- Component/s: Tests > Flaky test: FileBasedDataSourceSuite >

[jira] [Commented] (SPARK-23271) Parquet output contains only "_SUCCESS" file after empty DataFrame saving

2018-02-06 Thread Dilip Biswal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354384#comment-16354384 ] Dilip Biswal commented on SPARK-23271: -- Thank you [~smilegator]. I will try to create a PR to fix

[jira] [Assigned] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22977: Assignee: Apache Spark > DataFrameWriter operations do not show details in SQL tab >

[jira] [Assigned] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22977: Assignee: (was: Apache Spark) > DataFrameWriter operations do not show details in SQL

[jira] [Commented] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354208#comment-16354208 ] Apache Spark commented on SPARK-22977: -- User 'cloud-fan' has created a pull request for this issue:

[jira] [Commented] (SPARK-23308) ignoreCorruptFiles should not ignore retryable IOException

2018-02-06 Thread JIRA
[ https://issues.apache.org/jira/browse/SPARK-23308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354248#comment-16354248 ] Márcio Furlani Carmona commented on SPARK-23308: Yeah, I set it back to

[jira] [Commented] (SPARK-23308) ignoreCorruptFiles should not ignore retryable IOException

2018-02-06 Thread JIRA
[ https://issues.apache.org/jira/browse/SPARK-23308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354250#comment-16354250 ] Márcio Furlani Carmona commented on SPARK-23308: Yeah, I set it back to

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-06 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354336#comment-16354336 ] Thomas Graves commented on SPARK-23309: --- I pulled in that patch

[jira] [Commented] (SPARK-22683) DynamicAllocation wastes resources by allocating containers that will barely be used

2018-02-06 Thread Julien Cuquemelle (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353953#comment-16353953 ] Julien Cuquemelle commented on SPARK-22683: --- [~tgraves]: the issue appears at each stage

[jira] [Updated] (SPARK-23344) Add KMeans distanceMeasure param to PySpark

2018-02-06 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-23344: -- Priority: Minor (was: Major) (Aside: I'm not sure it's so constructive to add an API and only later

[jira] [Commented] (SPARK-23329) Update the function descriptions with the arguments and returned values of the trigonometric functions

2018-02-06 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354178#comment-16354178 ] Sean Owen commented on SPARK-23329: --- Sure, though I might simplify that a bit. {code} /** * @param e

[jira] [Commented] (SPARK-18356) KMeans should cache RDD before training

2018-02-06 Thread Lovasoa (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354196#comment-16354196 ] Lovasoa commented on SPARK-18356: - Will this also be fixed for BisectingKMeans ? > KMeans should cache

[jira] [Commented] (SPARK-20659) Remove StorageStatus, or make it private.

2018-02-06 Thread Attila Zsolt Piros (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354191#comment-16354191 ] Attila Zsolt Piros commented on SPARK-20659: I would like to take this issue. > Remove

[jira] [Issue Comment Deleted] (SPARK-23308) ignoreCorruptFiles should not ignore retryable IOException

2018-02-06 Thread JIRA
[ https://issues.apache.org/jira/browse/SPARK-23308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Márcio Furlani Carmona updated SPARK-23308: --- Comment: was deleted (was: Yeah, I set it back to `ignoreCorruptFiles=false`

[jira] [Created] (SPARK-23345) Flaky test: FileBasedDataSourceSuite

2018-02-06 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-23345: -- Summary: Flaky test: FileBasedDataSourceSuite Key: SPARK-23345 URL: https://issues.apache.org/jira/browse/SPARK-23345 Project: Spark Issue Type: Bug

[jira] [Assigned] (SPARK-23290) inadvertent change in handling of DateType when converting to pandas dataframe

2018-02-06 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-23290: Assignee: Takuya Ueshin > inadvertent change in handling of DateType when converting to

[jira] [Resolved] (SPARK-23334) Fix pandas_udf with return type StringType() to handle str type properly in Python 2.

2018-02-06 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-23334. -- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 20507

[jira] [Assigned] (SPARK-23334) Fix pandas_udf with return type StringType() to handle str type properly in Python 2.

2018-02-06 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-23334: Assignee: Takuya Ueshin > Fix pandas_udf with return type StringType() to handle str type

[jira] [Updated] (SPARK-23053) taskBinarySerialization and task partitions calculate in DagScheduler.submitMissingTasks should keep the same RDD checkpoint status

2018-02-06 Thread huangtengfei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huangtengfei updated SPARK-23053: - Description: When we run concurrent jobs using the same rdd which is marked to do checkpoint.

[jira] [Assigned] (SPARK-23342) Add ORC configuration tests for ORC data source

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23342: Assignee: Apache Spark > Add ORC configuration tests for ORC data source >

[jira] [Commented] (SPARK-23342) Add ORC configuration tests for ORC data source

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353608#comment-16353608 ] Apache Spark commented on SPARK-23342: -- User 'dongjoon-hyun' has created a pull request for this

[jira] [Assigned] (SPARK-23342) Add ORC configuration tests for ORC data source

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23342: Assignee: (was: Apache Spark) > Add ORC configuration tests for ORC data source >

[jira] [Resolved] (SPARK-23290) inadvertent change in handling of DateType when converting to pandas dataframe

2018-02-06 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-23290. -- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 20515

[jira] [Updated] (SPARK-23337) withWatermark raises an exception on struct objects

2018-02-06 Thread Aydin Kocas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aydin Kocas updated SPARK-23337: Description: Hi, when using a nested object (I mean an object within a struct, here concrete:

[jira] [Commented] (SPARK-19870) Repeatable deadlock on BlockInfoManager and TorrentBroadcast

2018-02-06 Thread Eyal Farago (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353651#comment-16353651 ] Eyal Farago commented on SPARK-19870: - we've also seen something very similar (stack traces attach)

[jira] [Commented] (SPARK-23053) taskBinarySerialization and task partitions calculate in DagScheduler.submitMissingTasks should keep the same RDD checkpoint status

2018-02-06 Thread huangtengfei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353761#comment-16353761 ] huangtengfei commented on SPARK-23053: -- here is the stack trace of exception.

[jira] [Comment Edited] (SPARK-23053) taskBinarySerialization and task partitions calculate in DagScheduler.submitMissingTasks should keep the same RDD checkpoint status

2018-02-06 Thread huangtengfei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353761#comment-16353761 ] huangtengfei edited comment on SPARK-23053 at 2/6/18 11:48 AM: --- here is the

[jira] [Updated] (SPARK-23336) Upgrade snappy-java to 1.1.7.1

2018-02-06 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-23336: Summary: Upgrade snappy-java to 1.1.7.1 (was: Upgrade snappy-java to 1.1.4) > Upgrade snappy-java

[jira] [Issue Comment Deleted] (SPARK-10063) Remove DirectParquetOutputCommitter

2018-02-06 Thread Henrique dos Santos Goulart (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henrique dos Santos Goulart updated SPARK-10063: Comment: was deleted (was: There is any alternative right now that

[jira] [Commented] (SPARK-23308) ignoreCorruptFiles should not ignore retryable IOException

2018-02-06 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354784#comment-16354784 ] Steve Loughran commented on SPARK-23308: bq. I have not heard this come up before as an issue in

[jira] [Comment Edited] (SPARK-23271) Parquet output contains only "_SUCCESS" file after empty DataFrame saving

2018-02-06 Thread Dilip Biswal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354384#comment-16354384 ] Dilip Biswal edited comment on SPARK-23271 at 2/6/18 7:23 PM: -- Thank you

[jira] [Commented] (SPARK-19870) Repeatable deadlock on BlockInfoManager and TorrentBroadcast

2018-02-06 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354680#comment-16354680 ] Imran Rashid commented on SPARK-19870: -- [~eyalfa] my recollection is a bit rusty, but you're

[jira] [Comment Edited] (SPARK-23308) ignoreCorruptFiles should not ignore retryable IOException

2018-02-06 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354784#comment-16354784 ] Steve Loughran edited comment on SPARK-23308 at 2/7/18 12:26 AM: - bq. I

[jira] [Commented] (SPARK-20659) Remove StorageStatus, or make it private.

2018-02-06 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354628#comment-16354628 ] Marcelo Vanzin commented on SPARK-20659: I haven't really looked in detail at this stuff, so I

[jira] [Commented] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-06 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354665#comment-16354665 ] Li Jin commented on SPARK-23314: I figured out what the issue is. Will have a patch soon. > Pandas

[jira] [Updated] (SPARK-22158) convertMetastore should not ignore storage properties

2018-02-06 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-22158: Summary: convertMetastore should not ignore storage properties (was: convertMetastore should not ignore

[jira] [Resolved] (SPARK-23327) Update the description of three external API or functions

2018-02-06 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-23327. - Resolution: Fixed Fix Version/s: 2.3.0 > Update the description of three external API or

[jira] [Commented] (SPARK-10925) Exception when joining DataFrames

2018-02-06 Thread Clay Stevens (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354627#comment-16354627 ] Clay Stevens commented on SPARK-10925: -- I was having the same problem and " `df.rdd.toDF()` before

[jira] [Updated] (SPARK-23313) Add a migration guide for ORC

2018-02-06 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-23313: Target Version/s: 2.3.0 > Add a migration guide for ORC > - > >

[jira] [Commented] (SPARK-18067) SortMergeJoin adds shuffle if join predicates have non partitioned columns

2018-02-06 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354429#comment-16354429 ] Tejas Patil commented on SPARK-18067: - [~eyalfa]: Re #1: Theoretically this idea should help

[jira] [Commented] (SPARK-19256) Hive bucketing support

2018-02-06 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354440#comment-16354440 ] Tejas Patil commented on SPARK-19256: - [~ferdonline] : The feature you are referring to is not in the

[jira] [Commented] (SPARK-20659) Remove StorageStatus, or make it private.

2018-02-06 Thread Attila Zsolt Piros (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354430#comment-16354430 ] Attila Zsolt Piros commented on SPARK-20659: [~vanzin] I would like to ask a few questions: -

[jira] [Resolved] (SPARK-23315) failed to get output from canonicalized data source v2 related plans

2018-02-06 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-23315. - Resolution: Fixed Fix Version/s: 2.3.0 > failed to get output from canonicalized data source v2

[jira] [Commented] (SPARK-22158) convertMetastore should not ignore table properties

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354525#comment-16354525 ] Apache Spark commented on SPARK-22158: -- User 'dongjoon-hyun' has created a pull request for this

[jira] [Commented] (SPARK-23329) Update the function descriptions with the arguments and returned values of the trigonometric functions

2018-02-06 Thread Mihaly Toth (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354166#comment-16354166 ] Mihaly Toth commented on SPARK-23329: - How about this approach? {code:scala} /** * Computes the

[jira] [Commented] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-06 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354045#comment-16354045 ] Li Jin commented on SPARK-23314: I think this is related to how Pandas deals with timestamp localization.

[jira] [Commented] (SPARK-23344) Add KMeans distanceMeasure param to PySpark

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353992#comment-16353992 ] Apache Spark commented on SPARK-23344: -- User 'mgaido91' has created a pull request for this issue:

[jira] [Assigned] (SPARK-23344) Add KMeans distanceMeasure param to PySpark

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23344: Assignee: Apache Spark > Add KMeans distanceMeasure param to PySpark >

[jira] [Assigned] (SPARK-23344) Add KMeans distanceMeasure param to PySpark

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23344: Assignee: (was: Apache Spark) > Add KMeans distanceMeasure param to PySpark >

[jira] [Created] (SPARK-23344) Add KMeans distanceMeasure param to PySpark

2018-02-06 Thread Marco Gaido (JIRA)
Marco Gaido created SPARK-23344: --- Summary: Add KMeans distanceMeasure param to PySpark Key: SPARK-23344 URL: https://issues.apache.org/jira/browse/SPARK-23344 Project: Spark Issue Type:

[jira] [Commented] (SPARK-23271) Parquet output contains only "_SUCCESS" file after empty DataFrame saving

2018-02-06 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353998#comment-16353998 ] Xiao Li commented on SPARK-23271: - After a discussion with [~cloud_fan], we think the behavior should be

[jira] [Updated] (SPARK-23346) Failed tasks reported as success if the failure reason is not ExceptionFailure

2018-02-06 Thread JIRA
[ https://issues.apache.org/jira/browse/SPARK-23346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] 吴志龙 updated SPARK-23346: Description: !企业微信截图_15179715023606.png! !企业微信截图_15179714603307.png! We have many other failure reasons, such

[jira] [Updated] (SPARK-21097) Dynamic allocation will preserve cached data

2018-02-06 Thread Brad (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brad updated SPARK-21097: - Description: We want to use dynamic allocation to distribute resources among many notebook users on our spark

[jira] [Commented] (SPARK-17217) Codegeneration fails for describe() on many columns

2018-02-06 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354848#comment-16354848 ] Xiao Li commented on SPARK-17217: - It should be resolved by

[jira] [Resolved] (SPARK-17217) Codegeneration fails for describe() on many columns

2018-02-06 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-17217. - Resolution: Duplicate > Codegeneration fails for describe() on many columns >

[jira] [Created] (SPARK-23347) Introduce buffer between Java data stream and gzip stream

2018-02-06 Thread Ted Yu (JIRA)
Ted Yu created SPARK-23347: -- Summary: Introduce buffer between Java data stream and gzip stream Key: SPARK-23347 URL: https://issues.apache.org/jira/browse/SPARK-23347 Project: Spark Issue Type:

[jira] [Updated] (SPARK-23346) Failed tasks reported as success if the failure reason is not ExceptionFailure

2018-02-06 Thread JIRA
[ https://issues.apache.org/jira/browse/SPARK-23346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] 吴志龙 updated SPARK-23346: Attachment: 企业微信截图_15179714603307.png > Failed tasks reported as success if the failure reason is not

[jira] [Updated] (SPARK-23346) Failed tasks reported as success if the failure reason is not ExceptionFailure

2018-02-06 Thread JIRA
[ https://issues.apache.org/jira/browse/SPARK-23346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] 吴志龙 updated SPARK-23346: Attachment: 企业微信截图_15179715023606.png > Failed tasks reported as success if the failure reason is not

[jira] [Updated] (SPARK-22226) splitExpression can create too many method calls (generating a Constant Pool limit error)

2018-02-06 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-22226: Issue Type: Sub-task (was: Bug) Parent: SPARK-22510 > splitExpression can create too many method

[jira] [Updated] (SPARK-16845) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB

2018-02-06 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-16845: Issue Type: Sub-task (was: Bug) Parent: SPARK-22510 >

[jira] [Updated] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset

2018-02-06 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-18016: Issue Type: Sub-task (was: Bug) Parent: SPARK-22510 > Code Generation: Constant Pool Past Limit

[jira] [Commented] (SPARK-23122) Deprecate register* for UDFs in SQLContext and Catalog in PySpark

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354813#comment-16354813 ] Apache Spark commented on SPARK-23122: -- User 'gatorsmile' has created a pull request for this issue:

[jira] [Created] (SPARK-23346) Failed tasks reported as success if the failure reason is not ExceptionFailure

2018-02-06 Thread JIRA
吴志龙 created SPARK-23346: --- Summary: Failed tasks reported as success if the failure reason is not ExceptionFailure Key: SPARK-23346 URL: https://issues.apache.org/jira/browse/SPARK-23346 Project: Spark

[jira] [Commented] (SPARK-23347) Introduce buffer between Java data stream and gzip stream

2018-02-06 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354903#comment-16354903 ] Sean Owen commented on SPARK-23347: --- GZipOutputStream is buffered already. As you say it implements the

[jira] [Commented] (SPARK-23347) Introduce buffer between Java data stream and gzip stream

2018-02-06 Thread Ted Yu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354907#comment-16354907 ] Ted Yu commented on SPARK-23347: See JDK 1.8 code: {code} class DeflaterOutputStream { public void

[jira] [Commented] (SPARK-23347) Introduce buffer between Java data stream and gzip stream

2018-02-06 Thread Ted Yu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354939#comment-16354939 ] Ted Yu commented on SPARK-23347: {code} public final byte[] serialize(Object o) throws Exception {

[jira] [Commented] (SPARK-23347) Introduce buffer between Java data stream and gzip stream

2018-02-06 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354919#comment-16354919 ] Sean Owen commented on SPARK-23347: --- Yes, but that's the opposite of what this JIRA suggests is a

[jira] [Comment Edited] (SPARK-23139) Read eventLog file with mixed encodings

2018-02-06 Thread DENG FEI (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355022#comment-16355022 ] DENG FEI edited comment on SPARK-23139 at 2/7/18 6:34 AM: -- [~irashid] You're

[jira] [Commented] (SPARK-23139) Read eventLog file with mixed encodings

2018-02-06 Thread DENG FEI (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355022#comment-16355022 ] DENG FEI commented on SPARK-23139: -- [~irashid] You're right, but one can change the default character

[jira] [Updated] (SPARK-16060) Vectorized ORC reader

2018-02-06 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-16060: Summary: Vectorized ORC reader (was: Vectorized Orc reader) > Vectorized ORC reader >

[jira] [Assigned] (SPARK-20746) Built-in SQL Function Improvement

2018-02-06 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-20746: --- Assignee: (was: Yuming Wang) > Built-in SQL Function Improvement >

[jira] [Deleted] (SPARK-20747) Distinct in Aggregate Functions

2018-02-06 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li deleted SPARK-20747: > Distinct in Aggregate Functions > --- > > Key: SPARK-20747 >

[jira] [Deleted] (SPARK-20752) Build-in SQL Function Support - SQRT

2018-02-06 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li deleted SPARK-20752: > Build-in SQL Function Support - SQRT > > > Key:

[jira] [Updated] (SPARK-14878) Support Trim characters in the string trim function

2018-02-06 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-14878: Issue Type: Sub-task (was: Improvement) Parent: SPARK-20746 > Support Trim characters in the

[jira] [Assigned] (SPARK-23271) Parquet output contains only "_SUCCESS" file after empty DataFrame saving

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23271: Assignee: Apache Spark > Parquet output contains only "_SUCCESS" file after empty

[jira] [Commented] (SPARK-23271) Parquet output contains only "_SUCCESS" file after empty DataFrame saving

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355097#comment-16355097 ] Apache Spark commented on SPARK-23271: -- User 'dilipbiswal' has created a pull request for this

[jira] [Assigned] (SPARK-23271) Parquet output contains only "_SUCCESS" file after empty DataFrame saving

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23271: Assignee: (was: Apache Spark) > Parquet output contains only "_SUCCESS" file after

[jira] [Commented] (SPARK-23347) Introduce buffer between Java data stream and gzip stream

2018-02-06 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354962#comment-16354962 ] Sean Owen commented on SPARK-23347: --- Well, that would depend on what Jackson does, and that's buried

[jira] [Assigned] (SPARK-23345) Flaky test: FileBasedDataSourceSuite

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23345: Assignee: Apache Spark > Flaky test: FileBasedDataSourceSuite >

[jira] [Assigned] (SPARK-23345) Flaky test: FileBasedDataSourceSuite

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23345: Assignee: (was: Apache Spark) > Flaky test: FileBasedDataSourceSuite >

[jira] [Commented] (SPARK-23345) Flaky test: FileBasedDataSourceSuite

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355005#comment-16355005 ] Apache Spark commented on SPARK-23345: -- User 'viirya' has created a pull request for this issue:

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-06 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354967#comment-16354967 ] Wenchen Fan commented on SPARK-23309: - is it possible to provide a concrete query(with table schema)

[jira] [Updated] (SPARK-22829) Add new built-in function date_trunc()

2018-02-06 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-22829: Issue Type: Sub-task (was: New Feature) Parent: SPARK-20746 > Add new built-in function

[jira] [Updated] (SPARK-21007) Add SQL function - RIGHT && LEFT

2018-02-06 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-21007: Issue Type: Sub-task (was: Improvement) Parent: SPARK-20746 > Add SQL function - RIGHT && LEFT >

[jira] [Created] (SPARK-23342) Add ORC configuration tests for ORC data source

2018-02-06 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-23342: - Summary: Add ORC configuration tests for ORC data source Key: SPARK-23342 URL: https://issues.apache.org/jira/browse/SPARK-23342 Project: Spark Issue

[jira] [Commented] (SPARK-23343) Increase the exception test for the bind port

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353604#comment-16353604 ] Apache Spark commented on SPARK-23343: -- User 'heary-cao' has created a pull request for this issue:

[jira] [Assigned] (SPARK-23343) Increase the exception test for the bind port

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23343: Assignee: (was: Apache Spark) > Increase the exception test for the bind port >

[jira] [Assigned] (SPARK-23343) Increase the exception test for the bind port

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23343: Assignee: Apache Spark > Increase the exception test for the bind port >

[jira] [Created] (SPARK-23343) Increase the exception test for the bind port

2018-02-06 Thread caoxuewen (JIRA)
caoxuewen created SPARK-23343: - Summary: Increase the exception test for the bind port Key: SPARK-23343 URL: https://issues.apache.org/jira/browse/SPARK-23343 Project: Spark Issue Type: Test

[jira] [Commented] (SPARK-23257) Implement Kerberos Support in Kubernetes resource manager

2018-02-06 Thread Rob Keevil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353870#comment-16353870 ] Rob Keevil commented on SPARK-23257: [~ifilonenko] Happy to help, let me know when you are ready for

[jira] [Updated] (SPARK-23328) Disallow default value None in na.replace/replace when 'to_replace' is not a dictionary

2018-02-06 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-23328: - Target Version/s: (was: 2.3.0) > Disallow default value None in na.replace/replace when

[jira] [Commented] (SPARK-22119) Add cosine distance to KMeans

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353922#comment-16353922 ] Apache Spark commented on SPARK-22119: -- User 'mgaido91' has created a pull request for this issue:

[jira] [Commented] (SPARK-23240) PythonWorkerFactory issues unhelpful message when pyspark.daemon produces bogus stdout

2018-02-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353945#comment-16353945 ] Apache Spark commented on SPARK-23240: -- User 'bersprockets' has created a pull request for this