[jira] [Commented] (SPARK-23393) Path is error when run test in local machine

2018-02-12 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360471#comment-16360471 ] Marco Gaido commented on SPARK-23393: - I think this is a problem for your environment. The path is

[jira] [Commented] (SPARK-22002) Read JDBC table use custom schema support specify partial fields

2018-02-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360464#comment-16360464 ] Apache Spark commented on SPARK-22002: -- User 'ueshin' has created a pull request for this issue:

[jira] [Updated] (SPARK-23394) Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)

2018-02-12 Thread Attila Zsolt Piros (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Zsolt Piros updated SPARK-23394: --- Description: Start spark as: {code:bash} $ bin/spark-shell --master

[jira] [Updated] (SPARK-23394) Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)

2018-02-12 Thread Attila Zsolt Piros (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Zsolt Piros updated SPARK-23394: --- Attachment: Storage_Tab.png > Storage info's Cached Partitions doesn't consider the

[jira] [Commented] (SPARK-23394) Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)

2018-02-12 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360593#comment-16360593 ] Marco Gaido commented on SPARK-23394: - I think this is not an issue. `numCachedPartitions` is 20

[jira] [Commented] (SPARK-23352) Explicitly specify supported types in Pandas UDFs

2018-02-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360658#comment-16360658 ] Apache Spark commented on SPARK-23352: -- User 'HyukjinKwon' has created a pull request for this

[jira] [Resolved] (SPARK-23352) Explicitly specify supported types in Pandas UDFs

2018-02-12 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-23352. -- Resolution: Fixed Assignee: Hyukjin Kwon Fix Version/s: 2.4.0 Fixed in

[jira] [Updated] (SPARK-23394) Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)

2018-02-12 Thread Attila Zsolt Piros (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Zsolt Piros updated SPARK-23394: --- Attachment: Spark_2.4.0-SNAPSHOT.png Spark_2.2.1.png > Storage

[jira] [Created] (SPARK-23394) Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)

2018-02-12 Thread Attila Zsolt Piros (JIRA)
Attila Zsolt Piros created SPARK-23394: -- Summary: Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does) Key: SPARK-23394 URL:

[jira] [Updated] (SPARK-23394) Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)

2018-02-12 Thread Attila Zsolt Piros (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Zsolt Piros updated SPARK-23394: --- Attachment: Screen Shot 2018-02-12 at 11.24.22.png > Storage info's Cached

[jira] [Closed] (SPARK-10924) Failed to update accumulators for ShuffleMapTask: Broken pipe

2018-02-12 Thread JIRA
[ https://issues.apache.org/jira/browse/SPARK-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pau Tallada Crespí closed SPARK-10924. -- > Failed to update accumulators for ShuffleMapTask: Broken pipe >

[jira] [Created] (SPARK-23395) Add an option to return an empty DataFrame from an RDD generated by a Hadoop file

2018-02-12 Thread Jens Rabe (JIRA)
Jens Rabe created SPARK-23395: - Summary: Add an option to return an empty DataFrame from an RDD generated by a Hadoop file Key: SPARK-23395 URL: https://issues.apache.org/jira/browse/SPARK-23395 Project:

[jira] [Updated] (SPARK-23395) Add an option to return an empty DataFrame from an RDD generated by a Hadoop file when there are no usable paths

2018-02-12 Thread Jens Rabe (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jens Rabe updated SPARK-23395: -- Target Version/s: 2.2.1, 2.2.0 (was: 2.2.0, 2.2.1) Summary: Add an option to return an

[jira] [Updated] (SPARK-23394) Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)

2018-02-12 Thread Attila Zsolt Piros (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Zsolt Piros updated SPARK-23394: --- Description: Start spark as: {code:bash} $ bin/spark-shell --master

[jira] [Updated] (SPARK-23394) Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)

2018-02-12 Thread Attila Zsolt Piros (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Zsolt Piros updated SPARK-23394: --- Description: Start spark as: {code:bash} $ bin/spark-shell --master

[jira] [Updated] (SPARK-23394) Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)

2018-02-12 Thread Attila Zsolt Piros (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Zsolt Piros updated SPARK-23394: --- Attachment: (was: Screen Shot 2018-02-12 at 11.24.22.png) > Storage info's

[jira] [Comment Edited] (SPARK-23394) Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)

2018-02-12 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360757#comment-16360757 ] Marcelo Vanzin edited comment on SPARK-23394 at 2/12/18 1:36 PM: - I

[jira] [Commented] (SPARK-23394) Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)

2018-02-12 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360757#comment-16360757 ] Marcelo Vanzin commented on SPARK-23394: I talked to Attila offline, and to me it seems like the

[jira] [Created] (SPARK-23396) Spark HistoryServer will OMM if the event log is big

2018-02-12 Thread KaiXinXIaoLei (JIRA)
KaiXinXIaoLei created SPARK-23396: - Summary: Spark HistoryServer will OMM if the event log is big Key: SPARK-23396 URL: https://issues.apache.org/jira/browse/SPARK-23396 Project: Spark Issue

[jira] [Updated] (SPARK-23396) Spark HistoryServer will OMM if the event log is big

2018-02-12 Thread KaiXinXIaoLei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-23396: -- Attachment: historyServer.png > Spark HistoryServer will OMM if the event log is big >

[jira] [Updated] (SPARK-23396) Spark HistoryServer will OMM if the event log is big

2018-02-12 Thread KaiXinXIaoLei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-23396: -- Attachment: historyServer.png > Spark HistoryServer will OMM if the event log is big >

[jira] [Updated] (SPARK-23396) Spark HistoryServer will OMM if the event log is big

2018-02-12 Thread KaiXinXIaoLei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-23396: -- Attachment: (was: historyServer.png) > Spark HistoryServer will OMM if the event log is

[jira] [Updated] (SPARK-23396) Spark HistoryServer will OMM if the event log is big

2018-02-12 Thread KaiXinXIaoLei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-23396: -- Attachment: eventlog.png > Spark HistoryServer will OMM if the event log is big >

[jira] [Resolved] (SPARK-23393) Path is error when run test in local machine

2018-02-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-23393. --- Resolution: Invalid > Path is error when run test in local machine >

[jira] [Commented] (SPARK-23370) Spark receives a size of 0 for an Oracle Number field and defaults the field type to be BigDecimal(30,10) instead of the actual precision and scale

2018-02-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360779#comment-16360779 ] Sean Owen commented on SPARK-23370: --- OK, that doesn't sound trivial. It seems like this would

[jira] [Commented] (SPARK-23308) ignoreCorruptFiles should not ignore retryable IOException

2018-02-12 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360809#comment-16360809 ] Steve Loughran commented on SPARK-23308: BTW bq I should get at least ~82k partitions, thus the

[jira] [Comment Edited] (SPARK-20327) Add CLI support for YARN custom resources, like GPUs

2018-02-12 Thread Szilard Nemeth (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360904#comment-16360904 ] Szilard Nemeth edited comment on SPARK-20327 at 2/12/18 3:43 PM: - Hey

[jira] [Updated] (SPARK-23396) Spark HistoryServer will OMM if the event log is big

2018-02-12 Thread KaiXinXIaoLei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-23396: -- Attachment: eventlog.png > Spark HistoryServer will OMM if the event log is big >

[jira] [Resolved] (SPARK-23343) Increase the exception test for the bind port

2018-02-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-23343. --- Resolution: Won't Fix > Increase the exception test for the bind port >

[jira] [Created] (SPARK-23397) Scheduling delay causes Spark Streaming to miss batches.

2018-02-12 Thread Shahbaz Hussain (JIRA)
Shahbaz Hussain created SPARK-23397: --- Summary: Scheduling delay causes Spark Streaming to miss batches. Key: SPARK-23397 URL: https://issues.apache.org/jira/browse/SPARK-23397 Project: Spark

[jira] [Assigned] (SPARK-23391) It may lead to overflow for some integer multiplication

2018-02-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-23391: - Assignee: liuxian > It may lead to overflow for some integer multiplication >

[jira] [Resolved] (SPARK-23391) It may lead to overflow for some integer multiplication

2018-02-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-23391. --- Resolution: Fixed Fix Version/s: 2.3.0 2.2.2 Issue resolved by pull

[jira] [Updated] (SPARK-23396) Spark HistoryServer will OMM if the event log is big

2018-02-12 Thread KaiXinXIaoLei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-23396: -- Attachment: (was: historyServer.png) > Spark HistoryServer will OMM if the event log is

[jira] [Updated] (SPARK-23396) Spark HistoryServer will OMM if the event log is big

2018-02-12 Thread KaiXinXIaoLei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-23396: -- Description: if the event log is big, the historyServer web will be out of memory

[jira] [Commented] (SPARK-23396) Spark HistoryServer will OMM if the event log is big

2018-02-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360784#comment-16360784 ] Sean Owen commented on SPARK-23396: --- This is far too vague. It seems to overlap with recent

[jira] [Updated] (SPARK-23392) Add some test case for images feature

2018-02-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-23392: -- Priority: Trivial (was: Major) > Add some test case for images feature >

[jira] [Updated] (SPARK-23396) Spark HistoryServer will OMM if the event log is big

2018-02-12 Thread KaiXinXIaoLei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-23396: -- Description: if the event log is big, the historyServer web will be out of memory. My

[jira] [Commented] (SPARK-23397) Scheduling delay causes Spark Streaming to miss batches.

2018-02-12 Thread Shahbaz Hussain (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360793#comment-16360793 ] Shahbaz Hussain commented on SPARK-23397: - Yes, if current Batch Processing time is greater than

[jira] [Updated] (SPARK-23396) Spark HistoryServer will OMM if the event log is big

2018-02-12 Thread KaiXinXIaoLei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-23396: -- Description: if the event log is big, the historyServer web will be out of memory. My

[jira] [Commented] (SPARK-20327) Add CLI support for YARN custom resources, like GPUs

2018-02-12 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360944#comment-16360944 ] Marcelo Vanzin commented on SPARK-20327: bq. I think the point is, without reflection, using 3.x+

[jira] [Resolved] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab

2018-02-12 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-22977. - Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 20521

[jira] [Assigned] (SPARK-22977) DataFrameWriter operations do not show details in SQL tab

2018-02-12 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-22977: --- Assignee: Wenchen Fan > DataFrameWriter operations do not show details in SQL tab >

[jira] [Commented] (SPARK-23397) Scheduling delay causes Spark Streaming to miss batches.

2018-02-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360789#comment-16360789 ] Sean Owen commented on SPARK-23397: --- This is how it's supposed to work. Batches don't overlap. If one

[jira] [Comment Edited] (SPARK-23397) Scheduling delay causes Spark Streaming to miss batches.

2018-02-12 Thread Shahbaz Hussain (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360802#comment-16360802 ] Shahbaz Hussain edited comment on SPARK-23397 at 2/12/18 2:29 PM: -- can

[jira] [Commented] (SPARK-23397) Scheduling delay causes Spark Streaming to miss batches.

2018-02-12 Thread Shahbaz Hussain (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360802#comment-16360802 ] Shahbaz Hussain commented on SPARK-23397: - can we be able to make job creation a only once and

[jira] [Commented] (SPARK-20327) Add CLI support for YARN custom resources, like GPUs

2018-02-12 Thread Szilard Nemeth (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360904#comment-16360904 ] Szilard Nemeth commented on SPARK-20327: Hey [~vanzin]! I see what you said about compatibility.

[jira] [Updated] (SPARK-23396) Spark HistoryServer will OMM if the event log is big

2018-02-12 Thread KaiXinXIaoLei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-23396: -- Attachment: historyServer.png > Spark HistoryServer will OMM if the event log is big >

[jira] [Updated] (SPARK-23396) Spark HistoryServer will OMM if the event log is big

2018-02-12 Thread KaiXinXIaoLei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-23396: -- Description: if the event log is big, the historyServer web will be out of memory

[jira] [Updated] (SPARK-23396) Spark HistoryServer will OMM if the event log is big

2018-02-12 Thread KaiXinXIaoLei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-23396: -- Attachment: (was: eventlog.png) > Spark HistoryServer will OMM if the event log is big >

[jira] [Updated] (SPARK-23396) Spark HistoryServer will OMM if the event log is big

2018-02-12 Thread KaiXinXIaoLei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KaiXinXIaoLei updated SPARK-23396: -- Description: if the event log is big, the historyServer web will be out of memory

[jira] [Commented] (SPARK-23397) Scheduling delay causes Spark Streaming to miss batches.

2018-02-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360798#comment-16360798 ] Sean Owen commented on SPARK-23397: --- That sounds correct. The next batch executes as soon as possible.

[jira] [Commented] (SPARK-20327) Add CLI support for YARN custom resources, like GPUs

2018-02-12 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360933#comment-16360933 ] Sean Owen commented on SPARK-20327: --- I think the point is, without reflection, using 3.x+ APIs with 2.x

[jira] [Assigned] (SPARK-23394) Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)

2018-02-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23394: Assignee: Apache Spark > Storage info's Cached Partitions doesn't consider the

[jira] [Assigned] (SPARK-23388) Support for Parquet Binary DecimalType in VectorizedColumnReader

2018-02-12 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-23388: --- Assignee: James Thompson > Support for Parquet Binary DecimalType in VectorizedColumnReader >

[jira] [Resolved] (SPARK-23388) Support for Parquet Binary DecimalType in VectorizedColumnReader

2018-02-12 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-23388. - Resolution: Fixed Fix Version/s: 2.3.0 > Support for Parquet Binary DecimalType in

[jira] [Commented] (SPARK-23394) Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)

2018-02-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361347#comment-16361347 ] Apache Spark commented on SPARK-23394: -- User 'attilapiros' has created a pull request for this

[jira] [Assigned] (SPARK-23394) Storage info's Cached Partitions doesn't consider the replications (but sc.getRDDStorageInfo does)

2018-02-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23394: Assignee: (was: Apache Spark) > Storage info's Cached Partitions doesn't consider the

[jira] [Updated] (SPARK-23377) Bucketizer with multiple columns persistence bug

2018-02-12 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-23377: -- Priority: Critical (was: Major) > Bucketizer with multiple columns persistence bug >

[jira] [Comment Edited] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-12 Thread Nicolas Poggi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361071#comment-16361071 ] Nicolas Poggi edited comment on SPARK-23310 at 2/12/18 6:35 PM: Q72 of

[jira] [Commented] (SPARK-23377) Bucketizer with multiple columns persistence bug

2018-02-12 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361154#comment-16361154 ] Joseph K. Bradley commented on SPARK-23377: --- [~viirya]'s patch currently changes

[jira] [Resolved] (SPARK-23390) Flaky Test Suite: FileBasedDataSourceSuite in Spark 2.3/hadoop 2.7

2018-02-12 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-23390. - Resolution: Fixed Assignee: Wenchen Fan Fix Version/s: 2.3.0 > Flaky Test Suite:

[jira] [Commented] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-12 Thread Nicolas Poggi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361071#comment-16361071 ] Nicolas Poggi commented on SPARK-23310: --- Q72 of TPC-DS is also affected around 30% at scale factor

[jira] [Updated] (SPARK-23398) DataSourceV2 should provide a way to get a source's schema.

2018-02-12 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue updated SPARK-23398: -- Summary: DataSourceV2 should provide a way to get a source's schema. (was: DataSourceV2 should

[jira] [Created] (SPARK-23398) DataSourceV2 should provide a way to get the source schema

2018-02-12 Thread Ryan Blue (JIRA)
Ryan Blue created SPARK-23398: - Summary: DataSourceV2 should provide a way to get the source schema Key: SPARK-23398 URL: https://issues.apache.org/jira/browse/SPARK-23398 Project: Spark Issue

[jira] [Assigned] (SPARK-23399) Register a task completion listner first for OrcColumnarBatchReader

2018-02-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23399: Assignee: (was: Apache Spark) > Register a task completion listner first for

[jira] [Commented] (SPARK-23390) Flaky Test Suite: FileBasedDataSourceSuite in Spark 2.3/hadoop 2.7

2018-02-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361537#comment-16361537 ] Apache Spark commented on SPARK-23390: -- User 'gatorsmile' has created a pull request for this issue:

[jira] [Resolved] (SPARK-23378) move setCurrentDatabase from HiveExternalCatalog to HiveClientImpl

2018-02-12 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-23378. - Resolution: Fixed Assignee: Feng Liu Fix Version/s: 2.4.0 > move setCurrentDatabase from

[jira] [Created] (SPARK-23401) Improve test cases for all supported types and unsupported types

2018-02-12 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-23401: Summary: Improve test cases for all supported types and unsupported types Key: SPARK-23401 URL: https://issues.apache.org/jira/browse/SPARK-23401 Project: Spark

[jira] [Created] (SPARK-23400) Add two extra constructors for ScalaUDF

2018-02-12 Thread Xiao Li (JIRA)
Xiao Li created SPARK-23400: --- Summary: Add two extra constructors for ScalaUDF Key: SPARK-23400 URL: https://issues.apache.org/jira/browse/SPARK-23400 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-23154) Document backwards compatibility guarantees for ML persistence

2018-02-12 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361575#comment-16361575 ] Joseph K. Bradley commented on SPARK-23154: --- I'd prefer to put it in the subsection on saving &

[jira] [Assigned] (SPARK-23399) Register a task completion listner first for OrcColumnarBatchReader

2018-02-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23399: Assignee: Apache Spark > Register a task completion listner first for

[jira] [Commented] (SPARK-23399) Register a task completion listner first for OrcColumnarBatchReader

2018-02-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361508#comment-16361508 ] Apache Spark commented on SPARK-23399: -- User 'dongjoon-hyun' has created a pull request for this

[jira] [Created] (SPARK-23399) Register a task completion listner first for OrcColumnarBatchReader

2018-02-12 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-23399: - Summary: Register a task completion listner first for OrcColumnarBatchReader Key: SPARK-23399 URL: https://issues.apache.org/jira/browse/SPARK-23399 Project: Spark

[jira] [Updated] (SPARK-23399) Register a task completion listner first for OrcColumnarBatchReader

2018-02-12 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-23399: -- Description: This is related with SPARK-23390. Currently, there was a opened file leak for

[jira] [Commented] (SPARK-23298) distinct.count on Dataset/DataFrame yields non-deterministic results

2018-02-12 Thread Mateusz Jukiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360415#comment-16360415 ] Mateusz Jukiewicz commented on SPARK-23298: --- Not sure but seems like it could be related to 

[jira] [Commented] (SPARK-21866) SPIP: Image support in Spark

2018-02-12 Thread xubo245 (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360417#comment-16360417 ] xubo245 commented on SPARK-21866: - Are the summary for TODO work of image feature? When will it plan to

[jira] [Assigned] (SPARK-23323) DataSourceV2 should use the output commit coordinator.

2018-02-12 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-23323: --- Assignee: Ryan Blue > DataSourceV2 should use the output commit coordinator. >

[jira] [Resolved] (SPARK-23323) DataSourceV2 should use the output commit coordinator.

2018-02-12 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-23323. - Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 20490

[jira] [Updated] (SPARK-22820) Spark 2.3 SQL API audit

2018-02-12 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-22820: Fix Version/s: 2.3.0 > Spark 2.3 SQL API audit > --- > > Key:

[jira] [Resolved] (SPARK-22820) Spark 2.3 SQL API audit

2018-02-12 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-22820. - Resolution: Fixed > Spark 2.3 SQL API audit > --- > > Key:

[jira] [Commented] (SPARK-23154) Document backwards compatibility guarantees for ML persistence

2018-02-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361654#comment-16361654 ] Apache Spark commented on SPARK-23154: -- User 'jkbradley' has created a pull request for this issue:

[jira] [Updated] (SPARK-23352) Explicitly specify supported types in Pandas UDFs

2018-02-12 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-23352: Fix Version/s: 2.3.1 > Explicitly specify supported types in Pandas UDFs >

[jira] [Comment Edited] (SPARK-23377) Bucketizer with multiple columns persistence bug

2018-02-12 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361154#comment-16361154 ] Joseph K. Bradley edited comment on SPARK-23377 at 2/13/18 1:10 AM:

[jira] [Commented] (SPARK-23230) When hive.default.fileformat is other kinds of file types, create textfile table cause a serde error

2018-02-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361696#comment-16361696 ] Apache Spark commented on SPARK-23230: -- User 'cxzl25' has created a pull request for this issue:

[jira] [Commented] (SPARK-23377) Bucketizer with multiple columns persistence bug

2018-02-12 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361718#comment-16361718 ] Liang-Chi Hsieh commented on SPARK-23377: - I have no objection to [~josephkb]'s proposal (first

[jira] [Commented] (SPARK-23377) Bucketizer with multiple columns persistence bug

2018-02-12 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361724#comment-16361724 ] Liang-Chi Hsieh commented on SPARK-23377: - For now, I think neither 3rd option or my current

[jira] [Assigned] (SPARK-23313) Add a migration guide for ORC

2018-02-12 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-23313: --- Assignee: Dongjoon Hyun > Add a migration guide for ORC > - > >

[jira] [Resolved] (SPARK-23313) Add a migration guide for ORC

2018-02-12 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-23313. - Resolution: Fixed Fix Version/s: 2.3.0 > Add a migration guide for ORC >

[jira] [Commented] (SPARK-20307) SparkR: pass on setHandleInvalid to spark.mllib functions that use StringIndexer

2018-02-12 Thread Miao Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361631#comment-16361631 ] Miao Wang commented on SPARK-20307: --- [~felixcheung] I will do it during the Lunar New Year vacation. I

[jira] [Resolved] (SPARK-23230) When hive.default.fileformat is other kinds of file types, create textfile table cause a serde error

2018-02-12 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-23230. - Resolution: Fixed Assignee: dzcxzl Fix Version/s: 2.3.0 > When hive.default.fileformat

[jira] [Updated] (SPARK-23340) Empty float/double array columns in ORC file should not raise EOFException

2018-02-12 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-23340: -- Summary: Empty float/double array columns in ORC file should not raise EOFException (was:

[jira] [Updated] (SPARK-23340) Empty float/double array columns in ORC file should not raise EOFException

2018-02-12 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-23340: -- Description: This issue updates Apache ORC dependencies to 1.4.3 released on February 9th.

[jira] [Created] (SPARK-23404) When the underlying buffers are already direct, we should copy it to the heap memory

2018-02-12 Thread liuxian (JIRA)
liuxian created SPARK-23404: --- Summary: When the underlying buffers are already direct, we should copy it to the heap memory Key: SPARK-23404 URL: https://issues.apache.org/jira/browse/SPARK-23404 Project:

[jira] [Assigned] (SPARK-23404) When the underlying buffers are already direct, we should copy them to the heap memory

2018-02-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23404: Assignee: (was: Apache Spark) > When the underlying buffers are already direct, we

[jira] [Commented] (SPARK-23404) When the underlying buffers are already direct, we should copy them to the heap memory

2018-02-12 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361933#comment-16361933 ] Apache Spark commented on SPARK-23404: -- User '10110346' has created a pull request for this issue:

[jira] [Created] (SPARK-23403) java.lang.ArrayIndexOutOfBoundsException: 10

2018-02-12 Thread Naresh Kumar (JIRA)
Naresh Kumar created SPARK-23403: Summary: java.lang.ArrayIndexOutOfBoundsException: 10 Key: SPARK-23403 URL: https://issues.apache.org/jira/browse/SPARK-23403 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-23340) Empty float/double array columns in ORC file raise EOFException

2018-02-12 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-23340: -- Summary: Empty float/double array columns in ORC file raise EOFException (was: Empty

[jira] [Updated] (SPARK-23340) Empty float/double array columns in ORC file should not raise EOFException

2018-02-12 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-23340: -- Description: This issue updates Apache ORC dependencies to 1.4.3 released on February 9th.

[jira] [Updated] (SPARK-23404) When the underlying buffers are already direct, we should copy them to the heap memory

2018-02-12 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-23404: Description: If the memory mode is _ON_HEAP_, when the underlying buffers are direct, we should copy them

[jira] [Updated] (SPARK-23404) When the underlying buffers are already direct, we should copy them to the heap memory

2018-02-12 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-23404: Summary: When the underlying buffers are already direct, we should copy them to the heap memory (was:
