[jira] [Created] (SPARK-27552) The configuration `hive.exec.stagingdir` is invalid on Windows OS

2019-04-23 Thread liuxian (JIRA)
liuxian created SPARK-27552: --- Summary: The configuration `hive.exec.stagingdir` is invalid on Windows OS Key: SPARK-27552 URL: https://issues.apache.org/jira/browse/SPARK-27552 Project: Spark

[jira] [Resolved] (SPARK-27173) For hive parquet table,codes(lz4,brotli,zstd) are not available

2019-04-15 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian resolved SPARK-27173. Resolution: Won't Fix

[jira] [Updated] (SPARK-27256) If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.

2019-03-24 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-27256: Description: Currently, if we want to configure `spark.sql.files.maxPartitionBytes` to 256 megabytes,
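For context on SPARK-27256: a config declared with `bytesConf` accepts a size string such as `256m`, whereas a plain numeric config needs the raw byte count, 256 × 1024 × 1024 = 268435456. The sketch below is a self-contained illustration only; `parseBytes` is a hypothetical helper, not Spark's parser, and it assumes binary (1024-based) units.

```java
import java.util.Locale;

public class ByteSizeConf {
    // Hypothetical helper (not Spark's API): parse sizes such as "256m" or "256mb"
    // using binary units (k = 1024); a bare number is taken as a byte count.
    static long parseBytes(String value) {
        String s = value.trim().toLowerCase(Locale.ROOT);
        if (s.endsWith("b")) {
            s = s.substring(0, s.length() - 1); // accept "256mb" as well as "256m"
        }
        long multiplier = 1L;
        char unit = s.isEmpty() ? ' ' : s.charAt(s.length() - 1);
        switch (unit) {
            case 'k': multiplier = 1L << 10; break;
            case 'm': multiplier = 1L << 20; break;
            case 'g': multiplier = 1L << 30; break;
            case 't': multiplier = 1L << 40; break;
            default: break; // no unit suffix: plain bytes
        }
        if (multiplier != 1L) {
            s = s.substring(0, s.length() - 1); // drop the unit letter
        }
        // multiplyExact throws instead of silently overflowing on huge values
        return Math.multiplyExact(Long.parseLong(s.trim()), multiplier);
    }

    public static void main(String[] args) {
        System.out.println(parseBytes("268435456")); // 268435456
        System.out.println(parseBytes("256m"));      // 268435456
    }
}
```

Writing `256m` instead of hand-computing 268435456 appears to be the convenience the issue is after.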

[jira] [Created] (SPARK-27256) If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.

2019-03-23 Thread liuxian (JIRA)
liuxian created SPARK-27256: --- Summary: If the configuration is used to set the number of bytes, we'd better use `bytesConf`'. Key: SPARK-27256 URL: https://issues.apache.org/jira/browse/SPARK-27256

[jira] [Updated] (SPARK-27238) In the same APP, maybe some hive Parquet(ORC) tables can't use the built-in Parquet(ORC) reader and writer

2019-03-22 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-27238: Description: In the same APP, TableA and TableB are both hive Parquet tables, but TableA can't use the

[jira] [Updated] (SPARK-27238) In the same APP, maybe some hive Parquet(ORC) tables can't use the built-in Parquet(ORC) reader and writer

2019-03-22 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-27238: Summary: In the same APP, maybe some hive Parquet(ORC) tables can't use the built-in Parquet(ORC) reader

[jira] [Updated] (SPARK-27238) In the same APP, maybe some hive parquet tables can't use the built-in Parquet reader and writer

2019-03-21 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-27238: Description: In the same APP, TableA and TableB are both hive parquet tables, but TableA can't use the

[jira] [Created] (SPARK-27238) In the same APP, maybe some hive parquet tables can't use the built-in Parquet reader and writer

2019-03-21 Thread liuxian (JIRA)
liuxian created SPARK-27238: --- Summary: In the same APP, maybe some hive parquet tables can't use the built-in Parquet reader and writer Key: SPARK-27238 URL: https://issues.apache.org/jira/browse/SPARK-27238

[jira] [Created] (SPARK-27173) For hive parquet table,codes()

2019-03-15 Thread liuxian (JIRA)
liuxian created SPARK-27173: --- Summary: For hive parquet table,codes() Key: SPARK-27173 URL: https://issues.apache.org/jira/browse/SPARK-27173 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-27173) For hive parquet table,codes(lz4,brotli,zstd) are not available

2019-03-15 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-27173: Description: From _parquet.hadoop.metadata.CompressionCodecName_ (parquet-hadoop-bundle-1.6.0.jar), we

[jira] [Updated] (SPARK-27173) For hive parquet table,codes(lz4,brotli,zstd) are not available

2019-03-15 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-27173: Description: We can parquet-hadoop-bundle-1.6.0.jar parquet.hadoop.metadata.CompressionCodecName >

[jira] [Updated] (SPARK-27173) For hive parquet table,codes(lz4,brotli,zstd) are not available

2019-03-15 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-27173: Summary: For hive parquet table,codes(lz4,brotli,zstd) are not available (was: For hive parquet

[jira] [Created] (SPARK-27083) Add a config to control subqueryReuse

2019-03-07 Thread liuxian (JIRA)
liuxian created SPARK-27083: --- Summary: Add a config to control subqueryReuse Key: SPARK-27083 URL: https://issues.apache.org/jira/browse/SPARK-27083 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-27056) Remove `start-shuffle-service.sh`

2019-03-05 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-27056: Description: _start-shuffle-service.sh_ was only used by Mesos before _start-mesos-shuffle-service.sh_.

[jira] [Created] (SPARK-27056) Remove `start-shuffle-service.sh`

2019-03-05 Thread liuxian (JIRA)
liuxian created SPARK-27056: --- Summary: Remove `start-shuffle-service.sh` Key: SPARK-27056 URL: https://issues.apache.org/jira/browse/SPARK-27056 Project: Spark Issue Type: Improvement

[jira] [Resolved] (SPARK-25574) Add an option `keepQuotes` for parsing csv file

2019-02-22 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian resolved SPARK-25574. Resolution: Invalid

[jira] [Updated] (SPARK-26353) Add typed aggregate functions(max/min) to the example module

2019-02-15 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-26353: Summary: Add typed aggregate functions(max/min) to the example module (was: Add typed aggregate

[jira] [Updated] (SPARK-26353) Add typed aggregate functions(max/min) to the example module

2019-02-15 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-26353: Description: Add typed aggregate functions(max/min) to the example module. (was: For Dataset API, 

[jira] [Resolved] (SPARK-26793) Remove spark.shuffle.manager

2019-01-31 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian resolved SPARK-26793. Resolution: Invalid

[jira] [Created] (SPARK-26793) Remove spark.shuffle.manager

2019-01-30 Thread liuxian (JIRA)
liuxian created SPARK-26793: --- Summary: Remove spark.shuffle.manager Key: SPARK-26793 URL: https://issues.apache.org/jira/browse/SPARK-26793 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-26780) Improve shuffle read using ReadAheadInputStream

2019-01-29 Thread liuxian (JIRA)
liuxian created SPARK-26780: --- Summary: Improve shuffle read using ReadAheadInputStream Key: SPARK-26780 URL: https://issues.apache.org/jira/browse/SPARK-26780 Project: Spark Issue Type:

[jira] [Updated] (SPARK-23516) I think it is unnecessary to transfer unroll memory to storage memory

2019-01-28 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-23516: Description: Now `StaticMemoryManager` mode has been removed. And for `UnifiedMemoryManager`, unroll

[jira] [Reopened] (SPARK-23516) I think it is unnecessary to transfer unroll memory to storage memory

2019-01-28 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian reopened SPARK-23516.

[jira] [Updated] (SPARK-23516) I think it is unnecessary to transfer unroll memory to storage memory

2019-01-28 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-23516: Description: Now _StaticMemoryManager_ mode has been removed. And for _UnifiedMemoryManager_, unroll

[jira] [Updated] (SPARK-23516) I think it is unnecessary to transfer unroll memory to storage memory

2019-01-28 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-23516: Affects Version/s: 3.0.0 (was: 2.3.0)

[jira] [Created] (SPARK-26621) Use ConfigEntry for hardcoded configs for shuffle categories.

2019-01-15 Thread liuxian (JIRA)
liuxian created SPARK-26621: --- Summary: Use ConfigEntry for hardcoded configs for shuffle categories. Key: SPARK-26621 URL: https://issues.apache.org/jira/browse/SPARK-26621 Project: Spark Issue

[jira] [Created] (SPARK-26353) Add typed aggregate functions:max&

2018-12-12 Thread liuxian (JIRA)
liuxian created SPARK-26353: --- Summary: Add typed aggregate functions:max& Key: SPARK-26353 URL: https://issues.apache.org/jira/browse/SPARK-26353 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-26300) The `checkForStreaming` mothod may be called twice in `createQuery`

2018-12-06 Thread liuxian (JIRA)
liuxian created SPARK-26300: --- Summary: The `checkForStreaming` mothod may be called twice in `createQuery` Key: SPARK-26300 URL: https://issues.apache.org/jira/browse/SPARK-26300 Project: Spark

[jira] [Closed] (SPARK-26264) It is better to add @transient to field 'locs' for class `ResultTask`.

2018-12-04 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian closed SPARK-26264.

[jira] [Resolved] (SPARK-26264) It is better to add @transient to field 'locs' for class `ResultTask`.

2018-12-04 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian resolved SPARK-26264. Resolution: Not A Problem

[jira] [Created] (SPARK-26264) It is better to add @transient to field 'locs' for class `ResultTask`.

2018-12-04 Thread liuxian (JIRA)
liuxian created SPARK-26264: --- Summary: It is better to add @transient to field 'locs' for class `ResultTask`. Key: SPARK-26264 URL: https://issues.apache.org/jira/browse/SPARK-26264 Project: Spark

[jira] [Updated] (SPARK-25729) It is better to replace `minPartitions` with `defaultParallelism` , when `minPartitions` is less than `defaultParallelism`

2018-11-04 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-25729: Description: For ‘WholeTextFileInputFormat’, when `minPartitions` is less than `defaultParallelism`, it

[jira] [Updated] (SPARK-25806) The instanceof FileSplit is redundant for ParquetFileFormat and OrcFileFormat

2018-10-28 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-25806: Summary: The instanceof FileSplit is redundant for ParquetFileFormat and OrcFileFormat (was: The

[jira] [Updated] (SPARK-25806) The instanceof FileSplit is redundant for ParquetFileFormat

2018-10-28 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-25806: Description: The instance of FileSplit is redundant in the

[jira] [Updated] (SPARK-25806) The instanceof FileSplit is redundant for ParquetFileFormat

2018-10-23 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-25806: Description: The instance of FileSplit is redundant for buildReaderWithPartitionValues

[jira] [Updated] (SPARK-25806) The instanceof FileSplit is redundant for ParquetFileFormat

2018-10-23 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-25806: Description: The instanceof FileSplit is redundant for buildReaderWithPartitionValues

[jira] [Created] (SPARK-25806) The instanceof FileSplit is redundant for ParquetFileFormat

2018-10-23 Thread liuxian (JIRA)
liuxian created SPARK-25806: --- Summary: The instanceof FileSplit is redundant for ParquetFileFormat Key: SPARK-25806 URL: https://issues.apache.org/jira/browse/SPARK-25806 Project: Spark Issue

[jira] [Updated] (SPARK-25786) If the ByteBuffer.hasArray is false , it will throw UnsupportedOperationException for Kryo

2018-10-19 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-25786: Environment: (was: `deserialize` for kryo, the type of input parameter is

[jira] [Created] (SPARK-25786) If the ByteBuffer.hasArray is false , it will throw UnsupportedOperationException for Kryo

2018-10-19 Thread liuxian (JIRA)
liuxian created SPARK-25786: --- Summary: If the ByteBuffer.hasArray is false , it will throw UnsupportedOperationException for Kryo Key: SPARK-25786 URL: https://issues.apache.org/jira/browse/SPARK-25786
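The failure mode in SPARK-25786 follows from the JDK contract: `ByteBuffer.array()` throws `UnsupportedOperationException` when the buffer has no accessible backing array, which is the case for a direct buffer (`hasArray()` returns `false`). A minimal defensive sketch, assuming the caller only needs the readable bytes as a `byte[]` (this is not the change proposed in the issue), is to fall back to a bulk `get` copy; `toBytes` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BufferBytes {
    // Copy the readable bytes of any ByteBuffer into a byte[] without calling
    // array() on buffers that have no accessible backing array.
    static byte[] toBytes(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate(); // leaves the caller's position/limit untouched
        if (view.hasArray()) {
            int start = view.arrayOffset() + view.position();
            return Arrays.copyOfRange(view.array(), start, start + view.remaining());
        }
        byte[] out = new byte[view.remaining()];
        view.get(out); // bulk copy works for direct and read-only buffers alike
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(3);
        direct.put(new byte[] {1, 2, 3}).flip();
        System.out.println(direct.hasArray()); // false
        try {
            direct.array();
        } catch (UnsupportedOperationException e) {
            System.out.println("array() failed: " + e.getClass().getSimpleName());
        }
        System.out.println(Arrays.toString(toBytes(direct))); // [1, 2, 3]
    }
}
```

The copy costs one allocation for direct buffers; heap-backed buffers still take the cheaper `array()` path.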

[jira] [Created] (SPARK-25780) Scheduling the tasks which have no higher level locality first

2018-10-19 Thread liuxian (JIRA)
liuxian created SPARK-25780: --- Summary: Scheduling the tasks which have no higher level locality first Key: SPARK-25780 URL: https://issues.apache.org/jira/browse/SPARK-25780 Project: Spark Issue

[jira] [Updated] (SPARK-25776) The disk write buffer size must be greater than 12.

2018-10-18 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-25776: Description: In `UnsafeSorterSpillWriter.java`, when we write a record to a

[jira] [Created] (SPARK-25776) The disk write buffer size must be greater than 12.

2018-10-18 Thread liuxian (JIRA)
liuxian created SPARK-25776: --- Summary: The disk write buffer size must be greater than 12. Key: SPARK-25776 URL: https://issues.apache.org/jira/browse/SPARK-25776 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-25753) binaryFiles broken for small files

2018-10-16 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-25753: Description: `StreamFileInputFormat` and

[jira] [Created] (SPARK-25753) binaryFiles broken for small files

2018-10-16 Thread liuxian (JIRA)
liuxian created SPARK-25753: --- Summary: binaryFiles broken for small files Key: SPARK-25753 URL: https://issues.apache.org/jira/browse/SPARK-25753 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-25729) It is better to replace `minPartitions` with `defaultParallelism` , when `minPartitions` is less than `defaultParallelism`

2018-10-16 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-25729: Description: In ‘WholeTextFileRDD’, when `minPartitions` is less than `defaultParallelism`, it is better

[jira] [Created] (SPARK-25729) It is better to replace `minPartitions` with `defaultParallelism` , when `minPartitions` is less than `defaultParallelism`

2018-10-15 Thread liuxian (JIRA)
liuxian created SPARK-25729: --- Summary: It is better to replace `minPartitions` with `defaultParallelism` , when `minPartitions` is less than `defaultParallelism` Key: SPARK-25729 URL:
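The rule in the SPARK-25729 summary amounts to using the larger of `minPartitions` and `defaultParallelism`. The sketch below is hypothetical and simplified: it assumes the whole-file input format caps each split at ceil(totalLength / partitions) bytes, and `effectivePartitions`/`maxSplitSize` are illustrative names, not Spark methods. With a 1000-byte input, `minPartitions = 2` and `defaultParallelism = 8`, the cap drops from 500 to 125 bytes, so the input can be divided into more, smaller splits.

```java
public class SplitSizing {
    // Raise a too-small minPartitions to defaultParallelism (never below 1).
    static int effectivePartitions(int minPartitions, int defaultParallelism) {
        return Math.max(1, Math.max(minPartitions, defaultParallelism));
    }

    // ceil(totalLength / partitions) using integer arithmetic only.
    static long maxSplitSize(long totalLength, int partitions) {
        return (totalLength + partitions - 1) / partitions;
    }

    public static void main(String[] args) {
        long totalLength = 1000;
        int minPartitions = 2;
        int defaultParallelism = 8;
        System.out.println(maxSplitSize(totalLength, minPartitions)); // 500
        System.out.println(maxSplitSize(totalLength,
                effectivePartitions(minPartitions, defaultParallelism))); // 125
    }
}
```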

[jira] [Updated] (SPARK-25674) If the records are incremented by more than 1 at a time,the number of bytes might rarely ever get updated

2018-10-08 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-25674: Priority: Minor (was: Trivial)

[jira] [Created] (SPARK-25674) If the records are incremented by more than 1 at a time,the number of bytes might rarely ever get updated

2018-10-07 Thread liuxian (JIRA)
liuxian created SPARK-25674: --- Summary: If the records are incremented by more than 1 at a time,the number of bytes might rarely ever get updated Key: SPARK-25674 URL: https://issues.apache.org/jira/browse/SPARK-25674
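The SPARK-25674 summary describes a common pitfall: if a byte-count metric is refreshed only when the record counter is an exact multiple of some interval, a counter that advances in batches can step over every multiple. The sketch below assumes a hypothetical interval of 1000 records and compares that modulo check with a threshold check that fires whenever the next interval boundary is reached; neither method is Spark's code.

```java
import java.util.Arrays;

public class MetricRefresh {
    static final long INTERVAL = 1000; // hypothetical refresh interval, in records

    // Refresh only when the running total lands exactly on a multiple of INTERVAL.
    static int refreshesWithModulo(long[] increments) {
        long records = 0;
        int refreshes = 0;
        for (long inc : increments) {
            records += inc;
            if (records % INTERVAL == 0) {
                refreshes++;
            }
        }
        return refreshes;
    }

    // Refresh whenever the running total reaches or passes the next boundary.
    static int refreshesWithThreshold(long[] increments) {
        long records = 0;
        long nextRefreshAt = INTERVAL;
        int refreshes = 0;
        for (long inc : increments) {
            records += inc;
            if (records >= nextRefreshAt) {
                refreshes++;
                nextRefreshAt = (records / INTERVAL + 1) * INTERVAL;
            }
        }
        return refreshes;
    }

    public static void main(String[] args) {
        long[] batches = new long[10];
        Arrays.fill(batches, 4096L); // ten batches of 4096 records each
        System.out.println(refreshesWithModulo(batches));    // 0
        System.out.println(refreshesWithThreshold(batches)); // 10
    }
}
```

With single-record increments both checks agree; they only diverge once the increment is larger than 1, which matches the condition in the issue title.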

[jira] [Updated] (SPARK-25574) Add an option `keepQuotes` for parsing csv file

2018-09-29 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-25574: Description: In our project, when we read the CSV file, we hope to keep quotes. For example: We have

[jira] [Created] (SPARK-25574) Add an option `keepQuotes` for parsing csv file

2018-09-29 Thread liuxian (JIRA)
liuxian created SPARK-25574: --- Summary: Add an option `keepQuotes` for parsing csv file Key: SPARK-25574 URL: https://issues.apache.org/jira/browse/SPARK-25574 Project: Spark Issue Type:

[jira] [Updated] (SPARK-25366) Zstd and brotli CompressionCodec are not supported for parquet files

2018-09-09 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-25366: Summary: Zstd and brotli CompressionCodec are not supported for parquet files (was: Zstd and brotil

[jira] [Created] (SPARK-25366) Zstd and brotil CompressionCodec are not supported for parquet files

2018-09-07 Thread liuxian (JIRA)
liuxian created SPARK-25366: --- Summary: Zstd and brotil CompressionCodec are not supported for parquet files Key: SPARK-25366 URL: https://issues.apache.org/jira/browse/SPARK-25366 Project: Spark

[jira] [Resolved] (SPARK-25356) Add Parquet block size (row group size) option to SparkSQL configuration

2018-09-06 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian resolved SPARK-25356. Resolution: Invalid

[jira] [Created] (SPARK-25356) Add Parquet block size (row group size) option to SparkSQL configuration

2018-09-06 Thread liuxian (JIRA)
liuxian created SPARK-25356: --- Summary: Add Parquet block size (row group size) option to SparkSQL configuration Key: SPARK-25356 URL: https://issues.apache.org/jira/browse/SPARK-25356 Project: Spark

[jira] [Created] (SPARK-25300) Unified the configuration parameter `spark.shuffle.service.enabled`

2018-08-31 Thread liuxian (JIRA)
liuxian created SPARK-25300: --- Summary: Unified the configuration parameter `spark.shuffle.service.enabled` Key: SPARK-25300 URL: https://issues.apache.org/jira/browse/SPARK-25300 Project: Spark

[jira] [Created] (SPARK-25249) Add a unit test for OpenHashMap

2018-08-27 Thread liuxian (JIRA)
liuxian created SPARK-25249: --- Summary: Add a unit test for OpenHashMap Key: SPARK-25249 URL: https://issues.apache.org/jira/browse/SPARK-25249 Project: Spark Issue Type: Test Components:

[jira] [Updated] (SPARK-25166) Reduce the number of write operations for shuffle write.

2018-08-20 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-25166: Description: Currently, only one record is written to a buffer each time, which increases the number of

[jira] [Created] (SPARK-25166) Reduce the number of write operations for shuffle write.

2018-08-20 Thread liuxian (JIRA)
liuxian created SPARK-25166: --- Summary: Reduce the number of write operations for shuffle write. Key: SPARK-25166 URL: https://issues.apache.org/jira/browse/SPARK-25166 Project: Spark Issue Type:

[jira] [Created] (SPARK-24994) When the data type of the field is converted to other types, it can also support pushdown to parquet

2018-08-01 Thread liuxian (JIRA)
liuxian created SPARK-24994: --- Summary: When the data type of the field is converted to other types, it can also support pushdown to parquet Key: SPARK-24994 URL: https://issues.apache.org/jira/browse/SPARK-24994

[jira] [Comment Edited] (SPARK-23989) When using `SortShuffleWriter`, the data will be overwritten

2018-04-18 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442176#comment-16442176 ] liuxian edited comment on SPARK-23989 at 4/18/18 9:21 AM:

[jira] [Commented] (SPARK-23989) When using `SortShuffleWriter`, the data will be overwritten

2018-04-18 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442176#comment-16442176 ] liuxian commented on SPARK-23989: test("groupBy") {

[jira] [Updated] (SPARK-23989) When using `SortShuffleWriter`, the data will be overwritten

2018-04-18 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-23989: Attachment: (was: 无标题2.png ("Untitled 2"))

[jira] [Commented] (SPARK-23989) When using `SortShuffleWriter`, the data will be overwritten

2018-04-18 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441982#comment-16441982 ] liuxian commented on SPARK-23989: We assume that: numPartitions >

[jira] [Commented] (SPARK-23989) When using `SortShuffleWriter`, the data will be overwritten

2018-04-18 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441980#comment-16441980 ] liuxian commented on SPARK-23989: I think 'SortShuffleWriter'

[jira] [Comment Edited] (SPARK-23989) When using `SortShuffleWriter`, the data will be overwritten

2018-04-18 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441952#comment-16441952 ] liuxian edited comment on SPARK-23989 at 4/18/18 6:21 AM: 1. Make

[jira] [Commented] (SPARK-23989) When using `SortShuffleWriter`, the data will be overwritten

2018-04-18 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441952#comment-16441952 ] liuxian commented on SPARK-23989: 1. Make 'BypassMergeSortShuffleHandle' and 'SerializedShuffleHandle'

[jira] [Updated] (SPARK-23989) When using `SortShuffleWriter`, the data will be overwritten

2018-04-18 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-23989: Attachment: 无标题2.png ("Untitled 2")

[jira] [Commented] (SPARK-23989) When using `SortShuffleWriter`, the data will be overwritten

2018-04-16 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16440250#comment-16440250 ] liuxian commented on SPARK-23989: If we make 'BypassMergeSortShuffleHandle' and

[jira] [Created] (SPARK-23992) ShuffleDependency does not need to be deserialized every time

2018-04-16 Thread liuxian (JIRA)
liuxian created SPARK-23992: --- Summary: ShuffleDependency does not need to be deserialized every time Key: SPARK-23992 URL: https://issues.apache.org/jira/browse/SPARK-23992 Project: Spark Issue

[jira] [Comment Edited] (SPARK-23989) When using `SortShuffleWriter`, the data will be overwritten

2018-04-16 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439231#comment-16439231 ] liuxian edited comment on SPARK-23989 at 4/16/18 10:18 AM: For

[jira] [Commented] (SPARK-23989) When using `SortShuffleWriter`, the data will be overwritten

2018-04-16 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439231#comment-16439231 ] liuxian commented on SPARK-23989: For `SortShuffleWriter`, `records:

[jira] [Comment Edited] (SPARK-23989) When using `SortShuffleWriter`, the data will be overwritten

2018-04-16 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439148#comment-16439148 ] liuxian edited comment on SPARK-23989 at 4/16/18 9:00 AM: [~joshrosen]

[jira] [Commented] (SPARK-23989) When using `SortShuffleWriter`, the data will be overwritten

2018-04-16 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16439148#comment-16439148 ] liuxian commented on SPARK-23989: [~joshrosen]

[jira] [Created] (SPARK-23989) When using `SortShuffleWriter`, the data will be overwritten

2018-04-16 Thread liuxian (JIRA)
liuxian created SPARK-23989: --- Summary: When using `SortShuffleWriter`, the data will be overwritten Key: SPARK-23989 URL: https://issues.apache.org/jira/browse/SPARK-23989 Project: Spark Issue

[jira] [Updated] (SPARK-23989) When using `SortShuffleWriter`, the data will be overwritten

2018-04-16 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-23989: Description: When using `SortShuffleWriter`, we only insert

[jira] [Updated] (SPARK-23744) Memory leak in ReadableChannelFileRegion

2018-03-19 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-23744: Description: In the class _ReadableChannelFileRegion_, the _buffer_ is direct memory, we should modify

[jira] [Created] (SPARK-23744) Memory leak in ReadableChannelFileRegion

2018-03-19 Thread liuxian (JIRA)
liuxian created SPARK-23744: --- Summary: Memory leak in ReadableChannelFileRegion Key: SPARK-23744 URL: https://issues.apache.org/jira/browse/SPARK-23744 Project: Spark Issue Type: Bug

[jira] [Resolved] (SPARK-23651) Add a check for host name

2018-03-15 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian resolved SPARK-23651. Resolution: Fixed

[jira] [Created] (SPARK-23651) Add a check for host name

2018-03-12 Thread liuxian (JIRA)
liuxian created SPARK-23651: --- Summary: Add a check for host name Key: SPARK-23651 URL: https://issues.apache.org/jira/browse/SPARK-23651 Project: Spark Issue Type: Improvement

[jira] [Resolved] (SPARK-23516) I think it is unnecessary to transfer unroll memory to storage memory

2018-03-05 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian resolved SPARK-23516. Resolution: Invalid

[jira] [Updated] (SPARK-23532) [STANDALONE] Improve data locality when launching new executors for dynamic allocation

2018-02-27 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-23532: Description: Currently Spark on Yarn supports better data locality by considering the preferred locations

[jira] [Created] (SPARK-23532) [STANDALONE] Improve data locality when launching new executors for dynamic allocation

2018-02-27 Thread liuxian (JIRA)
liuxian created SPARK-23532: --- Summary: [STANDALONE] Improve data locality when launching new executors for dynamic allocation Key: SPARK-23532 URL: https://issues.apache.org/jira/browse/SPARK-23532

[jira] [Created] (SPARK-23516) I think it is unnecessary to transfer unroll memory to storage memory

2018-02-26 Thread liuxian (JIRA)
liuxian created SPARK-23516: --- Summary: I think it is unnecessary to transfer unroll memory to storage memory Key: SPARK-23516 URL: https://issues.apache.org/jira/browse/SPARK-23516 Project: Spark

[jira] [Resolved] (SPARK-23404) When the underlying buffers are already direct, we should copy them to the heap memory

2018-02-21 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian resolved SPARK-23404. Resolution: Invalid

[jira] [Updated] (SPARK-23404) When the underlying buffers are already direct, we should copy them to the heap memory

2018-02-12 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-23404: Description: If the memory mode is _ON_HEAP_, when the underlying buffers are direct, we should copy them

[jira] [Updated] (SPARK-23404) When the underlying buffers are already direct, we should copy them to the heap memory

2018-02-12 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-23404: Summary: When the underlying buffers are already direct, we should copy them to the heap memory (was:

[jira] [Created] (SPARK-23404) When the underlying buffers are already direct, we should copy it to the heap memory

2018-02-12 Thread liuxian (JIRA)
liuxian created SPARK-23404: --- Summary: When the underlying buffers are already direct, we should copy it to the heap memory Key: SPARK-23404 URL: https://issues.apache.org/jira/browse/SPARK-23404 Project:

[jira] [Updated] (SPARK-23391) It may lead to overflow for some integer multiplication

2018-02-11 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-23391: Priority: Minor (was: Major)

[jira] [Updated] (SPARK-23391) It may lead to overflow for some integer multiplication

2018-02-11 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-23391: Description: In `getBlockData`, `blockId.reduceId` is of type `Int`; when it is greater than

[jira] [Created] (SPARK-23391) It may lead to overflow for some integer multiplication

2018-02-11 Thread liuxian (JIRA)
liuxian created SPARK-23391: --- Summary: It may lead to overflow for some integer multiplication Key: SPARK-23391 URL: https://issues.apache.org/jira/browse/SPARK-23391 Project: Spark Issue Type:

[jira] [Created] (SPARK-23389) When the shuffle dependency specifies aggregation ,and `dependency.mapSideCombine=false`, we should be able to use serialized sorting.

2018-02-11 Thread liuxian (JIRA)
liuxian created SPARK-23389: --- Summary: When the shuffle dependency specifies aggregation ,and `dependency.mapSideCombine=false`, we should be able to use serialized sorting. Key: SPARK-23389 URL:

[jira] [Updated] (SPARK-23358) When the number of partitions is greater than 2^28, it will result in an error result

2018-02-08 Thread liuxian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian updated SPARK-23358: Description: In `checkIndexAndDataFile`, _blocks_ is of type _Int_; when it is greater than
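The 2^28 bound in the SPARK-23358 summary is consistent with an offset computed as 8 bytes per entry: 2^28 × 8 = 2^31, which is one more than `Integer.MAX_VALUE` (2^31 - 1), so an `Int` product wraps to a negative value. The same concern applies to the `getBlockData` case in SPARK-23391. The sketch below assumes such a layout (one 8-byte long offset per block) purely for illustration; `offsetInt` and `offsetLong` are hypothetical names, not Spark's code.

```java
public class IndexOffset {
    // Assumed layout for illustration: one 8-byte long offset per block.
    static final int BYTES_PER_OFFSET = 8;

    // Int arithmetic: wraps around once blockId * 8 exceeds Integer.MAX_VALUE.
    static int offsetInt(int blockId) {
        return blockId * BYTES_PER_OFFSET;
    }

    // Widening one operand to long before multiplying keeps the product exact.
    static long offsetLong(int blockId) {
        return blockId * (long) BYTES_PER_OFFSET;
    }

    public static void main(String[] args) {
        int blockId = 1 << 28; // 2^28
        System.out.println(offsetInt(blockId));  // -2147483648
        System.out.println(offsetLong(blockId)); // 2147483648
    }
}
```

Widening before the multiplication (rather than casting the result) is what matters; `Math.multiplyExact` is an alternative when an overflow should fail loudly instead.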
