Gengliang Wang created SPARK-21259:
--
Summary: More rules for scalastyle
Key: SPARK-21259
URL: https://issues.apache.org/jira/browse/SPARK-21259
Project: Spark
Issue Type: Improvement
Gengliang Wang created SPARK-21323:
--
Summary: Rename sql.catalyst.plans.logical.statsEstimation.Range
to ValueInterval
Key: SPARK-21323
URL: https://issues.apache.org/jira/browse/SPARK-21323
Gengliang Wang created SPARK-21336:
--
Summary: Revise rand comparison in BatchEvalPythonExecSuite
Key: SPARK-21336
URL: https://issues.apache.org/jira/browse/SPARK-21336
Project: Spark
Issue
Gengliang Wang created SPARK-21174:
--
Summary: Validate sampling fraction in logical operator level
Key: SPARK-21174
URL: https://issues.apache.org/jira/browse/SPARK-21174
Project: Spark
Gengliang Wang created SPARK-21196:
--
Summary: Split codegen info of query plan into sequence
Key: SPARK-21196
URL: https://issues.apache.org/jira/browse/SPARK-21196
Project: Spark
Issue
Gengliang Wang created SPARK-21222:
--
Summary: Move elimination of Distinct clause from analyzer to
optimizer
Key: SPARK-21222
URL: https://issues.apache.org/jira/browse/SPARK-21222
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-21222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065791#comment-16065791
]
Gengliang Wang commented on SPARK-21222:
[~srowen] thanks! I have corrected the statement.
>
[
https://issues.apache.org/jira/browse/SPARK-21222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-21222:
---
Description:
Move elimination of Distinct clause from analyzer to optimizer
Distinct clause
Gengliang Wang created SPARK-22037:
--
Summary: Collapse Project if it is the child of Aggregate
Key: SPARK-22037
URL: https://issues.apache.org/jira/browse/SPARK-22037
Project: Spark
Issue
Gengliang Wang created SPARK-22263:
--
Summary: Refactor deterministic as lazy value
Key: SPARK-22263
URL: https://issues.apache.org/jira/browse/SPARK-22263
Project: Spark
Issue Type:
Gengliang Wang created SPARK-21979:
--
Summary: Improve QueryPlanConstraints framework
Key: SPARK-21979
URL: https://issues.apache.org/jira/browse/SPARK-21979
Project: Spark
Issue Type: Bug
Gengliang Wang created SPARK-21848:
--
Summary: Create trait to identify user-defined functions
Key: SPARK-21848
URL: https://issues.apache.org/jira/browse/SPARK-21848
Project: Spark
Issue
Gengliang Wang created SPARK-22257:
--
Summary: Reserve all non-deterministic expressions in
ExpressionSet.
Key: SPARK-22257
URL: https://issues.apache.org/jira/browse/SPARK-22257
Project: Spark
Gengliang Wang created SPARK-22141:
--
Summary: Propagate empty relation before checking Cartesian
products
Key: SPARK-22141
URL: https://issues.apache.org/jira/browse/SPARK-22141
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-22141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-22141:
---
Description:
When inferring constraints from children, Join's condition can be simplified as
[
https://issues.apache.org/jira/browse/SPARK-22615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-22615:
---
Description:
Currently, in the optimize rule `PropagateEmptyRelation`, the following cases
Gengliang Wang created SPARK-22615:
--
Summary: Handle more cases in PropagateEmptyRelation
Key: SPARK-22615
URL: https://issues.apache.org/jira/browse/SPARK-22615
Project: Spark
Issue Type:
Gengliang Wang created SPARK-22763:
--
Summary: SHS: Ignore unknown events and parse through the file
Key: SPARK-22763
URL: https://issues.apache.org/jira/browse/SPARK-22763
Project: Spark
Gengliang Wang created SPARK-22834:
--
Summary: Make insert commands have real children to fix UI issues
Key: SPARK-22834
URL: https://issues.apache.org/jira/browse/SPARK-22834
Project: Spark
Gengliang Wang created SPARK-22559:
--
Summary: history server: handle exception on opening corrupted
listing.ldb
Key: SPARK-22559
URL: https://issues.apache.org/jira/browse/SPARK-22559
Project: Spark
Gengliang Wang created SPARK-22719:
--
Summary: refactor ConstantPropagation
Key: SPARK-22719
URL: https://issues.apache.org/jira/browse/SPARK-22719
Project: Spark
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/SPARK-22719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-22719:
---
Description:
The current time complexity of ConstantPropagation is O(n^2), which can be slow
Gengliang Wang created SPARK-24275:
--
Summary: Revise doc comments in InputPartition
Key: SPARK-24275
URL: https://issues.apache.org/jira/browse/SPARK-24275
Project: Spark
Issue Type:
Gengliang Wang created SPARK-24277:
--
Summary: Code clean up in SQL module:
HadoopMapReduceCommitProtocol/FileFormatWriter
Key: SPARK-24277
URL: https://issues.apache.org/jira/browse/SPARK-24277
[
https://issues.apache.org/jira/browse/SPARK-24330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-24330:
---
Description:
Refactor ExecuteWriteTask in FileFormatWriter to reduce common logic and
Gengliang Wang created SPARK-24330:
--
Summary: Refactor ExecuteWriteTask in FileFormatWriter with
DataWriter(V2)
Key: SPARK-24330
URL: https://issues.apache.org/jira/browse/SPARK-24330
Project: Spark
Gengliang Wang created SPARK-24365:
--
Summary: Add Parquet write benchmark
Key: SPARK-24365
URL: https://issues.apache.org/jira/browse/SPARK-24365
Project: Spark
Issue Type: Improvement
Gengliang Wang created SPARK-24367:
--
Summary: Parquet: use JOB_SUMMARY_LEVEL instead of deprecated flag
ENABLE_JOB_SUMMARY
Key: SPARK-24367
URL: https://issues.apache.org/jira/browse/SPARK-24367
Gengliang Wang created SPARK-24524:
--
Summary: Improve aggregateMetrics: less memory usage and loops
Key: SPARK-24524
URL: https://issues.apache.org/jira/browse/SPARK-24524
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-24365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-24365:
---
Description: Add data source write benchmark. So that it would be easier to
measure the
[
https://issues.apache.org/jira/browse/SPARK-24365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-24365:
---
Summary: Add data source write benchmark (was: Add Parquet write benchmark)
> Add data
Gengliang Wang created SPARK-23005:
--
Summary: Improve RDD.take on small number of partitions
Key: SPARK-23005
URL: https://issues.apache.org/jira/browse/SPARK-23005
Project: Spark
Issue
Gengliang Wang created SPARK-22990:
--
Summary: Fix method isFairScheduler in JobsTab and StagesTab
Key: SPARK-22990
URL: https://issues.apache.org/jira/browse/SPARK-22990
Project: Spark
Gengliang Wang created SPARK-23079:
--
Summary: Fix query constraints propagation with aliases
Key: SPARK-23079
URL: https://issues.apache.org/jira/browse/SPARK-23079
Project: Spark
Issue
Gengliang Wang created SPARK-23219:
--
Summary: Rename ReadTask to DataReaderFactory
Key: SPARK-23219
URL: https://issues.apache.org/jira/browse/SPARK-23219
Project: Spark
Issue Type:
Gengliang Wang created SPARK-23202:
--
Summary: Break down DataSourceV2Writer.commit into two phases
Key: SPARK-23202
URL: https://issues.apache.org/jira/browse/SPARK-23202
Project: Spark
Gengliang Wang created SPARK-23268:
--
Summary: Reorganize packages in data source V2
Key: SPARK-23268
URL: https://issues.apache.org/jira/browse/SPARK-23268
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-23202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-23202:
---
Description:
The current DataSourceWriter API makes it hard to implement
[
https://issues.apache.org/jira/browse/SPARK-23202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-23202:
---
Summary: Add new API in DataSourceWriter: onDataWriterCommit (was: Break
down
[
https://issues.apache.org/jira/browse/SPARK-23202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-23202:
---
Affects Version/s: (was: 2.2.1)
2.3.0
> Break down
Gengliang Wang created SPARK-23490:
--
Summary: Check storage.locationUri with existing table in
CreateTable
Key: SPARK-23490
URL: https://issues.apache.org/jira/browse/SPARK-23490
Project: Spark
Gengliang Wang created SPARK-23507:
--
Summary: Migrate file-based data sources to data source v2
Key: SPARK-23507
URL: https://issues.apache.org/jira/browse/SPARK-23507
Project: Spark
Issue
Gengliang Wang created SPARK-25002:
--
Summary: Avro: revise the output namespace
Key: SPARK-25002
URL: https://issues.apache.org/jira/browse/SPARK-25002
Project: Spark
Issue Type: Sub-task
[
https://issues.apache.org/jira/browse/SPARK-25002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-25002:
---
Summary: Avro: revise the output record namespace (was: Avro: revise the
output namespace)
Gengliang Wang created SPARK-25104:
--
Summary: Validate user specified output schema
Key: SPARK-25104
URL: https://issues.apache.org/jira/browse/SPARK-25104
Project: Spark
Issue Type:
Gengliang Wang created SPARK-25129:
--
Summary: Revert mapping of com.databricks.spark.avro
Key: SPARK-25129
URL: https://issues.apache.org/jira/browse/SPARK-25129
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581929#comment-16581929
]
Gengliang Wang commented on SPARK-24924:
As package "org.apache.spark.sql.avro" is external
[
https://issues.apache.org/jira/browse/SPARK-23817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582005#comment-16582005
]
Gengliang Wang commented on SPARK-23817:
[~dongjoon] Thanks! This issue is still open.
>
Gengliang Wang created SPARK-25133:
--
Summary: Documentation: AVRO data source guide
Key: SPARK-25133
URL: https://issues.apache.org/jira/browse/SPARK-25133
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583483#comment-16583483
]
Gengliang Wang commented on SPARK-24924:
[~dongjoon] I see. I am now +1 with adding new
[
https://issues.apache.org/jira/browse/SPARK-25129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-25129:
---
Description:
In https://issues.apache.org/jira/browse/SPARK-24924, the data source provider
Gengliang Wang created SPARK-25099:
--
Summary: Generate Avro Binary files in test suite
Key: SPARK-25099
URL: https://issues.apache.org/jira/browse/SPARK-25099
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-25099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-25099:
---
Description:
In PR [https://github.com/apache/spark/pull/21984] and
[
https://issues.apache.org/jira/browse/SPARK-24774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-24774:
---
Summary: support reading AVRO logical types - Decimal (was: support
reading AVRO logical
[
https://issues.apache.org/jira/browse/SPARK-24772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-24772:
---
Summary: support reading AVRO logical types - Date (was: support reading
AVRO logical
Gengliang Wang created SPARK-25160:
--
Summary: Remove sql configuration
spark.sql.avro.outputTimestampType
Key: SPARK-25160
URL: https://issues.apache.org/jira/browse/SPARK-25160
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-25129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang resolved SPARK-25129.
Resolution: Fixed
> Make the mapping of com.databricks.spark.avro to built-in module
Gengliang Wang created SPARK-24876:
--
Summary: Remove SerializableSchema and use json format string
schema
Key: SPARK-24876
URL: https://issues.apache.org/jira/browse/SPARK-24876
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-24876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-24876:
---
Summary: Simplify schema serialization (was: Remove SerializableSchema and
use json format
[
https://issues.apache.org/jira/browse/SPARK-24770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang resolved SPARK-24770.
Resolution: Duplicate
The function `from_avro` and `to_avro` can be added in one PR:
#
[
https://issues.apache.org/jira/browse/SPARK-24770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-24770:
---
Comment: was deleted
(was: The function `from_avro` and `to_avro` can be added in one PR:
[
https://issues.apache.org/jira/browse/SPARK-24769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang resolved SPARK-24769.
Resolution: Duplicate
> Support for parsing AVRO binary column
>
[
https://issues.apache.org/jira/browse/SPARK-24769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544636#comment-16544636
]
Gengliang Wang commented on SPARK-24769:
The function `from_avro` and `to_avro` can be added in
[
https://issues.apache.org/jira/browse/SPARK-24770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544637#comment-16544637
]
Gengliang Wang commented on SPARK-24770:
The function `from_avro` and `to_avro` can be added in
[
https://issues.apache.org/jira/browse/SPARK-24770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544639#comment-16544639
]
Gengliang Wang commented on SPARK-24770:
[~felipesmmelo] Thank you. But I have created a PR:
[
https://issues.apache.org/jira/browse/SPARK-24769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544640#comment-16544640
]
Gengliang Wang commented on SPARK-24769:
[~felipesmmelo] Thank you. But I have created a PR:
Gengliang Wang created SPARK-24811:
--
Summary: Add function `from_avro` and `to_avro`
Key: SPARK-24811
URL: https://issues.apache.org/jira/browse/SPARK-24811
Project: Spark
Issue Type:
Gengliang Wang created SPARK-24883:
--
Summary: Remove implicit class
AvroDataFrameWriter/AvroDataFrameReader
Key: SPARK-24883
URL: https://issues.apache.org/jira/browse/SPARK-24883
Project: Spark
Gengliang Wang created SPARK-24887:
--
Summary: Use SerializableConfiguration in Spark util
Key: SPARK-24887
URL: https://issues.apache.org/jira/browse/SPARK-24887
Project: Spark
Issue Type:
Gengliang Wang created SPARK-24858:
--
Summary: Avoid unnecessary parquet footer reads
Key: SPARK-24858
URL: https://issues.apache.org/jira/browse/SPARK-24858
Project: Spark
Issue Type:
Gengliang Wang created SPARK-24919:
--
Summary: Scala linter rule for sparkContext.hadoopConfiguration
Key: SPARK-24919
URL: https://issues.apache.org/jira/browse/SPARK-24919
Project: Spark
Gengliang Wang created SPARK-25305:
--
Summary: Respect attribute name in `CollapseProject`
Key: SPARK-25305
URL: https://issues.apache.org/jira/browse/SPARK-25305
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-25305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-25305:
---
Summary: Respect attribute name in `CollapseProject` and `ColumnPruning`
(was: Respect
[
https://issues.apache.org/jira/browse/SPARK-25305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-25305:
---
Description:
Currently in optimizer rule `CollapseProject`, the lower level project is
[
https://issues.apache.org/jira/browse/SPARK-24771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16605308#comment-16605308
]
Gengliang Wang commented on SPARK-24771:
[~vanzin] I am OK with either way. Shading Avro 1.8 in
Gengliang Wang created SPARK-24768:
--
Summary: Have a built-in AVRO data source implementation
Key: SPARK-24768
URL: https://issues.apache.org/jira/browse/SPARK-24768
Project: Spark
Issue
Gengliang Wang created SPARK-24771:
--
Summary: Upgrade AVRO version from 1.7.7 to 1.8
Key: SPARK-24771
URL: https://issues.apache.org/jira/browse/SPARK-24771
Project: Spark
Issue Type:
Gengliang Wang created SPARK-24769:
--
Summary: Support for parsing AVRO string column
Key: SPARK-24769
URL: https://issues.apache.org/jira/browse/SPARK-24769
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-24768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-24768:
---
Attachment: Design doc-Spark Avro.pdf
> Have a built-in AVRO data source implementation
>
Gengliang Wang created SPARK-24772:
--
Summary: support reading AVRO logical types - Decimal
Key: SPARK-24772
URL: https://issues.apache.org/jira/browse/SPARK-24772
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-24776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-24776:
---
Summary: AVRO unit test: use SQLTestUtils and Replace deprecated methods
(was: Improve
[
https://issues.apache.org/jira/browse/SPARK-24768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-24768:
---
Attachment: Built-in AVRO Data Source In Spark 2.4.pdf
> Have a built-in AVRO data source
Gengliang Wang created SPARK-24776:
--
Summary: Improve AVRO unit test: use
Key: SPARK-24776
URL: https://issues.apache.org/jira/browse/SPARK-24776
Project: Spark
Issue Type: Sub-task
[
https://issues.apache.org/jira/browse/SPARK-24768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-24768:
---
Description: Apache Avro (https://avro.apache.org) is a popular data
serialization format.
[
https://issues.apache.org/jira/browse/SPARK-24768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-24768:
---
Attachment: (was: Design doc-Spark Avro.pdf)
> Have a built-in AVRO data source
[
https://issues.apache.org/jira/browse/SPARK-24770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-24770:
---
Summary: Supporting to convert a column into binary of AVRO format (was:
Supporting to
[
https://issues.apache.org/jira/browse/SPARK-24769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gengliang Wang updated SPARK-24769:
---
Summary: Support for parsing AVRO binary column (was: Support for parsing
AVRO string
Gengliang Wang created SPARK-24777:
--
Summary: Refactor AVRO read/write benchmark
Key: SPARK-24777
URL: https://issues.apache.org/jira/browse/SPARK-24777
Project: Spark
Issue Type: Sub-task
Gengliang Wang created SPARK-24770:
--
Summary: Supporting to convert a column into binary of avro format
Key: SPARK-24770
URL: https://issues.apache.org/jira/browse/SPARK-24770
Project: Spark
Gengliang Wang created SPARK-24775:
--
Summary: support reading AVRO logical types - Duration
Key: SPARK-24775
URL: https://issues.apache.org/jira/browse/SPARK-24775
Project: Spark
Issue
Gengliang Wang created SPARK-24774:
--
Summary: support reading AVRO logical types - Time with different
precisions
Key: SPARK-24774
URL: https://issues.apache.org/jira/browse/SPARK-24774
Project:
Gengliang Wang created SPARK-24773:
--
Summary: support reading AVRO logical types - Timestamp with
different precisions
Key: SPARK-24773
URL: https://issues.apache.org/jira/browse/SPARK-24773
Gengliang Wang created SPARK-24792:
--
Summary: Add API `.avro` in DataFrameReader/DataFrameWriter
Key: SPARK-24792
URL: https://issues.apache.org/jira/browse/SPARK-24792
Project: Spark
Issue
Gengliang Wang created SPARK-24800:
--
Summary: Refactor Avro Serializer and Deserializer
Key: SPARK-24800
URL: https://issues.apache.org/jira/browse/SPARK-24800
Project: Spark
Issue Type:
Gengliang Wang created SPARK-23624:
--
Summary: Revise doc of method pushFilters
Key: SPARK-23624
URL: https://issues.apache.org/jira/browse/SPARK-23624
Project: Spark
Issue Type: Improvement
Gengliang Wang created SPARK-23896:
--
Summary: Improve PartitioningAwareFileIndex
Key: SPARK-23896
URL: https://issues.apache.org/jira/browse/SPARK-23896
Project: Spark
Issue Type:
Gengliang Wang created SPARK-24045:
--
Summary: Create base class for file data source v2
Key: SPARK-24045
URL: https://issues.apache.org/jira/browse/SPARK-24045
Project: Spark
Issue Type: