[jira] [Created] (SPARK-31872) NotNullSafe to get complementary set

2020-05-30 Thread Xiaoju Wu (Jira)
Xiaoju Wu created SPARK-31872: - Summary: NotNullSafe to get complementary set Key: SPARK-31872 URL: https://issues.apache.org/jira/browse/SPARK-31872 Project: Spark Issue Type: Improvement

[jira] [Comment Edited] (SPARK-30443) "Managed memory leak detected" even with no calls to take() or limit()

2020-03-26 Thread Xiaoju Wu (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068290#comment-17068290 ] Xiaoju Wu edited comment on SPARK-30443 at 3/27/20, 5:50 AM: - Also see this

[jira] [Commented] (SPARK-30443) "Managed memory leak detected" even with no calls to take() or limit()

2020-03-26 Thread Xiaoju Wu (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068290#comment-17068290 ] Xiaoju Wu commented on SPARK-30443: --- Also see this kind of warning logs. SPARK-21492 may relate to

[jira] [Updated] (SPARK-31069) high cpu caused by chunksBeingTransferred in external shuffle service

2020-03-07 Thread Xiaoju Wu (Jira)
[ https://issues.apache.org/jira/browse/SPARK-31069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoju Wu updated SPARK-31069: -- Description: "shuffle-chunk-fetch-handler-2-40" #250 daemon prio=5 os_prio=0 tid=0x02ac

[jira] [Created] (SPARK-31069) high cpu caused by chunksBeingTransferred in external shuffle service

2020-03-06 Thread Xiaoju Wu (Jira)
Xiaoju Wu created SPARK-31069: - Summary: high cpu caused by chunksBeingTransferred in external shuffle service Key: SPARK-31069 URL: https://issues.apache.org/jira/browse/SPARK-31069 Project: Spark

[jira] [Commented] (SPARK-23811) FetchFailed comes before Success of same task will cause child stage never succeed

2020-01-01 Thread Xiaoju Wu (Jira)
[ https://issues.apache.org/jira/browse/SPARK-23811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17006539#comment-17006539 ] Xiaoju Wu commented on SPARK-23811: --- [~XuanYuan] The issue seems still exist after patch #17955, any

[jira] [Created] (SPARK-30298) bucket join cannot work for self-join with views

2019-12-18 Thread Xiaoju Wu (Jira)
Xiaoju Wu created SPARK-30298: - Summary: bucket join cannot work for self-join with views Key: SPARK-30298 URL: https://issues.apache.org/jira/browse/SPARK-30298 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-30072) Create dedicated planner for subqueries

2019-12-13 Thread Xiaoju Wu (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16995627#comment-16995627 ] Xiaoju Wu commented on SPARK-30072: --- [~cloud_fan] If the sql looks like: SELECT * FROM df2 WHERE

[jira] [Commented] (SPARK-30072) Create dedicated planner for subqueries

2019-12-09 Thread Xiaoju Wu (Jira)
[ https://issues.apache.org/jira/browse/SPARK-30072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991554#comment-16991554 ] Xiaoju Wu commented on SPARK-30072: --- [~afroozeh] I think the change from checking if queryExecution

[jira] [Created] (SPARK-30186) support Dynamic Partition Pruning in Adaptive Execution

2019-12-09 Thread Xiaoju Wu (Jira)
Xiaoju Wu created SPARK-30186: - Summary: support Dynamic Partition Pruning in Adaptive Execution Key: SPARK-30186 URL: https://issues.apache.org/jira/browse/SPARK-30186 Project: Spark Issue

[jira] [Commented] (SPARK-27290) remove unneed sort under Aggregate

2019-06-12 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862698#comment-16862698 ] Xiaoju Wu commented on SPARK-27290: --- [~joshrosen] Got it. I think we should identify in which patterns

[jira] [Created] (SPARK-27431) move HashedRelation to global UnifiedMemoryManager and enable offheap

2019-04-10 Thread Xiaoju Wu (JIRA)
Xiaoju Wu created SPARK-27431: - Summary: move HashedRelation to global UnifiedMemoryManager and enable offheap Key: SPARK-27431 URL: https://issues.apache.org/jira/browse/SPARK-27431 Project: Spark

[jira] [Commented] (SPARK-27290) remove unneed sort under Aggregate

2019-03-28 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803737#comment-16803737 ] Xiaoju Wu commented on SPARK-27290: --- [~ekoifman] HashAggregate can not benefit from sorted input but

[jira] [Created] (SPARK-27290) remove unneed sort under Aggregate

2019-03-27 Thread Xiaoju Wu (JIRA)
Xiaoju Wu created SPARK-27290: - Summary: remove unneed sort under Aggregate Key: SPARK-27290 URL: https://issues.apache.org/jira/browse/SPARK-27290 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-21492) Memory leak in SortMergeJoin

2019-03-18 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794962#comment-16794962 ] Xiaoju Wu commented on SPARK-21492: --- Any updates? Do you have any discussion on the general fix

[jira] [Commented] (SPARK-25837) Web UI does not respect spark.ui.retainedJobs in some instances

2019-03-18 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794932#comment-16794932 ] Xiaoju Wu commented on SPARK-25837: --- Did you verify this fix with the reproduce case above? I tried

[jira] [Commented] (SPARK-23375) Optimizer should remove unneeded Sort

2019-03-18 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794805#comment-16794805 ] Xiaoju Wu commented on SPARK-23375: --- But one of your test cases is conflict with what I talked about

[jira] [Commented] (SPARK-23375) Optimizer should remove unneeded Sort

2019-03-18 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794791#comment-16794791 ] Xiaoju Wu commented on SPARK-23375: --- I think there's another case in which sort is redundant: Sort

[jira] [Created] (SPARK-26779) NullPointerException when disable wholestage codegen

2019-01-29 Thread Xiaoju Wu (JIRA)
Xiaoju Wu created SPARK-26779: - Summary: NullPointerException when disable wholestage codegen Key: SPARK-26779 URL: https://issues.apache.org/jira/browse/SPARK-26779 Project: Spark Issue Type:

[jira] [Commented] (SPARK-23839) consider bucket join in cost-based JoinReorder rule

2018-09-25 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628184#comment-16628184 ] Xiaoju Wu commented on SPARK-23839: --- [~smilegator] Is there any plan on the cost-based optimizer?

[jira] [Created] (SPARK-24088) only HadoopRDD leverage HDFS Cache as preferred location

2018-04-25 Thread Xiaoju Wu (JIRA)
Xiaoju Wu created SPARK-24088: - Summary: only HadoopRDD leverage HDFS Cache as preferred location Key: SPARK-24088 URL: https://issues.apache.org/jira/browse/SPARK-24088 Project: Spark Issue

[jira] [Comment Edited] (SPARK-23839) consider bucket join in cost-based JoinReorder rule

2018-04-02 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1648#comment-1648 ] Xiaoju Wu edited comment on SPARK-23839 at 4/2/18 3:05 PM: --- Yes, bucketing is

[jira] [Commented] (SPARK-23839) consider bucket join in cost-based JoinReorder rule

2018-04-02 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1648#comment-1648 ] Xiaoju Wu commented on SPARK-23839: --- Yes, bucketing is one of the cases to say that the cost of

[jira] [Commented] (SPARK-23839) consider bucket join in cost-based JoinReorder rule

2018-04-02 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422116#comment-16422116 ] Xiaoju Wu commented on SPARK-23839: --- [~maropu] My concern is, "bucket join always firstly" doesn't mean

[jira] [Updated] (SPARK-23839) consider bucket join in cost-based JoinReorder rule

2018-04-02 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoju Wu updated SPARK-23839: -- Description: Since spark 2.2, the cost-based JoinReorder rule is implemented and in Spark 2.3

[jira] [Commented] (SPARK-23839) consider bucket join in cost-based JoinReorder rule

2018-04-01 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421613#comment-16421613 ] Xiaoju Wu commented on SPARK-23839: --- Any discussion or ticket already related to this topic please let

[jira] [Created] (SPARK-23839) consider bucket join in cost-based JoinReorder rule

2018-04-01 Thread Xiaoju Wu (JIRA)
Xiaoju Wu created SPARK-23839: - Summary: consider bucket join in cost-based JoinReorder rule Key: SPARK-23839 URL: https://issues.apache.org/jira/browse/SPARK-23839 Project: Spark Issue Type:

[jira] [Commented] (SPARK-17570) Avoid Hash and Exchange in Sort Merge join if bucketing factor is multiple for tables

2018-03-22 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16410752#comment-16410752 ] Xiaoju Wu commented on SPARK-17570: --- [~tejasp] When you join 3 tables with bucket number 4,8,12, if

[jira] [Issue Comment Deleted] (SPARK-17495) Hive hash implementation

2018-03-07 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoju Wu updated SPARK-17495: -- Comment: was deleted (was: [~tejasp] I can see HiveHash merged but never used. Seems the using of

[jira] [Commented] (SPARK-17495) Hive hash implementation

2018-03-06 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387873#comment-16387873 ] Xiaoju Wu commented on SPARK-17495: --- [~tejasp] I can see HiveHash merged but never used. Seems the

[jira] [Commented] (SPARK-22469) Accuracy problem in comparison with string and numeric

2018-03-04 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385156#comment-16385156 ] Xiaoju Wu commented on SPARK-22469: --- [~liutang123] cast Decimal to Double is possible to lose

[jira] [Resolved] (SPARK-23493) insert-into depends on columns order, otherwise incorrect data inserted

2018-02-28 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoju Wu resolved SPARK-23493. --- Resolution: Not A Bug

[jira] [Commented] (SPARK-23493) insert-into depends on columns order, otherwise incorrect data inserted

2018-02-23 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374266#comment-16374266 ] Xiaoju Wu commented on SPARK-23493: --- If that's the case, it should throw an exception to tell the users

[jira] [Commented] (SPARK-23493) insert-into depends on columns order, otherwise incorrect data inserted

2018-02-23 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374172#comment-16374172 ] Xiaoju Wu commented on SPARK-23493: --- [~mgaido] "Columns are matched in order while inserting" This is

[jira] [Commented] (SPARK-23493) insert-into depends on columns order, otherwise incorrect data inserted

2018-02-23 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374076#comment-16374076 ] Xiaoju Wu commented on SPARK-23493: --- This issue is similar with the issue described in ticket 

[jira] [Commented] (SPARK-9278) DataFrameWriter.insertInto inserts incorrect data

2018-02-23 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374073#comment-16374073 ] Xiaoju Wu commented on SPARK-9278: -- Created a new ticket to trace this issue SPARK-23493

[jira] [Updated] (SPARK-23493) insert-into depends on columns order, otherwise incorrect data inserted

2018-02-23 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoju Wu updated SPARK-23493: -- Description: insert-into only works when the partitionby key columns are set at last: val data = Seq(

[jira] [Created] (SPARK-23493) insert-into depends on columns order, otherwise incorrect data inserted

2018-02-23 Thread Xiaoju Wu (JIRA)
Xiaoju Wu created SPARK-23493: - Summary: insert-into depends on columns order, otherwise incorrect data inserted Key: SPARK-23493 URL: https://issues.apache.org/jira/browse/SPARK-23493 Project: Spark

[jira] [Comment Edited] (SPARK-9278) DataFrameWriter.insertInto inserts incorrect data

2018-02-22 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374040#comment-16374040 ] Xiaoju Wu edited comment on SPARK-9278 at 2/23/18 7:48 AM: --- Seems the issue

[jira] [Commented] (SPARK-9278) DataFrameWriter.insertInto inserts incorrect data

2018-02-22 Thread Xiaoju Wu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374040#comment-16374040 ] Xiaoju Wu commented on SPARK-9278: -- Seems the issue still exists, here's the test: val data = Seq( (7,