[jira] [Commented] (SPARK-19256) Hive bucketing support

2018-02-06 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354440#comment-16354440 ] Tejas Patil commented on SPARK-19256: - [~ferdonline] : The feature you are referring to is not in the

[jira] [Commented] (SPARK-18067) SortMergeJoin adds shuffle if join predicates have non partitioned columns

2018-02-06 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354429#comment-16354429 ] Tejas Patil commented on SPARK-18067: - [~eyalfa] :  Re #1: Theoretically this idea should help

[jira] [Commented] (SPARK-23034) Display tablename for `HiveTableScan` node in UI

2018-01-10 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16321529#comment-16321529 ] Tejas Patil commented on SPARK-23034: - [~dongjoon] recommended that the scope of this JIRA could be

[jira] [Created] (SPARK-23034) Display tablename for `HiveTableScan` node in UI

2018-01-10 Thread Tejas Patil (JIRA)
Tejas Patil created SPARK-23034: --- Summary: Display tablename for `HiveTableScan` node in UI Key: SPARK-23034 URL: https://issues.apache.org/jira/browse/SPARK-23034 Project: Spark Issue Type:

[jira] [Commented] (SPARK-22042) ReorderJoinPredicates can break when child's partitioning is not decided

2017-11-11 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248745#comment-16248745 ] Tejas Patil commented on SPARK-22042: - Am trying out the suggestion discussed in

[jira] [Resolved] (SPARK-21649) Support writing data into hive bucket table.

2017-10-11 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil resolved SPARK-21649. - Resolution: Duplicate > Support writing data into hive bucket table. >

[jira] [Commented] (SPARK-21649) Support writing data into hive bucket table.

2017-10-11 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200428#comment-16200428 ] Tejas Patil commented on SPARK-21649: - yes. It is duplicate of SPARK-19256 > Support writing data

[jira] [Commented] (SPARK-22220) Spark SQL: LATERAL VIEW OUTER null pointer exception with GROUP BY

2017-10-07 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195929#comment-16195929 ] Tejas Patil commented on SPARK-22220: - Does this repro with current spark trunk (over CLI) ? I could

[jira] [Commented] (SPARK-22220) Spark SQL: LATERAL VIEW OUTER null pointer exception with GROUP BY

2017-10-07 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195927#comment-16195927 ] Tejas Patil commented on SPARK-22220: - Whats the full stack trace for `NullPointerException` ? >

[jira] [Comment Edited] (SPARK-22042) ReorderJoinPredicates can break when child's partitioning is not decided

2017-09-16 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169148#comment-16169148 ] Tejas Patil edited comment on SPARK-22042 at 9/17/17 1:09 AM: -- At the core,

[jira] [Commented] (SPARK-22042) ReorderJoinPredicates can break when child's partitioning is not decided

2017-09-16 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169148#comment-16169148 ] Tejas Patil commented on SPARK-22042: - At the core, the problem is `ReorderJoinPredicates` expects

[jira] [Created] (SPARK-22042) ReorderJoinPredicates can break when child's partitioning is not decided

2017-09-16 Thread Tejas Patil (JIRA)
Tejas Patil created SPARK-22042: --- Summary: ReorderJoinPredicates can break when child's partitioning is not decided Key: SPARK-22042 URL: https://issues.apache.org/jira/browse/SPARK-22042 Project:

[jira] [Commented] (SPARK-20313) Possible lack of join optimization when partitions are in the join condition

2017-08-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16148463#comment-16148463 ] Tejas Patil commented on SPARK-20313: - I tried to replicate what you shared in the jira but dont see

[jira] [Commented] (SPARK-18067) SortMergeJoin adds shuffle if join predicates have non partitioned columns

2017-08-25 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141748#comment-16141748 ] Tejas Patil commented on SPARK-18067: - PR: https://github.com/apache/spark/pull/19054 >

[jira] [Commented] (SPARK-19256) Hive bucketing support

2017-08-19 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134268#comment-16134268 ] Tejas Patil commented on SPARK-19256: - Opened a PR which has both reader and writer side changes :

[jira] [Commented] (SPARK-19256) Hive bucketing support

2017-08-15 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128305#comment-16128305 ] Tejas Patil commented on SPARK-19256: - PR for writer side changes is out :

[jira] [Commented] (SPARK-19256) Hive bucketing support

2017-08-13 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125084#comment-16125084 ] Tejas Patil commented on SPARK-19256: - After the refactoring of the insertion plan node has been

[jira] [Commented] (SPARK-21595) introduction of spark.sql.windowExec.buffer.spill.threshold in spark 2.2 breaks existing workflow

2017-08-04 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114694#comment-16114694 ] Tejas Patil commented on SPARK-21595: - [~sreiling] Spilling will happen only when _both_ these are

[jira] [Commented] (SPARK-21595) introduction of spark.sql.windowExec.buffer.spill.threshold in spark 2.2 breaks existing workflow

2017-08-03 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113622#comment-16113622 ] Tejas Patil commented on SPARK-21595: - [~hvanhovell] : I am fine with either options you mentioned.

[jira] [Commented] (SPARK-21595) introduction of spark.sql.windowExec.buffer.spill.threshold in spark 2.2 breaks existing workflow

2017-08-01 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110053#comment-16110053 ] Tejas Patil commented on SPARK-21595: - This config was introduced by me in SPARK-13450. The reason

[jira] [Commented] (SPARK-21079) ANALYZE TABLE fails to calculate totalSize for a partitioned table

2017-06-13 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048381#comment-16048381 ] Tejas Patil commented on SPARK-21079: - [~ZenWzh] The reason why unit tests won't catch this is

[jira] [Closed] (SPARK-15905) Driver hung while writing to console progress bar

2017-05-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil closed SPARK-15905. --- Resolution: Cannot Reproduce > Driver hung while writing to console progress bar >

[jira] [Commented] (SPARK-15905) Driver hung while writing to console progress bar

2017-05-30 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029846#comment-16029846 ] Tejas Patil commented on SPARK-15905: - I haven't seen this in a while with Spark 2.0. Closing. If

[jira] [Commented] (SPARK-19256) Hive bucketing support

2017-05-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019109#comment-16019109 ] Tejas Patil commented on SPARK-19256: - [~cloud_fan] : After SPARK-18243, `InsertIntoHiveTable` is a

[jira] [Created] (SPARK-20758) Add Constant propagation optimization

2017-05-15 Thread Tejas Patil (JIRA)
Tejas Patil created SPARK-20758: --- Summary: Add Constant propagation optimization Key: SPARK-20758 URL: https://issues.apache.org/jira/browse/SPARK-20758 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-20703) Add an operator for writing data out

2017-05-14 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009978#comment-16009978 ] Tejas Patil commented on SPARK-20703: - for dynamic partitioned tables, number of partitions produced

[jira] [Commented] (SPARK-20703) Add an operator for writing data out

2017-05-14 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009977#comment-16009977 ] Tejas Patil commented on SPARK-20703: - [~viirya] : - Would this new operator be a physical plan node

[jira] [Commented] (SPARK-19122) Unnecessary shuffle+sort added if join predicates ordering differ from bucketing and sorting order

2017-05-13 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009165#comment-16009165 ] Tejas Patil commented on SPARK-19122: - Thanks for confirming. I have added it in the jira description

[jira] [Updated] (SPARK-19122) Unnecessary shuffle+sort added if join predicates ordering differ from bucketing and sorting order

2017-05-13 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-19122: Description: `table1` and `table2` are sorted and bucketed on columns `j` and `k` (in respective

[jira] [Commented] (SPARK-19122) Unnecessary shuffle+sort added if join predicates ordering differ from bucketing and sorting order

2017-05-12 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16008266#comment-16008266 ] Tejas Patil commented on SPARK-19122: - [~cloud_fan]: - The test case in the [associated PR

[jira] [Commented] (SPARK-19256) Hive bucketing support

2017-04-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989560#comment-15989560 ] Tejas Patil commented on SPARK-19256: - [~cloud_fan], [~sameerag] : I was looking at trunk and

[jira] [Created] (SPARK-20487) `HiveTableScan` node is quite verbose in explained plan

2017-04-26 Thread Tejas Patil (JIRA)
Tejas Patil created SPARK-20487: --- Summary: `HiveTableScan` node is quite verbose in explained plan Key: SPARK-20487 URL: https://issues.apache.org/jira/browse/SPARK-20487 Project: Spark Issue

[jira] [Commented] (SPARK-20184) performance regression for complex/long sql when enable whole stage codegen

2017-04-12 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966300#comment-15966300 ] Tejas Patil commented on SPARK-20184: - Out of curiosity, I tried out a query with ~20 columns

[jira] [Commented] (SPARK-17495) Hive hash implementation

2017-04-05 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956353#comment-15956353 ] Tejas Patil commented on SPARK-17495: - [~dricard] : this does not intend to break any existing

[jira] [Commented] (SPARK-17495) Hive hash implementation

2017-03-13 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15907534#comment-15907534 ] Tejas Patil commented on SPARK-17495: - Yes. Thats a better way. At this point all datatypes are

[jira] [Reopened] (SPARK-17495) Hive hash implementation

2017-03-13 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil reopened SPARK-17495: - > Hive hash implementation > > > Key: SPARK-17495 >

[jira] [Created] (SPARK-19843) UTF8String => (int / long) conversion expensive for invalid inputs

2017-03-06 Thread Tejas Patil (JIRA)
Tejas Patil created SPARK-19843: --- Summary: UTF8String => (int / long) conversion expensive for invalid inputs Key: SPARK-19843 URL: https://issues.apache.org/jira/browse/SPARK-19843 Project: Spark

[jira] [Reopened] (SPARK-17495) Hive hash implementation

2017-03-06 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil reopened SPARK-17495: - Re-opening. This is not done yet as there are time related datatypes that need to be handled and

[jira] [Commented] (SPARK-17495) Hive hash implementation

2017-02-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889645#comment-15889645 ] Tejas Patil commented on SPARK-17495: - >> Is it possible to figure out the hashing function based on

[jira] [Commented] (SPARK-17495) Hive hash implementation

2017-02-28 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889090#comment-15889090 ] Tejas Patil commented on SPARK-17495: - [~rxin]: >> 1. On the read side we shouldn't care which hash

[jira] [Comment Edited] (SPARK-17495) Hive hash implementation

2017-02-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883203#comment-15883203 ] Tejas Patil edited comment on SPARK-17495 at 2/24/17 5:57 PM: -- [~rxin] : No

[jira] [Commented] (SPARK-17495) Hive hash implementation

2017-02-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883203#comment-15883203 ] Tejas Patil commented on SPARK-17495: - [~rxin] : No probs. Any opinion about my comment from

[jira] [Reopened] (SPARK-17495) Hive hash implementation

2017-02-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil reopened SPARK-17495: - Re-opening. This is not done yet as there are few datatypes that need to be handled and making

[jira] [Commented] (SPARK-17495) Hive hash implementation

2017-02-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15882161#comment-15882161 ] Tejas Patil commented on SPARK-17495: - I am looking into using hive-hash when `hash()` in called in a

[jira] [Commented] (SPARK-19326) Speculated task attempts do not get launched in few scenarios

2017-02-15 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869314#comment-15869314 ] Tejas Patil commented on SPARK-19326: - > You might be able to just write an `if` case that checks

[jira] [Commented] (SPARK-19326) Speculated task attempts do not get launched in few scenarios

2017-02-15 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869112#comment-15869112 ] Tejas Patil commented on SPARK-19326: - Thanks for the info !! [~andrewor14] / [~kayousterhout] : I

[jira] [Created] (SPARK-19618) Inconsistency wrt max. buckets allowed from Dataframe API vs SQL

2017-02-15 Thread Tejas Patil (JIRA)
Tejas Patil created SPARK-19618: --- Summary: Inconsistency wrt max. buckets allowed from Dataframe API vs SQL Key: SPARK-19618 URL: https://issues.apache.org/jira/browse/SPARK-19618 Project: Spark

[jira] [Commented] (SPARK-19326) Speculated task attempts do not get launched in few scenarios

2017-02-15 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868889#comment-15868889 ] Tejas Patil commented on SPARK-19326: - [~andrewor14] : Ping !! > Speculated task attempts do not get

[jira] [Updated] (SPARK-19587) Disallow when sort columns are part of partitioning columns

2017-02-13 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-19587: Description: This came up in discussion at

[jira] [Created] (SPARK-19587) Disallow when sort columns are part of partitioning columns

2017-02-13 Thread Tejas Patil (JIRA)
Tejas Patil created SPARK-19587: --- Summary: Disallow when sort columns are part of partitioning columns Key: SPARK-19587 URL: https://issues.apache.org/jira/browse/SPARK-19587 Project: Spark

[jira] [Commented] (SPARK-19493) Remove Java 7 support

2017-02-13 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863244#comment-15863244 ] Tejas Patil commented on SPARK-19493: - +1 for removing Java 7 > Remove Java 7 support >

[jira] [Commented] (SPARK-19326) Speculated task attempts do not get launched in few scenarios

2017-02-02 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851006#comment-15851006 ] Tejas Patil commented on SPARK-19326: - [~kayousterhout] : I have updated the configs in the repro

[jira] [Updated] (SPARK-19326) Speculated task attempts do not get launched in few scenarios

2017-02-02 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-19326: Description: Speculated copies of tasks do not get launched in some cases. Examples: - All the

[jira] [Updated] (SPARK-19326) Speculated task attempts do not get launched in few scenarios

2017-02-02 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-19326: Description: Speculated copies of tasks do not get launched in some cases. Examples: - All the

[jira] [Commented] (SPARK-19326) Speculated task attempts do not get launched in few scenarios

2017-02-02 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850214#comment-15850214 ] Tejas Patil commented on SPARK-19326: - ping [~kayousterhout], [~irashid] [~vanzin] !! > Speculated

[jira] [Commented] (SPARK-19256) Hive bucketing support

2017-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837302#comment-15837302 ] Tejas Patil commented on SPARK-19256: - BTW: In its current state, Spark writes data to hive bucketed

[jira] [Commented] (SPARK-19122) Unnecessary shuffle+sort added if join predicates ordering differ from bucketing and sorting order

2017-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837300#comment-15837300 ] Tejas Patil commented on SPARK-19122: - [~hvanhovell] : ping !! If you are busy, can you suggest

[jira] [Commented] (SPARK-19256) Hive bucketing support

2017-01-24 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15837294#comment-15837294 ] Tejas Patil commented on SPARK-19256: - [~cloud_fan] [~hvanhovell] : Are you guys ok with the proposal

[jira] [Updated] (SPARK-19326) Speculated task attempts do not get launched in few scenarios

2017-01-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-19326: Description: Speculated copies of tasks do not get launched in some cases. Examples: - All the

[jira] [Commented] (SPARK-19326) Speculated task attempts do not get launched in few scenarios

2017-01-21 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15833270#comment-15833270 ] Tejas Patil commented on SPARK-19326: - cc [~kayousterhout], [~irashid] [~vanzin]

[jira] [Created] (SPARK-19326) Speculated task attempts do not get launched in few scenarios

2017-01-21 Thread Tejas Patil (JIRA)
Tejas Patil created SPARK-19326: --- Summary: Speculated task attempts do not get launched in few scenarios Key: SPARK-19326 URL: https://issues.apache.org/jira/browse/SPARK-19326 Project: Spark

[jira] [Updated] (SPARK-17654) Propagate bucketing information for Hive tables to / from Catalog

2017-01-16 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-17654: Issue Type: Sub-task (was: Bug) Parent: SPARK-19256 > Propagate bucketing information for

[jira] [Updated] (SPARK-17487) Configurable bucketing info extraction

2017-01-16 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-17487: Issue Type: Sub-task (was: Improvement) Parent: SPARK-19256 > Configurable bucketing info

[jira] [Updated] (SPARK-17570) Avoid Hash and Exchange in Sort Merge join if bucketing factor is multiple for tables

2017-01-16 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-17570: Issue Type: Improvement (was: Sub-task) Parent: (was: SPARK-18245) > Avoid Hash and

[jira] [Updated] (SPARK-17487) Configurable bucketing info extraction

2017-01-16 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-17487: Issue Type: Improvement (was: Sub-task) Parent: (was: SPARK-18245) > Configurable

[jira] [Updated] (SPARK-17570) Avoid Hash and Exchange in Sort Merge join if bucketing factor is multiple for tables

2017-01-16 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-17570: Issue Type: Sub-task (was: Improvement) Parent: SPARK-19256 > Avoid Hash and Exchange in

[jira] [Updated] (SPARK-17729) Enable creating hive bucketed tables

2017-01-16 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-17729: Issue Type: Improvement (was: Sub-task) Parent: (was: SPARK-18245) > Enable creating

[jira] [Updated] (SPARK-17729) Enable creating hive bucketed tables

2017-01-16 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-17729: Issue Type: Sub-task (was: Improvement) Parent: SPARK-19256 > Enable creating hive

[jira] [Updated] (SPARK-17495) Hive hash implementation

2017-01-16 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-17495: Issue Type: Sub-task (was: Improvement) Parent: SPARK-19256 > Hive hash implementation >

[jira] [Updated] (SPARK-17495) Hive hash implementation

2017-01-16 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-17495: Issue Type: Improvement (was: Sub-task) Parent: (was: SPARK-18245) > Hive hash

[jira] [Commented] (SPARK-19256) Hive bucketing support

2017-01-16 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15825494#comment-15825494 ] Tejas Patil commented on SPARK-19256: - cc [~cloud_fan] [~hvanhovell] for comments over the proposal

[jira] [Created] (SPARK-19256) Hive bucketing support

2017-01-16 Thread Tejas Patil (JIRA)
Tejas Patil created SPARK-19256: --- Summary: Hive bucketing support Key: SPARK-19256 URL: https://issues.apache.org/jira/browse/SPARK-19256 Project: Spark Issue Type: Umbrella

[jira] [Comment Edited] (SPARK-13450) SortMergeJoin will OOM when join rows have lot of same keys

2017-01-09 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15812508#comment-15812508 ] Tejas Patil edited comment on SPARK-13450 at 1/9/17 6:48 PM: - I have seen

[jira] [Updated] (SPARK-13450) SortMergeJoin will OOM when join rows have lot of same keys

2017-01-09 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-13450: Attachment: heap-dump-analysis.png > SortMergeJoin will OOM when join rows have lot of same keys >

[jira] [Reopened] (SPARK-13450) SortMergeJoin will OOM when join rows have lot of same keys

2017-01-09 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil reopened SPARK-13450: - I have seen this problem a couple times in prod while trying out jobs over Spark. There have been

[jira] [Updated] (SPARK-13450) SortMergeJoin will OOM when join rows have lot of same keys

2017-01-09 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-13450: Affects Version/s: 2.0.2 2.1.0 > SortMergeJoin will OOM when join rows have

[jira] [Updated] (SPARK-19122) Unnecessary shuffle+sort added if join predicates ordering differ from bucketing and sorting order

2017-01-07 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-19122: Description: `table1` and `table2` are sorted and bucketed on columns `j` and `k` (in respective

[jira] [Commented] (SPARK-18067) SortMergeJoin adds shuffle if join predicates have non partitioned columns

2017-01-07 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15808507#comment-15808507 ] Tejas Patil commented on SPARK-18067: - [~hvanhovell] : * You are right about the data distribution.

[jira] [Comment Edited] (SPARK-19122) Unnecessary shuffle+sort added if join predicates ordering differ from bucketing and sorting order

2017-01-07 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15808485#comment-15808485 ] Tejas Patil edited comment on SPARK-19122 at 1/8/17 1:44 AM: - [~hvanhovell] :

[jira] [Commented] (SPARK-19122) Unnecessary shuffle+sort added if join predicates ordering differ from bucketing and sorting order

2017-01-07 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15808485#comment-15808485 ] Tejas Patil commented on SPARK-19122: - [~hvanhovell] : In a broader level, I see that when nodes for

[jira] [Commented] (SPARK-19122) Unnecessary shuffle+sort added if join predicates ordering differ from bucketing and sorting order

2017-01-07 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15808479#comment-15808479 ] Tejas Patil commented on SPARK-19122: - When a `SortMergeJoinExec` node is created, the join keys in

[jira] [Created] (SPARK-19122) Unnecessary shuffle+sort added if join predicates ordering differ from bucketing and sorting order

2017-01-07 Thread Tejas Patil (JIRA)
Tejas Patil created SPARK-19122: --- Summary: Unnecessary shuffle+sort added if join predicates ordering differ from bucketing and sorting order Key: SPARK-19122 URL: https://issues.apache.org/jira/browse/SPARK-19122

[jira] [Commented] (SPARK-15867) TABLESAMPLE BUCKET semantics don't match Hive's

2016-11-01 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15627120#comment-15627120 ] Tejas Patil commented on SPARK-15867: - Yes. I am interested in this support. > TABLESAMPLE BUCKET

[jira] [Updated] (SPARK-18067) SortMergeJoin adds shuffle if join predicates have non partitioned columns

2016-10-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-18067: Summary: SortMergeJoin adds shuffle if join predicates have non partitioned columns (was: Adding

[jira] [Comment Edited] (SPARK-18067) Adding filter after SortMergeJoin creates unnecessary shuffle

2016-10-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15600824#comment-15600824 ] Tejas Patil edited comment on SPARK-18067 at 10/24/16 3:35 AM: ---

[jira] [Commented] (SPARK-18067) Adding filter after SortMergeJoin creates unnecessary shuffle

2016-10-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15600824#comment-15600824 ] Tejas Patil commented on SPARK-18067: - [~hvanhovell] : Tagging you since you have context of this

[jira] [Commented] (SPARK-18067) Adding filter after SortMergeJoin creates unnecessary shuffle

2016-10-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15600820#comment-15600820 ] Tejas Patil commented on SPARK-18067: - The predicate `value1 === value2` is pushed down to the Join

[jira] [Comment Edited] (SPARK-18067) Adding filter after SortMergeJoin creates unnecessary shuffle

2016-10-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15600788#comment-15600788 ] Tejas Patil edited comment on SPARK-18067 at 10/24/16 3:12 AM: --- You could

[jira] [Commented] (SPARK-18067) Adding filter after SortMergeJoin creates unnecessary shuffle

2016-10-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15600788#comment-15600788 ] Tejas Patil commented on SPARK-18067: - You could do: {noformat} val joinedOutput =

[jira] [Commented] (SPARK-17495) Hive hash implementation

2016-10-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15599977#comment-15599977 ] Tejas Patil commented on SPARK-17495: - [~rxin] : Sorry about that. In my original PR I intentionally

[jira] [Commented] (SPARK-17495) Hive hash implementation

2016-10-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15599965#comment-15599965 ] Tejas Patil commented on SPARK-17495: - [~hvanhovell] : There were two datatypes that I had added TODO

[jira] [Updated] (SPARK-15453) FileSourceScanExec to extract `outputOrdering` information

2016-10-23 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-15453: Summary: FileSourceScanExec to extract `outputOrdering` information (was: Improve join planning

[jira] [Updated] (SPARK-18035) Introduce performant and memory efficient APIs to create ArrayBasedMapData

2016-10-22 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-18035: Summary: Introduce performant and memory efficient APIs to create ArrayBasedMapData (was:

[jira] [Commented] (SPARK-18038) Move output partitioning definition from UnaryNodeExec to its children

2016-10-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593478#comment-15593478 ] Tejas Patil commented on SPARK-18038: - Not sure if this deserves a jira but created one. This is a

[jira] [Created] (SPARK-18038) Move output partitioning definition from UnaryNodeExec to its children

2016-10-20 Thread Tejas Patil (JIRA)
Tejas Patil created SPARK-18038: --- Summary: Move output partitioning definition from UnaryNodeExec to its children Key: SPARK-18038 URL: https://issues.apache.org/jira/browse/SPARK-18038 Project: Spark

[jira] [Created] (SPARK-18035) Unwrapping java maps in HiveInspectors allocates unnecessary buffer

2016-10-20 Thread Tejas Patil (JIRA)
Tejas Patil created SPARK-18035: --- Summary: Unwrapping java maps in HiveInspectors allocates unnecessary buffer Key: SPARK-18035 URL: https://issues.apache.org/jira/browse/SPARK-18035 Project: Spark

[jira] [Updated] (SPARK-18035) Unwrapping java maps in HiveInspectors allocates unnecessary buffer

2016-10-20 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tejas Patil updated SPARK-18035: Description: In HiveInspectors, I saw that converting Java map to Spark's `ArrayBasedMapData`

[jira] [Commented] (SPARK-17954) FetchFailedException executor cannot connect to another worker executor

2016-10-15 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15579262#comment-15579262 ] Tejas Patil commented on SPARK-17954: - I agree with [~srowen]'s comment about this being more of a

[jira] [Commented] (SPARK-17487) Configurable bucketing info extraction

2016-10-10 Thread Tejas Patil (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15563756#comment-15563756 ] Tejas Patil commented on SPARK-17487: - [~rxin] : Since Spark native tables and hive tables follow a

[jira] [Created] (SPARK-17741) Grammar to parse top level and nested data fields separately

2016-09-29 Thread Tejas Patil (JIRA)
Tejas Patil created SPARK-17741: --- Summary: Grammar to parse top level and nested data fields separately Key: SPARK-17741 URL: https://issues.apache.org/jira/browse/SPARK-17741 Project: Spark
