[jira] [Created] (SPARK-28156) Join plan sometimes does not use cached query

2019-06-24 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-28156: - Summary: Join plan sometimes does not use cached query Key: SPARK-28156 URL: https://issues.apache.org/jira/browse/SPARK-28156 Project: Spark Issue Type:

[jira] [Commented] (SPARK-27466) LEAD function with 'ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING' causes exception in Spark

2019-05-06 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834258#comment-16834258 ] Bruce Robbins commented on SPARK-27466: --- This _seems_ to be intentional, according to SPARK-8641

[jira] [Updated] (SPARK-27498) Built-in parquet code path (convertMetastoreParquet=true) does not respect hive.enforce.bucketing

2019-04-19 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-27498: -- Summary: Built-in parquet code path (convertMetastoreParquet=true) does not respect

[jira] [Updated] (SPARK-27497) Spark wipes out bucket spec in metastore when updating table stats

2019-04-19 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-27497: -- Description: The bucket spec gets wiped out after Spark writes to a Hive-bucketed table that

[jira] [Updated] (SPARK-27498) Built-in parquet code path does not respect hive.enforce.bucketing

2019-04-17 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-27498: -- Description: _Caveat: I can see how this could be intentional if Spark believes that the

[jira] [Updated] (SPARK-27498) Built-in parquet code path does not respect hive.enforce.bucketing

2019-04-17 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-27498: -- Description: _Caveat: I can see how this could be intentional if Spark believes that the

[jira] [Updated] (SPARK-27498) Built-in parquet code path does not respect hive.enforce.bucketing

2019-04-17 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-27498: -- Description: _Caveat: I can see how this could be intentional if Spark believes that the

[jira] [Updated] (SPARK-27497) Spark wipes out bucket spec in metastore when updating table stats

2019-04-17 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-27497: -- Description: The bucket spec gets wiped out after Spark writes to a Hive-bucketed table that

[jira] [Updated] (SPARK-27497) Spark wipes out bucket spec in metastore when updating table stats

2019-04-17 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-27497: -- Description: The bucket spec gets wiped out after Spark writes to a Hive-bucketed table that

[jira] [Created] (SPARK-27498) Built-in parquet code path does not respect hive.enforce.bucketing

2019-04-17 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-27498: - Summary: Built-in parquet code path does not respect hive.enforce.bucketing Key: SPARK-27498 URL: https://issues.apache.org/jira/browse/SPARK-27498 Project: Spark

[jira] [Updated] (SPARK-27497) Spark wipes out bucket spec in metastore when updating table stats

2019-04-17 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-27497: -- Description: The bucket spec gets wiped out after Spark writes to a Hive-bucketed table that

[jira] [Updated] (SPARK-27497) Spark wipes out bucket spec in metastore when updating table stats

2019-04-17 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-27497: -- Description: The bucket spec gets wiped out after Spark writes to a Hive-bucketed table that

[jira] [Created] (SPARK-27497) Spark wipes out bucket spec in metastore when updating table stats

2019-04-17 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-27497: - Summary: Spark wipes out bucket spec in metastore when updating table stats Key: SPARK-27497 URL: https://issues.apache.org/jira/browse/SPARK-27497 Project: Spark

[jira] [Resolved] (SPARK-26990) Difference in handling of mixed-case partition column names after SPARK-26188

2019-03-11 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins resolved SPARK-26990. --- Resolution: Fixed > Difference in handling of mixed-case partition column names after

[jira] [Updated] (SPARK-26990) Difference in handling of mixed-case partition column names after SPARK-26188

2019-03-11 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-26990: -- Fix Version/s: 2.4.1 > Difference in handling of mixed-case partition column names after

[jira] [Updated] (SPARK-26990) Difference in handling of mixed-case partition column names after SPARK-26188

2019-02-28 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-26990: -- Fix Version/s: 3.0.0 > Difference in handling of mixed-case partition column names after

[jira] [Updated] (SPARK-26990) Difference in handling of mixed-case partition column names after SPARK-26188

2019-02-25 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-26990: -- Summary: Difference in handling of mixed-case partition column names after SPARK-26188 (was:

[jira] [Created] (SPARK-26990) Difference in handling of mixed-case partition columns after SPARK-26188

2019-02-25 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-26990: - Summary: Difference in handling of mixed-case partition columns after SPARK-26188 Key: SPARK-26990 URL: https://issues.apache.org/jira/browse/SPARK-26990 Project:

[jira] [Updated] (SPARK-26851) CachedRDDBuilder only partially implements double-checked locking

2019-02-12 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-26851: -- Description: In CachedRDDBuilder, {{cachedColumnBuffers}} uses double-checked locking to

[jira] [Commented] (SPARK-26804) Spark sql carries newline char from last csv column when imported

2019-02-09 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764260#comment-16764260 ] Bruce Robbins commented on SPARK-26804: --- [~hipruthvi] It seems that neither 2.3 nor 2.4 are

[jira] [Updated] (SPARK-26851) CachedRDDBuilder only partially implements double-checked locking

2019-02-08 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-26851: -- Description: In CachedRDDBuilder, {{cachedColumnBuffers}} uses double-checked locking to

[jira] [Updated] (SPARK-26851) CachedRDDBuilder only partially implements double-checked locking

2019-02-08 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-26851: -- Labels: (was: con) > CachedRDDBuilder only partially implements double-checked locking >

[jira] [Updated] (SPARK-26851) CachedRDDBuilder only partially implements double-checked locking

2019-02-08 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-26851: -- Labels: con (was: ) > CachedRDDBuilder only partially implements double-checked locking >

[jira] [Comment Edited] (SPARK-26804) Spark sql carries newline char from last csv column when imported

2019-02-08 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763728#comment-16763728 ] Bruce Robbins edited comment on SPARK-26804 at 2/8/19 10:13 PM:

[jira] [Created] (SPARK-26851) CachedRDDBuilder only partially implements double-checked locking

2019-02-08 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-26851: - Summary: CachedRDDBuilder only partially implements double-checked locking Key: SPARK-26851 URL: https://issues.apache.org/jira/browse/SPARK-26851 Project: Spark

[jira] [Commented] (SPARK-26851) CachedRDDBuilder only partially implements double-checked locking

2019-02-08 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763818#comment-16763818 ] Bruce Robbins commented on SPARK-26851: --- [~maropu] [~cloud_fan] I will let this Jira marinate for

[jira] [Commented] (SPARK-26804) Spark sql carries newline char from last csv column when imported

2019-02-08 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763728#comment-16763728 ] Bruce Robbins commented on SPARK-26804: --- v2.4.0: Fails as described Tip of branch-2.4: Fails as

[jira] [Comment Edited] (SPARK-26708) Incorrect result caused by inconsistency between a SQL cache's cached RDD and its physical plan

2019-02-05 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761289#comment-16761289 ] Bruce Robbins edited comment on SPARK-26708 at 2/6/19 12:41 AM: How does

[jira] [Commented] (SPARK-26708) Incorrect result caused by inconsistency between a SQL cache's cached RDD and its physical plan

2019-02-05 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761289#comment-16761289 ] Bruce Robbins commented on SPARK-26708: --- How does one hit this issue? > Incorrect result caused

[jira] [Commented] (SPARK-26711) JSON Schema inference takes 15 times longer

2019-01-25 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752304#comment-16752304 ] Bruce Robbins commented on SPARK-26711: --- [~hyukjin.kwon] Ok, that worked. I had in my mind a more

[jira] [Commented] (SPARK-26711) JSON Schema inference takes 15 times longer

2019-01-24 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751655#comment-16751655 ] Bruce Robbins commented on SPARK-26711: --- Re: 7 minutes vs. 50 seconds: Looking at the code, it

[jira] [Commented] (SPARK-26711) JSON Schema inference takes 15 times longer

2019-01-23 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750741#comment-16750741 ] Bruce Robbins commented on SPARK-26711: --- [~hyukjin.kwon] inferTimestamp=: ~13 min

[jira] [Updated] (SPARK-26711) JSON Schema inference takes 15 times longer

2019-01-23 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-26711: -- Description: I noticed that the first benchmark/case of JSONBenchmark ("JSON schema

[jira] [Commented] (SPARK-26711) JSON Schema inference takes 15 times longer

2019-01-23 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750704#comment-16750704 ] Bruce Robbins commented on SPARK-26711: --- ping [~maxgekk] [~hyukjin.kwon] > JSON Schema inference

[jira] [Created] (SPARK-26711) JSON Schema inference takes 15 times longer

2019-01-23 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-26711: - Summary: JSON Schema inference takes 15 times longer Key: SPARK-26711 URL: https://issues.apache.org/jira/browse/SPARK-26711 Project: Spark Issue Type:

[jira] [Created] (SPARK-26707) Insert into table with single struct column fails

2019-01-23 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-26707: - Summary: Insert into table with single struct column fails Key: SPARK-26707 URL: https://issues.apache.org/jira/browse/SPARK-26707 Project: Spark Issue

[jira] [Updated] (SPARK-26680) StackOverflowError if Stream passed to groupBy

2019-01-22 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-26680: -- Affects Version/s: 2.3.2 > StackOverflowError if Stream passed to groupBy >

[jira] [Updated] (SPARK-26680) StackOverflowError if Stream passed to groupBy

2019-01-22 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-26680: -- Affects Version/s: 2.4.0 > StackOverflowError if Stream passed to groupBy >

[jira] [Updated] (SPARK-26680) StackOverflowError if Stream passed to groupBy

2019-01-21 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-26680: -- Description: This Java code results in a StackOverflowError: {code:java} List groupByCols =

[jira] [Created] (SPARK-26680) StackOverflowError if Stream passed to groupBy

2019-01-21 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-26680: - Summary: StackOverflowError if Stream passed to groupBy Key: SPARK-26680 URL: https://issues.apache.org/jira/browse/SPARK-26680 Project: Spark Issue Type:

[jira] [Commented] (SPARK-26680) StackOverflowError if Stream passed to groupBy

2019-01-21 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748277#comment-16748277 ] Bruce Robbins commented on SPARK-26680: --- I will make a PR for this, but I would like to hear any

[jira] [Created] (SPARK-26496) Test "locality preferences of StateStoreAwareZippedRDD" frequently fails on High Sierra

2018-12-28 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-26496: - Summary: Test "locality preferences of StateStoreAwareZippedRDD" frequently fails on High Sierra Key: SPARK-26496 URL: https://issues.apache.org/jira/browse/SPARK-26496

[jira] [Commented] (SPARK-26450) Map of schema is built too frequently in some wide queries

2018-12-27 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729647#comment-16729647 ] Bruce Robbins commented on SPARK-26450: --- I can attempt a patch later today. > Map of schema is

[jira] [Updated] (SPARK-26378) Queries of wide CSV/JSON data slowed after SPARK-26151

2018-12-26 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-26378: -- Description: A recent change significantly slowed the queries of wide CSV tables. For

[jira] [Updated] (SPARK-26378) Queries of wide CSV/JSON data slowed after SPARK-26151

2018-12-26 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-26378: -- Summary: Queries of wide CSV/JSON data slowed after SPARK-26151 (was: Queries of wide CSV

[jira] [Created] (SPARK-26450) Map of schema is built too frequently in some wide queries

2018-12-26 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-26450: - Summary: Map of schema is built too frequently in some wide queries Key: SPARK-26450 URL: https://issues.apache.org/jira/browse/SPARK-26450 Project: Spark

[jira] [Commented] (SPARK-26378) Queries of wide CSV data slowed after SPARK-26151

2018-12-15 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722311#comment-16722311 ] Bruce Robbins commented on SPARK-26378: --- I will try to submit a PR tomorrow. > Queries of wide

[jira] [Created] (SPARK-26378) Queries of wide CSV data slowed after SPARK-26151

2018-12-15 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-26378: - Summary: Queries of wide CSV data slowed after SPARK-26151 Key: SPARK-26378 URL: https://issues.apache.org/jira/browse/SPARK-26378 Project: Spark Issue

[jira] [Updated] (SPARK-26372) CSV parsing uses previous good value for bad input field

2018-12-14 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-26372: -- Summary: CSV parsing uses previous good value for bad input field (was: CSV Parsing uses

[jira] [Commented] (SPARK-26372) CSV parsing uses previous good value for bad input field

2018-12-14 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721593#comment-16721593 ] Bruce Robbins commented on SPARK-26372: --- I can prep a PR, unless someone thinks this needs a

[jira] [Created] (SPARK-26372) CSV Parsing uses previous good value for bad input field

2018-12-14 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-26372: - Summary: CSV Parsing uses previous good value for bad input field Key: SPARK-26372 URL: https://issues.apache.org/jira/browse/SPARK-26372 Project: Spark

[jira] [Commented] (SPARK-24758) Create table wants to use /user/hive/warehouse in clean clone

2018-10-31 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670665#comment-16670665 ] Bruce Robbins commented on SPARK-24758: --- This issue was introduced by commit 

[jira] [Comment Edited] (SPARK-25643) Performance issues querying wide rows

2018-10-15 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650866#comment-16650866 ] Bruce Robbins edited comment on SPARK-25643 at 10/15/18 10:08 PM: --

[jira] [Commented] (SPARK-25643) Performance issues querying wide rows

2018-10-15 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650866#comment-16650866 ] Bruce Robbins commented on SPARK-25643: --- [~viirya] Yes, in the case where I said "predicate push

[jira] [Commented] (SPARK-25164) Parquet reader builds entire list of columns once for each column

2018-10-04 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638803#comment-16638803 ] Bruce Robbins commented on SPARK-25164: --- [~Tagar] I've opened SPARK-25643 to keep track of the

[jira] [Created] (SPARK-25643) Performance issues querying wide rows

2018-10-04 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-25643: - Summary: Performance issues querying wide rows Key: SPARK-25643 URL: https://issues.apache.org/jira/browse/SPARK-25643 Project: Spark Issue Type:

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-09-19 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621452#comment-16621452 ] Bruce Robbins commented on SPARK-23715: --- Hi [~rxin], Thanks for following up with me. This is a

[jira] [Commented] (SPARK-25164) Parquet reader builds entire list of columns once for each column

2018-09-19 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621125#comment-16621125 ] Bruce Robbins commented on SPARK-25164: --- {quote}I am thinking if it's feasible to lazily realize

[jira] [Commented] (SPARK-25454) Division between operands with negative scale can cause precision loss

2018-09-18 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619911#comment-16619911 ] Bruce Robbins commented on SPARK-25454: --- Thanks [~mgaido], OK, so the way I understand it -

[jira] [Commented] (SPARK-22036) BigDecimal multiplication sometimes returns null

2018-09-17 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16618104#comment-16618104 ] Bruce Robbins commented on SPARK-22036: --- [~mgaido] In this change, you modified how precision and

[jira] [Commented] (SPARK-25164) Parquet reader builds entire list of columns once for each column

2018-09-11 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611331#comment-16611331 ] Bruce Robbins commented on SPARK-25164: --- Thanks [~Tagar] for the feedback. I assume the 44%

[jira] [Commented] (SPARK-23243) Shuffle+Repartition on an RDD could lead to incorrect answers

2018-09-08 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608169#comment-16608169 ] Bruce Robbins commented on SPARK-23243: --- BTW, I took a stab at back porting it to 2.2, but to get

[jira] [Commented] (SPARK-23243) Shuffle+Repartition on an RDD could lead to incorrect answers

2018-09-08 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608167#comment-16608167 ] Bruce Robbins commented on SPARK-23243: --- Any plans to back port this to 2.2? >

[jira] [Commented] (SPARK-24316) Spark sql queries stall for column width more than 6k for parquet based table

2018-09-04 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603453#comment-16603453 ] Bruce Robbins commented on SPARK-24316: --- This is likely SPARK-25164. > Spark sql queries stall

[jira] [Commented] (SPARK-23207) Shuffle+Repartition on an DataFrame could lead to incorrect answers

2018-08-23 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590847#comment-16590847 ] Bruce Robbins commented on SPARK-23207: --- Will we be back-porting this to 2.1, or does the 18 month

[jira] [Commented] (SPARK-25164) Parquet reader builds entire list of columns once for each column

2018-08-21 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588168#comment-16588168 ] Bruce Robbins commented on SPARK-25164: --- [~viirya] Sure. I will try to get something up by tonight

[jira] [Updated] (SPARK-25164) Parquet reader builds entire list of columns once for each column

2018-08-21 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-25164: -- Description: {{VectorizedParquetRecordReader.initializeInternal}} loops through each column,

[jira] [Created] (SPARK-25164) Parquet reader builds entire list of columns once for each column

2018-08-20 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-25164: - Summary: Parquet reader builds entire list of columns once for each column Key: SPARK-25164 URL: https://issues.apache.org/jira/browse/SPARK-25164 Project: Spark

[jira] [Commented] (SPARK-23207) Shuffle+Repartition on an DataFrame could lead to incorrect answers

2018-08-10 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577041#comment-16577041 ] Bruce Robbins commented on SPARK-23207: --- I can help out here. I will make a PR for branch-2.2 in

[jira] [Commented] (SPARK-24914) totalSize is not a good estimate for broadcast joins

2018-07-25 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555998#comment-16555998 ] Bruce Robbins commented on SPARK-24914: --- [~irashid] {quote} given HIVE-20079, can we also have a

[jira] [Updated] (SPARK-24914) totalSize is not a good estimate for broadcast joins

2018-07-25 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-24914: -- Description: When determining whether to do a broadcast join, Spark estimates the size of

[jira] [Updated] (SPARK-24914) totalSize is not a good estimate for broadcast joins

2018-07-24 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-24914: -- Description: When determining whether to do a broadcast join, Spark estimates the size of

[jira] [Created] (SPARK-24914) totalSize is not a good estimate for broadcast joins

2018-07-24 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-24914: - Summary: totalSize is not a good estimate for broadcast joins Key: SPARK-24914 URL: https://issues.apache.org/jira/browse/SPARK-24914 Project: Spark Issue

[jira] [Updated] (SPARK-24912) Broadcast join OutOfMemory stack trace obscures actual cause of OOM

2018-07-24 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-24912: -- Priority: Minor (was: Major) > Broadcast join OutOfMemory stack trace obscures actual cause

[jira] [Created] (SPARK-24912) Broadcast join OutOfMemory stack trace obscures actual cause of OOM

2018-07-24 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-24912: - Summary: Broadcast join OutOfMemory stack trace obscures actual cause of OOM Key: SPARK-24912 URL: https://issues.apache.org/jira/browse/SPARK-24912 Project: Spark

[jira] [Updated] (SPARK-24814) Relationship between catalog and datasources

2018-07-23 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-24814: -- Description: This is somewhat related, though not identical to, [~rdblue]'s SPIP on

[jira] [Commented] (SPARK-24814) Relationship between catalog and datasources

2018-07-23 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553336#comment-16553336 ] Bruce Robbins commented on SPARK-24814: --- [~rdblue] Your parquet example is a compelling one. If

[jira] [Updated] (SPARK-24814) Relationship between catalog and datasources

2018-07-18 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-24814: -- Description: This is somewhat related, though not identical to, [~rdblue]'s SPIP on

[jira] [Created] (SPARK-24814) Relationship between catalog and datasources

2018-07-15 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-24814: - Summary: Relationship between catalog and datasources Key: SPARK-24814 URL: https://issues.apache.org/jira/browse/SPARK-24814 Project: Spark Issue Type:

[jira] [Resolved] (SPARK-23629) Building streaming-kafka-0-8-assembly or streaming-flume-assembly adds incompatible jline jar to assembly

2018-07-07 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins resolved SPARK-23629. --- Resolution: Cannot Reproduce > Building streaming-kafka-0-8-assembly or

[jira] [Commented] (SPARK-23629) Building streaming-kafka-0-8-assembly or streaming-flume-assembly adds incompatible jline jar to assembly

2018-07-07 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535944#comment-16535944 ] Bruce Robbins commented on SPARK-23629: --- Whatever was causing this, it is now gone away. Problem

[jira] [Created] (SPARK-24758) Create table wants to use /user/hive/warehouse in clean clone

2018-07-07 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-24758: - Summary: Create table wants to use /user/hive/warehouse in clean clone Key: SPARK-24758 URL: https://issues.apache.org/jira/browse/SPARK-24758 Project: Spark

[jira] [Created] (SPARK-24633) arrays_zip function's code generator splits input processing incorrectly

2018-06-22 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-24633: - Summary: arrays_zip function's code generator splits input processing incorrectly Key: SPARK-24633 URL: https://issues.apache.org/jira/browse/SPARK-24633 Project:

[jira] [Commented] (SPARK-23936) High-order function: map_concat(map1<K, V>, map2<K, V>, ..., mapN<K, V>) → map<K,V>

2018-05-31 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496599#comment-16496599 ] Bruce Robbins commented on SPARK-23936: --- tl;dr version: Spark's Map type allows duplicates.

[jira] [Commented] (SPARK-23936) High-order function: map_concat(map1<K, V>, map2<K, V>, ..., mapN<K, V>) → map<K,V>

2018-05-04 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464245#comment-16464245 ] Bruce Robbins commented on SPARK-23936: --- [~ueshin] I have a question about map_concat's behavior

[jira] [Commented] (SPARK-24142) Add interpreted execution to SortPrefix expression

2018-05-01 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460532#comment-16460532 ] Bruce Robbins commented on SPARK-24142: --- [~maropu] I don't seem to have the Jira authority (or

[jira] [Commented] (SPARK-24119) Add interpreted execution to SortPrefix expression

2018-05-01 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460527#comment-16460527 ] Bruce Robbins commented on SPARK-24119: --- [~maropu] Ahh... we crossed paths and I opened a second

[jira] [Resolved] (SPARK-24142) Add interpreted execution to SortPrefix expression

2018-05-01 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins resolved SPARK-24142. --- Resolution: Duplicate > Add interpreted execution to SortPrefix expression >

[jira] [Commented] (SPARK-24142) Add interpreted execution to SortPrefix expression

2018-05-01 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460525#comment-16460525 ] Bruce Robbins commented on SPARK-24142: --- I opened another Jira on this a few days ago, but it was

[jira] [Updated] (SPARK-24142) Add interpreted execution to SortPrefix expression

2018-05-01 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-24142: -- Affects Version/s: (was: 2.3.0) 2.4.0 > Add interpreted execution

[jira] [Created] (SPARK-24142) Add interpreted execution to SortPrefix expression

2018-05-01 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-24142: - Summary: Add interpreted execution to SortPrefix expression Key: SPARK-24142 URL: https://issues.apache.org/jira/browse/SPARK-24142 Project: Spark Issue

[jira] [Created] (SPARK-24119) Add interpreted execution to SortPrefix expression

2018-04-29 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-24119: - Summary: Add interpreted execution to SortPrefix expression Key: SPARK-24119 URL: https://issues.apache.org/jira/browse/SPARK-24119 Project: Spark Issue

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-04-28 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457896#comment-16457896 ] Bruce Robbins commented on SPARK-23715: --- [~hyukjin.kwon] Yes, I also built from sources and I could

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-04-28 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457892#comment-16457892 ] Bruce Robbins commented on SPARK-23715: --- Still, I filed a Jira with Hive so they won't release

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-04-28 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457874#comment-16457874 ] Bruce Robbins commented on SPARK-23715: --- I might understand what's going on with Hive. In the

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-04-28 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457754#comment-16457754 ] Bruce Robbins commented on SPARK-23715: --- I just downloaded and installed hive-2.3.3 (3 April 2018)

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-04-28 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457674#comment-16457674 ] Bruce Robbins commented on SPARK-23715: --- Could be this: HIVE-14412 > from_utc_timestamp returns

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-04-28 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457665#comment-16457665 ] Bruce Robbins commented on SPARK-23715: --- {quote}Which version did you use?{quote} The jars all say

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-04-28 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457657#comment-16457657 ] Bruce Robbins commented on SPARK-23715: --- Maybe a configuration setting or difference between

[jira] [Commented] (SPARK-23580) Interpreted mode fallback should be implemented for all expressions & projections

2018-04-26 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454918#comment-16454918 ] Bruce Robbins commented on SPARK-23580: --- Should SortPrefix also get this treatment? > Interpreted
