[jira] [Commented] (SPARK-23936) High-order function: map_concat(map1<K, V>, map2<K, V>, ..., mapN<K, V>) → map<K,V>

2018-05-04 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464245#comment-16464245 ] Bruce Robbins commented on SPARK-23936: --- [~ueshin] I have a question about map_concat's behavior

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-04-28 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457674#comment-16457674 ] Bruce Robbins commented on SPARK-23715: --- Could be this: HIVE-14412 > from_utc_timestamp returns

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-04-28 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457657#comment-16457657 ] Bruce Robbins commented on SPARK-23715: --- Maybe a configuration setting or difference between

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-04-28 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457665#comment-16457665 ] Bruce Robbins commented on SPARK-23715: --- {quote}Which version did you use?{quote} The jars all say

[jira] [Commented] (SPARK-23936) High-order function: map_concat(map1<K, V>, map2<K, V>, ..., mapN<K, V>) → map<K,V>

2018-05-31 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496599#comment-16496599 ] Bruce Robbins commented on SPARK-23936: --- tl;dr version: Spark's Map type allows duplicates.
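
A small model makes the tl;dr concrete: a map that permits duplicate keys. The Python sketch below is not Spark code. It assumes a map stored as parallel key and value arrays, roughly the layout of Spark's `ArrayBasedMapData`. The helpers `ArrayMap`, `map_concat` and `lookup` are hypothetical stand-ins, and the "first match wins" lookup rule is an assumption chosen for illustration. Concatenating by appending arrays keeps a repeated key rather than merging it.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class ArrayMap:
    """Toy map stored as parallel key/value lists; duplicate keys are allowed."""
    keys: List[Any] = field(default_factory=list)
    values: List[Any] = field(default_factory=list)


def map_concat(*maps: ArrayMap) -> ArrayMap:
    """Hypothetical map_concat: append every map's entries without deduplicating."""
    out = ArrayMap()
    for m in maps:
        out.keys.extend(m.keys)
        out.values.extend(m.values)
    return out


def lookup(m: ArrayMap, key: Any) -> Optional[Any]:
    """Linear scan; the assumed rule is that the first matching key wins."""
    for k, v in zip(m.keys, m.values):
        if k == key:
            return v
    return None


a = ArrayMap(["x", "y"], [1, 2])
b = ArrayMap(["y", "z"], [3, 4])
c = map_concat(a, b)
print(c.keys)          # ['x', 'y', 'y', 'z']: key 'y' appears twice
print(lookup(c, "y"))  # 2 under the first-match rule
```

A deduplicating variant would have to choose between keeping the first or the last value for a repeated key. The sketch deliberately makes neither choice.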

[jira] [Created] (SPARK-24633) arrays_zip function's code generator splits input processing incorrectly

2018-06-22 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-24633: - Summary: arrays_zip function's code generator splits input processing incorrectly Key: SPARK-24633 URL: https://issues.apache.org/jira/browse/SPARK-24633 Project:
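
For context on what the generated code should compute: as we read Spark's function documentation, `arrays_zip` returns an array of structs in which the N-th struct holds the N-th element of every input array, with null filling in for elements missing from shorter arrays. The Python sketch below models only those semantics. Tuples stand in for structs and `None` stands in for null. It says nothing about how the code generator splits input processing.

```python
from itertools import zip_longest


def arrays_zip(*arrays):
    """Model of arrays_zip: the N-th tuple holds the N-th element of each input, None-padded."""
    return [tuple(t) for t in zip_longest(*arrays, fillvalue=None)]


print(arrays_zip([1, 2, 3], ["a", "b"]))  # [(1, 'a'), (2, 'b'), (3, None)]
```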

[jira] [Commented] (SPARK-24142) Add interpreted execution to SortPrefix expression

2018-05-01 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460525#comment-16460525 ] Bruce Robbins commented on SPARK-24142: --- I opened another Jira on this a few days ago, but it was

[jira] [Resolved] (SPARK-24142) Add interpreted execution to SortPrefix expression

2018-05-01 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins resolved SPARK-24142. --- Resolution: Duplicate > Add interpreted execution to SortPrefix expression >

[jira] [Updated] (SPARK-24142) Add interpreted execution to SortPrefix expression

2018-05-01 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-24142: -- Affects Version/s: (was: 2.3.0) 2.4.0 > Add interpreted execution

[jira] [Commented] (SPARK-24142) Add interpreted execution to SortPrefix expression

2018-05-01 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460532#comment-16460532 ] Bruce Robbins commented on SPARK-24142: --- [~maropu] I don't seem to have the Jira authority (or

[jira] [Created] (SPARK-24142) Add interpreted execution to SortPrefix expression

2018-05-01 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-24142: - Summary: Add interpreted execution to SortPrefix expression Key: SPARK-24142 URL: https://issues.apache.org/jira/browse/SPARK-24142 Project: Spark Issue

[jira] [Commented] (SPARK-24119) Add interpreted execution to SortPrefix expression

2018-05-01 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460527#comment-16460527 ] Bruce Robbins commented on SPARK-24119: --- [~maropu] Ahh... we crossed paths and I opened a second

[jira] [Commented] (SPARK-23580) Interpreted mode fallback should be implemented for all expressions & projections

2018-04-26 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16454918#comment-16454918 ] Bruce Robbins commented on SPARK-23580: --- Should SortPrefix also get this treatment? > Interpreted

[jira] [Created] (SPARK-24119) Add interpreted execution to SortPrefix expression

2018-04-29 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-24119: - Summary: Add interpreted execution to SortPrefix expression Key: SPARK-24119 URL: https://issues.apache.org/jira/browse/SPARK-24119 Project: Spark Issue

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-04-28 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457754#comment-16457754 ] Bruce Robbins commented on SPARK-23715: --- I just downloaded and installed hive-2.3.3 (3 April 2018)

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-04-28 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457892#comment-16457892 ] Bruce Robbins commented on SPARK-23715: --- Still, I filed a Jira with Hive so they won't release

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-04-28 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457896#comment-16457896 ] Bruce Robbins commented on SPARK-23715: --- [~hyukjin.kwon] Yes, I also built from sources and I could

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-04-28 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457874#comment-16457874 ] Bruce Robbins commented on SPARK-23715: --- I might understand what's going on with Hive. In the

[jira] [Created] (SPARK-23240) PythonWorkerFactory issues unhelpful message when pyspark.daemon produces bogus stdout

2018-01-26 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-23240: - Summary: PythonWorkerFactory issues unhelpful message when pyspark.daemon produces bogus stdout Key: SPARK-23240 URL: https://issues.apache.org/jira/browse/SPARK-23240
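
The symptom in the summary can be modeled in a few lines. As we understand the PySpark daemon protocol, `pyspark.daemon` reports its listening port by writing a 4-byte big-endian integer to stdout, and the JVM side reads it back with a plain integer read. Any stray output printed first (for example, a print in some module imported at startup) is then decoded as a port. The function `read_daemon_port` and its error text are hypothetical. They only show how a range check could turn a bogus value into a clearer message.

```python
import io
import struct
from typing import BinaryIO


def read_daemon_port(stdout: BinaryIO) -> int:
    """Hypothetical reader: decode a 4-byte big-endian port and sanity-check it."""
    raw = stdout.read(4)
    if len(raw) < 4:
        raise RuntimeError("pyspark.daemon exited before reporting a port")
    (port,) = struct.unpack(">i", raw)  # signed, like Java's DataInputStream.readInt
    if not 1 <= port <= 65535:
        raise RuntimeError(
            f"pyspark.daemon wrote unexpected data to stdout: decoded port {port} "
            f"(0x{port & 0xFFFFFFFF:08x}) is out of range; first bytes were {raw!r}"
        )
    return port


good = io.BytesIO(struct.pack(">i", 50123))
print(read_daemon_port(good))  # 50123

# Stray text written before the port bytes is misread as the port.
bad = io.BytesIO(b"Hello world\n" + struct.pack(">i", 50123))
try:
    read_daemon_port(bad)
except RuntimeError as e:
    print(e)  # decoded port 1214606444 (0x48656c6c) is out of range ...
```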

[jira] [Commented] (SPARK-23240) PythonWorkerFactory issues unhelpful message when pyspark.daemon produces bogus stdout

2018-01-26 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341609#comment-16341609 ] Bruce Robbins commented on SPARK-23240: --- I will be making a pull request. > PythonWorkerFactory

[jira] [Commented] (SPARK-23240) PythonWorkerFactory issues unhelpful message when pyspark.daemon produces bogus stdout

2018-01-28 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342688#comment-16342688 ] Bruce Robbins commented on SPARK-23240: --- Hi [~hyukjin.kwon], I am not sure this update covers the

[jira] [Created] (SPARK-23251) ClassNotFoundException: scala.Any when there's a missing implicit Map encoder

2018-01-27 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-23251: - Summary: ClassNotFoundException: scala.Any when there's a missing implicit Map encoder Key: SPARK-23251 URL: https://issues.apache.org/jira/browse/SPARK-23251

[jira] [Commented] (SPARK-23251) ClassNotFoundException: scala.Any when there's a missing implicit Map encoder

2018-01-30 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346050#comment-16346050 ] Bruce Robbins commented on SPARK-23251: --- I commented out the following line in 

[jira] [Commented] (SPARK-23251) ClassNotFoundException: scala.Any when there's a missing implicit Map encoder

2018-01-30 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346246#comment-16346246 ] Bruce Robbins commented on SPARK-23251: --- [~srowen] This also occurs with compiled apps submitted

[jira] [Comment Edited] (SPARK-23251) ClassNotFoundException: scala.Any when there's a missing implicit Map encoder

2018-01-30 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346246#comment-16346246 ] Bruce Robbins edited comment on SPARK-23251 at 1/31/18 4:35 AM: [~srowen] 

[jira] [Commented] (SPARK-23410) Unable to read jsons in charset different from UTF-8

2018-02-14 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364788#comment-16364788 ] Bruce Robbins commented on SPARK-23410: --- I am probably misunderstanding the issue, but I couldn't

[jira] [Commented] (SPARK-23410) Unable to read jsons in charset different from UTF-8

2018-02-14 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364916#comment-16364916 ] Bruce Robbins commented on SPARK-23410: --- On Spark 2.2.1, I got the same result as you. But with

[jira] [Commented] (SPARK-23410) Unable to read jsons in charset different from UTF-8

2018-02-14 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364929#comment-16364929 ] Bruce Robbins commented on SPARK-23410: --- bq. I am working on a fix, just in case Oh, OK, this one

[jira] [Comment Edited] (SPARK-23410) Unable to read jsons in charset different from UTF-8

2018-02-14 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364929#comment-16364929 ] Bruce Robbins edited comment on SPARK-23410 at 2/14/18 11:17 PM: -

[jira] [Commented] (SPARK-23410) Unable to read jsons in charset different from UTF-8

2018-02-14 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364866#comment-16364866 ] Bruce Robbins commented on SPARK-23410: --- [~maxgekk] My simple test input of [{"field1": 10,

[jira] [Comment Edited] (SPARK-23410) Unable to read jsons in charset different from UTF-8

2018-02-14 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364866#comment-16364866 ] Bruce Robbins edited comment on SPARK-23410 at 2/14/18 10:21 PM: -

[jira] [Commented] (SPARK-23240) PythonWorkerFactory issues unhelpful message when pyspark.daemon produces bogus stdout

2018-02-10 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359568#comment-16359568 ] Bruce Robbins commented on SPARK-23240: --- A little background. A Spark installation had a Python

[jira] [Commented] (SPARK-23417) pyspark tests give wrong sbt instructions

2018-02-16 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368031#comment-16368031 ] Bruce Robbins commented on SPARK-23417: --- This does the trick: {noformat} build/sbt -Pkafka-0-8

[jira] [Updated] (SPARK-22940) Test suite HiveExternalCatalogVersionsSuite fails on platforms that don't have wget installed

2018-01-02 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-22940: -- Description: On platforms that don't have wget installed (e.g., Mac OS X), test suite

[jira] [Created] (SPARK-22940) Test suite HiveExternalCatalogVersionsSuite fails on platforms that don't have wget installed

2018-01-02 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-22940: - Summary: Test suite HiveExternalCatalogVersionsSuite fails on platforms that don't have wget installed Key: SPARK-22940 URL: https://issues.apache.org/jira/browse/SPARK-22940

[jira] [Commented] (SPARK-23207) Shuffle+Repartition on an DataFrame could lead to incorrect answers

2018-08-10 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577041#comment-16577041 ] Bruce Robbins commented on SPARK-23207: --- I can help out here. I will make a PR for branch-2.2 in

[jira] [Created] (SPARK-25164) Parquet reader builds entire list of columns once for each column

2018-08-20 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-25164: - Summary: Parquet reader builds entire list of columns once for each column Key: SPARK-25164 URL: https://issues.apache.org/jira/browse/SPARK-25164 Project: Spark

[jira] [Updated] (SPARK-25164) Parquet reader builds entire list of columns once for each column

2018-08-21 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-25164: -- Description: {{VectorizedParquetRecordReader.initializeInternal}} loops through each column,
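
The summary describes a loop that rebuilds the entire column list once per column, so setup work grows quadratically with schema width. The Python sketch below models only that pattern and the obvious fix of building the list once. `build_column_list` is a hypothetical stand-in for whatever call materializes the list. The counter tallies how many column descriptors get created.

```python
def build_column_list(schema, counter):
    """Hypothetical stand-in for a call that materializes every column descriptor."""
    counter[0] += len(schema)
    return list(schema)


def init_quadratic(schema):
    counter = [0]
    for i, _ in enumerate(schema):
        cols = build_column_list(schema, counter)  # rebuilt on every iteration
        _ = cols[i]
    return counter[0]


def init_hoisted(schema):
    counter = [0]
    cols = build_column_list(schema, counter)  # built once, then reused
    for i, _ in enumerate(schema):
        _ = cols[i]
    return counter[0]


schema = [f"c{i}" for i in range(2000)]
print(init_quadratic(schema))  # 4000000 descriptors for 2000 columns
print(init_hoisted(schema))    # 2000
```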

[jira] [Commented] (SPARK-25164) Parquet reader builds entire list of columns once for each column

2018-08-21 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588168#comment-16588168 ] Bruce Robbins commented on SPARK-25164: --- [~viirya] Sure. I will try to get something up by tonight

[jira] [Commented] (SPARK-23207) Shuffle+Repartition on an DataFrame could lead to incorrect answers

2018-08-23 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590847#comment-16590847 ] Bruce Robbins commented on SPARK-23207: --- Will we be back-porting this to 2.1, or does the 18 month

[jira] [Updated] (SPARK-24814) Relationship between catalog and datasources

2018-07-18 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-24814: -- Description: This is somewhat related, though not identical to, [~rdblue]'s SPIP on

[jira] [Created] (SPARK-24814) Relationship between catalog and datasources

2018-07-15 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-24814: - Summary: Relationship between catalog and datasources Key: SPARK-24814 URL: https://issues.apache.org/jira/browse/SPARK-24814 Project: Spark Issue Type:

[jira] [Commented] (SPARK-24814) Relationship between catalog and datasources

2018-07-23 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553336#comment-16553336 ] Bruce Robbins commented on SPARK-24814: --- [~rdblue] Your parquet example is a compelling one. If

[jira] [Updated] (SPARK-24814) Relationship between catalog and datasources

2018-07-23 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-24814: -- Description: This is somewhat related, though not identical to, [~rdblue]'s SPIP on

[jira] [Created] (SPARK-24912) Broadcast join OutOfMemory stack trace obscures actual cause of OOM

2018-07-24 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-24912: - Summary: Broadcast join OutOfMemory stack trace obscures actual cause of OOM Key: SPARK-24912 URL: https://issues.apache.org/jira/browse/SPARK-24912 Project: Spark

[jira] [Updated] (SPARK-24912) Broadcast join OutOfMemory stack trace obscures actual cause of OOM

2018-07-24 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-24912: -- Priority: Minor (was: Major) > Broadcast join OutOfMemory stack trace obscures actual cause

[jira] [Created] (SPARK-24914) totalSize is not a good estimate for broadcast joins

2018-07-24 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-24914: - Summary: totalSize is not a good estimate for broadcast joins Key: SPARK-24914 URL: https://issues.apache.org/jira/browse/SPARK-24914 Project: Spark Issue

[jira] [Updated] (SPARK-24914) totalSize is not a good estimate for broadcast joins

2018-07-24 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-24914: -- Description: When determining whether to do a broadcast join, Spark estimates the size of
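
Here is a minimal sketch of why an on-disk size can mislead the broadcast decision. It assumes the size estimate is compared against `spark.sql.autoBroadcastJoinThreshold`, whose default is 10 MB, and that Hive's `totalSize` measures the table's files on disk. An encoded, compressed table can therefore look much smaller than it is once loaded. The 6 MB file size and 8x expansion factor are made-up illustration values, not measurements.

```python
AUTO_BROADCAST_THRESHOLD = 10 * 1024 * 1024  # default of spark.sql.autoBroadcastJoinThreshold


def would_broadcast(estimated_bytes: int, threshold: int = AUTO_BROADCAST_THRESHOLD) -> bool:
    """Broadcast only if the estimate is non-negative and within the threshold (-1 disables)."""
    return 0 <= estimated_bytes <= threshold


total_size_on_disk = 6 * 1024 * 1024  # hypothetical compressed Parquet/ORC files
assumed_expansion = 8                 # hypothetical decode/decompress blow-up
in_memory_bytes = total_size_on_disk * assumed_expansion

print(would_broadcast(total_size_on_disk))  # True: the on-disk size fits under 10 MB
print(would_broadcast(in_memory_bytes))     # False: about 48 MB once decoded
```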

[jira] [Updated] (SPARK-24914) totalSize is not a good estimate for broadcast joins

2018-07-25 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-24914: -- Description: When determining whether to do a broadcast join, Spark estimates the size of

[jira] [Commented] (SPARK-24914) totalSize is not a good estimate for broadcast joins

2018-07-25 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555998#comment-16555998 ] Bruce Robbins commented on SPARK-24914: --- [~irashid] {quote} given HIVE-20079, can we also have a

[jira] [Commented] (SPARK-24316) Spark sql queries stall for column width more than 6k for parquet based table

2018-09-04 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603453#comment-16603453 ] Bruce Robbins commented on SPARK-24316: --- This is likely SPARK-25164. > Spark sql queries stall

[jira] [Commented] (SPARK-23243) Shuffle+Repartition on an RDD could lead to incorrect answers

2018-09-08 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608167#comment-16608167 ] Bruce Robbins commented on SPARK-23243: --- Any plans to back port this to 2.2? >

[jira] [Commented] (SPARK-23243) Shuffle+Repartition on an RDD could lead to incorrect answers

2018-09-08 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608169#comment-16608169 ] Bruce Robbins commented on SPARK-23243: --- BTW, I took a stab at back porting it to 2.2, but to get

[jira] [Created] (SPARK-24758) Create table wants to use /user/hive/warehouse in clean clone

2018-07-07 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-24758: - Summary: Create table wants to use /user/hive/warehouse in clean clone Key: SPARK-24758 URL: https://issues.apache.org/jira/browse/SPARK-24758 Project: Spark

[jira] [Commented] (SPARK-23629) Building streaming-kafka-0-8-assembly or streaming-flume-assembly adds incompatible jline jar to assembly

2018-07-07 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535944#comment-16535944 ] Bruce Robbins commented on SPARK-23629: --- Whatever was causing this, it has now gone away. Problem

[jira] [Resolved] (SPARK-23629) Building streaming-kafka-0-8-assembly or streaming-flume-assembly adds incompatible jline jar to assembly

2018-07-07 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins resolved SPARK-23629. --- Resolution: Cannot Reproduce > Building streaming-kafka-0-8-assembly or

[jira] [Commented] (SPARK-25164) Parquet reader builds entire list of columns once for each column

2018-09-11 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611331#comment-16611331 ] Bruce Robbins commented on SPARK-25164: --- Thanks [~Tagar] for the feedback. I assume the 44%

[jira] [Commented] (SPARK-23560) A joinWith followed by groupBy requires extra shuffle

2018-03-07 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390111#comment-16390111 ] Bruce Robbins commented on SPARK-23560: --- The main issue is that an AttributeReference instance

[jira] [Updated] (SPARK-23560) A joinWith followed by groupBy requires extra shuffle

2018-03-07 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-23560: -- Description: Depending on the size of the input, a joinWith followed by a groupBy requires

[jira] [Commented] (SPARK-23560) A joinWith followed by groupBy requires extra shuffle

2018-03-10 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394369#comment-16394369 ] Bruce Robbins commented on SPARK-23560: --- A simpler example that seems to reproduce this issue

[jira] [Created] (SPARK-23629) Building streaming-kafka-0-8-assembly or streaming-flume-assembly adds incompatible jline jar to assembly

2018-03-08 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-23629: - Summary: Building streaming-kafka-0-8-assembly or streaming-flume-assembly adds incompatible jline jar to assembly Key: SPARK-23629 URL:

[jira] [Created] (SPARK-23963) Queries on text-based Hive tables grow disproportionately slower as the number of columns increase

2018-04-11 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-23963: - Summary: Queries on text-based Hive tables grow disproportionately slower as the number of columns increase Key: SPARK-23963 URL:

[jira] [Updated] (SPARK-23963) Queries on text-based Hive tables grow disproportionately slower as the number of columns increase

2018-04-11 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-23963: -- Description: TableReader gets disproportionately slower as the number of columns in the query

[jira] [Comment Edited] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-04-11 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431403#comment-16431403 ] Bruce Robbins edited comment on SPARK-23715 at 4/11/18 8:09 PM: I've been

[jira] [Commented] (SPARK-23936) High-order function: map_concat(map1<K, V>, map2<K, V>, ..., mapN<K, V>) → map<K,V>

2018-04-12 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436133#comment-16436133 ] Bruce Robbins commented on SPARK-23936: --- I would like to take this one, assuming no one has taken

[jira] [Commented] (SPARK-23936) High-order function: map_concat(map1<K, V>, map2<K, V>, ..., mapN<K, V>) → map<K,V>

2018-04-13 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437985#comment-16437985 ] Bruce Robbins commented on SPARK-23936: --- I will have a WIP pull request tonight or tomorrow

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-04-09 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431403#comment-16431403 ] Bruce Robbins commented on SPARK-23715: --- I've been convinced this is worth fixing, at least for

[jira] [Created] (SPARK-24043) InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions

2018-04-21 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-24043: - Summary: InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions Key: SPARK-24043 URL: https://issues.apache.org/jira/browse/SPARK-24043
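
The failure mode in the summary comes from a contract on nondeterministic expressions. As we understand Catalyst, a `Nondeterministic` expression must be initialized with a partition index before `eval`, and evaluating it uninitialized fails. The Python classes below are a hypothetical model of that contract, not Spark's API. `Predicate` walks its expression tree and initializes every node that needs it before evaluating.

```python
import random


class Expr:
    children = ()

    def eval(self, row):
        raise NotImplementedError


class Literal(Expr):
    def __init__(self, value):
        self.value = value

    def eval(self, row):
        return self.value


class Rand(Expr):
    """Hypothetical nondeterministic leaf: needs per-partition state before eval."""

    def __init__(self, seed):
        self.seed = seed
        self.rng = None

    def initialize(self, partition_index):
        self.rng = random.Random(self.seed + partition_index)

    def eval(self, row):
        if self.rng is None:
            raise RuntimeError("Rand should be initialized before eval")
        return self.rng.random()


class LessThan(Expr):
    def __init__(self, left, right):
        self.children = (left, right)

    def eval(self, row):
        left, right = (c.eval(row) for c in self.children)
        return left < right


class Predicate:
    """Hypothetical interpreted predicate that initializes nondeterministic nodes."""

    def __init__(self, expr):
        self.expr = expr

    def initialize(self, partition_index):
        stack = [self.expr]
        while stack:
            node = stack.pop()
            if hasattr(node, "initialize"):
                node.initialize(partition_index)
            stack.extend(node.children)

    def eval(self, row):
        return bool(self.expr.eval(row))


pred = Predicate(LessThan(Rand(42), Literal(2.0)))
try:
    pred.eval({})
except RuntimeError as e:
    print("uninitialized:", e)
pred.initialize(0)
print(pred.eval({}))  # True: random() is always below 2.0
```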

[jira] [Commented] (SPARK-24043) InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions

2018-04-24 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449352#comment-16449352 ] Bruce Robbins commented on SPARK-24043: --- You're half-way there. When whole-stage codegen is off

[jira] [Commented] (SPARK-24043) InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions

2018-04-23 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449281#comment-16449281 ] Bruce Robbins commented on SPARK-24043: --- [~maropu] > Do I miss any precondition? For this bug to

[jira] [Commented] (SPARK-23963) Queries on text-based Hive tables grow disproportionately slower as the number of columns increase

2018-04-17 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441378#comment-16441378 ] Bruce Robbins commented on SPARK-23963: --- [~Tagar] Yes, although I am a little fuzzy on the process

[jira] [Updated] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-03-17 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-23715: -- Description: This produces the expected answer: {noformat}

[jira] [Updated] (SPARK-23560) Group by on struct field can add extra shuffle

2018-03-17 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-23560: -- Summary: Group by on struct field can add extra shuffle (was: A joinWith followed by groupBy

[jira] [Created] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-03-16 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-23715: - Summary: from_utc_timestamp returns incorrect results for some UTC date/time values Key: SPARK-23715 URL: https://issues.apache.org/jira/browse/SPARK-23715
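
For readers unfamiliar with the function: `from_utc_timestamp(ts, tz)` treats a timestamp as UTC wall-clock time and renders it as wall-clock time in `tz`. Spark's documentation gives `from_utc_timestamp('2016-08-31', 'Asia/Seoul')` as `2016-08-31 09:00:00`. The Python sketch below models only the intended semantics. To stay self-contained, it uses a hypothetical fixed-offset table instead of a time-zone database. It does not reproduce the incorrect results reported in this issue.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offsets (hours); real zones with DST need a tz database such as zoneinfo.
FIXED_OFFSETS = {"UTC": 0, "Asia/Seoul": 9}


def from_utc_timestamp(ts: str, tz: str) -> datetime:
    """Interpret a naive timestamp string as UTC and return the wall-clock time in tz."""
    naive = datetime.fromisoformat(ts)
    as_utc = naive.replace(tzinfo=timezone.utc)
    target = timezone(timedelta(hours=FIXED_OFFSETS[tz]))
    return as_utc.astimezone(target).replace(tzinfo=None)


print(from_utc_timestamp("2016-08-31", "Asia/Seoul"))  # 2016-08-31 09:00:00
```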

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-03-16 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403183#comment-16403183 ] Bruce Robbins commented on SPARK-23715: --- It almost seems like FromUTCTimestamp needs its own Cast

[jira] [Created] (SPARK-23776) pyspark-sql tests should display build instructions when components are missing

2018-03-22 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-23776: - Summary: pyspark-sql tests should display build instructions when components are missing Key: SPARK-23776 URL: https://issues.apache.org/jira/browse/SPARK-23776

[jira] [Commented] (SPARK-23776) pyspark-sql tests should display build instructions when components are missing

2018-03-23 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411854#comment-16411854 ] Bruce Robbins commented on SPARK-23776: --- As it turns out, the building-spark page does have maven

[jira] [Updated] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-03-20 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-23715: -- Description: This produces the expected answer: {noformat}

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-03-20 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16406836#comment-16406836 ] Bruce Robbins commented on SPARK-23715: --- A fix to this requires some ugly hacking of the implicit

[jira] [Created] (SPARK-23560) A joinWith followed by groupBy requires extra shuffle

2018-03-01 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-23560: - Summary: A joinWith followed by groupBy requires extra shuffle Key: SPARK-23560 URL: https://issues.apache.org/jira/browse/SPARK-23560 Project: Spark

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-04-25 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452790#comment-16452790 ] Bruce Robbins commented on SPARK-23715: --- [~cloud_fan] I'll give separate answers for String input

[jira] [Comment Edited] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-04-25 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452790#comment-16452790 ] Bruce Robbins edited comment on SPARK-23715 at 4/25/18 10:00 PM: -

[jira] [Comment Edited] (SPARK-25643) Performance issues querying wide rows

2018-10-15 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650866#comment-16650866 ] Bruce Robbins edited comment on SPARK-25643 at 10/15/18 10:08 PM: --

[jira] [Commented] (SPARK-25643) Performance issues querying wide rows

2018-10-15 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650866#comment-16650866 ] Bruce Robbins commented on SPARK-25643: --- [~viirya] Yes, in the case where I said "predicate push

[jira] [Commented] (SPARK-24758) Create table wants to use /user/hive/warehouse in clean clone

2018-10-31 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16670665#comment-16670665 ] Bruce Robbins commented on SPARK-24758: --- This issue was introduced by commit 

[jira] [Commented] (SPARK-25164) Parquet reader builds entire list of columns once for each column

2018-10-04 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638803#comment-16638803 ] Bruce Robbins commented on SPARK-25164: --- [~Tagar] I've opened SPARK-25643 to keep track of the

[jira] [Created] (SPARK-25643) Performance issues querying wide rows

2018-10-04 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-25643: - Summary: Performance issues querying wide rows Key: SPARK-25643 URL: https://issues.apache.org/jira/browse/SPARK-25643 Project: Spark Issue Type:

[jira] [Commented] (SPARK-25164) Parquet reader builds entire list of columns once for each column

2018-09-19 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621125#comment-16621125 ] Bruce Robbins commented on SPARK-25164: --- {quote}I am thinking if it's feasible to lazily realize

[jira] [Commented] (SPARK-23715) from_utc_timestamp returns incorrect results for some UTC date/time values

2018-09-19 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621452#comment-16621452 ] Bruce Robbins commented on SPARK-23715: --- Hi [~rxin], Thanks for following up with me. This is a

[jira] [Commented] (SPARK-26680) StackOverflowError if Stream passed to groupBy

2019-01-21 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748277#comment-16748277 ] Bruce Robbins commented on SPARK-26680: --- I will make a PR for this, but I would like to hear any

[jira] [Created] (SPARK-26680) StackOverflowError if Stream passed to groupBy

2019-01-21 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-26680: - Summary: StackOverflowError if Stream passed to groupBy Key: SPARK-26680 URL: https://issues.apache.org/jira/browse/SPARK-26680 Project: Spark Issue Type:

[jira] [Updated] (SPARK-26680) StackOverflowError if Stream passed to groupBy

2019-01-21 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-26680: -- Description: This Java code results in a StackOverflowError: {code:java} List groupByCols =

[jira] [Updated] (SPARK-26680) StackOverflowError if Stream passed to groupBy

2019-01-22 Thread Bruce Robbins (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruce Robbins updated SPARK-26680: -- Affects Version/s: 2.4.0 > StackOverflowError if Stream passed to groupBy >
