[jira] [Commented] (SPARK-33184) spark doesn't read data source column if it is used as an index to an array under a struct

2020-10-19 Thread colin fang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217001#comment-17217001 ] colin fang commented on SPARK-33184: I notice there is a quotation mark before `Proj

[jira] [Updated] (SPARK-33184) spark doesn't read data source column if it is used as an index to an array under a struct

2020-10-19 Thread colin fang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] colin fang updated SPARK-33184: --- Issue Type: Bug (was: Improvement) > spark doesn't read data source column if it is used as an inde

[jira] [Updated] (SPARK-33184) spark doesn't read data source column if it is used as an index to an array under a struct

2020-10-19 Thread colin fang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] colin fang updated SPARK-33184: --- Summary: spark doesn't read data source column if it is used as an index to an array under a struct

[jira] [Updated] (SPARK-33184) spark doesn't read data source column if it is needed as an index to an array in a nested struct

2020-10-19 Thread colin fang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] colin fang updated SPARK-33184: --- Description: {code:python} df = spark.createDataFrame([[1, [[1, 2, schema='x:int,y:struct>') df

[jira] [Updated] (SPARK-33184) spark doesn't read data source column if it is needed as an index to an array in a nested struct

2020-10-19 Thread colin fang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] colin fang updated SPARK-33184: --- Description: {code:python} df = spark.createDataFrame([[1, [[1, 2, schema='x:int,y:struct>') df

[jira] [Created] (SPARK-33184) spark doesn't read data source column if it is needed as an index to an array in a nested struct

2020-10-19 Thread colin fang (Jira)
colin fang created SPARK-33184: -- Summary: spark doesn't read data source column if it is needed as an index to an array in a nested struct Key: SPARK-33184 URL: https://issues.apache.org/jira/browse/SPARK-33184

[jira] [Created] (SPARK-28148) repartition after join is not optimized away

2019-06-24 Thread colin fang (JIRA)
colin fang created SPARK-28148: -- Summary: repartition after join is not optimized away Key: SPARK-28148 URL: https://issues.apache.org/jira/browse/SPARK-28148 Project: Spark Issue Type: Improvem

[jira] [Updated] (SPARK-27759) Do not auto cast array to np.array in vectorized udf

2019-06-11 Thread colin fang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] colin fang updated SPARK-27759: --- Description: {code:java} pd_df = pd.DataFrame({'x': np.random.rand(11, 3, 5).tolist()}) df = spark.c

[jira] [Created] (SPARK-27759) Do not auto cast array to np.array in vectorized udf

2019-05-17 Thread colin fang (JIRA)
colin fang created SPARK-27759: -- Summary: Do not auto cast array to np.array in vectorized udf Key: SPARK-27759 URL: https://issues.apache.org/jira/browse/SPARK-27759 Project: Spark Issue Type:

[jira] [Commented] (SPARK-17859) persist should not impede with spark's ability to perform a broadcast join.

2019-04-30 Thread colin fang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830504#comment-16830504 ] colin fang commented on SPARK-17859: The above case works for me in v2.4 {code:java}

[jira] [Created] (SPARK-27559) Nullable in a given schema is not respected when reading from parquet

2019-04-24 Thread colin fang (JIRA)
colin fang created SPARK-27559: -- Summary: Nullable in a given schema is not respected when reading from parquet Key: SPARK-27559 URL: https://issues.apache.org/jira/browse/SPARK-27559 Project: Spark

[jira] [Created] (SPARK-27217) Nested schema pruning doesn't work for aggregation e.g. `sum`.

2019-03-20 Thread colin fang (JIRA)
colin fang created SPARK-27217: -- Summary: Nested schema pruning doesn't work for aggregation e.g. `sum`. Key: SPARK-27217 URL: https://issues.apache.org/jira/browse/SPARK-27217 Project: Spark I