[jira] [Updated] (SPARK-27612) Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None

     [ https://issues.apache.org/jira/browse/SPARK-27612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-27612:
----------------------------------
    Target Version/s: 3.0.0

> Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-27612
>                 URL: https://issues.apache.org/jira/browse/SPARK-27612
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 3.0.0
>            Reporter: Bryan Cutler
>            Assignee: Hyukjin Kwon
>            Priority: Blocker
>              Labels: correctness
>             Fix For: 3.0.0
>
> This seems to only affect Python 3.
> When creating a DataFrame with type {{ArrayType(IntegerType(), True)}} there ends up being rows that are filled with None.
>
> {code:java}
> In [1]: from pyspark.sql.types import ArrayType, IntegerType
>
> In [2]: df = spark.createDataFrame([[1, 2, 3, 4]] * 100, ArrayType(IntegerType(), True))
>
> In [3]: df.distinct().collect()
> Out[3]: [Row(value=[None, None, None, None]), Row(value=[1, 2, 3, 4])]
> {code}
>
> From this example, it is consistently at elements 97, 98:
> {code}
> In [5]: df.collect()[-5:]
> Out[5]:
> [Row(value=[1, 2, 3, 4]),
>  Row(value=[1, 2, 3, 4]),
>  Row(value=[None, None, None, None]),
>  Row(value=[None, None, None, None]),
>  Row(value=[1, 2, 3, 4])]
> {code}
> This also happens with a type of {{ArrayType(ArrayType(IntegerType(), True))}}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
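
When checking whether a given build shows the behaviour reported in this issue, it may help to locate the affected rows programmatically rather than reading the collected output by eye. The following is only a sketch under stated assumptions: `find_all_none_rows` is a hypothetical helper name (it is not part of PySpark), and it works on plain Python lists such as `[row.value for row in df.collect()]`. The data below is simulated to match the shape of the output in the report, so the snippet runs without a Spark session.

```python
def find_all_none_rows(values):
    """Return the indices of array values that are non-empty and contain only None.

    `values` is assumed to be a plain list of Python lists (or None), for
    example the `value` field taken from each collected Row.
    """
    return [
        index
        for index, value in enumerate(values)
        if value and all(element is None for element in value)
    ]


# Simulated stand-in for [row.value for row in df.collect()], shaped like the
# output in the report: 100 rows, with rows 97 and 98 filled with None.
simulated = [[1, 2, 3, 4] for _ in range(100)]
simulated[97] = [None] * 4
simulated[98] = [None] * 4

print(find_all_none_rows(simulated))  # [97, 98]
```

Against a live session, the same helper could be called as `find_all_none_rows([row.value for row in df.collect()])`; an empty result would suggest the problem does not reproduce for that input.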
[jira] [Updated] (SPARK-27612) Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None

     [ https://issues.apache.org/jira/browse/SPARK-27612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-27612:
---------------------------------
    Priority: Blocker  (was: Critical)
[jira] [Updated] (SPARK-27612) Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None

     [ https://issues.apache.org/jira/browse/SPARK-27612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-27612:
---------------------------------
    Labels: correctness  (was: )
[jira] [Updated] (SPARK-27612) Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None

     [ https://issues.apache.org/jira/browse/SPARK-27612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-27612:
---------------------------------
    Priority: Critical  (was: Major)
[jira] [Updated] (SPARK-27612) Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None

     [ https://issues.apache.org/jira/browse/SPARK-27612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Cutler updated SPARK-27612:
---------------------------------
    Description:
This seems to only affect Python 3.

When creating a DataFrame with type {{ArrayType(IntegerType(), True)}} there ends up being rows that are filled with None.

{code:java}
In [1]: from pyspark.sql.types import ArrayType, IntegerType

In [2]: df = spark.createDataFrame([[1, 2, 3, 4]] * 100, ArrayType(IntegerType(), True))

In [3]: df.distinct().collect()
Out[3]: [Row(value=[None, None, None, None]), Row(value=[1, 2, 3, 4])]
{code}

From this example, it is consistently at elements 97, 98:
{code}
In [5]: df.collect()[-5:]
Out[5]:
[Row(value=[1, 2, 3, 4]),
 Row(value=[1, 2, 3, 4]),
 Row(value=[None, None, None, None]),
 Row(value=[None, None, None, None]),
 Row(value=[1, 2, 3, 4])]
{code}
This also happens with a type of {{ArrayType(ArrayType(IntegerType(), True))}}

  was:
When creating a DataFrame with type {{ArrayType(IntegerType(), True)}} there ends up being rows that are filled with None.

{code:java}
In [1]: from pyspark.sql.types import ArrayType, IntegerType

In [2]: df = spark.createDataFrame([[1, 2, 3, 4]] * 100, ArrayType(IntegerType(), True))

In [3]: df.distinct().collect()
Out[3]: [Row(value=[None, None, None, None]), Row(value=[1, 2, 3, 4])]
{code}

From this example, it is consistently at elements 97, 98:
{code:python}
In [5]: df.collect()[-5:]
Out[5]:
[Row(value=[1, 2, 3, 4]),
 Row(value=[1, 2, 3, 4]),
 Row(value=[None, None, None, None]),
 Row(value=[None, None, None, None]),
 Row(value=[1, 2, 3, 4])]
{code}
This also happens with a type of {{ArrayType(ArrayType(IntegerType(), True))}}
[jira] [Updated] (SPARK-27612) Creating a DataFrame in PySpark with ArrayType produces some Rows with Arrays of None

     [ https://issues.apache.org/jira/browse/SPARK-27612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Cutler updated SPARK-27612:
---------------------------------
    Description:
When creating a DataFrame with type {{ArrayType(IntegerType(), True)}} there ends up being rows that are filled with None.

{code:java}
In [1]: from pyspark.sql.types import ArrayType, IntegerType

In [2]: df = spark.createDataFrame([[1, 2, 3, 4]] * 100, ArrayType(IntegerType(), True))

In [3]: df.distinct().collect()
Out[3]: [Row(value=[None, None, None, None]), Row(value=[1, 2, 3, 4])]
{code}

From this example, it is consistently at elements 97, 98:
{code:python}
In [5]: df.collect()[-5:]
Out[5]:
[Row(value=[1, 2, 3, 4]),
 Row(value=[1, 2, 3, 4]),
 Row(value=[None, None, None, None]),
 Row(value=[None, None, None, None]),
 Row(value=[1, 2, 3, 4])]
{code}
This also happens with a type of {{ArrayType(ArrayType(IntegerType(), True))}}

  was:
When creating a DataFrame with type {{ArrayType(IntegerType(), True)}} there ends up being rows that are filled with None.

{code:java}
In [1]: from pyspark.sql.types import ArrayType, IntegerType

In [2]: df = spark.createDataFrame([[1, 2, 3, 4]] * 100, ArrayType(IntegerType(), True))

In [3]: df.distinct().collect()
Out[3]: [Row(value=[None, None, None, None]), Row(value=[1, 2, 3, 4])]
{code}

From this example, it is consistently at elements 97, 98:
{code}
In [5]: df.collect()[-5:]
Out[5]:
[Row(value=[1, 2, 3, 4]),
 Row(value=[1, 2, 3, 4]),
 Row(value=[None, None, None, None]),
 Row(value=[None, None, None, None]),
 Row(value=[1, 2, 3, 4])]
{code}
This also happens with a type of {{ArrayType(ArrayType(IntegerType(), True))}}