[jira] [Created] (SPARK-25333) Ability to add new columns in the beginning of a Dataset

Walid Mellouli (JIRA) Tue, 04 Sep 2018 09:45:07 -0700

Walid Mellouli created SPARK-25333:
--------------------------------------

             Summary: Ability to add new columns in the beginning of a Dataset
                 Key: SPARK-25333
                 URL: https://issues.apache.org/jira/browse/SPARK-25333
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Walid Mellouli



When we add new columns to a Dataset, they are always appended at the end 
of the Dataset.
{code:scala}
// spark-shell session: sc and the toDF implicits are already in scope
val df = sc.parallelize(Seq(1, 2, 3)).toDF
df.printSchema

// Output:
// root
//  |-- value: integer (nullable = true)
{code}

When we add a new column:

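{code:scala}
import org.apache.spark.sql.functions.col

// withColumn appends the new column after the existing ones
val newDf = df.withColumn("newColumn", col("value") + 1)
newDf.printSchema

// Output:
// root
//  |-- value: integer (nullable = true)
//  |-- newColumn: integer (nullable = true)
{code}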

Users generally want to add new columns either at the end or at the beginning, 
depending on the use case.
For example, in my case we add technical columns at the beginning of a Dataset 
and business columns at the end.
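
For reference, a minimal sketch of how this can be approximated today with select. 
The helper name withColumnFirst and the object PrependColumnSketch are hypothetical 
and not part of the Spark API. The sketch assumes the new column name does not 
already exist in the Dataset and that existing column names contain no dots or 
other characters that col() would interpret specially.

{code:scala}
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object PrependColumnSketch {
  // Hypothetical helper (not a Spark API): puts `newCol`, aliased to `name`,
  // first, followed by all existing columns in their current order.
  def withColumnFirst(df: DataFrame, name: String, newCol: Column): DataFrame =
    df.select(newCol.as(name) +: df.columns.map(col): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("prepend-column-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(1, 2, 3).toDF("value")
    val newDf = withColumnFirst(df, "newColumn", col("value") + 1)

    // The schema now lists newColumn before value
    newDf.printSchema()

    spark.stop()
  }
}
{code}

Unlike withColumn, this sketch does not replace an existing column of the same 
name, so calling it with a name that is already present would produce a duplicate 
column.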



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

