[jira] [Updated] (SPARK-11478) ML StringIndexer return inconsistent schema
[ https://issues.apache.org/jira/browse/SPARK-11478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-11478:
---------------------------------
    Labels: bulk-closed  (was: )

> ML StringIndexer return inconsistent schema
> -------------------------------------------
>
>                 Key: SPARK-11478
>                 URL: https://issues.apache.org/jira/browse/SPARK-11478
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>            Reporter: Yanbo Liang
>            Priority: Minor
>              Labels: bulk-closed
>
> ML StringIndexer transform and transformSchema return inconsistent schema.
> {code}
> val data = sc.parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
> val df = sqlContext.createDataFrame(data).toDF("id", "label")
> val indexer = new StringIndexer()
>   .setInputCol("label")
>   .setOutputCol("labelIndex")
>   .fit(df)
> val transformed = indexer.transform(df)
> println(transformed.schema.toString())
> println(indexer.transformSchema(df.schema))
> The nullable of "labelIndex" return inconsistent value:
> StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,true))
> StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,false))
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
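The mismatch reported in this issue can be checked programmatically rather than by reading two printed schemas. The following is a minimal sketch, not code from the ticket. It assumes a Spark 2.x or later build with spark-sql and spark-mllib on the classpath, and it uses SparkSession in place of the sc/sqlContext pair from the original report. The object name NullableCheck and the helper nullableMismatches are hypothetical names introduced here for illustration. Whether a mismatch is actually printed depends on the Spark version in use.

```scala
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

object NullableCheck {

  // Hypothetical helper (not part of Spark): for every field name present in
  // both schemas, report (name, nullable in `actual`, nullable in `declared`)
  // when the two nullable flags differ.
  def nullableMismatches(
      actual: StructType,
      declared: StructType): Seq[(String, Boolean, Boolean)] =
    actual.fields.toSeq.flatMap { f =>
      declared.fields.find(_.name == f.name).collect {
        case d if d.nullable != f.nullable => (f.name, f.nullable, d.nullable)
      }
    }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session stands in for the report's sc/sqlContext.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("SPARK-11478-check")
      .getOrCreate()

    // Same data as in the issue description.
    val df = spark
      .createDataFrame(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")))
      .toDF("id", "label")

    val indexer = new StringIndexer()
      .setInputCol("label")
      .setOutputCol("labelIndex")
      .fit(df)

    // Schema of the DataFrame actually produced by transform ...
    val actual = indexer.transform(df).schema
    // ... versus the schema predicted by transformSchema.
    val declared = indexer.transformSchema(df.schema)

    nullableMismatches(actual, declared).foreach { case (name, a, d) =>
      println(s"$name: transform nullable=$a, transformSchema nullable=$d")
    }

    spark.stop()
  }
}
```

Keeping the comparison in a pure function over StructType means it can be exercised without starting a Spark session, for example in a unit test against hand-built schemas.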
[jira] [Updated] (SPARK-11478) ML StringIndexer return inconsistent schema
[ https://issues.apache.org/jira/browse/SPARK-11478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph K. Bradley updated SPARK-11478:
--------------------------------------
    Priority: Minor  (was: Major)
[jira] [Updated] (SPARK-11478) ML StringIndexer return inconsistent schema
[ https://issues.apache.org/jira/browse/SPARK-11478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yanbo Liang updated SPARK-11478:
--------------------------------
    Description:
ML StringIndexer transform and transformSchema return inconsistent schema.
{code}
val data = sc.parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
val df = sqlContext.createDataFrame(data).toDF("id", "label")
val indexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("labelIndex")
  .fit(df)
val transformed = indexer.transform(df)
println(transformed.schema.toString())
println(indexer.transformSchema(df.schema))
The output:
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,{color:red} true {color}))
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,{color:red} false {color}))
{code}

  was:
ML StringIndexer transform and transformSchema return inconsistent schema.
{code}
val data = sc.parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
val df = sqlContext.createDataFrame(data).toDF("id", "label")
val indexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("labelIndex")
  .fit(df)
val transformed = indexer.transform(df)
println(transformed.schema.toString())
println(indexer.transformSchema(df.schema))
The output:
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,{color:red}true{color}))
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,{color:red}false{color}))
{code}
[jira] [Updated] (SPARK-11478) ML StringIndexer return inconsistent schema
[ https://issues.apache.org/jira/browse/SPARK-11478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yanbo Liang updated SPARK-11478:
--------------------------------
    Description:
ML StringIndexer transform and transformSchema return inconsistent schema.
{code}
val data = sc.parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
val df = sqlContext.createDataFrame(data).toDF("id", "label")
val indexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("labelIndex")
  .fit(df)
val transformed = indexer.transform(df)
println(transformed.schema.toString())
println(indexer.transformSchema(df.schema))
The output:
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,true))
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,false))
{code}

  was:
ML StringIndexer transform and transformSchema return inconsistent schema.
{code}
val data = sc.parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
val df = sqlContext.createDataFrame(data).toDF("id", "label")
val indexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("labelIndex")
  .fit(df)
val transformed = indexer.transform(df)
println(transformed.schema.toString())
println(indexer.transformSchema(df.schema))
The output:
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,{color:red} true {color}))
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,{color:red} false {color}))
{code}
[jira] [Updated] (SPARK-11478) ML StringIndexer return inconsistent schema
[ https://issues.apache.org/jira/browse/SPARK-11478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yanbo Liang updated SPARK-11478:
--------------------------------
    Description:
ML StringIndexer transform and transformSchema return inconsistent schema.
{code}
val data = sc.parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
val df = sqlContext.createDataFrame(data).toDF("id", "label")
val indexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("labelIndex")
  .fit(df)
val transformed = indexer.transform(df)
println(transformed.schema.toString())
println(indexer.transformSchema(df.schema))
The output:
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,true))
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,false))
{code}

  was:
ML StringIndexer transform and transformSchema return inconsistent schema.
{code}
val data = sc.parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
val df = sqlContext.createDataFrame(data).toDF("id", "label")
val indexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("labelIndex")
  .fit(df)
val transformed = indexer.transform(df)
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,true))
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,false))
{code}
[jira] [Updated] (SPARK-11478) ML StringIndexer return inconsistent schema
[ https://issues.apache.org/jira/browse/SPARK-11478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yanbo Liang updated SPARK-11478:
--------------------------------
    Description:
ML StringIndexer transform and transformSchema return inconsistent schema.
{code}
val data = sc.parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
val df = sqlContext.createDataFrame(data).toDF("id", "label")
val indexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("labelIndex")
  .fit(df)
val transformed = indexer.transform(df)
println(transformed.schema.toString())
println(indexer.transformSchema(df.schema))
The output:
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,{color:red}true{color}))
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,{color:red}false{color}))
{code}

  was:
ML StringIndexer transform and transformSchema return inconsistent schema.
{code}
val data = sc.parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
val df = sqlContext.createDataFrame(data).toDF("id", "label")
val indexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("labelIndex")
  .fit(df)
val transformed = indexer.transform(df)
println(transformed.schema.toString())
println(indexer.transformSchema(df.schema))
The output:
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,true))
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,false))
{code}
[jira] [Updated] (SPARK-11478) ML StringIndexer return inconsistent schema
[ https://issues.apache.org/jira/browse/SPARK-11478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yanbo Liang updated SPARK-11478:
--------------------------------
    Description:
ML StringIndexer transform and transformSchema return inconsistent schema.
{code}
val data = sc.parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
val df = sqlContext.createDataFrame(data).toDF("id", "label")
val indexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("labelIndex")
  .fit(df)
val transformed = indexer.transform(df)
println(transformed.schema.toString())
println(indexer.transformSchema(df.schema))
The nullable of "labelIndex" return inconsistent value:
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,true))
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,false))
{code}

  was:
ML StringIndexer transform and transformSchema return inconsistent schema.
{code}
val data = sc.parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2)
val df = sqlContext.createDataFrame(data).toDF("id", "label")
val indexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("labelIndex")
  .fit(df)
val transformed = indexer.transform(df)
println(transformed.schema.toString())
println(indexer.transformSchema(df.schema))
The output:
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,true))
StructType(StructField(id,IntegerType,false), StructField(label,StringType,true), StructField(labelIndex,DoubleType,false))
{code}