[jira] [Updated] (SPARK-17024) Weird behaviour of the DataFrame when a column name contains dots.

2016-08-19 Thread Iaroslav Zeigerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iaroslav Zeigerman updated SPARK-17024:
---
Affects Version/s: (was: 1.6.0)
   2.0.0

> Weird behaviour of the DataFrame when a column name contains dots.
> --
>
> Key: SPARK-17024
> URL: https://issues.apache.org/jira/browse/SPARK-17024
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Iaroslav Zeigerman
>
> When a column name contains dots and one of the segments in that name is the same
> as another column's name, Spark treats this column as a nested structure,
> although the column's actual type is String/Int/etc. Example:
> {code}
>   val df = sqlContext.createDataFrame(Seq(
>     ("user1", "task1"),
>     ("user2", "task2")
>   )).toDF("user", "user.task")
> {code}
> There are two columns, "user" and "user.task", both of type string, and the schema
> resolution seems to be correct:
> {noformat}
> root
>  |-- user: string (nullable = true)
>  |-- user.task: string (nullable = true)
> {noformat}
> But when I try to query this DataFrame, e.g.:
> {code}
>   df.select(df("user"), df("user.task"))
> {code}
> Spark throws an exception "Can't extract value from user#2;" 
> It happens during the resolution of the LogicalPlan while processing the  
> "user.task" column.
> Here is the full stacktrace:
> {noformat}
> Can't extract value from user#2;
> org.apache.spark.sql.AnalysisException: Can't extract value from user#2;
>   at org.apache.spark.sql.catalyst.expressions.ExtractValue$.apply(complexTypeExtractors.scala:73)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$4.apply(LogicalPlan.scala:276)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$4.apply(LogicalPlan.scala:275)
>   at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
>   at scala.collection.immutable.List.foldLeft(List.scala:84)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:275)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:191)
>   at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:151)
>   at org.apache.spark.sql.DataFrame.col(DataFrame.scala:708)
>   at org.apache.spark.sql.DataFrame.apply(DataFrame.scala:696)
> {noformat}
> Is this actually expected behaviour?
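The stack trace suggests the mechanism: "user.task" is split on the dot, "user" is matched against the top-level columns, and the remaining part "task" is then extracted as a nested field (the foldLeft feeding ExtractValue.apply), which fails because "user" is a plain string. The following is a simplified, hypothetical model of that behaviour in plain Scala, a sketch under stated assumptions and not Spark's actual code: the names parseName, resolve, reportSchema and the tiny DataType hierarchy are all illustrative. It also models backtick quoting, which keeps a dotted name together as a single part.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of dotted-name resolution; NOT Spark's code.
object DottedNameResolution {
  // Tiny stand-in for a schema type system (illustrative only).
  sealed trait DataType
  case object StringType extends DataType
  final case class StructType(fields: Map[String, DataType]) extends DataType

  // Split a column reference on unquoted dots; a backtick-quoted span stays one part.
  // Escaped backticks and table qualifiers are deliberately not handled.
  def parseName(name: String): List[String] = {
    val parts = mutable.ListBuffer.empty[String]
    val current = new mutable.StringBuilder
    var inBackticks = false
    for (c <- name) c match {
      case '`' => inBackticks = !inBackticks
      case '.' if !inBackticks =>
        parts += current.toString
        current.clear()
      case other => current += other
    }
    parts += current.toString
    parts.toList
  }

  // Match the first part against top-level column names, then treat every
  // remaining part as a nested field to extract, folding left over them.
  def resolve(name: String, schema: Map[String, DataType]): Either[String, DataType] = {
    val parts = parseName(name)
    schema.get(parts.head) match {
      case None => Left(s"Cannot resolve column name $name")
      case Some(rootType) =>
        parts.tail.foldLeft[Either[String, DataType]](Right(rootType)) {
          case (Right(StructType(fields)), field) =>
            fields.get(field).toRight(s"No such struct field $field")
          case (Right(_), field) =>
            // Mirrors the reported failure: a non-struct value has no fields.
            Left(s"Can't extract field $field from a non-struct value")
          case (failed @ Left(_), _) => failed
        }
    }
  }

  // The two flat string columns from the report.
  val reportSchema: Map[String, DataType] =
    Map("user" -> StringType, "user.task" -> StringType)

  def main(args: Array[String]): Unit = {
    println(resolve("user", reportSchema))        // Right(StringType)
    println(resolve("user.task", reportSchema))   // Left(Can't extract field task from a non-struct value)
    println(resolve("`user.task`", reportSchema)) // Right(StringType)
  }
}
```

In Spark itself, resolveQuoted parses the name with backtick handling before resolving it, so quoting the column as df("`user.task`") should refer to the literal "user.task" column rather than a nested field of "user".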



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17024) Weird behaviour of the DataFrame when a column name contains dots.

2016-08-11 Thread Iaroslav Zeigerman (JIRA)


Iaroslav Zeigerman updated SPARK-17024:
---
Summary: Weird behaviour of the DataFrame when a column name contains dots. 
 (was: Weird behaviour of the DataFrame when the column name contains dots.)

> Weird behaviour of the DataFrame when a column name contains dots.
> --
>
> Key: SPARK-17024
> URL: https://issues.apache.org/jira/browse/SPARK-17024
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Iaroslav Zeigerman
>
> When a column name contains dots and one of the segments in that name is the same
> as another column's name, Spark treats this column as a nested structure,
> although the column's actual type is String/Int/etc. Example:
> {code}
>   val df = sqlContext.createDataFrame(Seq(
>     ("user1", "task1"),
>     ("user2", "task2")
>   )).toDF("user", "user.task")
> {code}
> There are two columns, "user" and "user.task", both of type string, and the schema
> resolution seems to be correct:
> {noformat}
> root
>  |-- user: string (nullable = true)
>  |-- user.task: string (nullable = true)
> {noformat}
> But when I try to query this DataFrame, e.g.:
> {code}
>   df.select(df("user"), df("user.task"))
> {code}
> Spark throws an exception "Can't extract value from user#2;" 
> It happens during the resolution of the LogicalPlan and while processing the  
> "user.task" column.
> Here is the full stacktrace:
> {noformat}
> Can't extract value from user#2;
> org.apache.spark.sql.AnalysisException: Can't extract value from user#2;
>   at org.apache.spark.sql.catalyst.expressions.ExtractValue$.apply(complexTypeExtractors.scala:73)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$4.apply(LogicalPlan.scala:276)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$4.apply(LogicalPlan.scala:275)
>   at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
>   at scala.collection.immutable.List.foldLeft(List.scala:84)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:275)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:191)
>   at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:151)
>   at org.apache.spark.sql.DataFrame.col(DataFrame.scala:708)
>   at org.apache.spark.sql.DataFrame.apply(DataFrame.scala:696)
> {noformat}
> Is this actually expected behaviour?






[jira] [Updated] (SPARK-17024) Weird behaviour of the DataFrame when a column name contains dots.

2016-08-11 Thread Iaroslav Zeigerman (JIRA)


Iaroslav Zeigerman updated SPARK-17024:
---
Description: 
When a column name contains dots and one of the segments in that name is the same
as another column's name, Spark treats this column as a nested structure,
although the column's actual type is String/Int/etc. Example:

{code}
  val df = sqlContext.createDataFrame(Seq(
    ("user1", "task1"),
    ("user2", "task2")
  )).toDF("user", "user.task")
{code}

There are two columns, "user" and "user.task", both of type string, and the schema
resolution seems to be correct:

{noformat}
root
 |-- user: string (nullable = true)
 |-- user.task: string (nullable = true)
{noformat}

But when I try to query this DataFrame, e.g.:
{code}
  df.select(df("user"), df("user.task"))
{code}

Spark throws an exception "Can't extract value from user#2;" 
It happens during the resolution of the LogicalPlan while processing the  
"user.task" column.

Here is the full stacktrace:

{noformat}
Can't extract value from user#2;
org.apache.spark.sql.AnalysisException: Can't extract value from user#2;
at org.apache.spark.sql.catalyst.expressions.ExtractValue$.apply(complexTypeExtractors.scala:73)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$4.apply(LogicalPlan.scala:276)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$4.apply(LogicalPlan.scala:275)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:275)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:191)
at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:151)
at org.apache.spark.sql.DataFrame.col(DataFrame.scala:708)
at org.apache.spark.sql.DataFrame.apply(DataFrame.scala:696)
{noformat}

Is this actually expected behaviour?

  was:
When a column name contains dots and one of the segments in that name is the same
as another column's name, Spark treats this column as a nested structure,
although the column's actual type is String/Int/etc. Example:

{code}
  val df = sqlContext.createDataFrame(Seq(
    ("user1", "task1"),
    ("user2", "task2")
  )).toDF("user", "user.task")
{code}

There are two columns, "user" and "user.task", both of type string, and the schema
resolution seems to be correct:

{noformat}
root
 |-- user: string (nullable = true)
 |-- user.task: string (nullable = true)
{noformat}

But when I try to query this DataFrame, e.g.:
{code}
  df.select(df("user"), df("user.task"))
{code}

Spark throws an exception "Can't extract value from user#2;" 
It happens during the resolution of the LogicalPlan and while processing the  
"user.task" column.

Here is the full stacktrace:

{noformat}
Can't extract value from user#2;
org.apache.spark.sql.AnalysisException: Can't extract value from user#2;
at org.apache.spark.sql.catalyst.expressions.ExtractValue$.apply(complexTypeExtractors.scala:73)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$4.apply(LogicalPlan.scala:276)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$4.apply(LogicalPlan.scala:275)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:275)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:191)
at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:151)
at org.apache.spark.sql.DataFrame.col(DataFrame.scala:708)
at org.apache.spark.sql.DataFrame.apply(DataFrame.scala:696)
{noformat}

Is this actually expected behaviour?


> Weird behaviour of the DataFrame when a column name contains dots.
> --
>
> Key: SPARK-17024
> URL: https://issues.apache.org/jira/browse/SPARK-17024
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Iaroslav Zeigerman
>
> When a column name contains dots and one of the segments in that name is the same
> as another column's name, Spark treats this column as a nested structure,
> although the column's actual type is String/Int/etc. Example:
> {code}
>   val df = sqlContext.createDataFrame(Seq(
>     ("user1", "task1"),
>     ("user2", "task2")
>   )).toDF("user", "user.task")
> {code}
> There are two columns, "user" and "user.task", both of type string, and the