[jira] [Commented] (SPARK-6217) insertInto doesn't work in PySpark

2015-04-15 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496522#comment-14496522
 ] 

Yin Huai commented on SPARK-6217:
-

[~cpcloud] Right now, we do not support inserting into a table created from a 
Python collection. So, in your code, {{sdf}} and {{sdf2}} are read-only tables 
because they were created from existing Python collections. If you want to 
insert data into a table, you can first create a table based on a data source 
that supports inserts (for example, the Parquet data source), e.g. with 
{{sdf.saveAsTable('sdf', 'parquet')}}. Then, you can insert data into the 
{{sdf}} table.

For now, we should provide a better error message when a user wants to insert 
data into a read-only table. In the long term, I think it will be useful to 
support inserts into tables created from collections in a programming language.
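
A minimal PySpark sketch of that workaround (assuming Spark 1.3 APIs; the app 
name, column names, and literal rows are illustrative, not taken from the report):

{code:python}
# Minimal sketch of the workaround described above, assuming Spark 1.3 PySpark APIs.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext('local[*]', 'insert-example')  # illustrative app name
sql = HiveContext(sc)

# Two small DataFrames built from Python collections (read-only in-memory relations).
sdf = sql.createDataFrame([(1.0, 1, 'a'), (2.0, 2, 'b')], ['a', 'b', 'c'])
sdf2 = sql.createDataFrame([(3.0, 3, 'd'), (4.0, 4, 'e')], ['a', 'b', 'c'])

# Persist the first DataFrame as a table backed by a data source that supports
# inserts (Parquet), instead of registering the in-memory DataFrame as a table.
sdf.saveAsTable('sdf', 'parquet')

# The target is now a Parquet-backed table, so inserting the second DataFrame works.
sdf2.insertInto('sdf')
{code}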

 insertInto doesn't work in PySpark
 --

 Key: SPARK-6217
 URL: https://issues.apache.org/jira/browse/SPARK-6217
 Project: Spark
  Issue Type: Bug
  Components: PySpark, SQL
Affects Versions: 1.3.0
 Environment: Mac OS X Yosemite 10.10.2
 Python 2.7.9
 Spark 1.3.0
Reporter: Charles Cloud

 The following code, running in an IPython shell, throws an error:
 {code:none}
 In [1]: from pyspark import SparkContext, HiveContext
 In [2]: sc = SparkContext('local[*]', 'test')
 Spark assembly has been built with Hive, including Datanucleus jars on classpath
 In [3]: sql = HiveContext(sc)
 In [4]: import pandas as pd
 In [5]: df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [1, 2, 3], 'c': list('abc')})
 In [6]: df2 = pd.DataFrame({'a': [2.0, 3.0, 4.0], 'b': [4, 5, 6], 'c': list('def')})
 In [7]: sdf = sql.createDataFrame(df)
 In [8]: sdf2 = sql.createDataFrame(df2)
 In [9]: sql.registerDataFrameAsTable(sdf, 'sdf')
 In [10]: sql.registerDataFrameAsTable(sdf2, 'sdf2')
 In [11]: sql.cacheTable('sdf')
 In [12]: sql.cacheTable('sdf2')
 In [13]: sdf2.insertInto('sdf')  # throws an error
 {code}
 Here's the Java traceback:
 {code:none}
 Py4JJavaError: An error occurred while calling o270.insertInto.
 : java.lang.AssertionError: assertion failed: No plan for InsertIntoTable (LogicalRDD [a#0,b#1L,c#2], MapPartitionsRDD[13] at mapPartitions at SQLContext.scala:1167), Map(), false
  InMemoryRelation [a#6,b#7L,c#8], true, 1, StorageLevel(true, true, false, true, 1), (PhysicalRDD [a#6,b#7L,c#8], MapPartitionsRDD[41] at mapPartitions at SQLContext.scala:1167), Some(sdf2)
 at scala.Predef$.assert(Predef.scala:179)
 at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
 at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:1085)
 at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:1083)
 at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:1089)
 at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:1089)
 at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1092)
 at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1092)
 at org.apache.spark.sql.DataFrame.insertInto(DataFrame.scala:1134)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:483)
 at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
 at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
 at py4j.Gateway.invoke(Gateway.java:259)
 at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
 at py4j.commands.CallCommand.execute(CallCommand.java:79)
 at py4j.GatewayConnection.run(GatewayConnection.java:207)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 I'd be ecstatic if this was my own fault, and I'm somehow using it 
 incorrectly.






[jira] [Commented] (SPARK-6217) insertInto doesn't work in PySpark

2015-04-15 Thread Charles Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496626#comment-14496626
 ] 

Charles Cloud commented on SPARK-6217:
--

Was a more informative error message added? Is there documentation that states 
this limitation?




[jira] [Commented] (SPARK-6217) insertInto doesn't work in PySpark

2015-04-15 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496706#comment-14496706
 ] 

Yin Huai commented on SPARK-6217:
-

SPARK-6941 tracks the work on better error messages and the related doc update. 
We will make sure it gets done for 1.4.




[jira] [Commented] (SPARK-6217) insertInto doesn't work in PySpark

2015-03-21 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14373044#comment-14373044
 ] 

Michael Armbrust commented on SPARK-6217:
-

This error (not a very clear one from a user perspective) looks like you are 
actually using a SQLContext instead of a HiveContext (though it could certainly 
be something wrong on our side too). Can you double-check your code segment?
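
A quick sanity check (a hypothetical snippet, not from the report) is to 
inspect the context object directly:

{code:python}
# Confirm the context in use is really a HiveContext; per the comment above,
# this error can show up when a plain SQLContext is used instead.
from pyspark.sql import HiveContext

print(type(sql))  # should report HiveContext, not SQLContext
assert isinstance(sql, HiveContext), 'sql is not a HiveContext'
{code}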




[jira] [Commented] (SPARK-6217) insertInto doesn't work in PySpark

2015-03-21 Thread Charles Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14373056#comment-14373056
 ] 

Charles Cloud commented on SPARK-6217:
--

If you've built Spark with Hive, you'll be able to reproduce the example by 
copy-pasting the above code (including the prompt numbers; IPython recognizes 
them and strips them off) directly into IPython.




[jira] [Commented] (SPARK-6217) insertInto doesn't work in PySpark

2015-03-21 Thread Charles Cloud (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14373054#comment-14373054
 ] 

Charles Cloud commented on SPARK-6217:
--

That code segment is an exact copy-paste of an IPython session. I ran it again 
by copy-pasting what I wrote here into IPython, and I got the same error.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org