[jira] [Commented] (SPARK-10754) table and column name are case sensitive when json Dataframe was registered as tempTable using JavaSparkContext.
[ https://issues.apache.org/jira/browse/SPARK-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012320#comment-15012320 ]

Antonio Piccolboni commented on SPARK-10754:

Maybe related. I created a table from a CSV file using spark-csv. The column names contain a mix of upper and lower case in the file and keep it in the table, as shown by describe. Then I created a second table with CREATE TABLE AS SELECT, and the new table has lowercase column names. So name handling seems case sensitive in some places and case insensitive in others. Please let me know if I need to open a separate report. Test case follows.

Sample data:

"playerID","yearID","stint","teamID","lgID","G","G_batting","AB","R","H","X2B","X3B","HR","RBI","SB","CS","BB","SO","IBB","HBP","SH","SF","GIDP","G_old"
"aardsda01",2004,1,"SFN","NL",11,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,11
"aardsda01",2006,1,"CHN","NL",45,43,2,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,45
"aardsda01",2007,1,"CHA","AL",25,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2
"aardsda01",2008,1,"BOS","AL",47,5,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,5
"aardsda01",2009,1,"SEA","AL",73,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,NA
"aardsda01",2010,1,"SEA","AL",53,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,NA
"aardsda01",2012,1,"NYA","AL",1,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
"aaronha01",1954,1,"ML1","NL",122,122,468,58,131,27,6,13,69,2,2,28,39,NA,3,6,4,13,122
"aaronha01",1955,1,"ML1","NL",153,153,602,105,189,37,9,27,106,3,1,49,61,5,3,7,4,20,153

Create the table with:

    CREATE TABLE `batting`
    USING com.databricks.spark.csv
    OPTIONS (path '/var/folders/_p/1gx4vy311_x4syn2xq6f2xtcgr/T//Rtmp0E8pqi/file11a8546f94ed6', header 'TRUE', delimiter ',', quote '"', parserLib 'commons', mode 'PERMISSIVE', charset 'UTF-8', inferSchema 'TRUE', comment '#')

Upper and lower case are preserved:

    Browse[6]> qy("describe batting", my_db)
        col_name   data_type  comment
    1   playerID   string
    2   yearID     int
    3   stint      int
    4   teamID     string
    5   lgID       string
    6   G          int
    7   G_batting  string
    8   AB         string
    9   R          string
    10  H          string
    11  X2B        string
    12  X3B        string
    13  HR         string
    14  RBI        string
    15  SB         string
    16  CS         string
    17  BB         string
    18  SO         string
    19  IBB        string
    20  HBP        string
    21  SH         string
    22  SF         string
    23  GIDP       string
    24  G_old      string

Create the other table with:

    CREATE TABLE `xxhcteugas` AS
    SELECT `playerID` AS `playerID`, `yearID` AS `yearID`, `teamID` AS `teamID`, `G` AS `G`, `AB` AS `AB`, `R` AS `R`, `H` AS `H`
    FROM `batting`
    ORDER BY `playerID`, `yearID`, `teamID`

Upper case is gone from the column names:

    Browse[6]> qy("describe xxhcteugas", my_db)
       col_name  data_type  comment
    1  playerid  string
    2  yearid    int
    3  teamid    string
    4  g         int
    5  ab        string
    6  r         string
    7  h         string

> table and column name are case sensitive when json Dataframe was registered as tempTable using JavaSparkContext.
>
>                 Key: SPARK-10754
>                 URL: https://issues.apache.org/jira/browse/SPARK-10754
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0, 1.3.1, 1.4.1
>         Environment: Linux, Hadoop Version 1.3
>            Reporter: Babulal
>
> Create a dataframe using json data source
>
> SparkConf conf = new SparkConf().setMaster("spark://xyz:7077").setAppName("Spark Tabble");
> JavaSparkContext javacontext = new JavaSparkContext(conf);
> SQLContext sqlContext = new SQLContext(javacontext);
>
> DataFrame df = sqlContext.jsonFile("/user/root/examples/src/main/resources/people.json");
>
> df.registerTempTable("sparktable");
>
> Run the Query
>
> sqlContext.sql("select * from sparktable").show(); // this will PASS
>
> sqlContext.sql("select * from sparkTable").show(); // this will FAIL
>
> java.lang.RuntimeException: Table Not Found: sparkTable
>     at scala.sys.package$.error(package.scala:27)
>     at org.apache.spark.sql.catalyst.analysis.SimpleCatalog$$anonfun$1.apply(Catalog.scala:115)
>     at org.apache.spark.sql.catalyst.analysis.SimpleCatalog$$anonfun$1.apply(Catalog.scala:115)
>     at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>     at scala.collection.AbstractMap.getOrElse(Map.scala:58)
>     at org.apache.spark.sql.catalyst.analysis.SimpleCatalog.lookupRelation(Catalog.scala:115)
>     at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:233)
[jira] [Commented] (SPARK-10754) table and column name are case sensitive when json Dataframe was registered as tempTable using JavaSparkContext.
[ https://issues.apache.org/jira/browse/SPARK-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968570#comment-14968570 ]

Huaxin Gao commented on SPARK-10754:

Hi Scott, I tried a temp table with a struct column but can't reproduce your problem. Could you please send me your test case? Thanks!

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10754) table and column name are case sensitive when json Dataframe was registered as tempTable using JavaSparkContext.
[ https://issues.apache.org/jira/browse/SPARK-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967387#comment-14967387 ]

Scott Lyons commented on SPARK-10754:

I continue to have this issue, and can confirm that case sensitivity is set to false.

{code:java}
val lData = sqlContext.parquetFile("input.parquet")
lData.registerTempTable("inputTable")
sqlContext.sql("SELECT COUNT(*), `fields.E` FROM inputTable GROUP BY `fields.E`")
{code}

Results in:

{code}
org.apache.spark.sql.AnalysisException: Ambiguous reference to fields StructField(E,StringType,true), StructField(e,StringType,true);
{code}
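The error in Scott's comment reads as though the struct `fields` holds two members, `E` and `e`, that differ only by case, so a case-insensitive lookup would match both. The sketch below models that failure mode in plain Java. It is an assumption about the cause, not Spark's resolver: `ToyFieldResolver` and `resolve` are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of field-name resolution; hypothetical, not Spark code. */
final class ToyFieldResolver {

    /** Returns the single field matching name, or throws if none or several match. */
    static String resolve(List<String> fields, String name, boolean caseSensitive) {
        List<String> matches = new ArrayList<>();
        for (String field : fields) {
            // With case sensitivity off, "E" and "e" both match a request for "E".
            boolean hit = caseSensitive ? field.equals(name) : field.equalsIgnoreCase(name);
            if (hit) {
                matches.add(field);
            }
        }
        if (matches.isEmpty()) {
            throw new IllegalArgumentException("Cannot resolve field: " + name);
        }
        if (matches.size() > 1) {
            throw new IllegalArgumentException("Ambiguous reference to fields " + matches);
        }
        return matches.get(0);
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("E", "e");
        // Case-sensitive lookup finds exactly one match.
        System.out.println(resolve(fields, "E", true));
        try {
            // Case-insensitive lookup matches both members and is rejected.
            resolve(fields, "E", false);
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

If this reading is right, turning case sensitivity off cannot help with such data; one of the two members would have to be renamed or dropped before querying.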
[jira] [Commented] (SPARK-10754) table and column name are case sensitive when json Dataframe was registered as tempTable using JavaSparkContext.
[ https://issues.apache.org/jira/browse/SPARK-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964041#comment-14964041 ]

Yin Huai commented on SPARK-10754:

Can you use {{HiveContext}}, which sets {{spark.sql.caseSensitive}} to false by default?
[jira] [Commented] (SPARK-10754) table and column name are case sensitive when json Dataframe was registered as tempTable using JavaSparkContext.
[ https://issues.apache.org/jira/browse/SPARK-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960802#comment-14960802 ]

Rick Hillegas commented on SPARK-10754:

Note that unquoted identifiers are case-insensitive in the SQL Standard. Thanks.
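Rick's point about the SQL Standard can be sketched as a comparison rule: unquoted identifiers compare without regard to case, while double-quoted (delimited) identifiers keep their exact spelling. The plain-Java sketch below assumes the usual reading that unquoted names fold to upper case. `ToyIdentifiers` and its methods are hypothetical names, and escaped quotes inside delimited identifiers are ignored for brevity.

```java
import java.util.Locale;

/** Toy model of SQL-Standard identifier comparison; hypothetical, simplified. */
final class ToyIdentifiers {

    /** Strips the quotes from a delimited identifier, or upper-cases an unquoted one. */
    static String normalize(String ident) {
        boolean delimited = ident.length() >= 2
                && ident.startsWith("\"")
                && ident.endsWith("\"");
        if (delimited) {
            // Delimited identifier: keep the spelling exactly as written.
            return ident.substring(1, ident.length() - 1);
        }
        // Unquoted identifier: fold case so that sparktable and sparkTable agree.
        return ident.toUpperCase(Locale.ROOT);
    }

    /** True when both spellings name the same object under this toy rule. */
    static boolean sameIdentifier(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        System.out.println(sameIdentifier("sparktable", "sparkTable"));
        System.out.println(sameIdentifier("\"sparktable\"", "\"sparkTable\""));
    }
}
```

Under this rule the two queries in the issue description would resolve to the same table, which is the behavior the reporter expected.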
[jira] [Commented] (SPARK-10754) table and column name are case sensitive when json Dataframe was registered as tempTable using JavaSparkContext.
[ https://issues.apache.org/jira/browse/SPARK-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960205#comment-14960205 ]

Babulal commented on SPARK-10754:

Thank you, Huaxin Gao, for the reply. I checked with the "spark.sql.caseSensitive=false" option and it is working fine. Can we either make it default to false or document it (as you suggested)? I guess the current default comes from SQLConf.scala:

    val DIALECT = "spark.sql.dialect"
    val CASE_SENSITIVE = "spark.sql.caseSensitive"

    /**
     * caseSensitive analysis true by default
     */
    def caseSensitiveAnalysis: Boolean = getConf(SQLConf.CASE_SENSITIVE, "true").toBoolean
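The effect of a flag like `spark.sql.caseSensitive` on temp-table lookup can be pictured with a small registry that normalizes names only when the flag is off. This is a toy model in plain Java under that assumption, not Spark's `SimpleCatalog`; the class `ToyCatalog` and its constructor flag are hypothetical, and only the method names and the error text are borrowed from the issue description.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Toy temp-table registry with a case-sensitivity switch; not Spark code. */
final class ToyCatalog {
    private final boolean caseSensitive;
    private final Map<String, String> tables = new HashMap<>();

    ToyCatalog(boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
    }

    /** Lower-cases the key only when analysis is case-insensitive. */
    private String key(String name) {
        return caseSensitive ? name : name.toLowerCase(Locale.ROOT);
    }

    void registerTempTable(String name, String source) {
        tables.put(key(name), source);
    }

    String lookupRelation(String name) {
        String source = tables.get(key(name));
        if (source == null) {
            throw new RuntimeException("Table Not Found: " + name);
        }
        return source;
    }

    public static void main(String[] args) {
        // Flag off: the differently cased name still resolves.
        ToyCatalog relaxed = new ToyCatalog(false);
        relaxed.registerTempTable("sparktable", "people.json");
        System.out.println(relaxed.lookupRelation("sparkTable"));

        // Flag on: the same lookup fails, as in the issue description.
        ToyCatalog strict = new ToyCatalog(true);
        strict.registerTempTable("sparktable", "people.json");
        try {
            strict.lookupRelation("sparkTable");
        } catch (RuntimeException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Normalizing at both registration and lookup keeps the two paths consistent, so a table registered as `SparkTable` is still found as `sparktable` when the flag is off.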
[jira] [Commented] (SPARK-10754) table and column name are case sensitive when json Dataframe was registered as tempTable using JavaSparkContext.
[ https://issues.apache.org/jira/browse/SPARK-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14952434#comment-14952434 ]

Huaxin Gao commented on SPARK-10754:

I believe this is working as designed: the table name in registerTempTable is case sensitive. I will check whether there is any documentation for this. If not, I will probably add some.
[jira] [Commented] (SPARK-10754) table and column name are case sensitive when json Dataframe was registered as tempTable using JavaSparkContext.
[ https://issues.apache.org/jira/browse/SPARK-10754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14952561#comment-14952561 ]

Yin Huai commented on SPARK-10754:

If you use {{SQLContext}}, the resolution is case-sensitive. If you use {{HiveContext}}, the resolution will be case-insensitive (following Hive).