[ https://issues.apache.org/jira/browse/SPARK-6659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell closed SPARK-6659.
----------------------------------
    Resolution: Invalid

Per the comment, I think the issue is that the JSON is not correctly formatted.

> Spark SQL 1.3 cannot read a JSON file that contains only one record.
> --------------------------------------------------------------------
>
> Key: SPARK-6659
> URL: https://issues.apache.org/jira/browse/SPARK-6659
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: luochenghui
>
> Dear friends,
>
> Spark SQL 1.3 cannot read a JSON file that contains only one record.
> Here is my JSON file's content:
>
> {"name":"milo","age",24}
>
> When I run Spark SQL in local mode, it throws an exception:
>
> org.apache.spark.sql.AnalysisException: cannot resolve 'name' given input columns _corrupt_record;
>
> What I did:
>
> 1. ./spark-shell
>
> 2.
> scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@5f3be6c8
>
> scala> val df = sqlContext.jsonFile("/home/milo/person.json")
> 15/03/19 22:11:45 INFO MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=280248975
> 15/03/19 22:11:45 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 267.1 MB)
> 15/03/19 22:11:45 INFO MemoryStore: ensureFreeSpace(22692) called with curMem=163705, maxMem=280248975
> 15/03/19 22:11:45 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.2 KB, free 267.1 MB)
> 15/03/19 22:11:45 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:35842 (size: 22.2 KB, free: 267.2 MB)
> 15/03/19 22:11:45 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
> 15/03/19 22:11:45 INFO SparkContext: Created broadcast 0 from textFile at JSONRelation.scala:98
> 15/03/19 22:11:47 INFO FileInputFormat: Total input paths to process : 1
> 15/03/19 22:11:47 INFO SparkContext: Starting job: reduce at JsonRDD.scala:51
> 15/03/19 22:11:47 INFO DAGScheduler: Got job 0 (reduce at JsonRDD.scala:51) with 1 output partitions (allowLocal=false)
> 15/03/19 22:11:47 INFO DAGScheduler: Final stage: Stage 0(reduce at JsonRDD.scala:51)
> 15/03/19 22:11:47 INFO DAGScheduler: Parents of final stage: List()
> 15/03/19 22:11:47 INFO DAGScheduler: Missing parents: List()
> 15/03/19 22:11:47 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[3] at map at JsonRDD.scala:51), which has no missing parents
> 15/03/19 22:11:47 INFO MemoryStore: ensureFreeSpace(3184) called with curMem=186397, maxMem=280248975
> 15/03/19 22:11:47 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.1 KB, free 267.1 MB)
> 15/03/19 22:11:47 INFO MemoryStore: ensureFreeSpace(2251) called with curMem=189581, maxMem=280248975
> 15/03/19 22:11:47 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.2 KB, free 267.1 MB)
> 15/03/19 22:11:47 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:35842 (size: 2.2 KB, free: 267.2 MB)
> 15/03/19 22:11:47 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
> 15/03/19 22:11:47 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:839
> 15/03/19 22:11:48 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MapPartitionsRDD[3] at map at JsonRDD.scala:51)
> 15/03/19 22:11:48 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
> 15/03/19 22:11:48 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1291 bytes)
> 15/03/19 22:11:48 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
> 15/03/19 22:11:48 INFO HadoopRDD: Input split: file:/home/milo/person.json:0+26
> 15/03/19 22:11:48 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
> 15/03/19 22:11:48 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
> 15/03/19 22:11:48 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
> 15/03/19 22:11:48 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
> 15/03/19 22:11:48 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
> 15/03/19 22:11:49 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2023 bytes result sent to driver
> 15/03/19 22:11:49 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1209 ms on localhost (1/1)
> 15/03/19 22:11:49 INFO DAGScheduler: Stage 0 (reduce at JsonRDD.scala:51) finished in 1.308 s
> 15/03/19 22:11:49 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
> 15/03/19 22:11:49 INFO DAGScheduler: Job 0 finished: reduce at JsonRDD.scala:51, took 2.002429 s
> df: org.apache.spark.sql.DataFrame = [_corrupt_record: string]
>
> 3.
> scala> df.select("name").show()
> 15/03/19 22:12:44 INFO BlockManager: Removing broadcast 1
> 15/03/19 22:12:44 INFO BlockManager: Removing block broadcast_1_piece0
> 15/03/19 22:12:44 INFO MemoryStore: Block broadcast_1_piece0 of size 2251 dropped from memory (free 280059394)
> 15/03/19 22:12:44 INFO BlockManagerInfo: Removed broadcast_1_piece0 on localhost:35842 in memory (size: 2.2 KB, free: 267.2 MB)
> 15/03/19 22:12:44 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
> 15/03/19 22:12:44 INFO BlockManager: Removing block broadcast_1
> 15/03/19 22:12:44 INFO MemoryStore: Block broadcast_1 of size 3184 dropped from memory (free 280062578)
> 15/03/19 22:12:45 INFO ContextCleaner: Cleaned broadcast 1
> 15/03/19 22:12:45 INFO BlockManager: Removing broadcast 0
> 15/03/19 22:12:45 INFO BlockManager: Removing block broadcast_0
> 15/03/19 22:12:45 INFO MemoryStore: Block broadcast_0 of size 163705 dropped from memory (free 280226283)
> 15/03/19 22:12:45 INFO BlockManager: Removing block broadcast_0_piece0
> 15/03/19 22:12:45 INFO MemoryStore: Block broadcast_0_piece0 of size 22692 dropped from memory (free 280248975)
> 15/03/19 22:12:45 INFO BlockManagerInfo: Removed broadcast_0_piece0 on localhost:35842 in memory (size: 22.2 KB, free: 267.3 MB)
> 15/03/19 22:12:45 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
> 15/03/19 22:12:45 INFO ContextCleaner: Cleaned broadcast 0
> org.apache.spark.sql.AnalysisException: cannot resolve 'name' given input columns _corrupt_record;
>   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$apply$3$$anonfun$apply$1.applyOrElse(CheckAnalysis.scala:48)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$apply$3$$anonfun$apply$1.applyOrElse(CheckAnalysis.scala:45)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:250)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:250)
>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:50)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:249)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionUp$1(QueryPlan.scala:103)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2$$anonfun$apply$2.apply(QueryPlan.scala:117)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2.apply(QueryPlan.scala:116)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>   at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>   at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>   at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:121)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$apply$3.apply(CheckAnalysis.scala:45)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$apply$3.apply(CheckAnalysis.scala:43)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:88)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.apply(CheckAnalysis.scala:43)
>   at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:1069)
>   at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
>   at org.apache.spark.sql.DataFrame.logicalPlanToDataFrame(DataFrame.scala:157)
>   at org.apache.spark.sql.DataFrame.select(DataFrame.scala:465)
>   at org.apache.spark.sql.DataFrame.select(DataFrame.scala:480)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
>   at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
>   at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
>   at $iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
>   at $iwC$$iwC$$iwC.<init>(<console>:39)
>   at $iwC$$iwC.<init>(<console>:41)
>   at $iwC.<init>(<console>:43)
>   at <init>(<console>:45)
>   at .<init>(<console>:49)
>   at .<clinit>(<console>)
>   at .<init>(<console>:7)
>   at .<clinit>(<console>)
>   at $print(<console>)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>   at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
>   at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>   at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>   at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856)
>   at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901)
>   at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813)
>   at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:656)
>   at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:664)
>   at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:669)
>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:996)
>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
>   at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
>   at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>   at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:944)
>   at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1058)
>   at org.apache.spark.repl.Main$.main(Main.scala:31)
>   at org.apache.spark.repl.Main.main(Main.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
>   at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> But when I invoke df.show(), it works.
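Why df.select("name") fails while df.show() works: the record {"name":"milo","age",24} has a comma where the colon before 24 should be, so the JSON data source cannot parse it and the inferred schema contains only the _corrupt_record string column. df.show() can display that column, but there is no name column for select("name") to resolve. The following is not Spark's actual implementation, just a minimal plain-Python sketch of the schema-inference behavior described above (the function name infer_schema is invented for illustration):

```python
import json

def infer_schema(lines):
    """Simplified sketch: lines that parse contribute their keys as columns;
    lines that fail to parse contribute only a _corrupt_record column."""
    columns = set()
    for line in lines:
        try:
            record = json.loads(line)
            columns.update(record.keys())
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            columns.add("_corrupt_record")
    return sorted(columns)

# The malformed record from the report: a "," where ":" was intended.
print(infer_schema(['{"name":"milo","age",24}']))  # ['_corrupt_record']
# With the colon fixed, both keys become columns.
print(infer_schema(['{"name":"milo","age":24}']))  # ['age', 'name']
```

This matches the transcript: the DataFrame comes back as [_corrupt_record: string], and the AnalysisException is the analyzer correctly reporting that name is not among the available columns.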
> scala> df.show()
> 15/03/19 22:13:32 INFO MemoryStore: ensureFreeSpace(81443) called with curMem=0, maxMem=280248975
> 15/03/19 22:13:32 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 79.5 KB, free 267.2 MB)
> 15/03/19 22:13:32 INFO MemoryStore: ensureFreeSpace(31262) called with curMem=81443, maxMem=280248975
> 15/03/19 22:13:32 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 30.5 KB, free 267.2 MB)
> 15/03/19 22:13:32 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:35842 (size: 30.5 KB, free: 267.2 MB)
> 15/03/19 22:13:32 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
> 15/03/19 22:13:32 INFO SparkContext: Created broadcast 2 from textFile at JSONRelation.scala:98
> 15/03/19 22:13:32 INFO FileInputFormat: Total input paths to process : 1
> 15/03/19 22:13:32 INFO SparkContext: Starting job: runJob at SparkPlan.scala:121
> 15/03/19 22:13:32 INFO DAGScheduler: Got job 1 (runJob at SparkPlan.scala:121) with 1 output partitions (allowLocal=false)
> 15/03/19 22:13:32 INFO DAGScheduler: Final stage: Stage 1(runJob at SparkPlan.scala:121)
> 15/03/19 22:13:32 INFO DAGScheduler: Parents of final stage: List()
> 15/03/19 22:13:32 INFO DAGScheduler: Missing parents: List()
> 15/03/19 22:13:32 INFO DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[8] at map at SparkPlan.scala:96), which has no missing parents
> 15/03/19 22:13:32 INFO MemoryStore: ensureFreeSpace(3968) called with curMem=112705, maxMem=280248975
> 15/03/19 22:13:32 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 3.9 KB, free 267.2 MB)
> 15/03/19 22:13:32 INFO MemoryStore: ensureFreeSpace(2724) called with curMem=116673, maxMem=280248975
> 15/03/19 22:13:32 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 2.7 KB, free 267.2 MB)
> 15/03/19 22:13:32 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:35842 (size: 2.7 KB, free: 267.2 MB)
> 15/03/19 22:13:32 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
> 15/03/19 22:13:32 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:839
> 15/03/19 22:13:32 INFO DAGScheduler: Submitting 1 missing tasks from Stage 1 (MapPartitionsRDD[8] at map at SparkPlan.scala:96)
> 15/03/19 22:13:32 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
> 15/03/19 22:13:32 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1291 bytes)
> 15/03/19 22:13:32 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
> 15/03/19 22:13:32 INFO HadoopRDD: Input split: file:/home/milo/person.json:0+26
> 15/03/19 22:13:33 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1968 bytes result sent to driver
> 15/03/19 22:13:33 INFO DAGScheduler: Stage 1 (runJob at SparkPlan.scala:121) finished in 0.249 s
> 15/03/19 22:13:33 INFO DAGScheduler: Job 1 finished: runJob at SparkPlan.scala:121, took 0.381798 s
> 15/03/19 22:13:33 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 242 ms on localhost (1/1)
> 15/03/19 22:13:33 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
> _corrupt_record
> {"name":"milo","a...
>
> And I tested another case, with a JSON file containing more than one record, and it ran successfully.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
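Note that jsonFile in Spark SQL 1.3 reads newline-delimited JSON: each line of the file must be a complete, valid JSON object. A single-record file is therefore fine by itself; the multi-record file worked only because its lines happened to be valid JSON. A quick way to find an offending line before handing a file to Spark is to validate it line by line. The helper below is a plain-Python sketch (check_json_lines is an invented name, not a Spark API):

```python
import json

def check_json_lines(lines):
    """Return (line_number, error_message) for every line that is not
    a valid standalone JSON document; blank lines are skipped."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            json.loads(line)
        except ValueError as exc:
            bad.append((lineno, str(exc)))
    return bad

# The record from the report fails (the parser expects ':' after "age").
print(check_json_lines(['{"name":"milo","age",24}']))  # one entry for line 1
# With the comma corrected to a colon, nothing is reported.
print(check_json_lines(['{"name":"milo","age":24}']))  # []
```

Running such a check on /home/milo/person.json would have pointed directly at the malformed record, which is why the issue was closed as Invalid rather than as a Spark bug.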