Hi Amila,

The error says that the ‘people.parquet’ file does not exist. Can you check manually whether that file exists?
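One way to do that check from the notebook is sketched below in plain Python. It is only a rough sketch: the helper name `describe_parquet_dir` is made up for illustration, and it assumes that data files end in `.parquet` and that summary files are named `_metadata` or `_common_metadata`. It also looks only at the local disk of the machine running the notebook. With a `file:/` path on a multi-node cluster, the data files may have been written on the worker nodes instead, so the same check may be worth running there too.

```python
import os


def describe_parquet_dir(path):
    """Report whether `path` exists and which Parquet-looking files it holds.

    Hypothetical helper: the naming rules below are assumptions, not
    something taken from Spark's own API.
    """
    if not os.path.exists(path):
        return {"exists": False, "data_files": [], "summary_files": []}

    # A Parquet "file" saved by Spark is normally a directory of part files,
    # but handle a single plain file as well.
    if os.path.isfile(path):
        names = [os.path.basename(path)]
    else:
        names = sorted(os.listdir(path))

    data_files = [n for n in names if n.endswith(".parquet")]
    summary_files = [n for n in names if n in ("_metadata", "_common_metadata")]
    return {"exists": True, "data_files": data_files, "summary_files": summary_files}


# Path copied from the error message quoted below.
print(describe_parquet_dir("/home/ubuntu/ipython/people.parquet2"))
```

If `exists` comes back `False`, or both lists are empty, that would be consistent with the "no Parquet data file or summary file found" assertion.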
> Py4JJavaError: An error occurred while calling o53840.parquet.
> : java.lang.AssertionError: assertion failed: No schema defined, and no
> Parquet data file or summary file found under
> file:/home/ubuntu/ipython/people.parquet2.

Guru Medasani
gdm...@gmail.com

> On Sep 2, 2015, at 8:25 PM, Amila De Silva <jaa...@gmail.com> wrote:
>
> Hi All,
>
> I have a two node spark cluster, to which I'm connecting using IPython
> notebook.
> To see how data saving/loading works, I simply created a dataframe using
> people.json using the Code below;
>
> df = sqlContext.read.json("examples/src/main/resources/people.json")
>
> Then called the following to save the dataframe as a parquet.
> df.write.save("people.parquet")
>
> Tried loading the saved dataframe using;
> df2 = sqlContext.read.parquet('people.parquet');
>
> But this simply fails giving the following exception
>
> ---------------------------------------------------------------------------
> Py4JJavaError                             Traceback (most recent call last)
> <ipython-input-97-35f91873c48f> in <module>()
> ----> 1 df2 = sqlContext.read.parquet('people.parquet2');
>
> /srv/spark/python/pyspark/sql/readwriter.pyc in parquet(self, *path)
>     154         [('name', 'string'), ('year', 'int'), ('month', 'int'), ('day', 'int')]
>     155         """
> --> 156         return self._df(self._jreader.parquet(_to_seq(self._sqlContext._sc, path)))
>     157
>     158     @since(1.4)
>
> /srv/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
>     536         answer = self.gateway_client.send_command(command)
>     537         return_value = get_return_value(answer, self.gateway_client,
> --> 538                 self.target_id, self.name)
>     539
>     540         for temp_arg in temp_args:
>
> /srv/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
>     298                 raise Py4JJavaError(
>     299                     'An error occurred while calling {0}{1}{2}.\n'.
> --> 300                     format(target_id, '.', name), value)
>     301             else:
>     302                 raise Py4JError(
>
> Py4JJavaError: An error occurred while calling o53840.parquet.
> : java.lang.AssertionError: assertion failed: No schema defined, and no
> Parquet data file or summary file found under
> file:/home/ubuntu/ipython/people.parquet2.
>     at scala.Predef$.assert(Predef.scala:179)
>     at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.org$apache$spark$sql$parquet$ParquetRelation2$MetadataCache$$readSchema(newParquet.scala:429)
>     at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$11.apply(newParquet.scala:369)
>     at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$11.apply(newParquet.scala:369)
>     at scala.Option.orElse(Option.scala:257)
>     at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:369)
>     at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$metadataCache$lzycompute(newParquet.scala:126)
>     at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$metadataCache(newParquet.scala:124)
>     at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$dataSchema$1.apply(newParquet.scala:165)
>     at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$dataSchema$1.apply(newParquet.scala:165)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.sql.parquet.ParquetRelation2.dataSchema(newParquet.scala:165)
>     at org.apache.spark.sql.sources.HadoopFsRelation.schema$lzycompute(interfaces.scala:506)
>     at org.apache.spark.sql.sources.HadoopFsRelation.schema(interfaces.scala:505)
>     at org.apache.spark.sql.sources.LogicalRelation.<init>(LogicalRelation.scala:30)
>     at org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:438)
>     at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:264)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>     at py4j.Gateway.invoke(Gateway.java:259)
>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>     at py4j.GatewayConnection.run(GatewayConnection.java:207)
>     at java.lang.Thread.run(Thread.java:722)
>
>
> I'm using spark-1.4.1-bin-hadoop2.6 with java 1.7.
>
> Thanks
> Amila