Re: SparkSQL 1.3.0 (RC3) failed to read parquet file generated by 1.1.1

2015-03-15 Thread Pei-Lun Lee
Thanks!

On Sat, Mar 14, 2015 at 3:31 AM, Michael Armbrust wrote:

> Here is the JIRA: https://issues.apache.org/jira/browse/SPARK-6315


Re: SparkSQL 1.3.0 (RC3) failed to read parquet file generated by 1.1.1

2015-03-13 Thread Michael Armbrust
Here is the JIRA: https://issues.apache.org/jira/browse/SPARK-6315

On Thu, Mar 12, 2015 at 11:00 PM, Michael Armbrust wrote:

> We are looking at the issue and will likely fix it for Spark 1.3.1.


Re: SparkSQL 1.3.0 (RC3) failed to read parquet file generated by 1.1.1

2015-03-12 Thread Michael Armbrust
We are looking at the issue and will likely fix it for Spark 1.3.1.

On Thu, Mar 12, 2015 at 8:25 PM, giive chen wrote:

> Hi all
>
> My team has the same issue. It looks like Spark 1.3's Spark SQL cannot read
> Parquet files generated by Spark 1.1. This will cost us a lot of migration
> work when we upgrade to Spark 1.3.
>
> Can anyone help?
>
> Thanks
>
> Wisely Chen


Re: SparkSQL 1.3.0 (RC3) failed to read parquet file generated by 1.1.1

2015-03-12 Thread giive chen
Hi all

My team has the same issue. It looks like Spark 1.3's Spark SQL cannot read
Parquet files generated by Spark 1.1. This will cost us a lot of migration
work when we upgrade to Spark 1.3.

Can anyone help?


Thanks

Wisely Chen


On Tue, Mar 10, 2015 at 5:06 PM, Pei-Lun Lee wrote:

> Hi,
>
> I found that if I try to read a Parquet file generated by Spark 1.1.1 using
> 1.3.0-rc3 with default settings, I get this error:
>
> com.fasterxml.jackson.core.JsonParseException: Unrecognized token
> 'StructType': was expecting ('true', 'false' or 'null')
>  at [Source: StructType(List(StructField(a,IntegerType,false))); line: 1, column: 11]
> at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1419)
> at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:508)
> at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._reportInvalidToken(ReaderBasedJsonParser.java:2300)
> at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddValue(ReaderBasedJsonParser.java:1459)
> at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:683)
> at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3105)
> at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3051)
> at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2161)
> at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:19)
> at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:44)
> at org.apache.spark.sql.types.DataType$.fromJson(dataTypes.scala:41)
> at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$readSchema$1$$anonfun$25.apply(newParquet.scala:675)
> at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$readSchema$1$$anonfun$25.apply(newParquet.scala:675)
>
>
>
> This is how I saved the Parquet file with Spark 1.1.1:
>
> sql("select 1 as a").saveAsParquetFile("/tmp/foo")
>
>
>
> And this is the metadata of the 1.1.1 Parquet file:
>
> creator: parquet-mr version 1.4.3
> extra:   org.apache.spark.sql.parquet.row.metadata =
> StructType(List(StructField(a,IntegerType,false)))
>
>
>
> By comparison, this is the 1.3.0 metadata:
>
> creator: parquet-mr version 1.6.0rc3
> extra:   org.apache.spark.sql.parquet.row.metadata =
> {"type":"struct","fields":[{"name":"a","type":"integer","nullable":t
> [more]...
>
>
>
> It looks like ParquetRelation2 is now used to load Parquet files by default,
> and it only recognizes the JSON schema format, whereas the 1.1.1 schema was
> stored as a case-class string.
>
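For illustration, the failure mode can be reproduced outside Spark by trying to JSON-parse each footer string (a Python sketch, not Spark code; the JSON example schema below is shortened for brevity, not the exact 1.3.0 footer):

```python
import json

# Schema string as stored by Spark 1.1.1: Scala's toString of the StructType.
legacy_meta = "StructType(List(StructField(a,IntegerType,false)))"

# A shortened 1.3-style JSON schema (illustrative, not the exact footer above).
json_meta = '{"type":"struct","fields":[]}'

def is_json_schema(serialized):
    """Return True if the footer string parses as a 1.3-style JSON schema."""
    try:
        json.loads(serialized)
        return True
    except ValueError:
        # Jackson raises the analogous error in the stack trace above:
        # the token 'StructType' is not a valid start of a JSON document.
        return False

print(is_json_schema(json_meta))    # True
print(is_json_schema(legacy_meta))  # False
```

A reader that accepts both formats could try the JSON parse first and fall back to a legacy-string parser; that is one way a backward-compatible loader could behave.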
> Setting spark.sql.parquet.useDataSourceApi to false fixes it, but I don't
> know what the differences between the two code paths are.
> Is this considered a bug? We have a lot of Parquet files from 1.1.1; should
> we disable the data source API in order to read them if we upgrade to 1.3?
>
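Concretely, the workaround could look like this in a PySpark 1.3 session (a hedged sketch, not a verified recipe: `sqlContext` is assumed to be the shell's existing SQLContext, and the path reuses the example above):

```python
# Sketch: disable the new data source API path via a SQL SET command,
# then read the 1.1.1-era file through the legacy Parquet code path.
sqlContext.sql("SET spark.sql.parquet.useDataSourceApi=false")
df = sqlContext.parquetFile("/tmp/foo")  # file written by the 1.1.1 example
df.show()
```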
> Thanks,
> --
> Pei-Lun
>