Re: Parquet Array Support Broken?

2015-09-08 Thread Cheng Lian
Yeah, this is a typical Parquet interoperability issue due to
unfortunate historical reasons. Hive (actually parquet-hive) gives the
following schema for an array column:


message m0 {
  optional group f (LIST) {
    repeated group bag {
      optional int32 array_element;
    }
  }
}

while Spark SQL gives

message m1 {
  optional group f (LIST) {
    repeated group bag {
      optional int32 array;
    }
  }
}

So Spark 1.4 couldn't find its expected field "array" in the
Hive-generated Parquet file, whose element field is named
"array_element" instead.  As Ruslan suggested, Spark 1.5 addresses this
issue properly and is able to read Parquet files generated by most, if
not all, Parquet data models out there.
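In case it helps to see the idea: the 1.5-style fix is, roughly, to stop matching the element field by a hard-coded name and instead follow the parquet-format backward-compatibility rules, taking the repeated group's single child as the element whatever it is called. A deliberately simplified Python sketch (my own illustration, glossing over the spec's special cases; not Spark's actual code):

```python
# Hypothetical sketch of name-agnostic LIST element resolution
# (illustrative only -- not Spark's implementation).

def resolve_list_element(repeated_group_fields):
    """Given the child field names of the repeated group inside a
    Parquet LIST, pick the element field without relying on a
    hard-coded name such as 'array' or 'array_element'."""
    if len(repeated_group_fields) == 1:
        # One child: it must be the element, whatever its name.
        return repeated_group_fields[0]
    # Several children: the repeated group itself is the element
    # (2-level struct encoding); there is no single name to pick.
    return None

# Both the parquet-hive and the Spark SQL 1.4 layout resolve cleanly:
print(resolve_list_element(["array_element"]))  # array_element
print(resolve_list_element(["array"]))          # array
```

With a rule like this, the same reader accepts both m0 and m1 above instead of failing on whichever name it did not expect.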


If you are interested, you may find more details about Parquet
interoperability in this post:
https://www.mail-archive.com/user@spark.apache.org/msg35663.html
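As an aside, you can usually tell which data model wrote a given file by looking at the element field name inside the repeated group, e.g. in the output of `parquet-tools schema`. A small, hypothetical Python sketch of that check (the helper name and regex are mine, not from any Parquet tooling):

```python
import re

# Hypothetical helper: report the LIST element field name from a
# Parquet schema dump (as printed by e.g. `parquet-tools schema`).
# Illustrative sketch only.

def list_element_name(schema_text):
    """Return the name of the first field nested inside a
    'repeated group' of a Parquet message schema dump, or None."""
    m = re.search(
        r"repeated\s+group\s+\w+\s*\{\s*"
        r"(?:optional|required)\s+\w+\s+(\w+)\s*;",
        schema_text,
    )
    return m.group(1) if m else None

hive_style = """
message m0 {
  optional group f (LIST) {
    repeated group bag {
      optional int32 array_element;
    }
  }
}
"""
print(list_element_name(hive_style))  # array_element
```

An element named `array_element` points at parquet-hive; `array` points at Spark SQL 1.4 and earlier, so you know in advance whether a 1.4 reader will choke on the file.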


Cheng

On 9/8/15 6:19 AM, Alex Kozlov wrote:

Thank you - it works if the file is created in Spark

On Mon, Sep 7, 2015 at 3:06 PM, Ruslan Dautkhanov
<dautkha...@gmail.com> wrote:


Read the response from Cheng Lian <lian.cs@gmail.com> on Aug 27th - it
looks like the same problem.

Workarounds:
1. write that parquet file in Spark;
2. upgrade to Spark 1.5.

--
Ruslan Dautkhanov

On Mon, Sep 7, 2015 at 3:52 PM, Alex Kozlov <ale...@gmail.com> wrote:

No, it was created in Hive by CTAS, but any help is
appreciated...

On Mon, Sep 7, 2015 at 2:51 PM, Ruslan Dautkhanov
<dautkha...@gmail.com> wrote:

That parquet table wasn't created in Spark, was it?

There was a recent discussion on this list that complex
data types in Spark prior to 1.5 are often incompatible with
Hive, if I remember correctly.

On Mon, Sep 7, 2015, 2:57 PM Alex Kozlov <ale...@gmail.com> wrote:

I am trying to read an (array typed) parquet file in
spark-shell (Spark 1.4.1 with Hadoop 2.6):

{code}
$ bin/spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/09/07 13:45:22 INFO SecurityManager: Changing view acls to: hivedata
15/09/07 13:45:22 INFO SecurityManager: Changing modify acls to: hivedata
15/09/07 13:45:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hivedata); users with modify permissions: Set(hivedata)
15/09/07 13:45:23 INFO HttpServer: Starting HTTP Server
15/09/07 13:45:23 INFO Utils: Successfully started service 'HTTP class server' on port 43731.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.1
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0)
Type in expressions to have them evaluated.
Type :help for more information.
15/09/07 13:45:26 INFO SparkContext: Running Spark version 1.4.1
15/09/07 13:45:26 INFO SecurityManager: Changing view acls to: hivedata
15/09/07 13:45:26 INFO SecurityManager: Changing modify acls to: hivedata
15/09/07 13:45:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hivedata); users with modify permissions: Set(hivedata)
15/09/07 13:45:27 INFO Slf4jLogger: Slf4jLogger started
15/09/07 13:45:27 INFO Remoting: Starting remoting
15/09/07 13:45:27 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@10.10.30.52:46083]
15/09/07 13:45:27 INFO Utils: Successfully started service 'sparkDriver' on port 46083.
15/09/07 13:45:27 INFO SparkEnv: Registering MapOutputTracker
15/09/07 13:45:27 INFO SparkEnv: Registering BlockManagerMaster
15/09/07 13:45:27 INFO DiskBlockManager: Created local directory at /tmp/spark-f313315a-0769-4057-835d-196cfe140a26/blockm
Re: Parquet Array Support Broken?

2015-09-07 Thread Alex Kozlov
The same error occurs if I do:

{code}
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val results = sqlContext.sql("SELECT * FROM stats")
{code}

but it does work from the Hive shell directly...
