Wow, glad to know that it works well. And sorry, that Jira is a different
issue, not the same case as here.
From: Bagmeet Behera [mailto:bagme...@gmail.com]
Sent: Saturday, January 17, 2015 12:47 AM
To: Cheng, Hao
Subject: Re: using hiveContext to select a nested Map-data-type from an
AVROmodel+parquet file
Hi Cheng, Hao
An update: I installed the latest binaries of Spark 1.2.0 (prebuilt for
Hadoop 2.4 and later) and tried your suggestion. And it *works* perfectly!
Therefore I would encourage you to post your reply on the archive for the
benefit of all.
Thanks and best wishes,
BB (Bagmeet)
On Fri, Jan 16, 2015 at 11:20 AM, Bagmeet Behera
bagme...@gmail.com wrote:
Hi Cheng, Hao
The awesome thing is: the way you suggest works perfectly on Spark 1.1.0. I
am testing this on an old test installation with Spark 1.1.0 (installed from
http://spark.apache.org/) with Scala 2.10.4.
Just FYI: this was because I could not create a HiveContext on the newer
installation of Spark 1.2.0 (Scala 2.10.4) from the Cloudera CDH 5.3.0
release, which gave a strange error suggesting some incompatibility between
the Hive and Spark libraries. I can create a post for this (if I find an
appropriate user group, perhaps on the Cloudera side), but could this also
be a result of the bug you mention?
BTW, your reply is not in the archives. I guess this is also because of the
bug in the current version that you mentioned?
Many thanks for the reply.
Best,
BB
On Fri, Jan 16, 2015 at 3:24 AM, Cheng, Hao
hao.ch...@intel.com wrote:
Hi, BB
Ideally you can do the query like: select key, value.percent from
mytable_data lateral view explode(audiences) f as key, value limit 3;
But there is a bug in HiveContext:
https://issues.apache.org/jira/browse/SPARK-5237
I am working on it now, and hopefully will have a patch soon.
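To make the shape of that query concrete, here is a plain-Scala sketch (no Spark required) of what exploding a map column does; the `AudienceStats` case class and the sample values are assumed for illustration only:

```scala
// Exploding a map column turns one row holding a Map into one row per
// (key, value) entry; 'value.percent' then reads a field of the struct.
// AudienceStats mirrors the struct in the printed schema (names assumed).
case class AudienceStats(percent: Float, cluster: Int)

val audiences = Map(
  "tg_loh" -> AudienceStats(0.0f, 1),
  "tg_co"  -> AudienceStats(0.5f, 2)
)

// Rough analogue of:
//   SELECT key, value.percent FROM mytable_data
//   LATERAL VIEW explode(audiences) f AS key, value
val exploded = audiences.toSeq.map { case (key, value) => (key, value.percent) }
exploded.foreach(println)
```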
Cheng Hao
-----Original Message-----
From: BB [mailto:bagme...@gmail.com]
Sent: Friday, January 16, 2015 12:52 AM
To: user@spark.apache.org
Subject: using hiveContext to select a nested Map-data-type from an
AVROmodel+parquet file
Hi all,
Any help on the following is very much appreciated.
=
Problem:
On a schemaRDD read from a parquet file (the data within the file uses an
AVRO model) using the HiveContext:
I can't figure out how to 'select', or use a 'where' clause to filter rows,
on a field that has a Map AVRO data-type. I want to filter using a given
('key' : 'value') pair. How could I do this?
Details:
* the printSchema of the loaded schemaRDD is like so:
-- output snippet -
|-- created: long (nullable = false)
|-- audiences: map (nullable = true)
||-- key: string
||-- value: struct (valueContainsNull = false)
|||-- percent: float (nullable = false)
|||-- cluster: integer (nullable = false)
-
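For intuition, the 'audiences' field above corresponds to roughly the following Scala shape (the case-class and field names are taken from the printed schema; this is a sketch, not generated code, and the nested third field visible in the output later is omitted for brevity):

```scala
// Sketch of the 'audiences' column's type as printed by printSchema:
// a map from string keys to a struct holding percent and cluster.
case class AudienceValue(percent: Float, cluster: Int)

val audiences: Map[String, AudienceValue] =
  Map("tg_loh" -> AudienceValue(0.0f, 1))

println(audiences("tg_loh").cluster)
```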
* I don't get a result when I try to select on a specific value of the
'audiences' field like so:
SELECT created, audiences FROM mytable_data LATERAL VIEW
explode(audiences) adtab AS adcol WHERE audiences['key']=='tg_loh' LIMIT 10
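For intuition about what the intended filter should do: after explode, each map entry becomes its own row, and the filter applies to the exploded key column rather than to the original map indexed with the literal string 'key'. A plain-Scala analogy (sample data and struct names assumed):

```scala
// After explode, each (key, value) entry is a separate row; filtering on
// a specific key keeps only the matching rows. (Sample data assumed.)
case class AudienceStats(percent: Float, cluster: Int)

val audiences = Map(
  "tg_loh" -> AudienceStats(0.0f, 1),
  "tg_co"  -> AudienceStats(0.5f, 2)
)

// Analogy of: ... LATERAL VIEW explode(audiences) f AS key, value
//             WHERE key = 'tg_loh'
val matched = audiences.toSeq.filter { case (key, _) => key == "tg_loh" }
matched.foreach(println)
```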
The sequence of commands on the spark-shell (a different query and its
output) is:
-- code snippet -
scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> val parquetFile2 =
hiveContext.parquetFile("/home/myuser/myparquetfile")
scala> parquetFile2.registerTempTable("mytable_data")
scala> hiveContext.cacheTable("mytable_data")
scala> hiveContext.sql("SELECT audiences['key'], audiences['value'] FROM
mytable_data LATERAL VIEW explode(audiences) adu AS audien LIMIT
3").collect().foreach(println)
-- output -
[null,null]
[null,null]
[null,null]
gives a list of nulls. I can see that there is data when I just do the
following (output is truncated):
-- code snippet -
scala> hiveContext.sql("SELECT audiences FROM mytable_data LATERAL VIEW
explode(audiences) tablealias AS colalias LIMIT
1").collect().foreach(println)
-- output -
[Map(tg_loh -> [0.0,1,Map()], tg_co -> [0.0,1,Map(tg_co_petrol -> 0.0)],
tg_wall -> [0.0,1,Map(tg_wall_poi -> 0.0)], ...
Q1) What am I doing wrong?
Q2) How can I use 'where' in the query to filter on specific values?
What works:
Queries with filtering and selecting on fields that have simple AVRO
data-types, such as long or string, work fine.
===
I hope the explanation makes sense. Thanks.
Best,
BB
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/using-hiveContext-to-select-a-nested-Map-data-type-from-an-AVROmodel-parquet-file-tp21168.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.