Hive in IntelliJ

2015-05-19 Thread Heisenberg Bb
I was trying to implement this example:
http://spark.apache.org/docs/1.3.1/sql-programming-guide.html#hive-tables

It worked well when I built Spark in the terminal using the command specified here:
http://spark.apache.org/docs/1.3.1/building-spark.html#building-with-hive-and-jdbc-support

But when I try to run it in IntelliJ, following the setup described
here:
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-IntelliJ

It throws the error:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
<console>:21: error: object hive is not a member of package
org.apache.spark.sql

Can anyone help me get past this issue?

Regards
Akhil
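A note on the error above: in Spark 1.3.x, HiveContext lives in the separate spark-hive module, which is only on the classpath when Spark is built with the -Phive profile. When running from an IDE project instead, one sketch of a fix is to declare the module as an explicit dependency; the sbt coordinates below are an assumption, with the version matched to the 1.3.1 docs linked above:

```scala
// build.sbt (sketch): org.apache.spark.sql.hive ships in the spark-hive
// artifact; the version must match the Spark build in use (1.3.1 here).
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.3.1",
  "org.apache.spark" %% "spark-sql"  % "1.3.1",
  "org.apache.spark" %% "spark-hive" % "1.3.1"
)
```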


Building Spark

2015-05-13 Thread Heisenberg Bb
I tried to build Spark on my local machine running Ubuntu 14.04 (4 GB RAM), and my
system hangs (freezes). When I monitored the system processes, the
build process was consuming 85% of my memory. Why does it need so many
resources? Is there a more efficient way to build Spark?

Thanks
Akhil
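For what it's worth, the Spark build docs of this era recommend capping Maven's memory explicitly rather than letting it grow unbounded; a sketch (the flag values follow the 1.x build docs and may need tuning downward on a 4 GB machine):

```shell
# Cap Maven's memory so the build doesn't starve the rest of the system.
# Values follow the Spark 1.x build docs; on a 4 GB machine a smaller
# -Xmx (e.g. -Xmx1g) may be needed.
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

# -DskipTests avoids spawning the test JVMs entirely.
mvn -DskipTests clean package
```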


Re: using hiveContext to select a nested Map-data-type from an AVROmodel+parquet file

2015-01-19 Thread BB
I am quoting the reply I got on this, which for some reason did not get
posted here. The suggestion in the reply below worked perfectly for me; the
error mentioned in it is unrelated (or outdated).
Hope this is helpful to someone.
Cheers,
BB


 Hi, BB
Ideally you can do the query like: select key, value.percent from
 mytable_data lateral view explode(audiences) f as key, value limit 3;
But there is a bug in HiveContext:
 https://issues.apache.org/jira/browse/SPARK-5237
I am working on it now; hopefully I will have a patch soon.
 
 Cheng Hao
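To make the suggested query concrete: LATERAL VIEW explode over a map column emits one row per map entry, and the AS clause names the resulting key and value columns. The same shape, modelled with plain Scala collections (no Spark involved; the field names mirror the schema in the original question, and the values are invented):

```scala
// A model of one row's "audiences" column: map of audience name -> struct.
// Field names follow the schema in the original question; values are invented.
case class AudienceValue(percent: Float, cluster: Int)

val audiences: Map[String, AudienceValue] = Map(
  "tg_loh"  -> AudienceValue(0.25f, 1),
  "tg_wall" -> AudienceValue(0.10f, 2)
)

// LATERAL VIEW explode(audiences) f AS key, value:
// each map entry becomes its own (key, value) row.
val exploded: Seq[(String, AudienceValue)] = audiences.toSeq

// SELECT key, value.percent ... LIMIT 3
val selected = exploded.map { case (key, value) => (key, value.percent) }.take(3)
```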





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/using-hiveContext-to-select-a-nested-Map-data-type-from-an-AVROmodel-parquet-file-tp21168p21231.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



using hiveContext to select a nested Map-data-type from an AVROmodel+parquet file

2015-01-15 Thread BB
Hi all,
  Any help on the following is very much appreciated.
=
Problem:
  On a schemaRDD read from a parquet file (the data in the file uses an AVRO
model) using the HiveContext, I can't figure out how to use 'select' or a
'where' clause to filter rows on a field that has a Map AVRO data type. I
want to filter using a given ('key' : 'value') pair. How could I do this?

Details:
* the printSchema of the loaded schemaRDD is like so:

-- output snippet -
 |-- created: long (nullable = false)
 |-- audiences: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = false)
 |    |    |-- percent: float (nullable = false)
 |    |    |-- cluster: integer (nullable = false)
- 

* I don't get a result when I try to select on a specific value of
'audiences', like so:
 
  SELECT created, audiences FROM mytable_data LATERAL VIEW
explode(audiences) adtab AS adcol WHERE audiences['key']=='tg_loh' LIMIT 10

 The sequence of commands in the spark-shell (with a different query and its output) is:

-- code snippet -
scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> val parquetFile2 =
hiveContext.parquetFile("/home/myuser/myparquetfile")
scala> parquetFile2.registerTempTable("mytable_data")
scala> hiveContext.cacheTable("mytable_data")

scala> hiveContext.sql("SELECT audiences['key'], audiences['value'] FROM
mytable_data LATERAL VIEW explode(audiences) adu AS audien LIMIT
3").collect().foreach(println)

-- output -
[null,null]
[null,null]
[null,null]


gives a list of nulls. I can see that there is data when I just do the
following (output is truncated):

-- code snippet -
scala> hiveContext.sql("SELECT audiences FROM mytable_data LATERAL VIEW
explode(audiences) tablealias AS colalias LIMIT
1").collect().foreach(println)

 output --
[Map(tg_loh -> [0.0,1,Map()], tg_co -> [0.0,1,Map(tg_co_petrol -> 0.0)],
tg_wall -> [0.0,1,Map(tg_wall_poi -> 0.0)],  ...


Q1) What am I doing wrong?
Q2) How can I use 'where' in the query to filter on specific values?
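Regarding Q1, and separate from the HiveContext bug discussed in the follow-up thread: in HiveQL, audiences['key'] does not refer to the exploded column; it looks up the literal string "key" in the map, and a missing key yields NULL. A plain-Scala illustration of that lookup semantics (no Spark; the values are invented):

```scala
// The "audiences" map from the output above, reduced to name -> percent.
val audiences: Map[String, Float] = Map("tg_loh" -> 0.0f, "tg_co" -> 0.0f)

// audiences['key'] in HiveQL looks up the literal string "key",
// which is not an entry in the map, so the result is NULL:
val lookedUp = audiences.get("key")   // None, i.e. NULL in SQL

// whereas the intended lookup names an actual entry:
val real = audiences.get("tg_loh")    // Some(0.0f)
```

Filtering on a specific entry would therefore name the key directly (e.g. WHERE audiences['tg_loh'] IS NOT NULL) or filter on the alias produced by the LATERAL VIEW, rather than on audiences['key'].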

What works:
   Queries that filter and select on fields with simple AVRO data
types, such as long or string, work fine.

===

 I hope the explanation makes sense. Thanks.
Best,
BB



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/using-hiveContext-to-select-a-nested-Map-data-type-from-an-AVROmodel-parquet-file-tp21168.html