Re: How to apply schema to queried data from Hive before saving it as parquet file?

2014-11-19 Thread akshayhazari
Sorry about the confusion I created. I only started learning this week. Silly me: I was actually writing the schema to a .txt file and expecting records. This is what I was supposed to do. Also, if you could let me know about adding the data from the jsonFile/jsonRDD methods of hiveContext to Hive

Re: How to apply schema to queried data from Hive before saving it as parquet file?

2014-11-19 Thread akshayhazari
Thanks for replying. I was unable to figure out how to load data into a Hive table after using jsonFile/jsonRDD. Also, I was able to save the SchemaRDD I got via hiveContext.sql(...).saveAsParquetFile(Path), i.e. save the SchemaRDD as a Parquet file, but when I tried to fetch data from the Parquet file ba

How to apply schema to queried data from Hive before saving it as parquet file?

2014-11-19 Thread akshayhazari
The code below has one part that creates a table in Hive from the data and another part that creates a schema. *Now suppose I save the queried data as a Parquet file, where hctx.sql("Select * from sparkHive1") returns a SchemaRDD containing the records from the table.* hctx.sq

Re: Building Spark for Hive The requested profile "hadoop-1.2" could not be activated because it does not exist.

2014-11-17 Thread akshayhazari
Oops, I guess this is the right way to do it: mvn -Phive -Dhadoop.version=1.2.1 clean -DskipTests package

Building Spark for Hive The requested profile "hadoop-1.2" could not be activated because it does not exist.

2014-11-17 Thread akshayhazari
I am using Apache Hadoop 1.2.1. I wanted to use Spark SQL with Hive, so I tried to build Spark like so: > mvn -Phive,hadoop-1.2 -Dhadoop.version=1.2.1 clean -DskipTests package But I get the following error: The requested profile "hadoop-1.2" could not be activated because it does not ex

Query from two or more tables in Spark SQL. I have done this. Is there any simpler solution?

2014-11-12 Thread akshayhazari
As of now, my approach is to fetch all the data from tables located in different databases into separate RDDs, make a union of them, and then query them together. I want to know whether I can perform a query on them directly while creating the RDD, i.e. instead of creating two RDDs, firing a
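The union-then-query approach described above can be sketched without Spark using plain Java collections. This is a minimal sketch, assuming each list stands in for one RDD loaded from one database; the `Row` record, the method names, and the sample data are hypothetical. In Spark, the corresponding steps would be `RDD.union` followed by a filter or a SQL query over the combined data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java stand-in for "fetch each table separately, union, then query".
// Each input list plays the role of one RDD read from one database (assumption).
public class UnionThenQuery {

    // Hypothetical row type for a record read from either database.
    public record Row(String source, int id, String name) {}

    // Union: concatenates both inputs and keeps duplicates,
    // mirroring the semantics of RDD.union in Spark.
    public static List<Row> union(List<Row> a, List<Row> b) {
        List<Row> all = new ArrayList<>(a);
        all.addAll(b);
        return all;
    }

    // The "query" over the combined data: keep rows whose id exceeds a threshold.
    public static List<Row> idsAbove(List<Row> rows, int threshold) {
        return rows.stream()
                   .filter(r -> r.id() > threshold)
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> fromDbA = List.of(new Row("dbA", 1, "alice"), new Row("dbA", 3, "carol"));
        List<Row> fromDbB = List.of(new Row("dbB", 2, "bob"), new Row("dbB", 4, "dave"));
        List<Row> combined = union(fromDbA, fromDbB);
        System.out.println(combined.size());              // 4
        System.out.println(idsAbove(combined, 2).size()); // 2
    }
}
```

Like `RDD.union`, this union does not remove duplicate rows; in Spark, `RDD.distinct()` can be applied afterwards if duplicates should be dropped.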

Combining data from two tables in two databases postgresql, JdbcRDD.

2014-11-11 Thread akshayhazari
I want to be able to perform a query on two tables in different databases, and I want to know whether it can be done. I've heard about the union of two RDDs, but here I want to connect to something like different partitions of a table. Any help is appreciated. import java.io.Serializable; //import org.ju

Mysql retrieval and storage using JdbcRDD

2014-11-10 Thread akshayhazari
So far I have tried this, and I am able to compile it successfully. There isn't enough documentation on using Spark with databases. I am using AbstractFunction0 and AbstractFunction1 here. I am unable to access the database: the jar just runs without doing anything when submitted. I want t