How to apply schema to queried data from Hive before saving it as parquet file?

akshayhazari Wed, 19 Nov 2014 01:35:36 -0800

The code below has two parts: one creates a table in Hive from data, and
the other creates a schema.
*Now I am trying to save the queried data as a Parquet file, where
hctx.sql("Select * from sparkHive1") returns a SchemaRDD
containing the records from the table:*
       hctx.sql("Select * from sparkHive1").saveAsParquetFile("/home/hduser/Documents/Credentials/Newest_Credentials_AX/Songs/spark-1.1.0/HiveOP");


*In the code at the following link, a schema is applied through the
sqlContext before the file is saved as a Parquet file. How can I do the same
(save as a Parquet file) when I am using a HiveContext to fetch the data?*
http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files

Any help would be appreciated.
--------------------------------------------------------------------------------------

        HiveContext hctx = new HiveContext(sctx); // sctx is the SparkContext
        // Register the JSON SerDe jar, then create and load the Hive table.
        hctx.sql("ADD JAR /home/hduser/BIGDATA_STUFF/Java_Hive2/hive-json-serde-0.2.jar");
        hctx.sql("Create table if not exists sparkHive1(id INT,name STRING,score INT) "
                + "ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'");
        hctx.sql("Load data local inpath "
                + "'/home/hduser/Documents/Credentials/Newest_Credentials_AX/Songs/spark-1.1.0/ip3.json' "
                + "into table sparkHive1");

        String schemaString = "id name score";

        // Build one nullable StructField per column: "name" is a string, the rest are integers.
        List<StructField> fields = new ArrayList<StructField>();
        for (String fieldName : schemaString.split(" ")) {
            if (fieldName.contains("name")) {
                fields.add(DataType.createStructField(fieldName, DataType.StringType, true));
            } else {
                fields.add(DataType.createStructField(fieldName, DataType.IntegerType, true));
            }
        }
        StructType schema = DataType.createStructType(fields);
        // How can I apply the schema before saving as a Parquet file?
        hctx.sql("Select * from sparkHive1").saveAsParquetFile("/home/hduser/Documents/Credentials/Newest_Credentials_AX/Songs/spark-1.1.0/HiveOP");
------------------------------------------------------------------------------------------------
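
The schema loop above picks each field's type by checking whether the field name contains "name". One possible alternative, sketched here under assumptions, is to derive the name-to-type pairs from the same column definition string used in the Create table statement, so the hand-built schema cannot drift from the Hive table. The class SchemaSketch and the helper parseColumns are hypothetical names for illustration only and are not part of Spark or Hive. The sketch uses only the Java standard library and stops short of calling DataType.createStructField.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaSketch {

    // Hypothetical helper (not a Spark or Hive API): parses a column list such as
    // "id INT,name STRING,score INT" into an ordered column-name -> type map.
    public static Map<String, String> parseColumns(String columnDefs) {
        Map<String, String> columns = new LinkedHashMap<String, String>();
        for (String def : columnDefs.split(",")) {
            String[] parts = def.trim().split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected 'name TYPE', got: " + def);
            }
            // Normalize the type to upper case so "int" and "INT" compare equal.
            columns.put(parts[0], parts[1].toUpperCase());
        }
        return columns;
    }

    public static void main(String[] args) {
        // Same column list as in the Create table statement above.
        Map<String, String> columns = parseColumns("id INT,name STRING,score INT");
        for (Map.Entry<String, String> column : columns.entrySet()) {
            // With Spark on the classpath, this is the point where each pair could be
            // turned into a StructField, as the loop in the original code does.
            System.out.println(column.getKey() + " -> " + column.getValue());
        }
    }
}
```

Because a LinkedHashMap keeps insertion order, the fields come out in the same order as the table columns (id, name, score).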





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-apply-schema-to-queried-data-from-Hive-before-saving-it-as-parquet-file-tp19259.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
