Re: How to apply schema to queried data from Hive before saving it as parquet file?
I am not very familiar with the JSON SerDe for Hive, but in general you should not need to manually create a schema for data that is loaded from Hive. You should just be able to call saveAsParquetFile on any SchemaRDD returned from hctx.sql(...). I'd also suggest you check out the jsonFile/jsonRDD methods that are available on HiveContext.

On Wed, Nov 19, 2014 at 1:34 AM, akshayhazari akshayhaz...@gmail.com wrote:

The code below contains a part that creates a table in Hive from data, and another part that creates a schema. Now, if I try to save the queried data as a Parquet file, hctx.sql("Select * from sparkHive1") returns a SchemaRDD containing the records from the table:

    hctx.sql("Select * from sparkHive1").saveAsParquetFile("/home/hduser/Documents/Credentials/Newest_Credentials_AX/Songs/spark-1.1.0/HiveOP");

As per the code in the following link, a schema is applied via the sqlContext before the data is saved as a Parquet file. How can I do that (save as a Parquet file) when I am using a HiveContext to fetch the data?
http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
Any help please.
--
    HiveContext hctx = new HiveContext(sctx); // sctx is the SparkContext
    hctx.sql("ADD JAR /home/hduser/BIGDATA_STUFF/Java_Hive2/hive-json-serde-0.2.jar");
    hctx.sql("Create table if not exists sparkHive1(id INT, name STRING, score INT) " +
             "ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'");
    hctx.sql("Load data local inpath '/home/hduser/Documents/Credentials/Newest_Credentials_AX/Songs/spark-1.1.0/ip3.json' into table sparkHive1");

    String schemaString = "id name score";
    List<StructField> fields = new ArrayList<StructField>();
    for (String fieldName : schemaString.split(" ")) {
        if (fieldName.contains("name"))
            fields.add(DataType.createStructField(fieldName, DataType.StringType, true));
        else
            fields.add(DataType.createStructField(fieldName, DataType.IntegerType, true));
    }
    StructType schema = DataType.createStructType(fields);

    // How can I apply the schema before saving as a Parquet file?
    hctx.sql("Select * from sparkHive1").saveAsParquetFile("/home/hduser/Documents/Credentials/Newest_Credentials_AX/Songs/spark-1.1.0/HiveOP");

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-apply-schema-to-queried-data-from-Hive-before-saving-it-as-parquet-file-tp19259.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
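The schema-building loop in the message above simply routes each column name to a type: anything containing "name" becomes a string column, everything else an integer column. Stripped of the Spark StructField/StructType classes, that selection logic can be sketched and sanity-checked in plain Java (SchemaSketch and buildSchema are hypothetical names, not Spark API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaSketch {
    // Mirrors the loop in the post: a field whose name contains "name"
    // is mapped to a string type, every other field to an integer type.
    static Map<String, String> buildSchema(String schemaString) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        for (String fieldName : schemaString.split(" ")) {
            if (fieldName.contains("name")) {
                fields.put(fieldName, "StringType");
            } else {
                fields.put(fieldName, "IntegerType");
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Same input the post uses: "id name score".
        System.out.println(buildSchema("id name score"));
    }
}
```

In the real Spark 1.1 Java API the same mapping would produce DataType.createStructField(...) entries, as the original code already does.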
Re: How to apply schema to queried data from Hive before saving it as parquet file?
Thanks for replying. I was unable to figure out how, after using jsonFile/jsonRDD, to load the data into a Hive table. I was able to save the SchemaRDD I got via hiveContext.sql(...).saveAsParquetFile(path), i.e. save the SchemaRDD as a Parquet file. But when I tried to fetch the data back from the Parquet file as below and save it to a text file, the generated output files were filled with weird values like org.apache.spark.sql.api.java.Row@e26c01c7:

    JavaSchemaRDD parquetfilerdd = sqlContext.parquetFile("path/to/parquet/File");
    parquetfilerdd.registerTempTable("pq");
    JavaSchemaRDD writetxt = sqlContext.sql("Select * from pq");
    // This step created text files filled with values like
    // org.apache.spark.sql.api.java.Row@e26c01c7
    writetxt.saveAsTextFile("Path/To/Text/Files");

I know there must be a way to do this right; I just haven't been able to figure it out all the while. Could you please help?
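The Row@e26c01c7 values come from saveAsTextFile calling toString() on each Row: the Java Row class here does not override Object.toString(), so the default ClassName@hashCode form is written out instead of the field values. The same effect can be reproduced in plain Java with any class that lacks a toString() override (Record is a hypothetical stand-in, not Spark API):

```java
public class ToStringSketch {
    // A class without a toString() override behaves like the Row
    // objects in the post when written to a text file.
    static class Record {
        int id = 1;
        String name = "abc";
    }

    public static void main(String[] args) {
        String s = new Record().toString();
        // Prints something like "ToStringSketch$Record@1b6d3586",
        // not the field values:
        System.out.println(s);
    }
}
```

The fix, as the next message in the thread works out, is to map each Row to an explicit string before saving.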
Re: How to apply schema to queried data from Hive before saving it as parquet file?
Sorry about the confusion I created. I just started learning this week. Silly me, I was actually writing the schema to a text file and expecting records. This is what I was supposed to do. Also, if you could let me know about adding the data from the jsonFile/jsonRDD methods of HiveContext to Hive tables, it would be appreciated.

    JavaRDD<String> result = writetxt.map(new Function<Row, String>() {
        public String call(Row row) {
            String temp = "";
            temp += row.getInt(0) + " ";
            temp += row.getString(1) + " ";
            temp += row.getInt(2);
            return temp;
        }
    });
    result.saveAsTextFile("pqtotxt");
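The map function above just concatenates the three columns with single spaces. That formatting, minus the Spark Row type, can be checked in isolation (RowFormatSketch and formatRow are hypothetical names introduced for illustration):

```java
public class RowFormatSketch {
    // Mirrors the call(Row) body above: "<id> <name> <score>".
    static String formatRow(int id, String name, int score) {
        String temp = "";
        temp += id + " ";
        temp += name + " ";
        temp += score;
        return temp;
    }

    public static void main(String[] args) {
        // One line of the text output for a row (1, "abc", 90):
        System.out.println(formatRow(1, "abc", 90));
    }
}
```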
Re: How to apply schema to queried data from Hive before saving it as parquet file?
You can save the results as a Parquet file or as a text file and create a Hive table based on those files.

Daniel

On 20 Nov 2014, at 08:01, akshayhazari akshayhaz...@gmail.com wrote:
[quoted text of the previous message trimmed]
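Daniel's suggestion of creating a Hive table over the saved files can be sketched as an external-table DDL. This is a hedged sketch, not from the thread: it assumes a Hive version with built-in Parquet support (STORED AS PARQUET), and the table name and location are hypothetical placeholders for the directory saveAsParquetFile wrote to.

```sql
-- Hypothetical: point an external Hive table at the directory
-- produced by saveAsParquetFile. The columns mirror sparkHive1.
CREATE EXTERNAL TABLE sparkHive1_pq (id INT, name STRING, score INT)
STORED AS PARQUET
LOCATION '/path/to/parquet/output';
```

On older Hive versions without STORED AS PARQUET, the equivalent table would instead need an explicit Parquet SerDe and input/output format clauses.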