Re: How to apply schema to queried data from Hive before saving it as parquet file?

2014-11-19 Thread Michael Armbrust
I am not very familiar with the JSONSerDe for Hive, but in general you
should not need to manually create a schema for data that is loaded from
Hive. You should be able to call saveAsParquetFile directly on any
SchemaRDD that is returned from hctx.sql(...).

I'd also suggest you check out the jsonFile/jsonRDD methods that are
available on HiveContext.

On Wed, Nov 19, 2014 at 1:34 AM, akshayhazari akshayhaz...@gmail.com
wrote:

 The code below has one part that creates a table in Hive from data and
 another part that creates a schema.
 *Now I try to save the queried data as a Parquet file, where
 hctx.sql("Select * from sparkHive1") returns a SchemaRDD
 that contains the records from the table:*

 hctx.sql("Select * from sparkHive1").saveAsParquetFile("/home/hduser/Documents/Credentials/Newest_Credentials_AX/Songs/spark-1.1.0/HiveOP");

 *In the code at the following link, a schema is applied through the
 sqlContext before the file is saved as a Parquet file. How can I do that
 (save as a Parquet file) when I am using a HiveContext to fetch the data?*

 http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files

 Any Help Please.

 --

 // Imports needed by this fragment (Spark 1.1 Java API).
 import java.util.ArrayList;
 import java.util.List;
 import org.apache.spark.sql.api.java.DataType;
 import org.apache.spark.sql.api.java.StructField;
 import org.apache.spark.sql.api.java.StructType;
 import org.apache.spark.sql.hive.HiveContext;

 HiveContext hctx = new HiveContext(sctx); // sctx is the SparkContext

 // Register the JSON SerDe, create the table, and load the JSON data.
 hctx.sql("ADD JAR /home/hduser/BIGDATA_STUFF/Java_Hive2/hive-json-serde-0.2.jar");
 hctx.sql("Create table if not exists sparkHive1(id INT, name STRING, score INT) "
     + "ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'");
 hctx.sql("Load data local inpath "
     + "'/home/hduser/Documents/Credentials/Newest_Credentials_AX/Songs/spark-1.1.0/ip3.json' "
     + "into table sparkHive1");

 // Build a schema by hand: "name" is a string, the other fields are integers.
 String schemaString = "id name score";
 List<StructField> fields = new ArrayList<StructField>();
 for (String fieldName : schemaString.split(" ")) {
   if (fieldName.contains("name")) {
     fields.add(DataType.createStructField(fieldName, DataType.StringType, true));
   } else {
     fields.add(DataType.createStructField(fieldName, DataType.IntegerType, true));
   }
 }
 StructType schema = DataType.createStructType(fields);

 // How can I apply the schema before saving as a Parquet file?
 hctx.sql("Select * from sparkHive1").saveAsParquetFile("/home/hduser/Documents/Credentials/Newest_Credentials_AX/Songs/spark-1.1.0/HiveOP");

 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/How-to-apply-schema-to-queried-data-from-Hive-before-saving-it-as-parquet-file-tp19259.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: How to apply schema to queried data from Hive before saving it as parquet file?

2014-11-19 Thread akshayhazari
Thanks for replying. I could not figure out how to load data into a Hive
table after reading it with jsonFile/jsonRDD. I was able to save the
SchemaRDD returned by hiveContext.sql(...) with saveAsParquetFile(path).
However, when I read the data back from the Parquet file as shown below
and saved it to a text file, the output files contained strange values
such as org.apache.spark.sql.api.java.Row@e26c01c7:

JavaSchemaRDD parquetfilerdd = sqlCtx.parquetFile("path/to/parquet/File");
parquetfilerdd.registerTempTable("pq");
JavaSchemaRDD writetxt = sqlCtx.sql("Select * from pq");
// This step created text files filled with values like
// org.apache.spark.sql.api.java.Row@e26c01c7
writetxt.saveAsTextFile("Path/To/Text/Files");

I know there must be a right way to do this, but I have not been able to
figure it out. Could you please help?
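
Values of the form Row@e26c01c7 are what Java's default Object.toString() produces: the class name, an @ sign, and the hash code in hexadecimal. saveAsTextFile writes the string form of each element, so a row type that does not override toString() ends up in the output that way. The plain-Java sketch below reproduces the effect without Spark; FakeRow is a hypothetical stand-in and is not Spark's Row class.

```java
// Plain-Java illustration; FakeRow is a hypothetical stand-in for a row type
// that does not override toString(). It is not Spark's Row class.
final class DefaultToStringDemo {

    static final class FakeRow {
        final int id;
        final String name;
        final int score;

        FakeRow(int id, String name, int score) {
            this.id = id;
            this.name = name;
            this.score = score;
        }
    }

    // What a text writer sees if it only asks each element for its string form.
    static String asWritten(Object element) {
        return String.valueOf(element);
    }

    // Explicit formatting: read each field and build the line by hand.
    static String asLine(FakeRow row) {
        return row.id + " " + row.name + " " + row.score;
    }

    public static void main(String[] args) {
        FakeRow row = new FakeRow(1, "abc", 90);
        System.out.println(asWritten(row)); // class name, '@', then a hex hash code
        System.out.println(asLine(row));    // 1 abc 90
    }
}
```

The fix is therefore to map each row to a string explicitly before writing it out, rather than saving the row objects themselves.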
   



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-apply-schema-to-queried-data-from-Hive-before-saving-it-as-parquet-file-tp19259p19338.html



Re: How to apply schema to queried data from Hive before saving it as parquet file?

2014-11-19 Thread akshayhazari
Sorry about the confusion I created; I only started learning this week.
Silly me, I was actually writing the schema RDD itself to a text file and
expecting records. What I was supposed to do is shown below. Also, if you
could let me know how to add the data from the jsonFile/jsonRDD methods of
HiveContext to Hive tables, it would be appreciated.

JavaRDD<String> result = writetxt.map(new Function<Row, String>() {
    public String call(Row row) {
        // Build one space-separated line per row: id, name, score.
        String temp = "";
        temp += row.getInt(0) + " ";
        temp += row.getString(1) + " ";
        temp += row.getInt(2);
        return temp;
    }
});
result.saveAsTextFile(pqtotxt);
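
The same idea extends to any number of columns by joining the field values with a delimiter. The sketch below is independent of Spark: RowLines and join are hypothetical names, and the Object[] argument stands in for the values that the call method above reads from the row with its typed getters.

```java
// Hypothetical helper, independent of Spark: joins the values of one record
// into a single delimited line. Object[] stands in for the fields of a row.
final class RowLines {

    static String join(Object[] values, String delimiter) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                line.append(delimiter);
            }
            // String.valueOf renders a null field as "null" instead of throwing.
            line.append(String.valueOf(values[i]));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        Object[] record = {1, "abc", 90};
        System.out.println(join(record, " ")); // 1 abc 90
        System.out.println(join(record, ",")); // 1,abc,90
    }
}
```

A StringBuilder avoids creating a new String on every concatenation, which matters little for three columns but adds up for wide tables.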



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-apply-schema-to-queried-data-from-Hive-before-saving-it-as-parquet-file-tp19259p19343.html



Re: How to apply schema to queried data from Hive before saving it as parquet file?

2014-11-19 Thread Daniel Haviv
You can save the results as a Parquet file or as a text file and then
create a Hive table based on those files.

Daniel
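
For the Parquet output, one way to do this is an external table whose location is the output directory. The sketch below only builds the HiveQL statement as a string and executes nothing. It assumes Hive 0.13 or later, where STORED AS PARQUET is available (older releases need explicit SerDe and input/output format classes). The table name songs_parquet and the path /path/to/HiveOP are placeholders, and the column list is copied from the sparkHive1 definition earlier in the thread.

```java
// Sketch only: builds a HiveQL statement as a string; nothing is executed.
// Table name and location are placeholders, not values from the thread.
final class ParquetTableDdl {

    static String createExternalTable(String table, String columns, String location) {
        return "CREATE EXTERNAL TABLE IF NOT EXISTS " + table
                + " (" + columns + ")"
                + " STORED AS PARQUET"   // assumes Hive 0.13 or later
                + " LOCATION '" + location + "'";
    }

    public static void main(String[] args) {
        String ddl = createExternalTable(
                "songs_parquet",
                "id INT, name STRING, score INT",
                "/path/to/HiveOP");
        System.out.println(ddl);
    }
}
```

An external table leaves the files where they are, so dropping the table later does not delete the Parquet data.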
