Hi Ravi, can you please post the entire code?
Regards,
Gourav

On Fri, Feb 9, 2018 at 3:39 PM, Patrick Alwell <palw...@hortonworks.com> wrote:

> Might sound silly, but are you using a Hive context?
>
> What errors do the Hive query results return?
>
>     spark = SparkSession.builder.enableHiveSupport().getOrCreate()
>
> On the second part of your question: you are creating a temp view and then
> creating another table from that temp view. It doesn't seem like you are
> reading the table from the Spark or Hive warehouse.
>
> This works fine for me, albeit I was using Spark Thrift to communicate
> with my directory of choice.
>
>     from pyspark import SparkContext
>     from pyspark.sql import SparkSession, Row, types
>     from pyspark.sql.types import *
>     from pyspark.sql import functions as f
>     from decimal import *
>     from datetime import datetime
>
>     # instantiate our SparkSession and context
>     spark = SparkSession.builder.enableHiveSupport().getOrCreate()
>     sc = spark.sparkContext
>
>     # Generating customer ORC table files
>     # load raw data as an RDD
>     customer_data = sc.textFile("/data/tpch/customer.tbl")
>
>     # split each line on the pipe delimiter
>     customer_split = customer_data.map(lambda l: l.split("|"))
>
>     # map the split data with a Row; this is where we specify column
>     # names and types (the default type is string, UTF-8)
>     # there are issues with converting string to date, and these issues
>     # have been addressed in the tables with dates: see notes below
>     customer_row = customer_split.map(lambda r: Row(
>         custkey=int(r[0]),
>         name=r[1],
>         address=r[2],
>         nationkey=int(r[3]),
>         phone=r[4],
>         acctbal=Decimal(r[5]),
>         mktsegment=r[6],
>         comment=r[7]
>     ))
>
>     # we can let Spark infer the schema, or apply a strict schema and
>     # state whether or not we allow null values
>     # in this case we don't want null values for keys, and we want
>     # explicit data types to support the TPC-H tables / data model
>     customer_schema = types.StructType([
>         types.StructField('custkey', types.LongType(), False)
>         ,types.StructField('name', types.StringType())
>         ,types.StructField('address', types.StringType())
>         ,types.StructField('nationkey', types.LongType(), False)
>         ,types.StructField('phone', types.StringType())
>         ,types.StructField('acctbal', types.DecimalType())
>         ,types.StructField('mktsegment', types.StringType())
>         ,types.StructField('comment', types.StringType())])
>
>     # create a dataframe by calling createDataFrame on the SparkSession;
>     # this method takes two arguments by default (row, schema)
>     customer_df = spark.createDataFrame(customer_row, customer_schema)
>
>     # write a file of type ORC from the dataframe we created
>     customer_df.write.orc("/data/tpch/customer.orc")
>
>     # read that same file back, but into a separate dataframe
>     customer_df_orc = spark.read.orc("/data/tpch/customer.orc")
>
>     # create a temp view from the new dataframe for QA purposes
>     customer_df_orc.createOrReplaceTempView("customer")
>
>     # issue SQL statements against the view through spark.sql
>     spark.sql("SELECT * FROM customer LIMIT 10").show()
>
> From: "☼ R Nair (रविशंकर नायर)" <ravishankar.n...@gmail.com>
> Date: Friday, February 9, 2018 at 7:03 AM
> To: user@spark.apache.org
> Subject: Re: Spark Dataframe and HIVE
>
> An update (sorry, I missed this):
>
> When I do
>
>     passion_df.createOrReplaceTempView("sampleview")
>     spark.sql("create table sampletable as select * from sampleview")
>
> now I can see the table and can query it as well.
So why does this work from Spark, while the other method discussed below does not?

Thanks

On Fri, Feb 9, 2018 at 9:49 AM, ☼ R Nair (रविशंकर नायर) <ravishankar.n...@gmail.com> wrote:

> All,
>
> I have been on this issue for three days continuously and am not getting
> any clue.
>
> Environment: Spark 2.2.x, all configurations are correct. hive-site.xml is
> in Spark's conf.
>
> 1) I created a data frame DF1 by reading a CSV file.
>
> 2) Did manipulations on DF1. The resulting frame is passion_df.
>
> 3) passion_df.write.format("orc").saveAsTable("sampledb.passion")
>
> 4) The metastore shows the Hive table; when I do "show tables" in Hive, I
> can see the table name.
>
> 5) I can't select from it in Hive, though I can select from Spark as
> spark.sql("select * from sampledb.passion").
>
> What's going on here? Please help. Why am I not seeing the data from the
> Hive prompt? The "describe formatted" command on the table in Hive shows
> the data is in the default warehouse location (/user/hive/warehouse),
> since I set it.
>
> I am not getting a definite answer anywhere. Many suggestions and answers
> are given on Stack Overflow et al., but nothing really works.
>
> So I am asking the experts here for some light on this, thanks.
>
> Best,
> Ravion