I am not sure about the exact issue, but I see you are partitioning while writing from Spark. Did you try an MSCK REPAIR on the table before reading it in Hive?
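For reference, this is the repair command being suggested, run from the Hive prompt. A sketch only: it helps when partition directories exist on disk that the metastore does not know about, and note that a JDBC `partitionColumn` only parallelizes the Spark read, it does not create Hive partitions.

```sql
-- Re-register any partition directories that Hive's metastore
-- has not yet recorded. A no-op on unpartitioned tables.
MSCK REPAIR TABLE default.mine;
```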
Thanks Deepak

On Feb 11, 2018 11:06, "☼ R Nair (रविशंकर नायर)" <ravishankar.n...@gmail.com> wrote:

> All,
>
> Thanks for the inputs. Again I am not successful. I think we need to
> resolve this, as this is a very common requirement.
>
> Please go through my complete code:
>
> STEP 1: Started the Spark shell as spark-shell --master yarn
>
> STEP 2: The following code is given as input to the Spark shell:
>
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.SparkSession
>
> val warehouseLocation = "/user/hive/warehouse"
>
> val spark = SparkSession.builder().appName("Spark Hive Example").
>   config("spark.sql.warehouse.dir", warehouseLocation).
>   enableHiveSupport().getOrCreate()
>
> import org.apache.spark.sql._
>
> var passion_df = spark.read.
>   format("jdbc").
>   option("url", "jdbc:mysql://localhost:3307/policies").
>   option("driver", "com.mysql.jdbc.Driver").
>   option("user", "root").
>   option("password", "root").
>   option("dbtable", "insurancedetails").
>   option("partitionColumn", "policyid").
>   option("lowerBound", "1").
>   option("upperBound", "100000").
>   option("numPartitions", "4").
>   load()
>
> // Made sure that passion_df is created, as passion_df.show(5)
> // shows me correct data.
> passion_df.write.saveAsTable("default.mine") // Default parquet
>
> STEP 3: Went to Hive. Started the Hive prompt.
>
> hive> show tables;
> OK
> callcentervoicelogs
> mine
> Time taken: 0.035 seconds, Fetched: 2 row(s)
> // As you can see, Hive is showing the table "mine" in the default schema.
>
> STEP 4: HERE IS THE PROBLEM.
>
> hive> select * from mine;
> OK
> Time taken: 0.354 seconds
> hive>
> // Where is the data???
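One way to take the write location out of the equation is to pass an explicit path when saving, so Spark and Hive agree on where the files live. A sketch only; the HDFS URL below is an assumption based on the warehouse path mentioned elsewhere in this thread, and `option("path", ...)` makes Spark create an external table at that location.

```scala
// Save with an explicit, fully qualified storage path so the table's
// data directory is unambiguous to both Spark and Hive. The HDFS URL
// is an assumption; substitute the real warehouse location.
passion_df.write.
  mode("overwrite").
  option("path", "hdfs://localhost:8020/user/hive/warehouse/mine").
  saveAsTable("default.mine")
```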
> STEP 5: See the below command on Hive:
>
> describe formatted mine;
> OK
> # col_name              data_type       comment
>
> policyid                int
> statecode               string
> socialid                string
> county                  string
> eq_site_limit           decimal(10,2)
> hu_site_limit           decimal(10,2)
> fl_site_limit           decimal(10,2)
> fr_site_limit           decimal(10,2)
> tiv_2014                decimal(10,2)
> tiv_2015                decimal(10,2)
> eq_site_deductible      int
> hu_site_deductible      int
> fl_site_deductible      int
> fr_site_deductible      int
> latitude                decimal(6,6)
> longitude               decimal(6,6)
> line                    string
> construction            string
> point_granularity       int
>
> # Detailed Table Information
> Database:               default
> Owner:                  ravishankarnair
> CreateTime:             Sun Feb 11 00:26:40 EST 2018
> LastAccessTime:         UNKNOWN
> Protect Mode:           None
> Retention:              0
> Location:               file:/Users/ravishankarnair/spark-warehouse/mine
> Table Type:             MANAGED_TABLE
> Table Parameters:
>   spark.sql.sources.provider          parquet
>   spark.sql.sources.schema.numParts   1
>   spark.sql.sources.schema.part.0
>     {\"type\":\"struct\",\"fields\":[
>       {\"name\":\"policyid\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"policyid\",\"scale\":0}},
>       {\"name\":\"statecode\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":\"statecode\",\"scale\":0}},
>       {\"name\":\"Socialid\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":\"Socialid\",\"scale\":0}},
>       {\"name\":\"county\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":\"county\",\"scale\":0}},
>       {\"name\":\"eq_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"eq_site_limit\",\"scale\":2}},
>       {\"name\":\"hu_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"hu_site_limit\",\"scale\":2}},
>       {\"name\":\"fl_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"fl_site_limit\",\"scale\":2}},
>       {\"name\":\"fr_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"fr_site_limit\",\"scale\":2}},
>       {\"name\":\"tiv_2014\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"tiv_2014\",\"scale\":2}},
>       {\"name\":\"tiv_2015\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\"tiv_2015\",\"scale\":2}},
>       {\"name\":\"eq_site_deductible\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"eq_site_deductible\",\"scale\":0}},
>       {\"name\":\"hu_site_deductible\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"hu_site_deductible\",\"scale\":0}},
>       {\"name\":\"fl_site_deductible\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"fl_site_deductible\",\"scale\":0}},
>       {\"name\":\"fr_site_deductible\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"fr_site_deductible\",\"scale\":0}},
>       {\"name\":\"latitude\",\"type\":\"decimal(6,6)\",\"nullable\":true,\"metadata\":{\"name\":\"latitude\",\"scale\":6}},
>       {\"name\":\"longitude\",\"type\":\"decimal(6,6)\",\"nullable\":true,\"metadata\":{\"name\":\"longitude\",\"scale\":6}},
>       {\"name\":\"line\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":\"line\",\"scale\":0}},
>       {\"name\":\"construction\",\"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\":\"construction\",\"scale\":0}},
>       {\"name\":\"point_granularity\",\"type\":\"integer\",\"nullable\":true,\"metadata\":{\"name\":\"point_granularity\",\"scale\":0}}]}
>   transient_lastDdlTime               1518326800
>
> # Storage Information
> SerDe Library:          org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
> InputFormat:            org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
> OutputFormat:           org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
> Compressed:             No
> Num Buckets:            -1
> Bucket Columns:         []
> Sort Columns:           []
> Storage Desc Params:
>   path                    hdfs://localhost:8020/user/hive/warehouse/mine
>   serialization.format    1
> Time taken: 0.077 seconds, Fetched: 48 row(s)
>
> Now I seek your advice and support. What's the issue? Am I doing
> something wrong, or is it a bug? I am using Spark 2.2.1, Hive 1.2.1,
> Hadoop 2.7.3. All classpaths and configuration are set properly.
>
> Best,
>
> Ravion
>
> On Fri, Feb 9, 2018 at 1:29 PM, Nicholas Hakobian
> <nicholas.hakobian@rallyhealth.com> wrote:
>
>> It's possible that the format of your table is not compatible with your
>> version of Hive, so Spark saved it in a way such that only Spark can
>> read it. When this happens, it prints out a very visible warning letting
>> you know this has happened.
>>
>> We've seen it most frequently when trying to save a parquet file with a
>> column in date format into a Hive table. In older versions of Hive, its
>> parquet reader/writer did not support Date formats (among a couple of
>> others).
>>
>> Nicholas Szandor Hakobian, Ph.D.
>> Staff Data Scientist
>> Rally Health
>> nicholas.hakob...@rallyhealth.com
>>
>> On Fri, Feb 9, 2018 at 9:59 AM, Prakash Joshi <prakashcjos...@gmail.com> wrote:
>>
>>> Ravi,
>>>
>>> Can you send the result of:
>>> show create table your_table_name
>>>
>>> Thanks
>>> Prakash
>>>
>>> On Feb 9, 2018 8:20 PM, "☼ R Nair (रविशंकर नायर)"
>>> <ravishankar.n...@gmail.com> wrote:
>>>
>>>> All,
>>>>
>>>> I have been on this issue for three days continuously now and am not
>>>> getting any clue.
>>>>
>>>> Environment: Spark 2.2.x, all configurations are correct. hive-site.xml
>>>> is in Spark's conf.
>>>>
>>>> 1) Step 1: I created a data frame DF1 reading a CSV file.
>>>>
>>>> 2) Did manipulations on DF1. The resulting frame is passion_df.
>>>>
>>>> 3) passion_df.write.format("orc").saveAsTable("sampledb.passion")
>>>>
>>>> 4) The metastore shows the Hive table; when I do "show tables" in
>>>> Hive, I can see the table name.
>>>>
>>>> 5) I can't select it in Hive, though I can select it from Spark as
>>>> spark.sql("select * from sampledb.passion").
>>>>
>>>> What's going on here? Please help. Why am I not seeing the data from
>>>> the Hive prompt? The "describe formatted" command on the table in Hive
>>>> shows the data is in the default warehouse location
>>>> (/user/hive/warehouse), since I set it.
>>>>
>>>> I am not getting any definite answer anywhere. Many suggestions and
>>>> answers are given on Stack Overflow et al. Nothing really works.
>>>>
>>>> So asking the experts here for some light on this, thanks.
>>>>
>>>> Best,
>>>> Ravion