Why does that matter? So, are you saying Spark SQL cannot create tables in Hive?
If I need to use HiveContext, how should I change my code?

Best,

On Sun, Feb 11, 2018 at 3:09 AM, Deepak Sharma <deepakmc...@gmail.com> wrote:
> I think this is the problem here.
> You created the table using the Spark SQL context and not the Hive SQL context.
>
> Thanks
> Deepak
>
> On Sun, Feb 11, 2018 at 1:36 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Simple question: have you created the table through Spark SQL or Hive?
>>
>> I recall similar issues a while back.
>>
>> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>> //val sqlContext = new HiveContext(sc)
>> println ("\nStarted at"); spark.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss') ").collect.foreach(println)
>> //
>> // Need to create and populate target ORC table sales in database test in Hive
>> //
>> HiveContext.sql("use test")
>> //
>> // Drop and create table
>> //
>> HiveContext.sql("DROP TABLE IF EXISTS test.dummy2")
>> var sqltext = ""
>> sqltext = """
>> CREATE TABLE test.dummy2
>> (
>>   ID INT
>> , CLUSTERED INT
>> , SCATTERED INT
>> , RANDOMISED INT
>> , RANDOM_STRING VARCHAR(50)
>> , SMALL_VC VARCHAR(10)
>> , PADDING VARCHAR(10)
>> )
>> --CLUSTERED BY (ID) INTO 256 BUCKETS
>> STORED AS ORC
>> TBLPROPERTIES ( "orc.compress"="SNAPPY",
>> "orc.create.index"="true",
>> "orc.bloom.filter.columns"="ID",
>> "orc.bloom.filter.fpp"="0.05",
>> "orc.stripe.size"="268435456",
>> "orc.row.index.stride"="10000" )
>> """
>> HiveContext.sql(sqltext)
>> //
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
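For context: in Spark 2.x the old HiveContext is superseded by a SparkSession built with enableHiveSupport(), so the same kind of DDL can be issued without touching HiveContext at all. Below is only a minimal sketch, assuming hive-site.xml is on Spark's classpath so that Spark and the Hive CLI share one metastore; the database, table, and column names are placeholders, not the thread's verified fix.

import org.apache.spark.sql.SparkSession

// Sketch only: in Spark 2.x, a SparkSession with Hive support plays the role HiveContext
// played in 1.x. Assumes hive-site.xml is visible to Spark so that Spark and the Hive CLI
// talk to the same metastore.
val spark = SparkSession.builder()
  .appName("Hive DDL example")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS test")
spark.sql("DROP TABLE IF EXISTS test.dummy2")

// Plain Hive DDL issued through Spark: the result is a regular Hive ORC table.
spark.sql(
  """CREATE TABLE test.dummy2
    |(
    |  ID            INT,
    |  RANDOM_STRING VARCHAR(50)
    |)
    |STORED AS ORC
    |TBLPROPERTIES ("orc.compress" = "SNAPPY")""".stripMargin)

// Populate it, then verify from the Hive CLI with: SELECT * FROM test.dummy2;
spark.sql("INSERT INTO test.dummy2 VALUES (1, 'abc')")

A table created this way is registered as an ordinary Hive ORC table, whereas a bare df.write.saveAsTable defaults to Spark's own Parquet source format, which is what shows up later in the thread as spark.sql.sources.provider=parquet.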
>> >> >> >> On 11 February 2018 at 07:54, ☼ R Nair (रविशंकर नायर) < >> ravishankar.n...@gmail.com> wrote: >> >>> Hi, >>> Here you go: >>> >>> hive> show create table mine; >>> OK >>> CREATE TABLE `mine`( >>> `policyid` int, >>> `statecode` string, >>> `socialid` string, >>> `county` string, >>> `eq_site_limit` decimal(10,2), >>> `hu_site_limit` decimal(10,2), >>> `fl_site_limit` decimal(10,2), >>> `fr_site_limit` decimal(10,2), >>> `tiv_2014` decimal(10,2), >>> `tiv_2015` decimal(10,2), >>> `eq_site_deductible` int, >>> `hu_site_deductible` int, >>> `fl_site_deductible` int, >>> `fr_site_deductible` int, >>> `latitude` decimal(6,6), >>> `longitude` decimal(6,6), >>> `line` string, >>> `construction` string, >>> `point_granularity` int) >>> ROW FORMAT SERDE >>> 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' >>> WITH SERDEPROPERTIES ( >>> 'path'='hdfs://localhost:8020/user/hive/warehouse/mine') >>> STORED AS INPUTFORMAT >>> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' >>> OUTPUTFORMAT >>> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' >>> LOCATION >>> 'file:/Users/ravishankarnair/spark-warehouse/mine' >>> TBLPROPERTIES ( >>> 'spark.sql.sources.provider'='parquet', >>> 'spark.sql.sources.schema.numParts'='1', >>> 'spark.sql.sources.schema.part.0'='{\"type\":\"struct\",\"fi >>> elds\":[{\"name\":\"policyid\",\"type\":\"integer\",\"nullab >>> le\":true,\"metadata\":{\"name\":\"policyid\",\"scale\": >>> 0}},{\"name\":\"statecode\",\"type\":\"string\",\"nullable\" >>> :true,\"metadata\":{\"name\":\"statecode\",\"scale\":0}},{\" >>> name\":\"Socialid\",\"type\":\"string\",\"nullable\":true,\" >>> metadata\":{\"name\":\"Socialid\",\"scale\":0}},{\" >>> name\":\"county\",\"type\":\"string\",\"nullable\":true,\" >>> metadata\":{\"name\":\"county\",\"scale\":0}},{\"name\":\" >>> eq_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\": >>> true,\"metadata\":{\"name\":\"eq_site_limit\",\"scale\":2}}, >>> {\"name\":\"hu_site_limit\",\"type\":\"decimal(10,2)\",\" >>> nullable\":true,\"metadata\":{\"name\":\"hu_site_limit\",\" >>> scale\":2}},{\"name\":\"fl_site_limit\",\"type\":\" >>> decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\":\" >>> fl_site_limit\",\"scale\":2}},{\"name\":\"fr_site_limit\",\" >>> type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{ >>> \"name\":\"fr_site_limit\",\"scale\":2}},{\"name\":\"tiv_ >>> 2014\",\"type\":\"decimal(10,2)\",\"nullable\":true,\" >>> metadata\":{\"name\":\"tiv_2014\",\"scale\":2}},{\"name\" >>> :\"tiv_2015\",\"type\":\"decimal(10,2)\",\"nullable\": >>> true,\"metadata\":{\"name\":\"tiv_2015\",\"scale\":2}},{\" >>> name\":\"eq_site_deductible\",\"type\":\"integer\",\" >>> nullable\":true,\"metadata\":{\"name\":\"eq_site_deductible\ >>> ",\"scale\":0}},{\"name\":\"hu_site_deductible\",\"type\": >>> \"integer\",\"nullable\":true,\"metadata\":{\"name\":\"hu_ >>> site_deductible\",\"scale\":0}},{\"name\":\"fl_site_ >>> deductible\",\"type\":\"integer\",\"nullable\":true,\" >>> metadata\":{\"name\":\"fl_site_deductible\",\"scale\":0} >>> },{\"name\":\"fr_site_deductible\",\"type\":\" >>> integer\",\"nullable\":true,\"metadata\":{\"name\":\"fr_site >>> _deductible\",\"scale\":0}},{\"name\":\"latitude\",\"type\": >>> \"decimal(6,6)\",\"nullable\":true,\"metadata\":{\"name\":\" >>> latitude\",\"scale\":6}},{\"name\":\"longitude\",\"type\":\" >>> decimal(6,6)\",\"nullable\":true,\"metadata\":{\"name\":\"lo >>> ngitude\",\"scale\":6}},{\"name\":\"line\",\"type\":\"string >>> 
\",\"nullable\":true,\"metadata\":{\"name\":\"line\",\" >>> scale\":0}},{\"name\":\"construction\",\"type\":\"string\",\ >>> "nullable\":true,\"metadata\":{\"name\":\"construction\",\" >>> scale\":0}},{\"name\":\"point_granularity\",\"type\":\" >>> integer\",\"nullable\":true,\"metadata\":{\"name\":\"point_ >>> granularity\",\"scale\":0}}]}', >>> 'transient_lastDdlTime'='1518335598') >>> Time taken: 0.13 seconds, Fetched: 35 row(s) >>> >>> On Sun, Feb 11, 2018 at 2:36 AM, Shmuel Blitz < >>> shmuel.bl...@similarweb.com> wrote: >>> >>>> Please run the following command, and paste the result: >>>> SHOW CREATE TABLE <<TABLE-NAME>> >>>> >>>> On Sun, Feb 11, 2018 at 7:56 AM, ☼ R Nair (रविशंकर नायर) < >>>> ravishankar.n...@gmail.com> wrote: >>>> >>>>> No, No luck. >>>>> >>>>> Thanks >>>>> >>>>> On Sun, Feb 11, 2018 at 12:48 AM, Deepak Sharma <deepakmc...@gmail.com >>>>> > wrote: >>>>> >>>>>> In hive cli: >>>>>> msck repair table 《table_name》; >>>>>> >>>>>> Thanks >>>>>> Deepak >>>>>> >>>>>> On Feb 11, 2018 11:14, "☼ R Nair (रविशंकर नायर)" < >>>>>> ravishankar.n...@gmail.com> wrote: >>>>>> >>>>>>> NO, can you pease explain the command ? Let me try now. >>>>>>> >>>>>>> Best, >>>>>>> >>>>>>> On Sun, Feb 11, 2018 at 12:40 AM, Deepak Sharma < >>>>>>> deepakmc...@gmail.com> wrote: >>>>>>> >>>>>>>> I am not sure about the exact issue bjt i see you are partioning >>>>>>>> while writing from spark. >>>>>>>> Did you tried msck repair on the table before reading it in hive ? >>>>>>>> >>>>>>>> Thanks >>>>>>>> Deepak >>>>>>>> >>>>>>>> On Feb 11, 2018 11:06, "☼ R Nair (रविशंकर नायर)" < >>>>>>>> ravishankar.n...@gmail.com> wrote: >>>>>>>> >>>>>>>>> All, >>>>>>>>> >>>>>>>>> Thanks for the inputs. Again I am not successful. I think, we need >>>>>>>>> to resolve this, as this is a very common requirement. >>>>>>>>> >>>>>>>>> Please go through my complete code: >>>>>>>>> >>>>>>>>> STEP 1: Started Spark shell as spark-shell --master yarn >>>>>>>>> >>>>>>>>> STEP 2: Flowing code is being given as inout to shark shell >>>>>>>>> >>>>>>>>> import org.apache.spark.sql.Row >>>>>>>>> import org.apache.spark.sql.SparkSession >>>>>>>>> val warehouseLocation ="/user/hive/warehouse" >>>>>>>>> >>>>>>>>> val spark = SparkSession.builder().appName("Spark Hive >>>>>>>>> Example").config("spark.sql.warehouse.dir", >>>>>>>>> warehouseLocation).enableHiveSupport().getOrCreate() >>>>>>>>> >>>>>>>>> import org.apache.spark.sql._ >>>>>>>>> var passion_df = spark.read. >>>>>>>>> format("jdbc"). >>>>>>>>> option("url", "jdbc:mysql://localhost:3307/policies"). >>>>>>>>> option("driver" ,"com.mysql.jdbc.Driver"). >>>>>>>>> option("user", "root"). >>>>>>>>> option("password", "root"). >>>>>>>>> option("dbtable", "insurancedetails"). >>>>>>>>> option("partitionColumn", "policyid"). >>>>>>>>> option("lowerBound", "1"). >>>>>>>>> option("upperBound", "100000"). >>>>>>>>> option("numPartitions", "4"). >>>>>>>>> load() >>>>>>>>> //Made sure that passion_df is created, as passion_df.show(5) >>>>>>>>> shows me correct data. >>>>>>>>> passion_df.write.saveAsTable("default.mine") //Default parquet >>>>>>>>> >>>>>>>>> STEP 3: Went to HIVE. Started HIVE prompt. >>>>>>>>> >>>>>>>>> hive> show tables; >>>>>>>>> OK >>>>>>>>> callcentervoicelogs >>>>>>>>> mine >>>>>>>>> Time taken: 0.035 seconds, Fetched: 2 row(s) >>>>>>>>> //As you can see HIVE is showing the table "mine" in default >>>>>>>>> schema. >>>>>>>>> >>>>>>>>> STEP 4: HERE IS THE PROBLEM. 
>>>>>>>>> >>>>>>>>> hive> select * from mine; >>>>>>>>> OK >>>>>>>>> Time taken: 0.354 seconds >>>>>>>>> hive> >>>>>>>>> //Where is the data ??? >>>>>>>>> >>>>>>>>> STEP 5: >>>>>>>>> >>>>>>>>> See the below command on HIVE >>>>>>>>> >>>>>>>>> describe formatted mine; >>>>>>>>> OK >>>>>>>>> # col_name data_type comment >>>>>>>>> >>>>>>>>> policyid int >>>>>>>>> statecode string >>>>>>>>> socialid string >>>>>>>>> county string >>>>>>>>> eq_site_limit decimal(10,2) >>>>>>>>> hu_site_limit decimal(10,2) >>>>>>>>> fl_site_limit decimal(10,2) >>>>>>>>> fr_site_limit decimal(10,2) >>>>>>>>> tiv_2014 decimal(10,2) >>>>>>>>> tiv_2015 decimal(10,2) >>>>>>>>> eq_site_deductible int >>>>>>>>> hu_site_deductible int >>>>>>>>> fl_site_deductible int >>>>>>>>> fr_site_deductible int >>>>>>>>> latitude decimal(6,6) >>>>>>>>> longitude decimal(6,6) >>>>>>>>> line string >>>>>>>>> construction string >>>>>>>>> point_granularity int >>>>>>>>> >>>>>>>>> # Detailed Table Information >>>>>>>>> Database: default >>>>>>>>> Owner: ravishankarnair >>>>>>>>> CreateTime: Sun Feb 11 00:26:40 EST 2018 >>>>>>>>> LastAccessTime: UNKNOWN >>>>>>>>> Protect Mode: None >>>>>>>>> Retention: 0 >>>>>>>>> Location: file:/Users/ravishankarnair/sp >>>>>>>>> ark-warehouse/mine >>>>>>>>> Table Type: MANAGED_TABLE >>>>>>>>> Table Parameters: >>>>>>>>> spark.sql.sources.provider parquet >>>>>>>>> spark.sql.sources.schema.numParts 1 >>>>>>>>> spark.sql.sources.schema.part.0 {\"type\":\"struct\",\"fields\ >>>>>>>>> ":[{\"name\":\"policyid\",\"type\":\"integer\",\"nullable\": >>>>>>>>> true,\"metadata\":{\"name\":\"policyid\",\"scale\":0}},{\"na >>>>>>>>> me\":\"statecode\",\"type\":\"string\",\"nullable\":true,\"m >>>>>>>>> etadata\":{\"name\":\"statecode\",\"scale\":0}},{\"name\":\" >>>>>>>>> Socialid\",\"type\":\"string\",\"nullable\":true,\"metadata\ >>>>>>>>> ":{\"name\":\"Socialid\",\"scale\":0}},{\"name\":\"county\", >>>>>>>>> \"type\":\"string\",\"nullable\":true,\"metadata\":{\"name\" >>>>>>>>> :\"county\",\"scale\":0}},{\"name\":\"eq_site_limit\",\"type >>>>>>>>> \":\"decimal(10,2)\",\"nullable\":true,\"metadata\":{\"name\ >>>>>>>>> ":\"eq_site_limit\",\"scale\":2}},{\"name\":\"hu_site_limit\ >>>>>>>>> ",\"type\":\"decimal(10,2)\",\"nullable\":true,\"metadata\": >>>>>>>>> {\"name\":\"hu_site_limit\",\"scale\":2}},{\"name\":\"fl_sit >>>>>>>>> e_limit\",\"type\":\"decimal(10,2)\",\"nullable\":true,\"met >>>>>>>>> adata\":{\"name\":\"fl_site_limit\",\"scale\":2}},{\"name\": >>>>>>>>> \"fr_site_limit\",\"type\":\"decimal(10,2)\",\"nullable\":tr >>>>>>>>> ue,\"metadata\":{\"name\":\"fr_site_limit\",\"scale\":2}},{\ >>>>>>>>> "name\":\"tiv_2014\",\"type\":\"decimal(10,2)\",\"nullable\" >>>>>>>>> :true,\"metadata\":{\"name\":\"tiv_2014\",\"scale\":2}},{\"n >>>>>>>>> ame\":\"tiv_2015\",\"type\":\"decimal(10,2)\",\"nullable\":t >>>>>>>>> rue,\"metadata\":{\"name\":\"tiv_2015\",\"scale\":2}},{\"nam >>>>>>>>> e\":\"eq_site_deductible\",\"type\":\"integer\",\"nullable\ >>>>>>>>> ":true,\"metadata\":{\"name\":\"eq_site_deductible\",\" >>>>>>>>> scale\":0}},{\"name\":\"hu_site_deductible\",\"type\":\" >>>>>>>>> integer\",\"nullable\":true,\"metadata\":{\"name\":\"hu_site >>>>>>>>> _deductible\",\"scale\":0}},{\"name\":\"fl_site_deductible\" >>>>>>>>> ,\"type\":\"integer\",\"nullable\":true,\"metadata\":{ >>>>>>>>> \"name\":\"fl_site_deductible\",\"scale\":0}},{\"name\":\" >>>>>>>>> fr_site_deductible\",\"type\":\"integer\",\"nullable\":true, >>>>>>>>> \"metadata\":{\"name\":\"fr_site_deductible\",\"scale\":0} >>>>>>>>> 
},{\"name\":\"latitude\",\"type\":\"decimal(6,6)\",\" >>>>>>>>> nullable\":true,\"metadata\":{\"name\":\"latitude\",\"scale\ >>>>>>>>> ":6}},{\"name\":\"longitude\",\"type\":\"decimal(6,6)\",\" >>>>>>>>> nullable\":true,\"metadata\":{\"name\":\"longitude\",\" >>>>>>>>> scale\":6}},{\"name\":\"line\",\"type\":\"string\",\"nullabl >>>>>>>>> e\":true,\"metadata\":{\"name\":\"line\",\"scale\":0}},{\" >>>>>>>>> name\":\"construction\",\"type\":\"string\",\"nullable\" >>>>>>>>> :true,\"metadata\":{\"name\":\"construction\",\"scale\":0}}, >>>>>>>>> {\"name\":\"point_granularity\",\"type\":\"integer\",\"nulla >>>>>>>>> ble\":true,\"metadata\":{\"name\":\"point_granularity\",\ >>>>>>>>> "scale\":0}}]} >>>>>>>>> transient_lastDdlTime 1518326800 >>>>>>>>> >>>>>>>>> # Storage Information >>>>>>>>> SerDe Library: org.apache.hadoop.hive.ql.io.p >>>>>>>>> arquet.serde.ParquetHiveSerDe >>>>>>>>> InputFormat: org.apache.hadoop.hive.ql.io.p >>>>>>>>> arquet.MapredParquetInputFormat >>>>>>>>> OutputFormat: org.apache.hadoop.hive.ql.io.p >>>>>>>>> arquet.MapredParquetOutputFormat >>>>>>>>> Compressed: No >>>>>>>>> Num Buckets: -1 >>>>>>>>> Bucket Columns: [] >>>>>>>>> Sort Columns: [] >>>>>>>>> Storage Desc Params: >>>>>>>>> path hdfs://localhost:8020/user/hiv >>>>>>>>> e/warehouse/mine >>>>>>>>> serialization.format 1 >>>>>>>>> Time taken: 0.077 seconds, Fetched: 48 row(s) >>>>>>>>> >>>>>>>>> Now, I see your advise and support. Whats the issue? Am I doing >>>>>>>>> wrong, it it a bug ? I am using Spark 2.2.1, HIVE 1.2.1, HADOOP >>>>>>>>> 2.7.3. All >>>>>>>>> class path, configuration are set properly. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> >>>>>>>>> Ravion >>>>>>>>> >>>>>>>>> On Fri, Feb 9, 2018 at 1:29 PM, Nicholas Hakobian < >>>>>>>>> nicholas.hakob...@rallyhealth.com> wrote: >>>>>>>>> >>>>>>>>>> Its possible that the format of your table is not compatible with >>>>>>>>>> your version of hive, so Spark saved it in a way such that only >>>>>>>>>> Spark can >>>>>>>>>> read it. When this happens it prints out a very visible warning >>>>>>>>>> letting you >>>>>>>>>> know this has happened. >>>>>>>>>> >>>>>>>>>> We've seen it most frequently when trying to save a parquet file >>>>>>>>>> with a column in date format into a Hive table. In older versions of >>>>>>>>>> hive, >>>>>>>>>> its parquet reader/writer did not support Date formats (among a >>>>>>>>>> couple >>>>>>>>>> others). >>>>>>>>>> >>>>>>>>>> Nicholas Szandor Hakobian, Ph.D. >>>>>>>>>> Staff Data Scientist >>>>>>>>>> Rally Health >>>>>>>>>> nicholas.hakob...@rallyhealth.com >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Fri, Feb 9, 2018 at 9:59 AM, Prakash Joshi < >>>>>>>>>> prakashcjos...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Ravi, >>>>>>>>>>> >>>>>>>>>>> Can you send the result of >>>>>>>>>>> Show create table your_table_name >>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> Prakash >>>>>>>>>>> >>>>>>>>>>> On Feb 9, 2018 8:20 PM, "☼ R Nair (रविशंकर नायर)" < >>>>>>>>>>> ravishankar.n...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> All, >>>>>>>>>>>> >>>>>>>>>>>> It has been three days continuously I am on this issue. Not >>>>>>>>>>>> getting any clue. >>>>>>>>>>>> >>>>>>>>>>>> Environment: Spark 2.2.x, all configurations are correct. >>>>>>>>>>>> hive-site.xml is in spark's conf. >>>>>>>>>>>> >>>>>>>>>>>> 1) Step 1: I created a data frame DF1 reading a csv file. >>>>>>>>>>>> >>>>>>>>>>>> 2) Did manipulations on DF1. Resulting frame is passion_df. 
>>>>>>>>>>>>
>>>>>>>>>>>> 3) passion_df.write.format("orc").saveAsTable("sampledb.passion")
>>>>>>>>>>>>
>>>>>>>>>>>> 4) The metastore shows the Hive table; when I do "show tables" in Hive, I can see the table name.
>>>>>>>>>>>>
>>>>>>>>>>>> 5) I can't select it in Hive, though I can select it from Spark with spark.sql("select * from sampledb.passion").
>>>>>>>>>>>>
>>>>>>>>>>>> What's going on here? Please help. Why am I not seeing data from the Hive prompt?
>>>>>>>>>>>> The "describe formatted" command on the table in Hive shows the data is in the default warehouse location (/user/hive/warehouse), since I set it.
>>>>>>>>>>>>
>>>>>>>>>>>> I am not getting any definite answer anywhere. Many suggestions and answers are given on Stack Overflow et al. Nothing really works.
>>>>>>>>>>>>
>>>>>>>>>>>> So asking the experts here for some light on this, thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Ravion
>>>>
>>>> --
>>>> Shmuel Blitz
>>>> Big Data Developer
>>>> Email: shmuel.bl...@similarweb.com
>>>> www.similarweb.com
>
> --
> Thanks
> Deepak
> www.bigdatabig.com
> www.keosha.net
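One plausible reading of the DESCRIBE FORMATTED output quoted above: the table's Location is a local directory (file:/Users/ravishankarnair/spark-warehouse/mine) while its path storage property points at hdfs://localhost:8020/user/hive/warehouse/mine, so the Hive CLI may simply be scanning an empty local directory while Spark follows the HDFS path. A related gotcha is that inside spark-shell a SparkSession already exists, so SparkSession.builder().config("spark.sql.warehouse.dir", ...).getOrCreate() may just reuse that session, in which case the warehouse setting does not take effect. What follows is only a rough, unverified sketch of one way around both problems; the table name, database, and column subset are placeholders, and hive-site.xml is assumed to be on the Spark classpath.

// Start the shell with the warehouse directory set up front, for example:
//   spark-shell --master yarn --conf spark.sql.warehouse.dir=hdfs://localhost:8020/user/hive/warehouse

// Create the target through plain Hive DDL so the metastore entry is an ordinary Hive table
// (column list shortened here; the real table would carry all nineteen columns).
spark.sql("""
  CREATE TABLE IF NOT EXISTS default.mine_hive (
    policyid  INT,
    statecode STRING,
    tiv_2014  DECIMAL(10,2)
  )
  STORED AS PARQUET
""")

// insertInto matches columns by position, so select them in the target table's order.
passion_df
  .select("policyid", "statecode", "tiv_2014")
  .write
  .insertInto("default.mine_hive")

// Back in the Hive CLI:  SELECT * FROM default.mine_hive LIMIT 5;

Because the table is created with Hive DDL rather than by saveAsTable, its files land under the metastore's own warehouse location and the Hive CLI reads the same path that Spark wrote to.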