Hi Ajay,

Looking at the Spark code, I can see you used a HiveContext. Can you try
using a SQLContext instead of a HiveContext there?

Thanks,
Deepak
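For what it's worth, a minimal sketch of that suggestion against the Spark
1.x API. The driver class and table name come from this thread; the
connection URL, host, port, and credentials below are placeholders, not
values anyone posted:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SasViaSqlContext {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SasViaSqlContext"))
    // Plain SQLContext: no Hive metastore involved, unlike HiveContext.
    val sqlContext = new SQLContext(sc)

    // URL and credentials are placeholders; "com.sas.rio.MVADriver" and
    // sasLib.run_control are the driver and table discussed in the thread.
    val df = sqlContext.read.format("jdbc").options(Map(
      "url"      -> "jdbc:sasiom://sashost:8591",
      "driver"   -> "com.sas.rio.MVADriver",
      "dbtable"  -> "sasLib.run_control",
      "user"     -> "user",
      "password" -> "password")).load()

    df.show()
  }
}

Using a plain SQLContext keeps Hive out of the picture, which at least
narrows the problem down to the JDBC read path itself.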
On Mon, Jun 13, 2016 at 10:15 PM, Ajay Chander <itsche...@gmail.com> wrote:

> Hi Mohit,
>
> Thanks for your time. Please find my responses below.
>
> Did you try the same with another database?
> I do load data from MySQL and SQL Server the same way (through Spark SQL
> JDBC), and that works perfectly well.
>
> As a workaround you can write the select statement yourself instead of
> just providing the table name?
> Yes, I did that too. It did not make any difference.
>
> Thank you,
> Ajay
>
> On Sunday, June 12, 2016, Mohit Jaggi <mohitja...@gmail.com> wrote:
>
>> Looks like a bug in the code generating the SQL query. Why it would be
>> specific to SAS, I can't guess. Did you try the same with another
>> database? As a workaround you can write the select statement yourself
>> instead of just providing the table name.
>>
>> On Jun 11, 2016, at 6:27 PM, Ajay Chander <itsche...@gmail.com> wrote:
>>
>> I tried implementing the same functionality through Scala as well, but
>> no luck so far. Just wondering if anyone here has tried using Spark SQL
>> to read a SAS dataset? Thank you.
>>
>> Regards,
>> Ajay
>>
>> On Friday, June 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:
>>
>>> Mich, I completely agree with you. I built another Spark SQL
>>> application which reads data from MySQL and SQL Server and writes it
>>> into Hive (Parquet + Snappy format). I have this problem only when I
>>> read directly from the remote SAS system. The interesting part is that
>>> I am using the same driver to read data through a pure Java app and
>>> the Spark app. It works fine in the Java app, so I cannot blame the
>>> SAS driver here. Trying to understand where the problem could be.
>>> Thanks for sharing this with me.
>>>
>>> On Friday, June 10, 2016, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>>> I personally use Scala to do something similar. For example, here I
>>>> extract data from an Oracle table and store it in an ORC table in
>>>> Hive. This is compiled via sbt and run with spark-submit.
>>>>
>>>> It is similar to your code but in Scala. Note that I do not enclose
>>>> my column names in double quotes.
>>>>
>>>> import org.apache.spark.SparkContext
>>>> import org.apache.spark.SparkConf
>>>> import org.apache.spark.sql.Row
>>>> import org.apache.spark.sql.hive.HiveContext
>>>> import org.apache.spark.sql.types._
>>>> import org.apache.spark.sql.SQLContext
>>>> import org.apache.spark.sql.functions._
>>>>
>>>> object ETL_scratchpad_dummy {
>>>>   def main(args: Array[String]) {
>>>>     val conf = new SparkConf().
>>>>       setAppName("ETL_scratchpad_dummy").
>>>>       set("spark.driver.allowMultipleContexts", "true")
>>>>     val sc = new SparkContext(conf)
>>>>     // Create sqlContext based on HiveContext
>>>>     val sqlContext = new HiveContext(sc)
>>>>     import sqlContext.implicits._
>>>>     val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>>     println("\nStarted at")
>>>>     sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
>>>>     HiveContext.sql("use oraclehadoop")
>>>>     var _ORACLEserver: String = "jdbc:oracle:thin:@rhes564:1521:mydb12"
>>>>     var _username: String = "scratchpad"
>>>>     var _password: String = "xxxxxxxx"
>>>>
>>>>     // Get data from Oracle table scratchpad.dummy
>>>>     val d = HiveContext.load("jdbc",
>>>>       Map("url" -> _ORACLEserver,
>>>>         "dbtable" -> "(SELECT to_char(ID) AS ID, to_char(CLUSTERED) AS CLUSTERED, to_char(SCATTERED) AS SCATTERED, to_char(RANDOMISED) AS RANDOMISED, RANDOM_STRING, SMALL_VC, PADDING FROM scratchpad.dummy)",
>>>>         "user" -> _username,
>>>>         "password" -> _password))
>>>>
>>>>     d.registerTempTable("tmp")
>>>>     //
>>>>     // Need to create and populate target ORC table oraclehadoop.dummy
>>>>     //
>>>>     HiveContext.sql("use oraclehadoop")
>>>>     //
>>>>     // Drop and create table dummy
>>>>     //
>>>>     HiveContext.sql("DROP TABLE IF EXISTS oraclehadoop.dummy")
>>>>     var sqltext: String = ""
>>>>     sqltext = """
>>>>     CREATE TABLE oraclehadoop.dummy (
>>>>         ID INT
>>>>       , CLUSTERED INT
>>>>       , SCATTERED INT
>>>>       , RANDOMISED INT
>>>>       , RANDOM_STRING VARCHAR(50)
>>>>       , SMALL_VC VARCHAR(10)
>>>>       , PADDING VARCHAR(10)
>>>>     )
>>>>     CLUSTERED BY (ID) INTO 256 BUCKETS
>>>>     STORED AS ORC
>>>>     TBLPROPERTIES (
>>>>       "orc.create.index"="true",
>>>>       "orc.bloom.filter.columns"="ID",
>>>>       "orc.bloom.filter.fpp"="0.05",
>>>>       "orc.compress"="SNAPPY",
>>>>       "orc.stripe.size"="16777216",
>>>>       "orc.row.index.stride"="10000" )
>>>>     """
>>>>     HiveContext.sql(sqltext)
>>>>     //
>>>>     // Put data in Hive table. Clean up is already done
>>>>     //
>>>>     sqltext = """
>>>>     INSERT INTO TABLE oraclehadoop.dummy
>>>>     SELECT
>>>>         ID
>>>>       , CLUSTERED
>>>>       , SCATTERED
>>>>       , RANDOMISED
>>>>       , RANDOM_STRING
>>>>       , SMALL_VC
>>>>       , PADDING
>>>>     FROM tmp
>>>>     """
>>>>     HiveContext.sql(sqltext)
>>>>     println("\nFinished at")
>>>>     sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
>>>>     sys.exit()
>>>>   }
>>>> }
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>
>>>> http://talebzadehmich.wordpress.com
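A note on where those double quotes come from: Spark's JDBC data source
builds the SELECT list itself and passes every column name through a
JdbcDialect, whose default quoteIdentifier wraps names in double quotes.
Most databases treat "X" as a quoted identifier, but SAS's SQL processor
reads it as a character literal, which matches the symptom in this thread.
A minimal, untested sketch of one possible fix, assuming Spark 1.4+; the
jdbc:sasiom URL prefix is a guess and should be matched to the actual
connection URL:

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Hypothetical dialect for the SAS driver: stop quoting identifiers so the
// generated query matches what the pure Java program sends.
object SasDialect extends JdbcDialect {
  // Claim connections whose URL looks like a SAS one (prefix is a guess).
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:sasiom")

  // The default dialect double-quotes identifiers; return the bare name.
  override def quoteIdentifier(colName: String): String = colName
}

object SasDialectDemo {
  def main(args: Array[String]): Unit = {
    // Register before creating any DataFrame over the SAS source, so the
    // JDBC relation uses it when it builds its SELECT statement.
    JdbcDialects.registerDialect(SasDialect)
    // ... then create the SQLContext and read sasLib.run_control as usual.
  }
}

With the dialect registered, the generated statement should come out as
SELECT SR_NO,start_dt,end_dt FROM sasLib.run_control, which is the form the
driver accepted from the pure Java program.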
>>>>
>>>> On 10 June 2016 at 23:38, Ajay Chander <itsche...@gmail.com> wrote:
>>>>
>>>>> Hi Mich,
>>>>>
>>>>> Thanks for the response. If you look at my programs, I am not writing
>>>>> my queries with column names in a pair of "". The driver in my Spark
>>>>> program is generating such a query, with column names in "", which I
>>>>> do not want. On the other hand, I am using the same driver in my pure
>>>>> Java program (attached), and there it generates a proper SQL query
>>>>> without "".
>>>>>
>>>>> Pure Java log:
>>>>>
>>>>> [2016-06-10 10:35:21,584] INFO stmt(1.1)#executeQuery SELECT
>>>>> a.sr_no,a.start_dt,a.end_dt FROM sasLib.run_control a; created result
>>>>> set 1.1.1; time= 0.122 secs (com.sas.rio.MVAStatement:590)
>>>>>
>>>>> Spark SQL log:
>>>>>
>>>>> [2016-06-10 10:29:05,834] INFO conn(2)#prepareStatement sql=SELECT
>>>>> "SR_NO","start_dt","end_dt" FROM sasLib.run_control ; prepared
>>>>> statement 2.1; time= 0.038 secs (com.sas.rio.MVAConnection:538)
>>>>>
>>>>> [2016-06-10 10:29:05,935] INFO ps(2.1)#executeQuery SELECT
>>>>> "SR_NO","start_dt","end_dt" FROM sasLib.run_control ; created result
>>>>> set 2.1.1; time= 0.102 secs (com.sas.rio.MVAStatement:590)
>>>>>
>>>>> Please find the complete program and full logs attached in the thread
>>>>> below. Thank you.
>>>>>
>>>>> Regards,
>>>>> Ajay
>>>>>
>>>>> On Friday, June 10, 2016, Mich Talebzadeh <mich.talebza...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Assuming I understood your query: in Spark SQL (that is, when you
>>>>>> log in to spark-sql like spark-sql --master spark://<HOST_NAME>:7077),
>>>>>> you do not need double quotes around column names for the SQL to
>>>>>> work.
>>>>>>
>>>>>> spark-sql> select "hello from Mich" from oraclehadoop.sales limit 1;
>>>>>> hello from Mich
>>>>>>
>>>>>> Anything between a pair of "" will be interpreted as text, NOT a
>>>>>> column name.
>>>>>>
>>>>>> In Spark SQL you do not need double quotes. So simply:
>>>>>>
>>>>>> spark-sql> select prod_id, cust_id from sales limit 2;
>>>>>> 17      28017
>>>>>> 18      10419
>>>>>>
>>>>>> HTH
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>> On 10 June 2016 at 21:54, Ajay Chander <itsche...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi again, has anyone in this group tried to access a SAS dataset
>>>>>>> through Spark SQL? Thank you.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ajay
>>>>>>>
>>>>>>> On Friday, June 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Spark Users,
>>>>>>>>
>>>>>>>> I hope everyone here is doing great.
>>>>>>>>
>>>>>>>> I am trying to read data from SAS through Spark SQL and write it
>>>>>>>> into HDFS. Initially, I started with a pure Java program; please
>>>>>>>> find the program and logs in the attached file sas_pure_java.txt.
>>>>>>>> The program ran successfully and returned the data from SAS.
>>>>>>>> Please note the relevant part of the log.
>>>>>>>>
>>>>>>>> My SAS dataset has 4 rows. The program ran successfully, so my
>>>>>>>> output is:
>>>>>>>>
>>>>>>>> [2016-06-10 10:35:21,584] INFO stmt(1.1)#executeQuery SELECT
>>>>>>>> a.sr_no,a.start_dt,a.end_dt FROM sasLib.run_control a; created
>>>>>>>> result set 1.1.1; time= 0.122 secs (com.sas.rio.MVAStatement:590)
>>>>>>>>
>>>>>>>> [2016-06-10 10:35:21,630] INFO rs(1.1.1)#next (first call to
>>>>>>>> next); time= 0.045 secs (com.sas.rio.MVAResultSet:773)
>>>>>>>>
>>>>>>>> 1,'2016-01-01','2016-01-31'
>>>>>>>> 2,'2016-02-01','2016-02-29'
>>>>>>>> 3,'2016-03-01','2016-03-31'
>>>>>>>> 4,'2016-04-01','2016-04-30'
>>>>>>>>
>>>>>>>> Please find the full logs attached to this email in file
>>>>>>>> sas_pure_java.txt.
>>>>>>>>
>>>>>>>> _______________________
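The attached sas_pure_java.txt is not part of the archive, so as a point of
reference, here is a hypothetical sketch (written in Scala for consistency
with the rest of the thread) of a plain-JDBC read that would produce exactly
the query shown in the pure Java log; the connection URL and credentials are
placeholders:

import java.sql.DriverManager

object PureJdbcRead {
  def main(args: Array[String]): Unit = {
    // Same SAS driver used throughout the thread; URL is a placeholder.
    Class.forName("com.sas.rio.MVADriver")
    val conn = DriverManager.getConnection("jdbc:sasiom://sashost:8591",
      "user", "password")
    try {
      // Hand-written query, exactly as it appears in the pure Java log:
      // no double quotes around the column names.
      val rs = conn.createStatement().executeQuery(
        "SELECT a.sr_no,a.start_dt,a.end_dt FROM sasLib.run_control a")
      while (rs.next()) {
        println(Seq(rs.getString(1), rs.getString(2), rs.getString(3))
          .mkString(","))
      }
    } finally {
      conn.close()
    }
  }
}

Because the SELECT list is hand-written here, nothing quotes the
identifiers; Spark's JDBC path builds the SELECT list itself, which is where
the "SR_NO" quoting creeps in.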
>>>>>>>>
>>>>>>>> Now I am trying to do the same via Spark SQL. Please find my
>>>>>>>> program and logs attached to this email in file sas_spark_sql.txt.
>>>>>>>>
>>>>>>>> The connection to the SAS dataset is established successfully, but
>>>>>>>> please note the log below.
>>>>>>>>
>>>>>>>> [2016-06-10 10:29:05,834] INFO conn(2)#prepareStatement sql=SELECT
>>>>>>>> "SR_NO","start_dt","end_dt" FROM sasLib.run_control ; prepared
>>>>>>>> statement 2.1; time= 0.038 secs (com.sas.rio.MVAConnection:538)
>>>>>>>>
>>>>>>>> [2016-06-10 10:29:05,935] INFO ps(2.1)#executeQuery SELECT
>>>>>>>> "SR_NO","start_dt","end_dt" FROM sasLib.run_control ; created
>>>>>>>> result set 2.1.1; time= 0.102 secs (com.sas.rio.MVAStatement:590)
>>>>>>>>
>>>>>>>> Please find the full logs attached to this email in file
>>>>>>>> sas_spark_sql.txt.
>>>>>>>>
>>>>>>>> I am using the same driver in both the pure Java and Spark SQL
>>>>>>>> programs, but the query generated in Spark SQL has quotes around
>>>>>>>> the column names (see above). So my resulting output for that
>>>>>>>> query is:
>>>>>>>>
>>>>>>>> +-----+--------+------+
>>>>>>>> |  _c0|     _c1|   _c2|
>>>>>>>> +-----+--------+------+
>>>>>>>> |SR_NO|start_dt|end_dt|
>>>>>>>> |SR_NO|start_dt|end_dt|
>>>>>>>> |SR_NO|start_dt|end_dt|
>>>>>>>> |SR_NO|start_dt|end_dt|
>>>>>>>> +-----+--------+------+
>>>>>>>>
>>>>>>>> Since both programs are using the same driver,
>>>>>>>> com.sas.rio.MVADriver, the expected output should be the same as
>>>>>>>> my pure Java program's output, but something else is happening
>>>>>>>> behind the scenes.
>>>>>>>>
>>>>>>>> Any insights on this issue? Thanks for your time.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Ajay

--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net