This looks like a bug in the code that generates the SQL query. Why it would be specific to SAS, I can't guess. Did you try the same thing against another database? As a workaround, you can write the SELECT statement yourself instead of just providing the table name.
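For example, a minimal sketch of that workaround in Scala, assuming the Spark 1.x JDBC data source. The URL, library name, and credentials below are placeholders, and depending on the Spark version the outer projection Spark generates may still quote column names, so it is worth verifying against the driver log:

    // Hand Spark a ready-made SELECT via the "dbtable" option instead of a
    // bare table name. A subquery must be parenthesised and aliased.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object SasJdbcWorkaround {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("SasJdbcWorkaround"))
        val sqlContext = new HiveContext(sc)

        val df = sqlContext.read.format("jdbc").options(Map(
          "driver"   -> "com.sas.rio.MVADriver",
          "url"      -> "jdbc:sharenet://sas-host:8591",  // placeholder URL
          "dbtable"  -> "(SELECT a.sr_no, a.start_dt, a.end_dt FROM sasLib.run_control a) t",
          "user"     -> "xxxx",
          "password" -> "xxxx")).load()

        df.show()
      }
    }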
> On Jun 11, 2016, at 6:27 PM, Ajay Chander <itsche...@gmail.com> wrote:
>
> I tried implementing the same functionality through Scala as well, but no luck so far. Just wondering if anyone here has tried using Spark SQL to read a SAS dataset? Thank you.
>
> Regards,
> Ajay
>
> On Friday, June 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:
>
> Mich, I completely agree with you. I built another Spark SQL application which reads data from MySQL and SQL Server and writes the data into Hive (Parquet + Snappy format). I have this problem only when I read directly from the remote SAS system. The interesting part is that I am using the same driver to read data through a pure Java app and the Spark app. It works fine in the Java app, so I cannot blame the SAS driver here. Trying to understand where the problem could be. Thanks for sharing this with me.
>
> On Friday, June 10, 2016, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> I personally use Scala to do something similar. For example, here I extract data from an Oracle table and store it in an ORC table in Hive. This is compiled via sbt and run with spark-submit.
>
> It is similar to your code, but in Scala. Note that I do not enclose my column names in double quotes.
>
> import org.apache.spark.SparkContext
> import org.apache.spark.SparkConf
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.hive.HiveContext
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.sql.functions._
>
> object ETL_scratchpad_dummy {
>   def main(args: Array[String]) {
>     val conf = new SparkConf().
>       setAppName("ETL_scratchpad_dummy").
>       set("spark.driver.allowMultipleContexts", "true")
>     val sc = new SparkContext(conf)
>     // Create sqlContext based on HiveContext
>     val sqlContext = new HiveContext(sc)
>     import sqlContext.implicits._
>     val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>     println("\nStarted at")
>     sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
>     HiveContext.sql("use oraclehadoop")
>
>     var _ORACLEserver : String = "jdbc:oracle:thin:@rhes564:1521:mydb12"
>     var _username : String = "scratchpad"
>     var _password : String = "xxxxxxxx"
>
>     // Get data from Oracle table scratchpad.dummy
>     val d = HiveContext.load("jdbc",
>       Map("url" -> _ORACLEserver,
>           "dbtable" -> "(SELECT to_char(ID) AS ID, to_char(CLUSTERED) AS CLUSTERED, to_char(SCATTERED) AS SCATTERED, to_char(RANDOMISED) AS RANDOMISED, RANDOM_STRING, SMALL_VC, PADDING FROM scratchpad.dummy)",
>           "user" -> _username,
>           "password" -> _password))
>
>     d.registerTempTable("tmp")
>     //
>     // Need to create and populate target ORC table oraclehadoop.dummy
>     //
>     HiveContext.sql("use oraclehadoop")
>     //
>     // Drop and create table dummy
>     //
>     HiveContext.sql("DROP TABLE IF EXISTS oraclehadoop.dummy")
>     var sqltext : String = ""
>     sqltext = """
>     CREATE TABLE oraclehadoop.dummy (
>        ID INT
>      , CLUSTERED INT
>      , SCATTERED INT
>      , RANDOMISED INT
>      , RANDOM_STRING VARCHAR(50)
>      , SMALL_VC VARCHAR(10)
>      , PADDING VARCHAR(10)
>     )
>     CLUSTERED BY (ID) INTO 256 BUCKETS
>     STORED AS ORC
>     TBLPROPERTIES (
>       "orc.create.index"="true",
>       "orc.bloom.filter.columns"="ID",
>       "orc.bloom.filter.fpp"="0.05",
>       "orc.compress"="SNAPPY",
>       "orc.stripe.size"="16777216",
>       "orc.row.index.stride"="10000" )
>     """
>     HiveContext.sql(sqltext)
>     //
>     // Put data in Hive table. Clean up is already done
>     //
>     sqltext = """
>     INSERT INTO TABLE oraclehadoop.dummy
>     SELECT
>        ID
>      , CLUSTERED
>      , SCATTERED
>      , RANDOMISED
>      , RANDOM_STRING
>      , SMALL_VC
>      , PADDING
>     FROM tmp
>     """
>     HiveContext.sql(sqltext)
>     println("\nFinished at")
>     sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'dd/MM/yyyy HH:mm:ss.ss')").collect.foreach(println)
>     sys.exit()
>   }
> }
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> http://talebzadehmich.wordpress.com
>
> On 10 June 2016 at 23:38, Ajay Chander <itsche...@gmail.com> wrote:
>
> Hi Mich,
>
> Thanks for the response. If you look at my programs, I am not writing my queries to include column names in a pair of "". The driver in my Spark program is generating such a query, with column names in "", which I do not want. On the other hand, I am using the same driver in my pure Java program (attached); in that program the same driver generates a proper SQL query without "".
>
> Pure Java log:
>
> [2016-06-10 10:35:21,584] INFO stmt(1.1)#executeQuery SELECT a.sr_no,a.start_dt,a.end_dt FROM sasLib.run_control a; created result set 1.1.1; time= 0.122 secs (com.sas.rio.MVAStatement:590)
>
> Spark SQL log:
>
> [2016-06-10 10:29:05,834] INFO conn(2)#prepareStatement sql=SELECT "SR_NO","start_dt","end_dt" FROM sasLib.run_control ; prepared statement 2.1; time= 0.038 secs (com.sas.rio.MVAConnection:538)
>
> [2016-06-10 10:29:05,935] INFO ps(2.1)#executeQuery SELECT "SR_NO","start_dt","end_dt" FROM sasLib.run_control ; created result set 2.1.1; time= 0.102 secs (com.sas.rio.MVAStatement:590)
>
> Please find the complete program and full logs attached in the thread below. Thank you.
>
> Regards,
> Ajay
>
> On Friday, June 10, 2016, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Assuming I understood your query: in Spark SQL (that is, when you log in to Spark SQL with something like spark-sql --master spark://<HOST_NAME>:7077) you do not need double quotes around column names for the SQL to work.
>
> spark-sql> select "hello from Mich" from oraclehadoop.sales limit 1;
> hello from Mich
>
> Anything between a pair of "" will be interpreted as text, NOT a column name.
>
> In Spark SQL you do not need double quotes. So simply:
>
> spark-sql> select prod_id, cust_id from sales limit 2;
> 17 28017
> 18 10419
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> http://talebzadehmich.wordpress.com
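(A quick illustration of Mich's point, assuming a Spark 1.x HiveContext: double quotes produce string literals, while backticks are how you quote an identifier if you ever need to.)

    // Double quotes yield a string literal, not a column reference:
    sqlContext.sql("""SELECT "prod_id" FROM sales LIMIT 1""").show()
    // -> returns the literal text prod_id

    // Backticks quote identifiers:
    sqlContext.sql("SELECT `prod_id`, `cust_id` FROM sales LIMIT 2").show()
    // -> returns the actual column values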
> On 10 June 2016 at 21:54, Ajay Chander <itsche...@gmail.com> wrote:
>
> Hi again, has anyone in this group tried to access a SAS dataset through Spark SQL? Thank you.
>
> Regards,
> Ajay
>
> On Friday, June 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:
>
> Hi Spark Users,
>
> I hope everyone here is doing great.
>
> I am trying to read data from SAS through Spark SQL and write it into HDFS. Initially I started with a pure Java program; please find the program and logs in the attached file sas_pure_java.txt. My program ran successfully and returned the data from SAS to Spark SQL. Please note the highlighted part in the log.
>
> My SAS dataset has 4 rows. The program ran successfully, so my output is:
>
> [2016-06-10 10:35:21,584] INFO stmt(1.1)#executeQuery SELECT a.sr_no,a.start_dt,a.end_dt FROM sasLib.run_control a; created result set 1.1.1; time= 0.122 secs (com.sas.rio.MVAStatement:590)
>
> [2016-06-10 10:35:21,630] INFO rs(1.1.1)#next (first call to next); time= 0.045 secs (com.sas.rio.MVAResultSet:773)
>
> 1,'2016-01-01','2016-01-31'
> 2,'2016-02-01','2016-02-29'
> 3,'2016-03-01','2016-03-31'
> 4,'2016-04-01','2016-04-30'
>
> Please find the full logs attached to this email in the file sas_pure_java.txt.
>
> _______________________
>
> Now I am trying to do the same via Spark SQL. Please find my program and logs attached to this email in the file sas_spark_sql.txt.
>
> The connection to the SAS dataset is established successfully, but please note the highlighted log below.
>
> [2016-06-10 10:29:05,834] INFO conn(2)#prepareStatement sql=SELECT "SR_NO","start_dt","end_dt" FROM sasLib.run_control ; prepared statement 2.1; time= 0.038 secs (com.sas.rio.MVAConnection:538)
>
> [2016-06-10 10:29:05,935] INFO ps(2.1)#executeQuery SELECT "SR_NO","start_dt","end_dt" FROM sasLib.run_control ; created result set 2.1.1; time= 0.102 secs (com.sas.rio.MVAStatement:590)
>
> Please find the full logs attached to this email in the file sas_spark_sql.txt.
>
> I am using the same driver in both the pure Java and Spark SQL programs, but the query generated in Spark SQL has quotes around the column names (highlighted above). So my resulting output for that query looks like this:
>
> +-----+--------+------+
> |  _c0|     _c1|   _c2|
> +-----+--------+------+
> |SR_NO|start_dt|end_dt|
> |SR_NO|start_dt|end_dt|
> |SR_NO|start_dt|end_dt|
> |SR_NO|start_dt|end_dt|
> +-----+--------+------+
>
> Since both programs use the same driver, com.sas.rio.MVADriver, the expected output should be the same as the pure Java program's output, but something else is happening behind the scenes.
>
> Any insights on this issue? Thanks for your time.
>
> Regards,
> Ajay
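One avenue worth investigating, not mentioned in the thread: Spark routes identifier quoting for JDBC sources through org.apache.spark.sql.jdbc.JdbcDialect, whose default quoteIdentifier wraps column names in double quotes, which matches the quoting seen in the Spark SQL log above. Registering a custom dialect that leaves identifiers bare might make the generated SELECT match what the pure Java program sends. A sketch, with the URL prefix in canHandle being an assumption to adjust for the actual SAS JDBC URL:

    import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

    // Hypothetical dialect for the SAS driver: match on the JDBC URL and
    // return column names unquoted instead of wrapped in double quotes.
    object SasDialect extends JdbcDialect {
      override def canHandle(url: String): Boolean =
        url.startsWith("jdbc:sharenet")  // adjust to the real URL prefix
      override def quoteIdentifier(colName: String): String = colName
    }

    // Register the dialect before loading the JDBC DataFrame.
    JdbcDialects.registerDialect(SasDialect)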