This will work in Hive. I don't know why you are getting null values.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.sql("use test")

// join emp and dept on deptid and register the result as a temp table
val e = hiveContext.table("emp")
val d = hiveContext.table("dept")
val rs = e.join(d, e("deptid") === d("deptid"), "fullouter")
rs.registerTempTable("tmp")

// recreate the target table; the two deptid columns from the join land
// in emp_deptid and dept_deptid positionally via SELECT *
hiveContext.sql("DROP TABLE IF EXISTS test.rs")
var sqltext = ""
sqltext =
"""
CREATE TABLE test.rs(emp_id int, name varchar(30), emp_deptid int,
dept_deptid int, dept_name varchar(30))
"""
hiveContext.sql(sqltext)
sqltext =
"""
INSERT INTO test.rs
SELECT * FROM tmp
"""
hiveContext.sql(sqltext)
hiveContext.sql("SELECT * FROM test.rs").show
sys.exit()
Now with Vertica I have no idea.
Can you do something like the below in SQL with INSERT/SELECT?

CASE WHEN column IS NULL THEN '' ELSE column END  -- or simply COALESCE(column, '')
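
A minimal Spark-side sketch of the same idea (an assumption on my part, not
tested against Vertica): given a DataFrame df whose columns are uniquely
named (note the raw join result above carries deptid twice), cast every
column to string and map NULL to "" before the insert:

import org.apache.spark.sql.functions.{coalesce, col, lit}

// cast each column to string (the int columns cannot hold "") and
// replace any NULL with an empty string
val cleaned = df.select(df.columns.map(c =>
  coalesce(col(c).cast("string"), lit("")).alias(c)): _*)
cleaned.registerTempTable("tmp_clean")

The INSERT ... SELECT can then read FROM tmp_clean instead of tmp.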
HTH
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
On 6 July 2016 at 18:18, Radha krishna <[email protected]> wrote:
> Hi Mich,
> Here I have given just sample data.
> I have some GBs of files in HDFS and am performing left outer joins on those
> files; the final result I am going to store in a Vertica database table.
> There are no duplicate columns in the target table, but for the non-matching
> rows' columns I want to insert an empty value ("") instead of the word null.
> On 06-Jul-2016 10:31 pm, "Mich Talebzadeh" <[email protected]>
> wrote:
>
>> What do you mean by database table here?
>>
>> You have repeating column names in the join output, namely deptid (a
>> Spark-side way to avoid this is sketched after the output below):
>>
>>
>> 0: jdbc:hive2://rhes564:10010/default> SELECT * FROM emp e LEFT OUTER
>> JOIN dept d ON e.deptid = d.deptid;
>>
>> INFO : OK
>> +-----------+---------+-----------+-----------+--------------+--+
>> | e.emp_id | e.name | e.deptid | d.deptid | d.dept_name |
>> +-----------+---------+-----------+-----------+--------------+--+
>> | 1001 | aba | 10 | 10 | DEV |
>> | 1002 | abs | 20 | 20 | TEST |
>> | 1003 | abd | 10 | 10 | DEV |
>> | 1001 | aba | 10 | 10 | DEV |
>> | 1002 | abs | 20 | 20 | TEST |
>> | 1003 | abd | 10 | 10 | DEV |
>> | 1004 | abf | 30 | 30 | IT |
>> | 1005 | abg | 10 | 10 | DEV |
>> | 1004 | abf | 30 | 30 | IT |
>> | 1005 | abg | 10 | 10 | DEV |
>> | 1006 | abh | 20 | 20 | TEST |
>> | 1007 | abj | 10 | 10 | DEV |
>> | 1006 | abh | 20 | 20 | TEST |
>> | 1007 | abj | 10 | 10 | DEV |
>> | 1008 | abk | 30 | 30 | IT |
>> | 1009 | abl | 20 | 20 | TEST |
>> | 1010 | abq | 10 | 10 | DEV |
>> | 1008 | abk | 30 | 30 | IT |
>> | 1009 | abl | 20 | 20 | TEST |
>> | 1010 | abq | 10 | 10 | DEV |
>> +-----------+---------+-----------+-----------+--------------+--+
>> 20 rows selected (44.351 seconds)
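>>
>> A minimal sketch (an assumption, using the emp and dept DataFrames e and
>> d as in the Scala code at the top) that avoids the duplicate deptid
>> column by aliasing it in the select:
>>
>> val rs2 = e.join(d, e("deptid") === d("deptid"), "left_outer")
>>   .select(e("emp_id"), e("name"), e("deptid").alias("emp_deptid"),
>>           d("deptid").alias("dept_deptid"), d("dept_name"))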
>>
>> HTH
>>
>> On 6 July 2016 at 17:48, radha <[email protected]> wrote:
>>
>>> Hi,
>>> Thanks all, it is working fine now; the issue was some whitespace in the
>>> dept id.
>>>
>>> I have one more doubt: for the non-matching records it shows the word null.
>>> Even when I write into HDFS it shows the word null. How can we avoid
>>> writing null for the non-matching columns? I want just an empty value ("").
>>>
>>> I used the same input for the dept table but removed the last row, and I
>>> used the below code to write into HDFS.
>>>
>>> DataFrame joinResult = sqlContext.sql(
>>>     "SELECT * FROM EMP e LEFT OUTER JOIN DEPT d ON e.deptid = d.deptid");
>>> joinResult.javaRDD().repartition(1).map(new Function<Row, String>() {
>>>     private static final long serialVersionUID = 9185646063977504742L;
>>>
>>>     @Override
>>>     public String call(Row arg0) throws Exception {
>>>         // getString(i) returns null for the non-matching columns, and
>>>         // string concatenation turns that into the literal "null"
>>>         return arg0.getString(0) + "\u001c" + arg0.getString(1) + "\u001c"
>>>                 + arg0.getString(2) + "\u001c" + arg0.getString(3) + "\u001c"
>>>                 + arg0.getString(4) + "\u001e";
>>>     }
>>> }).saveAsTextFile(args[2]);
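>>>
>>> A minimal sketch (in Scala for brevity; joinResult and the output path
>>> are assumed to be as above) of the same record-building step with a null
>>> guard, so non-matching columns serialize as "" instead of "null":
>>>
>>> import org.apache.spark.sql.Row
>>>
>>> // emit "" for a NULL column instead of the literal "null"
>>> def field(row: Row, i: Int): String =
>>>   if (row.isNullAt(i)) "" else row.getString(i)
>>>
>>> val lines = joinResult.rdd.repartition(1).map { row =>
>>>   (0 until 5).map(i => field(row, i)).mkString("\u001c") + "\u001e"
>>> }
>>> lines.saveAsTextFile(outputPath)  // outputPath corresponds to args[2]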
>>>
>>>
>>> Output in HDFS File
>>>
>>> 10 1001 aba 10 dev
>>> 10 1003 abd 10 dev
>>> 10 1005 abg 10 dev
>>> 10 1007 abj 10 dev
>>> 10 1010 abq 10 dev
>>> 20 1002 abs 20 Test
>>> 20 1006 abh 20 Test
>>> 20 1009 abl 20 Test
>>> 30 1004 abf null null
>>> 30 1008 abk null null
>>>
>>> In my case I want to store the join result back to a database table, and
>>> it is storing the word "null" for those non-matching records; I want to
>>> store "" (an empty value) for the non-matching rows.