Re: Hive to Oracle using Spark - Type(Date) conversion issue

2018-03-20 Thread Gurusamy Thirupathy
Hi Jörn,

Thanks for sharing the different options. Yes, we are trying to build a
generic Hive-to-Oracle export tool using Spark.
FYI, we are currently using Sqoop, and we are trying to migrate from Sqoop
to Spark.

Thanks
-G

On Tue, Mar 20, 2018 at 2:17 AM, Jörn Franke  wrote:

> Write your own Spark UDF. Apply it to all varchar columns.
>
> Within this UDF you can use the SimpleDateFormat parse method. If that
> method returns null, return the content as varchar; otherwise return a
> date. If the content itself is null, return null.
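>
> A minimal sketch of such a UDF in Scala. Note that a single Spark UDF must
> have one return type, so this variant returns a java.sql.Date or null
> rather than mixing date and varchar, and the "yyyy-MM-dd" pattern is an
> assumption:
>
> import java.text.{ParsePosition, SimpleDateFormat}
> import org.apache.spark.sql.functions.udf
>
> val toDateOrNull = udf { (s: String) =>
>   if (s == null) null
>   else {
>     // parse(String, ParsePosition) returns null instead of throwing
>     // when the content is not a date in the expected format.
>     val fmt = new SimpleDateFormat("yyyy-MM-dd") // assumed pattern
>     fmt.setLenient(false)
>     val parsed = fmt.parse(s, new ParsePosition(0))
>     if (parsed == null) null else new java.sql.Date(parsed.getTime)
>   }
> }
>
> // Applied per string column, e.g. df.withColumn(c, toDateOrNull(col(c)))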
>
> Alternatively, you can define an insert function in PL/SQL on the Oracle
> side.
>
> Another alternative is to read the Oracle metadata for the table at
> runtime and then adapt your conversion based on this.
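>
> A rough sketch of this variant with plain JDBC; the schema and table names,
> the connection details, and the reuse of a toDateOrNull UDF like the one
> above are all placeholders:
>
> import java.sql.{DriverManager, Types}
> import org.apache.spark.sql.functions.col
>
> val conn = DriverManager.getConnection(oracleUrl, user, password)
> val rs = conn.getMetaData.getColumns(null, "MYSCHEMA", "MYTABLE", null)
> var dateCols = Set.empty[String]
> while (rs.next()) {
>   val t = rs.getInt("DATA_TYPE")
>   // Oracle DATE commonly reports as TIMESTAMP through JDBC.
>   if (t == Types.DATE || t == Types.TIMESTAMP)
>     dateCols += rs.getString("COLUMN_NAME")
> }
> conn.close()
>
> // Convert only the columns Oracle expects as dates before the write.
> val converted = dateCols.foldLeft(df) { (d, c) =>
>   d.withColumn(c, toDateOrNull(col(c)))
> }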
>
> However, this may not be perfect depending on your use case. Can you
> please provide more details/examples? Do you aim at a generic Hive-to-Oracle
> import tool using Spark? Would Sqoop not be an alternative?
>
> On 20. Mar 2018, at 03:45, Gurusamy Thirupathy 
> wrote:
>
Hi Guha,
>
Thanks for your quick response. Options a and b are already on our table.
For option b, it is the same problem again: we don't know which column is a
date.
>
>
> Thanks,
> -G
>
> On Sun, Mar 18, 2018 at 9:36 PM, Deepak Sharma 
> wrote:
>
>> The other approach would be to write to a temp table and then merge the
>> data. But this may be an expensive solution.
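>>
>> A minimal sketch of that approach; the staging table name, URL, and
>> connection properties are placeholders:
>>
>> // 1. Bulk-load a staging table from Spark over JDBC (overwrite drops
>> //    and recreates the table by default).
>> df.write.mode("overwrite").jdbc(oracleUrl, "STG_MYTABLE", jdbcProps)
>>
>> // 2. Merge on the Oracle side, letting TO_DATE do the conversion, e.g.:
>> //    MERGE INTO MYTABLE t USING STG_MYTABLE s ON (t.id = s.id)
>> //    WHEN MATCHED THEN UPDATE SET t.dt = TO_DATE(s.dt, 'YYYY-MM-DD')
>> //    WHEN NOT MATCHED THEN INSERT (id, dt)
>> //      VALUES (s.id, TO_DATE(s.dt, 'YYYY-MM-DD'))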
>>
>> Thanks
>> Deepak
>>
>> On Mon, Mar 19, 2018, 08:04 Gurusamy Thirupathy 
>> wrote:
>>
>>> Hi,
>>>
>>> I am trying to read data from Hive as a DataFrame and then write the DF
>>> into an Oracle database. In this case, the date field/column in Hive has
>>> type Varchar(20), but the corresponding column type in Oracle is Date.
>>> While reading from Hive, the Hive table names are decided dynamically
>>> (read from another table) based on some job condition (e.g. Job1). There
>>> are multiple tables like this, so the column and table names are decided
>>> only at run time, and I can't do the type conversion explicitly when
>>> reading from Hive.
>>>
>>> So is there any utility/API available in Spark to handle this conversion?
>>>
>>>
>>> Thanks,
>>> Guru
>>>
>>
>
>
> --
> Thanks,
> Guru
>
>


-- 
Thanks,
Guru


Re: Hive to Oracle using Spark - Type(Date) conversion issue

2018-03-19 Thread Gurusamy Thirupathy
Hi Guha,

Thanks for your quick response. Options a and b are already on our table.
For option b, it is the same problem again: we don't know which column is a
date.


Thanks,
-G

On Sun, Mar 18, 2018 at 9:36 PM, Deepak Sharma 
wrote:

> The other approach would be to write to a temp table and then merge the
> data. But this may be an expensive solution.
>
> Thanks
> Deepak
>
> On Mon, Mar 19, 2018, 08:04 Gurusamy Thirupathy 
> wrote:
>
>> Hi,
>>
>> I am trying to read data from Hive as a DataFrame and then write the DF
>> into an Oracle database. In this case, the date field/column in Hive has
>> type Varchar(20), but the corresponding column type in Oracle is Date.
>> While reading from Hive, the Hive table names are decided dynamically
>> (read from another table) based on some job condition (e.g. Job1). There
>> are multiple tables like this, so the column and table names are decided
>> only at run time, and I can't do the type conversion explicitly when
>> reading from Hive.
>>
>> So is there any utility/API available in Spark to handle this conversion?
>>
>>
>> Thanks,
>> Guru
>>
>


-- 
Thanks,
Guru


Hive to Oracle using Spark - Type(Date) conversion issue

2018-03-18 Thread Gurusamy Thirupathy
Hi,

I am trying to read data from Hive as a DataFrame and then write the DF
into an Oracle database. In this case, the date field/column in Hive has
type Varchar(20), but the corresponding column type in Oracle is Date.
While reading from Hive, the Hive table names are decided dynamically
(read from another table) based on some job condition (e.g. Job1). There
are multiple tables like this, so the column and table names are decided
only at run time, and I can't do the type conversion explicitly when
reading from Hive.
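
For illustration, a minimal sketch of the kind of flow described, assuming
Spark 2.x; the config table, column names, and connection details are all
hypothetical:

// Table names come from a driver/config table, so the schema is only
// known at run time (all names here are made up).
val jobs = spark.table("job_config")
  .filter("job_name = 'Job1'")
  .select("hive_table", "oracle_table")
  .collect()

jobs.foreach { row =>
  // Date columns arrive from Hive as Varchar(20).
  val df = spark.table(row.getString(0))
  // Mismatches where the corresponding Oracle column type is DATE.
  df.write.mode("append").jdbc(oracleUrl, row.getString(1), jdbcProps)
}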

So is there any utility/API available in Spark to handle this conversion?


Thanks,
Guru


Subtract two DataFrames is not working

2016-05-26 Thread Gurusamy Thirupathy
I have to subtract two DataFrames. I tried the except method, but it's not
working; I tried drop as well. I am using Spark 1.4.1 and Scala 2.10.
Can you please help?
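
For reference, a minimal example of except on Spark 1.4 with made-up data;
except is a set difference, so both DataFrames must have the same schema,
and duplicate rows are removed:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val df1 = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")
val df2 = Seq((2, "b")).toDF("id", "value")

// Rows of df1 that do not appear in df2: (1, "a") and (3, "c").
df1.except(df2).show()

If except seems to return nothing or everything, one common cause is the two
DataFrames differing in column order or column types, since except compares
whole rows.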


Thanks,
Guru