For the record, see this ticket: https://issues.apache.org/jira/browse/HIVE-13125
2016-03-17 17:02 GMT+01:00 Ajay Chander <[email protected]>: > Thanks for your time Mich! I will try this one out. > > > On Thursday, March 17, 2016, Mich Talebzadeh <[email protected]> > wrote: > >> Then probably the easiest option would be in INSERT/SELECT from external >> table to target table and make that column NULL >> >> Check the VAT column here that I made it NULL >> >> DROP TABLE IF EXISTS stg_t2; >> CREATE EXTERNAL TABLE stg_t2 ( >> INVOICENUMBER string >> ,PAYMENTDATE string >> ,NET string >> ,VAT string >> ,TOTAL string >> ) >> COMMENT 'from csv file from excel sheet xxxx' >> ROW FORMAT serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde' >> STORED AS TEXTFILE >> LOCATION '/data/stg/table2' >> TBLPROPERTIES ("skip.header.line.count"="1") >> ; >> --3) >> DROP TABLE IF EXISTS t2; >> CREATE TABLE t2 ( >> INVOICENUMBER INT >> ,PAYMENTDATE timestamp >> ,NET DECIMAL(20,2) >> ,VAT DECIMAL(20,2) >> ,TOTAL DECIMAL(20,2) >> ) >> COMMENT 'from csv file from excel sheet xxxx' >> CLUSTERED BY (INVOICENUMBER) INTO 256 BUCKETS >> STORED AS ORC >> TBLPROPERTIES ( "orc.compress"="ZLIB", >> "transactional"="true") >> ; >> --4) Put data in target table. do the conversion and ignore empty rows >> INSERT INTO TABLE t2 >> SELECT >> INVOICENUMBER >> , CAST(UNIX_TIMESTAMP(paymentdate,'DD/MM/YYYY')*1000 as timestamp) >> , CAST(REGEXP_REPLACE(net,'[^\\d\\.]','') AS DECIMAL(20,2)) >> , NULL >> , CAST(REGEXP_REPLACE(total,'[^\\d\\.]','') AS DECIMAL(20,2)) >> FROM >> stg_t2 >> WHERE >> -- INVOICENUMBER > 0 AND >> CAST(REGEXP_REPLACE(total,'[^\\d\\.]','') AS DECIMAL(20,2)) > 0.0 >> -- Exclude empty rows >> >> HTH >> >> Dr Mich Talebzadeh >> >> >> >> LinkedIn * >> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >> >> >> >> http://talebzadehmich.wordpress.com >> >> >> >> On 17 March 2016 at 15:32, Ajay Chander <[email protected]> wrote: >> >>> Mich, I am okay with replacing the columns data with some characters >>> like asterisk. Thanks >>> >>> >>> On Thursday, March 17, 2016, Mich Talebzadeh <[email protected]> >>> wrote: >>> >>>> Hi Ajay, >>>> >>>> Do you want to be able to unmask it (at any time) or just have it >>>> totally scrambled (for example replace the column with random characters) >>>> in Hive? >>>> >>>> Dr Mich Talebzadeh >>>> >>>> >>>> >>>> LinkedIn * >>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >>>> >>>> >>>> >>>> http://talebzadehmich.wordpress.com >>>> >>>> >>>> >>>> On 17 March 2016 at 15:14, Ajay Chander <[email protected]> wrote: >>>> >>>>> Mich thbaks for looking into this. I have a 'csvfile.txt ' on hdfs. I >>>>> have created an external table 'xyz' to load that data into it. One of the >>>>> columns data 'ssn' needs to be masked. Is there any built in function is >>>>> give that I could use? >>>>> >>>>> >>>>> On Thursday, March 17, 2016, Mich Talebzadeh < >>>>> [email protected]> wrote: >>>>> >>>>>> Are you loading your CSV file from an External table into Hive table.? >>>>>> >>>>>> Basically you want to scramble that column before putting into Hive >>>>>> table? >>>>>> >>>>>> Dr Mich Talebzadeh >>>>>> >>>>>> >>>>>> >>>>>> LinkedIn * >>>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >>>>>> >>>>>> >>>>>> >>>>>> http://talebzadehmich.wordpress.com >>>>>> >>>>>> >>>>>> >>>>>> On 17 March 2016 at 14:37, Ajay Chander <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> Tustin, Is there anyway I can deidentify it in hive ? >>>>>>> >>>>>>> >>>>>>> On Thursday, March 17, 2016, Marcin Tustin <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> This is a classic transform-load problem. You'll want to anonymise >>>>>>>> it once before making it available for analysis. >>>>>>>> >>>>>>>> On Thursday, March 17, 2016, Ajay Chander <[email protected]> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi Everyone, >>>>>>>>> >>>>>>>>> I have a csv.file which has some sensitive data in a particular >>>>>>>>> column in it. Now I have to create a table in hive and load the data >>>>>>>>> into >>>>>>>>> it. But when loading the data I have to make sure that the data is >>>>>>>>> masked. >>>>>>>>> Is there any built in function is used ch supports this or do I have >>>>>>>>> to >>>>>>>>> write UDF ? Any suggestions are appreciated. Thanks >>>>>>>> >>>>>>>> >>>>>>>> Want to work at Handy? Check out our culture deck and open roles >>>>>>>> <http://www.handy.com/careers> >>>>>>>> Latest news <http://www.handy.com/press> at Handy >>>>>>>> Handy just raised $50m >>>>>>>> <http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/> >>>>>>>> led >>>>>>>> by Fidelity >>>>>>>> >>>>>>>> >>>>>> >>>> >>
