Hi Everyone,

I am using hive 0.13! I want to find multiple tokens like "hip hop" and
"rock music" in my data and replace them with "hiphop" and "rockmusic" -
basically replace them without white space. I have used the regexp_replace
function in hive. Below is my query and it works great for above 2 examples.

drop table vp_hiphop;
create table vp_hiphop asselect userid, ntext,
       regexp_replace(regexp_replace(ntext, 'hip hop', 'hiphop'),
'rock music', 'rockmusic') as ntext1from  vp_nlp_protext_males;

But I have 100 such bigrams/ngrams and want to be able to do replace
efficiently where I just remove the whitespace. I can pattern match the
phrase - hip hop and rock music but in the replace I want to simply trim
the white spaces. Below is what I tried. I also tried using trim with
regexp_replace but it wants the third argument in the regexp_replace
function.

drop table vp_hiphop;
create table vp_hiphop asselect  userid, ntext,
        regexp_replace(ntext, '(hip hop)|(rock music)') as ntext1from
vp_nlp_protext_males;

Reply via email to