Hi Neil,

Have you tried testing your regexes in Java? I was using one of the applets available on the web (e.g. http://www.cis.upenn.edu/~matuszek/General/RegexTester/regex-tester.html) to test my expressions before running a Hive query, and it helped me a lot.
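If the applet is not handy, a few lines of plain Java do the same job. The sketch below is only an approximation of what regexp_extract does (first match, then Matcher.group(index)); the class and method names are made up for illustration and are not part of Hive, and returning an empty string on no match is an assumption about Hive's behaviour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for trying out patterns; not part of Hive.
class RegexCheck {

    // Rough stand-in for regexp_extract(subject, pattern, index):
    // find the first match and return the requested group.
    // Returning "" when nothing matches is an assumption.
    static String extract(String subject, String pattern, int index) {
        Matcher m = Pattern.compile(pattern).matcher(subject);
        if (!m.find()) {
            return "";
        }
        String group = m.group(index);
        return group == null ? "" : group;
    }

    public static void main(String[] args) {
        // In Java source, "\\s" is the two-character regex \s (whitespace).
        System.out.println("[" + extract("abc def ghj", "\\s(.*)\\s", 1) + "]"); // [def]
        System.out.println("[" + extract("abc def ghj", "\\s(.*)\\s", 0) + "]"); // [ def ]
        // \w works the same way once the backslash is doubled.
        System.out.println("[" + extract("foo_1 bar", "(\\w+)", 1) + "]");       // [foo_1]
    }
}
```

Index 1 gives the first capturing group, while index 0 gives the whole match including the surrounding whitespace. A pattern that behaves here should behave the same way in a Hive query as long as the backslashes are doubled in the query string as well.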
Usually all you need is to use double escaping, such as:

select regexp_extract("abc def ghj","\\s(.*)\\s",1) from test limit 1;

This correctly returns the string "def" (group 1, the text between the two whitespace characters). With index 0 it would return the whole match, " def ", including the spaces.

Best regards,
Jan

On Thu, Nov 1, 2012 at 3:05 PM, Neil Kodner <nkod...@gmail.com> wrote:
> From the hive docs on regexp_extract:
>
> Note that some care is necessary in using predefined character classes:
> using '\s' as the second argument will match the letter s; '
> s' is necessary to match whitespace, etc. The 'index' parameter is the Java
> regex Matcher group() method index. See
> docs/api/java/util/regex/Matcher.html for more information on the 'index' or
> Java regex group() method.
>
> This is confusing, especially the line break after s; '. Can anyone explain
> whether character classes work under regexp_extract?
>
> I'm asking because I've been having some trouble implementing regular
> expression extracts using character classes such as \w. These regular
> expressions are working in some other environments but I can't get them to
> work correctly in hive.