Rémy SAISSY created HIVE-7125:
---------------------------------

             Summary: Support strings in the DELIMITED BY statement
                 Key: HIVE-7125
                 URL: https://issues.apache.org/jira/browse/HIVE-7125
             Project: Hive
          Issue Type: Improvement
          Components: Query Processor
    Affects Versions: 0.13.0
            Reporter: Rémy SAISSY


Hi,
I am working with a dataset that looks like this:
dataset.txt:
salut|;les|;|amiches
comment|;|allez|;|vous

This dataset's delimiter is not a single character like | or ; but a string,
|;| in this case.

Therefore I tried to create an external table with this delimiter:
hive> create external table ds (f1 string, f2 string, f3 string) 
      row format delimited fields terminated by '|;|' 
      location '/user/remy/dataset';                                            
  

But I got this error:

MismatchedTokenException(5!=301)
        at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:617)
        at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
        at org.apache.hadoop.hive.ql.parse.HiveParser.tableRowFormatFieldIdentifier(HiveParser.java:31433)
        at org.apache.hadoop.hive.ql.parse.HiveParser.rowFormatDelimited(HiveParser.java:30386)
        at org.apache.hadoop.hive.ql.parse.HiveParser.tableRowFormat(HiveParser.java:30662)
        at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:4683)
        at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2144)
        at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
        at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:373)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:291)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:944)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1009)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:880)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:870)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 1:102 mismatched input '|' expecting StringLiteral near 'by' in table row format's field separator

The workaround was to run a MapReduce job to preprocess the data and replace
the delimiter with a single unused character (my client uses a
three-character delimiter to ensure that the sequence won't appear
elsewhere in the CSV).
However, it would be nice to be able to use such a delimiter directly in an
external table.
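
For illustration, the per-line transformation done by that preprocessing job
can be sketched in Python as below. This is only a minimal sketch, not the
actual job: the function names (convert_line, convert_stream) are
hypothetical, and the choice of Ctrl-A (\x01) as the replacement character is
an assumption that only holds if that character never occurs in the data.

```python
MULTI_CHAR_DELIMITER = "|;|"     # delimiter used in dataset.txt
SINGLE_CHAR_DELIMITER = "\x01"   # assumption: Ctrl-A never occurs in the data


def convert_line(line, old=MULTI_CHAR_DELIMITER, new=SINGLE_CHAR_DELIMITER):
    """Replace every occurrence of the string delimiter with one character."""
    if new in line:
        # Refuse to produce ambiguous output if the "unused" character
        # turns out to be present in the input after all.
        raise ValueError("replacement delimiter already present in input line")
    return line.replace(old, new)


def convert_stream(lines):
    """Lazily convert an iterable of lines (for example, an open file)."""
    for line in lines:
        yield convert_line(line)
```

Feeding the lines of dataset.txt through convert_stream and writing the
result to the table location would give files that a table declared with
fields terminated by '\001' can read. Note that a line with a malformed
delimiter is not repaired by this sketch; only exact occurrences of |;| are
replaced.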




