[jira] [Commented] (HIVE-1898) The ESCAPED BY clause does not seem to pick up newlines in colums and the line terminator cannot be changed
[ https://issues.apache.org/jira/browse/HIVE-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449978#comment-13449978 ]

Brian Bloniarz commented on HIVE-1898:
--------------------------------------

I think Luke is right -- maybe the bug title should be changed to simply say "data with newlines won't work in Text/LazySimpleSerDe tables"?

I haven't tested it, but would STORED AS SEQUENCEFILE tables be immune to this problem?

> The ESCAPED BY clause does not seem to pick up newlines in colums and the line terminator cannot be changed
> ------------------------------------------------------------
>
> Key: HIVE-1898
> URL: https://issues.apache.org/jira/browse/HIVE-1898
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Affects Versions: 0.5.0
> Reporter: Josh Patterson
> Priority: Minor
>
> If I want to preserve data in columns that contain a newline (web crawling,
> for instance), I cannot use the ESCAPED BY clause to escape the newlines
> (other characters, such as commas, escape fine). This may be because the line
> terminators, which are locked to newlines, are picked up first, and only
> then are the fields processed.
> This seems to be related to:
> "SerDe should escape some special characters"
> https://issues.apache.org/jira/browse/HIVE-136
> and
> "Implement "LINES TERMINATED BY""
> https://issues.apache.org/jira/browse/HIVE-302
> where at comment:
> https://issues.apache.org/jira/browse/HIVE-302?focusedCommentId=12793435&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12793435
> "This is not fixable currently because the line terminator is determined by
> LineRecordReader.LineReader which is in the Hadoop land."

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
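The ordering problem described in the quoted report can be illustrated with a small standalone sketch. This is not Hive or Hadoop code: the class name, method names and sample data are hypothetical, and the escape-aware splitter only shows what the reporter would like to happen, not something either project is known to implement. It assumes a backslash escape character and a comma field separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not taken from Hive or Hadoop.
class EscapedNewlineSketch {

  // What a line-oriented reader does: cut a record at every '\n',
  // with no knowledge of the escape character.
  static List<String> splitLinesFirst(String data) {
    return Arrays.asList(data.split("\n"));
  }

  // Escape-aware splitting: a newline preceded by the escape character
  // stays inside the current record.
  static List<String> splitRespectingEscape(String data, char escape) {
    List<String> records = new ArrayList<String>();
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < data.length(); i++) {
      char c = data.charAt(i);
      if (c == escape && i + 1 < data.length()) {
        // Keep the escape character and the character it protects together.
        current.append(c).append(data.charAt(++i));
      } else if (c == '\n') {
        records.add(current.toString());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    if (current.length() > 0) {
      records.add(current.toString());
    }
    return records;
  }

  public static void main(String[] args) {
    // Two logical rows; the first has an escaped newline in its second column.
    String data = "page1,line one\\\nline two\npage2,plain\n";
    System.out.println(splitLinesFirst(data).size());
    System.out.println(splitRespectingEscape(data, '\\').size());
  }
}
```

With line splitting done first, the escaped newline has already broken the first row in two by the time any field-level unescaping could run, which matches the behaviour reported above.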
[jira] [Updated] (HIVE-3198) StorageHandler properties not passed to InputFormat (?)
[ https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Bloniarz updated HIVE-3198:
---------------------------------

Attachment: TestStorageHandler.java

Here's a StorageHandler implementation which should help reproduce the bug. When I run it like this:
{code}
$ mkdir /tmp/test; touch /tmp/test/part-0
hive> add jar test.jar;
hive> create external table test (a string) STORED BY 'TestStorageHandler' location '/tmp/test';
hive> select * from test;
{code}
I see "TESTPROP: hello world", which means that the properties are being set up correctly. But if I run:
{code}
hive> select a from test;
{code}
I see "TESTPROP: null", meaning that properties from configureInputJobProperties() don't get passed to the getRecordReader() call.

> StorageHandler properties not passed to InputFormat (?)
> -------------------------------------------------------
>
> Key: HIVE-3198
> URL: https://issues.apache.org/jira/browse/HIVE-3198
> Project: Hive
> Issue Type: Bug
> Environment: trunk r1352973
> Reporter: Brian Bloniarz
> Attachments: TestStorageHandler.java, inputformat.patch
>
>
> I'm working on a custom StorageHandler implementation. I use
> configureTableJobProperties to pass properties on to a serde & InputFormat,
> but it looks to me like the properties aren't present inside the InputFormat.
> I found the following code which looks like it's supposed to propagate
> JobProperties:
> {code}
> public class HiveInputFormat
> ...
>   public RecordReader getRecordReader(InputSplit split, JobConf job,
>       Reporter reporter) throws IOException {
>     HiveInputSplit hsplit = (HiveInputSplit) split;
>     ...
>     boolean nonNative = false;
>     PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
>     if ((part != null) && (part.getTableDesc() != null)) {
>       Utilities.copyTableJobPropertiesToConf(part.getTableDesc(), cloneJobConf);
>       nonNative = part.getTableDesc().isNonNative();
>     }
> {code}
> In the debugger, I see that part==null so copyTableJobPropertiesToConf
> doesn't get called. I see that for this table:
> {code}
> create external table test3 () STORED BY 'foo' location '/data/bar';
> {code}
> The InputSplit path is the *file* (i.e. "/data/bar/part-0") but
> pathToPartitionInfo has an entry for the *dir* (i.e. "/data/bar").
> I attached a patch which fixes the problem for me; it makes things explicit
> by passing along the directory name inside the HiveInputSplit; this means we
> don't have to figure out which files are a part of which partition.
[jira] [Commented] (HIVE-3198) StorageHandler properties not passed to InputFormat (?)
[ https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413302#comment-13413302 ]

Brian Bloniarz commented on HIVE-3198:
--------------------------------------

Hi Navis, sorry it took me so long to get back to you.

Your suggested fix also works & makes the problem go away. Thanks for helping, let me know if there's anything else w.r.t. getting this fixed.

> StorageHandler properties not passed to InputFormat (?)
> -------------------------------------------------------
>
> Key: HIVE-3198
> URL: https://issues.apache.org/jira/browse/HIVE-3198
> Project: Hive
> Issue Type: Bug
> Environment: trunk r1352973
> Reporter: Brian Bloniarz
> Attachments: inputformat.patch
[jira] [Updated] (HIVE-3198) StorageHandler properties not passed to InputFormat (?)
[ https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Bloniarz updated HIVE-3198:
---------------------------------

Attachment: inputformat.patch

> StorageHandler properties not passed to InputFormat (?)
> -------------------------------------------------------
>
> Key: HIVE-3198
> URL: https://issues.apache.org/jira/browse/HIVE-3198
> Project: Hive
> Issue Type: Bug
> Environment: trunk r1352973
> Reporter: Brian Bloniarz
> Attachments: inputformat.patch
[jira] [Updated] (HIVE-3198) StorageHandler properties not passed to InputFormat (?)
[ https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Bloniarz updated HIVE-3198:
---------------------------------

Status: Patch Available (was: Open)

> StorageHandler properties not passed to InputFormat (?)
> -------------------------------------------------------
>
> Key: HIVE-3198
> URL: https://issues.apache.org/jira/browse/HIVE-3198
> Project: Hive
> Issue Type: Bug
> Environment: trunk r1352973
> Reporter: Brian Bloniarz
[jira] [Created] (HIVE-3198) StorageHandler properties not passed to InputFormat (?)
Brian Bloniarz created HIVE-3198:
---------------------------------

Summary: StorageHandler properties not passed to InputFormat (?)
Key: HIVE-3198
URL: https://issues.apache.org/jira/browse/HIVE-3198
Project: Hive
Issue Type: Bug
Environment: trunk r1352973
Reporter: Brian Bloniarz


I'm working on a custom StorageHandler implementation. I use configureTableJobProperties to pass properties on to a serde & InputFormat, but it looks to me like the properties aren't present inside the InputFormat.

I found the following code which looks like it's supposed to propagate JobProperties:
{code}
public class HiveInputFormat
...
  public RecordReader getRecordReader(InputSplit split, JobConf job,
      Reporter reporter) throws IOException {
    HiveInputSplit hsplit = (HiveInputSplit) split;
    ...
    boolean nonNative = false;
    PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
    if ((part != null) && (part.getTableDesc() != null)) {
      Utilities.copyTableJobPropertiesToConf(part.getTableDesc(), cloneJobConf);
      nonNative = part.getTableDesc().isNonNative();
    }
{code}
In the debugger, I see that part==null so copyTableJobPropertiesToConf doesn't get called. I see that for this table:
{code}
create external table test3 () STORED BY 'foo' location '/data/bar';
{code}
The InputSplit path is the *file* (i.e. "/data/bar/part-0") but pathToPartitionInfo has an entry for the *dir* (i.e. "/data/bar").

I attached a patch which fixes the problem for me; it makes things explicit by passing along the directory name inside the HiveInputSplit; this means we don't have to figure out which files are a part of which partition.
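The file-versus-directory mismatch described above can be reproduced in isolation with a plain map. The sketch below is a hypothetical illustration, not Hive code: it uses String keys and values in place of PartitionDesc, and the parent-walking lookup is one possible way to resolve a file to its partition directory. It is neither the attached patch (which passes the directory name inside the HiveInputSplit) nor necessarily what Hive does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; names and types are simplified stand-ins.
class PartitionLookupSketch {

  // Exact-match lookup, as in the excerpt above: it misses when the key
  // is a file underneath the registered partition directory.
  static String exactLookup(Map<String, String> pathToPartitionInfo, String splitPath) {
    return pathToPartitionInfo.get(splitPath);
  }

  // Alternative: walk up the parent directories until an entry is found.
  static String lookupWithParents(Map<String, String> pathToPartitionInfo, String splitPath) {
    String path = splitPath;
    while (path != null && !path.isEmpty()) {
      String hit = pathToPartitionInfo.get(path);
      if (hit != null) {
        return hit;
      }
      int slash = path.lastIndexOf('/');
      path = slash > 0 ? path.substring(0, slash) : null;
    }
    return null;
  }

  public static void main(String[] args) {
    Map<String, String> pathToPartitionInfo = new HashMap<String, String>();
    pathToPartitionInfo.put("/data/bar", "test3");

    // The split refers to the file, the map is keyed by the directory.
    System.out.println(exactLookup(pathToPartitionInfo, "/data/bar/part-0"));
    System.out.println(lookupWithParents(pathToPartitionInfo, "/data/bar/part-0"));
  }
}
```

Carrying the directory inside the split, as the patch does, avoids this search altogether, which is the point made in the description about not having to work out which files belong to which partition.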
[jira] [Created] (HIVE-3197) Hive compile errors under Java 7 (JDBC 4.1)
Brian Bloniarz created HIVE-3197:
---------------------------------

Summary: Hive compile errors under Java 7 (JDBC 4.1)
Key: HIVE-3197
URL: https://issues.apache.org/jira/browse/HIVE-3197
Project: Hive
Issue Type: Bug
Environment: Ubuntu 12.04
Reporter: Brian Bloniarz


Hi, I've been trying to compile Hive trunk from source and am getting failures:
{code}
[javac] hive-svn/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveCallableStatement.java:48: error: HiveCallableStatement is not abstract and does not override abstract method <T>getObject(String,Class<T>) in CallableStatement
[javac] public class HiveCallableStatement implements java.sql.CallableStatement {
[javac]        ^
[javac]   where T is a type-variable:
[javac]     T extends Object declared in method <T>getObject(String,Class<T>)
{code}
I think this is because JDBC 4.1 is part of Java 7 and is not source-compatible with older JDBC versions: classes that implement the java.sql interfaces must also implement the methods JDBC 4.1 added.

Any chance you guys could add JDBC 4.1 support?
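As a rough sketch of what JDBC 4.1 support involves for a class like the one in the error above, the overrides below cover the methods that Java 7 added to java.sql.CallableStatement and its parent java.sql.Statement. The class name is hypothetical, and it is kept abstract so that the example compiles without the many other interface methods a real driver class has to provide. Throwing SQLFeatureNotSupportedException is an assumption about how a driver might stub these methods, not a description of Hive's eventual fix.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.sql.CallableStatement;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;

// Hypothetical stand-in for a driver class such as HiveCallableStatement.
abstract class Jdbc41CallableStatementSketch implements CallableStatement {

  // Added to CallableStatement in JDBC 4.1 (Java 7).
  @Override
  public <T> T getObject(int parameterIndex, Class<T> type) throws SQLException {
    throw new SQLFeatureNotSupportedException("Method not supported");
  }

  // Added to CallableStatement in JDBC 4.1 (Java 7); the method named in the error.
  @Override
  public <T> T getObject(String parameterName, Class<T> type) throws SQLException {
    throw new SQLFeatureNotSupportedException("Method not supported");
  }

  // Added to java.sql.Statement in JDBC 4.1 and inherited by CallableStatement.
  @Override
  public void closeOnCompletion() throws SQLException {
    throw new SQLFeatureNotSupportedException("Method not supported");
  }

  @Override
  public boolean isCloseOnCompletion() throws SQLException {
    throw new SQLFeatureNotSupportedException("Method not supported");
  }

  // Helper for the demo: reports whether this class declares a concrete
  // method with the given name and parameter types.
  static boolean overrides(String name, Class<?>... parameterTypes) {
    try {
      Method m = Jdbc41CallableStatementSketch.class.getDeclaredMethod(name, parameterTypes);
      return !Modifier.isAbstract(m.getModifiers());
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(overrides("getObject", String.class, Class.class));
    System.out.println(overrides("getObject", int.class, Class.class));
  }
}
```

Other java.sql interfaces (for example ResultSet, Connection and Driver) also gained methods in JDBC 4.1, so the corresponding implementing classes would likely need similar additions.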