[jira] [Commented] (HIVE-1898) The ESCAPED BY clause does not seem to pick up newlines in colums and the line terminator cannot be changed

2012-09-06 Thread Brian Bloniarz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449978#comment-13449978
 ] 

Brian Bloniarz commented on HIVE-1898:
--

I think Luke is right -- maybe the bug title should be changed to simply say 
"data with newlines won't work in Text/LazySimpleSerDe tables"?

I haven't tested it, but would STORED AS SEQUENCEFILE tables be immune to this 
problem?

> The ESCAPED BY clause does not seem to pick up newlines in colums and the 
> line terminator cannot be changed
> ---
>
> Key: HIVE-1898
> URL: https://issues.apache.org/jira/browse/HIVE-1898
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.5.0
>Reporter: Josh Patterson
>Priority: Minor
>
> If I want to preserve data in columns which contain a newline (webcrawling 
> for instance) I cannot set the ESCAPED BY clause to escape these out (other 
> characters such as commas escape fine, however). This may be because the line 
> terminators, which are locked to newlines, are picked up first, and only then 
> are fields processed. 
> This seems to be related to:
> "SerDe should escape some special characters"
> https://issues.apache.org/jira/browse/HIVE-136
> and
> "Implement "LINES TERMINATED BY""
> https://issues.apache.org/jira/browse/HIVE-302
> where at comment: 
> https://issues.apache.org/jira/browse/HIVE-302?focusedCommentId=12793435&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12793435
> "This is not fixable currently because the line terminator is determined by 
> LineRecordReader.LineReader which is in the Hadoop land."
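
To make the ordering problem described above concrete, here is a minimal sketch in plain Java. It is a simplified model, not Hadoop's LineRecordReader or Hive's LazySimpleSerDe code: the class and method names are made up for illustration, and it assumes the escape character is a backslash written directly in front of the raw newline byte. The first method splits on every raw newline, as a line-oriented reader would; the second is a hypothetical escape-aware splitter, shown only for contrast.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: names and behavior are assumptions, not Hive or Hadoop APIs.
final class NewlineEscapeSketch {

    // Models a line-oriented reader: every raw '\n' ends a record,
    // whether or not an escape character precedes it.
    static List<String> splitOnRawNewlines(String data) {
        List<String> records = new ArrayList<>();
        for (String line : data.split("\n")) {
            records.add(line);
        }
        return records;
    }

    // Hypothetical alternative: a '\n' directly after the escape character
    // is kept as part of the current record.
    static List<String> splitHonoringEscapes(String data, char escape) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (escaped) {
                current.append(c);      // escaped character, never a terminator
                escaped = false;
            } else if (c == escape) {
                current.append(c);
                escaped = true;
            } else if (c == '\n') {
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        // Two logical rows; the first has an escaped newline inside its first column.
        String data = "first\\\nsecond,42\nplain,7\n";
        System.out.println(splitOnRawNewlines(data).size());         // prints 3
        System.out.println(splitHonoringEscapes(data, '\\').size()); // prints 2
    }
}
```

With the raw split, the row is already cut in two before any field-level escape handling could see it, which matches the behavior reported in this issue.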

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3198) StorageHandler properties not passed to InputFormat (?)

2012-07-16 Thread Brian Bloniarz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Bloniarz updated HIVE-3198:
-

Attachment: TestStorageHandler.java

Here's a StorageHandler implementation which should help reproduce the bug. 
When I run it like this:
{code}
$ mkdir /tmp/test; touch /tmp/test/part-0
hive> add jar test.jar;
hive> create external table test (a string) STORED BY 'TestStorageHandler' location '/tmp/test';
hive> select * from test;
{code}
I see "TESTPROP: hello world", which means that the properties are being set up 
correctly. But if you do:
{code}
hive> select a from test;
{code}
I see "TESTPROP: null", meaning that properties from 
configureInputJobProperties() don't get passed to the getRecordReader() call.

> StorageHandler properties not passed to InputFormat (?)
> ---
>
> Key: HIVE-3198
> URL: https://issues.apache.org/jira/browse/HIVE-3198
> Project: Hive
>  Issue Type: Bug
> Environment: trunk r1352973
>Reporter: Brian Bloniarz
> Attachments: TestStorageHandler.java, inputformat.patch
>
>
> I'm working on a custom StorageHandler implementation. I use 
> configureTableJobProperties to pass properties onto a serde & InputFormat, 
> but it looks to me like the properties aren't present inside the InputFormat.
> I found the following code which looks like it's supposed to propagate 
> JobProperties:
> {code}
> public class HiveInputFormat
> ...
>   public RecordReader getRecordReader(InputSplit split, JobConf job,
>       Reporter reporter) throws IOException {
>     HiveInputSplit hsplit = (HiveInputSplit) split;
>     ...
>     boolean nonNative = false;
>     PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
>     if ((part != null) && (part.getTableDesc() != null)) {
>       Utilities.copyTableJobPropertiesToConf(part.getTableDesc(), cloneJobConf);
>       nonNative = part.getTableDesc().isNonNative();
>     }
> {code}
> In the debugger, I see that part==null so copyTableJobPropertiesToConf 
> doesn't get called. I see that for this table:
> {code}
> create external table test3 () STORED BY 'foo' location '/data/bar';
> {code}
> The InputSplit path is the *file* (i.e. "/data/bar/part-0") but 
> pathToPartitionInfo has an entry for the *dir* (i.e. "/data/bar").
> I attached a patch which fixes the problem for me; it makes things explicit 
> by passing along the directory name inside the HiveInputSplit; this means we 
> don't have to figure out which files are a part of which partition.





[jira] [Commented] (HIVE-3198) StorageHandler properties not passed to InputFormat (?)

2012-07-12 Thread Brian Bloniarz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413302#comment-13413302
 ] 

Brian Bloniarz commented on HIVE-3198:
--

Hi Navis, sorry it took me so long to get back to you.

Your suggested fix also works & makes the problem go away. Thanks for helping, 
let me know if there's anything else I can do w.r.t. getting this fixed.






[jira] [Updated] (HIVE-3198) StorageHandler properties not passed to InputFormat (?)

2012-06-25 Thread Brian Bloniarz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Bloniarz updated HIVE-3198:
-

Attachment: inputformat.patch






[jira] [Updated] (HIVE-3198) StorageHandler properties not passed to InputFormat (?)

2012-06-25 Thread Brian Bloniarz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Bloniarz updated HIVE-3198:
-

Status: Patch Available  (was: Open)






[jira] [Created] (HIVE-3198) StorageHandler properties not passed to InputFormat (?)

2012-06-25 Thread Brian Bloniarz (JIRA)
Brian Bloniarz created HIVE-3198:


 Summary: StorageHandler properties not passed to InputFormat (?)
 Key: HIVE-3198
 URL: https://issues.apache.org/jira/browse/HIVE-3198
 Project: Hive
  Issue Type: Bug
 Environment: trunk r1352973
Reporter: Brian Bloniarz


I'm working on a custom StorageHandler implementation. I use 
configureTableJobProperties to pass properties onto a serde & InputFormat, but 
it looks to me like the properties aren't present inside the InputFormat.

I found the following code which looks like it's supposed to propagate 
JobProperties:
{code}
public class HiveInputFormat
...
  public RecordReader getRecordReader(InputSplit split, JobConf job,
      Reporter reporter) throws IOException {

    HiveInputSplit hsplit = (HiveInputSplit) split;
    ...
    boolean nonNative = false;
    PartitionDesc part = pathToPartitionInfo.get(hsplit.getPath().toString());
    if ((part != null) && (part.getTableDesc() != null)) {
      Utilities.copyTableJobPropertiesToConf(part.getTableDesc(), cloneJobConf);
      nonNative = part.getTableDesc().isNonNative();
    }
{code}

In the debugger, I see that part==null so copyTableJobPropertiesToConf doesn't 
get called. I see that for this table:
{code}
create external table test3 () STORED BY 'foo' location '/data/bar';
{code}
The InputSplit path is the *file* (i.e. "/data/bar/part-0") but 
pathToPartitionInfo has an entry for the *dir* (i.e. "/data/bar"), so the 
lookup by file path returns null.

I attached a patch which fixes the problem for me; it makes things explicit by 
passing along the directory name inside the HiveInputSplit; this means we don't 
have to figure out which files are a part of which partition.
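
For anyone who wants to see the mismatch in isolation, here is a small self-contained sketch. It is not the attached patch and it does not use Hive classes: the map values are plain strings standing in for PartitionDesc, and SplitSketch is a hypothetical stand-in for a split that also carries its directory, which is roughly the idea described above.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: none of these names exist in Hive.
final class PartitionLookupSketch {

    // Hypothetical split that records both the file and the directory it came from.
    static final class SplitSketch {
        final String filePath;
        final String partitionDir;

        SplitSketch(String filePath, String partitionDir) {
            this.filePath = filePath;
            this.partitionDir = partitionDir;
        }
    }

    // Mirrors the lookup in the snippet above: keyed by the split's file path.
    static String lookupByFile(Map<String, String> pathToPartitionInfo, SplitSketch split) {
        return pathToPartitionInfo.get(split.filePath);
    }

    // Looks up by the directory carried in the split, so no path guessing is needed.
    static String lookupByDir(Map<String, String> pathToPartitionInfo, SplitSketch split) {
        return pathToPartitionInfo.get(split.partitionDir);
    }

    public static void main(String[] args) {
        Map<String, String> pathToPartitionInfo = new HashMap<>();
        pathToPartitionInfo.put("/data/bar", "table properties for /data/bar");

        SplitSketch split = new SplitSketch("/data/bar/part-0", "/data/bar");
        System.out.println(lookupByFile(pathToPartitionInfo, split)); // prints null
        System.out.println(lookupByDir(pathToPartitionInfo, split));  // prints table properties for /data/bar
    }
}
```

The first lookup misses because the map is keyed by directory, which is the part==null case seen in the debugger.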





[jira] [Created] (HIVE-3197) Hive compile errors under Java 7 (JDBC 4.1)

2012-06-25 Thread Brian Bloniarz (JIRA)
Brian Bloniarz created HIVE-3197:


 Summary: Hive compile errors under Java 7 (JDBC 4.1)
 Key: HIVE-3197
 URL: https://issues.apache.org/jira/browse/HIVE-3197
 Project: Hive
  Issue Type: Bug
 Environment: Ubuntu 12.04
Reporter: Brian Bloniarz


Hi, I've been trying to compile Hive trunk from source and I'm getting failures:

{code}
[javac] hive-svn/jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveCallableStatement.java:48: error: HiveCallableStatement is not abstract and does not override abstract method <T>getObject(String,Class<T>) in CallableStatement
[javac] public class HiveCallableStatement implements java.sql.CallableStatement {
[javac]        ^
[javac]   where T is a type-variable:
[javac]     T extends Object declared in method <T>getObject(String,Class<T>)
{code}

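I think this is because JDBC 4.1 is part of Java 7, and is not 
source-compatible with older JDBC versions: it adds new abstract methods, such 
as the getObject method named in the error, to existing interfaces like 
CallableStatement. Any chance you guys could add JDBC 4.1 support?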
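
One possible shape for the missing methods, as a minimal sketch only. The two getObject signatures are the ones JDBC 4.1 adds to java.sql.CallableStatement; the class name below is made up so the example compiles on its own, and throwing SQLFeatureNotSupportedException is an assumption about what an unimplemented method should do, not a statement about how Hive handles it.

```java
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;

// Stand-alone illustration; in Hive these methods would go into HiveCallableStatement.
final class Jdbc41StubSketch {

    // Same signature as CallableStatement.getObject(String, Class<T>) in JDBC 4.1.
    public <T> T getObject(String parameterName, Class<T> type) throws SQLException {
        throw new SQLFeatureNotSupportedException("Method not supported");
    }

    // Same signature as CallableStatement.getObject(int, Class<T>) in JDBC 4.1.
    public <T> T getObject(int parameterIndex, Class<T> type) throws SQLException {
        throw new SQLFeatureNotSupportedException("Method not supported");
    }

    public static void main(String[] args) {
        try {
            new Jdbc41StubSketch().getObject("a", String.class);
        } catch (SQLException e) {
            System.out.println(e.getMessage()); // prints Method not supported
        }
    }
}
```

If the same source still has to build on Java 6, the added methods would need to go in without @Override, since they do not exist in the JDBC 4.0 interfaces.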
