[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Li updated HIVE-5871:

Description: By default, Hive allows only a single character as the field delimiter. Although RegexSerDe can specify a multiple-character delimiter, it can be daunting to use, especially for amateurs. The patch adds a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables, in a way very similar to typical table creation. For example:

was: By default, Hive allows only a single character as the field delimiter. Although RegexSerDe can specify a multiple-character delimiter, it can be daunting to use, especially for amateurs. In the patch, I add a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables, in a way very similar to typical table creation.

Use multiple-characters as field delimiter
Key: HIVE-5871
URL: https://issues.apache.org/jira/browse/HIVE-5871
Project: Hive
Issue Type: Improvement
Components: Contrib
Affects Versions: 0.12.0
Reporter: Rui Li
Assignee: Rui Li
Labels: TODOC14
Fix For: 0.14.0
Attachments: HIVE-5871.2.patch, HIVE-5871.3.patch, HIVE-5871.4.patch, HIVE-5871.5.patch, HIVE-5871.6.patch, HIVE-5871.patch

By default, Hive allows only a single character as the field delimiter. Although RegexSerDe can specify a multiple-character delimiter, it can be daunting to use, especially for amateurs. The patch adds a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables, in a way very similar to typical table creation. For example:

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Li updated HIVE-5871:

Description: By default, Hive allows only a single character as the field delimiter. Although RegexSerDe can specify a multiple-character delimiter, it can be daunting to use, especially for amateurs. The patch adds a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables, in a way very similar to typical table creation. For example:
{code}
create table test (id string, hivearray array<binary>, hivemap map<string,int>)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim"="[,]","collection.delim"=":","mapkey.delim"="@");
{code}
where {{field.delim}} is the field delimiter, and {{collection.delim}} and {{mapkey.delim}} are the delimiters for collection items and key-value pairs, respectively. Among these delimiters, {{field.delim}} is mandatory and can consist of multiple characters, while {{collection.delim}} and {{mapkey.delim}} are optional and support only a single character.
To use MultiDelimitSerDe, you have to add the hive-contrib jar to the class path, e.g. with the {{add jar}} command.

was: By default, Hive allows only a single character as the field delimiter. Although RegexSerDe can specify a multiple-character delimiter, it can be daunting to use, especially for amateurs. The patch adds a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables, in a way very similar to typical table creation. For example:
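As a rough illustration of how the three delimiters in the description above divide a row of text, here is a minimal Python sketch. It is not Hive's implementation: the function name {{parse_row}}, the sample row, and the choice to return array items as plain strings are assumptions made only for this example, and it handles only well-formed rows for the three-column {{test}} table.

```python
def parse_row(line, field_delim="[,]", collection_delim=":", mapkey_delim="@"):
    """Split one text row into (id, hivearray, hivemap) for the example table.

    Hypothetical helper for illustration; Hive's SerDe works differently inside.
    """
    # Per the description, only field.delim may be longer than one character.
    if len(collection_delim) != 1 or len(mapkey_delim) != 1:
        raise ValueError("collection.delim and mapkey.delim must be single characters")

    # str.split treats the delimiter as a literal string, so "[,]" is not a regex.
    id_field, array_field, map_field = line.split(field_delim)

    # Array items are separated by collection.delim (kept as strings here).
    hivearray = array_field.split(collection_delim)

    # Map entries are separated by collection.delim, keys from values by mapkey.delim.
    hivemap = {}
    for entry in map_field.split(collection_delim):
        key, value = entry.split(mapkey_delim)
        hivemap[key] = int(value)

    return id_field, hivearray, hivemap


# Sample row made up for this sketch.
print(parse_row("1[,]a:b[,]k1@100:k2@200"))
# prints ('1', ['a', 'b'], {'k1': 100, 'k2': 200})
```

The sketch splits on the field delimiter first and only then on the single-character delimiters, which mirrors the nesting described above (fields contain collections, collections contain key-value pairs).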
[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brock Noland updated HIVE-5871:
Resolution: Fixed
Fix Version/s: 0.14.0
Status: Resolved (was: Patch Available)
Thank you [~lirui]!! I have committed this patch to trunk!
[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lefty Leverenz updated HIVE-5871:
Labels: TODOC14 (was: )
[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Li updated HIVE-5871:
Attachment: HIVE-5871.5.patch
[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Li updated HIVE-5871:
Attachment: HIVE-5871.3.patch
Update the patch for latest code base
[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Li updated HIVE-5871:
Attachment: HIVE-5871.4.patch
[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Li updated HIVE-5871:
Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Li updated HIVE-5871:
Attachment: HIVE-5871.2.patch
[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Li updated HIVE-5871:
Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brock Noland updated HIVE-5871:
Assignee: Rui Li
[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Li updated HIVE-5871:
Description: By default, Hive allows only a single character as the field delimiter. Although RegexSerDe can specify a multiple-character delimiter, it can be daunting to use, especially for amateurs. In the patch, I add a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables, in a way very similar to typical table creation.
was: Add a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users can specify a multiple-character field delimiter when creating tables.
[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Li updated HIVE-5871:
Attachment: HIVE-5871-v2.patch
Fix previous implementation.
[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Li updated HIVE-5871:
Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter
[ https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Li updated HIVE-5871:
Attachment: HIVE-5871.patch

This implementation mainly relies on LazySimpleSerDe for serialization and deserialization. I added some methods to LazyStruct to parse a row delimited by a multiple-character string.
Another difference from LazySimpleSerDe is that MultiDelimitSerDe doesn't use Base64 to encode binary fields during serialization, because the encoded string may interfere with the delimiter. I also modified LazyBinary so that when it deserializes a binary field and is unable to Base64-decode it, it just keeps the data unchanged.
A simple use case is as follows:

create table test (id string, hivearray array<binary>, hivemap map<string,int>)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delimited"="[,]","collection.delimited"=":","mapkey.delimited"="@");

where field.delimited is the multiple-character field delimiter, collection.delimited is the delimiter for collection items, and mapkey.delimited is the delimiter between keys and values in maps. We currently don't support multiple characters for the latter two delimiters.
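The fallback described for binary fields (try to Base64-decode, otherwise keep the bytes as they are) can be sketched in Python as follows. This is only an illustration of the idea, not the LazyBinary code from the patch; the function name {{decode_binary_field}} and the sample inputs are assumptions.

```python
import base64
import binascii


def decode_binary_field(raw: bytes) -> bytes:
    """Return the Base64-decoded bytes, or the input unchanged if it is not valid Base64.

    Hypothetical helper for illustration only.
    """
    try:
        # validate=True rejects characters outside the Base64 alphabet
        # instead of silently discarding them.
        return base64.b64decode(raw, validate=True)
    except binascii.Error:
        # Not decodable as Base64: keep the data unchanged.
        return raw


print(decode_binary_field(b"aGVsbG8="))     # valid Base64, gets decoded
print(decode_binary_field(b"\x00\x01[,]"))  # not Base64, returned unchanged
```

One caveat of this approach: raw bytes that happen to form valid Base64 would still be decoded, so the fallback cannot tell the two cases apart with certainty.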