[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter

2014-09-09 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-5871:
-
Description: 
By default, Hive only allows users to use a single character as the field delimiter. 
Although RegexSerDe can be used to specify a multiple-character delimiter, it can be 
daunting to use, especially for amateurs.
The patch adds a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, 
users can specify a multiple-character field delimiter when creating tables, in 
a way that closely mirrors typical table creation. For example:



  was:
By default, Hive only allows users to use a single character as the field delimiter. 
Although RegexSerDe can be used to specify a multiple-character delimiter, it can be 
daunting to use, especially for amateurs.
In the patch, I add a new SerDe named MultiDelimitSerDe. With 
MultiDelimitSerDe, users can specify a multiple-character field delimiter when 
creating tables, in a way that closely mirrors typical table creation.
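One reason regex-based approaches are easy to get wrong: a delimiter such as [,] (the one used in the later example on this issue) is itself a regex character class. The minimal Java sketch below contrasts a naive String.split with one that quotes the delimiter via Pattern.quote, so every character is matched literally. The class LiteralDelimiterDemo and method splitLiteral are hypothetical names for illustration, not part of the patch.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class LiteralDelimiterDemo {
    // Splits a row on a multi-character delimiter, treating every
    // character of the delimiter literally (no regex semantics).
    static String[] splitLiteral(String row, String delim) {
        // A limit of -1 keeps trailing empty fields, so "a[,]" yields ["a", ""].
        return row.split(Pattern.quote(delim), -1);
    }

    public static void main(String[] args) {
        String row = "1[,]a:b[,]k1@1";
        // Naive regex split: "[,]" is a character class that matches a single ','.
        System.out.println(Arrays.toString(row.split("[,]", -1)));
        // Literal split: "[,]" is matched as the three characters '[', ',', ']'.
        System.out.println(Arrays.toString(splitLiteral(row, "[,]")));
    }
}
```

With the naive split, only the comma acts as the delimiter, leaving the brackets attached to the neighbouring fields (1[, ]a:b[ and ]k1@1); the quoted split yields 1, a:b and k1@1.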


 Use multiple-characters as field delimiter
 --

 Key: HIVE-5871
 URL: https://issues.apache.org/jira/browse/HIVE-5871
 Project: Hive
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.12.0
Reporter: Rui Li
Assignee: Rui Li
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-5871.2.patch, HIVE-5871.3.patch, HIVE-5871.4.patch, 
 HIVE-5871.5.patch, HIVE-5871.6.patch, HIVE-5871.patch


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter

2014-09-09 Thread Rui Li (JIRA)

Rui Li updated HIVE-5871:
-
Description: 
By default, Hive only allows users to use a single character as the field delimiter. 
Although RegexSerDe can be used to specify a multiple-character delimiter, it can be 
daunting to use, especially for amateurs.
The patch adds a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, 
users can specify a multiple-character field delimiter when creating tables, in 
a way that closely mirrors typical table creation. For example:
{code}
create table test (id string, hivearray array<binary>, hivemap map<string,int>) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH 
SERDEPROPERTIES ("field.delim"="[,]","collection.delim"=":","mapkey.delim"="@");
{code}
where {{field.delim}} is the field delimiter, and {{collection.delim}} and 
{{mapkey.delim}} are the delimiters for collection items and key-value pairs, 
respectively. Of these delimiters, {{field.delim}} is mandatory and can consist 
of multiple characters, while {{collection.delim}} and {{mapkey.delim}} are 
optional and support only a single character.
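To make the roles of the three delimiters concrete, here is a minimal Java sketch that parses one row of the {{test}} table by hand, assuming {{field.delim}} = [,], {{collection.delim}} = : and {{mapkey.delim}} = @. It is not MultiDelimitSerDe's implementation: the class RowParseSketch, its methods and the sample row are hypothetical, the field delimiter is matched literally with String.indexOf, and the array<binary> elements are kept as plain strings for simplicity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RowParseSketch {
    // Splits s on every literal occurrence of delim, keeping empty pieces.
    static List<String> splitLiteral(String s, String delim) {
        if (delim.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> parts = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = s.indexOf(delim, start)) >= 0) {
            parts.add(s.substring(start, idx));
            start = idx + delim.length();
        }
        parts.add(s.substring(start));
        return parts;
    }

    // Parses "key@value:key@value" into an ordered map: entries are separated
    // by the collection delimiter, keys from values by the map-key delimiter.
    static Map<String, Integer> parseMap(String s, char collectionDelim, char mapKeyDelim) {
        Map<String, Integer> map = new LinkedHashMap<>();
        if (s.isEmpty()) {
            return map;
        }
        for (String entry : splitLiteral(s, String.valueOf(collectionDelim))) {
            // Assumption for this sketch: every entry contains the key delimiter.
            int sep = entry.indexOf(mapKeyDelim);
            map.put(entry.substring(0, sep), Integer.parseInt(entry.substring(sep + 1)));
        }
        return map;
    }

    public static void main(String[] args) {
        String fieldDelim = "[,]";  // field.delim: may be multi-character
        char collectionDelim = ':'; // collection.delim: single character
        char mapKeyDelim = '@';     // mapkey.delim: single character

        String row = "row1[,]abc:def[,]k1@1:k2@2";
        List<String> fields = splitLiteral(row, fieldDelim);
        String id = fields.get(0);
        List<String> hivearray = splitLiteral(fields.get(1), String.valueOf(collectionDelim));
        Map<String, Integer> hivemap = parseMap(fields.get(2), collectionDelim, mapKeyDelim);

        // Prints: row1 [abc, def] {k1=1, k2=2}
        System.out.println(id + " " + hivearray + " " + hivemap);
    }
}
```

Only the top-level split needs to handle a multi-character delimiter here; the nested collection and map delimiters stay single characters, matching the restriction described above.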

To use MultiDelimitSerDe, you have to add the hive-contrib jar to the class 
path, e.g. with the {{add jar}} command.

  was:
By default, Hive only allows users to use a single character as the field delimiter. 
Although RegexSerDe can be used to specify a multiple-character delimiter, it can be 
daunting to use, especially for amateurs.
The patch adds a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, 
users can specify a multiple-character field delimiter when creating tables, in 
a way that closely mirrors typical table creation. For example:




 Use multiple-characters as field delimiter
 --

 Key: HIVE-5871
 URL: https://issues.apache.org/jira/browse/HIVE-5871
 Project: Hive
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.12.0
Reporter: Rui Li
Assignee: Rui Li
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-5871.2.patch, HIVE-5871.3.patch, HIVE-5871.4.patch, 
 HIVE-5871.5.patch, HIVE-5871.6.patch, HIVE-5871.patch




[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter

2014-09-05 Thread Brock Noland (JIRA)

Brock Noland updated HIVE-5871:
---
   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Thank you [~lirui]!! I have committed this patch to trunk!

 Use multiple-characters as field delimiter
 --

 Key: HIVE-5871
 URL: https://issues.apache.org/jira/browse/HIVE-5871
 Project: Hive
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.12.0
Reporter: Rui Li
Assignee: Rui Li
 Fix For: 0.14.0

 Attachments: HIVE-5871.2.patch, HIVE-5871.3.patch, HIVE-5871.4.patch, 
 HIVE-5871.5.patch, HIVE-5871.6.patch, HIVE-5871.patch




[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter

2014-09-05 Thread Lefty Leverenz (JIRA)

Lefty Leverenz updated HIVE-5871:
-
Labels: TODOC14  (was: )

 Use multiple-characters as field delimiter
 --

 Key: HIVE-5871
 URL: https://issues.apache.org/jira/browse/HIVE-5871
 Project: Hive
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.12.0
Reporter: Rui Li
Assignee: Rui Li
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-5871.2.patch, HIVE-5871.3.patch, HIVE-5871.4.patch, 
 HIVE-5871.5.patch, HIVE-5871.6.patch, HIVE-5871.patch




[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter

2014-08-12 Thread Rui Li (JIRA)

Rui Li updated HIVE-5871:
-

Attachment: HIVE-5871.5.patch

 Use multiple-characters as field delimiter
 --

 Key: HIVE-5871
 URL: https://issues.apache.org/jira/browse/HIVE-5871
 Project: Hive
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.12.0
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-5871.2.patch, HIVE-5871.3.patch, HIVE-5871.4.patch, 
 HIVE-5871.5.patch, HIVE-5871.patch




[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter

2014-08-06 Thread Rui Li (JIRA)

Rui Li updated HIVE-5871:
-

Attachment: HIVE-5871.3.patch

Update the patch for the latest code base.

 Use multiple-characters as field delimiter
 --

 Key: HIVE-5871
 URL: https://issues.apache.org/jira/browse/HIVE-5871
 Project: Hive
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.12.0
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-5871.2.patch, HIVE-5871.3.patch, HIVE-5871.patch




[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter

2014-08-06 Thread Rui Li (JIRA)

Rui Li updated HIVE-5871:
-

Attachment: HIVE-5871.4.patch

 Use multiple-characters as field delimiter
 --

 Key: HIVE-5871
 URL: https://issues.apache.org/jira/browse/HIVE-5871
 Project: Hive
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.12.0
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-5871.2.patch, HIVE-5871.3.patch, HIVE-5871.4.patch, 
 HIVE-5871.patch




[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter

2014-08-05 Thread Rui Li (JIRA)

Rui Li updated HIVE-5871:
-

Status: Open  (was: Patch Available)

 Use multiple-characters as field delimiter
 --

 Key: HIVE-5871
 URL: https://issues.apache.org/jira/browse/HIVE-5871
 Project: Hive
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.12.0
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-5871-v2.patch, HIVE-5871.patch




[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter

2014-08-05 Thread Rui Li (JIRA)

Rui Li updated HIVE-5871:
-

Attachment: HIVE-5871.2.patch

 Use multiple-characters as field delimiter
 --

 Key: HIVE-5871
 URL: https://issues.apache.org/jira/browse/HIVE-5871
 Project: Hive
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.12.0
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-5871.2.patch, HIVE-5871.patch




[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter

2014-08-05 Thread Rui Li (JIRA)

Rui Li updated HIVE-5871:
-

Status: Patch Available  (was: Open)

 Use multiple-characters as field delimiter
 --

 Key: HIVE-5871
 URL: https://issues.apache.org/jira/browse/HIVE-5871
 Project: Hive
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.12.0
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-5871.2.patch, HIVE-5871.patch




[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter

2014-08-01 Thread Brock Noland (JIRA)

Brock Noland updated HIVE-5871:
---

Assignee: Rui Li

 Use multiple-characters as field delimiter
 --

 Key: HIVE-5871
 URL: https://issues.apache.org/jira/browse/HIVE-5871
 Project: Hive
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.12.0
Reporter: Rui Li
Assignee: Rui Li
 Attachments: HIVE-5871-v2.patch, HIVE-5871.patch




[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter

2014-05-19 Thread Rui Li (JIRA)

Rui Li updated HIVE-5871:
-

Description: 
By default, Hive only allows users to use a single character as the field delimiter. 
Although RegexSerDe can be used to specify a multiple-character delimiter, it can be 
daunting to use, especially for amateurs.
In the patch, I add a new SerDe named MultiDelimitSerDe. With 
MultiDelimitSerDe, users can specify a multiple-character field delimiter when 
creating tables, in a way that closely mirrors typical table creation.

  was:Add a new SerDe named MultiDelimitSerDe. With MultiDelimitSerDe, users 
can specify a multiple-character field delimiter when creating tables.


 Use multiple-characters as field delimiter
 --

 Key: HIVE-5871
 URL: https://issues.apache.org/jira/browse/HIVE-5871
 Project: Hive
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.12.0
Reporter: Rui Li
 Attachments: HIVE-5871-v2.patch, HIVE-5871.patch




[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter

2014-05-11 Thread Rui Li (JIRA)

Rui Li updated HIVE-5871:
-

Attachment: HIVE-5871-v2.patch

Fix previous implementation.

 Use multiple-characters as field delimiter
 --

 Key: HIVE-5871
 URL: https://issues.apache.org/jira/browse/HIVE-5871
 Project: Hive
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.12.0
Reporter: Rui Li
 Attachments: HIVE-5871-v2.patch, HIVE-5871.patch




[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter

2014-05-11 Thread Rui Li (JIRA)

Rui Li updated HIVE-5871:
-

Status: Patch Available  (was: Open)

 Use multiple-characters as field delimiter
 --

 Key: HIVE-5871
 URL: https://issues.apache.org/jira/browse/HIVE-5871
 Project: Hive
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.12.0
Reporter: Rui Li
 Attachments: HIVE-5871-v2.patch, HIVE-5871.patch




[jira] [Updated] (HIVE-5871) Use multiple-characters as field delimiter

2013-11-22 Thread Rui Li (JIRA)

Rui Li updated HIVE-5871:
-

Attachment: HIVE-5871.patch

This implementation mainly relies on LazySimpleSerDe for serialization and 
deserialization. I added some methods to LazyStruct to parse a row delimited by a 
multiple-character string. Another difference from LazySimpleSerDe is that 
MultiDelimitSerDe doesn't use Base64 to encode binary fields during serialization, 
because the encoded string may interfere with the delimiter. I also modified 
LazyBinary so that, when it deserializes a binary field and is unable to 
Base64-decode it, it keeps the data unchanged. A simple use case is as follows:

{code}
create table test (id string, hivearray array<binary>, hivemap map<string,int>) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH 
SERDEPROPERTIES 
("field.delimited"="[,]","collection.delimited"=":","mapkey.delimited"="@");
{code}

where field.delimited is the multiple-character field delimiter, 
collection.delimited is the delimiter for collection items, and mapkey.delimited 
is the delimiter between keys and values in maps. We currently don't support 
multiple-character values for these two delimiters.
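The LazyBinary fallback described in this comment (Base64-decode a binary field when possible, otherwise keep the bytes unchanged) can be sketched in a few lines of Java. This is an illustration, not the patch's code: it assumes java.util.Base64 (Java 8+) as a stand-in for whatever codec LazyBinary actually uses, and the class BinaryFieldSketch and method decodeOrKeep are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BinaryFieldSketch {
    // Tries to Base64-decode a binary field; if the bytes are not valid
    // Base64, returns the original array unchanged.
    static byte[] decodeOrKeep(byte[] field) {
        try {
            return Base64.getDecoder().decode(field);
        } catch (IllegalArgumentException notBase64) {
            return field;
        }
    }

    public static void main(String[] args) {
        // "aGl2ZQ==" is the Base64 encoding of "hive", so it is decoded.
        byte[] encoded = "aGl2ZQ==".getBytes(StandardCharsets.US_ASCII);
        // '[' and ',' are outside the Base64 alphabet, so this is kept as-is.
        byte[] raw = "hive[,]data".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(decodeOrKeep(encoded), StandardCharsets.US_ASCII));
        System.out.println(new String(decodeOrKeep(raw), StandardCharsets.US_ASCII));
    }
}
```

In this sketch the fallback is a heuristic: a raw value that happens to be valid Base64 (for example the four bytes abcd) would be decoded rather than kept, so the two cases cannot always be told apart.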

 Use multiple-characters as field delimiter
 --

 Key: HIVE-5871
 URL: https://issues.apache.org/jira/browse/HIVE-5871
 Project: Hive
  Issue Type: Improvement
  Components: Contrib
Affects Versions: 0.12.0
Reporter: Rui Li
 Attachments: HIVE-5871.patch

