[jira] Created: (PIG-1015) [piggybank] DateExtractor should take into account timezones
[piggybank] DateExtractor should take into account timezones

    Key: PIG-1015
    URL: https://issues.apache.org/jira/browse/PIG-1015
    Project: Pig
    Issue Type: Bug
    Reporter: Dmitriy V. Ryaboy

The current implementation defaults to the local timezone when parsing strings, thereby providing inconsistent results depending on the settings of the computer the program is executing on (this is causing unit test failures). We should set the timezone to a consistent default, and allow users to override this default.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
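To illustrate the proposal, here is a minimal Java sketch of date extraction with a fixed default timezone and a caller-supplied override. It is not the piggybank DateExtractor code or the attached patch: the class name, method names, and date patterns are hypothetical, chosen only to show why the output zone must be pinned.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch; names and patterns are assumptions, not the piggybank API.
class TimezoneAwareDateSketch {
    // Assumed input format: a timestamp carrying an explicit UTC offset.
    static final String INPUT_PATTERN = "yyyy-MM-dd HH:mm:ss Z";
    // Assumed output format: the calendar date only.
    static final String OUTPUT_PATTERN = "yyyy-MM-dd";
    // Consistent default, independent of the machine's local settings.
    static final TimeZone DEFAULT_ZONE = TimeZone.getTimeZone("GMT");

    // Extracts the date using the fixed default zone.
    static String extractDate(String input) {
        return extractDate(input, DEFAULT_ZONE);
    }

    // Extracts the date as seen in the zone chosen by the caller.
    static String extractDate(String input, TimeZone zone) {
        SimpleDateFormat in = new SimpleDateFormat(INPUT_PATTERN, Locale.ROOT);
        SimpleDateFormat out = new SimpleDateFormat(OUTPUT_PATTERN, Locale.ROOT);
        // Without this call the formatter falls back to TimeZone.getDefault(),
        // which is what makes results vary from machine to machine.
        out.setTimeZone(zone);
        try {
            Date instant = in.parse(input);
            return out.format(instant);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + input, e);
        }
    }

    public static void main(String[] args) {
        String ts = "2009-10-10 23:30:00 -0800";
        System.out.println(extractDate(ts));
        System.out.println(extractDate(ts, TimeZone.getTimeZone("GMT-08:00")));
    }
}
```

The same instant falls on different calendar dates in different zones, so a test that compares against a fixed expected date can only pass everywhere if the output zone is set explicitly.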
[jira] Updated: (PIG-1015) [piggybank] DateExtractor should take into account timezones
[ https://issues.apache.org/jira/browse/PIG-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-1015:
-----------------------------------

    Attachment: date_extractor.patch

Note that this changes the contract slightly: the DateExtractor now extracts dates in GMT by default, whereas before it extracted them in the system's local time.

[piggybank] DateExtractor should take into account timezones

    Key: PIG-1015
    URL: https://issues.apache.org/jira/browse/PIG-1015
    Project: Pig
    Issue Type: Bug
    Reporter: Dmitriy V. Ryaboy
    Attachments: date_extractor.patch
[jira] Updated: (PIG-1015) [piggybank] DateExtractor should take into account timezones
[ https://issues.apache.org/jira/browse/PIG-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-1015:
-----------------------------------

    Fix Version/s: 0.6.0
    Status: Patch Available (was: Open)

[piggybank] DateExtractor should take into account timezones

    Key: PIG-1015
    URL: https://issues.apache.org/jira/browse/PIG-1015
    Project: Pig
    Issue Type: Bug
    Reporter: Dmitriy V. Ryaboy
    Fix For: 0.6.0
    Attachments: date_extractor.patch
[jira] Commented: (PIG-868) indexof / lastindexof / lower / replace / substring udf's
[ https://issues.apache.org/jira/browse/PIG-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764533#action_12764533 ]

Dmitriy V. Ryaboy commented on PIG-868:
---------------------------------------

The dateExtractor issue is addressed by PIG-1015; just changing the testcase is not sufficient, as the testcase will still break in some parts of the world because it relies on local settings.

indexof / lastindexof / lower / replace / substring udf's
---------------------------------------------------------

    Key: PIG-868
    URL: https://issues.apache.org/jira/browse/PIG-868
    Project: Pig
    Issue Type: New Feature
    Reporter: Bennie Schut
    Priority: Trivial
    Attachments: addSomeUDFsPatch.patch, dateExtractorPatch.patch

We parse some apache logs using pig and are using some pretty simple udf's like this:

    B = FOREACH A GENERATE substring(uri, lastindexof(uri, '/')+1, indexof(uri, '.txt')) as lang;

It's pretty simple stuff but I figured someone else might find it useful.
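For readers unfamiliar with the expression in the Pig example, the following Java sketch shows what it would compute if the proposed UDFs mirror java.lang.String semantics (zero-based indexes, exclusive end index). That is an assumption, since the patch itself is not shown here; the class and method names are hypothetical.

```java
// Hypothetical sketch; assumes the UDFs behave like java.lang.String methods.
class UriLangSketch {
    // Returns the text between the last '/' and the first ".txt",
    // or null when the URI does not contain that shape.
    static String extractLang(String uri) {
        int start = uri.lastIndexOf('/') + 1; // first character after the last slash
        int end = uri.indexOf(".txt");        // -1 when ".txt" is absent
        if (end < start) {
            return null; // no ".txt", or it occurs before the last slash
        }
        return uri.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(extractLang("/docs/en.txt"));
    }
}
```

The explicit bounds check is an addition for the sketch; how the actual UDFs handle a missing match is not stated in the issue.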
[jira] Updated: (PIG-986) [zebra] Zebra Column Group Naming Support
[ https://issues.apache.org/jira/browse/PIG-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated PIG-986:
----------------------------

    Resolution: Fixed
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)

I just committed this. Thanks Yan.

[zebra] Zebra Column Group Naming Support
-----------------------------------------

    Key: PIG-986
    URL: https://issues.apache.org/jira/browse/PIG-986
    Project: Pig
    Issue Type: New Feature
    Components: impl
    Affects Versions: 0.4.0
    Reporter: Chao Wang
    Assignee: Chao Wang
    Fix For: 0.6.0
    Attachments: ColumnGroupName.patch, ColumnGroupName.patch, ColumnGroupName.patch

We introduce column group names to Zebra and make them first-class citizens in Zebra. This can ease management of column groups. We plan to introduce an AS clause for column group names in Zebra's syntax.

Functional Specifications:

1) Column group names are optional. For column groups that do not have a user-provided name, Zebra will internally assign default column group names that are unique for that table: CG0, CG1, CG2 ... Note: If CGx is used by the user, then it cannot be used as an internal name.
2) We introduce an AS clause in Zebra's syntax for column group names. If it occurs, it has to immediately follow [ ]. For example, [a1, a2] as PI secure by user:joe group:secure perm:640; [a3, a4] as General compress by lzo. Note that the keyword AS is case insensitive.
3) Column group names are unique within one table and are case sensitive, i.e., c1 and C1 are different.
4) Column group names will be used as the physical column group directory path names.
5) Zebra V2 will support dropColumnGroup by column group names (will integrate with Raghu's A29 drop column work).
6) Zebra V2 can support backward compatibility (if there are Zebra V1 created tables in production when V2 is released). More specifically, this means that Zebra V2 can load from V1-created tables and do dropColumnGroup on them.
7) Does NOT support renaming.
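The naming rules in items 1 and 3 of the PIG-986 specification can be sketched in Java as follows. This is one possible reading, not Zebra's implementation: the class and method names are hypothetical, and skipping a CGx name that the user has already taken is an interpretation of the note in item 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the naming rules; not taken from the Zebra source.
class ColumnGroupNamingSketch {
    // userNames holds one entry per column group; null means "no user-provided name".
    static List<String> assignNames(List<String> userNames) {
        // Names are compared exactly, so c1 and C1 count as different names.
        Set<String> used = new HashSet<>();
        for (String name : userNames) {
            if (name != null && !used.add(name)) {
                throw new IllegalArgumentException("Duplicate column group name: " + name);
            }
        }
        List<String> result = new ArrayList<>();
        int next = 0;
        for (String name : userNames) {
            if (name != null) {
                result.add(name);
                continue;
            }
            // Assumed behavior: skip any CGx already claimed by the user.
            String candidate;
            do {
                candidate = "CG" + next++;
            } while (!used.add(candidate));
            result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assignNames(Arrays.asList("PI", null, "CG0", null)));
    }
}
```

Collecting all user-provided names before generating defaults means a user name that appears later in the storage hint still blocks the matching internal name.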
[jira] Commented: (PIG-993) [zebra] Ability to drop a column group in a table
[ https://issues.apache.org/jira/browse/PIG-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764552#action_12764552 ]

Raghu Angadi commented on PIG-993:
----------------------------------

This patch depends on PIG-992. It is not a functional dependency and can be removed if required.

[zebra] Ability to drop a column group in a table
-------------------------------------------------

    Key: PIG-993
    URL: https://issues.apache.org/jira/browse/PIG-993
    Project: Pig
    Issue Type: Bug
    Reporter: Raghu Angadi
    Assignee: Raghu Angadi
    Fix For: 0.6.0
    Attachments: DropColumnGroupExample.java, zebra-drop-cg.patch, zebra-drop-cg.patch

A Zebra table is stored as multiple sub tables, each containing a set of columns called a column group (CG). The user specifies how these columns are grouped while creating a table through the _storage hint_. For some of the large tables, it might be necessary for users to remove a set of columns and retain the rest. This jira provides a way for users to delete an entire column group. The following comments will have more details on the API and the semantics.