[jira] Created: (PIG-1495) Add -q command line option to set queue name for Pig jobs from command line
Add -q command line option to set queue name for Pig jobs from command line --- Key: PIG-1495 URL: https://issues.apache.org/jira/browse/PIG-1495 Project: Pig Issue Type: New Feature Components: impl Affects Versions: 0.7.0 Reporter: Russell Jurney Fix For: 0.8.0 rjurney$ pig -q default This sets the mapred.job.queue.name property in the execution engine from the pig properties for MAPRED type jobs. Patch attached. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
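A minimal sketch of the behaviour described above, written in Python rather than as the attached Java patch. Only the `-q` flag and the `mapred.job.queue.name` property come from the issue; the function name `apply_queue_option` and the plain dictionary standing in for the Pig properties are assumptions made for illustration.

```python
import argparse

# Property named in the issue; everything else here is hypothetical.
QUEUE_PROPERTY = "mapred.job.queue.name"


def apply_queue_option(argv, properties):
    """Return a copy of `properties` with the queue name set from -q, if given."""
    parser = argparse.ArgumentParser(prog="pig", add_help=False)
    parser.add_argument("-q", dest="queue", default=None)
    # Unknown arguments (script names, other flags) are left alone.
    args, _unknown = parser.parse_known_args(argv)

    updated = dict(properties)
    if args.queue is not None:
        updated[QUEUE_PROPERTY] = args.queue
    return updated


if __name__ == "__main__":
    print(apply_queue_option(["-q", "default"], {}))
```

Copying the dictionary instead of mutating it keeps the caller's properties untouched when no `-q` is passed.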
[jira] Updated: (PIG-1495) Add -q command line option to set queue name for Pig jobs from command line
[ https://issues.apache.org/jira/browse/PIG-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1495: Status: Patch Available (was: Open) Add -q command line option to set queue name for Pig jobs from command line --- Key: PIG-1495 URL: https://issues.apache.org/jira/browse/PIG-1495 Project: Pig Issue Type: New Feature Components: impl Affects Versions: 0.7.0 Reporter: Russell Jurney Fix For: 0.8.0 Attachments: set_queue.patch
[jira] Updated: (PIG-1495) Add -q command line option to set queue name for Pig jobs from command line
[ https://issues.apache.org/jira/browse/PIG-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1495: Status: Open (was: Patch Available) Add -q command line option to set queue name for Pig jobs from command line --- Key: PIG-1495 URL: https://issues.apache.org/jira/browse/PIG-1495 Project: Pig Issue Type: New Feature Components: impl Affects Versions: 0.7.0 Reporter: Russell Jurney Fix For: 0.8.0 Attachments: set_queue.patch
[jira] Commented: (PIG-1495) Add -q command line option to set queue name for Pig jobs from command line
[ https://issues.apache.org/jira/browse/PIG-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887585#action_12887585 ] Russell Jurney commented on PIG-1495: This doesn't work yet. Doh! Add -q command line option to set queue name for Pig jobs from command line --- Key: PIG-1495 URL: https://issues.apache.org/jira/browse/PIG-1495 Project: Pig Issue Type: New Feature Components: impl Affects Versions: 0.7.0 Reporter: Russell Jurney Fix For: 0.8.0 Attachments: set_queue.patch
[jira] Resolved: (PIG-1476) Add trailing flag to commands to prevent retention of relation name in field names: STRIP ?
[ https://issues.apache.org/jira/browse/PIG-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney resolved PIG-1476. Resolution: Fixed This is actually ok.

Add trailing flag to commands to prevent retention of relation name in field names: STRIP ? --- Key: PIG-1476 URL: https://issues.apache.org/jira/browse/PIG-1476 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0 Environment: sunny, 60% humidity with a chance of rain. Reporter: Russell Jurney Fix For: 0.8.0

After doing a JOIN or a GROUP/FOREACH, one often ends up with data looking like:

    DESCRIBE foo;
    foo: {other_thing::f1:int, other_thing::f2:chararray, other_thing::f3: int}

If oun was to let this chain, ouin can end up with:

    first_thing::second_thing::third_thing::fourth_thing::f1

which is pretty hairy. What wunn usually wants is:

    foo: {f1:int, f2:chararray, f3: int}

At this point, won is left with two choices, neither of which is very good. Choice wan:

    foo = FOREACH foo GENERATE $0 AS f1, $1 AS f2, $2 AS f3;

This is a poor choice because later when wahn edits this file, it is confusing to remember which position is which field when wun manipulates something upstream in the script. So instead whun does this:

    foo = FOREACH foo GENERATE old_thing::f1 AS f1, old_thing::f2 AS f2, old_thing::f3 AS f3;

or

    foo = FOREACH foo GENERATE f1 AS f1, f2 AS f2, f3 AS f3;

This is a poor choice because it is verbose and cumbersome. With no good choices available, whan is unsure what to do, pauses and reflects that the Pig is perplexing, and hopes for a better tomorrow. Here's what wuhn should do to avoid this situation:

    foo = JOIN old_thing BY f1, other_thing BY f1 STRIP;
    DESCRIBE foo;
    foo: {f1:int, f2:chararray, f3: int}

I think so, anyway. I leave the behavior of duplicate fields to more enlightened beings, but I think this would be a big improvement to Pig Latin.
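The renaming that the proposed STRIP flag would perform can be sketched in a few lines of Python. This is an illustration of the idea only, not Pig behaviour: the function name `strip_relation_prefixes` is hypothetical, and raising an error on duplicate field names is an assumption, since the issue explicitly leaves duplicate handling open.

```python
def strip_relation_prefixes(field_names):
    """Drop everything up to the last '::' in each field name.

    Raises ValueError when two fields collapse to the same short name
    (an assumed policy; the issue does not decide this case).
    """
    stripped = [name.rsplit("::", 1)[-1] for name in field_names]

    seen = set()
    for short in stripped:
        if short in seen:
            raise ValueError(f"duplicate field name after STRIP: {short}")
        seen.add(short)
    return stripped


if __name__ == "__main__":
    print(strip_relation_prefixes(
        ["other_thing::f1", "other_thing::f2", "other_thing::f3"]
    ))
```

Splitting on the last `::` means a chained name such as `first_thing::second_thing::third_thing::fourth_thing::f1` also reduces to `f1`.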
[jira] Commented: (PIG-1430) ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix Times in All Operations
[ https://issues.apache.org/jira/browse/PIG-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884557#action_12884557 ] Russell Jurney commented on PIG-1430:

I've been thinking about the feedback at the contributors meeting Monday. I propose that we postpone the addition of a full datetime type (PIG-1314) in favor of the builtins described below. This change is easy and I can do it immediately and get it in 0.8. The original proposal is quite hard, and I can't really estimate when I could have it completed. I'm not sure we need it. There are many other more important things I would rather do.

I'd like to remove the piggybank classes org.apache.pig.piggybank.evaluation.datetime.* or at least deprecate them. I'd like to add the following builtins, which act on both ISO8601 datetime strings and long unix times. These could be made into many functions each, but I'd prefer to keep them as short as possible. I suggest we mirror the Oracle date/time functions when possible: http://psoug.org/reference/date_func.html

* Units
  When listed below, units are defined as one of: YEAR MONTH WEEK DAY HOUR MINUTE SECOND
* Truncations
  TRUNC(date, unit) or TRUNC_DATE(date, unit)
  long/epoch input returns long/epoch output. ISO8601 string input returns ISO8601 datetime output.
* Dates to durations
  DURATION(date, unit)
  long/epoch input returns long output in the unit specified. ISO8601 input returns an ISO8601 duration
* Adding/subtracting durations and dates: use longs.
* Utilities
  CURRENT_ISOTIME CURRENT_UNIXTIME ISOTOUNIX UNIXTOISO

The only ugly part to this is that ISO times are 2nd class citizens in that they cannot be added/subtracted. I'm prepared to live with that :)

ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix Times in All Operations -- Key: PIG-1430 URL: https://issues.apache.org/jira/browse/PIG-1430 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.7.0 Reporter: Russell Jurney Fix For: 0.8.0

All functions in contrib.piggybank.java.src.main.java.org.apache.pig.piggybank.evaluation.datetime should seamlessly accept integer Unix/POSIX times, and return Unix time output when given an int, and ISO output when given a chararray. Note: Unix/POSIX times are the number of seconds elapsed since midnight proleptic Coordinated Universal Time (UTC) of January 1, 1970, not counting leap seconds. See http://en.wikipedia.org/wiki/Unix_time
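To make the proposed input/output contract concrete, here is a rough Python sketch of TRUNC, ISOTOUNIX and UNIXTOISO. It models the behaviour described in the comment (epoch in, epoch out; ISO8601 in, ISO8601 out) and is not the piggybank or builtin implementation. The function names, the Monday week start, and treating offset-less ISO strings as UTC in `iso_to_unix` are all assumptions.

```python
from datetime import datetime, timedelta, timezone

# Fields to reset for each unit named in the proposal.
_RESET = {
    "YEAR": dict(month=1, day=1, hour=0, minute=0, second=0),
    "MONTH": dict(day=1, hour=0, minute=0, second=0),
    "DAY": dict(hour=0, minute=0, second=0),
    "HOUR": dict(minute=0, second=0),
    "MINUTE": dict(second=0),
    "SECOND": dict(),
}


def _truncate(dt, unit):
    """Zero out every field of dt that is finer than unit."""
    if unit == "WEEK":
        # Assumption: weeks start on Monday; the proposal does not say.
        dt = dt - timedelta(days=dt.weekday())
        unit = "DAY"
    if unit not in _RESET:
        raise ValueError(f"unknown unit: {unit}")
    return dt.replace(microsecond=0, **_RESET[unit])


def trunc(date, unit):
    """Epoch seconds in, epoch seconds out; ISO8601 string in, ISO8601 string out."""
    if isinstance(date, int):
        dt = datetime.fromtimestamp(date, tz=timezone.utc)
        return int(_truncate(dt, unit).timestamp())
    return _truncate(datetime.fromisoformat(date), unit).isoformat()


def iso_to_unix(iso):
    dt = datetime.fromisoformat(iso)
    if dt.tzinfo is None:
        # Assumption: strings without an offset are taken as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def unix_to_iso(seconds):
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    print(trunc(90061, "HOUR"))
    print(trunc("2010-07-14T15:42:07+00:00", "DAY"))
```

Dispatching on the input type keeps one function per operation, which matches the stated preference for a small set of builtins.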
[jira] Commented: (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884558#action_12884558 ] Russell Jurney commented on PIG-1314: Been thinking about this... I don't think we should add a full datetime type at this time. See comments in PIG-1314 on alternative approach using builtins. Add DateTime Support to Pig --- Key: PIG-1314 URL: https://issues.apache.org/jira/browse/PIG-1314 Project: Pig Issue Type: Bug Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Fix For: 0.8.0 Original Estimate: 672h Remaining Estimate: 672h Hadoop/Pig are primarily used to parse log data, and most logs have a timestamp component. Therefore Pig should support dates as a primitive. Can someone familiar with adding types to pig comment on how hard this is? We're looking at doing this, rather than use UDFs. Is this a patch that would be accepted?
[jira] Commented: (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884562#action_12884562 ] Russell Jurney commented on PIG-1314: I suck at JIRA. See proposal in PIG-1430. Add DateTime Support to Pig --- Key: PIG-1314 URL: https://issues.apache.org/jira/browse/PIG-1314 Project: Pig Issue Type: Bug Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Fix For: 0.8.0 Original Estimate: 672h Remaining Estimate: 672h
[jira] Updated: (PIG-1476) Add trailing flag to commands to prevent retention of relation name in field names: STRIP ?
[ https://issues.apache.org/jira/browse/PIG-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1476:

Description:

After doing a JOIN or a GROUP/FOREACH, one often ends up with data looking like:

    DESCRIBE foo;
    foo: {other_thing::f1:int, other_thing::f2:chararray, other_thing::f3: int}

What wunn usually wants is:

    foo: {f1:int, f2:chararray, f3: int}

At this point, won is left with two choices, neither of which is very good. Choice wan:

    foo = FOREACH foo GENERATE $0 AS f1, $1 AS f2, $2 AS f3;

This is a poor choice because later when wahn edits this file, it is confusing to remember which position is which field when wun manipulates something upstream in the script. So instead whun does this:

    foo = FOREACH foo GENERATE old_thing::f1 AS f1, old_thing::f2 AS f2, old_thing::f3 AS f3;

or

    foo = FOREACH foo GENERATE f1 AS f1, f2 AS f2, f3 AS f3;

This is a poor choice because it is verbose and cumbersome. With no good choices available, whan is unsure what to do, pauses and reflects that the Pig is perplexing, and hopes for a better tomorrow. Here's what wuhn should do to avoid this situation:

    foo = JOIN old_thing BY f1, other_thing BY f1 STRIP;
    DESCRIBE foo;
    foo: {f1:int, f2:chararray, f3: int}

I think so, anyway. I leave the behavior of duplicate fields to more enlightened beings, but I think this would be a big improvement to Pig Latin.

was:

After doing a JOIN or a GROUP/FOREACH, one often ends up with data looking like:

    DESCRIBE foo;
    foo: {other_thing::f1:int, other_thing::f2:chararray, other_thing::f3: int}

What wunn usually wants is:

    foo: {f1:int, f2:chararray, f3: int}

At this point, won is left with two choices, neither of which is very good. Choice wan:

    foo = FOREACH foo GENERATE $0 AS f1, $1 AS f2, $2 AS f3;

This is a poor choice because later when wahn edits this file, it is confusing to remember which position is which field when wun manipulates something upstream in the script. So instead whun does this:

    foo = FOREACH foo GENERATE old_thing::f1 AS f1, old_thing::f2 AS f2, old_thing::f3 AS f3;

This is a poor choice because it is verbose and cumbersome. Whan is unsure what to do, pauses and reflects that the Pig is perplexing, and hopes for a better tomorrow. Here's what wuhn should do to avoid this situation:

    foo = JOIN old_thing BY f1, other_thing BY f1 STRIP;
    DESCRIBE foo;
    foo: {f1:int, f2:chararray, f3: int}

I think so, anyway. I leave the behavior of duplicate fields to more enlightened beings, but I think this would be a big improvement to Pig Latin.

Add trailing flag to commands to prevent retention of relation name in field names: STRIP ? --- Key: PIG-1476 URL: https://issues.apache.org/jira/browse/PIG-1476 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0 Environment: sunny, 60% humidity with a chance of rain. Reporter: Russell Jurney Fix For: 0.8.0
[jira] Created: (PIG-1476) Add trailing flag to commands to prevent retention of relation name in field names: STRIP ?
Add trailing flag to commands to prevent retention of relation name in field names: STRIP ? --- Key: PIG-1476 URL: https://issues.apache.org/jira/browse/PIG-1476 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0 Environment: sunny, 60% humidity with a chance of rain. Reporter: Russell Jurney Fix For: 0.8.0

After doing a JOIN or a GROUP/FOREACH, one often ends up with data looking like:

    DESCRIBE foo;
    foo: {other_thing::f1:int, other_thing::f2:chararray, other_thing::f3: int}

What wunn usually wants is:

    foo: {f1:int, f2:chararray, f3: int}

At this point, won is left with two choices, neither of which is very good. Choice wan:

    foo = FOREACH foo GENERATE $0 AS f1, $1 AS f2, $2 AS f3;

This is a poor choice because later when wahn edits this file, it is confusing to remember which position is which field when wun manipulates something upstream in the script. So instead whun does this:

    foo = FOREACH foo GENERATE old_thing::f1 AS f1, old_thing::f2 AS f2, old_thing::f3 AS f3;

This is a poor choice because it is verbose and cumbersome. Whan is unsure what to do, pauses and reflects that the Pig is perplexing, and hopes for a better tomorrow. Here's what wuhn should do to avoid this situation:

    foo = JOIN old_thing BY f1, other_thing BY f1 STRIP;
    DESCRIBE foo;
    foo: {f1:int, f2:chararray, f3: int}

I think so, anyway. I leave the behavior of duplicate fields to more enlightened beings, but I think this would be a big improvement to Pig Latin.
[jira] Updated: (PIG-1476) Add trailing flag to commands to prevent retention of relation name in field names: STRIP ?
[ https://issues.apache.org/jira/browse/PIG-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1476:

Description:

After doing a JOIN or a GROUP/FOREACH, one often ends up with data looking like:

    DESCRIBE foo;
    foo: {other_thing::f1:int, other_thing::f2:chararray, other_thing::f3: int}

If oun was to let this chain, ouin can end up with:

    first_thing::second_thing::third_thing::fourth_thing::f1

which is pretty hairy. What wunn usually wants is:

    foo: {f1:int, f2:chararray, f3: int}

At this point, won is left with two choices, neither of which is very good. Choice wan:

    foo = FOREACH foo GENERATE $0 AS f1, $1 AS f2, $2 AS f3;

This is a poor choice because later when wahn edits this file, it is confusing to remember which position is which field when wun manipulates something upstream in the script. So instead whun does this:

    foo = FOREACH foo GENERATE old_thing::f1 AS f1, old_thing::f2 AS f2, old_thing::f3 AS f3;

or

    foo = FOREACH foo GENERATE f1 AS f1, f2 AS f2, f3 AS f3;

This is a poor choice because it is verbose and cumbersome. With no good choices available, whan is unsure what to do, pauses and reflects that the Pig is perplexing, and hopes for a better tomorrow. Here's what wuhn should do to avoid this situation:

    foo = JOIN old_thing BY f1, other_thing BY f1 STRIP;
    DESCRIBE foo;
    foo: {f1:int, f2:chararray, f3: int}

I think so, anyway. I leave the behavior of duplicate fields to more enlightened beings, but I think this would be a big improvement to Pig Latin.

was:

After doing a JOIN or a GROUP/FOREACH, one often ends up with data looking like:

    DESCRIBE foo;
    foo: {other_thing::f1:int, other_thing::f2:chararray, other_thing::f3: int}

What wunn usually wants is:

    foo: {f1:int, f2:chararray, f3: int}

At this point, won is left with two choices, neither of which is very good. Choice wan:

    foo = FOREACH foo GENERATE $0 AS f1, $1 AS f2, $2 AS f3;

This is a poor choice because later when wahn edits this file, it is confusing to remember which position is which field when wun manipulates something upstream in the script. So instead whun does this:

    foo = FOREACH foo GENERATE old_thing::f1 AS f1, old_thing::f2 AS f2, old_thing::f3 AS f3;

or

    foo = FOREACH foo GENERATE f1 AS f1, f2 AS f2, f3 AS f3;

This is a poor choice because it is verbose and cumbersome. With no good choices available, whan is unsure what to do, pauses and reflects that the Pig is perplexing, and hopes for a better tomorrow. Here's what wuhn should do to avoid this situation:

    foo = JOIN old_thing BY f1, other_thing BY f1 STRIP;
    DESCRIBE foo;
    foo: {f1:int, f2:chararray, f3: int}

I think so, anyway. I leave the behavior of duplicate fields to more enlightened beings, but I think this would be a big improvement to Pig Latin.

Add trailing flag to commands to prevent retention of relation name in field names: STRIP ? --- Key: PIG-1476 URL: https://issues.apache.org/jira/browse/PIG-1476 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0 Environment: sunny, 60% humidity with a chance of rain. Reporter: Russell Jurney Fix For: 0.8.0
[jira] Commented: (PIG-1429) Add Boolean Data Type to Pig
[ https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880731#action_12880731 ] Russell Jurney commented on PIG-1429: I'll be able to wrap this up next weekend. Add Boolean Data Type to Pig Key: PIG-1429 URL: https://issues.apache.org/jira/browse/PIG-1429 Project: Pig Issue Type: New Feature Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Assignee: Russell Jurney Fix For: 0.8.0 Attachments: working_boolean.patch Original Estimate: 8h Remaining Estimate: 8h Pig needs a Boolean data type. Pig-1097 is dependent on doing this. I volunteer. Is there anything beyond the work in src/org/apache/pig/data/ plus unit tests to make this work?
[jira] Commented: (PIG-1429) Add Boolean Data Type to Pig
[ https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877544#action_12877544 ] Russell Jurney commented on PIG-1429: The patch needs more work. Should knock it out in the next couple weeks. Add Boolean Data Type to Pig Key: PIG-1429 URL: https://issues.apache.org/jira/browse/PIG-1429 Project: Pig Issue Type: New Feature Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Assignee: Russell Jurney Fix For: 0.8.0 Attachments: working_boolean.patch Original Estimate: 8h Remaining Estimate: 8h
[jira] Created: (PIG-1436) Print number of records outputted at each step of a Pig script
Print number of records outputted at each step of a Pig script -- Key: PIG-1436 URL: https://issues.apache.org/jira/browse/PIG-1436 Project: Pig Issue Type: New Feature Components: grunt Affects Versions: 0.7.0 Reporter: Russell Jurney Priority: Minor Fix For: 0.8.0 I often run a script multiple times, or have to go and look through Hadoop task logs, to figure out where I broke a long script in such a way that I get 0 records out of it. I think this is a common problem. If someone can point me in the right direction, I can make a pass at this.
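The request above amounts to a per-step record count that shows where a pipeline first drops to zero. The Python sketch below models that on in-memory lists; it does not use Pig or Hadoop counters, and the names `run_with_counts` and `first_empty_step` are hypothetical.

```python
def run_with_counts(records, steps):
    """Apply each (name, function) step in order and record its output size.

    Returns the final records and a list of (step name, record count) pairs.
    """
    counts = []
    current = list(records)
    for name, step in steps:
        current = list(step(current))
        counts.append((name, len(current)))
    return current, counts


def first_empty_step(counts):
    """Return the name of the first step that produced 0 records, or None."""
    for name, count in counts:
        if count == 0:
            return name
    return None


if __name__ == "__main__":
    steps = [
        ("filter_positive", lambda rows: [r for r in rows if r > 0]),
        ("filter_large", lambda rows: [r for r in rows if r > 100]),
        ("double", lambda rows: [r * 2 for r in rows]),
    ]
    _, counts = run_with_counts([-1, 2, 3], steps)
    for name, count in counts:
        print(f"{name}: {count} records")
    print("first empty step:", first_empty_step(counts))
```

Reporting the first empty step directly answers the question in the issue, namely which statement in a long script broke the flow of records.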
[jira] Commented: (PIG-1429) Add Boolean Data Type to Pig
[ https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873629#action_12873629 ] Russell Jurney commented on PIG-1429: Some more work to be done with operators. Add Boolean Data Type to Pig Key: PIG-1429 URL: https://issues.apache.org/jira/browse/PIG-1429 Project: Pig Issue Type: New Feature Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Assignee: Russell Jurney Fix For: 0.8.0 Attachments: working_boolean.patch Original Estimate: 8h Remaining Estimate: 8h
[jira] Commented: (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873664#action_12873664 ] Russell Jurney commented on PIG-1314: Hmmm not sure if I should use durations or periods, or both. See http://joda-time.sourceforge.net/apidocs/org/joda/time/Period.html Add DateTime Support to Pig --- Key: PIG-1314 URL: https://issues.apache.org/jira/browse/PIG-1314 Project: Pig Issue Type: Bug Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Fix For: 0.8.0 Original Estimate: 672h Remaining Estimate: 672h
[jira] Updated: (PIG-1429) Add Boolean Data Type to Pig
[ https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1429: Attachment: boolean.patch Add Boolean Data Type to Pig Key: PIG-1429 URL: https://issues.apache.org/jira/browse/PIG-1429 Project: Pig Issue Type: New Feature Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Assignee: Russell Jurney Fix For: 0.8.0 Attachments: boolean.patch, boolean.patch Original Estimate: 8h Remaining Estimate: 8h
[jira] Updated: (PIG-1429) Add Boolean Data Type to Pig
[ https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1429: Attachment: (was: boolean.patch) Add Boolean Data Type to Pig Key: PIG-1429 URL: https://issues.apache.org/jira/browse/PIG-1429 Project: Pig Issue Type: New Feature Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Assignee: Russell Jurney Fix For: 0.8.0 Attachments: boolean.patch Original Estimate: 8h Remaining Estimate: 8h
[jira] Commented: (PIG-1429) Add Boolean Data Type to Pig
[ https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873474#action_12873474 ] Russell Jurney commented on PIG-1429:

Did some more work, have a new patch... seems the problem is in PigMapBase.runPipeline:

    protected void runPipeline(PhysicalOperator leaf) throws IOException, InterruptedException {
        while(true){
            String foo = ;
            String bar = ;
            Result res = leaf.getNext(DUMMYTUPLE);

res is NULL, so it dies. The leaf is: (Name: A: New For Each(false,false)[bag] - 1-13 Operator Key: 1-13)

Add Boolean Data Type to Pig Key: PIG-1429 URL: https://issues.apache.org/jira/browse/PIG-1429 Project: Pig Issue Type: New Feature Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Assignee: Russell Jurney Fix For: 0.8.0 Attachments: boolean.patch Original Estimate: 8h Remaining Estimate: 8h
[jira] Updated: (PIG-1429) Add Boolean Data Type to Pig
[ https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1429: Attachment: working_boolean.patch Attached patch can LOAD/DUMP a boolean type :D I'll work on more tests, but it roughly works. Add Boolean Data Type to Pig Key: PIG-1429 URL: https://issues.apache.org/jira/browse/PIG-1429 Project: Pig Issue Type: New Feature Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Assignee: Russell Jurney Fix For: 0.8.0 Attachments: boolean.patch, working_boolean.patch Original Estimate: 8h Remaining Estimate: 8h
[jira] Updated: (PIG-1429) Add Boolean Data Type to Pig
[ https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1429: Attachment: (was: boolean.patch) Add Boolean Data Type to Pig Key: PIG-1429 URL: https://issues.apache.org/jira/browse/PIG-1429 Project: Pig Issue Type: New Feature Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Assignee: Russell Jurney Fix For: 0.8.0 Attachments: working_boolean.patch Original Estimate: 8h Remaining Estimate: 8h
[jira] Updated: (PIG-1429) Add Boolean Data Type to Pig
[ https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1429: Attachment: boolean.patch Broken patch that adds boolean type. Add Boolean Data Type to Pig Key: PIG-1429 URL: https://issues.apache.org/jira/browse/PIG-1429 Project: Pig Issue Type: New Feature Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Assignee: Russell Jurney Fix For: 0.8.0 Attachments: boolean.patch Original Estimate: 8h Remaining Estimate: 8h
[jira] Commented: (PIG-1429) Add Boolean Data Type to Pig
[ https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873363#action_12873363 ] Russell Jurney commented on PIG-1429: - Did the work I think is required based on Alan's comments in PIG-1314 and help from Dmitriy. It builds - I still have to add tests (may be the only way to fix this), but I'm hoping someone can ID my problem. I keep getting the exception below. Anyone know where I should look? I've traced this through, and nothing stands out. -
org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received Error while processing the map plan.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:261)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:228)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
2010-05-29 20:04:25,363 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
2010-05-29 20:04:29,866 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2010-05-29 20:04:29,866 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
2010-05-29 20:04:29,868 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: file:/tmp/temp-537038699/tmp-381529216
2010-05-29 20:04:29,868 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written : Unable to determine number of records written
2010-05-29 20:04:29,868 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written : Unable to determine number of bytes written
2010-05-29 20:04:29,868 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Spillable Memory Manager spill count : 0
2010-05-29 20:04:29,869 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Proactive spill count : 0
2010-05-29 20:04:29,869 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2010-05-29 20:04:29,872 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-05-29 20:04:29,876 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias A
2010-05-29 20:04:29,876 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A
    at org.apache.pig.PigServer.openIterator(PigServer.java:663)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:598)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:291)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
    at org.apache.pig.Main.main(Main.java:410)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
    at org.apache.pig.PigServer.openIterator(PigServer.java:657)
    ... 6 more
Add Boolean Data Type to Pig Key: PIG-1429 URL: https://issues.apache.org/jira/browse/PIG-1429 Project: Pig Issue Type: New Feature Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Assignee: Russell Jurney Fix For: 0.8.0 Attachments: boolean.patch Original Estimate: 8h Remaining Estimate: 8h Pig needs a Boolean data type. Pig-1097 is dependent on doing this. I volunteer. Is there anything beyond the work in src/org/apache/pig/data/ plus unit tests to make this work? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873382#action_12873382 ] Russell Jurney commented on PIG-1314: - Ok, thinking about really doing this soon, after Boolean. I'd like to add two new primitives to Pig - DateTime and Duration. I'd do this on the wiki, but I don't have edit access. Can someone please grant the ability to make a new page to user RussellJurney on the Pig wiki? Design Notes:
1) I'd like to use Jodatime for this, as I did in the DateTime UDFs. It is possible to use the Java date libs, but it would be painful to do so. Jodatime also performs better than Java's native date classes. It is Apache 2.0 licensed and is already pulled in via ivy in the DateTime UDFs - see PIG-1310
2) Date Format for text/dumps: ISO8601. Looks like: [YYYY][MM][DD]T[hh][mm]Z It is a human readable, sortable/comparable, international standard. See http://en.wikipedia.org/wiki/ISO_8601#Dates
2.5) In-memory type: org.joda.time.DateTime. See http://joda-time.sourceforge.net/apidocs/org/joda/time/DateTime.html The internal format of jodatime is a Long epoch/Unix/POSIX time. See http://joda-time.sourceforge.net/faq.html#internalstorage
3) Duration Format for text/dumps: ISO8601. Looks like: P[n]Y[n]M[n]DT[n]H[n]M[n]S It is a human readable, sortable/comparable, international standard. See http://en.wikipedia.org/wiki/ISO_8601#Durations
3.5) In-memory format: org.joda.time.Duration. See http://joda-time.sourceforge.net/apidocs/org/joda/time/Duration.html
4) All date functions in PIG-1310 should be included, except those replaced by the use of operators on datetimes and durations. Adding/subtracting datetimes should result in a duration. Durations can be added/subtracted/divided/multiplied/negated. Date/Duration truncation, date differences, date parsing/conversion should be included. Conversion from int/long POSIX, SQL and datemonth should be included. Conversion from any string with a DateFormat string should be included.
5) Casting to and from Integer and Long should be supported, as a Unix/POSIX time. Casting to/from chararray in ISO8601 format should be supported.
Comments? Suggestions? Add DateTime Support to Pig --- Key: PIG-1314 URL: https://issues.apache.org/jira/browse/PIG-1314 Project: Pig Issue Type: Bug Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Fix For: 0.8.0 Original Estimate: 672h Remaining Estimate: 672h Hadoop/Pig are primarily used to parse log data, and most logs have a timestamp component. Therefore Pig should support dates as a primitive. Can someone familiar with adding types to pig comment on how hard this is? We're looking at doing this, rather than use UDFs. Is this a patch that would be accepted? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
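The casting and arithmetic described in the design notes above (Unix/POSIX time to and from ISO 8601 text, and a datetime difference yielding a duration) can be sketched roughly as follows. This is only an illustration under stated assumptions: the comment proposes Joda-Time, while the sketch uses the JDK's java.time classes purely to stay dependency-free, and the class and method names (DateTimeCastSketch, unixToIso, isoToUnix, difference) are hypothetical, not part of Pig or of any patch.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch, not Pig code: java.time stands in for Joda-Time here.
final class DateTimeCastSketch {

    // Cast a Unix/POSIX time in seconds to an ISO 8601 string in UTC.
    static String unixToIso(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds).toString();
    }

    // Cast an ISO 8601 UTC string back to Unix/POSIX seconds.
    static long isoToUnix(String iso) {
        return Instant.parse(iso).getEpochSecond();
    }

    // Subtracting one datetime from another yields a duration,
    // rendered in the ISO 8601 duration form (for example PT1H).
    static String difference(String laterIso, String earlierIso) {
        Instant later = Instant.parse(laterIso);
        Instant earlier = Instant.parse(earlierIso);
        return Duration.between(earlier, later).toString();
    }

    public static void main(String[] args) {
        System.out.println(unixToIso(0L));
        System.out.println(isoToUnix("2010-05-29T20:04:25Z"));
        System.out.println(difference("2010-05-29T01:00:00Z", "2010-05-29T00:00:00Z"));
    }
}
```

Keeping the in-memory value as an epoch-based instant, as the notes suggest, is what makes both casts cheap: the text form is only produced when a value is dumped or cast to chararray.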
[jira] Created: (PIG-1429) Add Boolean Data Type to Pig
Add Boolean Data Type to Pig Key: PIG-1429 URL: https://issues.apache.org/jira/browse/PIG-1429 Project: Pig Issue Type: New Feature Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Fix For: 0.8.0 Pig needs a Boolean data type. Pig-1097 is dependent on doing this. I volunteer. Is there anything beyond the work in src/org/apache/pig/data/ plus unit tests to make this work? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1430) ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix Times in All Operations
ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix Times in All Operations -- Key: PIG-1430 URL: https://issues.apache.org/jira/browse/PIG-1430 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.7.0 Reporter: Russell Jurney Fix For: 0.8.0 All functions in contrib.piggybank.java.src.main.java.org.apache.pig.piggybank.evaluation.datetime should seamlessly accept integer Unix/POSIX times, and return Unix time output when given an int, and ISO output when given a chararray. Note: Unix/POSIX times are the number of seconds elapsed since midnight proleptic Coordinated Universal Time (UTC) of January 1, 1970, not counting leap seconds. See http://en.wikipedia.org/wiki/Unix_time -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1430) ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix Times in All Operations
[ https://issues.apache.org/jira/browse/PIG-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873217#action_12873217 ] Russell Jurney commented on PIG-1430: - Actually, I think it should interpret int as unix time in seconds, and long as unix time in milliseconds. Thoughts? ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix Times in All Operations -- Key: PIG-1430 URL: https://issues.apache.org/jira/browse/PIG-1430 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.7.0 Reporter: Russell Jurney Fix For: 0.8.0 All functions in contrib.piggybank.java.src.main.java.org.apache.pig.piggybank.evaluation.datetime should seamlessly accept integer Unix/POSIX times, and return Unix time output when given an int, and ISO output when given a chararray. Note: Unix/POSIX times are the number of seconds elapsed since midnight proleptic Coordinated Universal Time (UTC) of January 1, 1970, not counting leap seconds. See http://en.wikipedia.org/wiki/Unix_time -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
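One way to read the suggestion above (int means Unix seconds, long means Unix milliseconds, chararray means ISO 8601) is as a dispatch on the runtime type of the input field. The sketch below is an assumption about how that could look, not the piggybank implementation: the class UnixTimeInputSketch and the method toEpochMillis are hypothetical names, and java.time is used in place of Joda-Time to keep the example self-contained.

```java
import java.time.Instant;

// Hypothetical sketch, not piggybank code.
final class UnixTimeInputSketch {

    // Normalize a field value to epoch milliseconds based on its Java type:
    // Integer is taken as seconds, Long as milliseconds, String as ISO 8601.
    static long toEpochMillis(Object value) {
        if (value instanceof Integer) {
            return ((Integer) value).longValue() * 1000L;
        }
        if (value instanceof Long) {
            return (Long) value;
        }
        if (value instanceof String) {
            return Instant.parse((String) value).toEpochMilli();
        }
        String type = (value == null) ? "null" : value.getClass().getName();
        throw new IllegalArgumentException("Unsupported datetime input type: " + type);
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis(Integer.valueOf(1)));
        System.out.println(toEpochMillis(Long.valueOf(1500L)));
        System.out.println(toEpochMillis("1970-01-01T00:00:01Z"));
    }
}
```

Normalizing every accepted input to a single internal unit first would let each UDF keep one code path, converting back to seconds or to ISO text only when producing output.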
[jira] Created: (PIG-1431) Current DateTime UDFs: ISONOW(), UNIXNOW()
Current DateTime UDFs: ISONOW(), UNIXNOW() -- Key: PIG-1431 URL: https://issues.apache.org/jira/browse/PIG-1431 Project: Pig Issue Type: New Feature Components: impl Affects Versions: 0.7.0 Reporter: Russell Jurney Fix For: 0.8.0 Need a NOW() for getting datetime diffs between now and a prior or future date. Will use the system timezone. Will make one for ISO datetime and one for Unix time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
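A rough sketch of what the two proposed functions compute is given below. It is not the UDF code: the class NowSketch and the methods isoNow and unixNow are hypothetical, java.time is used instead of Joda-Time, and the clock is passed in as a parameter only so the behaviour can be checked against a fixed time. Passing Clock.systemDefaultZone() corresponds to the "system timezone" behaviour mentioned in the issue.

```java
import java.time.Clock;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch, not piggybank code.
final class NowSketch {

    // Current time as an ISO 8601 string carrying the clock's zone offset.
    static String isoNow(Clock clock) {
        return OffsetDateTime.now(clock).format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    // Current time as Unix/POSIX seconds (independent of time zone).
    static long unixNow(Clock clock) {
        return clock.instant().getEpochSecond();
    }

    public static void main(String[] args) {
        Clock system = Clock.systemDefaultZone();
        System.out.println(isoNow(system));
        System.out.println(unixNow(system));
    }
}
```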
[jira] Commented: (PIG-1430) ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix Times in All Operations
[ https://issues.apache.org/jira/browse/PIG-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873233#action_12873233 ] Russell Jurney commented on PIG-1430: - Good idea, will do! ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix Times in All Operations -- Key: PIG-1430 URL: https://issues.apache.org/jira/browse/PIG-1430 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.7.0 Reporter: Russell Jurney Fix For: 0.8.0 All functions in contrib.piggybank.java.src.main.java.org.apache.pig.piggybank.evaluation.datetime should seamlessly accept integer Unix/POSIX times, and return Unix time output when given an int, and ISO output when given a chararray. Note: Unix/POSIX times are the number of seconds elapsed since midnight proleptic Coordinated Universal Time (UTC) of January 1, 1970, not counting leap seconds. See http://en.wikipedia.org/wiki/Unix_time -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873234#action_12873234 ] Russell Jurney commented on PIG-1314: - As a first pass, I am going to add Boolean, which should be easier than DateTime, but will inform this implementation. See PIG-1429 Add DateTime Support to Pig --- Key: PIG-1314 URL: https://issues.apache.org/jira/browse/PIG-1314 Project: Pig Issue Type: Bug Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Fix For: 0.8.0 Original Estimate: 672h Remaining Estimate: 672h Hadoop/Pig are primarily used to parse log data, and most logs have a timestamp component. Therefore Pig should support dates as a primitive. Can someone familiar with adding types to pig comment on how hard this is? We're looking at doing this, rather than use UDFs. Is this a patch that would be accepted? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple
[ https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1420: Attachment: addconcat2.patch Fixed bad comment re: copying bytes. Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple - Key: PIG-1420 URL: https://issues.apache.org/jira/browse/PIG-1420 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.8.0 Reporter: Russell Jurney Fix For: 0.7.0 Attachments: addconcat2.patch Original Estimate: 24h Remaining Estimate: 24h org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and org.apache.pig.builtin.StringConcat (which acts on Strings internally), both act on the first two fields of a tuple. This results in ugly nested CONCAT calls like: CONCAT(CONCAT(A, ' '), B) The more desirable form is: CONCAT(A, ' ', B) This change will be backwards compatible, provided that no one was relying on the fact that CONCAT ignores fields after the first two in a tuple. This seems a reasonable assumption to make, or at least a small break in compatibility for a sizable improvement. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple
[ https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868765#action_12868765 ] Russell Jurney commented on PIG-1420: - Dmitriy, it applies with -p1 Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple - Key: PIG-1420 URL: https://issues.apache.org/jira/browse/PIG-1420 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.8.0 Reporter: Russell Jurney Fix For: 0.7.0 Attachments: concat.patch Original Estimate: 24h Remaining Estimate: 24h org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and org.apache.pig.builtin.StringConcat (which acts on Strings internally), both act on the first two fields of a tuple. This results in ugly nested CONCAT calls like: CONCAT(CONCAT(A, ' '), B) The more desirable form is: CONCAT(A, ' ', B) This change will be backwards compatible, provided that no one was relying on the fact that CONCAT ignores fields after the first two in a tuple. This seems a reasonable assumption to make, or at least a small break in compatibility for a sizable improvement. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple
[ https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1420: Attachment: addconcat.patch New, working patch made with git diff --no-prefix, applies with -p0 Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple - Key: PIG-1420 URL: https://issues.apache.org/jira/browse/PIG-1420 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.8.0 Reporter: Russell Jurney Fix For: 0.7.0 Attachments: addconcat.patch Original Estimate: 24h Remaining Estimate: 24h org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and org.apache.pig.builtin.StringConcat (which acts on Strings internally), both act on the first two fields of a tuple. This results in ugly nested CONCAT calls like: CONCAT(CONCAT(A, ' '), B) The more desirable form is: CONCAT(A, ' ', B) This change will be backwards compatible, provided that no one was relying on the fact that CONCAT ignores fields after the first two in a tuple. This seems a reasonable assumption to make, or at least a small break in compatibility for a sizable improvement. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple
[ https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1420: Attachment: (was: concat.patch) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple - Key: PIG-1420 URL: https://issues.apache.org/jira/browse/PIG-1420 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.8.0 Reporter: Russell Jurney Fix For: 0.7.0 Attachments: addconcat.patch Original Estimate: 24h Remaining Estimate: 24h org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and org.apache.pig.builtin.StringConcat (which acts on Strings internally), both act on the first two fields of a tuple. This results in ugly nested CONCAT calls like: CONCAT(CONCAT(A, ' '), B) The more desirable form is: CONCAT(A, ' ', B) This change will be backwards compatible, provided that no one was relying on the fact that CONCAT ignores fields after the first two in a tuple. This seems a reasonable assumption to make, or at least a small break in compatibility for a sizable improvement. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple
[ https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1420: Patch Info: [Patch Available] Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple - Key: PIG-1420 URL: https://issues.apache.org/jira/browse/PIG-1420 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.8.0 Reporter: Russell Jurney Fix For: 0.7.0 Attachments: addconcat.patch Original Estimate: 24h Remaining Estimate: 24h org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and org.apache.pig.builtin.StringConcat (which acts on Strings internally), both act on the first two fields of a tuple. This results in ugly nested CONCAT calls like: CONCAT(CONCAT(A, ' '), B) The more desirable form is: CONCAT(A, ' ', B) This change will be backwards compatible, provided that no one was relying on the fact that CONCAT ignores fields after the first two in a tuple. This seems a reasonable assumption to make, or at least a small break in compatibility for a sizable improvement. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple
[ https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1420: Status: In Progress (was: Patch Available) I don't know what resume progress does, but I'm about to find out. Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple - Key: PIG-1420 URL: https://issues.apache.org/jira/browse/PIG-1420 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.8.0 Reporter: Russell Jurney Fix For: 0.7.0 Attachments: concat.patch Original Estimate: 24h Remaining Estimate: 24h org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and org.apache.pig.builtin.StringConcat (which acts on Strings internally), both act on the first two fields of a tuple. This results in ugly nested CONCAT calls like: CONCAT(CONCAT(A, ' '), B) The more desirable form is: CONCAT(A, ' ', B) This change will be backwards compatible, provided that no one was relying on the fact that CONCAT ignores fields after the first two in a tuple. This seems a reasonable assumption to make, or at least a small break in compatibility for a sizable improvement. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple
[ https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1420: Status: Open (was: Patch Available) Redoing CONCAT of DataByteArrays using java.nio.ByteBuffer Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple - Key: PIG-1420 URL: https://issues.apache.org/jira/browse/PIG-1420 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.8.0 Reporter: Russell Jurney Fix For: 0.7.0 Attachments: concat.patch Original Estimate: 24h Remaining Estimate: 24h org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and org.apache.pig.builtin.StringConcat (which acts on Strings internally), both act on the first two fields of a tuple. This results in ugly nested CONCAT calls like: CONCAT(CONCAT(A, ' '), B) The more desirable form is: CONCAT(A, ' ', B) This change will be backwards compatible, provided that no one was relying on the fact that CONCAT ignores fields after the first two in a tuple. This seems a reasonable assumption to make, or at least a small break in compatibility for a sizable improvement. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple
[ https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1420: Status: Patch Available (was: Open) Re-submitting original, java.nio.ByteBuffer isn't very helpful. Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple - Key: PIG-1420 URL: https://issues.apache.org/jira/browse/PIG-1420 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.8.0 Reporter: Russell Jurney Fix For: 0.7.0 Attachments: concat.patch Original Estimate: 24h Remaining Estimate: 24h org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and org.apache.pig.builtin.StringConcat (which acts on Strings internally), both act on the first two fields of a tuple. This results in ugly nested CONCAT calls like: CONCAT(CONCAT(A, ' '), B) The more desirable form is: CONCAT(A, ' ', B) This change will be backwards compatible, provided that no one was relying on the fact that CONCAT ignores fields after the first two in a tuple. This seems a reasonable assumption to make, or at least a small break in compatibility for a sizable improvement. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple
Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple - Key: PIG-1420 URL: https://issues.apache.org/jira/browse/PIG-1420 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.8.0 Reporter: Russell Jurney Fix For: 0.7.0 org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and org.apache.pig.builtin.StringConcat (which acts on Strings internally), both act on the first two fields of a tuple. This results in ugly nested CONCAT calls like: CONCAT(CONCAT(A, ' '), B) The more desirable form is: CONCAT(A, ' ', B) This change will be backwards compatible, provided that no one was relying on the fact that CONCAT ignores fields after the first two in a tuple. This seems a reasonable assumption to make, or at least a small break in compatibility for a sizable improvement. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple
[ https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1420: Attachment: concat.patch Patch that adds:
1) CONCAT handles all fields in the supplied tuple, instead of just the first two.
2) StringConcat handles all fields in the supplied tuple, instead of just the first two.
3) DataByteArray gets an append() to make the implementation of 1 and 2 clean (I think).
4) Unit Tests for CONCAT and StringConcat in TestBuiltin
5) Unit Tests for DataByteArray.append() in TestDataModel
Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple - Key: PIG-1420 URL: https://issues.apache.org/jira/browse/PIG-1420 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.8.0 Reporter: Russell Jurney Fix For: 0.7.0 Attachments: concat.patch Original Estimate: 24h Remaining Estimate: 24h org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and org.apache.pig.builtin.StringConcat (which acts on Strings internally), both act on the first two fields of a tuple. This results in ugly nested CONCAT calls like: CONCAT(CONCAT(A, ' '), B) The more desirable form is: CONCAT(A, ' ', B) This change will be backwards compatible, provided that no one was relying on the fact that CONCAT ignores fields after the first two in a tuple. This seems a reasonable assumption to make, or at least a small break in compatibility for a sizable improvement. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
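The idea in the patch description above, an append() on the byte-array type plus a CONCAT that folds over every field, can be sketched as below. This is not the contents of concat.patch: plain byte[] stands in for Pig's DataByteArray, a List stands in for the tuple, and ConcatSketch, append and concatAll are hypothetical names chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the patch: byte[] stands in for DataByteArray.
final class ConcatSketch {

    // Stand-in for the proposed append(): a new array holding a followed by b.
    static byte[] append(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    // Concatenate every field in order, rather than only the first two.
    static byte[] concatAll(List<byte[]> fields) {
        byte[] result = new byte[0];
        for (byte[] field : fields) {
            result = append(result, field);
        }
        return result;
    }

    public static void main(String[] args) {
        List<byte[]> fields = Arrays.asList(
                "A".getBytes(StandardCharsets.UTF_8),
                " ".getBytes(StandardCharsets.UTF_8),
                "B".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(concatAll(fields), StandardCharsets.UTF_8));
    }
}
```

With two fields this gives the same result as the old two-argument behaviour, which is the sense in which the change is described as backwards compatible.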
[jira] Updated: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple
[ https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1420: Status: Patch Available (was: Open) Release Note: CONCAT handles all fields in the supplied tuple, instead of just the first two. This is backwards compatible unless you were relying on it only using the first two fields, which seems unlikely. DataByteArray now has an append() method. Example use before: B = FOREACH A GENERATE CONCAT(CONCAT(first_name, ' '), last_name); Example extended use now: D = FOREACH C GENERATE CONCAT(first_name, ' ', last_name); Passes all tests for me. I like Asparagus. Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple - Key: PIG-1420 URL: https://issues.apache.org/jira/browse/PIG-1420 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.8.0 Reporter: Russell Jurney Fix For: 0.7.0 Attachments: concat.patch Original Estimate: 24h Remaining Estimate: 24h org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and org.apache.pig.builtin.StringConcat (which acts on Strings internally), both act on the first two fields of a tuple. This results in ugly nested CONCAT calls like: CONCAT(CONCAT(A, ' '), B) The more desirable form is: CONCAT(A, ' ', B) This change will be backwards compatible, provided that no one was relying on the fact that CONCAT ignores fields after the first two in a tuple. This seems a reasonable assumption to make, or at least a small break in compatibility for a sizable improvement. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1310) ISO Date UDFs: Conversion, Trucation and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851542#action_12851542 ] Russell Jurney commented on PIG-1310: - Cool - one thing though - Piggybank itself does not build in trunk. It must not have built since 0.6, since the load/store func changes went in. Does something need to be done there? Should I submit a patch that removes all the broken UDFs to make ant build in piggybank work on trunk? To get piggybank to build, I had to remove:
! contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestMultiStorage.java
! contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestSequenceFileLoader.java
! contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestRegExLoader.java
! contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/TestPigStorageSchema.java
! contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/string/TestLookupInFiles.java
! contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/TestEvalString.java
Is this just me, or is this fixed on other branches? ISO Date UDFs: Conversion, Trucation and Date Math -- Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Assignee: Russell Jurney Fix For: 0.7.0 Attachments: joda-mavenstuff.diff, pass.patch Original Estimate: 168h Remaining Estimate: 168h I've written UDFs to handle loading unix times, datemonth values and ISO 8601 formatted date strings, and working with them as ISO datetimes using jodatime. The working code is here: http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/ It needs to be documented and tests added, and a couple UDFs are missing, but these work if you REGISTER the jodatime jar in your script. Hopefully I can get this stuff in piggybank before someone else writes it this time :) The rounding also may not be performant, but the code works. Ultimately I'd also like to enable support for ISO 8601 durations. Someone slap me if this isn't done soon, it is not much work and this should help everyone working with time series. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Trucation and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1310: Attachment: (was: datetime3.patch) ISO Date UDFs: Conversion, Trucation and Date Math -- Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Fix For: 0.7.0 Attachments: combined.patch, joda-mavenstuff.diff Original Estimate: 168h Remaining Estimate: 168h I've written UDFs to handle loading unix times, datemonth values and ISO 8601 formatted date strings, and working with them as ISO datetimes using jodatime. The working code is here: http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/ It needs to be documented and tests added, and a couple UDFs are missing, but these work if you REGISTER the jodatime jar in your script. Hopefully I can get this stuff in piggybank before someone else writes it this time :) The rounding also may not be performant, but the code works. Ultimately I'd also like to enable support for ISO 8601 durations. Someone slap me if this isn't done soon, it is not much work and this should help everyone working with time series. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Trucation and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1310: Attachment: combined.patch All inclusive patch from pig root, includes all classes and tests, and ivy updates for jodatime. Applied this to a fresh trunk svn checkout, and all works ok - once I remove failing tests in piggybank unrelated to this commit. ISO Date UDFs: Conversion, Trucation and Date Math -- Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Fix For: 0.7.0 Attachments: combined.patch, joda-mavenstuff.diff Original Estimate: 168h Remaining Estimate: 168h I've written UDFs to handle loading unix times, datemonth values and ISO 8601 formatted date strings, and working with them as ISO datetimes using jodatime. The working code is here: http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/ It needs to be documented and tests added, and a couple UDFs are missing, but these work if you REGISTER the jodatime jar in your script. Hopefully I can get this stuff in piggybank before someone else writes it this time :) The rounding also may not be performant, but the code works. Ultimately I'd also like to enable support for ISO 8601 durations. Someone slap me if this isn't done soon, it is not much work and this should help everyone working with time series. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Truncation and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1310: Attachment: (was: tests.patch) ISO Date UDFs: Conversion, Truncation and Date Math -- Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Fix For: 0.7.0 Attachments: combined.patch, joda-mavenstuff.diff Original Estimate: 168h Remaining Estimate: 168h
[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Truncation and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1310: Attachment: (was: combined.patch) ISO Date UDFs: Conversion, Truncation and Date Math -- Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Fix For: 0.7.0 Attachments: joda-mavenstuff.diff Original Estimate: 168h Remaining Estimate: 168h
[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Truncation and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1310: Attachment: deargod.patch Last shot at a patch - this builds and tests ok against a fresh checkout of pig (once all the unrelated broken tests are rm'd). ISO Date UDFs: Conversion, Truncation and Date Math -- Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Fix For: 0.7.0 Attachments: deargod.patch, joda-mavenstuff.diff Original Estimate: 168h Remaining Estimate: 168h
[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Truncation and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1310: Attachment: (was: deargod.patch) ISO Date UDFs: Conversion, Truncation and Date Math -- Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Fix For: 0.7.0 Attachments: joda-mavenstuff.diff Original Estimate: 168h Remaining Estimate: 168h
[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Truncation and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1310: Attachment: pass.patch Ok, this patch works. ISO Date UDFs: Conversion, Truncation and Date Math -- Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Fix For: 0.7.0 Attachments: joda-mavenstuff.diff, pass.patch Original Estimate: 168h Remaining Estimate: 168h
[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1310: Attachment: (was: datetime.patch) ISO Date UDFs: Conversion, Rounding and Date Math - Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Fix For: 0.7.0 Original Estimate: 168h Remaining Estimate: 168h
[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1310: Attachment: (was: datetime2.patch) ISO Date UDFs: Conversion, Rounding and Date Math - Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Fix For: 0.7.0 Original Estimate: 168h Remaining Estimate: 168h
[jira] Commented: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849623#action_12849623 ] Russell Jurney commented on PIG-1310: - Oh, I'm allergic to XML. Seriously allergic. Can someone purty please help me out with the ivy bit? ISO Date UDFs: Conversion, Rounding and Date Math - Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Fix For: 0.7.0 Attachments: datetime3.patch Original Estimate: 168h Remaining Estimate: 168h
[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Truncation and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1310: Summary: ISO Date UDFs: Conversion, Truncation and Date Math (was: ISO Date UDFs: Conversion, Rounding and Date Math) ISO Date UDFs: Conversion, Truncation and Date Math -- Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Fix For: 0.7.0 Attachments: datetime3.patch Original Estimate: 168h Remaining Estimate: 168h
[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Truncation and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1310: Attachment: tests.patch Tests attached. ISO Date UDFs: Conversion, Truncation and Date Math -- Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Fix For: 0.7.0 Attachments: datetime3.patch, joda-mavenstuff.diff, tests.patch Original Estimate: 168h Remaining Estimate: 168h
[jira] Commented: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849528#action_12849528 ] Russell Jurney commented on PIG-1310: - Thanks, Alan, I'll add all those changes tonight. I confess to not really testing CustomFormatToISO other than the test case, I'll update the docs :) As to ISO format - I will link to it and jodatime, and I would suggest ISO8601 be the standard representation of datetimes in Pig, as it handles time zones and is sortable as text - which is nice. ISO Date UDFs: Conversion, Rounding and Date Math - Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Fix For: 0.7.0 Attachments: datetime.patch, datetime2.patch Original Estimate: 168h Remaining Estimate: 168h
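The "sortable as text" point in the comment above can be illustrated with a small sketch. It is written in Python rather than as a Pig UDF, the sample timestamps and the helper name parse_utc are made up for illustration, and it assumes every value uses the same layout and the same UTC "Z" suffix. With mixed offsets or mixed precision, plain string order is no longer guaranteed to match time order.

```python
from datetime import datetime, timezone

# Hypothetical sample data: same layout, all in UTC.
stamps = [
    "2010-03-16T17:29:00Z",
    "2009-12-31T23:59:59Z",
    "2010-03-02T08:00:00Z",
]


def parse_utc(stamp):
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' string into an aware UTC datetime."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


# Sorting the raw strings needs no parsing at all.
by_text = sorted(stamps)

# Sorting by the parsed instants gives the chronological order.
by_time = sorted(stamps, key=parse_utc)

print(by_text == by_time)
```

Because the fields run from most to least significant and are zero padded, an ORDER BY on the string column would be expected to behave like a chronological sort under the same assumptions.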
[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1310: Attachment: (was: datetime.patch) ISO Date UDFs: Conversion, Rounding and Date Math - Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Fix For: 0.7.0 Original Estimate: 168h Remaining Estimate: 168h
[jira] Commented: (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848618#action_12848618 ] Russell Jurney commented on PIG-1314: - I would not say this blocks PIG-1310 at all - the UDFs there simply treat ISO dates as strings, which works reasonably well. They should also handle Long unix times, and will in a next patch. In any case, this isn't a blocker to that ticket, for which a patch was just submitted. Add DateTime Support to Pig --- Key: PIG-1314 URL: https://issues.apache.org/jira/browse/PIG-1314 Project: Pig Issue Type: Bug Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Fix For: 0.8.0 Original Estimate: 672h Remaining Estimate: 672h Hadoop/Pig are primarily used to parse log data, and most logs have a timestamp component. Therefore Pig should support dates as a primitive. Can someone familiar with adding types to pig comment on how hard this is? We're looking at doing this, rather than use UDFs. Is this a patch that would be accepted?
[jira] Commented: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848958#action_12848958 ] Russell Jurney commented on PIG-1310: - Alan, yes jodatime is on a maven repo - I have it pulling via ivy into my local pig trunk. Wasn't sure what to do in piggybank since there was no ivy.xml, but I will look at the maven docs and add it. ISO Date UDFs: Conversion, Rounding and Date Math - Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Fix For: 0.7.0 Attachments: datetime.patch Original Estimate: 168h Remaining Estimate: 168h
[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1310: Attachment: datetime2.patch Added checks for null/ foo sized tuples. Added CustomFormatToISO class and test, allowing any date format to be parsed in jodatime. This patch is a replacement to the previous one. One hitch - I'm out of time tonight and me and build XML's do not get on well. Any chance someone more familiar with ivy can add jodatime to piggybank's build.xml? I got it working easily in the pig project itself, but am not sure how to get ivy going in piggybank. The working dependency I put in Pig's build.xml is: <dependency org="joda-time" name="joda-time" rev="${joda-time.version}" conf="compile->master"/> And libraries.properties got: joda-time.version=1.6 And it worked. ISO Date UDFs: Conversion, Rounding and Date Math - Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Fix For: 0.7.0 Attachments: datetime.patch, datetime2.patch Original Estimate: 168h Remaining Estimate: 168h
[jira] Created: (PIG-1314) Add DateTime Support to Pig
Add DateTime Support to Pig --- Key: PIG-1314 URL: https://issues.apache.org/jira/browse/PIG-1314 Project: Pig Issue Type: Bug Components: data Affects Versions: 0.8.0 Reporter: Russell Jurney Fix For: 0.7.0 Hadoop/Pig are primarily used to parse log data, and most logs have a timestamp component. Therefore Pig should support dates as a primitive. Can someone familiar with adding types to pig comment on how hard this is? We're looking at doing this, rather than use UDFs. Is this a patch that would be accepted?
[jira] Commented: (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848356#action_12848356 ] Russell Jurney commented on PIG-1314: - Thanks, Alan. That is quite helpful. Let me look into it and see about feasibility. What about durations as well? http://en.wikipedia.org/wiki/ISO_8601#Durations ISO8601 durations would be very handy in enabling use of pig operators on datetimes via +/-, etc. This might be something to do later, though. Add DateTime Support to Pig --- Key: PIG-1314 URL: https://issues.apache.org/jira/browse/PIG-1314 Project: Pig Issue Type: Bug Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Fix For: 0.8.0 Original Estimate: 672h Remaining Estimate: 672h
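As a rough illustration of what +/- with ISO 8601 durations could mean, here is a minimal Python sketch. It is not how Pig or jodatime implement durations. The function names parse_duration and shift are hypothetical, and the parser deliberately accepts only the day, hour, minute and second designators (for example "P1DT7H"), because years and months have no fixed length.

```python
import re
from datetime import datetime, timedelta

# Restricted ISO 8601 duration: PnDTnHnMnS with every part optional.
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")
_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_duration(text):
    """Turn a restricted ISO 8601 duration string into a timedelta."""
    match = _DURATION.match(text)
    # Reject non-matching input and bare "P" / "PT" with no components.
    if match is None or not any(match.groups()):
        raise ValueError("unsupported duration: %r" % text)
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def shift(stamp, duration, sign=1):
    """Add (sign=1) or subtract (sign=-1) a duration from a UTC ISO timestamp."""
    moment = datetime.strptime(stamp, _FORMAT)
    return (moment + sign * parse_duration(duration)).strftime(_FORMAT)


later = shift("2010-03-16T17:29:00Z", "P1DT7H")
earlier = shift("2010-03-16T17:29:00Z", "PT30M", sign=-1)
print(later, earlier)
```

A real implementation would more likely lean on jodatime's own period handling, which also covers the calendar-dependent units this sketch leaves out.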
[jira] Commented: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847874#action_12847874 ] Russell Jurney commented on PIG-1310: - I'm thinking it would be good if DateTime was a Pig primitive. Can someone give me an idea how much work it is to add a Pig primitive, and if this patch would be accepted for 0.8? ISO Date UDFs: Conversion, Rounding and Date Math - Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Fix For: 0.8.0 Original Estimate: 168h Remaining Estimate: 168h
[jira] Created: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math
ISO Date UDFs: Conversion, Rounding and Date Math - Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Fix For: 0.7.0 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 formatted date strings, and working with them as ISO datetimes using jodatime. The working code is here: http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/ It needs to be documented and tests added, and a couple UDFs are missing, but these work if you REGISTER the jodatime jar in your script. Hopefully I can get this stuff in piggybank before someone else writes it this time :) The rounding also may not be performant, but the code works. Ultimately I'd also like to enable support for ISO 8601 durations. Someone slap me if this isn't done soon, it is not much work and this should help everyone working with time series.
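The kind of conversion and rounding described above can be sketched as follows. This is a Python stand-in, not the Java UDFs from the linked repository. The function names unix_to_iso and truncate_iso are hypothetical, the sketch assumes second-resolution unix times and UTC output, and "rounding" is interpreted here as rounding down to the start of the unit.

```python
from datetime import datetime, timezone

_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def unix_to_iso(seconds):
    """Convert a unix time in whole seconds to an ISO 8601 UTC string."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(_FORMAT)


def truncate_iso(stamp, unit):
    """Round an ISO 8601 UTC string down to the start of its hour, day or month."""
    moment = datetime.strptime(stamp, _FORMAT)
    if unit == "hour":
        moment = moment.replace(minute=0, second=0)
    elif unit == "day":
        moment = moment.replace(hour=0, minute=0, second=0)
    elif unit == "month":
        moment = moment.replace(day=1, hour=0, minute=0, second=0)
    else:
        raise ValueError("unsupported unit: %r" % unit)
    return moment.strftime(_FORMAT)


stamp = unix_to_iso(1268760540)
print(stamp, truncate_iso(stamp, "hour"), truncate_iso(stamp, "month"))
```

Truncating to a unit in this way is what makes GROUP BY on hourly or daily buckets straightforward for time series, since every event in the same bucket ends up with the same string key.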
[jira] Commented: (PIG-1150) VAR() Variance UDF
[ https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847600#action_12847600 ] Russell Jurney commented on PIG-1150: - Yes, this sounds like the thing to do :) On Tue, Mar 16, 2010 at 5:29 PM, Dmitriy V. Ryaboy (JIRA) VAR() Variance UDF -- Key: PIG-1150 URL: https://issues.apache.org/jira/browse/PIG-1150 Project: Pig Issue Type: New Feature Affects Versions: 0.5.0 Environment: UDF, written in Pig 0.5 contrib/ Reporter: Russell Jurney Fix For: 0.7.0 Attachments: var.patch I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates variance in a distributed manner, based on the AVG() builtin. It works by calculating the count, sum and sum of squares, as described here: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm Is this a worthwhile contribution? Taking the square root of this value using the contrib SQRT() function gives Standard Deviation, which is missing from Pig.
[jira] Updated: (PIG-1150) VAR() Variance UDF
[ https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Russell Jurney updated PIG-1150: Attachment: var.patch This patch will not cut the mustard - it lacks Javadoc and test cases, and it's just plain ugly. That being said, people requested this on twitter, so I'm pushing this one for people to use if they want to. Will get a passable patch up later this week. VAR() Variance UDF -- Key: PIG-1150 URL: https://issues.apache.org/jira/browse/PIG-1150 Project: Pig Issue Type: New Feature Affects Versions: 0.5.0 Environment: UDF, written in Pig 0.5 contrib/ Reporter: Russell Jurney Fix For: 0.7.0 Attachments: var.patch
[jira] Commented: (PIG-1150) VAR() Variance UDF
[ https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791582#action_12791582 ] Russell Jurney commented on PIG-1150: - Oh - one other thing - I've read that this naive parallel method of calculating variance can have precision problems - all those doubles getting subtracted from one another and then squared. I've thought of using BigDecimal, which can handle arbitrary precision numbers. My understanding is that this would be slow, but that it would probably still be IO bound. Is that something people would like to see? I could maybe make another UDF that uses BigDecimal or something. I've never actually encountered the precision problems in practice, but I can see how that might be a big problem for some people. VAR() Variance UDF -- Key: PIG-1150 URL: https://issues.apache.org/jira/browse/PIG-1150 Project: Pig Issue Type: New Feature Affects Versions: 0.5.0 Environment: UDF, written in Pig 0.5 contrib/ Reporter: Russell Jurney Fix For: 0.7.0 Attachments: var.patch
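The precision concern raised above can be reproduced with a small sketch. It is Python rather than the Java UDF, and it uses the standard library's Fraction as a stand-in for the exact arithmetic that BigDecimal would provide on the JVM. The data set and function names are invented for illustration. The failure shows up when values are large relative to their spread, so that the sum of squares and the squared mean nearly cancel.

```python
from fractions import Fraction


def naive_variance(values):
    """Population variance from count, sum and sum of squares, in doubles."""
    n = len(values)
    total = sum(values)
    squares = sum(v * v for v in values)
    return squares / n - (total / n) ** 2


def exact_variance(values):
    """Same formula, with exact rational arithmetic until the final conversion."""
    exact = [Fraction(v) for v in values]
    n = len(exact)
    total = sum(exact)
    squares = sum(v * v for v in exact)
    return float(squares / n - (total / n) ** 2)


small = [4.0, 7.0, 13.0, 16.0]
# Same spread, shifted by a large offset: the true variance is unchanged.
shifted = [1e9 + v for v in small]

print(naive_variance(small), exact_variance(small))
print(naive_variance(shifted), exact_variance(shifted))
```

On the shifted data the double-precision result cannot be right, because squares of numbers near 1e9 are only representable in steps far larger than the true variance, while the exact version still recovers it. Whether the extra cost matters in a job that is otherwise IO bound is a separate question the sketch does not answer.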
[jira] Created: (PIG-1150) VAR() Variance UDF
VAR() Variance UDF -- Key: PIG-1150 URL: https://issues.apache.org/jira/browse/PIG-1150 Project: Pig Issue Type: New Feature Affects Versions: 0.5.0 Environment: UDF, written in Pig 0.5 contrib/ Reporter: Russell Jurney Fix For: 0.5.0 I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates variance in a distributed manner, based on the AVG() builtin. It works by calculating the count, sum and sum of squares, as described here: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm Is this a worthwhile contribution? Taking the square root of this value using the contrib SQRT() function gives Standard Deviation, which is missing from Pig.
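The count, sum and sum-of-squares approach described above can be sketched like this. It is a Python illustration of the idea, not the attached patch. The three function names loosely mirror the initial, intermediate and final stages of an algebraic function, but the names, the chunking and the sample data are assumptions made for the example, and the result is the population variance.

```python
import math
from functools import reduce


def initial(values):
    """Map side: reduce one chunk of values to (count, sum, sum of squares)."""
    vals = [float(v) for v in values]
    return (len(vals), sum(vals), sum(v * v for v in vals))


def combine(a, b):
    """Merge two partial triples; order and grouping do not matter."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def final(partial):
    """Population variance as mean of squares minus square of the mean."""
    count, total, squares = partial
    if count == 0:
        return None
    mean = total / count
    return squares / count - mean * mean


# Hypothetical input split across three workers.
chunks = [[2, 4, 4], [4, 5, 5], [7, 9]]
partials = [initial(chunk) for chunk in chunks]
merged = reduce(combine, partials)
variance = final(merged)
std_dev = math.sqrt(variance)
print(merged, variance, std_dev)
```

Because each chunk collapses to a fixed-size triple and the merge step is associative, the partial results can be combined in any order, which is what lets the work be pushed into combiners. Taking the square root at the end gives the standard deviation, as the description notes.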
[jira] Commented: (PIG-896) Pig doesn't run on Mac OSX
[ https://issues.apache.org/jira/browse/PIG-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735729#action_12735729 ] Russell Jurney commented on PIG-896: Not sure if you're able to get it running or not, but Pig will run on OS X 10.5 if you set: export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home For previous versions of OS X you have to upgrade to 1.6, but for 10.5.7/Pig 0.3.0 at least - do this and it will 'just work.' I'm not sure this constitutes a bug, since JAVA_HOME is an environment variable. Pig doesn't run on Mac OSX - Key: PIG-896 URL: https://issues.apache.org/jira/browse/PIG-896 Project: Pig Issue Type: Bug Components: build Environment: Mac OSX Reporter: Rajagopal Natarajan There are hardcoded references like $JAVA_HOME/bin/java in the pig run scripts. Due to this it fails on Mac OSX. It would be nice if pig would be supported on Mac