[jira] Created: (PIG-1495) Add -q command line option to set queue name for Pig jobs from command line

2010-07-12 Thread Russell Jurney (JIRA)
Add -q command line option to set queue name for Pig jobs from command line
---

 Key: PIG-1495
 URL: https://issues.apache.org/jira/browse/PIG-1495
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0


rjurney$ pig -q default

This sets the mapred.job.queue.name property in the execution engine from the 
pig properties for MAPRED type jobs.  

Patch attached.
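
A minimal sketch of the idea, written in Python rather than taken from the attached patch: a -q value from the command line is copied into a properties map under the mapred.job.queue.name key. The function name build_properties and the parser setup are hypothetical; only the -q flag and the property name come from the description above.

```python
import argparse

# Property name taken from the issue description.
QUEUE_PROPERTY = "mapred.job.queue.name"


def build_properties(argv, base=None):
    """Return a copy of `base` with the queue property set when -q is given.

    Hypothetical helper for illustration; it is not Pig's actual option handling.
    """
    parser = argparse.ArgumentParser(prog="pig-sketch")
    parser.add_argument("-q", dest="queue", default=None,
                        help="queue name for submitted jobs")
    # Ignore any other arguments (script names, other flags) in this sketch.
    args, _unknown = parser.parse_known_args(argv)

    props = dict(base or {})
    if args.queue is not None:
        props[QUEUE_PROPERTY] = args.queue
    return props


if __name__ == "__main__":
    print(build_properties(["-q", "default"]))
```

When -q is absent the map is returned unchanged, so whatever queue the cluster configuration already names would still apply.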

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1495) Add -q command line option to set queue name for Pig jobs from command line

2010-07-12 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1495:


Status: Patch Available  (was: Open)

 Add -q command line option to set queue name for Pig jobs from command line
 ---

 Key: PIG-1495
 URL: https://issues.apache.org/jira/browse/PIG-1495
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0

 Attachments: set_queue.patch


 rjurney$ pig -q default
 This sets the mapred.job.queue.name property in the execution engine from the 
 pig properties for MAPRED type jobs.  
 Patch attached.




[jira] Updated: (PIG-1495) Add -q command line option to set queue name for Pig jobs from command line

2010-07-12 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1495:


Status: Open  (was: Patch Available)

 Add -q command line option to set queue name for Pig jobs from command line
 ---

 Key: PIG-1495
 URL: https://issues.apache.org/jira/browse/PIG-1495
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0

 Attachments: set_queue.patch


 rjurney$ pig -q default
 This sets the mapred.job.queue.name property in the execution engine from the 
 pig properties for MAPRED type jobs.  
 Patch attached.




[jira] Commented: (PIG-1495) Add -q command line option to set queue name for Pig jobs from command line

2010-07-12 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887585#action_12887585
 ] 

Russell Jurney commented on PIG-1495:
-

This doesn't work yet.  Doh!

 Add -q command line option to set queue name for Pig jobs from command line
 ---

 Key: PIG-1495
 URL: https://issues.apache.org/jira/browse/PIG-1495
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0

 Attachments: set_queue.patch


 rjurney$ pig -q default
 This sets the mapred.job.queue.name property in the execution engine from the 
 pig properties for MAPRED type jobs.  
 Patch attached.




[jira] Resolved: (PIG-1476) Add trailing flag to commands to prevent retention of relation name in field names: STRIP ?

2010-07-06 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney resolved PIG-1476.
-

Resolution: Fixed

This is actually ok.

 Add trailing flag to commands to prevent retention of relation name in field 
 names: STRIP ?
 ---

 Key: PIG-1476
 URL: https://issues.apache.org/jira/browse/PIG-1476
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
 Environment: sunny, 60% humidity with a chance of rain.
Reporter: Russell Jurney
 Fix For: 0.8.0


 After doing a JOIN or a GROUP/FOREACH, one often ends up with data looking 
 like:
  DESCRIBE foo;
foo: {other_thing::f1:int, other_thing::f2:chararray, other_thing::f3: int}
 If oun was to let this chain, ouin can end up with: 
 first_thing::second_thing::third_thing::fourth_thing::f1 which is pretty 
 hairy.
 What wunn usually wants is:
foo: {f1:int, f2:chararray, f3: int}
 At this point, won is left with two choices, neither of which is very good.  
 Choice wan:
  foo = FOREACH foo GENERATE $0 AS f1, $1 AS f2, $2 AS f3;
 This is a poor choice because later when wahn edits this file, it is 
 confusing to remember what order is what field when wun manipulates something 
 up stream in the script.  So instead whun does this:
  foo = FOREACH foo GENERATE old_thing::f1 AS f1, old_thing::f2 AS f2, 
  old_thing::f3 AS f3;
 or
  foo = FOREACH foo GENERATE f1 AS f1, f2 AS f2, f3 AS f3;
 This is a poor choice because it is verbose and cumbersome.
 With no good choices available, whan is unsure what to do, pauses and 
 reflects that the Pig is perplexing, and hopes for a better tomorrow.  Here's 
 what wuhn should do to avoid this situation:
 foo = JOIN old_thing by f1, other_thing BY f1 STRIP;
 DESCRIBE foo foo: {f1:int, f2:chararray, f3: int};
 I think so, anyway.  I leave the behavior of duplicate fields to more 
 enlightened beings, but I think this would be a big improvement to Pig Latin.
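
 A rough sketch of what the proposed STRIP flag would do to a schema, written in Python for illustration only. The function name strip_relation_prefixes is hypothetical, and raising an error on duplicate names is an assumption: the proposal above deliberately leaves duplicate handling open.

```python
def strip_relation_prefixes(field_names):
    """Drop every 'relation::' prefix, keeping only the final field name.

    Assumption: a name that becomes ambiguous after stripping is an error.
    The original proposal does not say what should happen in that case.
    """
    stripped = [name.rsplit("::", 1)[-1] for name in field_names]

    seen = set()
    duplicates = []
    for name in stripped:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    if duplicates:
        raise ValueError(
            "ambiguous field names after STRIP: " + ", ".join(duplicates))
    return stripped


if __name__ == "__main__":
    print(strip_relation_prefixes(
        ["other_thing::f1", "other_thing::f2", "other_thing::f3"]))
```

 Because only the last segment is kept, a chained name such as first_thing::second_thing::third_thing::fourth_thing::f1 also collapses to f1.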




[jira] Commented: (PIG-1430) ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix Times in All Operations

2010-07-02 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884557#action_12884557
 ] 

Russell Jurney commented on PIG-1430:
-

I've been thinking about the feedback at the contributors meeting Monday. I 
propose that we postpone the addition of a full datetime type (PIG-1314) in 
favor of the builtins described below. This change is easy, and I can do it 
immediately and get it into 0.8. The original proposal is quite hard, and I 
can't really estimate when I could have it completed. I'm not sure we need it. 
There are many other more important things I would rather do. 

I'd like to remove the piggybank classes 
org.apache.pig.piggybank.evaluation.datetime.* or at least deprecate them. 

I'd like to add the following builtins, which act on both ISO8601 datetime 
strings and long unix times. Each could be split into many functions, but 
I'd prefer to keep the list as short as possible. I suggest we mirror the Oracle 
date/time functions when possible: http://psoug.org/reference/date_func.html 

* Units 

When listed below, units are defined as one of: 

YEAR 
MONTH 
WEEK 
DAY 
HOUR 
MINUTE 
SECOND 

* Truncations 

TRUNC(date, unit) or TRUNC_DATE(date, unit) 

long/epoch input returns long/epoch output. 
ISO8601 string input returns ISO8601 datetime output. 

* Dates to durations 

DURATION(date, unit) 

long/epoch input returns long output in the unit specified. 
ISO8601 input returns an ISO8601 duration 

* Adding/subtracting durations and dates: use longs. 

* Utilities 

CURRENT_ISOTIME 
CURRENT_UNIXTIME 
ISOTOUNIX 
UNIXTOISO 

The only ugly part to this is that ISO times are 2nd class citizens in that 
they cannot be added/subtracted. I'm prepared to live with that :)
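
A small Python sketch of the TRUNC behaviour proposed above, to make the "same type in, same type out" rule concrete. This is an illustration, not a Pig builtin: the function names are hypothetical, epoch values are assumed to be seconds interpreted in UTC, and WEEK is assumed to start on Monday.

```python
from datetime import datetime, timedelta, timezone

# Units as listed in the proposal.
UNITS = ("YEAR", "MONTH", "WEEK", "DAY", "HOUR", "MINUTE", "SECOND")


def _truncate(dt, unit):
    """Zero out every field of `dt` that is smaller than `unit`."""
    if unit not in UNITS:
        raise ValueError("unknown unit: %r" % (unit,))
    dt = dt.replace(microsecond=0)
    if unit == "SECOND":
        return dt
    dt = dt.replace(second=0)
    if unit == "MINUTE":
        return dt
    dt = dt.replace(minute=0)
    if unit == "HOUR":
        return dt
    dt = dt.replace(hour=0)
    if unit == "DAY":
        return dt
    if unit == "WEEK":
        # Assumption: weeks start on Monday (ISO 8601 convention).
        return dt - timedelta(days=dt.weekday())
    if unit == "MONTH":
        return dt.replace(day=1)
    return dt.replace(month=1, day=1)  # YEAR


def trunc(value, unit):
    """Epoch seconds in gives epoch seconds out; ISO8601 string in gives string out."""
    if isinstance(value, int):
        dt = datetime.fromtimestamp(value, tz=timezone.utc)
        return int(_truncate(dt, unit).timestamp())
    return _truncate(datetime.fromisoformat(value), unit).isoformat()


if __name__ == "__main__":
    print(trunc("2010-07-02T13:45:30", "HOUR"))
```

Dispatching on the input type keeps a single function name per operation, which matches the stated wish to keep the list of builtins short.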

 ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix 
 Times in All Operations
 --

 Key: PIG-1430
 URL: https://issues.apache.org/jira/browse/PIG-1430
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0


 All functions in 
 contrib.piggybank.java.src.main.java.org.apache.pig.piggybank.evaluation.datetime
  should seamlessly accept integer Unix/POSIX times, and return Unix time 
 output when given an int, and ISO output when given a chararray.
 Note: Unix/POSIX times are the number of seconds elapsed since midnight 
 proleptic Coordinated Universal Time (UTC) of January 1, 1970, not counting 
 leap seconds.  See http://en.wikipedia.org/wiki/Unix_time
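
 For illustration, a Python sketch of converting between the two representations described above. The function names unix_to_iso and iso_to_unix are hypothetical stand-ins, not piggybank functions, and treating an ISO string without an offset as UTC is an assumption.

```python
from datetime import datetime, timezone


def unix_to_iso(seconds):
    """Format whole Unix seconds as an ISO8601 UTC string."""
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


def iso_to_unix(text):
    """Parse an ISO8601 datetime string into whole Unix seconds."""
    if text.endswith("Z"):
        # Older Python versions do not accept a trailing 'Z' in fromisoformat.
        text = text[:-1] + "+00:00"
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        # Assumption: strings without an offset are taken to be UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


if __name__ == "__main__":
    print(unix_to_iso(0))
    print(iso_to_unix("1970-01-01T00:00:00Z"))
```

 A function that accepts either type could call one of these on the way in and the other on the way out, so that int input yields int output and chararray input yields ISO output.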




[jira] Commented: (PIG-1314) Add DateTime Support to Pig

2010-07-02 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884558#action_12884558
 ] 

Russell Jurney commented on PIG-1314:
-

Been thinking about this... I don't think we should add a full datetime type at 
this time.  See comments in PIG-1430 on alternative approach using builtins.

 Add DateTime Support to Pig
 ---

 Key: PIG-1314
 URL: https://issues.apache.org/jira/browse/PIG-1314
 Project: Pig
  Issue Type: Bug
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0

   Original Estimate: 672h
  Remaining Estimate: 672h

 Hadoop/Pig are primarily used to parse log data, and most logs have a 
 timestamp component.  Therefore Pig should support dates as a primitive.
 Can someone familiar with adding types to pig comment on how hard this is?  
 We're looking at doing this, rather than use UDFs.  Is this a patch that 
 would be accepted?




[jira] Commented: (PIG-1314) Add DateTime Support to Pig

2010-07-02 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884562#action_12884562
 ] 

Russell Jurney commented on PIG-1314:
-

I suck at JIRA. See proposal in PIG-1430.



 Add DateTime Support to Pig
 ---

 Key: PIG-1314
 URL: https://issues.apache.org/jira/browse/PIG-1314
 Project: Pig
  Issue Type: Bug
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0

   Original Estimate: 672h
  Remaining Estimate: 672h

 Hadoop/Pig are primarily used to parse log data, and most logs have a 
 timestamp component.  Therefore Pig should support dates as a primitive.
 Can someone familiar with adding types to pig comment on how hard this is?  
 We're looking at doing this, rather than use UDFs.  Is this a patch that 
 would be accepted?




[jira] Updated: (PIG-1476) Add trailing flag to commands to prevent retention of relation name in field names: STRIP ?

2010-06-30 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1476:


Description: 
After doing a JOIN or a GROUP/FOREACH, one often ends up with data looking like:

 DESCRIBE foo;

   foo: {other_thing::f1:int, other_thing::f2:chararray, other_thing::f3: int}

What wunn usually wants is:

   foo: {f1:int, f2:chararray, f3: int}

At this point, won is left with two choices, neither of which is very good.  
Choice wan:

 foo = FOREACH foo GENERATE $0 AS f1, $1 AS f2, $2 AS f3;

This is a poor choice because later when wahn edits this file, it is confusing 
to remember what order is what field when wun manipulates something up stream 
in the script.  So instead whun does this:

 foo = FOREACH foo GENERATE old_thing::f1 AS f1, old_thing::f2 AS f2, 
 old_thing::f3 AS f3;

or

 foo = FOREACH foo GENERATE f1 AS f1, f2 AS f2, f3 AS f3;

This is a poor choice because it is verbose and cumbersome.

With no good choices available, whan is unsure what to do, pauses and reflects 
that the Pig is perplexing, and hopes for a better tomorrow.  Here's what wuhn 
should do to avoid this situation:

foo = JOIN old_thing by f1, other_thing BY f1 STRIP;

DESCRIBE foo foo: {f1:int, f2:chararray, f3: int};

I think so, anyway.  I leave the behavior of duplicate fields to more 
enlightened beings, but I think this would be a big improvement to Pig Latin.


  was:
After doing a JOIN or a GROUP/FOREACH, one often ends up with data looking like:

 DESCRIBE foo;

   foo: {other_thing::f1:int, other_thing::f2:chararray, other_thing::f3: int}

What wunn usually wants is:

   foo: {f1:int, f2:chararray, f3: int}

At this point, won is left with two choices, neither of which is very good.  
Choice wan:

 foo = FOREACH foo GENERATE $0 AS f1, $1 AS f2, $2 AS f3;

This is a poor choice because later when wahn edits this file, it is confusing 
to remember what order is what field when wun manipulates something up stream 
in the script.  So instead whun does this:

 foo = FOREACH foo GENERATE old_thing::f1 AS f1, old_thing::f2 AS f2, 
 old_thing::f3 AS f3;

This is a poor choice because it is verbose and cumbersome.

Whan is unsure what to do, pauses and reflects that the Pig is perplexing, and 
hopes for a better tomorrow.  Here's what wuhn should do to avoid this 
situation:

foo = JOIN old_thing by f1, other_thing BY f1 STRIP;

DESCRIBE foo foo: {f1:int, f2:chararray, f3: int};

I think so, anyway.  I leave the behavior of duplicate fields to more 
enlightened beings, but I think this would be a big improvement to Pig Latin.



 Add trailing flag to commands to prevent retention of relation name in field 
 names: STRIP ?
 ---

 Key: PIG-1476
 URL: https://issues.apache.org/jira/browse/PIG-1476
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
 Environment: sunny, 60% humidity with a chance of rain.
Reporter: Russell Jurney
 Fix For: 0.8.0


 After doing a JOIN or a GROUP/FOREACH, one often ends up with data looking 
 like:
  DESCRIBE foo;
foo: {other_thing::f1:int, other_thing::f2:chararray, other_thing::f3: int}
 What wunn usually wants is:
foo: {f1:int, f2:chararray, f3: int}
 At this point, won is left with two choices, neither of which is very good.  
 Choice wan:
  foo = FOREACH foo GENERATE $0 AS f1, $1 AS f2, $2 AS f3;
 This is a poor choice because later when wahn edits this file, it is 
 confusing to remember what order is what field when wun manipulates something 
 up stream in the script.  So instead whun does this:
  foo = FOREACH foo GENERATE old_thing::f1 AS f1, old_thing::f2 AS f2, 
  old_thing::f3 AS f3;
 or
  foo = FOREACH foo GENERATE f1 AS f1, f2 AS f2, f3 AS f3;
 This is a poor choice because it is verbose and cumbersome.
 With no good choices available, whan is unsure what to do, pauses and 
 reflects that the Pig is perplexing, and hopes for a better tomorrow.  Here's 
 what wuhn should do to avoid this situation:
 foo = JOIN old_thing by f1, other_thing BY f1 STRIP;
 DESCRIBE foo foo: {f1:int, f2:chararray, f3: int};
 I think so, anyway.  I leave the behavior of duplicate fields to more 
 enlightened beings, but I think this would be a big improvement to Pig Latin.




[jira] Created: (PIG-1476) Add trailing flag to commands to prevent retention of relation name in field names: STRIP ?

2010-06-29 Thread Russell Jurney (JIRA)
Add trailing flag to commands to prevent retention of relation name in field 
names: STRIP ?
---

 Key: PIG-1476
 URL: https://issues.apache.org/jira/browse/PIG-1476
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
 Environment: sunny, 60% humidity with a chance of rain.
Reporter: Russell Jurney
 Fix For: 0.8.0


After doing a JOIN or a GROUP/FOREACH, one often ends up with data looking like:

 DESCRIBE foo;

   foo: {other_thing::f1:int, other_thing::f2:chararray, other_thing::f3: int}

What wunn usually wants is:

   foo: {f1:int, f2:chararray, f3: int}

At this point, won is left with two choices, neither of which is very good.  
Choice wan:

 foo = FOREACH foo GENERATE $0 AS f1, $1 AS f2, $2 AS f3;

This is a poor choice because later when wahn edits this file, it is confusing 
to remember what order is what field when wun manipulates something up stream 
in the script.  So instead whun does this:

 foo = FOREACH foo GENERATE old_thing::f1 AS f1, old_thing::f2 AS f2, 
 old_thing::f3 AS f3;

This is a poor choice because it is verbose and cumbersome.

Whan is unsure what to do, pauses and reflects that the Pig is perplexing, and 
hopes for a better tomorrow.  Here's what wuhn should do to avoid this 
situation:

foo = JOIN old_thing by f1, other_thing BY f1 STRIP;

DESCRIBE foo foo: {f1:int, f2:chararray, f3: int};

I think so, anyway.  I leave the behavior of duplicate fields to more 
enlightened beings, but I think this would be a big improvement to Pig Latin.





[jira] Updated: (PIG-1476) Add trailing flag to commands to prevent retention of relation name in field names: STRIP ?

2010-06-29 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1476:


Description: 
After doing a JOIN or a GROUP/FOREACH, one often ends up with data looking like:

 DESCRIBE foo;

   foo: {other_thing::f1:int, other_thing::f2:chararray, other_thing::f3: int}

If oun was to let this chain, ouin can end up with: 
first_thing::second_thing::third_thing::fourth_thing::f1 which is pretty hairy.

What wunn usually wants is:

   foo: {f1:int, f2:chararray, f3: int}

At this point, won is left with two choices, neither of which is very good.  
Choice wan:

 foo = FOREACH foo GENERATE $0 AS f1, $1 AS f2, $2 AS f3;

This is a poor choice because later when wahn edits this file, it is confusing 
to remember what order is what field when wun manipulates something up stream 
in the script.  So instead whun does this:

 foo = FOREACH foo GENERATE old_thing::f1 AS f1, old_thing::f2 AS f2, 
 old_thing::f3 AS f3;

or

 foo = FOREACH foo GENERATE f1 AS f1, f2 AS f2, f3 AS f3;

This is a poor choice because it is verbose and cumbersome.

With no good choices available, whan is unsure what to do, pauses and reflects 
that the Pig is perplexing, and hopes for a better tomorrow.  Here's what wuhn 
should do to avoid this situation:

foo = JOIN old_thing by f1, other_thing BY f1 STRIP;

DESCRIBE foo foo: {f1:int, f2:chararray, f3: int};

I think so, anyway.  I leave the behavior of duplicate fields to more 
enlightened beings, but I think this would be a big improvement to Pig Latin.


  was:
After doing a JOIN or a GROUP/FOREACH, one often ends up with data looking like:

 DESCRIBE foo;

   foo: {other_thing::f1:int, other_thing::f2:chararray, other_thing::f3: int}

What wunn usually wants is:

   foo: {f1:int, f2:chararray, f3: int}

At this point, won is left with two choices, neither of which is very good.  
Choice wan:

 foo = FOREACH foo GENERATE $0 AS f1, $1 AS f2, $2 AS f3;

This is a poor choice because later when wahn edits this file, it is confusing 
to remember what order is what field when wun manipulates something up stream 
in the script.  So instead whun does this:

 foo = FOREACH foo GENERATE old_thing::f1 AS f1, old_thing::f2 AS f2, 
 old_thing::f3 AS f3;

or

 foo = FOREACH foo GENERATE f1 AS f1, f2 AS f2, f3 AS f3;

This is a poor choice because it is verbose and cumbersome.

With no good choices available, whan is unsure what to do, pauses and reflects 
that the Pig is perplexing, and hopes for a better tomorrow.  Here's what wuhn 
should do to avoid this situation:

foo = JOIN old_thing by f1, other_thing BY f1 STRIP;

DESCRIBE foo foo: {f1:int, f2:chararray, f3: int};

I think so, anyway.  I leave the behavior of duplicate fields to more 
enlightened beings, but I think this would be a big improvement to Pig Latin.



 Add trailing flag to commands to prevent retention of relation name in field 
 names: STRIP ?
 ---

 Key: PIG-1476
 URL: https://issues.apache.org/jira/browse/PIG-1476
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
 Environment: sunny, 60% humidity with a chance of rain.
Reporter: Russell Jurney
 Fix For: 0.8.0


 After doing a JOIN or a GROUP/FOREACH, one often ends up with data looking 
 like:
  DESCRIBE foo;
foo: {other_thing::f1:int, other_thing::f2:chararray, other_thing::f3: int}
 If oun was to let this chain, ouin can end up with: 
 first_thing::second_thing::third_thing::fourth_thing::f1 which is pretty 
 hairy.
 What wunn usually wants is:
foo: {f1:int, f2:chararray, f3: int}
 At this point, won is left with two choices, neither of which is very good.  
 Choice wan:
  foo = FOREACH foo GENERATE $0 AS f1, $1 AS f2, $2 AS f3;
 This is a poor choice because later when wahn edits this file, it is 
 confusing to remember what order is what field when wun manipulates something 
 up stream in the script.  So instead whun does this:
  foo = FOREACH foo GENERATE old_thing::f1 AS f1, old_thing::f2 AS f2, 
  old_thing::f3 AS f3;
 or
  foo = FOREACH foo GENERATE f1 AS f1, f2 AS f2, f3 AS f3;
 This is a poor choice because it is verbose and cumbersome.
 With no good choices available, whan is unsure what to do, pauses and 
 reflects that the Pig is perplexing, and hopes for a better tomorrow.  Here's 
 what wuhn should do to avoid this situation:
 foo = JOIN old_thing by f1, other_thing BY f1 STRIP;
 DESCRIBE foo foo: {f1:int, f2:chararray, f3: int};
 I think so, anyway.  I leave the behavior of duplicate fields to more 
 enlightened beings, but I think this would be a big improvement to Pig Latin.




[jira] Commented: (PIG-1429) Add Boolean Data Type to Pig

2010-06-21 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880731#action_12880731
 ] 

Russell Jurney commented on PIG-1429:
-

I'll be able to wrap this up next weekend.

 Add Boolean Data Type to Pig
 

 Key: PIG-1429
 URL: https://issues.apache.org/jira/browse/PIG-1429
 Project: Pig
  Issue Type: New Feature
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Russell Jurney
 Fix For: 0.8.0

 Attachments: working_boolean.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Pig needs a Boolean data type.  Pig-1097 is dependent on doing this.  
 I volunteer.  Is there anything beyond the work in src/org/apache/pig/data/ 
 plus unit tests to make this work?  




[jira] Commented: (PIG-1429) Add Boolean Data Type to Pig

2010-06-10 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877544#action_12877544
 ] 

Russell Jurney commented on PIG-1429:
-

The patch needs more work.  Should knock it out in the next couple weeks.

 Add Boolean Data Type to Pig
 

 Key: PIG-1429
 URL: https://issues.apache.org/jira/browse/PIG-1429
 Project: Pig
  Issue Type: New Feature
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Russell Jurney
 Fix For: 0.8.0

 Attachments: working_boolean.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Pig needs a Boolean data type.  Pig-1097 is dependent on doing this.  
 I volunteer.  Is there anything beyond the work in src/org/apache/pig/data/ 
 plus unit tests to make this work?  




[jira] Created: (PIG-1436) Print number of records outputted at each step of a Pig script

2010-06-03 Thread Russell Jurney (JIRA)
Print number of records outputted at each step of a Pig script
--

 Key: PIG-1436
 URL: https://issues.apache.org/jira/browse/PIG-1436
 Project: Pig
  Issue Type: New Feature
  Components: grunt
Affects Versions: 0.7.0
Reporter: Russell Jurney
Priority: Minor
 Fix For: 0.8.0


I often run a script multiple times, or have to go and look through Hadoop task 
logs, to figure out where I broke a long script in such a way that I get 0 
records out of it.  I think this is a common problem.

If someone can point me in the right direction, I can make a pass at this.
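
To make the request concrete, here is a small Python sketch of the kind of per-step report being asked for. It is only an analogy for a Pig script: the step names, the run_with_counts helper, and the in-memory lists are all hypothetical and say nothing about how Pig or Hadoop counters would expose this.

```python
def run_with_counts(records, steps):
    """Apply each named step in order and report how many records it emits.

    `steps` is a list of (name, function) pairs; each function takes an
    iterable of records and returns an iterable of records.
    """
    counts = []
    current = list(records)
    for name, step in steps:
        current = list(step(current))
        counts.append((name, len(current)))
        print("%s: %d records" % (name, len(current)))
    return current, counts


def first_empty_step(counts):
    """Return the name of the first step that produced zero records, if any."""
    for name, n in counts:
        if n == 0:
            return name
    return None


if __name__ == "__main__":
    steps = [
        ("filter_even", lambda rs: (r for r in rs if r % 2 == 0)),
        ("filter_big", lambda rs: (r for r in rs if r > 10)),
        ("double", lambda rs: (r * 2 for r in rs)),
    ]
    _, counts = run_with_counts([1, 2, 3, 4], steps)
    print("first empty step:", first_empty_step(counts))
```

With a report like this, the step where the output drops to 0 records is visible after a single run, without digging through task logs.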




[jira] Commented: (PIG-1429) Add Boolean Data Type to Pig

2010-05-31 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873629#action_12873629
 ] 

Russell Jurney commented on PIG-1429:
-

Some more work to be done with operators.

 Add Boolean Data Type to Pig
 

 Key: PIG-1429
 URL: https://issues.apache.org/jira/browse/PIG-1429
 Project: Pig
  Issue Type: New Feature
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Russell Jurney
 Fix For: 0.8.0

 Attachments: working_boolean.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Pig needs a Boolean data type.  Pig-1097 is dependent on doing this.  
 I volunteer.  Is there anything beyond the work in src/org/apache/pig/data/ 
 plus unit tests to make this work?  




[jira] Commented: (PIG-1314) Add DateTime Support to Pig

2010-05-31 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873664#action_12873664
 ] 

Russell Jurney commented on PIG-1314:
-

Hmmm not sure if I should use durations or periods, or both.  See 
http://joda-time.sourceforge.net/apidocs/org/joda/time/Period.html


 Add DateTime Support to Pig
 ---

 Key: PIG-1314
 URL: https://issues.apache.org/jira/browse/PIG-1314
 Project: Pig
  Issue Type: Bug
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0

   Original Estimate: 672h
  Remaining Estimate: 672h

 Hadoop/Pig are primarily used to parse log data, and most logs have a 
 timestamp component.  Therefore Pig should support dates as a primitive.
 Can someone familiar with adding types to pig comment on how hard this is?  
 We're looking at doing this, rather than use UDFs.  Is this a patch that 
 would be accepted?




[jira] Updated: (PIG-1429) Add Boolean Data Type to Pig

2010-05-30 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1429:


Attachment: boolean.patch

 Add Boolean Data Type to Pig
 

 Key: PIG-1429
 URL: https://issues.apache.org/jira/browse/PIG-1429
 Project: Pig
  Issue Type: New Feature
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Russell Jurney
 Fix For: 0.8.0

 Attachments: boolean.patch, boolean.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Pig needs a Boolean data type.  Pig-1097 is dependent on doing this.  
 I volunteer.  Is there anything beyond the work in src/org/apache/pig/data/ 
 plus unit tests to make this work?  




[jira] Updated: (PIG-1429) Add Boolean Data Type to Pig

2010-05-30 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1429:


Attachment: (was: boolean.patch)

 Add Boolean Data Type to Pig
 

 Key: PIG-1429
 URL: https://issues.apache.org/jira/browse/PIG-1429
 Project: Pig
  Issue Type: New Feature
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Russell Jurney
 Fix For: 0.8.0

 Attachments: boolean.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Pig needs a Boolean data type.  Pig-1097 is dependent on doing this.  
 I volunteer.  Is there anything beyond the work in src/org/apache/pig/data/ 
 plus unit tests to make this work?  




[jira] Commented: (PIG-1429) Add Boolean Data Type to Pig

2010-05-30 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873474#action_12873474
 ] 

Russell Jurney commented on PIG-1429:
-

Did some more work, have a new patch... seems the problem is in 
PigMapBase.runPipeline:

protected void runPipeline(PhysicalOperator leaf) throws IOException, InterruptedException {
    while(true){
        String foo = ""; String bar = "";
        Result res = leaf.getNext(DUMMYTUPLE);

res is NULL, so it dies.  

The leaf is: (Name: A: New For Each(false,false)[bag] - 1-13 Operator Key: 1-13)

 Add Boolean Data Type to Pig
 

 Key: PIG-1429
 URL: https://issues.apache.org/jira/browse/PIG-1429
 Project: Pig
  Issue Type: New Feature
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Russell Jurney
 Fix For: 0.8.0

 Attachments: boolean.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Pig needs a Boolean data type.  Pig-1097 is dependent on doing this.  
 I volunteer.  Is there anything beyond the work in src/org/apache/pig/data/ 
 plus unit tests to make this work?  




[jira] Updated: (PIG-1429) Add Boolean Data Type to Pig

2010-05-30 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1429:


Attachment: working_boolean.patch

Attached patch can LOAD/DUMP a boolean type :D  I'll work on more tests, but it 
roughly works.

 Add Boolean Data Type to Pig
 

 Key: PIG-1429
 URL: https://issues.apache.org/jira/browse/PIG-1429
 Project: Pig
  Issue Type: New Feature
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Russell Jurney
 Fix For: 0.8.0

 Attachments: boolean.patch, working_boolean.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Pig needs a Boolean data type.  Pig-1097 is dependent on doing this.  
 I volunteer.  Is there anything beyond the work in src/org/apache/pig/data/ 
 plus unit tests to make this work?  




[jira] Updated: (PIG-1429) Add Boolean Data Type to Pig

2010-05-30 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1429:


Attachment: (was: boolean.patch)

 Add Boolean Data Type to Pig
 

 Key: PIG-1429
 URL: https://issues.apache.org/jira/browse/PIG-1429
 Project: Pig
  Issue Type: New Feature
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Russell Jurney
 Fix For: 0.8.0

 Attachments: working_boolean.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Pig needs a Boolean data type.  Pig-1097 is dependent on doing this.  
 I volunteer.  Is there anything beyond the work in src/org/apache/pig/data/ 
 plus unit tests to make this work?  




[jira] Updated: (PIG-1429) Add Boolean Data Type to Pig

2010-05-29 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1429:


Attachment: boolean.patch

Broken patch that adds boolean type.

 Add Boolean Data Type to Pig
 

 Key: PIG-1429
 URL: https://issues.apache.org/jira/browse/PIG-1429
 Project: Pig
  Issue Type: New Feature
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Russell Jurney
 Fix For: 0.8.0

 Attachments: boolean.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Pig needs a Boolean data type.  Pig-1097 is dependent on doing this.  
 I volunteer.  Is there anything beyond the work in src/org/apache/pig/data/ 
 plus unit tests to make this work?  




[jira] Commented: (PIG-1429) Add Boolean Data Type to Pig

2010-05-29 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873363#action_12873363
 ] 

Russell Jurney commented on PIG-1429:
-

Did the work I think is required based on Alan's comments in PIG-1314 and help 
from Dmitriy.  It builds - I still have to add tests (may be the only way to 
fix this), but I'm hoping someone can ID my problem.  I keep getting the 
exception below.  Anyone know where I should look?  I've traced this through, 
and nothing stands out.

-

org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received Error while processing the map plan.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:261)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:228)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
2010-05-29 20:04:25,363 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
2010-05-29 20:04:29,866 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2010-05-29 20:04:29,866 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
2010-05-29 20:04:29,868 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: file:/tmp/temp-537038699/tmp-381529216
2010-05-29 20:04:29,868 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written : Unable to determine number of records written
2010-05-29 20:04:29,868 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written : Unable to determine number of bytes written
2010-05-29 20:04:29,868 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Spillable Memory Manager spill count : 0
2010-05-29 20:04:29,869 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Proactive spill count : 0
2010-05-29 20:04:29,869 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2010-05-29 20:04:29,872 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2010-05-29 20:04:29,876 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias A
2010-05-29 20:04:29,876 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A
    at org.apache.pig.PigServer.openIterator(PigServer.java:663)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:598)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:291)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
    at org.apache.pig.Main.main(Main.java:410)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
    at org.apache.pig.PigServer.openIterator(PigServer.java:657)
    ... 6 more



 Add Boolean Data Type to Pig
 

 Key: PIG-1429
 URL: https://issues.apache.org/jira/browse/PIG-1429
 Project: Pig
  Issue Type: New Feature
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Russell Jurney
 Fix For: 0.8.0

 Attachments: boolean.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Pig needs a Boolean data type.  Pig-1097 is dependent on doing this.  
 I volunteer.  Is there anything beyond the work in src/org/apache/pig/data/ 
 plus unit tests to make this work?  




[jira] Commented: (PIG-1314) Add DateTime Support to Pig

2010-05-29 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873382#action_12873382
 ] 

Russell Jurney commented on PIG-1314:
-

Ok, thinking about really doing this soon, after Boolean.  I'd like to add two 
new primitives to Pig - DateTime and Duration.  

I'd do this on the wiki, but I don't have edit access.  Can someone please 
grant the ability to make a new page to user RussellJurney on the Pig wiki?

Design Notes:

1) I'd like to use Jodatime for this, as I did in the DateTime UDFs.  It is 
possible to use the Java date libs, but it would be painful to do so.  Jodatime 
also performs better than Java's native date classes.  It is Apache 2.0 
licensed and is already pulled in via ivy in the DateTime UDFs - see PIG-1310

2) Date Format for text/dumps: ISO8601.  Looks like: [YYYY][MM][DD]T[hh][mm]Z  
It is a human readable, sortable/comparable, international standard.  See 
http://en.wikipedia.org/wiki/ISO_8601#Dates

2.5) In memory type: org.joda.time.DateTime.  See 
http://joda-time.sourceforge.net/apidocs/org/joda/time/DateTime.html

Internally, jodatime stores a datetime as a long: milliseconds since the 
Unix/POSIX epoch.  See http://joda-time.sourceforge.net/faq.html#internalstorage

3) Duration Format for text/dumps: ISO8601.  Looks like: 
P[n]Y[n]M[n]DT[n]H[n]M[n]S  It is a human readable, sortable/comparable, 
international standard.  See http://en.wikipedia.org/wiki/ISO_8601#Durations

3.5) In-memory format: org.joda.time.Duration.  See 
http://joda-time.sourceforge.net/apidocs/org/joda/time/Duration.html

4) All date functions in PIG-1310 should be included, except those replaced by 
the use of operators on datetimes and durations.  Adding/subtracting datetimes 
should result in a duration.  Durations can be 
added/subtracted/divided/multiplied/negated.  

Date/Duration truncation, date differences, date parsing/conversion should be 
included.  Conversion from int/long POSIX, SQL and datemonth should be 
included.  Conversion from any string with a DateFormat string should be 
included.

5) Casting to and from Integer and Long should be supported, as a Unix/POSIX 
time.  Casting to/from chararray in ISO8601 format should be supported.
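
A rough sketch of the semantics in the design notes above: ISO 8601 text 
in and out, casts to and from Unix/POSIX seconds, and datetime subtraction 
producing a duration.  It uses the JDK's java.time classes as a stand-in for 
Joda-Time, purely as an assumption to keep the example dependency-free; the 
proposal itself names org.joda.time.DateTime and org.joda.time.Duration.  The 
class and method names are hypothetical and are not part of Pig.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hypothetical helper for illustration only; not Pig code and not the PIG-1314 patch.
class DateTimeSketch {
    // Cast from chararray: parse an ISO 8601 datetime string.
    static OffsetDateTime fromIso(String iso) {
        return OffsetDateTime.parse(iso);
    }

    // Cast from int/long: treat the value as Unix/POSIX seconds in UTC.
    static OffsetDateTime fromUnixSeconds(long seconds) {
        return Instant.ofEpochSecond(seconds).atOffset(ZoneOffset.UTC);
    }

    // Cast to long: Unix/POSIX seconds.
    static long toUnixSeconds(OffsetDateTime dt) {
        return dt.toEpochSecond();
    }

    // Subtracting one datetime from another yields a duration.
    static Duration minus(OffsetDateTime later, OffsetDateTime earlier) {
        return Duration.between(earlier, later);
    }

    public static void main(String[] args) {
        OffsetDateTime a = fromIso("2010-05-29T20:04:25Z");
        OffsetDateTime b = fromIso("2010-05-30T21:04:25Z");
        Duration d = minus(b, a);
        // Durations print in ISO 8601 duration syntax and support arithmetic.
        System.out.println(d);
        System.out.println(d.multipliedBy(2));
        System.out.println(d.negated());
        System.out.println(toUnixSeconds(a));
    }
}
```

Note that java.time.Duration is time-based only, so it covers the H/M/S part of 
the P[n]Y[n]M[n]DT[n]H[n]M[n]S form; calendar units would need a separate type.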

Comments?  Suggestions?

 Add DateTime Support to Pig
 ---

 Key: PIG-1314
 URL: https://issues.apache.org/jira/browse/PIG-1314
 Project: Pig
  Issue Type: Bug
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0

   Original Estimate: 672h
  Remaining Estimate: 672h

 Hadoop/Pig are primarily used to parse log data, and most logs have a 
 timestamp component.  Therefore Pig should support dates as a primitive.
 Can someone familiar with adding types to pig comment on how hard this is?  
 We're looking at doing this, rather than use UDFs.  Is this a patch that 
 would be accepted?




[jira] Created: (PIG-1429) Add Boolean Data Type to Pig

2010-05-28 Thread Russell Jurney (JIRA)
Add Boolean Data Type to Pig


 Key: PIG-1429
 URL: https://issues.apache.org/jira/browse/PIG-1429
 Project: Pig
  Issue Type: New Feature
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0


Pig needs a Boolean data type.  Pig-1097 is dependent on doing this.  

I volunteer.  Is there anything beyond the work in src/org/apache/pig/data/ 
plus unit tests to make this work?  




[jira] Created: (PIG-1430) ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix Times in All Operations

2010-05-28 Thread Russell Jurney (JIRA)
ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix 
Times in All Operations
--

 Key: PIG-1430
 URL: https://issues.apache.org/jira/browse/PIG-1430
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0


All functions in 
contrib.piggybank.java.src.main.java.org.apache.pig.piggybank.evaluation.datetime
 should seamlessly accept integer Unix/POSIX times, and return Unix time output 
when given an int, and ISO output when given a chararray.

Note: Unix/POSIX times are the number of seconds elapsed since midnight 
proleptic Coordinated Universal Time (UTC) of January 1, 1970, not counting 
leap seconds.  See http://en.wikipedia.org/wiki/Unix_time
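
A minimal sketch of the "same type in, same type out" contract described 
above, using a hypothetical truncate-to-day operation as the example.  The 
real piggybank UDFs are Pig EvalFuncs built on Joda-Time; this standalone 
version uses java.time instead, and all names here are assumptions.

```java
import java.time.OffsetDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration; not the piggybank implementation.
class TruncateToDaySketch {
    private static final long SECONDS_PER_DAY = 86400L;

    // int input: Unix/POSIX seconds in, Unix/POSIX seconds out.
    // Throws ArithmeticException if the result does not fit in an int,
    // which can only happen within a day of Integer.MIN_VALUE.
    static int truncate(int unixSeconds) {
        long days = Math.floorDiv((long) unixSeconds, SECONDS_PER_DAY);
        return Math.toIntExact(days * SECONDS_PER_DAY);
    }

    // chararray input: ISO 8601 string in, ISO 8601 string out.
    static String truncate(String iso) {
        return OffsetDateTime.parse(iso).truncatedTo(ChronoUnit.DAYS).toString();
    }

    public static void main(String[] args) {
        System.out.println(truncate(1275163465));
        System.out.println(truncate("2010-05-29T20:04:25Z"));
    }
}
```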




[jira] Commented: (PIG-1430) ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix Times in All Operations

2010-05-28 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873217#action_12873217
 ] 

Russell Jurney commented on PIG-1430:
-

Actually, I think it should interpret int as unix time in seconds, and long as 
unix time in milliseconds.  Thoughts?
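
One way the int/long convention suggested here could look, as a sketch 
only: an Integer is read as seconds, a Long as milliseconds, and a String 
as ISO 8601 text.  The class and method names are hypothetical, and 
java.time.Instant stands in for the Joda-Time types the UDFs actually use.

```java
import java.time.Instant;

// Hypothetical input normalizer; not part of piggybank.
class UnixTimeInputSketch {
    static Instant toInstant(Object value) {
        if (value instanceof Integer) {
            // int: Unix time in seconds
            return Instant.ofEpochSecond(((Integer) value).longValue());
        }
        if (value instanceof Long) {
            // long: Unix time in milliseconds
            return Instant.ofEpochMilli((Long) value);
        }
        if (value instanceof String) {
            // chararray: ISO 8601 datetime in UTC, e.g. 2010-05-29T20:04:25Z
            return Instant.parse((String) value);
        }
        throw new IllegalArgumentException("Unsupported type: " + value);
    }

    public static void main(String[] args) {
        System.out.println(toInstant(1275163465));
        System.out.println(toInstant(1275163465000L));
        System.out.println(toInstant("2010-05-29T20:04:25Z"));
    }
}
```

All three calls in main describe the same moment, which is the point of the 
convention: the Java type of the argument alone selects the unit.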

 ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix 
 Times in All Operations
 --

 Key: PIG-1430
 URL: https://issues.apache.org/jira/browse/PIG-1430
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0


 All functions in 
 contrib.piggybank.java.src.main.java.org.apache.pig.piggybank.evaluation.datetime
  should seamlessly accept integer Unix/POSIX times, and return Unix time 
 output when given an int, and ISO output when given a chararray.
 Note: Unix/POSIX times are the number of seconds elapsed since midnight 
 proleptic Coordinated Universal Time (UTC) of January 1, 1970, not counting 
 leap seconds.  See http://en.wikipedia.org/wiki/Unix_time




[jira] Created: (PIG-1431) Current DateTime UDFs: ISONOW(), UNIXNOW()

2010-05-28 Thread Russell Jurney (JIRA)
Current DateTime UDFs: ISONOW(), UNIXNOW()
--

 Key: PIG-1431
 URL: https://issues.apache.org/jira/browse/PIG-1431
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0


Need a NOW() for getting datetime diffs between now and a prior or future date. 
 Will use the system timezone.  Will make one for ISO datetime and one for Unix 
time.
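
A sketch of what the two functions might compute, assuming a java.time 
Clock is passed in so the result can be pinned in tests.  The names isoNow 
and unixNow are hypothetical stand-ins for ISONOW() and UNIXNOW(), and this 
is not the piggybank code.

```java
import java.time.Clock;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of ISONOW()/UNIXNOW(); not the actual UDFs.
class NowSketch {
    // Current time as an ISO 8601 string, in the clock's time zone.
    static String isoNow(Clock clock) {
        return OffsetDateTime.now(clock).format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    // Current time as Unix/POSIX seconds (independent of time zone).
    static long unixNow(Clock clock) {
        return clock.instant().getEpochSecond();
    }

    public static void main(String[] args) {
        Clock system = Clock.systemDefaultZone(); // the system timezone, as proposed
        System.out.println(isoNow(system));
        System.out.println(unixNow(system));
    }
}
```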




[jira] Commented: (PIG-1430) ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix Times in All Operations

2010-05-28 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873233#action_12873233
 ] 

Russell Jurney commented on PIG-1430:
-

Good idea, will do!

 ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix 
 Times in All Operations
 --

 Key: PIG-1430
 URL: https://issues.apache.org/jira/browse/PIG-1430
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0


 All functions in 
 contrib.piggybank.java.src.main.java.org.apache.pig.piggybank.evaluation.datetime
  should seamlessly accept integer Unix/POSIX times, and return Unix time 
 output when given an int, and ISO output when given a chararray.
 Note: Unix/POSIX times are the number of seconds elapsed since midnight 
 proleptic Coordinated Universal Time (UTC) of January 1, 1970, not counting 
 leap seconds.  See http://en.wikipedia.org/wiki/Unix_time




[jira] Commented: (PIG-1314) Add DateTime Support to Pig

2010-05-28 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873234#action_12873234
 ] 

Russell Jurney commented on PIG-1314:
-

As a first pass, I am going to add Boolean, which should be easier than 
DateTime, but will inform this implementation.  See PIG-1429

 Add DateTime Support to Pig
 ---

 Key: PIG-1314
 URL: https://issues.apache.org/jira/browse/PIG-1314
 Project: Pig
  Issue Type: Bug
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0

   Original Estimate: 672h
  Remaining Estimate: 672h

 Hadoop/Pig are primarily used to parse log data, and most logs have a 
 timestamp component.  Therefore Pig should support dates as a primitive.
 Can someone familiar with adding types to pig comment on how hard this is?  
 We're looking at doing this, rather than use UDFs.  Is this a patch that 
 would be accepted?




[jira] Updated: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple

2010-05-19 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1420:


Attachment: addconcat2.patch

Fixed bad comment re: copying bytes.

 Make CONCAT act on all fields of a tuple, instead of just the first two 
 fields of a tuple
 -

 Key: PIG-1420
 URL: https://issues.apache.org/jira/browse/PIG-1420
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: addconcat2.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and 
 org.apache.pig.builtin.StringConcat (which acts on Strings internally), both 
 act on the first two fields of a tuple.  This results in ugly nested CONCAT 
 calls like:
 CONCAT(CONCAT(A, ' '), B)
 The more desirable form is:
 CONCAT(A, ' ', B)
 This change will be backwards compatible, provided that no one was relying on 
 the fact that CONCAT ignores fields after the first two in a tuple.  This 
 seems a reasonable assumption to make, or at least a small break in 
 compatibility for a sizable improvement.




[jira] Commented: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple

2010-05-18 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868765#action_12868765
 ] 

Russell Jurney commented on PIG-1420:
-

Dmitriy, it applies with -p1

 Make CONCAT act on all fields of a tuple, instead of just the first two 
 fields of a tuple
 -

 Key: PIG-1420
 URL: https://issues.apache.org/jira/browse/PIG-1420
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: concat.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and 
 org.apache.pig.builtin.StringConcat (which acts on Strings internally), both 
 act on the first two fields of a tuple.  This results in ugly nested CONCAT 
 calls like:
 CONCAT(CONCAT(A, ' '), B)
 The more desirable form is:
 CONCAT(A, ' ', B)
 This change will be backwards compatible, provided that no one was relying on 
 the fact that CONCAT ignores fields after the first two in a tuple.  This 
 seems a reasonable assumption to make, or at least a small break in 
 compatibility for a sizable improvement.




[jira] Updated: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple

2010-05-18 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1420:


Attachment: addconcat.patch

New, working patch made with git diff --no-prefix, applies with -p0

 Make CONCAT act on all fields of a tuple, instead of just the first two 
 fields of a tuple
 -

 Key: PIG-1420
 URL: https://issues.apache.org/jira/browse/PIG-1420
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: addconcat.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and 
 org.apache.pig.builtin.StringConcat (which acts on Strings internally), both 
 act on the first two fields of a tuple.  This results in ugly nested CONCAT 
 calls like:
 CONCAT(CONCAT(A, ' '), B)
 The more desirable form is:
 CONCAT(A, ' ', B)
 This change will be backwards compatible, provided that no one was relying on 
 the fact that CONCAT ignores fields after the first two in a tuple.  This 
 seems a reasonable assumption to make, or at least a small break in 
 compatibility for a sizable improvement.




[jira] Updated: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple

2010-05-18 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1420:


Attachment: (was: concat.patch)

 Make CONCAT act on all fields of a tuple, instead of just the first two 
 fields of a tuple
 -

 Key: PIG-1420
 URL: https://issues.apache.org/jira/browse/PIG-1420
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: addconcat.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and 
 org.apache.pig.builtin.StringConcat (which acts on Strings internally), both 
 act on the first two fields of a tuple.  This results in ugly nested CONCAT 
 calls like:
 CONCAT(CONCAT(A, ' '), B)
 The more desirable form is:
 CONCAT(A, ' ', B)
 This change will be backwards compatible, provided that no one was relying on 
 the fact that CONCAT ignores fields after the first two in a tuple.  This 
 seems a reasonable assumption to make, or at least a small break in 
 compatibility for a sizable improvement.




[jira] Updated: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple

2010-05-18 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1420:


Patch Info: [Patch Available]

 Make CONCAT act on all fields of a tuple, instead of just the first two 
 fields of a tuple
 -

 Key: PIG-1420
 URL: https://issues.apache.org/jira/browse/PIG-1420
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: addconcat.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and 
 org.apache.pig.builtin.StringConcat (which acts on Strings internally), both 
 act on the first two fields of a tuple.  This results in ugly nested CONCAT 
 calls like:
 CONCAT(CONCAT(A, ' '), B)
 The more desirable form is:
 CONCAT(A, ' ', B)
 This change will be backwards compatible, provided that no one was relying on 
 the fact that CONCAT ignores fields after the first two in a tuple.  This 
 seems a reasonable assumption to make, or at least a small break in 
 compatibility for a sizable improvement.




[jira] Updated: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple

2010-05-17 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1420:


Status: In Progress  (was: Patch Available)

I don't know what resume progress does, but I'm about to find out.

 Make CONCAT act on all fields of a tuple, instead of just the first two 
 fields of a tuple
 -

 Key: PIG-1420
 URL: https://issues.apache.org/jira/browse/PIG-1420
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: concat.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and 
 org.apache.pig.builtin.StringConcat (which acts on Strings internally), both 
 act on the first two fields of a tuple.  This results in ugly nested CONCAT 
 calls like:
 CONCAT(CONCAT(A, ' '), B)
 The more desirable form is:
 CONCAT(A, ' ', B)
 This change will be backwards compatible, provided that no one was relying on 
 the fact that CONCAT ignores fields after the first two in a tuple.  This 
 seems a reasonable assumption to make, or at least a small break in 
 compatibility for a sizable improvement.




[jira] Updated: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple

2010-05-16 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1420:


Status: Open  (was: Patch Available)

Redoing CONCAT of DataByteArrays using java.nio.ByteBuffer

 Make CONCAT act on all fields of a tuple, instead of just the first two 
 fields of a tuple
 -

 Key: PIG-1420
 URL: https://issues.apache.org/jira/browse/PIG-1420
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: concat.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and 
 org.apache.pig.builtin.StringConcat (which acts on Strings internally), both 
 act on the first two fields of a tuple.  This results in ugly nested CONCAT 
 calls like:
 CONCAT(CONCAT(A, ' '), B)
 The more desirable form is:
 CONCAT(A, ' ', B)
 This change will be backwards compatible, provided that no one was relying on 
 the fact that CONCAT ignores fields after the first two in a tuple.  This 
 seems a reasonable assumption to make, or at least a small break in 
 compatibility for a sizable improvement.




[jira] Updated: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple

2010-05-16 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1420:


Status: Patch Available  (was: Open)

Re-submitting original, java.nio.ByteBuffer isn't very helpful.

 Make CONCAT act on all fields of a tuple, instead of just the first two 
 fields of a tuple
 -

 Key: PIG-1420
 URL: https://issues.apache.org/jira/browse/PIG-1420
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: concat.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and 
 org.apache.pig.builtin.StringConcat (which acts on Strings internally), both 
 act on the first two fields of a tuple.  This results in ugly nested CONCAT 
 calls like:
 CONCAT(CONCAT(A, ' '), B)
 The more desirable form is:
 CONCAT(A, ' ', B)
 This change will be backwards compatible, provided that no one was relying on 
 the fact that CONCAT ignores fields after the first two in a tuple.  This 
 seems a reasonable assumption to make, or at least a small break in 
 compatibility for a sizable improvement.




[jira] Created: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple

2010-05-14 Thread Russell Jurney (JIRA)
Make CONCAT act on all fields of a tuple, instead of just the first two fields 
of a tuple
-

 Key: PIG-1420
 URL: https://issues.apache.org/jira/browse/PIG-1420
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0
Reporter: Russell Jurney
 Fix For: 0.7.0


org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and 
org.apache.pig.builtin.StringConcat (which acts on Strings internally), both 
act on the first two fields of a tuple.  This results in ugly nested CONCAT 
calls like:

CONCAT(CONCAT(A, ' '), B)

The more desirable form is:

CONCAT(A, ' ', B)

This change will be backwards compatible, provided that no one was relying on 
the fact that CONCAT ignores fields after the first two in a tuple.  This seems 
a reasonable assumption to make, or at least a small break in compatibility for 
a sizable improvement.
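
To make the proposed behavior concrete, here is a small sketch of the 
String case: every field of the input is concatenated, so a three-argument 
call gives the same result as the nested two-argument form.  A plain List 
stands in for a Pig tuple, the class name is hypothetical, and returning 
null when any field is null is an assumption, not something this issue 
specifies.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for StringConcat; a List plays the role of a Pig tuple.
class StringConcatSketch {
    static String concatAll(List<?> fields) {
        StringBuilder sb = new StringBuilder();
        for (Object field : fields) {
            if (field == null) {
                return null; // assumption: a null field makes the whole result null
            }
            sb.append(field);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // CONCAT(A, ' ', B)
        String flat = concatAll(Arrays.asList("Russell", " ", "Jurney"));
        // CONCAT(CONCAT(A, ' '), B)
        String nested = concatAll(Arrays.asList(
                concatAll(Arrays.asList("Russell", " ")), "Jurney"));
        System.out.println(flat.equals(nested));
    }
}
```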




[jira] Updated: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple

2010-05-14 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1420:


Attachment: concat.patch

Patch that adds:

1) CONCAT handles all fields in the supplied tuple, instead of just the first 
two.
2) StringConcat handles all fields in the supplied tuple, instead of just the 
first two.
3) DataByteArray gets an append() to make the implementation of 1 & 2 clean (I 
think).
4) Unit Tests for CONCAT and StringCONCAT in TestBuiltin
5) Unit Tests for DataByteArray.append() in TestDataModel
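
For readers without the patch to hand, the idea behind items 1 and 3 can be 
sketched as follows.  This is not the code in concat.patch and not Pig's 
DataByteArray; ByteAppendSketch is a hypothetical wrapper that only shows 
how an append() lets a concat loop run over every field.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical byte-array wrapper with an append(); not Pig's DataByteArray.
class ByteAppendSketch {
    private byte[] data;

    ByteAppendSketch(byte[] initial) {
        data = initial.clone();
    }

    // Grow the backing array and copy the new bytes onto the end.
    ByteAppendSketch append(byte[] more) {
        byte[] joined = Arrays.copyOf(data, data.length + more.length);
        System.arraycopy(more, 0, joined, data.length, more.length);
        data = joined;
        return this;
    }

    byte[] get() {
        return data.clone();
    }

    // Concatenate every field, not just the first two.
    static byte[] concatAll(List<byte[]> fields) {
        ByteAppendSketch out = new ByteAppendSketch(new byte[0]);
        for (byte[] field : fields) {
            out.append(field);
        }
        return out.get();
    }

    public static void main(String[] args) {
        byte[] joined = concatAll(Arrays.asList(
                "Russell".getBytes(StandardCharsets.UTF_8),
                " ".getBytes(StandardCharsets.UTF_8),
                "Jurney".getBytes(StandardCharsets.UTF_8)));
        System.out.println(new String(joined, StandardCharsets.UTF_8));
    }
}
```

Appending in a loop reallocates on every call; summing the field lengths 
first and copying into one array would avoid that, at the cost of a second 
pass over the fields.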

 Make CONCAT act on all fields of a tuple, instead of just the first two 
 fields of a tuple
 -

 Key: PIG-1420
 URL: https://issues.apache.org/jira/browse/PIG-1420
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: concat.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and 
 org.apache.pig.builtin.StringConcat (which acts on Strings internally), both 
 act on the first two fields of a tuple.  This results in ugly nested CONCAT 
 calls like:
 CONCAT(CONCAT(A, ' '), B)
 The more desirable form is:
 CONCAT(A, ' ', B)
 This change will be backwards compatible, provided that no one was relying on 
 the fact that CONCAT ignores fields after the first two in a tuple.  This 
 seems a reasonable assumption to make, or at least a small break in 
 compatibility for a sizable improvement.




[jira] Updated: (PIG-1420) Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple

2010-05-14 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1420:


  Status: Patch Available  (was: Open)
Release Note: 
CONCAT handles all fields in the supplied tuple, instead of just the first two. 
 This is backwards compatible unless you were relying on it only using the 
first two fields, which seems unlikely.  DataByteArray now has an append() 
method.

Example use before:

B = FOREACH A GENERATE CONCAT(CONCAT(first_name, ' '), last_name);

Example extended use now: 

D = FOREACH C GENERATE CONCAT(first_name, ' ', last_name);

Passes all tests for me.  I like Asparagus.

 Make CONCAT act on all fields of a tuple, instead of just the first two 
 fields of a tuple
 -

 Key: PIG-1420
 URL: https://issues.apache.org/jira/browse/PIG-1420
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.8.0
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: concat.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 org.apache.pig.builtin.CONCAT (which acts on DataByteArray's internally) and 
 org.apache.pig.builtin.StringConcat (which acts on Strings internally), both 
 act on the first two fields of a tuple.  This results in ugly nested CONCAT 
 calls like:
 CONCAT(CONCAT(A, ' '), B)
 The more desirable form is:
 CONCAT(A, ' ', B)
 This change will be backwards compatible, provided that no one was relying on 
 the fact that CONCAT ignores fields after the first two in a tuple.  This 
 seems a reasonable assumption to make, or at least a small break in 
 compatibility for a sizable improvement.




[jira] Commented: (PIG-1310) ISO Date UDFs: Conversion, Truncation and Date Math

2010-03-30 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851542#action_12851542
 ] 

Russell Jurney commented on PIG-1310:
-

Cool - one thing though - Piggybank itself does not build in trunk.  It must 
not have built since 0.6, since the load/store func changes went in.  Does 
something need to be done there?  Should I submit a patch that removes all the 
broken UDFs to make ant build in piggybank work on trunk?

To get piggybank to build, I had to remove:

!   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestMultiStorage.java
!   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestSequenceFileLoader.java
!   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestRegExLoader.java
!   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/TestPigStorageSchema.java
!   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/string/TestLookupInFiles.java
!   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/evaluation/TestEvalString.java

Is this just me, or is this fixed on other branches?

 ISO Date UDFs: Conversion, Trucation and Date Math
 --

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
Assignee: Russell Jurney
 Fix For: 0.7.0

 Attachments: joda-mavenstuff.diff, pass.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using jodatime.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple UDFs are missing, but 
 these work if you REGISTER the jodatime jar in your script.  Hopefully I can 
 get this stuff in piggybank before someone else writes it this time :)  The 
 rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon, it is not much work and this should help 
 everyone working with time series.




[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Trucation and Date Math

2010-03-29 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1310:


Attachment: (was: datetime3.patch)

 ISO Date UDFs: Conversion, Trucation and Date Math
 --

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: combined.patch, joda-mavenstuff.diff

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using jodatime.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple UDFs are missing, but 
 these work if you REGISTER the jodatime jar in your script.  Hopefully I can 
 get this stuff in piggybank before someone else writes it this time :)  The 
 rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon, it is not much work and this should help 
 everyone working with time series.




[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Trucation and Date Math

2010-03-29 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1310:


Attachment: combined.patch

All inclusive patch from pig root, includes all classes and tests, and ivy 
updates for jodatime.  Applied this to a fresh trunk svn checkout, and all 
works ok - once I remove failing tests in piggybank unrelated to this commit.



 ISO Date UDFs: Conversion, Trucation and Date Math
 --

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: combined.patch, joda-mavenstuff.diff

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using jodatime.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple UDFs are missing, but 
 these work if you REGISTER the jodatime jar in your script.  Hopefully I can 
 get this stuff in piggybank before someone else writes it this time :)  The 
 rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon, it is not much work and this should help 
 everyone working with time series.




[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Trucation and Date Math

2010-03-29 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1310:


Attachment: (was: tests.patch)

 ISO Date UDFs: Conversion, Trucation and Date Math
 --

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: combined.patch, joda-mavenstuff.diff

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using jodatime.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple UDFs are missing, but 
 these work if you REGISTER the jodatime jar in your script.  Hopefully I can 
 get this stuff in piggybank before someone else writes it this time :)  The 
 rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon, it is not much work and this should help 
 everyone working with time series.




[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Trucation and Date Math

2010-03-29 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1310:


Attachment: (was: combined.patch)

 ISO Date UDFs: Conversion, Trucation and Date Math
 --

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: joda-mavenstuff.diff

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using jodatime.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple UDFs are missing, but 
 these work if you REGISTER the jodatime jar in your script.  Hopefully I can 
 get this stuff in piggybank before someone else writes it this time :)  The 
 rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon, it is not much work and this should help 
 everyone working with time series.




[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Trucation and Date Math

2010-03-29 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1310:


Attachment: deargod.patch

Last shot at a patch - this builds and tests ok against a fresh checkout of pig 
(once all the unrelated broken tests are rm'd).



 ISO Date UDFs: Conversion, Trucation and Date Math
 --

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: deargod.patch, joda-mavenstuff.diff

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using jodatime.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple UDFs are missing, but 
 these work if you REGISTER the jodatime jar in your script.  Hopefully I can 
 get this stuff in piggybank before someone else writes it this time :)  The 
 rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon, it is not much work and this should help 
 everyone working with time series.




[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Trucation and Date Math

2010-03-29 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1310:


Attachment: (was: deargod.patch)

 ISO Date UDFs: Conversion, Trucation and Date Math
 --

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: joda-mavenstuff.diff

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using jodatime.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple UDFs are missing, but 
 these work if you REGISTER the jodatime jar in your script.  Hopefully I can 
 get this stuff in piggybank before someone else writes it this time :)  The 
 rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon, it is not much work and this should help 
 everyone working with time series.




[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Trucation and Date Math

2010-03-29 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1310:


Attachment: pass.patch

Ok, this patch works.  



 ISO Date UDFs: Conversion, Trucation and Date Math
 --

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: joda-mavenstuff.diff, pass.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using jodatime.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple UDFs are missing, but 
 these work if you REGISTER the jodatime jar in your script.  Hopefully I can 
 get this stuff in piggybank before someone else writes it this time :)  The 
 rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon, it is not much work and this should help 
 everyone working with time series.




[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math

2010-03-25 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1310:


Attachment: (was: datetime.patch)

 ISO Date UDFs: Conversion, Rounding and Date Math
 -

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.7.0

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using jodatime.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple UDFs are missing, but 
 these work if you REGISTER the jodatime jar in your script.  Hopefully I can 
 get this stuff in piggybank before someone else writes it this time :)  The 
 rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon, it is not much work and this should help 
 everyone working with time series.




[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math

2010-03-25 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1310:


Attachment: (was: datetime2.patch)

 ISO Date UDFs: Conversion, Rounding and Date Math
 -

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.7.0

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using jodatime.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple UDFs are missing, but 
 these work if you REGISTER the jodatime jar in your script.  Hopefully I can 
 get this stuff in piggybank before someone else writes it this time :)  The 
 rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon, it is not much work and this should help 
 everyone working with time series.




[jira] Commented: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math

2010-03-25 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849623#action_12849623
 ] 

Russell Jurney commented on PIG-1310:
-

Oh, I'm allergic to XML.  Seriously allergic.  Can someone purty please help me 
out with the ivy bit?

 ISO Date UDFs: Conversion, Rounding and Date Math
 -

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: datetime3.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using jodatime.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple UDFs are missing, but 
 these work if you REGISTER the jodatime jar in your script.  Hopefully I can 
 get this stuff in piggybank before someone else writes it this time :)  The 
 rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon, it is not much work and this should help 
 everyone working with time series.




[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Trucation and Date Math

2010-03-25 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1310:


Summary: ISO Date UDFs: Conversion, Trucation and Date Math  (was: ISO Date 
UDFs: Conversion, Rounding and Date Math)

 ISO Date UDFs: Conversion, Trucation and Date Math
 --

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: datetime3.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using jodatime.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple UDFs are missing, but 
 these work if you REGISTER the jodatime jar in your script.  Hopefully I can 
 get this stuff in piggybank before someone else writes it this time :)  The 
 rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon, it is not much work and this should help 
 everyone working with time series.




[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Trucation and Date Math

2010-03-25 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1310:


Attachment: tests.patch

Tests attached.

 ISO Date UDFs: Conversion, Trucation and Date Math
 --

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: datetime3.patch, joda-mavenstuff.diff, tests.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using jodatime.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple UDFs are missing, but 
 these work if you REGISTER the jodatime jar in your script.  Hopefully I can 
 get this stuff in piggybank before someone else writes it this time :)  The 
 rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon, it is not much work and this should help 
 everyone working with time series.




[jira] Commented: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math

2010-03-24 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849528#action_12849528
 ] 

Russell Jurney commented on PIG-1310:
-

Thanks, Alan, I'll add all those changes tonight.  I confess to not really 
testing CustomFormatToISO other than the test case, I'll update the docs :)

As to ISO format - I will link to it and jodatime, and I would suggest ISO8601 
be the standard representation of datetimes in Pig, as it handles time zones 
and is sortable as text - which is nice.  
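
To make the "sortable as text" point concrete, here is a minimal sketch comparing a plain string sort with a sort by the instant each string denotes. It uses `java.time` from the JDK instead of jodatime so that it runs without extra jars; the class and method names are hypothetical. Note the assumption baked in: text order only matches chronological order when every value uses the same offset (UTC here) and the same fixed-width layout.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: shows that ISO 8601 UTC strings sort the same way
// as text and as points in time.
class IsoSortSketch {

    // Sorts the strings lexicographically, with no date parsing at all.
    static List<String> sortAsText(List<String> isoStrings) {
        List<String> sorted = new ArrayList<>(isoStrings);
        Collections.sort(sorted);
        return sorted;
    }

    // Sorts the same strings by the instant they represent.
    static List<String> sortByInstant(List<String> isoStrings) {
        List<String> sorted = new ArrayList<>(isoStrings);
        sorted.sort((a, b) -> Instant.parse(a).compareTo(Instant.parse(b)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> dates = Arrays.asList(
                "2010-03-24T18:30:00Z",
                "2009-12-31T23:59:59Z",
                "2010-03-24T08:05:00Z");
        System.out.println(sortAsText(dates));
        System.out.println(sortByInstant(dates));
    }
}
```

If values carry mixed offsets, they would need to be normalised to a single zone before a text sort can be trusted.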

 ISO Date UDFs: Conversion, Rounding and Date Math
 -

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: datetime.patch, datetime2.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using jodatime.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple UDFs are missing, but 
 these work if you REGISTER the jodatime jar in your script.  Hopefully I can 
 get this stuff in piggybank before someone else writes it this time :)  The 
 rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon, it is not much work and this should help 
 everyone working with time series.




[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math

2010-03-23 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1310:


Attachment: (was: datetime.patch)

 ISO Date UDFs: Conversion, Rounding and Date Math
 -

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.7.0

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using jodatime.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple UDFs are missing, but 
 these work if you REGISTER the jodatime jar in your script.  Hopefully I can 
 get this stuff in piggybank before someone else writes it this time :)  The 
 rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon, it is not much work and this should help 
 everyone working with time series.




[jira] Commented: (PIG-1314) Add DateTime Support to Pig

2010-03-23 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848618#action_12848618
 ] 

Russell Jurney commented on PIG-1314:
-

I would not say this blocks PIG-1310 at all - the UDFs there simply treat ISO 
dates as strings, which works reasonably well.  They should also handle Long 
unix times, and will in a next patch.  In any case, this isn't a blocker to 
that ticket, for which a patch was just submitted.
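
For readers unfamiliar with the conversion being discussed, the following sketch turns a Long unix time into an ISO 8601 string. It is only an illustration: it uses the JDK's `java.time` rather than jodatime, assumes the input is in seconds (not milliseconds) and should be rendered in UTC, and the names `UnixToIsoSketch` and `unixToIso` are made up for this example rather than taken from the patch.

```java
import java.time.Instant;

// Hypothetical sketch of a unix-time-to-ISO conversion; not the UDF from
// the patch, which is built on jodatime.
class UnixToIsoSketch {

    // Converts seconds since the Unix epoch to an ISO 8601 UTC string.
    // Returns null for null input, mirroring the usual UDF convention of
    // passing nulls through.
    static String unixToIso(Long unixSeconds) {
        if (unixSeconds == null) {
            return null;
        }
        return Instant.ofEpochSecond(unixSeconds).toString();
    }

    public static void main(String[] args) {
        System.out.println(unixToIso(0L));          // 1970-01-01T00:00:00Z
        System.out.println(unixToIso(1269302400L)); // 2010-03-23T00:00:00Z
    }
}
```

Because the result is an ordinary string, it can be passed on to the other string-based ISO UDFs without a dedicated DateTime type.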

 Add DateTime Support to Pig
 ---

 Key: PIG-1314
 URL: https://issues.apache.org/jira/browse/PIG-1314
 Project: Pig
  Issue Type: Bug
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0

   Original Estimate: 672h
  Remaining Estimate: 672h

 Hadoop/Pig are primarily used to parse log data, and most logs have a 
 timestamp component.  Therefore Pig should support dates as a primitive.
 Can someone familiar with adding types to pig comment on how hard this is?  
 We're looking at doing this, rather than use UDFs.  Is this a patch that 
 would be accepted?




[jira] Commented: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math

2010-03-23 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848958#action_12848958
 ] 

Russell Jurney commented on PIG-1310:
-

Alan, yes jodatime is on a maven repo - I have it pulling via ivy into my local 
pig trunk.  Wasn't sure what to do in piggybank since there was no ivy.xml, but 
I will look at the maven docs and add it.


 ISO Date UDFs: Conversion, Rounding and Date Math
 -

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: datetime.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using jodatime.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple UDFs are missing, but 
 these work if you REGISTER the jodatime jar in your script.  Hopefully I can 
 get this stuff in piggybank before someone else writes it this time :)  The 
 rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon, it is not much work and this should help 
 everyone working with time series.




[jira] Updated: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math

2010-03-23 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1310:


Attachment: datetime2.patch

Added checks for null or wrongly sized tuples.  Added CustomFormatToISO class and 
test, allowing any date format to be parsed in jodatime.  This patch is a 
replacement to the previous one.

One hitch: I'm out of time tonight, and build XML and I do not get on well.  
Any chance someone more familiar with ivy can add jodatime to piggybank's 
build.xml? I got it working easily in the pig project itself, but am not sure 
how to get ivy going in piggybank.  

The working dependency I put in Pig's build.xml is:

<dependency org="joda-time" name="joda-time" rev="${joda-time.version}" 
conf="compile->master"/>

And libraries.properties got:

joda-time.version=1.6

And it worked.

 ISO Date UDFs: Conversion, Rounding and Date Math
 -

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: datetime.patch, datetime2.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using jodatime.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple UDFs are missing, but 
 these work if you REGISTER the jodatime jar in your script.  Hopefully I can 
 get this stuff in piggybank before someone else writes it this time :)  The 
 rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon, it is not much work and this should help 
 everyone working with time series.




[jira] Created: (PIG-1314) Add DateTime Support to Pig

2010-03-22 Thread Russell Jurney (JIRA)
Add DateTime Support to Pig
---

 Key: PIG-1314
 URL: https://issues.apache.org/jira/browse/PIG-1314
 Project: Pig
  Issue Type: Bug
  Components: data
Affects Versions: 0.8.0
Reporter: Russell Jurney
 Fix For: 0.7.0


Hadoop/Pig are primarily used to parse log data, and most logs have a timestamp 
component.  Therefore Pig should support dates as a primitive.

Can someone familiar with adding types to pig comment on how hard this is?  
We're looking at doing this, rather than use UDFs.  Is this a patch that would 
be accepted?





[jira] Commented: (PIG-1314) Add DateTime Support to Pig

2010-03-22 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848356#action_12848356
 ] 

Russell Jurney commented on PIG-1314:
-

Thanks, Alan.  That is quite helpful.  Let me look into it and see about 
feasibility.  

What about durations as well?  http://en.wikipedia.org/wiki/ISO_8601#Durations 
ISO8601 durations would be very handy in enabling use of pig operators on 
datetimes via +/-, etc.  This might be something to do later, though.
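
A small sketch of what +/- with ISO 8601 durations could look like, written against the JDK's `java.time` rather than Pig or jodatime. Everything here is illustrative: the class and method names are hypothetical, and `java.time.Duration` only parses the day-and-time part of the duration syntax (for example `PT36H` or `P1DT12H`, with a day treated as exactly 24 hours), not month or year components.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of datetime +/- duration on ISO 8601 strings.
class IsoDurationSketch {

    // Adds an ISO 8601 duration to an ISO 8601 UTC datetime.
    static String plus(String isoDateTime, String isoDuration) {
        return Instant.parse(isoDateTime)
                .plus(Duration.parse(isoDuration))
                .toString();
    }

    // Subtracts an ISO 8601 duration from an ISO 8601 UTC datetime.
    static String minus(String isoDateTime, String isoDuration) {
        return Instant.parse(isoDateTime)
                .minus(Duration.parse(isoDuration))
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(plus("2010-03-22T00:00:00Z", "PT36H"));
        System.out.println(minus("2010-03-22T00:00:00Z", "P1D"));
    }
}
```

Calendar-based durations such as `P1M` would need a different representation, since a month has no fixed length in seconds.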

 Add DateTime Support to Pig
 ---

 Key: PIG-1314
 URL: https://issues.apache.org/jira/browse/PIG-1314
 Project: Pig
  Issue Type: Bug
  Components: data
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0

   Original Estimate: 672h
  Remaining Estimate: 672h

 Hadoop/Pig are primarily used to parse log data, and most logs have a 
 timestamp component.  Therefore Pig should support dates as a primitive.
 Can someone familiar with adding types to pig comment on how hard this is?  
 We're looking at doing this, rather than use UDFs.  Is this a patch that 
 would be accepted?




[jira] Commented: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math

2010-03-20 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847874#action_12847874
 ] 

Russell Jurney commented on PIG-1310:
-

I'm thinking it would be good if DateTime was a Pig primitive.  Can someone 
give me an idea how much work it is to add a Pig primitive, and if this patch 
would be accepted for 0.8?

 ISO Date UDFs: Conversion, Rounding and Date Math
 -

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.8.0

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using Joda-Time.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple of UDFs are missing, 
 but these work if you REGISTER the Joda-Time jar in your script.  Hopefully I 
 can get this stuff in piggybank before someone else writes it this time :)  
 The rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon; it is not much work and this should help 
 everyone working with time series.




[jira] Created: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math

2010-03-19 Thread Russell Jurney (JIRA)
ISO Date UDFs: Conversion, Rounding and Date Math
-

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.7.0


I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
formatted date strings, and working with them as ISO datetimes using Joda-Time.

The working code is here: 
http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/

It needs to be documented and tests added, and a couple of UDFs are missing, 
but these work if you REGISTER the Joda-Time jar in your script.  Hopefully I 
can get this stuff in piggybank before someone else writes it this time :)  The 
rounding also may not be performant, but the code works.

Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
slap me if this isn't done soon; it is not much work and this should help 
everyone working with time series.
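
As a rough illustration of the conversion and rounding described above, here 
is a sketch in Python using only the standard library.  The function names 
unix_to_iso and truncate_iso are hypothetical and are not taken from the linked 
repository, whose UDFs are Java and use Joda-Time.

```python
from datetime import datetime, timezone

# Hypothetical function names; the UDFs in the linked repository may differ.


def unix_to_iso(seconds):
    """Convert Unix epoch seconds to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


# Fields to zero out when rounding down to each supported unit.
_ZEROED = {
    "day": dict(hour=0, minute=0, second=0, microsecond=0),
    "hour": dict(minute=0, second=0, microsecond=0),
    "minute": dict(second=0, microsecond=0),
}


def truncate_iso(text, unit):
    """Round an ISO 8601 datetime string down to the start of `unit`."""
    moment = datetime.fromisoformat(text)
    return moment.replace(**_ZEROED[unit]).isoformat()


epoch = unix_to_iso(0)
by_hour = truncate_iso("2010-03-19T17:45:12+00:00", "hour")
by_day = truncate_iso("2010-03-19T17:45:12+00:00", "day")
```

Rounding to the hour or day like this is what makes it possible to GROUP time 
series records into buckets.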




[jira] Commented: (PIG-1150) VAR() Variance UDF

2010-03-19 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847600#action_12847600
 ] 

Russell Jurney commented on PIG-1150:
-

Yes, this sounds like the thing to do :)

On Tue, Mar 16, 2010 at 5:29 PM, Dmitriy V. Ryaboy (JIRA)



 VAR() Variance UDF
 --

 Key: PIG-1150
 URL: https://issues.apache.org/jira/browse/PIG-1150
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.5.0
 Environment: UDF, written in Pig 0.5 contrib/
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: var.patch


 I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates 
 variance in a distributed manner, based on the AVG() builtin.  It works by 
 calculating the count, sum and sum of squares, as described here: 
 http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
 Is this a worthwhile contribution?  Taking the square root of this value 
 using the contrib SQRT() function gives Standard Deviation, which is missing 
 from Pig.




[jira] Updated: (PIG-1150) VAR() Variance UDF

2009-12-16 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1150:


Attachment: var.patch

This patch will not cut the mustard: it lacks Javadoc and test cases, and it's 
just plain ugly.

That being said, people requested this on Twitter, so I'm pushing this one for 
people to use if they want to.  Will get a passable patch up later this week.

 VAR() Variance UDF
 --

 Key: PIG-1150
 URL: https://issues.apache.org/jira/browse/PIG-1150
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.5.0
 Environment: UDF, written in Pig 0.5 contrib/
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: var.patch


 I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates 
 variance in a distributed manner, based on the AVG() builtin.  It works by 
 calculating the count, sum and sum of squares, as described here: 
 http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
 Is this a worthwhile contribution?  Taking the square root of this value 
 using the contrib SQRT() function gives Standard Deviation, which is missing 
 from Pig.




[jira] Commented: (PIG-1150) VAR() Variance UDF

2009-12-16 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791582#action_12791582
 ] 

Russell Jurney commented on PIG-1150:
-

Oh - one other thing - I've read that this naive parallel method of calculating 
variance can have precision problems - all those doubles getting subtracted 
from one another and then squared.  I've thought of using BigDecimal, which can 
handle arbitrary-precision numbers.  My understanding is that this would be 
slow, but that the job would probably still be I/O bound.  

Is that something people would like to see?  I could maybe make another UDF 
that uses BigDecimal or something.  I've never actually encountered the 
precision problems in practice, but I can see how they might be a big problem 
for some people.
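
For anyone curious what the precision concern looks like, here is a sketch in 
Python rather than Java.  With the sum-of-squares form, two large and nearly 
equal numbers are subtracted at the end, so values with a large offset can 
lose most of their significant digits.  The merge function below is the 
pairwise (count, mean, M2) update from the same Wikipedia page, swapped in as 
one possible alternative; it is not what the attached patch does.  Python's 
Fraction stands in for BigDecimal here as an exact reference, and all names 
are hypothetical.

```python
from fractions import Fraction
from functools import reduce


def naive_variance(values):
    """Population variance from count, sum and sum of squares."""
    count = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    return total_sq / count - (total / count) ** 2


def merge(left, right):
    """Combine two (count, mean, M2) partials without forming huge sums."""
    n_a, mean_a, m2_a = left
    n_b, mean_b, m2_b = right
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return (n, mean, m2)


def stable_partial(values):
    """Fold single values, each a (1, value, 0.0) partial, into one partial."""
    return reduce(merge, ((1, v, 0.0) for v in values))


# Small spread on top of a large offset: the hard case for the naive form.
values = [1e9 + x for x in (4.0, 7.0, 13.0, 16.0)]

# Two "partitions" computed independently, then merged.
n, _, m2 = merge(stable_partial(values[:2]), stable_partial(values[2:]))
stable = m2 / n

# Exact rational arithmetic as a reference, playing the role of BigDecimal.
exact = naive_variance([Fraction(v) for v in values])

# The same formula in plain doubles; compare it against `exact` to see the loss.
naive = naive_variance(values)
```

The merge step is still associative, so it fits the same 
initial/intermediate/final shape as the sum-of-squares version.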

 VAR() Variance UDF
 --

 Key: PIG-1150
 URL: https://issues.apache.org/jira/browse/PIG-1150
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.5.0
 Environment: UDF, written in Pig 0.5 contrib/
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: var.patch


 I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates 
 variance in a distributed manner, based on the AVG() builtin.  It works by 
 calculating the count, sum and sum of squares, as described here: 
 http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
 Is this a worthwhile contribution?  Taking the square root of this value 
 using the contrib SQRT() function gives Standard Deviation, which is missing 
 from Pig.




[jira] Created: (PIG-1150) VAR() Variance UDF

2009-12-14 Thread Russell Jurney (JIRA)
VAR() Variance UDF
--

 Key: PIG-1150
 URL: https://issues.apache.org/jira/browse/PIG-1150
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.5.0
 Environment: UDF, written in Pig 0.5 contrib/
Reporter: Russell Jurney
 Fix For: 0.5.0


I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates 
variance in a distributed manner, based on the AVG() builtin.  It works by 
calculating the count, sum and sum of squares, as described here: 
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm

Is this a worthwhile contribution?  Taking the square root of this value using 
the contrib SQRT() function gives Standard Deviation, which is missing from Pig.
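
A minimal sketch of the count / sum / sum-of-squares approach, written in 
Python rather than as a Pig UDF.  The three functions mirror the 
initial/intermediate/final stages of an algebraic function, but their names are 
hypothetical, and the sketch assumes population variance (dividing by the 
count); the attached patch may differ on both points.

```python
import math
from functools import reduce


def initial(value):
    """Per-record partial: (count, sum, sum of squares)."""
    return (1, value, value * value)


def combine(left, right):
    """Merge two partials; element-wise addition is associative."""
    return tuple(a + b for a, b in zip(left, right))


def final(partial):
    """Population variance: mean of squares minus square of the mean."""
    count, total, total_sq = partial
    mean = total / count
    return total_sq / count - mean * mean


def partial_for(values):
    """What one mapper or combiner would produce for its slice of the data."""
    return reduce(combine, (initial(v) for v in values))


# Two slices of the data processed independently, then merged.
partition_a = [2.0, 4.0, 4.0, 4.0]
partition_b = [5.0, 5.0, 7.0, 9.0]
merged = combine(partial_for(partition_a), partial_for(partition_b))

variance = final(merged)
std_dev = math.sqrt(variance)
```

Because combine only adds the three running totals, the partials can be merged 
in any order, which is what lets the computation be distributed.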




[jira] Commented: (PIG-896) Pig doesnt run on Mac OSX

2009-07-27 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735729#action_12735729
 ] 

Russell Jurney commented on PIG-896:


Not sure whether you were able to get it running, but Pig will run on OS X 10.5 
if you set:

export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home

For previous versions of OS X you have to upgrade to Java 1.6 first, but on 
10.5.7 with Pig 0.3.0, at least, do this and it will 'just work.'  I'm not sure 
this constitutes a bug, since JAVA_HOME is an environment variable.

 Pig doesnt run on Mac OSX
 -

 Key: PIG-896
 URL: https://issues.apache.org/jira/browse/PIG-896
 Project: Pig
  Issue Type: Bug
  Components: build
 Environment: Mac OSX
Reporter: Rajagopal Natarajan

 There are hardcoded references like $JAVA_HOME/bin/java in the pig run 
 scripts. Due to this it fails on Mac OSX. It would be nice if Pig were 
 supported on Mac.
