[jira] [Commented] (PIG-2828) DataType.compare null

2013-06-03 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672880#comment-13672880
 ] 

Aniket Mokashi commented on PIG-2828:
-

The DataType compare API is a little broken.
public static int compare(Object o1, Object o2) - uses reflection to infer 
the datatypes of o1 and o2.
public static int compare(Object o1, Object o2, byte dt1, byte dt2) - doesn't 
use reflection; however, callers of this API use reflection and also deal with 
NULLs. 
Currently, callers of the second API handle NULLs somewhat similarly, but not 
consistently. We can refactor the API to avoid reflection and handle NULLs 
consistently in a separate JIRA.
Right now, TOP, which uses the second API directly, fails with an NPE if o1 or 
o2 is null. We should fix that with NULL < non-NULL semantics. 

 DataType.compare null
 -

 Key: PIG-2828
 URL: https://issues.apache.org/jira/browse/PIG-2828
 Project: Pig
  Issue Type: Bug
Reporter: Haitao Yao
 Attachments: DataType.patch, test.patch


 When using TOP, if the DataBag contains a null value to compare, the 
 following exception is thrown:
 Caused by: java.lang.NullPointerException
   at org.apache.pig.data.DataType.compare(DataType.java:427)
   at org.apache.pig.builtin.TOP$TupleComparator.compare(TOP.java:97)
   at org.apache.pig.builtin.TOP$TupleComparator.compare(TOP.java:1)
   at java.util.PriorityQueue.siftUpUsingComparator(PriorityQueue.java:649)
   at java.util.PriorityQueue.siftUp(PriorityQueue.java:627)
   at java.util.PriorityQueue.offer(PriorityQueue.java:329)
   at java.util.PriorityQueue.add(PriorityQueue.java:306)
   at org.apache.pig.builtin.TOP.updateTop(TOP.java:141)
   at org.apache.pig.builtin.TOP.exec(TOP.java:116)
 Code (TOP.java, starting at line 91):
     Object field1 = o1.get(fieldNum);
     Object field2 = o2.get(fieldNum);
     if (!typeFound) {
         datatype = DataType.findType(field1);
         typeFound = true;
     }
     return DataType.compare(field1, field2, datatype, datatype);
 The cause: once typeFound is true and datatype has been set, a null field1 
 is still passed to DataType.compare, and the script fails.
 So we need to check whether field1 is null.
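
 The null check described above could be sketched roughly as follows. This is
 a standalone illustration, not the attached patch: tuples are modelled as
 plain List<Object>, the class name NullSafeFieldComparator is hypothetical,
 Comparable.compareTo stands in for the typed DataType.compare call, and the
 ordering (a null sorts before any non-null value) is an assumption made here.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for TOP's TupleComparator; a tuple is a List<Object>.
final class NullSafeFieldComparator implements Comparator<List<Object>> {
    private final int fieldNum;

    NullSafeFieldComparator(int fieldNum) {
        this.fieldNum = fieldNum;
    }

    @Override
    @SuppressWarnings({"unchecked", "rawtypes"})
    public int compare(List<Object> t1, List<Object> t2) {
        Object field1 = t1.get(fieldNum);
        Object field2 = t2.get(fieldNum);
        // Resolve nulls before any type-specific comparison is attempted.
        if (field1 == null) {
            return field2 == null ? 0 : -1; // assumed ordering: null < non-null
        }
        if (field2 == null) {
            return 1;
        }
        // Stand-in for DataType.compare(field1, field2, datatype, datatype).
        return ((Comparable) field1).compareTo(field2);
    }

    public static void main(String[] args) {
        PriorityQueue<List<Object>> queue =
                new PriorityQueue<>(new NullSafeFieldComparator(0));
        queue.add(Arrays.<Object>asList(3));
        queue.add(Arrays.<Object>asList((Object) null)); // no NPE on insert
        queue.add(Arrays.<Object>asList(1));
        System.out.println(queue.poll());
    }
}
```

 With the nulls handled up front, the type-specific comparison only ever sees
 two non-null values, so the PriorityQueue can accept tuples with null fields.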

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-2828) DataType.compare null

2013-06-03 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-2828:


Status: Patch Available  (was: Open)

 DataType.compare null
 -

 Key: PIG-2828
 URL: https://issues.apache.org/jira/browse/PIG-2828
 Project: Pig
  Issue Type: Bug
Reporter: Haitao Yao
 Attachments: DataType.patch, PIG-2828.patch, test.patch




[jira] [Updated] (PIG-2828) DataType.compare null

2013-06-03 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-2828:


Attachment: PIG-2828.patch

 DataType.compare null
 -

 Key: PIG-2828
 URL: https://issues.apache.org/jira/browse/PIG-2828
 Project: Pig
  Issue Type: Bug
Reporter: Haitao Yao
 Attachments: DataType.patch, PIG-2828.patch, test.patch




[jira] [Commented] (PIG-2828) DataType.compare null

2013-06-03 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13672896#comment-13672896
 ] 

Julien Le Dem commented on PIG-2828:


Sounds good to me.

 DataType.compare null
 -

 Key: PIG-2828
 URL: https://issues.apache.org/jira/browse/PIG-2828
 Project: Pig
  Issue Type: Bug
Reporter: Haitao Yao
 Attachments: DataType.patch, PIG-2828.patch, test.patch




Re: Review Request: PIG-3322 Fix the issue where NPE is thrown when reading a union which has nulls and add a testcase

2013-06-03 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11333/#review21312
---


Just minor comments on the naming of the variables. Java variable names should 
be camel case.


http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
https://reviews.apache.org/r/11333/#comment44210

goldenOutput



http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
https://reviews.apache.org/r/11333/#comment44209

output



http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
https://reviews.apache.org/r/11333/#comment44211

golden output



http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
https://reviews.apache.org/r/11333/#comment44212

fileOutput


- Rohini Palaniswamy


On May 29, 2013, 11:07 p.m., Viraj Bhat wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/11333/
 ---
 
 (Updated May 29, 2013, 11:07 p.m.)
 
 
 Review request for pig and Rohini Palaniswamy.
 
 
 Description
 ---
 
 A null pointer exception is thrown when loading a union with null in its 
 schema. A sample test case was also added.
 
 
 Diffs
 -
 
   
 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
  1485358 
   
 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java
  1485358 
   
 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
  1485358 
   
 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/expected_testLoadAvrowithNulls.txt
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/11333/diff/
 
 
 Testing
 ---
 
 Yes all tests pass in the piggybank
 
 
 Thanks,
 
 Viraj Bhat
 




Re: Review Request: PIG-3322 Fix the issue where NPE is thrown when reading a union which has nulls and add a testcase

2013-06-03 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11333/#review21316
---



http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
https://reviews.apache.org/r/11333/#comment44214

Isn't a load and store enough to reproduce the test case? Why such a long 
pig script?


- Rohini Palaniswamy






Re: Review Request: PIG-3331 Default values not written to Schema when specified in the output schema

2013-06-03 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11355/#review21315
---



http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigSchema2Avro.java
https://reviews.apache.org/r/11355/#comment44213

Initialize defaultValue in a variable and pass defaultValue instead of 
using an if-else condition.



http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
https://reviews.apache.org/r/11355/#comment44215

Isn't a load and store enough to reproduce the test case? Why such a long 
pig script? Please try to keep the unit tests simple.


- Rohini Palaniswamy


On May 30, 2013, 2:29 a.m., Viraj Bhat wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/11355/
 ---
 
 (Updated May 30, 2013, 2:29 a.m.)
 
 
 Review request for pig and Rohini Palaniswamy.
 
 
 Description
 ---
 
 Patch to write default values to the schema when the writer schema given to 
 AvroStorage specifies them.
 
 
 Diffs
 -
 
   
 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigSchema2Avro.java
  1485826 
   
 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
  1485826 
   
 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files/numbers.txt
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/11355/diff/
 
 
 Testing
 ---
 
 Yes against the Piggybank  in Pig trunk/Pig 0.12
 
 
 Thanks,
 
 Viraj Bhat
 




[jira] [Commented] (PIG-3341) Improving performance of loading datetime values

2013-06-03 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673122#comment-13673122
 ] 

Rohini Palaniswamy commented on PIG-3341:
-

bq. Before making the fix, I think there needs to be a little more clarity 
around exactly what formats are supported. For example, pig 0.11.1 currently 
supports datetime strings with no date - T00:00:00 produces a date in 1970. 
Is this intentional?
   I don't think anyone is looking for such behaviour. It is not intuitive. 

 I think we can go with option 1 (more is better) but also state which of the 
supported formats are not part of the W3C profile. We also need to return null 
if a value does not conform to the format instead of throwing an error. 

 Improving performance of loading datetime values
 

 Key: PIG-3341
 URL: https://issues.apache.org/jira/browse/PIG-3341
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.11.1
Reporter: pat chan
Priority: Minor
 Fix For: 0.12, 0.11.2


 The performance of loading datetime values can be improved by about 25% by 
 moving a single line in ToDate.java:
   public static DateTimeZone extractDateTimeZone(String dtStr) {
     Pattern pattern = Pattern.compile("(Z|(?<=(T[0-9\\.:]{0,12}))((\\+|-)\\d{2}(:?\\d{2})?))$");
 should become:
   static Pattern pattern = Pattern.compile("(Z|(?<=(T[0-9\\.:]{0,12}))((\\+|-)\\d{2}(:?\\d{2})?))$");
   public static DateTimeZone extractDateTimeZone(String dtStr) {
 There is no need to recompile the regular expression for every value. I'm not 
 sure if this function is ever called concurrently, but Pattern objects are 
 thread-safe anyways.
 As a test, I created a file of 10M timestamps:
   for i in 0..1000
 puts '2000-01-01T00:00:00+23'
   end
 I then ran this script:
   grunt> A = load 'data' as (a:datetime); B = filter A by a is null; dump B;
 Before the change it took 160s.
 After the change, the script took 120s.
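
 The change can be illustrated with a small standalone sketch. It is not the
 ToDate.java source: the class and method names are hypothetical, the method
 returns the matched suffix as a String rather than a Joda DateTimeZone, and
 the pattern is a reconstruction of the one quoted above (the string quotes
 and the lookbehind are assumed).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper showing a pattern compiled once instead of per call.
final class ZoneSuffixExtractor {
    // Compiled once when the class loads. Pattern instances are immutable and
    // safe to share across threads; Matcher instances are not.
    private static final Pattern ZONE_SUFFIX = Pattern.compile(
            "(Z|(?<=(T[0-9\\.:]{0,12}))((\\+|-)\\d{2}(:?\\d{2})?))$");

    /** Returns the trailing zone designator (for example "Z" or "+23"), or null. */
    static String extractZoneSuffix(String dtStr) {
        Matcher matcher = ZONE_SUFFIX.matcher(dtStr); // a fresh Matcher per call
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extractZoneSuffix("2000-01-01T00:00:00+23"));
        System.out.println(extractZoneSuffix("2000-01-01T00:00:00Z"));
        System.out.println(extractZoneSuffix("2000-01-01"));
    }
}
```

 Only the cheap Matcher is created per value; the compiled Pattern is shared,
 which is where the saving described above would come from.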
 
 Another performance improvement can be made for invalid datetime values. If a 
 datetime value is invalid, an exception is created and thrown, which is a 
 costly way to fail a validity check. To test the performance impact, I 
 created 10M invalid datetime values:
   for i in 0..1000
 puts '2000-99-01T00:00:00+23'
   end
 In this test, the regex pattern was always recompiled. I then ran this script:
   grunt> A = load 'data' as (a:datetime); B = filter A by a is not null; dump B;
 The script took 190s.
 I understand this could be considered an edge case and might not be worth 
 changing. However, if there are use cases where invalid dates are part of 
 normal processing, then you might consider fixing this.
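
 One possible way to avoid the exception cost is a cheap range check before
 the real parser is called. The sketch below is only an assumption about how
 that could look: the class and method names are hypothetical, it expects a
 full yyyy-MM-dd prefix (shorter forms would need their own handling), and the
 check is deliberately coarse (it does not validate the day against the
 month).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-check; it does not replace the actual datetime parser.
final class DateTimePrecheck {
    // Compiled once; captures year, month and day of a yyyy-MM-dd prefix.
    private static final Pattern DATE_PREFIX =
            Pattern.compile("^(\\d{4})-(\\d{2})-(\\d{2})");

    /** Returns false, without throwing, when the date part is clearly invalid. */
    static boolean hasPlausibleDate(String s) {
        if (s == null) {
            return false;
        }
        Matcher m = DATE_PREFIX.matcher(s);
        if (!m.find()) {
            return false;
        }
        int month = Integer.parseInt(m.group(2));
        int day = Integer.parseInt(m.group(3));
        return month >= 1 && month <= 12 && day >= 1 && day <= 31;
    }

    public static void main(String[] args) {
        System.out.println(hasPlausibleDate("2000-01-01T00:00:00+23"));
        System.out.println(hasPlausibleDate("2000-99-01T00:00:00+23"));
    }
}
```

 A caller could return null as soon as the check fails and hand only the
 remaining values to the parser, whose own error handling would still be
 needed for cases this check does not catch.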



[jira] [Updated] (PIG-3327) Pig hits OOM when fetching task Reports

2013-06-03 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3327:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk (0.12). Thanks Cheolsoo

 Pig hits OOM when fetching task Reports
 ---

 Key: PIG-3327
 URL: https://issues.apache.org/jira/browse/PIG-3327
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.10.1
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.12

 Attachments: PIG-3327-1.patch


 A Pig script hits java.lang.OutOfMemoryError: GC overhead limit exceeded 
 with Hadoop 23 when a launched job has 80K+ maps. The TaskReport[] array 
 is causing the OOM. 



[jira] [Commented] (PIG-3337) Fix remaining Window e2e tests

2013-06-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673234#comment-13673234
 ] 

Hudson commented on PIG-3337:
-

Integrated in Hive-trunk-h0.21 #2125 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2125/])
PIG-3337: Fix remaining Window e2e tests (Revision 1487967)

 Result = FAILURE
daijy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1487967
Files : 
* /pig/trunk/CHANGES.txt
* /pig/trunk/test/e2e/harness/TestDriver.pm
* /pig/trunk/test/e2e/pig/drivers/TestDriverPig.pm


 Fix remaining Window e2e tests
 --

 Key: PIG-3337
 URL: https://issues.apache.org/jira/browse/PIG-3337
 Project: Pig
  Issue Type: Sub-task
  Components: e2e harness
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12

 Attachments: PIG-3337-1.patch






[jira] [Commented] (PIG-3337) Fix remaining Window e2e tests

2013-06-03 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673254#comment-13673254
 ] 

Rohini Palaniswamy commented on PIG-3337:
-

[~daijy],
   Any idea why Hive Hudson messages are appearing here? I saw this before in 
PIG-2955 and PIG-3069 as well.

 Fix remaining Window e2e tests
 --

 Key: PIG-3337
 URL: https://issues.apache.org/jira/browse/PIG-3337
 Project: Pig
  Issue Type: Sub-task
  Components: e2e harness
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12

 Attachments: PIG-3337-1.patch






[jira] [Created] (PIG-3343) Refactor DataType.compare api to handle NULLs and reflection

2013-06-03 Thread Aniket Mokashi (JIRA)
Aniket Mokashi created PIG-3343:
---

 Summary: Refactor DataType.compare api to handle NULLs and 
reflection
 Key: PIG-3343
 URL: https://issues.apache.org/jira/browse/PIG-3343
 Project: Pig
  Issue Type: Bug
  Components: data
Reporter: Aniket Mokashi






[jira] [Commented] (PIG-2828) DataType.compare null

2013-06-03 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673318#comment-13673318
 ] 

Aniket Mokashi commented on PIG-2828:
-

I have created https://issues.apache.org/jira/browse/PIG-3343 to track the API 
refactor.

 DataType.compare null
 -

 Key: PIG-2828
 URL: https://issues.apache.org/jira/browse/PIG-2828
 Project: Pig
  Issue Type: Bug
Reporter: Haitao Yao
 Attachments: DataType.patch, PIG-2828.patch, test.patch




[jira] [Created] (PIG-3344) Add a spatial datatype to Pig

2013-06-03 Thread Ahmed Eldawy (JIRA)
Ahmed Eldawy created PIG-3344:
-

 Summary: Add a spatial datatype to Pig
 Key: PIG-3344
 URL: https://issues.apache.org/jira/browse/PIG-3344
 Project: Pig
  Issue Type: New Feature
  Components: parser
Reporter: Ahmed Eldawy


This issue is about adding a new datatype to Pig that abstracts a spatial 
attribute. Following OGC [http://www.opengeospatial.org/], we will add a new 
datatype called 'Geometry' that abstracts all standard shapes (e.g., Point, 
Polygon and Linestring). This datatype is automatically parsed from either a 
Well-Known Text (WKT) or Well-Known Binary (WKB) represented as a Hex string. 
These two types are the standard export formats for OGC shapes and they are 
supported by many existing tools including PostGIS [http://postgis.net/]. 
Exporting through PigStorage should default to a WKB represented as Hex string 
and there will be additional functions to convert to WKT.
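
To make the "WKB represented as a Hex string" format concrete, here is a 
minimal hand-rolled sketch that encodes a single 2D point. It only illustrates 
the byte layout (byte-order flag, geometry type, two doubles); the class and 
method names are hypothetical, and in practice a geometry library would do 
this encoding rather than code like this.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration of the WKB layout for a 2D point.
final class WkbPointHex {
    /** Encodes a 2D point as little-endian WKB and returns uppercase hex. */
    static String pointToWkbHex(double x, double y) {
        ByteBuffer buf = ByteBuffer.allocate(21).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 1);  // byte-order flag: 1 = little-endian
        buf.putInt(1);      // geometry type: 1 = Point
        buf.putDouble(x);   // X coordinate as IEEE 754 double
        buf.putDouble(y);   // Y coordinate as IEEE 754 double
        StringBuilder hex = new StringBuilder(42);
        for (byte b : buf.array()) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // WKT equivalent: POINT (1 2)
        System.out.println(pointToWkbHex(1.0, 2.0));
        // prints 0101000000000000000000F03F0000000000000040
    }
}
```

The hex form is plain ASCII, which is why it suits a text-oriented storage 
function better than raw binary would.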

This new datatype maps internally to the class OGCGeometry 
[https://github.com/Esri/geometry-api-java/blob/master/src/com/esri/core/geometry/ogc/OGCGeometry.java],
which is licensed under the Apache license. This class contains functionality to 
import from and export to the WKT and WKB formats.

Data manipulation functions for the new datatype will all be implemented as 
UDFs. Currently, there is a spatial extension to Pig (called Pigeon) 
[https://github.com/aseldawy/pigeon] that provides basic spatial functionality 
via UDFs powered by the aforementioned library. At present, it automatically 
converts WKB and WKT fields to the OGCGeometry class, performs the spatial 
operation, and produces the result back as WKB. Once the Geometry datatype is 
added, Pigeon will use it natively to avoid the conversion.



Re: A major addition to Pig. Working with spatial data

2013-06-03 Thread Ahmed Eldawy
I've just created a new JIRA issue for the spatial functionality.
https://issues.apache.org/jira/browse/PIG-3344
This issue is all about the new datatype which is the only thing that needs
to be changed internally in Pig in this phase. Pigeon is already working
with the ESRI library but it converts between binary representation and
Geometry class back and forth. Once the new datatype is added, we can
change Pigeon to work with this datatype too. We can still keep the current
conversion functionality as it allows the system to automatically perform
the conversion from the bytearray datatype as it adds the autodetect
functionality when a column is not given a type in the schema.

I don't know whether I should provide a patch for this issue myself or whether
someone else can work on it. I can of course do it, but I think it will
take me some time to finish as I'm not yet familiar with the internals of
Pig. Someone who is familiar with the parser would definitely do a better
job here. I can focus on Pigeon and add more spatial functions there so
that we have plenty of functions once the new datatype is added. I'm
open to both solutions but I'm just checking with you.

Thanks
Ahmed

Best regards,
Ahmed Eldawy


On Wed, May 29, 2013 at 12:17 PM, Russell Jurney
russell.jur...@gmail.com wrote:

 Awesome. This would be a great addition to Pig. Please create a JIRA.

 Russell Jurney http://datasyndrome.com

 On May 29, 2013, at 8:51 AM, Ahmed Eldawy aseld...@gmail.com wrote:

  Hi all,
 
  Nick has pointed out to me an alternative GIS package that can replace JTS.
  ESRI has recently released a GIS package
  https://github.com/Esri/geometry-api-java under the Apache
  license. I changed Pigeon to work with that new package. I
  think it could be easier now to integrate this work with the main branch of
  Apache Pig. I will go on with the current project and add more spatial
  functionality. We can then add a new datatype to Apache Pig and link it to
  those functions.
 
  The ESRI package contains a class OGCGeometry
  http://esri.github.io/geometry-api-java/javadoc/com/esri/core/geometry/ogc/OGCGeometry.html
  which can be linked to a new datatype 'Geometry'. Do you think we can rely on
  the new package and integrate the work with Apache Pig?
 
  On May 23, 2013 11:40 PM, Ahmed Eldawy aseld...@gmail.com wrote:
 
  Hi all,
   Thanks for your help. I've started the project with minimal
  functionality. It's currently hosted on GitHub. It is licensed
  under the Apache public license to make it easier to merge with Pig.
  Currently it has only a few functions. I implemented one function from
  each of several function types (e.g., aggregate and create). I'll keep adding
  functions, and any contributions to the project are welcome. To begin with,
  I need an ANT build file that runs the tests, compiles and generates a jar
  file. I'm not familiar with ANT, so any help with this is welcome.
  Here's the project home page
  https://github.com/aseldawy/pigeon
 
 
  If you have any comments or suggestions, please contact me.
 
 
  Best regards,
  Ahmed Eldawy
 
 
  On Mon, May 6, 2013 at 3:09 PM, Jonathan Coveney jcove...@gmail.com
 wrote:
 
  Nick: the only issue is that the way types are implemented in Pig doesn't
  allow us to easily plug in types externally. Adding support for that
  would be cool, but a fair bit of work.
 
 
  2013/5/6 Nick Dimiduk ndimi...@gmail.com
 
  I'm not a lawyer, but I see no reason why this cannot be an external
  extension to Pig. It would behave the same way PostGIS is an external
  extension to Postgres. Any Apache issues would be toward general-purpose
  enhancements, not specific to your project.
 
  Good on you!
  -n
 
  On Mon, May 6, 2013 at 10:12 AM, Ahmed Eldawy aseld...@gmail.com
  wrote:
 
  I contacted the Solr developers to see how JTS can be included in an Apache
  project. See
  http://mail-archives.apache.org/mod_mbox/lucene-dev/201305.mbox/raw/%3C1367815102914-4060969.post%40n3.nabble.com%3E/
  As far as I understand, they did not include it in the main Solr project;
  rather, they created a separate project (Spatial4j) which is still
  licensed under the Apache license and refers to JTS. Users will have to
  download the JTS libraries separately to make it run. That's pretty much the
  same plan that Jonathan mentioned. We will still have the overhead of
  serializing/deserializing the shapes each time a function is called. Also,
  we will have to use the ugly bytearray data type for spatial data instead
  of creating its own data type (e.g., Geometry).
  I think using Spatial4j instead of JTS will not be sufficient for our case,
  as we need to provide access to all spatial functions of JTS such as
  Union, Intersection, Difference, etc. This way we can claim conformity
  with OGC standards, which gives visibility and appreciation in the spatial
  community.
  I think also that this means I will not add any issues to JIRA as it is now
  a 

Re: A major addition to Pig. Working with spatial data

2013-06-03 Thread Russell Jurney
JIRAs do best when one person drives them to completion.


On Mon, Jun 3, 2013 at 10:26 AM, Ahmed Eldawy aseld...@gmail.com wrote:

 I've just created a new JIRA issue for the spatial functionality.
 https://issues.apache.org/jira/browse/PIG-3344
 This issue is all about the new datatype which is the only thing that needs
 to be changed internally in Pig in this phase. Pigeon is already working
 with the ESRI library but it converts between binary representation and
 Geometry class back and forth. Once the new datatype is added, we can
 change Pigeon to work with this datatype too. We can still keep the current
 conversion functionality as it allows the system to automatically perform
 the conversion from the bytearray datatype as it adds the autodetect
 functionality when a column is not given a type in the schema.

 I don't know whether I should provide a patch for this issue myself or whether
 someone else can work on it. I can of course do it, but I think it will
 take me some time to finish as I'm not yet familiar with the internals of
 Pig. Someone who is familiar with the parser would definitely do a better
 job here. I can focus on Pigeon and add more spatial functions there so
 that we have plenty of functions once the new datatype is added. I'm
 open to both solutions but I'm just checking with you.

 Thanks
 Ahmed

 Best regards,
 Ahmed Eldawy


 On Wed, May 29, 2013 at 12:17 PM, Russell Jurney
 russell.jur...@gmail.com wrote:

  Awesome. This would be a great addition to Pig. Please create a JIRA.
 
  Russell Jurney http://datasyndrome.com
 
  On May 29, 2013, at 8:51 AM, Ahmed Eldawy aseld...@gmail.com wrote:
 
   Hi all,
  
   Nick has pointed out to me an alternative GIS package that can replace JTS.
   ESRI has recently released a GIS package
   https://github.com/Esri/geometry-api-java under the Apache
   license. I changed Pigeon to work with that new package. I
   think it could be easier now to integrate this work with the main branch of
   Apache Pig. I will go on with the current project and add more spatial
   functionality. We can then add a new datatype to Apache Pig and link it to
   those functions.
  
   The ESRI package contains a class OGCGeometry
   (http://esri.github.io/geometry-api-java/javadoc/com/esri/core/geometry/ogc/OGCGeometry.html)
   which can be linked to a new datatype 'Geometry'. Do you think we can
   rely on the new package and integrate the work with Apache Pig?
  
   On May 23, 2013 11:40 PM, Ahmed Eldawy aseld...@gmail.com wrote:
  
   Hi all,
   Thanks for your help. I've started the project with minimal
   functionality. It's currently hosted on GitHub and licensed under the
   Apache License to make it easier to merge with Pig. Currently it has only
   a few functions: I implemented one function from each of several
   categories (e.g., aggregate and create). I'll keep adding functions, and
   any contributions to the project are welcome. To begin with, I need an
   Ant build file that runs the tests, compiles and generates a jar file.
   I'm not familiar with Ant, so any help with this is appreciated.
   Here's the project home page:
   https://github.com/aseldawy/pigeon
  
  
   If you have any comments or suggestion please contact me.
  
  
   Best regards,
   Ahmed Eldawy
  
  
   On Mon, May 6, 2013 at 3:09 PM, Jonathan Coveney jcove...@gmail.com
  wrote:
  
   Nick: the only issue is that the way types are implemented in Pig
   doesn't allow us to easily plug in types externally. Adding support for
   that would be cool, but a fair bit of work.
  
  
   2013/5/6 Nick Dimiduk ndimi...@gmail.com
  
   I'm not a lawyer, but I see no reason why this cannot be an external
   extension to Pig. It would behave the same way PostGIS is an external
   extension to Postgres. Any Apache issues would be toward general
   purpose enhancements, not specific to your project.
  
   Good on you!
   -n
  
   On Mon, May 6, 2013 at 10:12 AM, Ahmed Eldawy aseld...@gmail.com
   wrote:
  
   I contacted the Solr developers to see how JTS can be included in an
   Apache project. See
   http://mail-archives.apache.org/mod_mbox/lucene-dev/201305.mbox/raw/%3C1367815102914-4060969.post%40n3.nabble.com%3E/
   As far as I understand, they did not include it in the main Solr
   project; rather, they created a separate project (spatial 4j) which is
   still licensed under the Apache license and refers to JTS. Users will
   have to download the JTS libraries separately to make it run. That's
   pretty much the same plan that Jonathan mentioned. We will still have
   the overhead of serializing/deserializing the shapes each time a
   function is called. Also, we will have to use the ugly bytearray data
   type for spatial data instead of creating its own data type (e.g.,
   Geometry).
   I think using spatial 4j instead of JTS will not be sufficient for our
   case as we need to provide access to all spatial functions of JTS
 

[jira] [Updated] (PIG-2828) DataType.compare null

2013-06-03 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2828:
---

Assignee: Aniket Mokashi

+1 to PIG-2828.patch. Looks good to me.

[~aniket486], can you please replace all the tabs with 4 spaces when committing 
your patch? Thanks!

 DataType.compare null
 -

 Key: PIG-2828
 URL: https://issues.apache.org/jira/browse/PIG-2828
 Project: Pig
  Issue Type: Bug
Reporter: Haitao Yao
Assignee: Aniket Mokashi
 Attachments: DataType.patch, PIG-2828.patch, test.patch


 When TOP is used on a DataBag that contains a null value in the compared 
 field, it throws the following exception:
 Caused by: java.lang.NullPointerException
   at org.apache.pig.data.DataType.compare(DataType.java:427)
   at org.apache.pig.builtin.TOP$TupleComparator.compare(TOP.java:97)
   at org.apache.pig.builtin.TOP$TupleComparator.compare(TOP.java:1)
   at java.util.PriorityQueue.siftUpUsingComparator(PriorityQueue.java:649)
   at java.util.PriorityQueue.siftUp(PriorityQueue.java:627)
   at java.util.PriorityQueue.offer(PriorityQueue.java:329)
   at java.util.PriorityQueue.add(PriorityQueue.java:306)
   at org.apache.pig.builtin.TOP.updateTop(TOP.java:141)
   at org.apache.pig.builtin.TOP.exec(TOP.java:116)
 code: (TOP.java, starts with line 91)
     Object field1 = o1.get(fieldNum);
     Object field2 = o2.get(fieldNum);
     if (!typeFound) {
         datatype = DataType.findType(field1);
         typeFound = true;
     }
     return DataType.compare(field1, field2, datatype, datatype);
 The failure occurs when typeFound is true, datatype is not null, and field1 
 is null. So field1 needs a null check before the comparison.
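
 A minimal sketch of the NULL-before-non-NULL ordering discussed for this issue, written in plain Java rather than against Pig's own classes. The class NullSafeCompareSketch and the method compareNullFirst are hypothetical names for illustration and are not taken from the attached patches; rows are modeled as Integer arrays instead of Pig tuples.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class NullSafeCompareSketch {
    // Hypothetical helper (not Pig's DataType.compare): null sorts before any non-null value.
    static <T extends Comparable<T>> int compareNullFirst(T a, T b) {
        if (a == null && b == null) {
            return 0;
        }
        if (a == null) {
            return -1;
        }
        if (b == null) {
            return 1;
        }
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        // Rows stand in for tuples; field 0 may be null, as in the failing TOP case.
        Comparator<Integer[]> byFirstField = (r1, r2) -> compareNullFirst(r1[0], r2[0]);
        PriorityQueue<Integer[]> queue = new PriorityQueue<>(4, byFirstField);
        queue.add(new Integer[] {3});
        queue.add(new Integer[] {null});
        queue.add(new Integer[] {1});
        // The head of the queue is the row whose field is null; no NullPointerException is thrown.
        System.out.println(queue.peek()[0]);
    }
}
```

 Putting the null checks in front of the type-specific comparison means the comparator never dereferences a null field, whichever side it appears on.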

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3279) Support nested RANK

2013-06-03 Thread Johnny Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johnny Zhang updated PIG-3279:
--

Attachment: PIG-3279-3.patch.txt

Thanks a lot for your comments, [~daijy]! Much appreciated. I changed 
LogToPhyTranslationVisitor.java:
1. For the RANK BY operation, only include POSort -> POCounter -> PORank -> 
POForEach. The current physical plan looks like:
c: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-42
|
|---c: New For Each(true)[bag] - scope-41
|   |
|   RelationToExpressionProject[bag][*] - scope-32
|   |
|   |---New For Each(false,true)[tuple] - scope-40
|   |   |
|   |   Project[long][0] - scope-38
|   |   |
|   |   Project[bag][2] - scope-39
|   |
|   |---d: PORank[tuple] - scope-37
|   |   |
|   |   Project[int][0] - scope-34
|   |
|   |---d: POCounter[tuple] - scope-36
|   |   |
|   |   Project[int][0] - scope-34
|   |
|   |---d: POSort[tuple]() - scope-35
|   |   |
|   |   Project[int][0] - scope-34
|   |
|   |---Project[bag][1] - scope-33
|
|---b: Package[tuple]{chararray} - scope-29
|
|---b: Global Rearrange[tuple] - scope-28
|
|---b: Local Rearrange[tuple]{chararray}(false) - scope-30
|   |
|   Project[chararray][1] - scope-31
|
|---a: New For Each(false,false,false)[bag] - scope-27
|   |
|   Cast[chararray] - scope-19
|   |
|   |---Project[bytearray][0] - scope-18
|   |
|   Cast[chararray] - scope-22
|   |
|   |---Project[bytearray][1] - scope-21
|   |
|   Cast[int] - scope-25
|   |
|   |---Project[bytearray][2] - scope-24
|
|---a: 
Load(file:///home/xiaoyuz/PIG-new/pig/input1:org.apache.pig.builtin.PigStorage) 
- scope-17


2. For the RANK operation, there is no difference between nested and 
non-nested RANK, since non-nested RANK has no POPackage or global rearrange 
anyway.

However, I still get an exception for the RANK BY and RANK operations:
{noformat}
Caused by: java.lang.RuntimeException: Unable to read counter pig.counters.counter_2415405541993583480_-1
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PORank.addRank(PORank.java:165)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PORank.getNextTuple(PORank.java:134)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:281)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:242)
    ... 13 more
{noformat}
Things are getting closer, but it is still not complete. Thanks.

 Support nested RANK
 ---

 Key: PIG-3279
 URL: https://issues.apache.org/jira/browse/PIG-3279
 Project: Pig
  Issue Type: Improvement
Reporter: Gianmarco De Francisci Morales
Assignee: Johnny Zhang
 Attachments: PIG-3279-1.patch.txt, PIG-3279-2.patch.txt, 
 PIG-3279-3.patch.txt






[jira] [Updated] (PIG-2828) DataType.compare null

2013-06-03 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-2828:


Attachment: PIG-2828-format.patch

 DataType.compare null
 -

 Key: PIG-2828
 URL: https://issues.apache.org/jira/browse/PIG-2828
 Project: Pig
  Issue Type: Bug
Reporter: Haitao Yao
Assignee: Aniket Mokashi
 Attachments: DataType.patch, PIG-2828-format.patch, PIG-2828.patch, 
 test.patch




[jira] [Updated] (PIG-3345) Handle null in DateTime functions

2013-06-03 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3345:


Attachment: PIG-3345-1.patch

 Handle null in DateTime functions
 -

 Key: PIG-3345
 URL: https://issues.apache.org/jira/browse/PIG-3345
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11.1
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.12

 Attachments: PIG-3345-1.patch


  NPE is thrown in date time functions when a null value is passed. 
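
 A minimal sketch of the kind of null guard this issue asks for, assuming the function should return null for a null input. It uses java.time.OffsetDateTime as a stand-in for Pig's own datetime type, and the class and method names are hypothetical; it is not the content of PIG-3345-1.patch.

```java
import java.time.OffsetDateTime;

class NullSafeDateFunctionSketch {
    // Hypothetical stand-in for a date-time function: a null input yields a null
    // result instead of a NullPointerException.
    static Integer getYearOrNull(OffsetDateTime value) {
        if (value == null) {
            return null;
        }
        return value.getYear();
    }

    public static void main(String[] args) {
        System.out.println(getYearOrNull(OffsetDateTime.parse("2013-06-03T10:26:00Z")));
        System.out.println(getYearOrNull(null));
    }
}
```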



[jira] [Updated] (PIG-3345) Handle null in DateTime functions

2013-06-03 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3345:


Status: Patch Available  (was: Open)



[jira] [Updated] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars

2013-06-03 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3285:


Status: Open  (was: Patch Available)

Canceling patch for now so that it does not show in Patch Available list. 

 Jobs using HBaseStorage fail to ship dependency jars
 

 Key: PIG-3285
 URL: https://issues.apache.org/jira/browse/PIG-3285
 Project: Pig
  Issue Type: Bug
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.11.1

 Attachments: 0001-PIG-3285-Add-HBase-dependency-jars.patch, 
 0001-PIG-3285-Add-HBase-dependency-jars.patch, 1.pig, 1.txt, 2.pig


 Launching a job consuming {{HBaseStorage}} fails out of the box. The user 
 must specify {{-Dpig.additional.jars}} for HBase and all of its dependencies. 
 Exceptions look something like this:
 {noformat}
 2013-04-19 18:58:39,360 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.NoClassDefFoundError: com/google/protobuf/Message
   at org.apache.hadoop.hbase.io.HbaseObjectWritable.<clinit>(HbaseObjectWritable.java:266)
   at org.apache.hadoop.hbase.ipc.Invocation.write(Invocation.java:139)
   at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:612)
   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
   at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:84)
   at $Proxy7.getProtocolVersion(Unknown Source)
   at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:136)
   at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
 {noformat}



[jira] [Commented] (PIG-3342) Allow conditions in case statement

2013-06-03 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673690#comment-13673690
 ] 

Rohini Palaniswamy commented on PIG-3342:
-

Since it is slightly big, can you upload it in review board?

 Allow conditions in case statement
 --

 Key: PIG-3342
 URL: https://issues.apache.org/jira/browse/PIG-3342
 Project: Pig
  Issue Type: Improvement
  Components: parser
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.12

 Attachments: PIG-3342.patch


 PIG-3268 added case statement support. But conditions are currently not 
 allowed in when branches. For example,
 {code}
 CASE
   WHEN i % 5 == 0 THEN '5n'
   WHEN i % 5 == 1 THEN '5n+1'
   WHEN i % 5 == 2 THEN '5n+2'
   WHEN i % 5 == 3 THEN '5n+3'
   ELSE '5n+4'
 END
 {code}
 This is invalid now. However, it will be useful if it's allowed.



[jira] [Commented] (PIG-3341) Improving performance of loading datetime values

2013-06-03 Thread pat chan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673742#comment-13673742
 ] 

pat chan commented on PIG-3341:
---

Hi, you bring up two good design points.

1. Are more formats better for this use case? Some possible cons:

a) the spec becomes more complicated for probably unused formats. The simplest 
spec would be to conform to the w3c profile.
b) you will have to support all these formats forever
c) there could be a performance overhead to support the possibly unused formats
d) ToDate(s,f) and UDFs already give users the ability to handle any format 
that's needed.
e) asymmetry: seems cleaner if the default parseable format is exactly the 
default printed format


2. What is the design philosophy for invalid conversions? Quietly turning 
invalid values into null seems like it could be a possibly dangerous default 
since it would be really hard to know if your query on terabytes of data is 
encountering problems which are quietly being ignored. A safer philosophy would 
have the default be as strict with the data as possible and then if the user 
finds a legitimate case for null-conversions, provide a way for the user to 
enable it explicitly in the script.

cheers


 Improving performance of loading datetime values
 

 Key: PIG-3341
 URL: https://issues.apache.org/jira/browse/PIG-3341
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.11.1
Reporter: pat chan
Priority: Minor
 Fix For: 0.12, 0.11.2


 The performance of loading datetime values can be improved by about 25% by 
 moving a single line in ToDate.java:
 public static DateTimeZone extractDateTimeZone(String dtStr) {
   Pattern pattern = Pattern.compile("(Z|(?<=(T[0-9\\.:]{0,12}))((\\+|-)\\d{2}(:?\\d{2})?))$");
 should become:
 static Pattern pattern = Pattern.compile("(Z|(?<=(T[0-9\\.:]{0,12}))((\\+|-)\\d{2}(:?\\d{2})?))$");
 public static DateTimeZone extractDateTimeZone(String dtStr) {
 There is no need to recompile the regular expression for every value. I'm not 
 sure if this function is ever called concurrently, but Pattern objects are 
 thread-safe anyways.
 As a test, I created a file of 10M timestamps:
   for i in 0..1000
 puts '2000-01-01T00:00:00+23'
   end
 I then ran this script:
   grunt> A = load 'data' as (a:datetime); B = filter A by a is null; dump B;
 Before the change it took 160s.
 After the change, the script took 120s.
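
 The change described above can be sketched as a self-contained class. This is an illustration rather than the actual ToDate.java: the class and method names are hypothetical, the method returns the matched suffix as a String instead of a Joda-Time DateTimeZone, and the regular expression is a reconstruction of the one quoted above, with the string quotes and the lookbehind restored as an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimeZoneSuffixSketch {
    // Compiled once when the class is loaded, not once per value.
    // Pattern instances are immutable and safe to share between threads.
    private static final Pattern TZ_SUFFIX =
        Pattern.compile("(Z|(?<=(T[0-9\\.:]{0,12}))((\\+|-)\\d{2}(:?\\d{2})?))$");

    // Returns the trailing time zone designator, or null if there is none.
    static String extractTimeZoneSuffix(String dtStr) {
        // Matcher objects are not thread-safe, so a new one is created per call.
        Matcher matcher = TZ_SUFFIX.matcher(dtStr);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extractTimeZoneSuffix("2000-01-01T00:00:00+23"));
        System.out.println(extractTimeZoneSuffix("2000-01-01"));
    }
}
```

 Only the Pattern is shared; each call still creates its own Matcher, which is cheap compared with recompiling the expression.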
 
 Another performance improvement can be made for invalid datetime values. If a 
 datetime value is invalid, an exception is created and thrown, which is a 
 costly way to fail a validity check. To test the performance impact, I 
 created 10M invalid datetime values:
   for i in 0..1000
 puts '2000-99-01T00:00:00+23'
   end
 In this test, the regex pattern was always recompiled. I then ran this script:
   grunt> A = load 'data' as (a:datetime); B = filter A by a is not null; dump B;
 The script took 190s.
 I understand this could be considered an edge case and might not be worth 
 changing. However, if there are use cases where invalid dates are part of 
 normal processing, then you might consider fixing this.
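
 One possible way to avoid the exception cost, sketched under stated assumptions: reject obviously out-of-range fields with plain integer checks before handing the value to the date library, so the common invalid case never constructs an exception. The class and method names are hypothetical, the sketch only looks at the date part of the string, and it uses java.time.LocalDate rather than the Joda-Time classes Pig uses.

```java
import java.time.DateTimeException;
import java.time.LocalDate;

class CheapDateCheckSketch {
    // Hypothetical helper: returns null for an invalid date. Out-of-range months
    // and days are rejected by integer comparison, without creating an exception.
    static LocalDate parseDateOrNull(String s) {
        if (s == null || s.length() < 10 || s.charAt(4) != '-' || s.charAt(7) != '-') {
            return null;
        }
        int year = digits(s, 0, 4);
        int month = digits(s, 5, 7);
        int day = digits(s, 8, 10);
        if (year < 0 || month < 1 || month > 12 || day < 1 || day > 31) {
            return null; // cheap rejection path
        }
        try {
            return LocalDate.of(year, month, day);
        } catch (DateTimeException e) {
            return null; // rare path, e.g. February 30
        }
    }

    // Parses s[from, to) as a non-negative integer; returns -1 if a character is not a digit.
    static int digits(String s, int from, int to) {
        int value = 0;
        for (int i = from; i < to; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return -1;
            }
            value = value * 10 + (c - '0');
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parseDateOrNull("2000-01-01T00:00:00+23"));
        System.out.println(parseDateOrNull("2000-99-01T00:00:00+23"));
    }
}
```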



[jira] [Commented] (PIG-3341) Improving performance of loading datetime values

2013-06-03 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673760#comment-13673760
 ] 

Rohini Palaniswamy commented on PIG-3341:
-

The current behavior returns null if there is an invalid value while loading as 
datetime. As far as I have seen, Pig does not fail loading when there are invalid 
values. But UDFs do fail. 

Asking the old timers..

[~alangates]/[~daijy]/[~dvryaboy]/[~julienledem]/[~thejas],
   How should we handle the invalid dates?
   



[jira] [Commented] (PIG-3345) Handle null in DateTime functions

2013-06-03 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673761#comment-13673761
 ] 

Prashant Kommireddi commented on PIG-3345:
--

Hi [~rohini], patch looks good. Would you like to add tests for ToDate* 
functions too (under testConversionBetweenDateTimeAndString())? 



[jira] [Commented] (PIG-3341) Improving performance of loading datetime values

2013-06-03 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673780#comment-13673780
 ] 

Dmitriy V. Ryaboy commented on PIG-3341:


I don't think we are completely consistent, but turning invalid into null has 
been pretty standard.

My personal preference is also to increment a counter for # of such 
conversions, and to log the first N occurrences (when N errors are encountered, 
log something to the effect of not logging this error any more because there's 
so much of it.)
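
A minimal sketch of the counter-plus-limited-logging idea, assuming plain Java: an AtomicLong stands in for whatever job counter would really be used, and the class name, method names and message wording are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

class ConversionWarningSketch {
    private final long maxLogged;
    private final AtomicLong failures = new AtomicLong();

    ConversionWarningSketch(long maxLogged) {
        this.maxLogged = maxLogged;
    }

    // Counts every failure; returns the message to log, or null once the limit has been passed.
    String recordFailure(String badValue) {
        long seen = failures.incrementAndGet();
        if (seen > maxLogged) {
            return null;
        }
        String message = "Invalid datetime value converted to null: " + badValue;
        if (seen == maxLogged) {
            message += " (limit of " + maxLogged + " reached; further occurrences will not be logged)";
        }
        return message;
    }

    long failureCount() {
        return failures.get();
    }

    public static void main(String[] args) {
        ConversionWarningSketch warnings = new ConversionWarningSketch(2);
        for (String value : new String[] {"2000-99-01", "2000-98-01", "2000-97-01"}) {
            String message = warnings.recordFailure(value);
            if (message != null) {
                System.err.println(message);
            }
        }
        System.err.println("total invalid values: " + warnings.failureCount());
    }
}
```

The counter keeps the full number of conversions visible even after logging stops, which addresses the concern about silently ignored problems on large inputs.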



Re: Review Request: PIG-3322 Fix the issue where NPE is thrown when reading a union which has nulls and add a testcase

2013-06-03 Thread Viraj Bhat

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11333/
---

(Updated June 4, 2013, 12:15 a.m.)


Review request for pig and Rohini Palaniswamy.


Changes
---

Using MockStorage instead of the PigStorage and comparing results inline for 4 
records.


Description
---

Null pointer exception when loading a union with null in its schema. The test 
suite was also updated with a sample test case.


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
 1485358 
  
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java
 1485358 
  
http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
 1485358 

Diff: https://reviews.apache.org/r/11333/diff/


Testing
---

Yes all tests pass in the piggybank


Thanks,

Viraj Bhat



Re: Review Request: PIG-3322 Fix the issue where NPE is thrown when reading a union which has nulls and add a testcase

2013-06-03 Thread Viraj Bhat


 On June 3, 2013, 1:03 p.m., Rohini Palaniswamy wrote:
  Just minor comments in the naming of the variable. Java variable names 
  should be camel case.

Thanks but now the verifyTxtResults method is not used any more


- Viraj


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11333/#review21312
---






Re: Review Request: PIG-3322 Fix the issue where NPE is thrown when reading a union which has nulls and add a testcase

2013-06-03 Thread Viraj Bhat


 On June 2, 2013, 9:27 p.m., Cheolsoo Park wrote:
  http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java,
   line 1104
  https://reviews.apache.org/r/11333/diff/5/?file=298357#file298357line1104
 
  If you use mock.Storage here instead of PigStorage, you won't need the 
  verifyTextResults method and extra output file. Can you please update your 
  test?
  
  Please see org.apache.pig.builtin.mock.Storage.java.

Added Mock Storage


- Viraj


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11333/#review21305
---






[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema

2013-06-03 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-3322:


Attachment: test_loadavrowithnulls.avro

 AVRO: AvroStorage give NPE on reading file with union as top level schema
 -

 Key: PIG-3322
 URL: https://issues.apache.org/jira/browse/PIG-3322
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Affects Versions: 0.11.2
Reporter: Egil Sorensen
Assignee: Viraj Bhat
  Labels: patch
 Fix For: 0.12

 Attachments: PIG-3322_3.patch, test_loadavrowithnulls.avro


 I am getting an NPE when using AvroStorage to load a file that has a schema 
 like:
 {code}
 ["null",{"type":"record","name":"TUPLE_0","fields":[{"name":"name","type":["null","string"],"doc":"autogenerated from Pig Field Schema"},{"name":"age","type":["null","int"],"doc":"autogenerated from Pig Field Schema"},{"name":"gpa","type":["null","double"],"doc":"autogenerated from Pig Field Schema"}]}]
 {code}
 E.g. see the e2e style test, which fails on this:
 {code}
 {
 'num' => 4,
 # storing file with Pig type tuple relying on conversion to record
 # loading using stored schemas 
 'notmq' => 1,
 'pig' => q\
 a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as 
 (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, 
 age:int, gpa:double)});
 b = foreach a generate t;
 describe b;
 store b into ':OUTPATH:.intermediate' USING 
 org.apache.pig.piggybank.storage.avro.AvroStorage();
 exec;
 -- Read back what was stored with Avro
 u = load ':OUTPATH:.intermediate' USING 
 org.apache.pig.piggybank.storage.avro.AvroStorage();
 describe u;
 store u into ':OUTPATH:';
 \,
 'verify_pig_script' => q\
 a = load ':INPATH:/singlefile/studentcomplextab10k' using PigStorage() as 
 (m:[], t:(name:chararray, age:int, gpa:double), b:{t:(name:chararray, 
 age:int, gpa:double)});
 b = foreach a generate t;
 describe b;
 store b into ':OUTPATH:';
 \,
 },
 {code}



[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema

2013-06-03 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-3322:


Attachment: (was: expected_testLoadAvrowithNulls.txt)

 AVRO: AvroStorage give NPE on reading file with union as top level schema
 -

 Key: PIG-3322
 URL: https://issues.apache.org/jira/browse/PIG-3322
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Affects Versions: 0.11.2
Reporter: Egil Sorensen
Assignee: Viraj Bhat
  Labels: patch
 Fix For: 0.12

 Attachments: PIG-3322_3.patch, test_loadavrowithnulls.avro




[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema

2013-06-03 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-3322:


Attachment: (was: PIG-3322_2.patch)



[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema

2013-06-03 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-3322:


Attachment: (was: test_loadavrowithnulls.avro)



[jira] [Updated] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema

2013-06-03 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-3322:


Attachment: PIG-3322_3.patch



Re: Review Request: PIG-3322 Fix the issue where NPE is thrown when reading a union which has nulls and add a testcase

2013-06-03 Thread Rohini Palaniswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11333/#review21383
---

Ship it!


Ship It!

- Rohini Palaniswamy


On June 4, 2013, 12:15 a.m., Viraj Bhat wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/11333/
 ---
 
 (Updated June 4, 2013, 12:15 a.m.)
 
 
 Review request for pig and Rohini Palaniswamy.
 
 
 Description
 ---
 
 A NullPointerException is thrown when loading a union with null in its schema. 
 A sample test case was also added.
 
 
 Diffs
 -
 
   
 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/AvroStorage.java
  1485358 
   
 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroRecordReader.java
  1485358 
   
 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/TestAvroStorage.java
  1485358 
 
 Diff: https://reviews.apache.org/r/11333/diff/
 
 
 Testing
 ---
 
 Yes, all tests pass in piggybank.
 
 
 Thanks,
 
 Viraj Bhat
 




[jira] [Commented] (PIG-3341) Improving performance of loading datetime values

2013-06-03 Thread pat chan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673876#comment-13673876
 ] 

pat chan commented on PIG-3341:
---

I was looking in the docs for any documentation on this topic. I found the 
following in http://wiki.apache.org/pig/UDFManual

<quote>
The first thing to decide is what to do with invalid data. This depends on the 
format of the data. If the data is of type bytearray it means that it has not 
yet been converted to its proper type. In this case, if the format of the data 
does not match the expected type, a null value should be returned. If, on the 
other hand, the input data is of another type, this means that the conversion 
has already happened and the data should be in the correct format. This is the 
case with our example and that's why it throws an error (line 16.) Note that 
WrappedIOException is a helper class to convert the actual exception to an 
IOException.

Also, note that lines 10-11 check if the input data is null or empty and if so 
returns null.
</quote>

If I'm reading this correctly, it says that if the type of the input doesn't 
match the signature of the UDF, a null should be returned. However, I get this:

  grunt> A = load 'o' as (a:bytearray);
  grunt> B = foreach A generate ToDate(a); dump B;
  2013-06-03 17:15:09,253 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1046: 
  <line 2, column 23> Multiple matching functions for org.apache.pig.builtin.ToDate with input schema: ({long}, {chararray}). Please use an explicit cast.

It also seems to be saying that if the types are right and the format is 
invalid, an error should be thrown. I just checked and yes, I get an error. 
However, this doesn't match Rohini's proposal to return a null instead. Also, 
as Dmitriy hinted, it's not philosophically consistent with loading behavior 
where invalid things turn into nulls.

  2013-06-03 17:25:12,977 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Failed!
  2013-06-03 17:25:12,981 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1066: Unable to open iterator for alias B


BTW, the note about lines 10-11 isn't quite right. The code in the example 
doesn't have a check for null and so a null would cause an exception.
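
The convention quoted from the UDF manual can be sketched in plain Java. This is only an illustration, not Pig code: the class `InvalidInputPolicySketch` and the method `toInt` are hypothetical, a `byte[]` stands in for Pig's bytearray, and a `String` stands in for a value that has already been converted to its declared type.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Pig: illustrates the null-vs-error policy only.
final class InvalidInputPolicySketch {

    static Integer toInt(Object input) throws IOException {
        // Null input yields null instead of a NullPointerException.
        if (input == null) {
            return null;
        }
        // Raw bytes: the value has not been converted yet, so malformed data becomes null.
        if (input instanceof byte[]) {
            String text = new String((byte[]) input, StandardCharsets.UTF_8).trim();
            try {
                return Integer.valueOf(text);
            } catch (NumberFormatException e) {
                return null;
            }
        }
        // Already-typed value: the format is expected to be valid, so a failure is an error.
        if (input instanceof String) {
            try {
                return Integer.valueOf(((String) input).trim());
            } catch (NumberFormatException e) {
                throw new IOException("Invalid integer: " + input, e);
            }
        }
        throw new IOException("Unsupported input type: " + input.getClass().getName());
    }
}
```

Under this policy, only the typed path raises an error, which is the inconsistency with load-time behavior discussed above.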


 Improving performance of loading datetime values
 

 Key: PIG-3341
 URL: https://issues.apache.org/jira/browse/PIG-3341
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.11.1
Reporter: pat chan
Priority: Minor
 Fix For: 0.12, 0.11.2


 The performance of loading datetime values can be improved by about 25% by 
 moving a single line in ToDate.java:
 public static DateTimeZone extractDateTimeZone(String dtStr) {
   Pattern pattern = 
 Pattern.compile("(Z|(?<=(T[0-9\\.:]{0,12}))((\\+|-)\\d{2}(:?\\d{2})?))$");
 should become:
 static Pattern pattern = 
 Pattern.compile("(Z|(?<=(T[0-9\\.:]{0,12}))((\\+|-)\\d{2}(:?\\d{2})?))$");
 public static DateTimeZone extractDateTimeZone(String dtStr) {
 There is no need to recompile the regular expression for every value. I'm not 
 sure if this function is ever called concurrently, but Pattern objects are 
 thread-safe anyway.
 As a test, I created a file of 10M timestamps:
   for i in 0..1000
 puts '2000-01-01T00:00:00+23'
   end
 I then ran this script:
   grunt> A = load 'data' as (a:datetime); B = filter A by a is null; dump B;
 Before the change it took 160s.
 After the change, the script took 120s.
 
 Another performance improvement can be made for invalid datetime values. If a 
 datetime value is invalid, an exception is created and thrown, which is a 
 costly way to fail a validity check. To test the performance impact, I 
 created 10M invalid datetime values:
   for i in 0..1000
 puts '2000-99-01T00:00:00+23'
   end
 In this test, the regex pattern was always recompiled. I then ran this script:
   grunt> A = load 'data' as (a:datetime); B = filter A by a is not null; dump B;
 The script took 190s.
 I understand this could be considered an edge case and might not be worth 
 changing. However, if there are use cases where invalid dates are part of 
 normal processing, then you might consider fixing this.
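
 The first suggestion above (compile the regular expression once) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual ToDate.java: the class `ZoneSuffixSketch` and method `extractZoneSuffix` are hypothetical names, the method returns the matched zone suffix as a String instead of a Joda DateTimeZone so that it needs no external library, and the regex assumes the quotes and the `(?<=` lookbehind that appear to have been lost in the mail archive.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the relevant part of ToDate.java.
final class ZoneSuffixSketch {

    // Compiled once per class load; Pattern instances are immutable and thread-safe.
    private static final Pattern ZONE_SUFFIX =
            Pattern.compile("(Z|(?<=(T[0-9\\.:]{0,12}))((\\+|-)\\d{2}(:?\\d{2})?))$");

    // Returns the trailing zone designator ("Z", "+23", "-08:00", ...) or null if none is present.
    static String extractZoneSuffix(String dtStr) {
        // Matcher objects are NOT thread-safe, so a new one is created for each call.
        Matcher m = ZONE_SUFFIX.matcher(dtStr);
        return m.find() ? m.group() : null;
    }
}
```

 Only the Pattern is shared between calls; creating a Matcher is cheap compared with compiling the expression.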



[jira] [Created] (PIG-3346) New property that controls the number of combined splits

2013-06-03 Thread Cheolsoo Park (JIRA)
Cheolsoo Park created PIG-3346:
--

 Summary: New property that controls the number of combined splits
 Key: PIG-3346
 URL: https://issues.apache.org/jira/browse/PIG-3346
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.12


Currently, the size of combined splits can be configured by the 
{{pig.maxCombinedSplitSize}} property.

Although this works fine most of the time, it can lead to an undesired situation 
where a single mapper ends up loading a lot of combined splits. This is 
particularly bad if Pig uploads them from S3.

So it would be useful if the maximum number of combined splits could be 
configured via a property such as {{pig.maxCombinedSplitNum}}.
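
To make the proposal concrete, here is a rough sketch of how a size cap and a count cap could interact when grouping splits. It is an assumption-laden illustration, not the logic of {{MapRedUtil.getCombinePigSplits()}}: the class `SplitCombinerSketch`, the method `combine`, and the simple greedy strategy are all hypothetical, and splits are represented only by their sizes in bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical greedy packer; the real Pig implementation may differ substantially.
final class SplitCombinerSketch {

    // Groups split sizes so that each group stays within maxCombinedSize bytes
    // (unless a single split alone exceeds it) and holds at most maxCombinedNum splits.
    static List<List<Long>> combine(List<Long> splitSizes, long maxCombinedSize, long maxCombinedNum) {
        if (maxCombinedNum < 1) {
            throw new IllegalArgumentException("maxCombinedNum must be at least 1");
        }
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long size : splitSizes) {
            boolean sizeFull = !current.isEmpty() && currentSize + size > maxCombinedSize;
            boolean countFull = current.size() >= maxCombinedNum;
            if (sizeFull || countFull) {
                // Close the current group and start a new one.
                groups.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(size);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

With many small splits, the size cap alone places them all in one group; the count cap is what bounds how many a single mapper has to open.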



[jira] Subscription: PIG patch available

2013-06-03 Thread jira
Issue Subscription
Filter: PIG patch available (19 issues)

Subscriber: pigdaily

Key       Summary
PIG-3345  Handle null in DateTime functions
https://issues.apache.org/jira/browse/PIG-3345
PIG-3342  Allow conditions in case statement
https://issues.apache.org/jira/browse/PIG-3342
PIG-      Fix remaining Windows core unit test failures
https://issues.apache.org/jira/browse/PIG-
PIG-3318  AVRO: 'default value' not honored when merging schemas on load with AvroStorage
https://issues.apache.org/jira/browse/PIG-3318
PIG-3295  Casting from bytearray failing after Union (even when each field is from a single Loader)
https://issues.apache.org/jira/browse/PIG-3295
PIG-3288  Kill jobs if the number of output files is over a configurable limit
https://issues.apache.org/jira/browse/PIG-3288
PIG-3280  Document IN operator and CASE expression
https://issues.apache.org/jira/browse/PIG-3280
PIG-3257  Add unique identifier UDF
https://issues.apache.org/jira/browse/PIG-3257
PIG-3247  Piggybank functions to mimic OVER clause in SQL
https://issues.apache.org/jira/browse/PIG-3247
PIG-3210  Pig fails to start when it cannot write log to log files
https://issues.apache.org/jira/browse/PIG-3210
PIG-3199  Expose LogicalPlan via PigServer API
https://issues.apache.org/jira/browse/PIG-3199
PIG-3166  Update eclipse .classpath according to ivy library.properties
https://issues.apache.org/jira/browse/PIG-3166
PIG-3123  Simplify Logical Plans By Removing Unneccessary Identity Projections
https://issues.apache.org/jira/browse/PIG-3123
PIG-3088  Add a builtin udf which removes prefixes
https://issues.apache.org/jira/browse/PIG-3088
PIG-3015  Rewrite of AvroStorage
https://issues.apache.org/jira/browse/PIG-3015
PIG-2828  Handle nulls in DataType.compare
https://issues.apache.org/jira/browse/PIG-2828
PIG-2248  Pig parser does not detect when a macro name masks a UDF name
https://issues.apache.org/jira/browse/PIG-2248
PIG-2244  Macros cannot be passed relation names
https://issues.apache.org/jira/browse/PIG-2244
PIG-1914  Support load/store JSON data in Pig
https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Resolved] (PIG-3322) AVRO: AvroStorage give NPE on reading file with union as top level schema

2013-06-03 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy resolved PIG-3322.
-

Resolution: Fixed

Committed to trunk (0.12). Thanks Viraj and Cheolsoo.



Review Request: PIG-3342 Allow conditions in case statement

2013-06-03 Thread Cheolsoo Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11613/
---

Review request for pig.


Description
---

Allows conditional expressions in the WHEN branches of a CASE statement.


This addresses bug PIG-3342.
https://issues.apache.org/jira/browse/PIG-3342


Diffs
-

  src/org/apache/pig/parser/AstPrinter.g c2abede 
  src/org/apache/pig/parser/AstValidator.g 2c6d4dc 
  src/org/apache/pig/parser/LogicalPlanGenerator.g 9375d60 
  src/org/apache/pig/parser/QueryParser.g 2b84c86 
  test/org/apache/pig/test/TestCase.java dbee495 

Diff: https://reviews.apache.org/r/11613/diff/


Testing
---

All unit tests pass.


Thanks,

Cheolsoo Park



[jira] [Commented] (PIG-3342) Allow conditions in case statement

2013-06-03 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673933#comment-13673933
 ] 

Cheolsoo Park commented on PIG-3342:


Thanks Rohini for taking a look.

Here is the RB request:
https://reviews.apache.org/r/11613/

 Allow conditions in case statement
 --

 Key: PIG-3342
 URL: https://issues.apache.org/jira/browse/PIG-3342
 Project: Pig
  Issue Type: Improvement
  Components: parser
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.12

 Attachments: PIG-3342.patch


 PIG-3268 added case statement support. But conditions are currently not 
 allowed in when branches. For example,
 {code}
 CASE
   WHEN i % 5 == 0 THEN '5n'
   WHEN i % 5 == 1 THEN '5n+1'
   WHEN i % 5 == 2 THEN '5n+2'
   WHEN i % 5 == 3 THEN '5n+3'
   ELSE '5n+4'
 END
 {code}
 This is invalid now. However, it will be useful if it's allowed.
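
 For readers less familiar with this form of CASE, the intended first-match-wins evaluation of the example can be mirrored in plain Java. This is only a sketch of the semantics being requested; the class `SearchedCaseSketch` and method `classify` are hypothetical and are not part of the patch.

```java
// Hypothetical illustration of the requested CASE semantics; not Pig code.
final class SearchedCaseSketch {

    // Branches are tested top to bottom and the first true condition wins.
    static String classify(int i) {
        if (i % 5 == 0) return "5n";
        if (i % 5 == 1) return "5n+1";
        if (i % 5 == 2) return "5n+2";
        if (i % 5 == 3) return "5n+3";
        return "5n+4"; // ELSE branch: taken when no WHEN condition holds
    }
}
```

 Each WHEN carries its own boolean condition rather than a value compared against a single CASE operand.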



[jira] [Updated] (PIG-3346) New property that controls the number of combined splits

2013-06-03 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3346:
---

Attachment: PIG-3346.patch

The attached patch includes the following changes:
* Adds a new property {{pig.maxCombinedSplitNum}}. By default, it is set to 
Long.MAX_VALUE.
* Updates the logic of {{MapRedUtil.getCombinePigSplits()}} to take the number 
of combined splits into account.
* Adds a new test case to {{TestSplitCombine}}.
* Updates the documentation for the new property.

Tests run:
* ant test-commit
* ant test -Dtestcase=TestSplitCombine

Thanks!

 New property that controls the number of combined splits
 

 Key: PIG-3346
 URL: https://issues.apache.org/jira/browse/PIG-3346
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.12

 Attachments: PIG-3346.patch




[jira] [Commented] (PIG-3329) RANK operator failed when working with SPLIT

2013-06-03 Thread Johnny Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673936#comment-13673936
 ] 

Johnny Zhang commented on PIG-3329:
---

[~xalan], are you working on this right now? I got a similar exception while 
working on another patch, so it would be very helpful to understand how you 
plan to resolve this issue. Thanks a lot!

 RANK operator failed when working with SPLIT 
 -

 Key: PIG-3329
 URL: https://issues.apache.org/jira/browse/PIG-3329
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11.1
Reporter: Redis Liu
Assignee: Allan Avendaño
Priority: Critical

 input.txt:
 1 2 3
 4 5 6
 7 8 9
 script:
 a = load 'input.txt' using PigStorage(' ') as (a:int, b:int, c:int);
 SPLIT a into b if a  0, c if a  5;
 d = RANK b;
 dump d;
 job will fail with error message:
 java.lang.RuntimeException: Unable to read counter pig.counters.counter_4929375455335572575_-1
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PORank.addRank(PORank.java:161)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PORank.getNext(PORank.java:134)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:308)
   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:214)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:157)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:673)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:275)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1340)
   at org.apache.hadoop.mapred.Child.main(Child.java:269)



[jira] [Updated] (PIG-3346) New property that controls the number of combined splits

2013-06-03 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3346:
---

Status: Patch Available  (was: Open)



[jira] [Comment Edited] (PIG-3345) Handle null in DateTime functions

2013-06-03 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13674025#comment-13674025
 ] 

Rohini Palaniswamy edited comment on PIG-3345 at 6/4/13 4:25 AM:
-

Thanks Prashant. Added for all the udfs in testConversionBetweenDTAndString

  was (Author: rohini):
Thanks Prashant. Added for all the methods in 
testConversionBetweenDTAndString
  
 Handle null in DateTime functions
 -

 Key: PIG-3345
 URL: https://issues.apache.org/jira/browse/PIG-3345
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11.1
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.12

 Attachments: PIG-3345-1.patch, PIG-3345-2.patch


  NPE is thrown in date time functions when a null value is passed. 
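
  The kind of guard this issue asks for can be sketched as below. It is a minimal illustration, assuming a null input should simply yield a null result; the class `NullSafeDateTimeSketch` and method `getYear` are hypothetical, and `java.time` is used here in place of the Joda-Time types that Pig's DateTime functions actually operate on.

```java
import java.time.OffsetDateTime;

// Hypothetical stand-in for a DateTime UDF body; not taken from the attached patches.
final class NullSafeDateTimeSketch {

    // Returns the year of an ISO-8601 timestamp with offset, or null when the input is null.
    static Integer getYear(String isoDateTime) {
        if (isoDateTime == null) {
            return null; // propagate null instead of throwing a NullPointerException
        }
        return OffsetDateTime.parse(isoDateTime).getYear();
    }
}
```

  Returning null keeps a null field flowing through the pipeline as null, rather than failing the task.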



[jira] [Updated] (PIG-3345) Handle null in DateTime functions

2013-06-03 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3345:


Attachment: PIG-3345-2.patch

Thanks Prashant. Added for all the methods in testConversionBetweenDTAndString



[jira] [Commented] (PIG-3345) Handle null in DateTime functions

2013-06-03 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13674070#comment-13674070
 ] 

Prashant Kommireddi commented on PIG-3345:
--

LGTM +1

Thanks Rohini!
