[jira] [Commented] (PIG-3015) Rewrite of AvroStorage

2013-01-08 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546714#comment-13546714
 ] 

Scott Carey commented on PIG-3015:
--

Try corrupting the file at a point inside the data block instead of inside
the sync marker.  The ability to recover from a corrupted file was added
in response to corrupted data, not corrupted sync.





 Rewrite of AvroStorage
 --

 Key: PIG-3015
 URL: https://issues.apache.org/jira/browse/PIG-3015
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Reporter: Joseph Adler
Assignee: Joseph Adler
 Attachments: PIG-3015-2.patch, PIG-3015-3.patch, PIG-3015-4.patch, 
 PIG-3015-5.patch, TestInput.java, Test.java


 The current AvroStorage implementation has a lot of issues: it requires old 
 versions of Avro, it copies data much more than needed, and it's verbose and 
 complicated. (One pet peeve of mine is that old versions of Avro don't 
 support Snappy compression.)
 I rewrote AvroStorage from scratch to fix these issues. In early tests, the 
 new implementation is significantly faster, and the code is a lot simpler. 
 Rewriting AvroStorage also enabled me to implement support for Trevni (as 
 TrevniStorage).
 I'm opening this ticket to facilitate discussion while I figure out the best 
 way to contribute the changes back to Apache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3015) Rewrite of AvroStorage

2013-01-08 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3015:
---

Attachment: good.avro
bad.avro
Test.java
TestInput.java

Hi Scott,

Thank you very much. That makes sense. After some trial and error, I 
managed to corrupt a data block correctly and was able to verify the recovery.

The output from 'java-tool.jar tojson bad.avro' is as follows:
{code}
Caused by: java.io.IOException: Block read partially, the data may be corrupt
at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:194)
... 3 more
{code}
The output from my test program is as follows:
{code}
next(): 685
tell(): 8196
next(): 686
tell(): 8196
hasNext() or next() failed
tell(): 8240
next(): 2656
tell(): 16432
next(): 2657
tell(): 16432
{code}
The data are sequential integers (0 ~ 1M). Here is the number of lost integers 
due to a single corrupted data block with different sync intervals:
||Sync interval in bytes||Num. of lost values||
|32|1970|
|16,000|5389|

In summary,
* Avro can recover from a corrupted data block but not from a corrupted sync 
marker.
* The amount of data lost depends on the sync interval. By default it is 16 KB, 
but it can vary from 32 to 2^30 bytes. The larger the sync interval, the 
more data is lost.
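
As a rough illustration of the second point, here is a toy model in Python. It is not Avro's container format or API: it assumes a made-up block layout (a count, a CRC32, the payload, then a 16-byte marker), and the names SYNC, write_blocks, read_with_recovery and lost_after_corruption are hypothetical. It only shows how a reader that skips ahead to the next marker loses roughly one block of values when a data block is damaged.

```python
import struct
import zlib

# Hypothetical 16-byte marker; real Avro files use a random marker stored in the header.
SYNC = b"SYNC-MARKER-0001"


def write_blocks(values, per_block):
    """Serialize ints into blocks laid out as [count][crc32][payload][SYNC]."""
    out = bytearray()
    for i in range(0, len(values), per_block):
        chunk = values[i:i + per_block]
        payload = struct.pack(f"<{len(chunk)}i", *chunk)
        out += struct.pack("<II", len(chunk), zlib.crc32(payload))
        out += payload + SYNC
    return bytes(out)


def read_with_recovery(data):
    """Return every value from intact blocks, skipping past damaged ones."""
    values, pos = [], 0
    while pos < len(data):
        try:
            count, crc = struct.unpack_from("<II", data, pos)
            end = pos + 8 + 4 * count
            payload = data[pos + 8:end]
            if (len(payload) != 4 * count or zlib.crc32(payload) != crc
                    or data[end:end + len(SYNC)] != SYNC):
                raise ValueError("corrupt block")
            values.extend(struct.unpack(f"<{count}i", payload))
            pos = end + len(SYNC)
        except (ValueError, struct.error):
            # Resynchronize: jump to just after the next marker.
            nxt = data.find(SYNC, pos)
            if nxt < 0:
                break
            pos = nxt + len(SYNC)
    return values


def lost_after_corruption(total, per_block, block_index=5):
    """Flip one payload byte in one block and count the values that go missing."""
    data = bytearray(write_blocks(list(range(total)), per_block))
    block_size = 8 + 4 * per_block + len(SYNC)
    data[block_index * block_size + 8 + 2] ^= 0xFF
    return total - len(read_with_recovery(bytes(data)))
```

In this toy model, lost_after_corruption(10000, 10) loses 10 values and lost_after_corruption(10000, 100) loses 100, which follows the same trend as the table above: the bigger the block between markers, the more is lost per corruption.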

I am attaching my test program and input files if anyone's interested.

Thanks!

 Rewrite of AvroStorage
 --

 Key: PIG-3015
 URL: https://issues.apache.org/jira/browse/PIG-3015
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Reporter: Joseph Adler
Assignee: Joseph Adler
 Attachments: bad.avro, good.avro, PIG-3015-2.patch, PIG-3015-3.patch, 
 PIG-3015-4.patch, PIG-3015-5.patch, TestInput.java, Test.java




[jira] [Updated] (PIG-3015) Rewrite of AvroStorage

2013-01-08 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3015:
---

Attachment: (was: Test.java)



[jira] [Updated] (PIG-3015) Rewrite of AvroStorage

2013-01-08 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3015:
---

Attachment: (was: TestInput.java)



[jira] [Commented] (PIG-2433) Jython import module not working if module path is in classpath

2013-01-08 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547050#comment-13547050
 ] 

Cheolsoo Park commented on PIG-2433:


+1.

Thanks for the fix. The test passes for me too. I also ran e2e test and found 
no failure.

Minor comment:
When you commit the patch, can you remove a tab char in the following line?
{code}
+   <!-- Remove jython jar from mrapp-generated-classpath -->
{code}

 Jython import module not working if module path is in classpath
 ---

 Key: PIG-2433
 URL: https://issues.apache.org/jira/browse/PIG-2433
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.10.0
Reporter: Daniel Dai
Assignee: Rohini Palaniswamy
 Fix For: 0.12

 Attachments: bad.log, good.log, PIG-2433-1.patch, PIG-2433.patch, 
 TEST-org.apache.pig.test.TestScriptUDF.txt


 This is a gap left by PIG-1824. If the path of the Python module is in the 
 classpath, the job dies with the message: could not instantiate 
 'org.apache.pig.scripting.jython.JythonFunction'.
 Here is my observation:
 If the path of the Python module is in the classpath, the fileEntry we get at 
 JythonScriptEngine:236 is __pyclasspath__/script$py.class instead of the 
 script itself. Thus we cannot locate the script, and it is skipped in 
 job.xml. 
 For example:
 {code}
 register 'scriptB.py' using 
 org.apache.pig.scripting.jython.JythonScriptEngine as pig
 A = LOAD 'table_testPythonNestedImport' as (a0:long, a1:long);
 B = foreach A generate pig.square(a0);
 dump B;
 scriptB.py:
 #!/usr/bin/python
 import scriptA
 @outputSchema("x:{t:(num:double)}")
 def sqrt(number):
  return (number ** .5)
 @outputSchema("x:{t:(num:long)}")
 def square(number):
  return long(scriptA.square(number))
 scriptA.py:
 #!/usr/bin/python
 def square(number):
  return (number * number)
 {code}
 When we register scriptB.py, we use the Jython library to figure out the 
 modules that scriptB depends on, in this case scriptA. However, if the 
 current directory is in the classpath, we get 
 __pyclasspath__/scriptA.class instead of scriptA.py. When we then try to put 
 __pyclasspath__/script$py.class into job.jar, Pig complains that 
 __pyclasspath__/script$py.class does not exist. 
 This is exactly what TestScriptUDF.testPythonNestedImport is doing. In Hadoop 
 20.x, the test still succeeds because MiniCluster uses the local classpath, so 
 it can still find scriptA.py even if it is not in job.jar. However, the 
 script will fail on a real cluster and on the MiniMRYarnCluster of Hadoop 23.
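
The dependency lookup described above can be approximated outside Pig. The following is only an illustrative sketch that uses CPython's standard modulefinder module as a stand-in for the Jython mechanism; it is not the code in JythonScriptEngine, and find_script_dependencies and demo are hypothetical helper names. It shows the result one would want from the lookup: the dependency resolves to the .py source file, which is the file that has to be shipped with the job.

```python
import os
import tempfile
from modulefinder import ModuleFinder


def find_script_dependencies(script_path):
    """Map each module imported by script_path to the source file it was loaded from."""
    search_dir = os.path.dirname(os.path.abspath(script_path))
    # Restrict the search path to the script's own directory for this sketch.
    finder = ModuleFinder(path=[search_dir])
    finder.run_script(script_path)
    return {
        name: module.__file__
        for name, module in finder.modules.items()
        if name != "__main__" and module.__file__
    }


def demo():
    """Recreate the scriptA/scriptB pair in a temp directory and resolve scriptB's imports."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "scriptA.py"), "w") as f:
            f.write("def square(number):\n    return number * number\n")
        script_b = os.path.join(tmp, "scriptB.py")
        with open(script_b, "w") as f:
            f.write("import scriptA\n\n"
                    "def square(number):\n"
                    "    return scriptA.square(number)\n")
        return find_script_dependencies(script_b)
```

Calling demo() returns a dictionary whose only key is scriptA, pointing at the scriptA.py file in the temporary directory rather than at a compiled class entry.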



[jira] [Commented] (PIG-2769) a simple logic causes very long compiling time on pig 0.10.0

2013-01-08 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547085#comment-13547085
 ] 

Alan Gates commented on PIG-2769:
-

When I do a clean first as Cheolsoo advises it works, though I don't fully 
understand that since I started out with a clean checkout.

In the system tests, NegForeach_7, NegForeach_9, SyntaxErrors_4, and 
Macro_Error_4 all fail because the error messages have changed.  You can find 
these in test/e2e/pig/tests/negative.conf and macro.conf.  Search for each of 
the group names (NegForeach, ...) and then find the test number under that.  In 
each case you can run the query and change the expected error message to match 
the new one.  

Other than that, +1, patch looks good.

 a simple logic causes very long compiling time on pig 0.10.0
 

 Key: PIG-2769
 URL: https://issues.apache.org/jira/browse/PIG-2769
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.10.0
 Environment: Apache Pig version 0.10.0-SNAPSHOT (rexported)
Reporter: Dan Li
Assignee: Nick White
 Fix For: 0.12

 Attachments: case1.tar, PIG-2769.0.patch, 
 TEST-org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.TestInputSizeReducerEstimator.txt


 We found the following simple logic will cause very long compiling time for 
 pig 0.10.0, while using pig 0.8.1, everything is fine.
 A = load 'A.txt' using PigStorage()  AS (m: int);
 B = FOREACH A {
 days_str = (chararray)
 (m == 1 ? 31: 
 (m == 2 ? 28: 
 (m == 3 ? 31: 
 (m == 4 ? 30: 
 (m == 5 ? 31: 
 (m == 6 ? 30: 
 (m == 7 ? 31: 
 (m == 8 ? 31: 
 (m == 9 ? 30: 
 (m == 10 ? 31: 
 (m == 11 ? 30:31)))))))))));
 GENERATE
days_str as days_str;
 }   
 store B into 'B';
 and here's a simple input file example: A.txt
 1
 2
 3
 The pig version we used in the test
 Apache Pig version 0.10.0-SNAPSHOT (rexported)
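
For anyone who wants to reproduce the slowdown at different nesting depths, the nested bincond in the script above can be generated rather than typed by hand. This is a small sketch in Python; the helper names nested_bincond and days_expression are assumptions made for illustration and are not part of Pig. The code only builds the expression text, with balanced parentheses.

```python
# Days per month, January to December (non-leap year), as in the script above.
DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def nested_bincond(field, cases, default):
    """Build a right-nested Pig bincond: (f == k1 ? v1 : (f == k2 ? v2 : ... default))."""
    expr = str(default)
    # Wrap from the innermost case outwards so every '(' gets its ')'.
    for key, value in reversed(cases):
        expr = f"({field} == {key} ? {value} : {expr})"
    return expr


def days_expression(depth=11):
    """Expression testing months 1..depth, falling back to the following month's value."""
    if not 1 <= depth <= 11:
        raise ValueError("depth must be between 1 and 11")
    cases = list(enumerate(DAYS[:depth], start=1))
    return nested_bincond("m", cases, DAYS[depth])
```

The string returned by days_expression() can be pasted after the (chararray) cast in the FOREACH block; smaller depth values give shallower expressions for comparing compile times.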



[jira] [Commented] (PIG-2769) a simple logic causes very long compiling time on pig 0.10.0

2013-01-08 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547105#comment-13547105
 ] 

Rohini Palaniswamy commented on PIG-2769:
-

bq. When I do a clean first as Cheolsoo advises it works, though I don't fully 
understand that since I started out with a clean checkout.

The order in which the tests ran must have been the cause. It might have passed 
after doing a clean because you might have run TestInputSizeReducerEstimator on 
its own, without running other tests. TestInputSizeReducerEstimator needs to be 
fixed to use new Configuration(false) instead of new Configuration(); the latter 
loads the default resources, so it picks up the hadoop-site.xml left behind by a 
previously run test. 




[jira] [Updated] (PIG-2769) a simple logic causes very long compiling time on pig 0.10.0

2013-01-08 Thread Nick White (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick White updated PIG-2769:


Attachment: PIG-2769.1.patch

Thanks! I've attached a version of the patch which fixes the e2e tests you 
mentioned.

 a simple logic causes very long compiling time on pig 0.10.0
 

 Key: PIG-2769
 URL: https://issues.apache.org/jira/browse/PIG-2769
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.10.0
 Environment: Apache Pig version 0.10.0-SNAPSHOT (rexported)
Reporter: Dan Li
Assignee: Nick White
 Fix For: 0.12

 Attachments: case1.tar, PIG-2769.0.patch, PIG-2769.1.patch, 
 TEST-org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.TestInputSizeReducerEstimator.txt




[jira] [Updated] (PIG-2769) a simple logic causes very long compiling time on pig 0.10.0

2013-01-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2769:


Attachment: PIG-2769.2.patch

Two small changes from the last patch.  I fixed one issue in negative.conf 
where there was a  instead of .  Also changed TestInputSizeReducerEstimator 
as suggested by Rohini which fixed the unit test issue (thanks Rohini).

 a simple logic causes very long compiling time on pig 0.10.0
 

 Key: PIG-2769
 URL: https://issues.apache.org/jira/browse/PIG-2769
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.10.0
 Environment: Apache Pig version 0.10.0-SNAPSHOT (rexported)
Reporter: Dan Li
Assignee: Nick White
 Fix For: 0.12

 Attachments: case1.tar, PIG-2769.0.patch, PIG-2769.1.patch, 
 PIG-2769.2.patch, 
 TEST-org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.TestInputSizeReducerEstimator.txt




[jira] [Updated] (PIG-2769) a simple logic causes very long compiling time on pig 0.10.0

2013-01-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2769:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch checked in.  Thanks Nick.



[jira] Subscription: PIG patch available

2013-01-08 Thread jira
Issue Subscription
Filter: PIG patch available (35 issues)

Subscriber: pigdaily

Key         Summary
PIG-3115    Distinct Build-in Function Doesn't Handle Null Bags
            https://issues.apache.org/jira/browse/PIG-3115
PIG-3108    HBaseStorage returns empty maps when mixing wildcard- with other 
            columns
            https://issues.apache.org/jira/browse/PIG-3108
PIG-3105    Fix TestJobSubmission unit test failure.
            https://issues.apache.org/jira/browse/PIG-3105
PIG-3098    Add another test for the self join case
            https://issues.apache.org/jira/browse/PIG-3098
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3086    Allow A Prefix To Be Added To URIs In PigUnit Tests
            https://issues.apache.org/jira/browse/PIG-3086
PIG-3078    Make a UDF that, given a string, returns just the columns prefixed 
            by that string
            https://issues.apache.org/jira/browse/PIG-3078
PIG-3073    POUserFunc creating log spam for large scripts
            https://issues.apache.org/jira/browse/PIG-3073
PIG-3069    Native Windows Compatibility for Pig E2E Tests and Harness
            https://issues.apache.org/jira/browse/PIG-3069
PIG-3057    make readField protected to be able to override it if we extend 
            PigStorage
            https://issues.apache.org/jira/browse/PIG-3057
PIG-3029    TestTypeCheckingValidatorNewLP has some path reference issues for 
            cross-platform execution
            https://issues.apache.org/jira/browse/PIG-3029
PIG-3028    testGrunt dev test needs some command filters to run correctly 
            without cygwin
            https://issues.apache.org/jira/browse/PIG-3028
PIG-3027    pigTest unit test needs a newline filter for comparisons of golden 
            multi-line
            https://issues.apache.org/jira/browse/PIG-3027
PIG-3026    Pig checked-in baseline comparisons need a pre-filter to address 
            OS-specific newline differences
            https://issues.apache.org/jira/browse/PIG-3026
PIG-3025    TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline 
            script needs simplification
            https://issues.apache.org/jira/browse/PIG-3025
PIG-3024    TestEmptyInputDir unit test - hadoop version detection logic is 
            brittle
            https://issues.apache.org/jira/browse/PIG-3024
PIG-3015    Rewrite of AvroStorage
            https://issues.apache.org/jira/browse/PIG-3015
PIG-3010    Allow UDF's to flatten themselves
            https://issues.apache.org/jira/browse/PIG-3010
PIG-2959    Add a pig.cmd for Pig to run under Windows
            https://issues.apache.org/jira/browse/PIG-2959
PIG-2957    TetsScriptUDF fail due to volume prefix in jar
            https://issues.apache.org/jira/browse/PIG-2957
PIG-2956    Invalid cache specification for some streaming statement
            https://issues.apache.org/jira/browse/PIG-2956
PIG-2955    Fix bunch of Pig e2e tests on Windows
            https://issues.apache.org/jira/browse/PIG-2955
PIG-2878    Pig current releases lack a UDF equalIgnoreCase.This function 
            returns a Boolean value indicating whether string left is equal to string 
            right. This check is case insensitive.
            https://issues.apache.org/jira/browse/PIG-2878
PIG-2873    Converting bin/pig shell script to python
            https://issues.apache.org/jira/browse/PIG-2873
PIG-2834    MultiStorage requires unused constructor argument
            https://issues.apache.org/jira/browse/PIG-2834
PIG-2824    Pushing checking number of fields into LoadFunc
            https://issues.apache.org/jira/browse/PIG-2824
PIG-2788    improved string interpolation of variables
            https://issues.apache.org/jira/browse/PIG-2788
PIG-2661    Pig uses an extra job for loading data in Pigmix L9
            https://issues.apache.org/jira/browse/PIG-2661
PIG-2645    PigSplit does not handle the case where SerializationFactory 
            returns null
            https://issues.apache.org/jira/browse/PIG-2645
PIG-2507    Semicolon in paramenters for UDF results in parsing error
            https://issues.apache.org/jira/browse/PIG-2507
PIG-2433    Jython import module not working if module path is in classpath
            https://issues.apache.org/jira/browse/PIG-2433
PIG-2417    Streaming UDFs -  allow users to easily write UDFs in scripting 
            languages with no JVM implementation.
            https://issues.apache.org/jira/browse/PIG-2417
PIG-2312    NPE when relation and column share the same name and used in Nested 
            Foreach
            https://issues.apache.org/jira/browse/PIG-2312
PIG-1942    script UDF (jython) should utilize the intended output schema to 
            more directly convert Py objects to Pig objects
            https://issues.apache.org/jira/browse/PIG-1942
PIG-1237    Piggybank MutliStorage - specify field to write in output
            https://issues.apache.org/jira/browse/PIG-1237

You may edit this subscription at:

[jira] [Updated] (PIG-2433) Jython import module not working if module path is in classpath

2013-01-08 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-2433:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the review Cheolsoo. Removed the tab before committing. Committed to 
trunk.  



[jira] [Commented] (PIG-3117) A debug mode in which pig does not delete temporary files

2013-01-08 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547519#comment-13547519
 ] 

Daniel Dai commented on PIG-3117:
-

Pig's intermediate files are not Snappy-compressed. By default they are written 
with InterStorage. For your request:
1. It should be fairly easy to retain temp files: just don't call 
FileLocalizer.deleteTempFiles() in Main.
2. To retain plain text, you may need to change Utils.getTmpFileCompressorName; 
I'm not sure if that's enough. Another approach is to write a decoder which 
invokes InterStorage to decode the tmp files.

 A debug mode in which pig does not delete temporary files
 -

 Key: PIG-3117
 URL: https://issues.apache.org/jira/browse/PIG-3117
 Project: Pig
  Issue Type: Wish
Affects Versions: 0.10.0
Reporter: Ido Hadanny

 When we debug our Pig jobs on pre-production data, we usually find bugs that 
 our unit tests couldn't detect, as the environment and data are not quite the same.
 When the final output of a script is not quite what we expect, we start to 
 divide and conquer, running it line by line and inspecting the intermediate 
 output of each stage. 
 It would be great if we could simply configure Pig not to delete the 
 intermediate MR outputs, and to store them as plain text instead of Snappy format.



[jira] [Commented] (PIG-3114) Duplicated macro name error when using pigunit

2013-01-08 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547534#comment-13547534
 ] 

Daniel Dai commented on PIG-3114:
-

It runs fine for me with 0.10.1. Which version are you using?

 Duplicated macro name error when using pigunit
 --

 Key: PIG-3114
 URL: https://issues.apache.org/jira/browse/PIG-3114
 Project: Pig
  Issue Type: Bug
  Components: parser
Reporter: Chetan Nadgire

 I'm using PigUnit to test a Pig script in which a macro is defined.
 The script runs fine on the cluster, but I get a parsing error with PigUnit.
 So I tried a very basic Pig script with a macro and got a similar error:
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. line 9 null. Reason: Duplicated macro name 'my_macro_1'
   at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1607)
   at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1546)
   at org.apache.pig.PigServer.registerQuery(PigServer.java:516)
   at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:988)
   at org.apache.pig.pigunit.pig.GruntParser.processPig(GruntParser.java:61)
   at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
   at org.apache.pig.pigunit.pig.PigServer.registerScript(PigServer.java:56)
   at org.apache.pig.pigunit.PigTest.registerScript(PigTest.java:160)
   at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:231)
   at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:261)
   at FirstPigTest.MyPigTest.testTop2Queries(MyPigTest.java:32)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at junit.framework.TestCase.runTest(TestCase.java:176)
   at junit.framework.TestCase.runBare(TestCase.java:141)
   at junit.framework.TestResult$1.protect(TestResult.java:122)
   at junit.framework.TestResult.runProtected(TestResult.java:142)
   at junit.framework.TestResult.run(TestResult.java:125)
   at junit.framework.TestCase.run(TestCase.java:129)
   at junit.framework.TestSuite.runTest(TestSuite.java:255)
   at junit.framework.TestSuite.run(TestSuite.java:250)
   at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
   at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
   at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
 Caused by: Failed to parse: line 9 null. Reason: Duplicated macro name 'my_macro_1'
   at org.apache.pig.parser.QueryParserDriver.makeMacroDef(QueryParserDriver.java:406)
   at org.apache.pig.parser.QueryParserDriver.expandMacro(QueryParserDriver.java:277)
   at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:178)
   at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1599)
   ... 30 more
  
 The failing Pig script:
 {code:title=test.pig|borderStyle=solid}
 DEFINE my_macro_1 (QUERY, A) RETURNS C {
 $C = ORDER $QUERY BY total DESC, $A;
 } ;
 data =  LOAD 'input' AS (query:CHARARRAY);
 queries_group = GROUP data BY query;
 queries_count = FOREACH queries_group GENERATE group AS query, COUNT(data) AS total;
 queries_ordered = my_macro_1(queries_count, query);
 queries_limit = LIMIT queries_ordered 2;
 STORE queries_limit INTO 'output';
 {code}
 If I remove the macro, pigunit works fine. Even just defining the macro
 without using it results in a parsing error.



[jira] [Commented] (PIG-3115) Distinct Build-in Function Doesn't Handle Null Bags

2013-01-08 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13547543#comment-13547543
 ] 

Daniel Dai commented on PIG-3115:
-

Looks good. I only have one question: in getDistinctFromNestedBags, do we need 
the null check if Initial always generates a legitimate bag? Is it just for 
bulletproofing?

 Distinct Build-in Function Doesn't Handle Null Bags
 ---

 Key: PIG-3115
 URL: https://issues.apache.org/jira/browse/PIG-3115
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.10.0
Reporter: Nick White
Assignee: Nick White
 Attachments: PIG-3115.1.patch


 Calling Distinct(NULL) throws NPEs - it should handle this more gracefully. 
 The attached patch makes Distinct(NULL) == {}, although it could return NULL.
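
 For illustration only, the null-to-empty behaviour described above can be
 sketched with plain Java collections. This is not the attached patch and it
 does not use Pig's bag classes; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a "distinct" function over a bag of values.
class NullSafeDistinctSketch {

    // Returns the distinct elements in first-seen order.
    // A null input is treated as an empty bag instead of throwing an NPE.
    static <T> Set<T> distinct(Collection<T> bag) {
        if (bag == null) {
            return new LinkedHashSet<>();
        }
        return new LinkedHashSet<>(bag);
    }

    public static void main(String[] args) {
        System.out.println(distinct(Arrays.asList(3, 1, 3, 2, 1))); // prints [3, 1, 2]
        System.out.println(distinct(null));                          // prints []
    }
}
```

 Returning NULL instead, as the description mentions, would only change the
 body of the null branch.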



[jira] [Commented] (PIG-3115) Distinct Build-in Function Doesn't Handle Null Bags

2013-01-08 Thread Nick White (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13547597#comment-13547597
 ] 

Nick White commented on PIG-3115:
-

Yes, it's just defensive programming; I hit one of the NullPointerExceptions 
when writing a Pig script, so when I wrote the unit test to track down where it 
came from I added tests for all three static classes of Distinct...and so came 
across it then.




[jira] [Updated] (PIG-3115) Distinct Build-in Function Doesn't Handle Null Bags

2013-01-08 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3115:


Fix Version/s: 0.12




[jira] [Updated] (PIG-3115) Distinct Build-in Function Doesn't Handle Null Bags

2013-01-08 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3115:


  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Patch committed to trunk. Thanks Nick!




[jira] [Commented] (PIG-3114) Duplicated macro name error when using pigunit

2013-01-08 Thread Chetan Nadgire (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13547657#comment-13547657
 ] 

Chetan Nadgire commented on PIG-3114:
-

Hi Daniel,

I am building pig.jar and pigunit.jar from branch-0.11. When I tried 
branch-0.10, I got a different error:

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. null
   at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1606)
   at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1549)
   at org.apache.pig.PigServer.registerQuery(PigServer.java:549)
   at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:968)
   at org.apache.pig.pigunit.pig.GruntParser.processPig(GruntParser.java:61)
   at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:190)
   at org.apache.pig.pigunit.pig.PigServer.registerScript(PigServer.java:53)
   at org.apache.pig.pigunit.PigTest.registerScript(PigTest.java:160)
   at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:251)
   at FirstPigTest.MyPigTest.testTop2Queries(MyPigTest.java:32)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at junit.framework.TestCase.runTest(TestCase.java:168)
   at junit.framework.TestCase.runBare(TestCase.java:134)
   at junit.framework.TestResult$1.protect(TestResult.java:110)
   at junit.framework.TestResult.runProtected(TestResult.java:128)
   at junit.framework.TestResult.run(TestResult.java:113)
   at junit.framework.TestCase.run(TestCase.java:124)
   at junit.framework.TestSuite.runTest(TestSuite.java:232)
   at junit.framework.TestSuite.run(TestSuite.java:227)
   at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:79)
   at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
   at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: java.lang.NullPointerException
   at java.io.File.<init>(File.java:222)
   at org.apache.pig.parser.QueryParserUtils.getFileFromImportSearchPath(QueryParserUtils.java:205)
   at org.apache.pig.parser.QueryParserDriver.getMacroFile(QueryParserDriver.java:352)
   at org.apache.pig.parser.QueryParserDriver.makeMacroDef(QueryParserDriver.java:411)
   at org.apache.pig.parser.QueryParserDriver.expandMacro(QueryParserDriver.java:270)
   at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:171)
   at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1598)
   ... 29 more

Thanks,
Chetan

 Duplicated macro name error when using pigunit
 --

 Key: PIG-3114
 URL: https://issues.apache.org/jira/browse/PIG-3114
 Project: Pig
  Issue Type: Bug
  Components: parser
Reporter: Chetan Nadgire


[jira] [Commented] (PIG-3113) Shell command execution hangs job

2013-01-08 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13547669#comment-13547669
 ] 

Daniel Dai commented on PIG-3113:
-

From the post, it seems we can process the input and output streams before 
calling waitFor?

 Shell command execution hangs job
 -

 Key: PIG-3113
 URL: https://issues.apache.org/jira/browse/PIG-3113
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.8.1
Reporter: James

 Executing a shell command inside a Pig script has the potential to deadlock 
 the job. For example, the following statement will block when somebigfile.txt 
 is sufficiently large:
 {code}
 %declare input `cat /path/to/somebigfile.txt`
 {code}
 This happens because PreprocessorContext.executeShellCommand(String) 
 incorrectly uses Runtime.exec().  The sub-process's stderr and stdout streams 
 should be read in a separate thread to prevent p.waitFor() from hanging when 
 the sub-process's output is larger than the output buffer.
 Per the Java Process class javadoc: Because some native platforms only 
 provide limited buffer size for standard input and output streams, failure to 
 promptly write the input stream or read the output stream of the subprocess 
 may cause the subprocess to block, and even deadlock.
 See http://www.javaworld.com/jw-12-2000/jw-1229-traps.html for a correct 
 solution.
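
 A minimal sketch of that approach, assuming a POSIX `sh` is available: stderr
 is drained on a helper thread, stdout is read to end-of-stream on the calling
 thread, and waitFor() is only called afterwards. This is not Pig's
 PreprocessorContext code; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the pattern; not taken from Pig.
class ShellCommandSketch {

    // Copies a stream into memory until end-of-stream.
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    // Runs the command and returns its stdout as a string.
    static String run(String... command) throws IOException, InterruptedException {
        final Process p = new ProcessBuilder(command).start();
        p.getOutputStream().close(); // nothing is sent to the child's stdin

        // Drain stderr concurrently so the child cannot block on a full stderr pipe.
        final byte[][] err = new byte[1][];
        Thread errReader = new Thread(() -> {
            try {
                err[0] = drain(p.getErrorStream());
            } catch (IOException e) {
                err[0] = new byte[0];
            }
        });
        errReader.start();

        // Read stdout to end-of-stream BEFORE waiting for the exit code.
        byte[] out = drain(p.getInputStream());
        int exit = p.waitFor();
        errReader.join();

        if (exit != 0) {
            throw new IOException("exit code " + exit + ": "
                    + new String(err[0], StandardCharsets.UTF_8));
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.print(run("sh", "-c", "echo hello"));
    }
}
```

 Because both pipes are being read while the child runs, the output size no
 longer depends on the platform's pipe buffer.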



[jira] [Updated] (PIG-842) PigStorage should support multi-byte delimiters

2013-01-08 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-842:
---

Assignee: Jeff Markham

 PigStorage should support multi-byte delimiters
 ---

 Key: PIG-842
 URL: https://issues.apache.org/jira/browse/PIG-842
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.3.0
Reporter: Santhosh Srinivasan
Assignee: Jeff Markham
 Attachments: PigMultiByteJsonMetadata.java, PigMultiByteStorage.java, 
 PigMultiByteTextOutputFormat.java


 Currently, PigStorage supports single-byte delimiters. Users have requested 
 multi-byte delimiters. There are performance implications with multi-byte 
 delimiters, i.e., instead of looking for a single byte, PigStorage should 
 look for a pattern à la BinStorage.



[jira] [Commented] (PIG-842) PigStorage should support multi-byte delimiters

2013-01-08 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13547679#comment-13547679
 ] 

Daniel Dai commented on PIG-842:


I prefer a single PigStorage implementation for clarity and code-maintenance 
reasons. For performance, we can optimize the single-character case, which 
could bring the performance of single-character delimiters to near parity.
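
A rough sketch of what one implementation with a single-byte fast path could
look like. It is written against plain byte arrays rather than PigStorage's
actual reader, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter; not the attached PigMultiByteStorage code.
class DelimiterSplitSketch {

    static List<String> split(byte[] line, byte[] delim) {
        if (delim.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> fields = new ArrayList<>();
        int start = 0;
        if (delim.length == 1) {
            // Fast path: a plain byte comparison per position.
            byte d = delim[0];
            for (int i = 0; i < line.length; i++) {
                if (line[i] == d) {
                    fields.add(field(line, start, i));
                    start = i + 1;
                }
            }
        } else {
            // General path: look for the whole delimiter pattern.
            int i = 0;
            while (i <= line.length - delim.length) {
                if (matchesAt(line, i, delim)) {
                    fields.add(field(line, start, i));
                    i += delim.length;
                    start = i;
                } else {
                    i++;
                }
            }
        }
        fields.add(field(line, start, line.length)); // trailing field
        return fields;
    }

    private static boolean matchesAt(byte[] line, int pos, byte[] delim) {
        for (int k = 0; k < delim.length; k++) {
            if (line[pos + k] != delim[k]) {
                return false;
            }
        }
        return true;
    }

    private static String field(byte[] line, int from, int to) {
        return new String(line, from, to - from, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] row = "a||b||c".getBytes(StandardCharsets.UTF_8);
        System.out.println(split(row, "||".getBytes(StandardCharsets.UTF_8))); // prints [a, b, c]
    }
}
```

Both paths share the same field-building code, so only the scan loop differs
between the single-byte and multi-byte cases.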




[jira] [Commented] (PIG-842) PigStorage should support multi-byte delimiters

2013-01-08 Thread Jeff Markham (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13547680#comment-13547680
 ] 

Jeff Markham commented on PIG-842:
--

Agreed.  After doing something separate, there's too much copy/paste code.




[jira] [Commented] (PIG-3110) pig corrupts chararrays with trailing whitespace when converting them to long

2013-01-08 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13547697#comment-13547697
 ] 

Daniel Dai commented on PIG-3110:
-

Here is the code that introduces the issue:
{code}
try {
    return Long.parseLong(str);
} catch (NumberFormatException e) {
    try {
        Double d = Double.valueOf(str);
        // Need to check for an overflow error
        if (d.doubleValue() > mMaxLong.doubleValue() + 1.0) {
            LogUtils.warn(CastUtils.class, "Value " + d
                    + " too large for long",
                    PigWarning.TOO_LARGE_FOR_INT, mLog);
            return null;
        }
        return Long.valueOf(d.longValue());
    }
}
{code}
Not sure why we still fall back to double; it seems to add more confusion. 
Shall we change it?
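
A small standalone demonstration of the effect, using the value from the issue
report. Long.parseLong rejects the trailing space, Double.valueOf ignores it,
and a double near 1.7e18 can only represent multiples of 256, so the low digits
are lost. The class name and the trimmedParse alternative are hypothetical and
are not the fix that was committed.

```java
// Hypothetical demo class; it only mirrors the fallback path shown above.
class LongCastSketch {

    // The fallback path: parse as double, then narrow to long.
    static long viaDouble(String s) {
        return Double.valueOf(s).longValue();
    }

    // One possible alternative: trim first, then parse exactly.
    // Returns null when the trimmed text is not a valid long.
    static Long trimmedParse(String s) {
        try {
            return Long.parseLong(s.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String raw = "1703598819951657279 "; // note the trailing space
        System.out.println(viaDouble(raw));    // prints 1703598819951657216
        System.out.println(trimmedParse(raw)); // prints 1703598819951657279
    }
}
```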

 pig corrupts chararrays with trailing whitespace when converting them to long
 -

 Key: PIG-3110
 URL: https://issues.apache.org/jira/browse/PIG-3110
 Project: Pig
  Issue Type: Bug
  Components: data
Affects Versions: 0.10.0
Reporter: Ido Hadanny

 When trying to convert the following string into a long, Pig corrupts it. Data:
 1703598819951657279 ,44081037
 data1 = load 'data' using CSVLoader as (a: chararray ,b: int);
 data2 = foreach data1 generate (long)a as a;
 dump data2;
 (1703598819951657216) <--- last 2 digits are corrupted
 data2 = foreach data1 generate (long)TRIM(a) as a;
 dump data2;
 (1703598819951657279) <--- correct
