[jira] [Commented] (PIG-2362) Rework Ant build.xml to use macrodef instead of antcall
[ https://issues.apache.org/jira/browse/PIG-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543726#comment-13543726 ] Gianmarco De Francisci Morales commented on PIG-2362:

Hi Cheolsoo, good catch, thanks. I wasn't familiar with PIG-2748. I verified that {{ant mvn-jar}} passes. Also ran the {{eclipse-files}}, {{src-release}}, and {{tar-release}} targets and verified their output. +1 to the last patch.

Rework Ant build.xml to use macrodef instead of antcall
-------------------------------------------------------
Key: PIG-2362
URL: https://issues.apache.org/jira/browse/PIG-2362
Project: Pig
Issue Type: Improvement
Reporter: Gianmarco De Francisci Morales
Assignee: Gianmarco De Francisci Morales
Priority: Minor
Fix For: 0.12
Attachments: PIG-2362.10.patch, PIG-2362.1.patch, PIG-2362.2.patch, PIG-2362.3.patch, PIG-2362.4.patch, PIG-2362.5.patch, PIG-2362.6.patch, PIG-2362.7.patch, PIG-2362.8.patch, PIG-2362.9.patch, PIG-2362.9.patch.nowhitespace

Antcall is evil: http://www.build-doctor.com/2008/03/13/antcall-is-evil/ We'd better use macrodef and let Ant build a clean dependency graph: http://ant.apache.org/manual/Tasks/macrodef.html

Right now we do this:
{code}
<target name="buildAllJars">
  <antcall target="buildJar">
    <param name="build.dir" value="jar-A"/>
  </antcall>
  <antcall target="buildJar">
    <param name="build.dir" value="jar-B"/>
  </antcall>
  <antcall target="buildJar">
    <param name="build.dir" value="jar-C"/>
  </antcall>
</target>

<target name="buildJar">
  <jar destfile="target/${build.dir}.jar" basedir="${build.dir}/classfiles"/>
</target>
{code}

But it would be better to do this:
{code}
<target name="buildAllJars">
  <buildJar build.dir="jar-A"/>
  <buildJar build.dir="jar-B"/>
  <buildJar build.dir="jar-C"/>
</target>

<macrodef name="buildJar">
  <attribute name="build.dir"/>
  <sequential>
    <jar destfile="target/@{build.dir}.jar" basedir="@{build.dir}/classfiles"/>
  </sequential>
</macrodef>
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2433) Jython import module not working if module path is in classpath
[ https://issues.apache.org/jira/browse/PIG-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543730#comment-13543730 ] Cheolsoo Park commented on PIG-2433:

Hi Rohini, after applying the patch to trunk, I see the following error in TestScriptUDF.testPythonNestedImportClassPath:
{code}
Testcase: testPythonNestedImportClassPath took 0.182 sec
Caused an ERROR
Python Error. Traceback (most recent call last):
  File /home/cheolsoo/workspace/pig-svn/scriptB.py, line 2, in <module>
    import scriptA
  File __pyclasspath__/scriptA.py, line 3, in <module>
NameError: name 'outputSchema' is not defined
{code}
Does this test pass for you?

Jython import module not working if module path is in classpath
---------------------------------------------------------------
Key: PIG-2433
URL: https://issues.apache.org/jira/browse/PIG-2433
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.10.0
Reporter: Daniel Dai
Assignee: Rohini Palaniswamy
Fix For: 0.12
Attachments: PIG-2433.patch

This is a hole left by PIG-1824. If the path of a python module is in the classpath, the job dies with the message "could not instantiate 'org.apache.pig.scripting.jython.JythonFunction'". Here is my observation: if the path of the python module is in the classpath, the fileEntry we get in JythonScriptEngine:236 is __pyclasspath__/script$py.class instead of the script itself. Thus we cannot locate the script, and it is skipped in job.xml.
For example:
{code}
register 'scriptB.py' using org.apache.pig.scripting.jython.JythonScriptEngine as pig;
A = LOAD 'table_testPythonNestedImport' as (a0:long, a1:long);
B = foreach A generate pig.square(a0);
dump B;

scriptB.py:
#!/usr/bin/python
import scriptA

@outputSchema("x:{t:(num:double)}")
def sqrt(number):
    return (number ** .5)

@outputSchema("x:{t:(num:long)}")
def square(number):
    return long(scriptA.square(number))

scriptA.py:
#!/usr/bin/python
def square(number):
    return (number * number)
{code}
When we register scriptB.py, we use the jython library to figure out the dependent modules scriptB relies on, in this case scriptA. However, if the current directory is in the classpath, instead of scriptA.py we get __pyclasspath__/scriptA.class. Then we try to put __pyclasspath__/script$py.class into job.jar, and Pig complains that __pyclasspath__/script$py.class does not exist. This is exactly what TestScriptUDF.testPythonNestedImport does. In hadoop 20.x the test still succeeds because MiniCluster uses the local classpath, so it can still find scriptA.py even though it is not in job.jar. However, the script will fail on a real cluster and in the MiniMRYarnCluster of hadoop 23.
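The failure mode can be sketched in plain Python. This is a toy model, not Pig's actual engine code; the assumption (consistent with the NameError above) is that the engine injects helpers such as outputSchema into the main script's namespace only, so an imported module never sees them:

```python
# Toy model of the namespace issue (assumption: Pig's Jython engine injects
# helpers like outputSchema into the *main* script's namespace only).
# A module imported by the main script starts from a fresh namespace, so a
# bare reference to the decorator there raises NameError, mirroring the
# traceback in the comment above.

def outputSchema(schema):          # stand-in for the injected decorator
    def wrap(fn):
        fn.output_schema = schema  # record the declared Pig schema
        return fn
    return wrap

SCRIPT = "square = outputSchema('x:long')(lambda n: n * n)"

# Main script: the engine pre-populates the namespace with the decorator.
main_ns = {"outputSchema": outputSchema}
exec(SCRIPT, main_ns)
print(main_ns["square"](4))        # prints 16: the decorated UDF works here

# Imported module: nothing is injected, so the very same source fails.
try:
    exec(SCRIPT, {})
except NameError as err:
    print("NameError:", err)       # name 'outputSchema' is not defined
```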
[jira] [Resolved] (PIG-2362) Rework Ant build.xml to use macrodef instead of antcall
[ https://issues.apache.org/jira/browse/PIG-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park resolved PIG-2362.
Resolution: Fixed
Committed to trunk. Thanks Gianmarco!
Re: [VOTE] Release Pig 0.10.1 (candidate 3)
+1. Tested against hadoop 1.0.x and 2.0.x clusters.

On Fri, Jan 4, 2013 at 12:14 AM, Jarek Jarcec Cecho jar...@apache.org wrote:
+1 (non-binding)
* Verified checksum
* Verified signatures
* Tests seem to be passing
* Checked top level files (NOTICE, LICENSE)
Note: I personally prefer when all third party jars are explicitly mentioned in the LICENSE file, as recommended in [1] (and done for example in [2], [3], [4]), but that is not required.
Jarcec
Links:
1: http://incubator.apache.org/guides/releasemanagement.html#best-practice-license
2: http://incubator.apache.org/guides/examples/LICENSE
3: https://git-wip-us.apache.org/repos/asf?p=sqoop.git;a=blob;f=LICENSE.txt;h=00b52964892971fea9280a6201b7efe11a2527e5;hb=sqoop2
4: https://git-wip-us.apache.org/repos/asf?p=flume.git;a=blob;f=LICENSE;h=04c1baf1e0aedfca63f81cff2d64593d8ab55f09;hb=58173b8983027124a61783b4326dee3347ab7552

On Thu, Jan 03, 2013 at 06:16:33PM -0800, Thejas Nair wrote:
+1. Verified md5 checksums of src and binary tar.gz. Built the src tar.gz and ran queries against a hadoop 1.1 cluster; ran fs and sh commands.
-Thejas

On 1/3/13 12:11 PM, Rohini Palaniswamy wrote:
+1. Downloaded the tar binary, checked signature, ran unit tests and piggybank unit tests, checked docs/release notes, ran a simple script locally and against a cluster.

On Mon, Dec 31, 2012 at 8:41 AM, Alan Gates ga...@hortonworks.com wrote:
+1, yet again :). Checked the key signature and checksum on the source package. Built and ran commit unit tests on src, ran a test job in local mode. Downloaded the tar binary and ran a job in local and cluster mode.
Alan.

On Dec 28, 2012, at 11:50 PM, Daniel Dai wrote:
Hi, I have created a candidate build for Pig 0.10.1. This is a maintenance release of Pig 0.10. Keys used to sign the release are available at http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup Please download, test, and try it out: http://people.apache.org/~daijy/pig-0.10.1-candidate-3/ Should we release this?
Vote closes on EOD next Friday, Jan 4th. Thanks, Daniel
[jira] [Commented] (PIG-3108) HBaseStorage returns empty maps when mixing wildcard- with other columns
[ https://issues.apache.org/jira/browse/PIG-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543751#comment-13543751 ] Christoph Bauer commented on PIG-3108:

Sorry, I've been with this code too long. I will try to explain. addFiltersWithColumnPrefix and addFiltersWithoutColumnPrefix actually do different things:
- addFiltersWithColumnPrefix creates HBase scan filters
- addFiltersWithoutColumnPrefix tells the scan object which families and columns to retrieve. This is much quicker than adding filters; that's why it was changed.
The thing is: the scan object should always be limited to the families/columns needed, to speed things up. In fact we did this already; see setLocation (it's basically the same as in addFiltersWithoutColumnPrefix). So what I did was replace the code in setLocation with a call to addFiltersWithoutColumnPrefix. To make things clear, we could remove addFiltersWithoutColumnPrefix from the if/else in initScan() (setLocation will be called anyway) and rename it to setScanColumns or something.

2013/1/3 Bill Graham (JIRA) j...@apache.org

HBaseStorage returns empty maps when mixing wildcard- with other columns
------------------------------------------------------------------------
Key: PIG-3108
URL: https://issues.apache.org/jira/browse/PIG-3108
Project: Pig
Issue Type: Bug
Affects Versions: 0.9.0, 0.9.1, 0.9.2, 0.10.0, 0.11, 0.10.1, 0.12
Reporter: Christoph Bauer
Fix For: 0.12
Attachments: PIG-3108.patch

Consider the following: A and B should be the same (with a different order, of course).
{code}
/* in hbase shell:
create 'pigtest', 'pig'
put 'pigtest', '1', 'pig:name', 'A'
put 'pigtest', '1', 'pig:has_legs', 'true'
put 'pigtest', '1', 'pig:has_ribs', 'true'
*/
A = LOAD 'hbase://pigtest' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('pig:name pig:has*') AS (name:chararray,parts);
B = LOAD 'hbase://pigtest' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('pig:has* pig:name') AS (parts,name:chararray);
dump A;
dump B;
{code}
This is due to a bug in setLocation and initScan. For _A_:
# scan.addColumn(pig, name); // for 'pig:name'
# scan.addFamily(pig); // for 'pig:has*'
So that's silently right. But for _B_:
# scan.addFamily(pig);
# scan.addColumn(pig, name);
Here addColumn will override the first call to addFamily, because you cannot mix them on the same family.
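The override can be sketched with a toy model of the scan's per-family column map. This is a simplification assumed from the behavior described above, not the real org.apache.hadoop.hbase.client.Scan API:

```python
# Toy model of a scan's family map (an assumption based on the described
# behavior, not real HBase code): add_family maps a family to None, meaning
# "all columns"; add_column replaces that entry with an explicit column set,
# silently narrowing the request.

class ToyScan:
    def __init__(self):
        self.family_map = {}           # family -> None (all) or set of columns

    def add_family(self, fam):
        self.family_map[fam] = None    # request every column in the family

    def add_column(self, fam, col):
        cols = self.family_map.get(fam) or set()
        cols.add(col)
        self.family_map[fam] = cols    # drops a prior whole-family request

# Case A ('pig:name pig:has*'): addColumn first, addFamily last.
a = ToyScan()
a.add_column("pig", "name")
a.add_family("pig")
print(a.family_map)    # {'pig': None} -- whole family, silently right

# Case B ('pig:has* pig:name'): addFamily first, addColumn last.
b = ToyScan()
b.add_family("pig")
b.add_column("pig", "name")
print(b.family_map)    # {'pig': {'name'}} -- the has* columns are lost
```

In the toy model, exactly as in the bug report, only the call order differs between A and B, yet B ends up requesting a single column and the wildcard columns come back as empty maps.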
Build failed in Jenkins: Pig-trunk #1384
See https://builds.apache.org/job/Pig-trunk/1384/changes
Changes: [cheolsoo] PIG-2362: Rework Ant build.xml to use macrodef instead of antcall (azaroth via cheolsoo)
--
[...truncated 6491 lines...] [findbugs] (long listing of analyzed classes omitted)
[jira] [Updated] (PIG-3015) Rewrite of AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Adler updated PIG-3015:
Attachment: PIG-3015-5.patch
Added fixes for compression (and other metadata)

Rewrite of AvroStorage
----------------------
Key: PIG-3015
URL: https://issues.apache.org/jira/browse/PIG-3015
Project: Pig
Issue Type: Improvement
Components: piggybank
Reporter: Joseph Adler
Assignee: Joseph Adler
Attachments: PIG-3015-2.patch, PIG-3015-3.patch, PIG-3015-4.patch, PIG-3015-5.patch, PIG-3015.patch

The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.) I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni (as TrevniStorage). I'm opening this ticket to facilitate discussion while I figure out the best way to contribute the changes back to Apache.
[jira] [Commented] (PIG-3015) Rewrite of AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544116#comment-13544116 ] Cheolsoo Park commented on PIG-3015:
Hi Joe, I think you forgot to add the new files to the patch. Do you mind uploading the patch again? :-)
[jira] [Updated] (PIG-3015) Rewrite of AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Adler updated PIG-3015:
Attachment: (was: PIG-3015.patch)
[jira] [Updated] (PIG-3015) Rewrite of AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Adler updated PIG-3015:
Attachment: (was: PIG-3015-5.patch)
[jira] [Updated] (PIG-3015) Rewrite of AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Adler updated PIG-3015:
Attachment: PIG-3015-5.patch
Oops, this one contains the changes.
Re: Review Request: PIG-3015 Rewrite of AvroStorage
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/8104/ --- (Updated Jan. 4, 2013, 7:22 p.m.) Review request for pig and Cheolsoo Park. Changes --- Fixes to make compression work Description --- The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.) I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni. This is the latest version of the patch, complete with test cases and TrevniStorage. (Test cases for TrevniStorage are still missing). This addresses bug PIG-3015. https://issues.apache.org/jira/browse/PIG-3015 Diffs (updated) - .eclipse.templates/.classpath c7b83b8 ivy.xml 70e8d50 ivy/libraries.properties 7b07c7e src/org/apache/pig/builtin/AvroStorage.java PRE-CREATION src/org/apache/pig/builtin/TrevniStorage.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroArrayReader.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroBagWrapper.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroMapWrapper.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroRecordReader.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroRecordWriter.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroStorageDataConversionUtilities.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroStorageSchemaConversionUtilities.java PRE-CREATION src/org/apache/pig/impl/util/avro/AvroTupleWrapper.java PRE-CREATION test/commit-tests 5081fbc test/org/apache/pig/builtin/TestAvroStorage.java PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/directory_test.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity.pig PRE-CREATION 
test/org/apache/pig/builtin/avro/code/pig/identity_ai1_ao2.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_ao2.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_blank_first_args.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_codec.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/identity_just_ao2.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/namesWithDoubleColons.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/recursive_tests.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/trevni_to_avro.pig PRE-CREATION test/org/apache/pig/builtin/avro/code/pig/trevni_to_trevni.pig PRE-CREATION test/org/apache/pig/builtin/avro/data/json/arrays.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/arraysAsOutputByPig.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordWithRepeatedSubRecords.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/records.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsAsOutputByPig.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsOfArrays.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsOfArraysOfRecords.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsSubSchema.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsSubSchemaNullable.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithDoubleUnderscores.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithEnums.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithFixed.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithMaps.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithMapsOfRecords.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recordsWithNullableUnions.json PRE-CREATION test/org/apache/pig/builtin/avro/data/json/recursiveRecord.json PRE-CREATION 
test/org/apache/pig/builtin/avro/schema/arrays.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/arraysAsOutputByPig.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordWithRepeatedSubRecords.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/records.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsAsOutputByPig.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsOfArrays.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsOfArraysOfRecords.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsSubSchema.avsc PRE-CREATION test/org/apache/pig/builtin/avro/schema/recordsSubSchemaNullable.avsc PRE-CREATION
[jira] [Commented] (PIG-3059) Global configurable minimum 'bad record' thresholds
[ https://issues.apache.org/jira/browse/PIG-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13544219#comment-13544219 ]

Russell Jurney commented on PIG-3059:
-------------------------------------

Regarding Avro: reading https://github.com/apache/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/file/DataFileReader.java, it looks like you can still sync to the next record after most bad reads, and we should do so. You're right that a bad sync halts things, but in the case of a bad sync you might try advancing by some amount using seek() and then sync'ing again. I think this would work; I could be wrong, but from looking at how seeks work, it should be OK. Kinda neat, maybe? Worst case, we would only throw out input splits on a bad sync(), not a bad read(). length() should help, as might pastSync(), skip() and available().

I agree with Dmitriy's feedback; thanks for taking the time.

Global configurable minimum 'bad record' thresholds
---------------------------------------------------

                 Key: PIG-3059
                 URL: https://issues.apache.org/jira/browse/PIG-3059
             Project: Pig
          Issue Type: New Feature
          Components: impl
    Affects Versions: 0.11
            Reporter: Russell Jurney
            Assignee: Cheolsoo Park
             Fix For: 0.12
         Attachments: avro_test_files-2.tar.gz, PIG-3059-2.patch, PIG-3059.patch

See PIG-2614. Pig dies when one record in a LOAD of a billion records fails to parse. This is almost certainly not the desired behavior. elephant-bird and some other storage UDFs have minimum thresholds in terms of percent and count that must be exceeded before a job will fail outright. We need these limits to be configurable for Pig, globally. I've come to realize what a major problem Pig's crashing on bad records is for new Pig users. I believe this feature can greatly improve Pig.
An example config would look like this:

{code}
pig.storage.bad.record.threshold=0.01
pig.storage.bad.record.min=100
{code}

A thorough discussion of this issue is available here: http://www.quora.com/Big-Data/In-Big-Data-ETL-how-many-records-are-an-acceptable-loss

--
This message is automatically generated by JIRA.
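Fleshed out a bit, the two settings could interact along these lines. This is only a hypothetical sketch of the proposed semantics: the class and method names, and the choice to fail only when *both* the count and the fraction are exceeded, are illustrative assumptions, not Pig's actual implementation.

```java
// Hypothetical sketch; not part of Pig's API.
// pig.storage.bad.record.threshold -> maxBadFraction (e.g. 0.01)
// pig.storage.bad.record.min       -> minBadRecords  (e.g. 100)
public class BadRecordPolicy {
    private final double maxBadFraction;
    private final long minBadRecords;
    private long total;
    private long bad;

    public BadRecordPolicy(double maxBadFraction, long minBadRecords) {
        this.maxBadFraction = maxBadFraction;
        this.minBadRecords = minBadRecords;
    }

    /** Count one input record; returns true once the job should fail outright. */
    public boolean recordAndCheck(boolean isBad) {
        total++;
        if (isBad) {
            bad++;
        }
        // Require BOTH limits to be exceeded, so a handful of bad records
        // early in a billion-record load does not kill the job.
        return bad > minBadRecords && (double) bad / total > maxBadFraction;
    }
}
```

Under this reading, a single bad record in a huge load would be skipped (and presumably counted and logged), and the job would die only when bad records are both numerous and a meaningful fraction of the input.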
[jira] [Commented] (PIG-3015) Rewrite of AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13544222#comment-13544222 ]

Russell Jurney commented on PIG-3015:
-------------------------------------

Joe, some comments on handling errors in PIG-3059:

Regarding Avro: reading https://github.com/apache/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/file/DataFileReader.java, it looks like you can still sync to the next record after most bad reads, and we should do so. You're right that a bad sync halts things, but in the case of a bad sync you might try advancing by some amount using seek() and then sync'ing again. I think this would work; I could be wrong, but from looking at how seeks work, it should be OK. Kinda neat, maybe? Worst case, we would only throw out input splits on a bad sync(), not a bad read(). length() should help, as might pastSync(), skip() and available().

Rewrite of AvroStorage
----------------------

                 Key: PIG-3015
                 URL: https://issues.apache.org/jira/browse/PIG-3015
             Project: Pig
          Issue Type: Improvement
          Components: piggybank
            Reporter: Joseph Adler
            Assignee: Joseph Adler
         Attachments: PIG-3015-2.patch, PIG-3015-3.patch, PIG-3015-4.patch, PIG-3015-5.patch

The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.) I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni (as TrevniStorage). I'm opening this ticket to facilitate discussion while I figure out the best way to contribute the changes back to Apache.

--
This message is automatically generated by JIRA.
[jira] [Commented] (PIG-3015) Rewrite of AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13544248#comment-13544248 ]

Joseph Adler commented on PIG-3015:
-----------------------------------

Hi Russ,

I think you're right... it looks like you could do something like this in AvroRecordReader.nextKeyValue:

{code}
@Override
public boolean nextKeyValue() throws IOException, InterruptedException {
  if (reader.pastSync(end)) {
    return false;
  }
  try {
    currentRecord = reader.next(new GenericData.Record(schema));
  } catch (NoSuchElementException e) {
    return false;
  } catch (IOException ioe) {
    // Skip past the bad block before rethrowing, so the next call can resume.
    reader.sync(reader.tell() + 1);
    throw ioe;
  }
  return true;
}
{code}

Let me test this out to make sure it runs correctly on uncorrupted files. Would you mind creating a corrupted test file that I can use for testing?

Rewrite of AvroStorage
----------------------

                 Key: PIG-3015
                 URL: https://issues.apache.org/jira/browse/PIG-3015
             Project: Pig
          Issue Type: Improvement
          Components: piggybank
            Reporter: Joseph Adler
            Assignee: Joseph Adler
         Attachments: PIG-3015-2.patch, PIG-3015-3.patch, PIG-3015-4.patch, PIG-3015-5.patch

--
This message is automatically generated by JIRA.
[jira] [Commented] (PIG-2433) Jython import module not working if module path is in classpath
[ https://issues.apache.org/jira/browse/PIG-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13544383#comment-13544383 ]

Rohini Palaniswamy commented on PIG-2433:
-----------------------------------------

ant clean test -Dtestcase=TestScriptUDF passes for me.

Jython import module not working if module path is in classpath
---------------------------------------------------------------

                 Key: PIG-2433
                 URL: https://issues.apache.org/jira/browse/PIG-2433
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.10.0
            Reporter: Daniel Dai
            Assignee: Rohini Palaniswamy
             Fix For: 0.12
         Attachments: PIG-2433.patch

This is a hole left by PIG-1824. If the path of the python module is in the classpath, the job dies with the message "could not instantiate 'org.apache.pig.scripting.jython.JythonFunction'". Here is my observation: if the path of the python module is in the classpath, the fileEntry we get in JythonScriptEngine:236 is __pyclasspath__/script$py.class instead of the script itself. Thus we cannot locate the script, and we skip it in job.xml. For example:

{code}
register 'scriptB.py' using org.apache.pig.scripting.jython.JythonScriptEngine as pig;
A = LOAD 'table_testPythonNestedImport' as (a0:long, a1:long);
B = foreach A generate pig.square(a0);
dump B;

scriptB.py:
#!/usr/bin/python
import scriptA

@outputSchema("x:{t:(num:double)}")
def sqrt(number):
    return (number ** .5)

@outputSchema("x:{t:(num:long)}")
def square(number):
    return long(scriptA.square(number))

scriptA.py:
#!/usr/bin/python
def square(number):
    return (number * number)
{code}

When we register scriptB.py, we use the jython library to figure out the dependent modules scriptB relies on: in this case, scriptA. However, if the current directory is in the classpath, instead of scriptA.py we get __pyclasspath__/scriptA.class. Then we try to put __pyclasspath__/script$py.class into job.jar, and Pig complains that __pyclasspath__/script$py.class does not exist. This is exactly what TestScriptUDF.testPythonNestedImport exercises.

In hadoop 20.x the test still succeeds, because MiniCluster uses the local classpath and so can still find scriptA.py even though it is not in job.jar. However, the script fails on a real cluster and on MiniMRYarnCluster in hadoop 23.

--
This message is automatically generated by JIRA.
[jira] [Commented] (PIG-2433) Jython import module not working if module path is in classpath
[ https://issues.apache.org/jira/browse/PIG-2433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13544484#comment-13544484 ]

Rohini Palaniswamy commented on PIG-2433:
-----------------------------------------

One cause of this error could be that your python cache dir is not writable, so the pig jar was not processed. If that is the case, try running with -Dpython.cachedir=/dir pointing at a directory with write perms. Or are you running from eclipse?

Jython import module not working if module path is in classpath
---------------------------------------------------------------

                 Key: PIG-2433
                 URL: https://issues.apache.org/jira/browse/PIG-2433
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.10.0
            Reporter: Daniel Dai
            Assignee: Rohini Palaniswamy
             Fix For: 0.12
         Attachments: PIG-2433.patch

--
This message is automatically generated by JIRA.
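As a concrete illustration of the cache-dir workaround mentioned above (the cache path and script name here are hypothetical, and passing the property via a -D argument assumes your bin/pig forwards JVM properties):

```shell
# Point Jython's bytecode cache (the python.cachedir property)
# at a directory the current user can write to.
pig -Dpython.cachedir=/tmp/jython-cache myscript.pig
```

python.cachedir is a standard Jython registry property; if it points at an unwritable location, Jython silently skips caching package scans of jars such as pig.jar, which can surface as the import failures discussed in this ticket.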
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (36 issues)
Subscriber: pigdaily

Key       Summary
PIG-3108  HBaseStorage returns empty maps when mixing wildcard- with other columns
          https://issues.apache.org/jira/browse/PIG-3108
PIG-3105  Fix TestJobSubmission unit test failure.
          https://issues.apache.org/jira/browse/PIG-3105
PIG-3098  Add another test for the self join case
          https://issues.apache.org/jira/browse/PIG-3098
PIG-3088  Add a builtin udf which removes prefixes
          https://issues.apache.org/jira/browse/PIG-3088
PIG-3086  Allow A Prefix To Be Added To URIs In PigUnit Tests
          https://issues.apache.org/jira/browse/PIG-3086
PIG-3078  Make a UDF that, given a string, returns just the columns prefixed by that string
          https://issues.apache.org/jira/browse/PIG-3078
PIG-3073  POUserFunc creating log spam for large scripts
          https://issues.apache.org/jira/browse/PIG-3073
PIG-3069  Native Windows Compatibility for Pig E2E Tests and Harness
          https://issues.apache.org/jira/browse/PIG-3069
PIG-3067  HBaseStorage should be split up to become more managable
          https://issues.apache.org/jira/browse/PIG-3067
PIG-3057  make readField protected to be able to override it if we extend PigStorage
          https://issues.apache.org/jira/browse/PIG-3057
PIG-3029  TestTypeCheckingValidatorNewLP has some path reference issues for cross-platform execution
          https://issues.apache.org/jira/browse/PIG-3029
PIG-3028  testGrunt dev test needs some command filters to run correctly without cygwin
          https://issues.apache.org/jira/browse/PIG-3028
PIG-3027  pigTest unit test needs a newline filter for comparisons of golden multi-line
          https://issues.apache.org/jira/browse/PIG-3027
PIG-3026  Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
          https://issues.apache.org/jira/browse/PIG-3026
PIG-3025  TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification
          https://issues.apache.org/jira/browse/PIG-3025
PIG-3024  TestEmptyInputDir unit test - hadoop version detection logic is brittle
          https://issues.apache.org/jira/browse/PIG-3024
PIG-3015  Rewrite of AvroStorage
          https://issues.apache.org/jira/browse/PIG-3015
PIG-3010  Allow UDF's to flatten themselves
          https://issues.apache.org/jira/browse/PIG-3010
PIG-2959  Add a pig.cmd for Pig to run under Windows
          https://issues.apache.org/jira/browse/PIG-2959
PIG-2957  TetsScriptUDF fail due to volume prefix in jar
          https://issues.apache.org/jira/browse/PIG-2957
PIG-2956  Invalid cache specification for some streaming statement
          https://issues.apache.org/jira/browse/PIG-2956
PIG-2955  Fix bunch of Pig e2e tests on Windows
          https://issues.apache.org/jira/browse/PIG-2955
PIG-2878  Pig current releases lack a UDF equalIgnoreCase. This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive.
          https://issues.apache.org/jira/browse/PIG-2878
PIG-2873  Converting bin/pig shell script to python
          https://issues.apache.org/jira/browse/PIG-2873
PIG-2834  MultiStorage requires unused constructor argument
          https://issues.apache.org/jira/browse/PIG-2834
PIG-2824  Pushing checking number of fields into LoadFunc
          https://issues.apache.org/jira/browse/PIG-2824
PIG-2788  improved string interpolation of variables
          https://issues.apache.org/jira/browse/PIG-2788
PIG-2769  a simple logic causes very long compiling time on pig 0.10.0
          https://issues.apache.org/jira/browse/PIG-2769
PIG-2661  Pig uses an extra job for loading data in Pigmix L9
          https://issues.apache.org/jira/browse/PIG-2661
PIG-2645  PigSplit does not handle the case where SerializationFactory returns null
          https://issues.apache.org/jira/browse/PIG-2645
PIG-2507  Semicolon in paramenters for UDF results in parsing error
          https://issues.apache.org/jira/browse/PIG-2507
PIG-2433  Jython import module not working if module path is in classpath
          https://issues.apache.org/jira/browse/PIG-2433
PIG-2417  Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
          https://issues.apache.org/jira/browse/PIG-2417
PIG-2312  NPE when relation and column share the same name and used in Nested Foreach
          https://issues.apache.org/jira/browse/PIG-2312
PIG-1942  script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects
          https://issues.apache.org/jira/browse/PIG-1942
PIG-1237  Piggybank MutliStorage - specify field to write in output