[jira] Commented: (PIG-915) Load row names in HBase loader
[ https://issues.apache.org/jira/browse/PIG-915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849230#action_12849230 ]

Jeff Zhang commented on PIG-915:

Olga, sorry for the late reply. This feature has been included in PIG-1205, so I don't think this JIRA item needs to be tracked any longer.

Load row names in HBase loader
    Key: PIG-915
    URL: https://issues.apache.org/jira/browse/PIG-915
    Project: Pig
    Issue Type: Improvement
    Affects Versions: 0.6.0
    Reporter: Alex Newman
    Assignee: Jeff Zhang
    Priority: Minor
    Fix For: 0.8.0
    Attachments: Pig_915.Patch

Currently there is no way to get the row names when querying HBase. We should probably remedy this, as important data may be stored there.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-282) Custom Partitioner
[ https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849279#action_12849279 ]

David Ciemiewicz commented on PIG-282:

How will the custom partitioner be used in Pig? Is this for map partitioning and/or output partitioning? For instance, I'd love to have something that created separate directories based on the value of some key.

Custom Partitioner
    Key: PIG-282
    URL: https://issues.apache.org/jira/browse/PIG-282
    Project: Pig
    Issue Type: New Feature
    Reporter: Amir Youssefi
    Priority: Minor

By adding a custom partitioner, we can give users control over which output partition a key (/value) goes to. We can add keywords to the language, e.g. PARTITION BY UDF(...) or a similar syntax. The UDF returns a number between 0 and n-1, where n is the number of output partitions.
[jira] Commented: (PIG-282) Custom Partitioner
[ https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849280#action_12849280 ]

Alan Gates commented on PIG-282:

This JIRA refers to map-reduce partitioning. Output partitioning, i.e. spraying to directories based on a key, can be done now via a custom store function.
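The PIG-282 description asks for a function that returns a number between 0 and n-1 for each key. A minimal sketch of that contract in plain Java follows; the class and method names are hypothetical and are not part of Pig or Hadoop, and hashing the key is only one possible policy.

```java
// Hypothetical sketch: PartitionSketch and partitionFor are illustrative names,
// not Pig or Hadoop API.
final class PartitionSketch {

    /** Maps a key to a partition index in the range [0, numPartitions). */
    static int partitionFor(Object key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"alpha", "beta", "gamma"}) {
            System.out.println(key + " -> " + partitionFor(key, 4));
        }
    }
}
```

The same key always lands in the same partition, which is the property a map-reduce partitioner needs so that all records for a key reach one reducer.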
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1317:
    Attachment: PIG-1316.patch

Attached patch implements the change to cache the results of LoadMetadata.getSchema() for use in future calls.

LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
    Key: PIG-1317
    URL: https://issues.apache.org/jira/browse/PIG-1317
    Project: Pig
    Issue Type: Improvement
    Affects Versions: 0.7.0
    Reporter: Pradeep Kamath
    Assignee: Pradeep Kamath
    Fix For: 0.7.0

In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema(). The latter call could be expensive if the input file is read to determine the schema or a metadata system is contacted to get the schema. determineSchema() can cache the schema it gets so that subsequent calls use the cached version.
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1317:
    Status: Open (was: Patch Available)

Attached wrong patch file
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1317:
    Status: Patch Available (was: Open)
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1317:
    Attachment: (was: PIG-1316.patch)
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1317:
    Status: Patch Available (was: Open)

Attachments: PIG-1317.patch
[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1317:
    Attachment: PIG-1317.patch

Attached correct patch file now.
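The caching idea in PIG-1317 can be illustrated with a small stand-alone sketch. This is not the attached patch: CachedSchemaSketch is a hypothetical class, the Supplier stands in for the potentially expensive LoadMetadata.getSchema() call, and the sketch assumes single-threaded use.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the caching described in PIG-1317; not Pig's LOLoad.
final class CachedSchemaSketch<S> {
    // Plays the role of the potentially expensive LoadMetadata.getSchema() call.
    private final Supplier<S> expensiveLookup;
    private S cached;
    private boolean loaded;

    CachedSchemaSketch(Supplier<S> expensiveLookup) {
        this.expensiveLookup = expensiveLookup;
    }

    /** Plays the role of determineSchema(): runs the lookup at most once. */
    S determineSchema() {
        if (!loaded) {
            cached = expensiveLookup.get();
            loaded = true; // a separate flag also remembers a null schema
        }
        return cached;
    }
}
```

A separate boolean is used instead of a null check so that a loader that legitimately reports no schema is not queried again on every call.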
[jira] Updated: (PIG-1316) TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files
[ https://issues.apache.org/jira/browse/PIG-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1316:
    Attachment: PIG-1316.patch

Attached patch makes the required changes in TextLoader to use Bzip2TextInputFormat if the load location ends with the extension .bz or .bz2, as PigStorage does. Also, for non-bzip data, TextLoader will now use PigTextInputFormat rather than TextInputFormat so that input directories can be recursively traversed. I have also changed Bzip2TextInputFormat to extend PigFileInputFormat instead of FileInputFormat for the same reason.

TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files
    Key: PIG-1316
    URL: https://issues.apache.org/jira/browse/PIG-1316
    Project: Pig
    Issue Type: Improvement
    Affects Versions: 0.7.0
    Reporter: Pradeep Kamath
    Assignee: Pradeep Kamath
    Fix For: 0.7.0
    Attachments: PIG-1316.patch

Currently TextLoader uses TextInputFormat, which does not split bzip files. This can be fixed by using Bzip2TextInputFormat.
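The selection rule described for PIG-1316 (bzip input format for locations ending in .bz or .bz2, PigTextInputFormat otherwise) can be sketched as below. This is an illustration only: the class is hypothetical and returns the format names as strings rather than instantiating the real Pig classes, so it does not reflect how the patch itself is written.

```java
// Hypothetical sketch of the extension check described in PIG-1316.
// It returns class names as strings instead of real InputFormat instances.
final class InputFormatChoiceSketch {

    /** True when the load location ends with a bzip extension. */
    static boolean isBzipLocation(String location) {
        return location.endsWith(".bz") || location.endsWith(".bz2");
    }

    /** Names the input format the comment says TextLoader would pick. */
    static String inputFormatFor(String location) {
        return isBzipLocation(location) ? "Bzip2TextInputFormat" : "PigTextInputFormat";
    }

    public static void main(String[] args) {
        System.out.println(inputFormatFor("/data/logs/part-00000.bz2"));
        System.out.println(inputFormatFor("/data/logs/part-00000.txt"));
    }
}
```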
[jira] Commented: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849434#action_12849434 ]

Alan Gates commented on PIG-1310:

Since Pig plans to support SQL soon and since many Pig Latin users are familiar with SQL, we'd like to pick a datetime format that will work well with SQL. While ISO 8601 is not the same as SQL's datetime format, it looks to me like translation between the two will be reasonably easy.

DateMonthToISO lacks javadoc comments, making it hard to know what it does or how to use it. The comments in front of other functions like ISOToUnix should be turned into javadoc (just change /* to /** and add an introductory sentence) so users can read them without needing to open the code itself. It might be helpful in your javadocs to provide links to jodatime and somewhere that gives a good intro to ISO 8601 date formats so users can figure out things like: What does that Z at the end of the datetime string mean?

In UnixToISO your example shows an input of long, but the code assumes it's a string and parses the string into a long. It should probably be the former, but whichever way you decide to do it the code and the comments should match.

In CustomFormatToISO the comments only show one input (the datetime string), but the code assumes two inputs, the datetime string and the format. The comments should reflect this as well as give users an indication of how to construct the format string in a way that jodatime will understand it (or perhaps just link to somewhere in jodatime that explains this).

All throughout, if there is an error in parsing the date the code depends on jodatime to throw an exception with a meaningful error message. Have you tested that these error messages are reasonably helpful to users? For now, in piggybank, this is ok. If these eventually move into Pig proper these errors will need to be caught and Pig numbered error messages (which may just print the jodatime error message with a notification of which function it came from) will need to be added.

In ISOToX methods, the comments refer to rounding values. But the code isn't rounding, it's truncating.

ISO Date UDFs: Conversion, Rounding and Date Math
    Key: PIG-1310
    URL: https://issues.apache.org/jira/browse/PIG-1310
    Project: Pig
    Issue Type: New Feature
    Components: impl
    Reporter: Russell Jurney
    Fix For: 0.7.0
    Attachments: datetime.patch, datetime2.patch
    Original Estimate: 168h
    Remaining Estimate: 168h

I've written UDFs to handle loading unix times, datemonth values and ISO 8601 formatted date strings, and working with them as ISO datetimes using jodatime. The working code is here: http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/

It needs to be documented and tests added, and a couple UDFs are missing, but these work if you REGISTER the jodatime jar in your script. Hopefully I can get this stuff in piggybank before someone else writes it this time :) The rounding also may not be performant, but the code works. Ultimately I'd also like to enable support for ISO 8601 durations. Someone slap me if this isn't done soon, it is not much work and this should help everyone working with time series.
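The last review point, that the ISOToX code truncates where the comments say it rounds, is easy to see in a small example. The sketch below uses the JDK's java.time rather than jodatime so that it is self-contained; the class and method names are hypothetical and this is not the UDF code under review.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration of truncation versus rounding, using java.time
// instead of the jodatime classes the UDFs under review depend on.
final class TruncateVsRoundSketch {

    /** Drops everything below the hour: 10:45 becomes 10:00. */
    static Instant truncateToHour(Instant t) {
        return t.truncatedTo(ChronoUnit.HOURS);
    }

    /** Moves to the nearest hour, with the half-hour going up: 10:45 becomes 11:00. */
    static Instant roundToHour(Instant t) {
        // Adding half the unit before truncating turns truncation into rounding.
        return t.plus(Duration.ofMinutes(30)).truncatedTo(ChronoUnit.HOURS);
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2010-03-24T10:45:00Z");
        System.out.println("truncated: " + truncateToHour(t));
        System.out.println("rounded:   " + roundToHour(t));
    }
}
```

The trailing Z in these strings is the ISO 8601 designator for UTC, which is the kind of detail the review suggests linking to from the javadoc.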
[jira] Commented: (PIG-1313) PigServer leaks memory over time
[ https://issues.apache.org/jira/browse/PIG-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849453#action_12849453 ]

Alan Gates commented on PIG-1313:

I'm not sure I understand all the pros and cons of Daniel's suggestion of moving the variables to PigServer versus Bill's suggestion of making them ThreadLocal. The advantages I can see of moving the values to PigServer are:
# It's clearer to other developers what's going on, since they can see that these values are associated with an instance of PigServer. Otherwise we're constructing a hidden dependency between the lifetime of PigServer and the thread it's running in.
# If at some future point Pig's frontend is multi-threaded this will still work (granted this is unlikely or at least far in the future)

The advantage I see with Bill's proposal is it's less change. Are there other things I'm missing here?

PigServer leaks memory over time
    Key: PIG-1313
    URL: https://issues.apache.org/jira/browse/PIG-1313
    Project: Pig
    Issue Type: Bug
    Reporter: Bill Graham
    Attachments: Pig1313Reproducer.java

When {{PigServer}} runs it creates temporary files using the {{FileLocalizer.getTemporaryPath(..)}}. This static method creates and returns a handle to a temporary file (as an instance of {{ElementDescriptor}}). The {{ElementDescriptors}} returned by this method are kept on a static {{Stack}} named {{toDelete}}. The items on {{toDelete}} get removed by the {{FileLocalizer.deleteTempFile()}} method. The only place in the code where I see {{FileLocalizer.deleteTempFile()}} called is in the Main class. {{PigServer}} does not call that method though, so a long-running VM that repeatedly uses instances of {{PigServer}} to run jobs will leak memory via {{toDelete}}.

One suggested fix is to have {{PigServer.shutdown()}} call {{FileLocalizer.deleteTempFile()}}, but this would cause problems in a multi-threaded environment, since it seems {{ElementDescriptors}} are pushed onto the {{toDelete}} stack before they're used, not once they're done with. With this approach, running multiple instances of {{PigServer}} in separate threads could cause one completed job to clobber the other's still-in-use temp files.
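To make the trade-off in PIG-1313 concrete, here is a minimal sketch of the "move the state into the instance" option: each tracker owns its own toDelete collection, so shutting one down cannot touch another's temp files. TempFileTrackerSketch is a hypothetical class that only records path strings; it is not FileLocalizer or PigServer and deletes nothing on disk.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of per-instance temp-file tracking; it records paths
// only and never touches the file system.
final class TempFileTrackerSketch {
    // Instance field rather than a static Stack shared by every user in the VM.
    private final Deque<String> toDelete = new ArrayDeque<>();

    /** Registers a temp path before it is used, as the issue describes. */
    String getTemporaryPath(String name) {
        String path = "/tmp/" + name;
        toDelete.push(path);
        return path;
    }

    /** Forgets this instance's paths only; returns how many were released. */
    int shutdown() {
        int released = toDelete.size();
        toDelete.clear();
        return released;
    }

    int pending() {
        return toDelete.size();
    }
}
```

With a static collection, the equivalent of shutdown() would also clear entries registered by other instances that are still running, which is the clobbering scenario the issue warns about.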
[jira] Updated: (PIG-1309) Map-side Cogroup
[ https://issues.apache.org/jira/browse/PIG-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1309:
    Attachment: pig-1309.patch

Did an offline review with Alan. Found a subtle bug in POMergeCogroup#getNext(). Fixed that and added more tests. Still need to tidy things up in a few places. Looking for suggestions for better test cases that cover all the edge cases.

Map-side Cogroup
    Key: PIG-1309
    URL: https://issues.apache.org/jira/browse/PIG-1309
    Project: Pig
    Issue Type: Bug
    Components: impl
    Reporter: Ashutosh Chauhan
    Assignee: Ashutosh Chauhan
    Attachments: mapsideCogrp.patch, pig-1309.patch

In the never-ending quest to make Pig go faster, we want to parallelize as many relational operations as possible. It's already possible to do Group-by (PIG-984) and Joins (PIG-845, PIG-554) purely map-side in Pig. This jira is to add a map-side implementation of Cogroup in Pig. Details to follow.
TypeCheckingVisitor and casting to less precise numeric types
Hi,

I know that Pig has logic for casting inputs to the expected data types when invoking a UDF, and I understand that this logic resides in the TypeCheckingVisitor class. I am curious to know why certain casts have been omitted from the castLookup map. Specifically, I do not see any entries for casting a more precise numeric type (e.g. Double) to a less precise numeric type (e.g. Integer). Is there any reason why all down conversions of numeric types have been omitted? Is it because we do not want to perform any automatic casts that lead to a loss of precision (loss of data)?

In my situation, we are trying to abstract all numeric data types into a single number type. If a UDF takes a numeric parameter, we want Pig to invoke that UDF with any numeric argument, regardless of whether the argument must be upconverted or downconverted. We are OK with the loss of precision in that circumstance. As a result, we added the following to the castLookup map:

    castLookup.put(DataType.LONG, DataType.INTEGER);
    castLookup.put(DataType.FLOAT, DataType.LONG);
    castLookup.put(DataType.FLOAT, DataType.INTEGER);
    castLookup.put(DataType.DOUBLE, DataType.FLOAT);
    castLookup.put(DataType.DOUBLE, DataType.LONG);
    castLookup.put(DataType.DOUBLE, DataType.INTEGER);

All of these casts seem to work fine in our tests. Other than loss of precision, is there any reason why adding these casts might be a bad idea?

Thanks,
-Anil
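As background for the question above, the sketch below shows what narrowing conversions do in plain Java. It is an assumption that Pig's own cast operators end up with the same results; the class and method names are hypothetical. The point is that "loss of precision" also covers wrap-around and saturation, not just dropped decimals.

```java
// Plain-Java narrowing conversions; whether Pig's cast operators behave
// identically is an assumption, not something confirmed by this thread.
final class NarrowingCastSketch {

    /** Drops the fraction; out-of-range values saturate and NaN becomes 0. */
    static int doubleToInt(double d) {
        return (int) d;
    }

    /** Keeps only the low 32 bits, so large values wrap around. */
    static int longToInt(long l) {
        return (int) l;
    }

    /** Values beyond float range become infinity. */
    static float doubleToFloat(double d) {
        return (float) d;
    }

    public static void main(String[] args) {
        System.out.println(doubleToInt(3.99));         // fraction dropped
        System.out.println(doubleToInt(1e10));         // saturates at Integer.MAX_VALUE
        System.out.println(longToInt(4294967297L));    // 2^32 + 1 wraps around
        System.out.println(doubleToFloat(1e40));       // overflows float range
    }
}
```

A UDF that silently receives a wrapped or saturated value may produce plausible-looking but wrong results, which is one possible reason for leaving such casts explicit.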
[jira] Updated: (PIG-1315) [Zebra] Implementing OrderedLoadFunc interface for Zebra TableLoader
[ https://issues.apache.org/jira/browse/PIG-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated PIG-1315:
    Attachment: zebra.0324

[Zebra] Implementing OrderedLoadFunc interface for Zebra TableLoader
    Key: PIG-1315
    URL: https://issues.apache.org/jira/browse/PIG-1315
    Project: Pig
    Issue Type: New Feature
    Reporter: Xuefu Zhang
    Assignee: Xuefu Zhang
    Fix For: 0.8.0
    Attachments: zebra.0324

OrderedLoadFunc interface is used by Pig to do merge join and mapside cogrouping. For Zebra, implementing this interface is necessary to support mapside cogrouping.
[jira] Updated: (PIG-1315) [Zebra] Implementing OrderedLoadFunc interface for Zebra TableLoader
[ https://issues.apache.org/jira/browse/PIG-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated PIG-1315:
    Fix Version/s: 0.8.0 (was: 0.7.0)
    Status: Patch Available (was: Open)
[jira] Updated: (PIG-1268) [Zebra] Need an ant target that runs all pig-related tests in Zebra
[ https://issues.apache.org/jira/browse/PIG-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1268:
    Fix Version/s: 0.7.0

[Zebra] Need an ant target that runs all pig-related tests in Zebra
    Key: PIG-1268
    URL: https://issues.apache.org/jira/browse/PIG-1268
    Project: Pig
    Issue Type: Test
    Components: build
    Reporter: Xuefu Zhang
    Assignee: Xuefu Zhang
    Priority: Minor
    Fix For: 0.7.0
    Attachments: zebra.0303

Currently Pig checkins don't run any Zebra tests to make sure that Zebra is not broken. To make this happen, the Zebra build needs a test target that runs only pig-related tests. With this, Pig committers need to do ant pig for Zebra as part of the before-checkin sanity check. Ideally, this target should be triggered as part of Hudson.
[jira] Updated: (PIG-1214) Pig/Zebra 0.6 patch - docs
[ https://issues.apache.org/jira/browse/PIG-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1214:
    Fix Version/s: 0.6.0

Pig/Zebra 0.6 patch - docs
    Key: PIG-1214
    URL: https://issues.apache.org/jira/browse/PIG-1214
    Project: Pig
    Issue Type: Task
    Components: documentation
    Affects Versions: 0.6.0
    Reporter: Corinne Chandel
    Assignee: Corinne Chandel
    Priority: Blocker
    Fix For: 0.6.0
    Attachments: pig-1214-branch-0-6.patch, pig-1214-trunk.patch, pig-1214.patch

Pig Docs
piglatin_ref2.xml - Update PigStorage function to include information about '/r' delimiter

Zebra Docs
zebra_pig.xml - Add new section, Sorting Data: Zebra only supports tables sorted in ascending (ASC) order; tables sorted in descending (DESC) order are treated as unsorted tables
[jira] Updated: (PIG-1148) Move splitable logic from pig latin to InputFormat
[ https://issues.apache.org/jira/browse/PIG-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1148:
    Fix Version/s: 0.7.0

Move splitable logic from pig latin to InputFormat
    Key: PIG-1148
    URL: https://issues.apache.org/jira/browse/PIG-1148
    Project: Pig
    Issue Type: Sub-task
    Reporter: Jeff Zhang
    Assignee: Jeff Zhang
    Fix For: 0.7.0
    Attachments: PIG-1148.patch
[jira] Updated: (PIG-1088) change merge join and merge join indexer to work with new LoadFunc interface
[ https://issues.apache.org/jira/browse/PIG-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1088:
    Fix Version/s: 0.7.0

change merge join and merge join indexer to work with new LoadFunc interface
    Key: PIG-1088
    URL: https://issues.apache.org/jira/browse/PIG-1088
    Project: Pig
    Issue Type: Sub-task
    Reporter: Thejas M Nair
    Assignee: Thejas M Nair
    Fix For: 0.7.0
    Attachments: PIG-1088.1.patch, PIG-1088.patch
[jira] Updated: (PIG-1115) [zebra] temp files are not cleaned.
[ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1115:
    Fix Version/s: 0.7.0

[zebra] temp files are not cleaned.
    Key: PIG-1115
    URL: https://issues.apache.org/jira/browse/PIG-1115
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.7.0
    Reporter: Hong Tang
    Assignee: Gaurav Jain
    Fix For: 0.7.0
    Attachments: PIG-1115.patch

Temp files created by zebra during table creation are not cleaned up when there is a task failure, which wastes disk space.
[jira] Updated: (PIG-1141) Make streaming work with the new load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1141:
    Fix Version/s: 0.7.0

Make streaming work with the new load-store interfaces
    Key: PIG-1141
    URL: https://issues.apache.org/jira/browse/PIG-1141
    Project: Pig
    Issue Type: Sub-task
    Reporter: Richard Ding
    Assignee: Richard Ding
    Fix For: 0.7.0
    Attachments: PIG-1141.patch, PIG-1141.patch, PIG-1141.patch
[jira] Updated: (PIG-1110) Handle compressed file formats -- Gz, BZip with the new proposal
[ https://issues.apache.org/jira/browse/PIG-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1110:
    Fix Version/s: 0.7.0

Handle compressed file formats -- Gz, BZip with the new proposal
    Key: PIG-1110
    URL: https://issues.apache.org/jira/browse/PIG-1110
    Project: Pig
    Issue Type: Sub-task
    Reporter: Richard Ding
    Assignee: Richard Ding
    Fix For: 0.7.0
    Attachments: PIG-1110.patch, PIG-1110.patch, PIG_1110_Jeff.patch
[jira] Updated: (PIG-1059) FINDBUGS: remaining Bad practice + Multithreaded correctness Warning
[ https://issues.apache.org/jira/browse/PIG-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1059:
    Fix Version/s: 0.6.0

FINDBUGS: remaining Bad practice + Multithreaded correctness Warning
    Key: PIG-1059
    URL: https://issues.apache.org/jira/browse/PIG-1059
    Project: Pig
    Issue Type: Improvement
    Reporter: Olga Natkovich
    Assignee: Olga Natkovich
    Fix For: 0.6.0
    Attachments: PIG-1059.patch

IS  Inconsistent synchronization of org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.hodConfDir; locked 66% of time
IS  Inconsistent synchronization of org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.hodProcess; locked 80% of time
IS  Inconsistent synchronization of org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.remoteHodConfDir; locked 88% of time
IS  Inconsistent synchronization of org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream.initialized; locked 50% of time
UG  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.getAggregate() is unsynchronized, org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.setAggregate(boolean) is synchronized
UG  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.getReporter() is unsynchronized, org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.setReporter(Reporter) is synchronized
BC  Equals method for org.apache.pig.builtin.PigStorage assumes the argument is of type PigStorage
BC  Equals method for org.apache.pig.impl.streaming.StreamingCommand$HandleSpec assumes the argument is of type StreamingCommand$HandleSpec
DP  org.apache.pig.data.BagFactory.getInstance() creates a java.net.URLClassLoader classloader, which should be performed within a doPrivileged block
DP  org.apache.pig.data.TupleFactory.getInstance() creates a java.net.URLClassLoader classloader, which should be performed within a doPrivileged block
DP  org.apache.pig.impl.PigContext.createCl(String) creates a java.net.URLClassLoader classloader, which should be performed within a doPrivileged block
DP  org.apache.pig.impl.util.JarManager.createCl(String, PigContext) creates a java.net.URLClassLoader classloader, which should be performed within a doPrivileged block
Eq  org.apache.pig.data.DistinctDataBag$DistinctDataBagIterator$TContainer defines compareTo(DistinctDataBag$DistinctDataBagIterator$TContainer) and uses Object.equals()
Eq  org.apache.pig.data.SingleTupleBag defines compareTo(Object) and uses Object.equals()
Eq  org.apache.pig.data.SortedDataBag$SortedDataBagIterator$PQContainer defines compareTo(SortedDataBag$SortedDataBagIterator$PQContainer) and uses Object.equals()
Eq  org.apache.pig.data.TargetedTuple defines compareTo(Object) and uses Object.equals()
HE  org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan defines equals and uses Object.hashCode()
HE  org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POCogroup$groupComparator defines equals and uses Object.hashCode()
HE  org.apache.pig.builtin.BinaryStorage defines equals and uses Object.hashCode()
HE  org.apache.pig.builtin.BinStorage defines equals and uses Object.hashCode()
HE  org.apache.pig.builtin.PigStorage defines equals and uses Object.hashCode()
HE  org.apache.pig.data.InternalSortedBag$DefaultComparator defines equals and uses Object.hashCode()
HE  org.apache.pig.data.NonSpillableDataBag defines equals and uses Object.hashCode()
HE  org.apache.pig.data.SortedDataBag$DefaultComparator defines equals and uses Object.hashCode()
HE  org.apache.pig.impl.streaming.StreamingCommand$HandleSpec defines equals and uses Object.hashCode()
Nm  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PhyPlanSetter.visitSplit(POSplit) doesn't override method in superclass because parameter type org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit doesn't match superclass parameter type org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit
Nm  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PhyPlanSetter.visitSplit(POSplit) doesn't override method in superclass because parameter type org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit doesn't match superclass parameter type org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit
RV  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.deleteLocalDir(File) ignores
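For readers unfamiliar with the BC and HE warning codes in the PIG-1059 list, the sketch below shows the usual remedy on a made-up class: check the argument's type before casting in equals(), and override hashCode() alongside it. HandleSpecSketch and its fields are hypothetical; this is not the content of PIG-1059.patch.

```java
import java.util.Objects;

// Hypothetical value class showing the common fix for the BC and HE warnings.
final class HandleSpecSketch {
    private final String name;
    private final String spec;

    HandleSpecSketch(String name, String spec) {
        this.name = name;
        this.spec = spec;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        // BC: never cast before checking the argument's type.
        if (!(other instanceof HandleSpecSketch)) {
            return false;
        }
        HandleSpecSketch that = (HandleSpecSketch) other;
        return Objects.equals(name, that.name) && Objects.equals(spec, that.spec);
    }

    // HE: equal objects must report equal hash codes, or hash-based
    // collections such as HashSet and HashMap will not find them.
    @Override
    public int hashCode() {
        return Objects.hash(name, spec);
    }
}
```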
[jira] Updated: (PIG-1072) ReversibleLoadStoreFunc interface should be removed to enable different load and store implementation classes to be used in a reversible manner
[ https://issues.apache.org/jira/browse/PIG-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1072:
    Fix Version/s: 0.7.0

ReversibleLoadStoreFunc interface should be removed to enable different load and store implementation classes to be used in a reversible manner
    Key: PIG-1072
    URL: https://issues.apache.org/jira/browse/PIG-1072
    Project: Pig
    Issue Type: Sub-task
    Reporter: Pradeep Kamath
    Assignee: Richard Ding
    Fix For: 0.7.0
    Attachments: PIG-1072.patch
[jira] Updated: (PIG-1052) FINDBUGS: remaining performance warnings
[ https://issues.apache.org/jira/browse/PIG-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1052: Fix Version/s: 0.6.0 FINDBUGS: remaining performance warnings Key: PIG-1052 URL: https://issues.apache.org/jira/browse/PIG-1052 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Olga Natkovich Fix For: 0.6.0 Attachments: PIG-1052.patch
SBSC Method org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStackTraceElement(String) concatenates strings using + in a loop
SBSC Method org.apache.pig.impl.logicalLayer.LOCross.getSchema() concatenates strings using + in a loop
SBSC Method org.apache.pig.impl.logicalLayer.LOForEach.getSchema() concatenates strings using + in a loop
SBSC Method org.apache.pig.PigServer.locateJarFromResources(String) concatenates strings using + in a loop
SBSC Method org.apache.pig.tools.parameters.ParseException.initialise(Token, int[][], String[]) concatenates strings using + in a loop
SBSC Method org.apache.pig.tools.parameters.PreprocessorContext.executeShellCommand(String) concatenates strings using + in a loop
SS Unread field: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.OOM_ERR; should this field be static?
SS Unread field: org.apache.pig.impl.io.BufferedPositionedInputStream.bufSize; should this field be static?
UPM Private method org.apache.pig.impl.plan.optimizer.RulePlanPrinter.planString(List) is never called
UPM Private method org.apache.pig.impl.plan.PlanPrinter.planString(List) is never called
WMI Method org.apache.pig.builtin.PigStorage.putField(Object) makes inefficient use of keySet iterator instead of entrySet iterator
WMI Method org.apache.pig.data.DataType.mapToString(Map) makes inefficient use of keySet iterator instead of entrySet iterator
WMI Method org.apache.pig.impl.logicalLayer.LOCross.getSchema() makes inefficient use of keySet iterator instead of entrySet iterator
WMI Method org.apache.pig.impl.logicalLayer.LOForEach.getSchema() makes inefficient use of keySet iterator instead of entrySet iterator
WMI Method org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.getLoadFuncSpec(Schema$FieldSchema, String) makes inefficient use of keySet iterator instead of entrySet iterator
WMI Method org.apache.pig.impl.plan.CompilationMessageCollector.logAggregate(Map, CompilationMessageCollector$MessageType, Log) makes inefficient use of keySet iterator instead of entrySet iterator
WMI Method org.apache.pig.StandAloneParser.tryParse(String) makes inefficient use of keySet iterator instead of entrySet iterator
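The SBSC and WMI warnings above usually have the same shape of fix: build the string with one StringBuilder, and walk the map through entrySet() so each value is read without a second lookup. A minimal sketch follows. The mapToString helper and its output format are hypothetical, chosen for illustration, and are not the code in PIG-1052.patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PerfWarningsDemo {
    // Hypothetical helper: renders a map as [k1#v1,k2#v2].
    // SBSC fix: one StringBuilder instead of String + String inside the loop.
    // WMI fix: entrySet() yields key and value together, avoiding m.get(key).
    static String mapToString(Map<String, Object> m) {
        StringBuilder sb = new StringBuilder("[");
        boolean first = true;
        for (Map.Entry<String, Object> e : m.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(e.getKey()).append('#').append(e.getValue());
            first = false;
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        System.out.println(mapToString(m)); // prints [a#1,b#2]
    }
}
```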
[jira] Updated: (PIG-1058) FINDBUGS: remaining Correctness Warnings
[ https://issues.apache.org/jira/browse/PIG-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1058: Fix Version/s: 0.6.0 FINDBUGS: remaining Correctness Warnings Key: PIG-1058 URL: https://issues.apache.org/jira/browse/PIG-1058 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Olga Natkovich Fix For: 0.6.0 Attachments: PIG-1058.patch, PIG-1058_v2.patch
BC Impossible cast from java.lang.Object[] to java.lang.String[] in org.apache.pig.PigServer.listPaths(String)
EC Call to equals() comparing different types in org.apache.pig.impl.plan.Operator.equals(Object)
GC java.lang.Byte is incompatible with expected argument type java.lang.Integer in org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator$LoRearrangeDiscoverer.visitLocalRearrange(POLocalRearrange)
IL There is an apparent infinite recursive loop in org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POCogroup$groupComparator.equals(Object)
INT Bad comparison of nonnegative value with -1 in org.apache.tools.bzip2r.CBZip2InputStream.bsR(int)
INT Bad comparison of nonnegative value with -1 in org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode()
INT Bad comparison of nonnegative value with -1 in org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode()
MF Field ConstantExpression.res masks field in superclass org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator
Nm org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitSplit(POSplit) doesn't override method in superclass because parameter type org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit doesn't match superclass parameter type org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit
Nm org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.NoopStoreRemover$PhysicalRemover.visitSplit(POSplit) doesn't override method in superclass because parameter type org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit doesn't match superclass parameter type org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit
NP Possible null pointer dereference of ? in org.apache.pig.impl.logicalLayer.optimizer.PushDownForeachFlatten.check(List)
NP Possible null pointer dereference of lo in org.apache.pig.impl.logicalLayer.optimizer.StreamOptimizer.transform(List)
NP Possible null pointer dereference of Schema$FieldSchema.Schema$FieldSchema.alias in org.apache.pig.impl.logicalLayer.schema.Schema.equals(Schema, Schema, boolean, boolean)
NP Possible null pointer dereference of Schema$FieldSchema.alias in org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.equals(Schema$FieldSchema, Schema$FieldSchema, boolean, boolean)
NP Possible null pointer dereference of inp in org.apache.pig.impl.streaming.ExecutableManager$ProcessInputThread.run()
RCN Nullcheck of pigContext at line 123 of value previously dereferenced in org.apache.pig.impl.util.JarManager.createJar(OutputStream, List, PigContext)
RV org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.fixUpDomain(String, Properties) ignores return value of java.net.InetAddress.getByName(String)
RV Bad attempt to compute absolute value of signed 32-bit hashcode in org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.SkewedPartitioner.getPartition(PigNullableWritable, Writable, int)
RV Bad attempt to compute absolute value of signed 32-bit hashcode in org.apache.pig.impl.plan.DotPlanDumper.getID(Operator)
UwF Field only ever set to null: org.apache.pig.impl.builtin.MergeJoinIndexer.dummyTuple
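The "bad attempt to compute absolute value of signed 32-bit hashcode" warning above is easy to reproduce: Math.abs(Integer.MIN_VALUE) is still negative, so a partition index derived from it can be negative too. The sketch below shows the problem and one common remedy, which is to mask off the sign bit. The method names are hypothetical, and this is not necessarily the change made in the attached patches.

```java
final class PartitionDemo {
    // Buggy pattern: Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE,
    // so the result can be negative.
    static int naivePartition(int hash, int numPartitions) {
        return Math.abs(hash) % numPartitions;
    }

    // Clearing the sign bit always yields a value in [0, numPartitions).
    static int safePartition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(naivePartition(Integer.MIN_VALUE, 10)); // prints -8
        System.out.println(safePartition(Integer.MIN_VALUE, 10));  // prints 0
    }
}
```

Math.floorMod(hash, numPartitions) is another option that stays non-negative for a positive divisor, although it maps negative hashes to different partitions than the mask does.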
[jira] Updated: (PIG-1055) FINDBUGS: remaining Dodgy Warnings
[ https://issues.apache.org/jira/browse/PIG-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1055: Fix Version/s: 0.6.0 FINDBUGS: remaining Dodgy Warnings Key: PIG-1055 URL: https://issues.apache.org/jira/browse/PIG-1055 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Olga Natkovich Fix For: 0.6.0 Attachments: PIG-1055.patch
BC Questionable cast from java.util.List to java.util.ArrayList in new org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit(PigContext, FileSystem, Path, String, List, long, long)
Eq org.apache.pig.data.AmendableTuple doesn't override DefaultTuple.equals(Object)
Eq org.apache.pig.data.TimestampedTuple doesn't override DefaultTuple.equals(Object)
IA Ambiguous invocation of either an outer or inherited method org.apache.pig.impl.plan.DotPlanDumper.getName(Operator) in org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.DotMRPrinter$InnerPrinter.getAttributes(DotMRPrinter$InnerOperator)
IM Computation of average could overflow in org.apache.tools.bzip2r.CBZip2OutputStream.qSort3(int, int, int)
IM Check for oddness that won't work for negative numbers in org.apache.tools.bzip2r.CBZip2OutputStream.sendMTFValues()
REC Exception is caught when Exception is not thrown in org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.doHod(String, Properties)
REC Exception is caught when Exception is not thrown in org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer.visitMROp(MapReduceOper)
REC Exception is caught when Exception is not thrown in org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitDistinct(PODistinct)
REC Exception is caught when Exception is not thrown in org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(POFRJoin)
REC Exception is caught when Exception is not thrown in org.apache.pig.impl.logicalLayer.optimizer.OpLimitOptimizer.processNode(LOLimit)
REC Exception is caught when Exception is not thrown in org.apache.pig.tools.streams.StreamGenerator.actionPerformed(ActionEvent)
ST Write to static field org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner.sJobConf from instance method org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.configure(JobConf)
ST Write to static field org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.activeSplit from instance method org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getRecordReader(InputSplit, JobConf, Reporter)
ST Write to static field org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.sJob from instance method org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getRecordReader(InputSplit, JobConf, Reporter)
ST Write to static field org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce.sJobConf from instance method org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.configure(JobConf)
ST Write to static field org.apache.pig.data.BagFactory.gMemMgr from instance method new org.apache.pig.data.BagFactory()
ST Write to static field org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.mOpToCloneMap from instance method new org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper(LogicalPlan, Map)
ST Write to static field org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.classloader from instance method org.apache.pig.impl.PigContext.addJar(URL)
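The two IM warnings above concern integer arithmetic that looks harmless but fails at the edges: (lo + hi) / 2 can overflow, and x % 2 == 1 is false for negative odd numbers. A small sketch of both pitfalls and the usual rewrites, using hypothetical method names rather than the bzip2 code itself:

```java
final class IntMathDemo {
    // Overflows when lo + hi exceeds Integer.MAX_VALUE.
    static int midNaive(int lo, int hi) {
        return (lo + hi) / 2;
    }

    // Safe for 0 <= lo <= hi: the difference cannot overflow.
    static int midSafe(int lo, int hi) {
        return lo + (hi - lo) / 2;
    }

    // In Java, -3 % 2 is -1, so this returns false for negative odd values.
    static boolean isOddNaive(int x) {
        return x % 2 == 1;
    }

    // Testing the lowest bit works for every int.
    static boolean isOdd(int x) {
        return (x & 1) != 0;
    }

    public static void main(String[] args) {
        int hi = Integer.MAX_VALUE;
        System.out.println(midNaive(hi - 1, hi)); // prints -1
        System.out.println(midSafe(hi - 1, hi));  // prints 2147483646
        System.out.println(isOddNaive(-3));       // prints false
        System.out.println(isOdd(-3));            // prints true
    }
}
```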
[jira] Updated: (PIG-1051) FINFBUGS: NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE: Possible null pointer dereference due to return value of called method
[ https://issues.apache.org/jira/browse/PIG-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1051: Fix Version/s: 0.6.0 FINFBUGS: NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE: Possible null pointer dereference due to return value of called method Key: PIG-1051 URL: https://issues.apache.org/jira/browse/PIG-1051 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Olga Natkovich Fix For: 0.6.0 Attachments: PIG-1051.patch
NP Possible null pointer dereference in org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.CountingMap.put(Object, Integer) due to return value of called method
NP Load of known null value in org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone()
NP Load of known null value in org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone()
NP Load of known null value in org.apache.pig.impl.logicalLayer.optimizer.OpLimitOptimizer.check(List)
NP Load of known null value in org.apache.pig.impl.logicalLayer.optimizer.PushDownForeachFlatten.getOperator(List)
NP Load of known null value in org.apache.pig.impl.logicalLayer.optimizer.PushUpFilter.getOperator(List)
NP Load of known null value in org.apache.pig.impl.logicalLayer.optimizer.StreamOptimizer.check(List)
NP Load of known null value in org.apache.pig.impl.logicalLayer.optimizer.TypeCastInserter.getOperator(List)
NP Load of known null value in org.apache.pig.impl.logicalLayer.optimizer.TypeCastInserter.getOperator(List)
NP Load of known null value in org.apache.pig.impl.logicalLayer.schema.Schema.mergeSchema(Schema, Schema, boolean, boolean, boolean)
NP Load of known null value in org.apache.pig.impl.logicalLayer.schema.Schema.mergeSchema(Schema, Schema, boolean, boolean, boolean)
NP Possible null pointer dereference in org.apache.pig.impl.util.LineageTracer.getWeightedCounts(IdentityHashSet, int) due to return value of called method
NP Possible null pointer dereference in org.apache.pig.impl.util.LineageTracer.getWeightedCounts(IdentityHashSet, int) due to return value of called method
NP Possible null pointer dereference in org.apache.pig.impl.util.LineageTracer.insert(Tuple) due to return value of called method
NP Possible null pointer dereference in org.apache.pig.impl.util.LineageTracer.link(Tuple, Tuple) due to return value of called method
NP Possible null pointer dereference in org.apache.pig.impl.util.LineageTracer.link(Tuple, Tuple) due to return value of called method
NP Possible null pointer dereference in org.apache.pig.pen.LineageTrimmingVisitor.PruneBaseDataConstrainedCoverage(Map, DataBag, LineageTracer, Map) due to return value of called method
NP Possible null pointer dereference in org.apache.pig.pen.LineageTrimmingVisitor.PruneBaseDataConstrainedCoverage(Map, DataBag, LineageTracer, Map) due to return value of called method
NP Possible null pointer dereference in org.apache.pig.pen.LineageTrimmingVisitor.PruneBaseDataConstrainedCoverage(Map, DataBag, LineageTracer, Map) due to return value of called method
NP Possible null pointer dereference in org.apache.pig.pen.LineageTrimmingVisitor.PruneBaseDataConstrainedCoverage(Map, DataBag, LineageTracer, Map) due to return value of called method
NP Possible null pointer dereference in org.apache.pig.pen.util.LineageTracer.getWeightedCounts(float, float) due to return value of called method
NP Possible null pointer dereference in org.apache.pig.pen.util.LineageTracer.getWeightedCounts(float, float) due to return value of called method
NP Possible null pointer dereference in org.apache.pig.pen.util.LineageTracer.insert(Tuple) due to return value of called method
NP Possible null pointer dereference in org.apache.pig.pen.util.LineageTracer.link(Tuple, Tuple) due to return value of called method
NP Possible null pointer dereference in org.apache.pig.pen.util.LineageTracer.link(Tuple, Tuple) due to return value of called method
NP Possible null pointer dereference in org.apache.pig.StandAloneParser.main(String[]) due to return value of called method
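A typical source of the "possible null pointer dereference due to return value of called method" warning is using the result of Map.get without checking it, since get returns null for an absent key. The sketch below illustrates the pattern with a hypothetical counting map. It is an assumption about the general shape of the problem, not the actual CountingMap or LineageTracer code.

```java
import java.util.HashMap;
import java.util.Map;

final class CountingDemo {
    private final Map<String, Integer> counts = new HashMap<>();

    // Flagged pattern: get() returns null for an unseen key, and the
    // implicit unboxing in the addition throws NullPointerException.
    int addUnsafe(String key, int delta) {
        int updated = counts.get(key) + delta;
        counts.put(key, updated);
        return updated;
    }

    // Checked variant: treat a missing key as a count of zero.
    int addSafe(String key, int delta) {
        Integer current = counts.get(key);
        int updated = (current == null ? 0 : current) + delta;
        counts.put(key, updated);
        return updated;
    }

    public static void main(String[] args) {
        CountingDemo demo = new CountingDemo();
        System.out.println(demo.addSafe("a", 2)); // prints 2
        System.out.println(demo.addSafe("a", 3)); // prints 5
    }
}
```

On Java 8 and later, counts.merge(key, delta, Integer::sum) expresses the same update in one call.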
[jira] Commented: (PIG-1313) PigServer leaks memory over time
[ https://issues.apache.org/jira/browse/PIG-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849501#action_12849501 ] Bill Graham commented on PIG-1313: You summed it up well, Alan. My ThreadLocal suggestion was really just because we could modify one class internally instead of doing a much larger refactor. I'm unclear, though, on how we'd go about moving FileLocalizer.toDelete and FileLocalizer.deleteOnFail into PigServer. Currently, calls to the FileLocalizer methods that create these temp file objects happen all over the codebase, in places where the calling code wouldn't have a handle to its PigServer instance AFAIK, unless it could get the PigServer from the PigContext or something of the sort. Otherwise, it would need to be a static call to PigServer methods, and we'd just have moved the same problem to another class. PigServer leaks memory over time Key: PIG-1313 URL: https://issues.apache.org/jira/browse/PIG-1313 Project: Pig Issue Type: Bug Reporter: Bill Graham Attachments: Pig1313Reproducer.java When {{PigServer}} runs, it creates temporary files using {{FileLocalizer.getTemporaryPath(..)}}. This static method creates and returns a handle to a temporary file (as an instance of {{ElementDescriptor}}). The {{ElementDescriptors}} returned by this method are kept on a static {{Stack}} named {{toDelete}}. The items on {{toDelete}} get removed by the {{FileLocalizer.deleteTempFile()}} method. The only place in the code where I see {{FileLocalizer.deleteTempFile()}} called is in the Main class. {{PigServer}} does not call that method, though, so a long-running VM that repeatedly uses instances of {{PigServer}} to run jobs will leak memory via {{toDelete}}.
One suggested fix is to have {{PigServer.shutdown()}} call {{FileLocalizer.deleteTempFile()}}, but this would cause problems in a multi-threaded environment, since it seems {{ElementDescriptors}} are pushed onto the {{toDelete}} stack before they're used, not once they're done with. With this approach, running multiple instances of {{PigServer}} in separate threads could cause one completed job to clobber the other's still-in-use temp files.
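The ThreadLocal idea mentioned in the comment can be sketched roughly as follows: each thread keeps its own stack of pending temp paths, so cleanup in one thread cannot touch entries registered by another. This is only an illustration under the assumption that each job runs entirely on one thread. TempFileTracker and its methods are hypothetical names, not FileLocalizer's API, and plain strings stand in for ElementDescriptor.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TempFileTracker {
    // One stack per thread instead of a single static stack shared by all.
    private static final ThreadLocal<Deque<String>> TO_DELETE =
            ThreadLocal.withInitial(ArrayDeque::new);

    static void register(String path) {
        TO_DELETE.get().push(path);
    }

    static int pending() {
        return TO_DELETE.get().size();
    }

    // Releases the calling thread's entries and returns how many there were.
    // A real implementation would delete the files before clearing.
    static int deleteTempFiles() {
        Deque<String> mine = TO_DELETE.get();
        int released = mine.size();
        mine.clear();
        TO_DELETE.remove(); // drop the per-thread stack so it can be collected
        return released;
    }

    // Registers one path here, lets another thread register and clean up
    // two paths, then reports how many entries this thread still holds.
    static int pendingAfterOtherThreadCleanup() {
        register("/tmp/main-1");
        Thread other = new Thread(() -> {
            register("/tmp/other-1");
            register("/tmp/other-2");
            deleteTempFiles();
        });
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        int stillPending = pending();
        deleteTempFiles();
        return stillPending;
    }

    public static void main(String[] args) {
        System.out.println(pendingAfterOtherThreadCleanup()); // prints 1
    }
}
```

The trade-off raised in the thread still applies: this confines the change to one class, but it breaks down if a single job registers temp files from several threads.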
[jira] Updated: (PIG-964) Handling null in skewed join
[ https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-964: Fix Version/s: 0.4.0 Handling null in skewed join Key: PIG-964 URL: https://issues.apache.org/jira/browse/PIG-964 Project: Pig Issue Type: Bug Reporter: Sriranjan Manjunath Assignee: Sriranjan Manjunath Fix For: 0.4.0 Attachments: skewedjoinnull.patch For null tuples, the tuple size is calculated incorrectly and thus skewed join ends up expecting a large number of reducers. Further, skewed join should not bail out after the second job if the number of reducers specified by the user is low. It should print a warning message and continue execution.
[jira] Updated: (PIG-458) Type branch integration with hadoop 18
[ https://issues.apache.org/jira/browse/PIG-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-458: Fix Version/s: 0.2.0 Type branch integration with hadoop 18 Key: PIG-458 URL: https://issues.apache.org/jira/browse/PIG-458 Project: Pig Issue Type: Improvement Affects Versions: 0.2.0 Reporter: Olga Natkovich Assignee: Olga Natkovich Fix For: 0.2.0 Attachments: hadoop18.jar, PIG-458.patch, un18.patch
[jira] Commented: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math
[ https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849528#action_12849528 ] Russell Jurney commented on PIG-1310: Thanks, Alan, I'll add all those changes tonight. I confess to not really testing CustomFormatToISO other than the test case; I'll update the docs :) As to ISO format, I will link to it and jodatime, and I would suggest ISO 8601 be the standard representation of datetimes in Pig, as it handles time zones and is sortable as text, which is nice. ISO Date UDFs: Conversion, Rounding and Date Math Key: PIG-1310 URL: https://issues.apache.org/jira/browse/PIG-1310 Project: Pig Issue Type: New Feature Components: impl Reporter: Russell Jurney Fix For: 0.7.0 Attachments: datetime.patch, datetime2.patch Original Estimate: 168h Remaining Estimate: 168h I've written UDFs to handle loading unix times, datemonth values and ISO 8601 formatted date strings, and working with them as ISO datetimes using jodatime. The working code is here: http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/ It needs to be documented and tests added, and a couple of UDFs are missing, but these work if you REGISTER the jodatime jar in your script. Hopefully I can get this stuff into piggybank before someone else writes it this time :) The rounding also may not be performant, but the code works. Ultimately I'd also like to enable support for ISO 8601 durations. Someone slap me if this isn't done soon; it is not much work and this should help everyone working with time series.
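The remark that ISO 8601 strings are sortable as text can be checked with a short sketch. It uses java.time from the standard library rather than the jodatime library the UDFs rely on, and it assumes every timestamp is written in the same zone and at the same precision (UTC with a trailing Z here), since text order only matches time order under that condition. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class IsoSortDemo {
    // Sorts the timestamps purely as strings.
    static List<String> sortAsText(List<String> stamps) {
        List<String> copy = new ArrayList<>(stamps);
        Collections.sort(copy);
        return copy;
    }

    // Parses each string and checks that the list is in time order.
    static boolean isChronological(List<String> stamps) {
        for (int i = 1; i < stamps.size(); i++) {
            Instant previous = Instant.parse(stamps.get(i - 1));
            Instant current = Instant.parse(stamps.get(i));
            if (previous.isAfter(current)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> stamps = Arrays.asList(
                "2010-03-24T17:05:00Z",
                "2009-12-31T23:59:59Z",
                "2010-03-02T08:00:00Z");
        List<String> sorted = sortAsText(stamps);
        System.out.println(sorted.get(0));           // prints 2009-12-31T23:59:59Z
        System.out.println(isChronological(sorted)); // prints true
    }
}
```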
[jira] Updated: (PIG-336) NULL checks are not in place in the types branch
[ https://issues.apache.org/jira/browse/PIG-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-336: Fix Version/s: 0.2.0 NULL checks are not in place in the types branch Key: PIG-336 URL: https://issues.apache.org/jira/browse/PIG-336 Project: Pig Issue Type: Bug Affects Versions: 0.2.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.2.0 Attachments: PIG-336-part1.patch, PIG-336-part1_v2.patch, PIG-336-part1_v3.patch, PIG-336-part1_v4.patch, PIG-336-part2.patch, PIG-336.patch The following code currently does not work: {code} B = filter A by $0 is null and $1 is null; {code} Other things that don't work with nulls include POAND, POOR, etc.
[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits
[ https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1306: Attachment: PIG-1306.patch [zebra] Support of locally sorted input splits Key: PIG-1306 URL: https://issues.apache.org/jira/browse/PIG-1306 Project: Pig Issue Type: Improvement Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1306.patch Currently, Zebra supports sorted or unsorted input splits on sorted tables or sorted table unions. The sorted input splits are based on key ranges that do not overlap, so the splits are in effect globally sorted: each is locally sorted, and their key ranges do not overlap. The biggest problem with key-range splits is the performance hit suffered when data skew is present, particularly when a key range consists solely of one duplicated key. That makes the chunk of data holding the duplicate keys virtually unsplittable regardless of how many mappers are available: it has to be processed by a single mapper. On the other hand, there are scenarios where globally sorted splits are overkill and locally sorted splits are good enough. One example is the use of Zebra sorted tables as the probe table in a map-side merge inner join.
[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits
[ https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1306: Status: Patch Available (was: Open) [zebra] Support of locally sorted input splits Key: PIG-1306 URL: https://issues.apache.org/jira/browse/PIG-1306 Project: Pig Issue Type: Improvement Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.7.0 Attachments: PIG-1306.patch Currently, Zebra supports sorted or unsorted input splits on sorted tables or sorted table unions. The sorted input splits are based on key ranges that do not overlap, so the splits are in effect globally sorted: each is locally sorted, and their key ranges do not overlap. The biggest problem with key-range splits is the performance hit suffered when data skew is present, particularly when a key range consists solely of one duplicated key. That makes the chunk of data holding the duplicate keys virtually unsplittable regardless of how many mappers are available: it has to be processed by a single mapper. On the other hand, there are scenarios where globally sorted splits are overkill and locally sorted splits are good enough. One example is the use of Zebra sorted tables as the probe table in a map-side merge inner join.
JIRA Fix Version
A reminder to Pig committers: When closing a JIRA issue as Resolved/Fixed, please make sure to set the Fix Version field. This helps our users know which versions they need to use to get fixes for their issues, and it helps release managers know what is and isn't in the release they're building. There were ~170 issues in Pig's JIRA marked fixed but with no version. I've assigned most of them to the appropriate version. Alan.
[jira] Commented: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
[ https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849568#action_12849568 ] Hadoop QA commented on PIG-1317: +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12439703/PIG-1317.patch against trunk revision 926846. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/247/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/247/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/247/console This message is automatically generated. 
LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() Key: PIG-1317 URL: https://issues.apache.org/jira/browse/PIG-1317 Project: Pig Issue Type: Improvement Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1317.patch In LOLoad.getProjectionMap(), the private method determineSchema() is called, which in turn calls LoadMetadata.getSchema(). The latter call could be expensive if the input file is read to determine the schema or a metadata system is contacted to get it. determineSchema() can cache the schema it gets so that subsequent calls use the cached version.
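The caching described above amounts to memoizing an expensive lookup: load once, then serve the stored result. A minimal sketch of that idea follows. CachedSchema is a hypothetical helper, the schema is modelled as a plain string, and none of this is taken from PIG-1317.patch.

```java
import java.util.function.Supplier;

// Hypothetical memoizing wrapper: the loader runs at most once.
// Not thread-safe; assumes calls come from a single thread.
final class CachedSchema<T> {
    private final Supplier<T> loader;
    private T cached;
    private boolean loaded;

    CachedSchema(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        if (!loaded) {
            cached = loader.get(); // the potentially expensive call
            loaded = true;         // a separate flag also caches a null result
        }
        return cached;
    }
}

final class CachedSchemaDemo {
    // Calls get() twice and reports how often the loader actually ran.
    static int loaderCallsAfterTwoGets() {
        int[] calls = {0};
        CachedSchema<String> schema = new CachedSchema<>(() -> {
            calls[0]++;
            return "(name:chararray, age:int)";
        });
        schema.get();
        schema.get();
        return calls[0];
    }

    public static void main(String[] args) {
        System.out.println(loaderCallsAfterTwoGets()); // prints 1
    }
}
```

Tracking "loaded" separately from the cached value matters when the loader can legitimately return null, for example when no schema is available. Otherwise every call would retry the expensive lookup.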