[jira] Commented: (PIG-915) Load row names in HBase loader

2010-03-24 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849230#action_12849230
 ] 

Jeff Zhang commented on PIG-915:


Olga, sorry for the late reply. This feature has been included in PIG-1205, so 
I think there is no need to track this JIRA item any longer.



 Load row names in HBase loader
 --

 Key: PIG-915
 URL: https://issues.apache.org/jira/browse/PIG-915
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Alex Newman
Assignee: Jeff Zhang
Priority: Minor
 Fix For: 0.8.0

 Attachments: Pig_915.Patch


 Currently there is no way to get the row names when querying HBase; 
 we should probably remedy this, as important data may be stored there.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-282) Custom Partitioner

2010-03-24 Thread David Ciemiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849279#action_12849279
 ] 

David Ciemiewicz commented on PIG-282:
--

How will the custom partitioner be used in Pig?

Is this for map partitioning and/or output partitioning?

For instance, I'd love to have something that created separate directories 
based on the value of some key.

 Custom Partitioner
 --

 Key: PIG-282
 URL: https://issues.apache.org/jira/browse/PIG-282
 Project: Pig
  Issue Type: New Feature
Reporter: Amir Youssefi
Priority: Minor

 By adding a custom partitioner we can give users control over which output 
 partition a key (/value) goes to. We can add keywords to the language, e.g. 
 PARTITION BY UDF(...)
 or a similar syntax. The UDF returns a number between 0 and n-1, where n is 
 the number of output partitions.
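
 The contract described above (a function that maps a key to a number between 
 0 and n-1) can be sketched in plain Java. This is only an illustration: the 
 class and method names are hypothetical and are not part of Pig or of any 
 proposed patch, and hashing the key is just one possible policy.

```java
public class PartitionSketch {
    // Hypothetical helper: maps a key to a partition index in [0, numPartitions - 1].
    public static int partitionFor(Object key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        int hash = (key == null) ? 0 : key.hashCode();
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("alice", 4));
    }
}
```

 Math.floorMod is used instead of the % operator because % can return a 
 negative value for a negative hash code, which would fall outside 0..n-1.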




[jira] Commented: (PIG-282) Custom Partitioner

2010-03-24 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849280#action_12849280
 ] 

Alan Gates commented on PIG-282:


This JIRA refers to map-reduce partitioning.  Output partitioning, i.e. spraying 
records to directories based on a key, can be done now via a custom store function.

 Custom Partitioner
 --

 Key: PIG-282
 URL: https://issues.apache.org/jira/browse/PIG-282
 Project: Pig
  Issue Type: New Feature
Reporter: Amir Youssefi
Priority: Minor

 By adding a custom partitioner we can give users control over which output 
 partition a key (/value) goes to. We can add keywords to the language, e.g. 
 PARTITION BY UDF(...)
 or a similar syntax. The UDF returns a number between 0 and n-1, where n is 
 the number of output partitions.




[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()

2010-03-24 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1317:


Attachment: PIG-1316.patch

Attached patch implements the change to cache the results of 
LoadMetadata.getSchema for use in future calls.

 LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent 
 calls to LOLoad.getSchema() or LOLoad.determineSchema()
 -

 Key: PIG-1317
 URL: https://issues.apache.org/jira/browse/PIG-1317
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0


 In LOLoad.getProjectionMap(), the private method determineSchema() is called 
 which in turn calls LoadMetadata.getSchema() - the latter call could 
 potentially be expensive if the input file is read to determine the schema or 
 a metadata system is contacted to get the schema - determineSchema() can 
 cache the schema it gets so that subsequent calls use the cached version.
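
 The caching idea can be sketched as follows. This is not the attached patch: 
 the class below is a hypothetical stand-in for LOLoad, a Supplier stands in 
 for the LoadMetadata.getSchema() call, and the schema is represented as a 
 plain String purely for illustration.

```java
import java.util.function.Supplier;

public class CachedSchemaSketch {
    // Stand-in for the potentially expensive LoadMetadata.getSchema() call.
    private final Supplier<String> schemaSource;
    private String cachedSchema;
    private boolean fetched;

    public CachedSchemaSketch(Supplier<String> schemaSource) {
        this.schemaSource = schemaSource;
    }

    // Fetches the schema on the first call and reuses it afterwards.
    // A separate flag is used so that a null schema is cached as well.
    public String determineSchema() {
        if (!fetched) {
            cachedSchema = schemaSource.get();
            fetched = true;
        }
        return cachedSchema;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        CachedSchemaSketch load = new CachedSchemaSketch(() -> {
            calls[0]++;
            return "a:int, b:chararray";
        });
        load.determineSchema();
        load.determineSchema();
        System.out.println("source consulted " + calls[0] + " time(s)");
    }
}
```

 The sketch is not thread-safe; whether that matters depends on how the real 
 class is used.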




[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()

2010-03-24 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1317:


Status: Open  (was: Patch Available)

Attached wrong patch file

 LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent 
 calls to LOLoad.getSchema() or LOLoad.determineSchema()
 -

 Key: PIG-1317
 URL: https://issues.apache.org/jira/browse/PIG-1317
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0


 In LOLoad.getProjectionMap(), the private method determineSchema() is called 
 which in turn calls LoadMetadata.getSchema() - the latter call could 
 potentially be expensive if the input file is read to determine the schema or 
 a metadata system is contacted to get the schema - determineSchema() can 
 cache the schema it gets so that subsequent calls use the cached version.




[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()

2010-03-24 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1317:


Status: Patch Available  (was: Open)

 LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent 
 calls to LOLoad.getSchema() or LOLoad.determineSchema()
 -

 Key: PIG-1317
 URL: https://issues.apache.org/jira/browse/PIG-1317
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0


 In LOLoad.getProjectionMap(), the private method determineSchema() is called 
 which in turn calls LoadMetadata.getSchema() - the latter call could 
 potentially be expensive if the input file is read to determine the schema or 
 a metadata system is contacted to get the schema - determineSchema() can 
 cache the schema it gets so that subsequent calls use the cached version.




[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()

2010-03-24 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1317:


Attachment: (was: PIG-1316.patch)

 LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent 
 calls to LOLoad.getSchema() or LOLoad.determineSchema()
 -

 Key: PIG-1317
 URL: https://issues.apache.org/jira/browse/PIG-1317
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0


 In LOLoad.getProjectionMap(), the private method determineSchema() is called 
 which in turn calls LoadMetadata.getSchema() - the latter call could 
 potentially be expensive if the input file is read to determine the schema or 
 a metadata system is contacted to get the schema - determineSchema() can 
 cache the schema it gets so that subsequent calls use the cached version.




[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()

2010-03-24 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1317:


Status: Patch Available  (was: Open)

 LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent 
 calls to LOLoad.getSchema() or LOLoad.determineSchema()
 -

 Key: PIG-1317
 URL: https://issues.apache.org/jira/browse/PIG-1317
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1317.patch


 In LOLoad.getProjectionMap(), the private method determineSchema() is called 
 which in turn calls LoadMetadata.getSchema() - the latter call could 
 potentially be expensive if the input file is read to determine the schema or 
 a metadata system is contacted to get the schema - determineSchema() can 
 cache the schema it gets so that subsequent calls use the cached version.




[jira] Updated: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()

2010-03-24 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1317:


Attachment: PIG-1317.patch

Attached correct patch file now.

 LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent 
 calls to LOLoad.getSchema() or LOLoad.determineSchema()
 -

 Key: PIG-1317
 URL: https://issues.apache.org/jira/browse/PIG-1317
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1317.patch


 In LOLoad.getProjectionMap(), the private method determineSchema() is called 
 which in turn calls LoadMetadata.getSchema() - the latter call could 
 potentially be expensive if the input file is read to determine the schema or 
 a metadata system is contacted to get the schema - determineSchema() can 
 cache the schema it gets so that subsequent calls use the cached version.




[jira] Updated: (PIG-1316) TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files can be efficiently processed by splitting the files

2010-03-24 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1316:


Attachment: PIG-1316.patch

Attached patch makes the required changes in TextLoader to use 
Bzip2TextInputFormat if the load location ends with the extension .bz or .bz2, 
as PigStorage does. Also, for non-bzip data, TextLoader will now use 
PigTextInputFormat rather than TextInputFormat so that input directories can be 
recursively traversed. I have also changed Bzip2TextInputFormat to extend 
PigFileInputFormat instead of FileInputFormat for the same reason.
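
The selection rule described in this comment can be sketched as below. This is 
an illustration only, not code from the patch: the class and method names are 
hypothetical, and the input formats are represented by their names as strings 
so that the example runs without Pig or Hadoop on the classpath.

```java
public class InputFormatChoiceSketch {
    // True if the load location ends with one of the bzip extensions named in the comment.
    public static boolean looksLikeBzip(String location) {
        return location.endsWith(".bz") || location.endsWith(".bz2");
    }

    // Returns the name of the input format the loader would pick for this location.
    public static String inputFormatNameFor(String location) {
        return looksLikeBzip(location) ? "Bzip2TextInputFormat" : "PigTextInputFormat";
    }

    public static void main(String[] args) {
        System.out.println(inputFormatNameFor("/data/logs/part-00000.bz2"));
        System.out.println(inputFormatNameFor("/data/logs/part-00000.txt"));
    }
}
```

The check here is case-sensitive and looks only at the end of the location 
string; the real patch may differ, for example in how it treats globs or 
comma-separated locations.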

 TextLoader should use Bzip2TextInputFormat for bzip files so that bzip files 
 can be efficiently processed by splitting the files
 

 Key: PIG-1316
 URL: https://issues.apache.org/jira/browse/PIG-1316
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1316.patch


 Currently TextLoader uses TextInputFormat, which does not split bzip files - 
 this can be fixed by using Bzip2TextInputFormat.




[jira] Commented: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math

2010-03-24 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849434#action_12849434
 ] 

Alan Gates commented on PIG-1310:
-

Since Pig plans to support SQL soon and since many Pig Latin users are familiar 
with SQL, we'd like to pick a datetime format that will work well with SQL.  
While ISO 8601 is not the same as SQL's datetime format, it looks to me like 
translation between the two will be reasonably easy.

DateMonthToISO lacks javadoc comments, making it hard to know what it does or 
how to use it.

The comments in front of other functions like ISOToUnix should be turned into 
javadoc (just change /* to /** and add an introductory sentence) so users can 
read them without needing to open the code itself.

It might be helpful in your javadocs to provide links to jodatime and somewhere 
that gives a good intro to ISO8601 date formats so users can figure out things 
like:  What does that Z at the end of the datetime string mean?

In UnixToISO your example shows an input of long, but the code assumes it's a 
string and parses the string into a long.  It should probably be the former, 
but whichever way you decide to do it the code and the comments should match.

In CustomFormatToISO the comments only show one input (the datetime string), 
but the code assumes two inputs, the datetime string and the format.  The 
comments should reflect this as well as give users an indication of how to 
construct the format string in a way that jodatime will understand it (or 
perhaps just link to somewhere in jodatime that it explains this).

All throughout, if there is an error in parsing the date the code depends on 
jodatime to throw an exception with a meaningful error message.  Have you 
tested that these error messages are reasonably helpful to users?  For now, in 
piggybank, this is ok.  If these eventually move into Pig proper these errors 
will need to be caught and Pig numbered error messages (which may just print 
the jodatime error message with a notification of which function it came from) 
will need to be added.

In ISOToX methods, the comments refer to rounding values.  But the code isn't 
rounding, it's truncating.
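
To make the distinction between truncating and rounding concrete, here is a 
small sketch. It uses the JDK's java.time classes rather than jodatime, purely 
so that it runs without extra jars; the class and method names are 
hypothetical and are not taken from the patch. The trailing Z in the strings 
is the ISO 8601 designator for UTC (zero offset).

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class TruncateVsRoundSketch {
    // Truncation: drop everything below the hour.
    public static Instant truncateToHour(Instant t) {
        return t.truncatedTo(ChronoUnit.HOURS);
    }

    // Rounding to the nearest hour: add half an hour, then truncate.
    public static Instant roundToHour(Instant t) {
        return t.plus(Duration.ofMinutes(30)).truncatedTo(ChronoUnit.HOURS);
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2010-03-24T10:45:00Z");
        System.out.println("truncated: " + truncateToHour(t));
        System.out.println("rounded:   " + roundToHour(t));
    }
}
```

For 10:45 the truncated value is 10:00 while the rounded value is 11:00, so 
the comments and the code describe different results for any time in the 
second half of the unit.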


 ISO Date UDFs: Conversion, Rounding and Date Math
 -

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: datetime.patch, datetime2.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using jodatime.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple UDFs are missing, but 
 these work if you REGISTER the jodatime jar in your script.  Hopefully I can 
 get this stuff in piggybank before someone else writes it this time :)  The 
 rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon, it is not much work and this should help 
 everyone working with time series.




[jira] Commented: (PIG-1313) PigServer leaks memory over time

2010-03-24 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849453#action_12849453
 ] 

Alan Gates commented on PIG-1313:
-

I'm not sure I understand all the pros and cons of Daniel's suggestion of 
moving the variables to PigServer versus Bill's suggestion of making them 
ThreadLocal.  The advantages I can see of moving the values to PigServer are:

# It's clearer to other developers what's going on, since they can see that 
these values are associated with an instance of PigServer.  Otherwise we're 
constructing a hidden dependency between the lifetime of PigServer and the 
thread it's running in.
# If at some future point Pig's frontend is multi-threaded this will still work 
(granted this is unlikely or at least far in the future)

The advantage I see with Bill's proposal is it's less change.

Are there other things I'm missing here?
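
As a rough illustration of the "move the variables to PigServer" option, the 
sketch below keeps the list of temporary paths in an instance field instead of 
a static one. Everything here is hypothetical: the class is a stand-in, not 
PigServer or FileLocalizer, paths are plain strings, and shutdown() only 
forgets the paths rather than deleting real files.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class InstanceScopedTempFilesSketch {
    // Per-instance, not static: each server object tracks only its own temp paths.
    private final Deque<String> toDelete = new ArrayDeque<>();
    private final String name;
    private int counter;

    public InstanceScopedTempFilesSketch(String name) {
        this.name = name;
    }

    // Hands out a new temporary path and remembers it for later cleanup.
    public String getTemporaryPath() {
        String path = "/tmp/" + name + "-" + (counter++);
        toDelete.push(path);
        return path;
    }

    // Cleans up this instance's paths only; returns how many were released.
    public int shutdown() {
        int released = toDelete.size();
        toDelete.clear();
        return released;
    }

    public int pending() {
        return toDelete.size();
    }

    public static void main(String[] args) {
        InstanceScopedTempFilesSketch a = new InstanceScopedTempFilesSketch("a");
        InstanceScopedTempFilesSketch b = new InstanceScopedTempFilesSketch("b");
        a.getTemporaryPath();
        b.getTemporaryPath();
        a.shutdown();
        System.out.println("b still tracks " + b.pending() + " path(s)");
    }
}
```

Because the collection belongs to the instance, shutting down one server 
cannot touch another server's still-in-use paths, and the tracked entries 
become unreachable together with the instance.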

 PigServer leaks memory over time
 

 Key: PIG-1313
 URL: https://issues.apache.org/jira/browse/PIG-1313
 Project: Pig
  Issue Type: Bug
Reporter: Bill Graham
 Attachments: Pig1313Reproducer.java


 When {{PigServer}} runs, it creates temporary files using 
 {{FileLocalizer.getTemporaryPath(..)}}. This static method creates and 
 returns a handle to a temporary file (as an instance of 
 {{ElementDescriptor}}). The {{ElementDescriptors}} returned by this method 
 are kept on a static {{Stack}} named {{toDelete}}. The items on {{toDelete}} 
 get removed by the {{FileLocalizer.deleteTempFile()}} method.
 The only place in the code where I see {{FileLocalizer.deleteTempFile()}} 
 called is in the Main class. {{PigServer}} does not call that method though, 
 so a long-running VM that repeatedly uses instances of {{PigServer}} to run 
 jobs will leak memory via {{toDelete}}.
 One suggested fix is to have {{PigServer.shutdown()}} call 
 {{FileLocalizer.deleteTempFile()}}, but this would cause problems in a 
 multi-threaded environment, since it seems {{ElementDescriptors}} are pushed 
 onto the {{toDelete}} stack before they're used, not once they're done with. 
 With this approach, running multiple instances of {{PigServer}} in separate 
 threads could cause one completed job to clobber the other's still-in-use 
 temp files.




[jira] Updated: (PIG-1309) Map-side Cogroup

2010-03-24 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-1309:
--

Attachment: pig-1309.patch

Did an offline review with Alan. Found a subtle bug in POMergeCogroup#getNext(). 
Fixed that and added more tests. Still need to tidy things up in a few places. 
Looking for suggestions for better test cases that cover all the edge cases. 

 Map-side Cogroup
 

 Key: PIG-1309
 URL: https://issues.apache.org/jira/browse/PIG-1309
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: mapsideCogrp.patch, pig-1309.patch


 In the never-ending quest to make Pig go faster, we want to parallelize as many 
 relational operations as possible. It's already possible to do Group-by 
 (PIG-984) and Joins (PIG-845, PIG-554) purely map-side in Pig. This JIRA 
 is to add a map-side implementation of Cogroup in Pig. Details to follow.




TypeCheckingVisitor and casting to less precise numeric types

2010-03-24 Thread Anil Chawla


Hi,
I know that Pig has logic for casting inputs to the expected data types
when invoking a UDF and I understand that this logic resides in the
TypeCheckingVisitor class. I am curious to know why certain casts have been
omitted from the castLookup map. Specifically, I do not see any entries
for casting a more precise numeric type (e.g. Double) to a less precise
numeric type (e.g. Integer). Any reason why all down conversions of numeric
types have been omitted? Is it because we do not want to perform any
automatic casts that lead to a loss of precision (loss of data)?

In my situation, we are trying to abstract all numeric data types into a
single number type. If a UDF takes a numeric parameter, we want Pig to
invoke that UDF with any numeric argument, regardless of whether the
argument must be upconverted or downconverted. We are OK with the loss of
precision in that circumstance. As a result, we added the following to the
castLookup map:

castLookup.put(DataType.LONG, DataType.INTEGER);
castLookup.put(DataType.FLOAT, DataType.LONG);
castLookup.put(DataType.FLOAT, DataType.INTEGER);
castLookup.put(DataType.DOUBLE, DataType.FLOAT);
castLookup.put(DataType.DOUBLE, DataType.LONG);
castLookup.put(DataType.DOUBLE, DataType.INTEGER);

All of these casts seem to work fine in our tests. Other than loss of
precision, is there any reason why adding these casts might be a bad idea?

Thanks,
-Anil
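
For reference, the sketch below shows what the corresponding narrowing 
conversions do in plain Java. It assumes that the automatic casts end up 
behaving like Java primitive casts, which this message does not confirm; the 
class and method names are hypothetical.

```java
public class NarrowingCastSketch {
    // double -> int: the fractional part is discarded (rounds toward zero),
    // and out-of-range values saturate at Integer.MIN_VALUE / MAX_VALUE.
    public static int doubleToInt(double d) {
        return (int) d;
    }

    // long -> int: only the low 32 bits are kept, so large values wrap around.
    public static int longToInt(long l) {
        return (int) l;
    }

    // double -> float: the value is rounded to the nearest representable float.
    public static float doubleToFloat(double d) {
        return (float) d;
    }

    public static void main(String[] args) {
        System.out.println(doubleToInt(3.99));
        System.out.println(doubleToInt(1e20));
        System.out.println(longToInt(1L << 32));
        System.out.println(doubleToFloat(0.1) == 0.1);
    }
}
```

Beyond losing the fraction, these conversions can silently change the 
magnitude of a value (wrap-around for long to int, saturation for double to 
int), which may be a stronger reason to keep them explicit than precision 
loss alone.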

[jira] Updated: (PIG-1315) [Zebra] Implementing OrderedLoadFunc interface for Zebra TableLoader

2010-03-24 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1315:
-

Attachment: zebra.0324

 [Zebra] Implementing OrderedLoadFunc interface for Zebra TableLoader
 

 Key: PIG-1315
 URL: https://issues.apache.org/jira/browse/PIG-1315
 Project: Pig
  Issue Type: New Feature
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: zebra.0324


 OrderedLoadFunc interface is used by Pig to do merge join and mapside 
 cogrouping. For Zebra, implementing this interface is necessary to support 
 mapside cogrouping.




[jira] Updated: (PIG-1315) [Zebra] Implementing OrderedLoadFunc interface for Zebra TableLoader

2010-03-24 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-1315:
-

Fix Version/s: (was: 0.7.0)
   0.8.0
   Status: Patch Available  (was: Open)

 [Zebra] Implementing OrderedLoadFunc interface for Zebra TableLoader
 

 Key: PIG-1315
 URL: https://issues.apache.org/jira/browse/PIG-1315
 Project: Pig
  Issue Type: New Feature
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.8.0

 Attachments: zebra.0324


 OrderedLoadFunc interface is used by Pig to do merge join and mapside 
 cogrouping. For Zebra, implementing this interface is necessary to support 
 mapside cogrouping.




[jira] Updated: (PIG-1268) [Zebra] Need an ant target that runs all pig-related tests in Zebra

2010-03-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1268:


Fix Version/s: 0.7.0

 [Zebra] Need an ant target that runs all pig-related tests in Zebra
 ---

 Key: PIG-1268
 URL: https://issues.apache.org/jira/browse/PIG-1268
 Project: Pig
  Issue Type: Test
  Components: build
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
Priority: Minor
 Fix For: 0.7.0

 Attachments: zebra.0303


 Currently Pig checkins don't run any Zebra tests to make sure that Zebra is 
 not broken. To make this happen, the Zebra build needs a test target that runs 
 only pig-related tests. With this, Pig committers need to run ant pig for 
 Zebra as part of the before-checkin sanity check. Ideally, this target should 
 be triggered as part of Hudson.




[jira] Updated: (PIG-1214) Pig/Zebra 0.6 patch - docs

2010-03-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1214:


Fix Version/s: 0.6.0

 Pig/Zebra 0.6 patch - docs
 --

 Key: PIG-1214
 URL: https://issues.apache.org/jira/browse/PIG-1214
 Project: Pig
  Issue Type: Task
  Components: documentation
Affects Versions: 0.6.0
Reporter: Corinne Chandel
Assignee: Corinne Chandel
Priority: Blocker
 Fix For: 0.6.0

 Attachments: pig-1214-branch-0-6.patch, pig-1214-trunk.patch, 
 pig-1214.patch


 Pig Docs
  piglatin_ref2.xml - Update PigStorage function to include information about 
  '/r' delimiter
 Zebra Docs
  zebra_pig.xml - Add new section, Sorting Data: Zebra only supports tables 
  sorted in ascending (ASC) order; tables sorted in descending (DESC) order 
  are treated as unsorted tables




[jira] Updated: (PIG-1148) Move splitable logic from pig latin to InputFormat

2010-03-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1148:


Fix Version/s: 0.7.0

 Move splitable logic from pig latin to InputFormat
 --

 Key: PIG-1148
 URL: https://issues.apache.org/jira/browse/PIG-1148
 Project: Pig
  Issue Type: Sub-task
Reporter: Jeff Zhang
Assignee: Jeff Zhang
 Fix For: 0.7.0

 Attachments: PIG-1148.patch







[jira] Updated: (PIG-1088) change merge join and merge join indexer to work with new LoadFunc interface

2010-03-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1088:


Fix Version/s: 0.7.0

 change merge join and merge join indexer to work with new LoadFunc interface
 

 Key: PIG-1088
 URL: https://issues.apache.org/jira/browse/PIG-1088
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.7.0

 Attachments: PIG-1088.1.patch, PIG-1088.patch







[jira] Updated: (PIG-1115) [zebra] temp files are not cleaned.

2010-03-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1115:


Fix Version/s: 0.7.0

 [zebra] temp files are not cleaned.
 ---

 Key: PIG-1115
 URL: https://issues.apache.org/jira/browse/PIG-1115
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Hong Tang
Assignee: Gaurav Jain
 Fix For: 0.7.0

 Attachments: PIG-1115.patch


 Temp files created by zebra during table creation are not cleaned up when 
 a task fails, which wastes disk space.




[jira] Updated: (PIG-1141) Make streaming work with the new load-store interfaces

2010-03-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1141:


Fix Version/s: 0.7.0

 Make streaming work with the new load-store interfaces 
 ---

 Key: PIG-1141
 URL: https://issues.apache.org/jira/browse/PIG-1141
 Project: Pig
  Issue Type: Sub-task
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-1141.patch, PIG-1141.patch, PIG-1141.patch







[jira] Updated: (PIG-1110) Handle compressed file formats -- Gz, BZip with the new proposal

2010-03-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1110:


Fix Version/s: 0.7.0

 Handle compressed file formats -- Gz, BZip with the new proposal
 

 Key: PIG-1110
 URL: https://issues.apache.org/jira/browse/PIG-1110
 Project: Pig
  Issue Type: Sub-task
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-1110.patch, PIG-1110.patch, PIG_1110_Jeff.patch







[jira] Updated: (PIG-1059) FINDBUGS: remaining Bad practice + Multithreaded correctness Warning

2010-03-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1059:


Fix Version/s: 0.6.0

 FINDBUGS: remaining Bad practice + Multithreaded correctness Warning
 

 Key: PIG-1059
 URL: https://issues.apache.org/jira/browse/PIG-1059
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.6.0

 Attachments: PIG-1059.patch


 IS Inconsistent synchronization of org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.hodConfDir; locked 66% of time
 IS Inconsistent synchronization of org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.hodProcess; locked 80% of time
 IS Inconsistent synchronization of org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.remoteHodConfDir; locked 88% of time
 IS Inconsistent synchronization of org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream.initialized; locked 50% of time
 UG org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.getAggregate() is unsynchronized, org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.setAggregate(boolean) is synchronized
 UG org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.getReporter() is unsynchronized, org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger.setReporter(Reporter) is synchronized
 BC Equals method for org.apache.pig.builtin.PigStorage assumes the argument is of type PigStorage
 BC Equals method for org.apache.pig.impl.streaming.StreamingCommand$HandleSpec assumes the argument is of type StreamingCommand$HandleSpec
 DP org.apache.pig.data.BagFactory.getInstance() creates a java.net.URLClassLoader classloader, which should be performed within a doPrivileged block
 DP org.apache.pig.data.TupleFactory.getInstance() creates a java.net.URLClassLoader classloader, which should be performed within a doPrivileged block
 DP org.apache.pig.impl.PigContext.createCl(String) creates a java.net.URLClassLoader classloader, which should be performed within a doPrivileged block
 DP org.apache.pig.impl.util.JarManager.createCl(String, PigContext) creates a java.net.URLClassLoader classloader, which should be performed within a doPrivileged block
 Eq org.apache.pig.data.DistinctDataBag$DistinctDataBagIterator$TContainer defines compareTo(DistinctDataBag$DistinctDataBagIterator$TContainer) and uses Object.equals()
 Eq org.apache.pig.data.SingleTupleBag defines compareTo(Object) and uses Object.equals()
 Eq org.apache.pig.data.SortedDataBag$SortedDataBagIterator$PQContainer defines compareTo(SortedDataBag$SortedDataBagIterator$PQContainer) and uses Object.equals()
 Eq org.apache.pig.data.TargetedTuple defines compareTo(Object) and uses Object.equals()
 HE org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan defines equals and uses Object.hashCode()
 HE org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POCogroup$groupComparator defines equals and uses Object.hashCode()
 HE org.apache.pig.builtin.BinaryStorage defines equals and uses Object.hashCode()
 HE org.apache.pig.builtin.BinStorage defines equals and uses Object.hashCode()
 HE org.apache.pig.builtin.PigStorage defines equals and uses Object.hashCode()
 HE org.apache.pig.data.InternalSortedBag$DefaultComparator defines equals and uses Object.hashCode()
 HE org.apache.pig.data.NonSpillableDataBag defines equals and uses Object.hashCode()
 HE org.apache.pig.data.SortedDataBag$DefaultComparator defines equals and uses Object.hashCode()
 HE org.apache.pig.impl.streaming.StreamingCommand$HandleSpec defines equals and uses Object.hashCode()
 Nm org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PhyPlanSetter.visitSplit(POSplit) doesn't override method in superclass because parameter type org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit doesn't match superclass parameter type org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit
 Nm org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PhyPlanSetter.visitSplit(POSplit) doesn't override method in superclass because parameter type org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit doesn't match superclass parameter type org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit
 RV org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.deleteLocalDir(File) ignores 

[jira] Updated: (PIG-1072) ReversibleLoadStoreFunc interface should be removed to enable different load and store implementation classes to be used in a reversible manner

2010-03-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1072:


Fix Version/s: 0.7.0

 ReversibleLoadStoreFunc interface should be removed to enable different load 
 and store implementation classes to be used in a reversible manner
 ---

 Key: PIG-1072
 URL: https://issues.apache.org/jira/browse/PIG-1072
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Richard Ding
 Fix For: 0.7.0

 Attachments: PIG-1072.patch







[jira] Updated: (PIG-1052) FINDBUGS: remaining performance warnings

2010-03-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1052:


Fix Version/s: 0.6.0

 FINDBUGS: remaining performance warnings
 

 Key: PIG-1052
 URL: https://issues.apache.org/jira/browse/PIG-1052
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.6.0

 Attachments: PIG-1052.patch


 SBSC  Method 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStackTraceElement(String)
  concatenates strings using + in a loop
 SBSC  Method org.apache.pig.impl.logicalLayer.LOCross.getSchema() 
 concatenates strings using + in a loop
 SBSC  Method org.apache.pig.impl.logicalLayer.LOForEach.getSchema() 
 concatenates strings using + in a loop
 SBSC  Method org.apache.pig.PigServer.locateJarFromResources(String) 
 concatenates strings using + in a loop
 SBSC  Method org.apache.pig.tools.parameters.ParseException.initialise(Token, 
 int[][], String[]) concatenates strings using + in a loop
 SBSC  Method 
 org.apache.pig.tools.parameters.PreprocessorContext.executeShellCommand(String)
  concatenates strings using + in a loop
 SS    Unread field:
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.OOM_ERR;
  should this field be static?
 SS    Unread field:
 org.apache.pig.impl.io.BufferedPositionedInputStream.bufSize; should this
 field be static?
 UPM   Private method 
 org.apache.pig.impl.plan.optimizer.RulePlanPrinter.planString(List) is never 
 called
 UPM   Private method org.apache.pig.impl.plan.PlanPrinter.planString(List) is 
 never called
 WMI   Method org.apache.pig.builtin.PigStorage.putField(Object) makes 
 inefficient use of keySet iterator instead of entrySet iterator
 WMI   Method org.apache.pig.data.DataType.mapToString(Map) makes inefficient 
 use of keySet iterator instead of entrySet iterator
 WMI   Method org.apache.pig.impl.logicalLayer.LOCross.getSchema() makes 
 inefficient use of keySet iterator instead of entrySet iterator
 WMI   Method org.apache.pig.impl.logicalLayer.LOForEach.getSchema() makes 
 inefficient use of keySet iterator instead of entrySet iterator
 WMI   Method 
 org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.getLoadFuncSpec(Schema$FieldSchema,
  String) makes inefficient use of keySet iterator instead of entrySet iterator
 WMI   Method 
 org.apache.pig.impl.plan.CompilationMessageCollector.logAggregate(Map, 
 CompilationMessageCollector$MessageType, Log) makes inefficient use of keySet 
 iterator instead of entrySet iterator
 WMI   Method org.apache.pig.StandAloneParser.tryParse(String) makes 
 inefficient use of keySet iterator instead of entrySet iterator
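The two most common patterns in the list above, SBSC (string concatenation with + in a loop) and WMI (iterating keySet() and then looking up each value), have standard remedies. The following is a minimal sketch of both, not the code from PIG-1052.patch; the class and method names are hypothetical and only illustrate the shape of the fix.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of the Pig code base.
final class PerfWarningSketch {

    // SBSC: accumulate into one StringBuilder instead of using s += ... in the loop,
    // which would allocate a new String on every iteration.
    static String joinWithCommas(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    // WMI: iterate entrySet() so each value arrives together with its key,
    // avoiding an extra map.get(key) lookup per iteration.
    static String mapToString(Map<String, Object> map) {
        StringBuilder sb = new StringBuilder("[");
        boolean first = true;
        for (Map.Entry<String, Object> e : map.entrySet()) {
            if (!first) {
                sb.append(",");
            }
            sb.append(e.getKey()).append("#").append(e.getValue());
            first = false;
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("a", 1);
        m.put("b", "x");
        System.out.println(joinWithCommas(new String[] {"x", "y", "z"}));
        System.out.println(mapToString(m));
    }
}
```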




[jira] Updated: (PIG-1058) FINDBUGS: remaining Correctness Warnings

2010-03-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1058:


Fix Version/s: 0.6.0

 FINDBUGS: remaining Correctness Warnings
 --

 Key: PIG-1058
 URL: https://issues.apache.org/jira/browse/PIG-1058
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.6.0

 Attachments: PIG-1058.patch, PIG-1058_v2.patch


 BC    Impossible cast from java.lang.Object[] to java.lang.String[] in
 org.apache.pig.PigServer.listPaths(String)
 EC    Call to equals() comparing different types in
 org.apache.pig.impl.plan.Operator.equals(Object)
 GC    java.lang.Byte is incompatible with expected argument type
 java.lang.Integer in
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator$LoRearrangeDiscoverer.visitLocalRearrange(POLocalRearrange)
 IL    There is an apparent infinite recursive loop in
 org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POCogroup$groupComparator.equals(Object)
 INT   Bad comparison of nonnegative value with -1 in
 org.apache.tools.bzip2r.CBZip2InputStream.bsR(int)
 INT   Bad comparison of nonnegative value with -1 in
 org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode()
 INT   Bad comparison of nonnegative value with -1 in
 org.apache.tools.bzip2r.CBZip2InputStream.getAndMoveToFrontDecode()
 MF    Field ConstantExpression.res masks field in superclass
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator
 Nm
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitSplit(POSplit)
  doesn't override method in superclass because parameter type 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit
  doesn't match superclass parameter type 
 org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit
 Nm
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.NoopStoreRemover$PhysicalRemover.visitSplit(POSplit)
  doesn't override method in superclass because parameter type 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit
  doesn't match superclass parameter type 
 org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POSplit
 NP    Possible null pointer dereference of ? in
 org.apache.pig.impl.logicalLayer.optimizer.PushDownForeachFlatten.check(List)
 NP    Possible null pointer dereference of lo in
 org.apache.pig.impl.logicalLayer.optimizer.StreamOptimizer.transform(List)
 NP    Possible null pointer dereference of
 Schema$FieldSchema.Schema$FieldSchema.alias in
 org.apache.pig.impl.logicalLayer.schema.Schema.equals(Schema, Schema,
 boolean, boolean)
 NP    Possible null pointer dereference of Schema$FieldSchema.alias in
 org.apache.pig.impl.logicalLayer.schema.Schema$FieldSchema.equals(Schema$FieldSchema,
  Schema$FieldSchema, boolean, boolean)
 NP    Possible null pointer dereference of inp in
 org.apache.pig.impl.streaming.ExecutableManager$ProcessInputThread.run()
 RCN   Nullcheck of pigContext at line 123 of value previously dereferenced in
 org.apache.pig.impl.util.JarManager.createJar(OutputStream, List, PigContext)
 RV
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.fixUpDomain(String,
  Properties) ignores return value of java.net.InetAddress.getByName(String)
 RV    Bad attempt to compute absolute value of signed 32-bit hashcode in
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.SkewedPartitioner.getPartition(PigNullableWritable,
  Writable, int)
 RV    Bad attempt to compute absolute value of signed 32-bit hashcode in
 org.apache.pig.impl.plan.DotPlanDumper.getID(Operator)
 UwF   Field only ever set to null: 
 org.apache.pig.impl.builtin.MergeJoinIndexer.dummyTuple
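The RV warning about the absolute value of a signed 32-bit hashcode matters for a partitioner because Math.abs(Integer.MIN_VALUE) is still negative, so the resulting partition index can fall outside 0..n-1. The sketch below shows the problem and one common remedy (masking the sign bit). It is an illustration under assumed names, not the actual SkewedPartitioner code or the fix in PIG-1058.patch.

```java
// Hypothetical illustration class; not part of the Pig code base.
final class PartitionSketch {

    // Pattern flagged by FindBugs: Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE,
    // so the remainder can be negative for that one hash value.
    static int naivePartition(int hash, int numPartitions) {
        return Math.abs(hash) % numPartitions;
    }

    // Clearing the sign bit always yields a value in [0, Integer.MAX_VALUE],
    // so the remainder is always in [0, numPartitions - 1].
    static int safePartition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(naivePartition(Integer.MIN_VALUE, 10));
        System.out.println(safePartition(Integer.MIN_VALUE, 10));
    }
}
```

Math.floorMod(hash, numPartitions) is another option on Java 8 and later; it maps hashes to partitions differently from the mask but also never returns a negative index for a positive numPartitions.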




[jira] Updated: (PIG-1055) FINDBUGS: remaining Dodgy Warnings

2010-03-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1055:


Fix Version/s: 0.6.0

 FINDBUGS: remaining Dodgy Warnings
 

 Key: PIG-1055
 URL: https://issues.apache.org/jira/browse/PIG-1055
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.6.0

 Attachments: PIG-1055.patch


 BC    Questionable cast from java.util.List to java.util.ArrayList in new
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit(PigContext,
  FileSystem, Path, String, List, long, long)
 Eq    org.apache.pig.data.AmendableTuple doesn't override
 DefaultTuple.equals(Object)
 Eq    org.apache.pig.data.TimestampedTuple doesn't override
 DefaultTuple.equals(Object)
 IA    Ambiguous invocation of either an outer or inherited method
 org.apache.pig.impl.plan.DotPlanDumper.getName(Operator) in
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.DotMRPrinter$InnerPrinter.getAttributes(DotMRPrinter$InnerOperator)
 IM    Computation of average could overflow in
 org.apache.tools.bzip2r.CBZip2OutputStream.qSort3(int, int, int)
 IM    Check for oddness that won't work for negative numbers in
 org.apache.tools.bzip2r.CBZip2OutputStream.sendMTFValues()
 REC   Exception is caught when Exception is not thrown in 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.doHod(String, 
 Properties)
 REC   Exception is caught when Exception is not thrown in 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer.visitMROp(MapReduceOper)
 REC   Exception is caught when Exception is not thrown in 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitDistinct(PODistinct)
 REC   Exception is caught when Exception is not thrown in 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(POFRJoin)
 REC   Exception is caught when Exception is not thrown in 
 org.apache.pig.impl.logicalLayer.optimizer.OpLimitOptimizer.processNode(LOLimit)
 REC   Exception is caught when Exception is not thrown in 
 org.apache.pig.tools.streams.StreamGenerator.actionPerformed(ActionEvent)
 ST    Write to static field
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner.sJobConf
  from instance method
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.configure(JobConf)
 ST    Write to static field
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.activeSplit
  from instance method
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getRecordReader(InputSplit,
  JobConf, Reporter)
 ST    Write to static field
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.sJob
  from instance method
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getRecordReader(InputSplit,
  JobConf, Reporter)
 ST    Write to static field
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce.sJobConf
  from instance method
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.configure(JobConf)
 ST    Write to static field org.apache.pig.data.BagFactory.gMemMgr from
 instance method new org.apache.pig.data.BagFactory()
 ST    Write to static field
 org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.mOpToCloneMap from
 instance method new
 org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper(LogicalPlan, Map)
 ST    Write to static field
 org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.classloader from instance
 method org.apache.pig.impl.PigContext.addJar(URL)
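The two IM warnings above describe integer-arithmetic pitfalls that are easy to show in isolation. The sketch below is a generic illustration, not the bzip2 code or the contents of PIG-1055.patch; the class and method names are hypothetical, and midpoint assumes 0 <= lo <= hi, as with array indices.

```java
// Hypothetical illustration class; not part of the Pig code base.
final class IntMathSketch {

    // (lo + hi) / 2 can overflow when both operands are large.
    // lo + (hi - lo) / 2 cannot, assuming 0 <= lo <= hi.
    static int midpoint(int lo, int hi) {
        return lo + (hi - lo) / 2;
    }

    // x % 2 == 1 is false for negative odd numbers, because -3 % 2 == -1 in Java.
    // Testing the lowest bit works for every int.
    static boolean isOdd(int x) {
        return (x & 1) != 0;
    }

    public static void main(String[] args) {
        int lo = Integer.MAX_VALUE - 1;
        int hi = Integer.MAX_VALUE;
        System.out.println((lo + hi) / 2);   // overflowed, negative
        System.out.println(midpoint(lo, hi));
        System.out.println(-3 % 2 == 1);     // the broken oddness check
        System.out.println(isOdd(-3));
    }
}
```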




[jira] Updated: (PIG-1051) FINDBUGS: NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE: Possible null pointer dereference due to return value of called method

2010-03-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-1051:


Fix Version/s: 0.6.0

 FINDBUGS: NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE: Possible null pointer
 dereference due to return value of called method
 

 Key: PIG-1051
 URL: https://issues.apache.org/jira/browse/PIG-1051
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.6.0

 Attachments: PIG-1051.patch


 NP    Possible null pointer dereference in
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.CountingMap.put(Object,
  Integer) due to return value of called method
 NP    Load of known null value in
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone()
 NP    Load of known null value in
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone()
 NP    Load of known null value in
 org.apache.pig.impl.logicalLayer.optimizer.OpLimitOptimizer.check(List)
 NP    Load of known null value in
 org.apache.pig.impl.logicalLayer.optimizer.PushDownForeachFlatten.getOperator(List)
 NP    Load of known null value in
 org.apache.pig.impl.logicalLayer.optimizer.PushUpFilter.getOperator(List)
 NP    Load of known null value in
 org.apache.pig.impl.logicalLayer.optimizer.StreamOptimizer.check(List)
 NP    Load of known null value in
 org.apache.pig.impl.logicalLayer.optimizer.TypeCastInserter.getOperator(List)
 NP    Load of known null value in
 org.apache.pig.impl.logicalLayer.optimizer.TypeCastInserter.getOperator(List)
 NP    Load of known null value in
 org.apache.pig.impl.logicalLayer.schema.Schema.mergeSchema(Schema, Schema,
 boolean, boolean, boolean)
 NP    Load of known null value in
 org.apache.pig.impl.logicalLayer.schema.Schema.mergeSchema(Schema, Schema,
 boolean, boolean, boolean)
 NP    Possible null pointer dereference in
 org.apache.pig.impl.util.LineageTracer.getWeightedCounts(IdentityHashSet,
 int) due to return value of called method
 NP    Possible null pointer dereference in
 org.apache.pig.impl.util.LineageTracer.getWeightedCounts(IdentityHashSet,
 int) due to return value of called method
 NP    Possible null pointer dereference in
 org.apache.pig.impl.util.LineageTracer.insert(Tuple) due to return value of
 called method
 NP    Possible null pointer dereference in
 org.apache.pig.impl.util.LineageTracer.link(Tuple, Tuple) due to return value
 of called method
 NP    Possible null pointer dereference in
 org.apache.pig.impl.util.LineageTracer.link(Tuple, Tuple) due to return value
 of called method
 NP    Possible null pointer dereference in
 org.apache.pig.pen.LineageTrimmingVisitor.PruneBaseDataConstrainedCoverage(Map,
  DataBag, LineageTracer, Map) due to return value of called method
 NP    Possible null pointer dereference in
 org.apache.pig.pen.LineageTrimmingVisitor.PruneBaseDataConstrainedCoverage(Map,
  DataBag, LineageTracer, Map) due to return value of called method
 NP    Possible null pointer dereference in
 org.apache.pig.pen.LineageTrimmingVisitor.PruneBaseDataConstrainedCoverage(Map,
  DataBag, LineageTracer, Map) due to return value of called method
 NP    Possible null pointer dereference in
 org.apache.pig.pen.LineageTrimmingVisitor.PruneBaseDataConstrainedCoverage(Map,
  DataBag, LineageTracer, Map) due to return value of called method
 NP    Possible null pointer dereference in
 org.apache.pig.pen.util.LineageTracer.getWeightedCounts(float, float) due to
 return value of called method
 NP    Possible null pointer dereference in
 org.apache.pig.pen.util.LineageTracer.getWeightedCounts(float, float) due to
 return value of called method
 NP    Possible null pointer dereference in
 org.apache.pig.pen.util.LineageTracer.insert(Tuple) due to return value of
 called method
 NP    Possible null pointer dereference in
 org.apache.pig.pen.util.LineageTracer.link(Tuple, Tuple) due to return value
 of called method
 NP    Possible null pointer dereference in
 org.apache.pig.pen.util.LineageTracer.link(Tuple, Tuple) due to return value
 of called method
 NP    Possible null pointer dereference in
 org.apache.pig.StandAloneParser.main(String[]) due to return value of called
 method
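A typical source of NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE is using the result of Map.get() without checking for null, for example by unboxing it directly. The sketch below shows the guarded form for a counting map. It is written under assumptions: CountingSketch is a hypothetical class and does not reproduce Pig's CountingMap or the change in PIG-1051.patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not Pig's CountingMap.
final class CountingSketch<K> {
    private final Map<K, Integer> counts = new HashMap<>();

    // Map.get returns null for an absent key. Writing "counts.get(key) + 1"
    // would unbox that null and throw a NullPointerException on the first
    // occurrence of a key, so the null case is handled explicitly.
    int increment(K key) {
        Integer old = counts.get(key);
        int updated = (old == null) ? 1 : old + 1;
        counts.put(key, updated);
        return updated;
    }

    // Returns 0 rather than null for keys that were never counted.
    int count(K key) {
        Integer c = counts.get(key);
        return (c == null) ? 0 : c;
    }

    public static void main(String[] args) {
        CountingSketch<String> c = new CountingSketch<>();
        c.increment("a");
        c.increment("a");
        System.out.println(c.count("a"));
        System.out.println(c.count("missing"));
    }
}
```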




[jira] Commented: (PIG-1313) PigServer leaks memory over time

2010-03-24 Thread Bill Graham (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849501#action_12849501
 ] 

Bill Graham commented on PIG-1313:
--

You summed it up well, Alan. My ThreadLocal suggestion was really just because 
we could modify one class internally instead of doing a much larger refactor. 

I'm unclear, though, on how we'd go about moving FileLocalizer.toDelete and 
FileLocalizer.deleteOnFail into PigServer. Currently, calls to the 
FileLocalizer methods that create these temp file objects happen all over the 
codebase, in places where the calling code wouldn't have a handle to its 
PigServer instance AFAIK, unless it could get the PigServer from the 
PigContext or something of the sort. Otherwise, it would need to be a static 
call to PigServer methods, and we'd have just moved the same problem to another 
class.



 PigServer leaks memory over time
 

 Key: PIG-1313
 URL: https://issues.apache.org/jira/browse/PIG-1313
 Project: Pig
  Issue Type: Bug
Reporter: Bill Graham
 Attachments: Pig1313Reproducer.java


 When {{PigServer}} runs it creates temporary files using the 
 {{FileLocalizer.getTemporaryPath(..)}}. This static method creates and 
 returns a handle to a temporary file (as an instance of 
 {{ElementDescriptor}}). The {{ElementDescriptors}} returned by this method 
 are kept on a static {{Stack}} named {{toDelete}}. The items on {{toDelete}} 
 get removed by the {{FileLocalizer.deleteTempFile()}} method.
 The only place in the code where I see {{FileLocalizer.deleteTempFile()}} 
 called is in the Main class. {{PigServer}} does not call that method though, 
 so a long-running VM that repeatedly uses instances of {{PigServer}} to run 
 jobs will leak memory via {{toDelete}}.
 One suggested fix is to have {{PigServer.shutdown()}} call 
 {{FileLocalizer.deleteTempFile()}}, but this would cause problems in a 
 multi-threaded environment, since it seems {{ElementDescriptors}} are pushed 
 onto the {{toDelete}} stack before they're used, not once they're done with. 
 With this approach, running multiple instances of {{PigServer}} in separate 
 threads could cause one completed job to clobber the other's still-in-use 
 temp files.
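The problem described above comes from one static stack shared by every job. One way to picture the alternative is a tracker that owns its own list of temp files and deletes only those when it is closed. The sketch below is an assumption-laden illustration: TempFileScope and its methods are hypothetical names, it uses java.nio.file rather than Pig's ElementDescriptor, and it does not answer how code deep in the codebase would obtain the right scope instance.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration class; not part of the Pig code base.
final class TempFileScope implements AutoCloseable {
    // Instance field, not static: each scope tracks only its own files.
    private final Deque<Path> toDelete = new ArrayDeque<>();

    synchronized Path newTempFile() {
        try {
            Path p = Files.createTempFile("scope-", ".tmp");
            toDelete.push(p);
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    synchronized int pending() {
        return toDelete.size();
    }

    // Deletes this scope's files and drops the references, so a long-running
    // VM does not accumulate entries and other scopes' files are untouched.
    @Override
    public synchronized void close() {
        while (!toDelete.isEmpty()) {
            try {
                Files.deleteIfExists(toDelete.pop());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        TempFileScope a = new TempFileScope();
        TempFileScope b = new TempFileScope();
        Path pa = a.newTempFile();
        Path pb = b.newTempFile();
        a.close();
        System.out.println(Files.exists(pa) + " " + Files.exists(pb));
        b.close();
    }
}
```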




[jira] Updated: (PIG-964) Handling null in skewed join

2010-03-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-964:
---

Fix Version/s: 0.4.0

 Handling null in skewed join
 -

 Key: PIG-964
 URL: https://issues.apache.org/jira/browse/PIG-964
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath
Assignee: Sriranjan Manjunath
 Fix For: 0.4.0

 Attachments: skewedjoinnull.patch


 For null tuples, the tuple size is calculated incorrectly, so skewed 
 join ends up expecting a large number of reducers. Further, skewed join 
 should not bail out after the second job if the number of reducers specified 
 by the user is low. It should print a warning message and continue execution.




[jira] Updated: (PIG-458) Type branch integration with hadoop 18

2010-03-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-458:
---

Fix Version/s: 0.2.0

 Type branch integration with hadoop 18
 --

 Key: PIG-458
 URL: https://issues.apache.org/jira/browse/PIG-458
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.2.0

 Attachments: hadoop18.jar, PIG-458.patch, un18.patch







[jira] Commented: (PIG-1310) ISO Date UDFs: Conversion, Rounding and Date Math

2010-03-24 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849528#action_12849528
 ] 

Russell Jurney commented on PIG-1310:
-

Thanks, Alan, I'll add all those changes tonight.  I confess to not really 
testing CustomFormatToISO beyond the test case; I'll update the docs :)

As to ISO format: I will link to it and to jodatime, and I would suggest that 
ISO 8601 be the standard representation of datetimes in Pig, as it handles 
time zones and is sortable as text, which is nice.  

 ISO Date UDFs: Conversion, Rounding and Date Math
 -

 Key: PIG-1310
 URL: https://issues.apache.org/jira/browse/PIG-1310
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Russell Jurney
 Fix For: 0.7.0

 Attachments: datetime.patch, datetime2.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 I've written UDFs to handle loading unix times, datemonth values and ISO 8601 
 formatted date strings, and working with them as ISO datetimes using jodatime.
 The working code is here: 
 http://github.com/rjurney/oink/tree/master/src/java/oink/udf/isodate/
 It needs to be documented and tests added, and a couple UDFs are missing, but 
 these work if you REGISTER the jodatime jar in your script.  Hopefully I can 
 get this stuff in piggybank before someone else writes it this time :)  The 
 rounding also may not be performant, but the code works.
 Ultimately I'd also like to enable support for ISO 8601 durations.  Someone 
 slap me if this isn't done soon, it is not much work and this should help 
 everyone working with time series.
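The property that makes ISO 8601 attractive here is that strings in the same format and UTC offset sort chronologically when compared as plain text. The sketch below illustrates that with the JDK's java.time, swapped in for jodatime so the example has no external dependency. The class and method names are hypothetical and are not the UDFs from the patch.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; not one of the UDFs in the patch.
final class IsoSortSketch {

    // Converts Unix time in seconds to an ISO 8601 string in UTC,
    // e.g. 0 becomes 1970-01-01T00:00:00Z.
    static String unixToIso(long epochSeconds) {
        return Instant.ofEpochSecond(epochSeconds).toString();
    }

    // Plain lexicographic sort. This matches chronological order only when
    // all strings share the same format and offset (here: UTC, "Z" suffix).
    static List<String> sortAsText(List<String> isoStrings) {
        List<String> copy = new ArrayList<>(isoStrings);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> dates = new ArrayList<>();
        dates.add(unixToIso(1269388800L));
        dates.add(unixToIso(0L));
        dates.add(unixToIso(86400L));
        System.out.println(sortAsText(dates));
    }
}
```

Strings that carry different offsets (for example +02:00 next to Z) would need to be normalized to a single offset before text sorting gives chronological order.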




[jira] Updated: (PIG-336) NULL checks are not in place in the types branch

2010-03-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-336:
---

Fix Version/s: 0.2.0

 NULL checks are not in place in the types branch
 

 Key: PIG-336
 URL: https://issues.apache.org/jira/browse/PIG-336
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.2.0

 Attachments: PIG-336-part1.patch, PIG-336-part1_v2.patch, 
 PIG-336-part1_v3.patch, PIG-336-part1_v4.patch, PIG-336-part2.patch, 
 PIG-336.patch


 The following code currently does not work
 {code}
 B = filter A by $0 is null and $1 is null;
 {code}
 Other operators that don't work with nulls include POAND, POOR, etc.
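To make the expected behavior concrete, the sketch below models null-aware AND and OR using SQL-style three-valued logic, with a Java null standing for "unknown". Both the semantics and the names are assumptions made for illustration; this is not the POAnd or POOr implementation, and the issue text does not state which null semantics the fix adopts.

```java
// Hypothetical illustration class; not Pig's POAnd/POOr.
final class NullLogicSketch {

    // "x is null" itself is never unknown: it is always true or false.
    static boolean isNull(Object field) {
        return field == null;
    }

    // AND: false dominates; otherwise any unknown operand makes the result unknown.
    static Boolean and(Boolean a, Boolean b) {
        if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) {
            return Boolean.FALSE;
        }
        if (a == null || b == null) {
            return null;
        }
        return Boolean.TRUE;
    }

    // OR: true dominates; otherwise any unknown operand makes the result unknown.
    static Boolean or(Boolean a, Boolean b) {
        if (Boolean.TRUE.equals(a) || Boolean.TRUE.equals(b)) {
            return Boolean.TRUE;
        }
        if (a == null || b == null) {
            return null;
        }
        return Boolean.FALSE;
    }

    public static void main(String[] args) {
        // Models: filter A by $0 is null and $1 is null, for a row (null, null).
        Object f0 = null;
        Object f1 = null;
        System.out.println(and(isNull(f0), isNull(f1)));
        System.out.println(and(null, Boolean.FALSE));
        System.out.println(or(null, Boolean.FALSE));
    }
}
```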




[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits

2010-03-24 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1306:
--

Attachment: PIG-1306.patch

 [zebra] Support of locally sorted input splits
 --

 Key: PIG-1306
 URL: https://issues.apache.org/jira/browse/PIG-1306
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1306.patch


 Currently Zebra supports sorted or unsorted input splits on sorted tables or 
 sorted table unions. The sorted input splits are based on key ranges that 
 do not overlap, so the splits are effectively globally sorted: each split is 
 locally sorted, and their key ranges do not overlap.
 The biggest problem with key-range splits is the performance hit suffered when 
 data skew is present, particularly when a key range consists solely of one 
 duplicated key. That makes the chunk of data holding the duplicate keys 
 virtually unsplittable regardless of how many mappers are available: it has 
 to be processed by a single mapper.
 On the other hand, there are scenarios where globally sorted splits are 
 overkill and locally sorted splits are good enough. One example is the 
 use of Zebra sorted tables as the probe table in a map-side merge inner join.
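The distinction between the two kinds of splits can be stated as two small predicates. The sketch below models a split as a list of integer keys and is purely illustrative: the class name and representation are hypothetical, it assumes the splits are listed in key order, and it has nothing to do with Zebra's actual split classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Zebra.
final class SplitOrderSketch {

    static boolean isSorted(List<Integer> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1) > keys.get(i)) {
                return false;
            }
        }
        return true;
    }

    // Locally sorted: every split is sorted on its own; key ranges may overlap.
    static boolean locallySorted(List<List<Integer>> splits) {
        for (List<Integer> split : splits) {
            if (!isSorted(split)) {
                return false;
            }
        }
        return true;
    }

    // Globally sorted: locally sorted, and each split's first key is strictly
    // greater than the previous split's last key (non-overlapping key ranges).
    static boolean globallySorted(List<List<Integer>> splits) {
        if (!locallySorted(splits)) {
            return false;
        }
        Integer prevMax = null;
        for (List<Integer> split : splits) {
            if (split.isEmpty()) {
                continue;
            }
            if (prevMax != null && split.get(0) <= prevMax) {
                return false;
            }
            prevMax = split.get(split.size() - 1);
        }
        return true;
    }

    public static void main(String[] args) {
        // A heavily duplicated key (7) spread over two splits: fine locally,
        // impossible under non-overlapping key ranges.
        List<List<Integer>> skewed = Arrays.asList(
            Arrays.asList(1, 7, 7, 7), Arrays.asList(7, 7, 7, 9));
        System.out.println(locallySorted(skewed) + " " + globallySorted(skewed));
    }
}
```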




[jira] Updated: (PIG-1306) [zebra] Support of locally sorted input splits

2010-03-24 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1306:
--

Status: Patch Available  (was: Open)

 [zebra] Support of locally sorted input splits
 --

 Key: PIG-1306
 URL: https://issues.apache.org/jira/browse/PIG-1306
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1306.patch






JIRA Fix Version

2010-03-24 Thread Alan Gates
A reminder to Pig committers:  When closing a JIRA issue as Resolved/ 
Fixed please make sure to set the Fix Version field.  This helps our  
users know what versions they need to use to get fixes for their  
issues.  And it helps release managers when they build releases to  
know what is and isn't in the release they're building.  There were  
~170 issues in Pig's JIRA marked fixed but with no version.  I've  
assigned most of them to the appropriate version.


Alan.


[jira] Commented: (PIG-1317) LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()

2010-03-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849568#action_12849568
 ] 

Hadoop QA commented on PIG-1317:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12439703/PIG-1317.patch
  against trunk revision 926846.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/247/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/247/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/247/console

This message is automatically generated.

 LOLoad should cache results of LoadMetadata.getSchema() for use in subsequent 
 calls to LOLoad.getSchema() or LOLoad.determineSchema()
 -

 Key: PIG-1317
 URL: https://issues.apache.org/jira/browse/PIG-1317
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.7.0

 Attachments: PIG-1317.patch


 In LOLoad.getProjectionMap(), the private method determineSchema() is called, 
 which in turn calls LoadMetadata.getSchema(). The latter call could 
 potentially be expensive if the input file is read to determine the schema or 
 a metadata system is contacted to get the schema. determineSchema() can 
 cache the schema it gets so that subsequent calls use the cached version.
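The caching idea amounts to memoizing an expensive lookup. The sketch below shows the pattern in isolation: the loader is called at most once and later calls reuse the stored result, including a null result. CachedSchemaSketch is a hypothetical name and the Supplier stands in for the expensive LoadMetadata.getSchema() call; this is not the code from PIG-1317.patch.

```java
import java.util.function.Supplier;

// Hypothetical illustration class; not part of the Pig code base.
final class CachedSchemaSketch<S> {
    private final Supplier<S> loader; // stands in for the expensive schema lookup
    private S cached;
    private boolean loaded;           // separate flag so a null schema is cached too

    CachedSchemaSketch(Supplier<S> loader) {
        this.loader = loader;
    }

    // The first call invokes the loader; every later call returns the stored value.
    synchronized S get() {
        if (!loaded) {
            cached = loader.get();
            loaded = true;
        }
        return cached;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        CachedSchemaSketch<String> schema = new CachedSchemaSketch<>(() -> {
            calls[0]++;
            return "a:int, b:chararray";
        });
        schema.get();
        schema.get();
        System.out.println("loader calls: " + calls[0]);
    }
}
```

A real implementation would also need to decide when the cached schema becomes stale, for example if the load function or input location changes; the sketch does not address invalidation.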
