[jira] Commented: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data

2009-09-10 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753827#action_12753827 ] Pradeep Kamath commented on PIG-953: I see the first part as a coding style preference - I

[jira] Updated: (PIG-951) Reset parallelism to 1 for indexing job in MergeJoin

2009-09-10 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-951: --- Status: Patch Available (was: Open) Reset parallelism to 1 for indexing job in MergeJoin

[jira] Updated: (PIG-943) Pig crash when it cannot get counter from hadoop

2009-09-04 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-943: --- Assignee: Daniel Dai Status: Patch Available (was: Open) Pig crash when it cannot get counter

[jira] Commented: (PIG-943) Pig crash when it cannot get counter from hadoop

2009-09-04 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12751560#action_12751560 ] Pradeep Kamath commented on PIG-943: I am wondering if we should report that we were

[jira] Created: (PIG-946) Combiner optimizer does not optimize when limit follow group, foreach

2009-09-04 Thread Pradeep Kamath (JIRA)
Affects Versions: 0.3.0 Reporter: Pradeep Kamath The following script is combinable but is not optimized: a = load '/user/pig/tests/data/singlefile/studenttab10k'; b = group a by $1; c = foreach b generate group, AVG(a.$2); d = limit c 10; dump d;

[jira] Updated: (PIG-578) join ... outer, ... outer semantics are a no-ops, should produce corresponding null values

2009-09-03 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-578: --- Fix Version/s: 0.4.0 Assignee: Pradeep Kamath Status: Patch Available (was: Open

[jira] Updated: (PIG-578) join ... outer, ... outer semantics are a no-ops, should produce corresponding null values

2009-09-03 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-578: --- Attachment: PIG-578-2.patch Addressed the javadoc warning and 4 of the findbugs. There will still be 1

[jira] Commented: (PIG-578) join ... outer, ... outer semantics are a no-ops, should produce corresponding null values

2009-09-03 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12751224#action_12751224 ] Pradeep Kamath commented on PIG-578: Correction to previous comment: At some point we

[jira] Updated: (PIG-935) Skewed join throws an exception when used with map keys

2009-09-02 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-935: --- Assignee: (was: Santhosh Srinivasan) Skewed join throws an exception when used with map keys

[jira] Updated: (PIG-934) Merge join implementation currently does not seek to right point on the right side input based on the offset provided by the index

2009-09-01 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-934: --- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available

[jira] Commented: (PIG-935) Skewed join throws an exception when used with map keys

2009-08-31 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749664#action_12749664 ] Pradeep Kamath commented on PIG-935: Review Comment: The fix looks good - the only change

[jira] Commented: (PIG-934) Merge join implementation currently does not seek to right point on the right side input based on the offset provided by the index

2009-08-28 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749042#action_12749042 ] Pradeep Kamath commented on PIG-934: The reason I thought a separate function

[jira] Commented: (PIG-930) merge join should handle compressed bz2 sorted files

2009-08-27 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748468#action_12748468 ] Pradeep Kamath commented on PIG-930: I had spoken to Ben (who wrote the bzip2 code

RE: pig trunk build

2009-08-26 Thread Pradeep Kamath
What is the URL for the Hudson UI? I tried http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net but that did not work. Pradeep -Original Message- From: Giridharan Kesavan [mailto:gkesa...@yahoo-inc.com] Sent: Wednesday, August 26, 2009 7:41 AM To:

[jira] Created: (PIG-934) Merge join implementation currently does not seek to right point on the right side input based on the offset provided by the index

2009-08-26 Thread Pradeep Kamath (JIRA)
URL: https://issues.apache.org/jira/browse/PIG-934 Project: Pig Issue Type: Bug Affects Versions: 0.3.1 Reporter: Pradeep Kamath We use POLoad to seek into right file which has the following code: {noformat} public void setUp() throws

[jira] Commented: (PIG-934) Merge join implementation currently does not seek to right point on the right side input based on the offset provided by the index

2009-08-26 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748070#action_12748070 ] Pradeep Kamath commented on PIG-934: To get an idea of how this seeking in case of regular

[jira] Commented: (PIG-930) merge join should handle compressed bz2 sorted files

2009-08-24 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747172#action_12747172 ] Pradeep Kamath commented on PIG-930: A couple of proposals: 1) We record

[jira] Commented: (PIG-926) Merge-Join phase 2

2009-08-20 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12745535#action_12745535 ] Pradeep Kamath commented on PIG-926: Patch committed - thanks for the contribution

[jira] Commented: (PIG-926) Merge-Join phase 2

2009-08-19 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12745181#action_12745181 ] Pradeep Kamath commented on PIG-926: In MRCompiler: You should change: {code

[jira] Created: (PIG-927) null should be handled consistently in Join

2009-08-18 Thread Pradeep Kamath (JIRA)
null should be handled consistently in Join --- Key: PIG-927 URL: https://issues.apache.org/jira/browse/PIG-927 Project: Pig Issue Type: Bug Affects Versions: 0.3.0 Reporter: Pradeep

[jira] Commented: (PIG-926) Merge-Join phase 2

2009-08-18 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12744705#action_12744705 ] Pradeep Kamath commented on PIG-926: Review comments Merge-Join phase 2

[jira] Commented: (PIG-926) Merge-Join phase 2

2009-08-18 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12744706#action_12744706 ] Pradeep Kamath commented on PIG-926: Review comments: In MergeJoinIndexer

[jira] Updated: (PIG-845) PERFORMANCE: Merge Join

2009-08-14 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-845: --- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available

[jira] Commented: (PIG-845) PERFORMANCE: Merge Join

2009-08-10 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12741621#action_12741621 ] Pradeep Kamath commented on PIG-845: Review comments: 1) In LogicalPlanTester.java, why

[jira] Updated: (PIG-901) InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext

2009-08-04 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-901: --- Attachment: PIG-901-trunk.patch InputSplit (SliceWrapper) created by Pig is big in size due

[jira] Updated: (PIG-901) InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext

2009-08-04 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-901: --- Status: Patch Available (was: Open) PIG-901-trunk.patch is for the trunk. The change

[jira] Created: (PIG-906) Need a way to test integration points with Hadoop from unit tests

2009-08-04 Thread Pradeep Kamath (JIRA)
Affects Versions: 0.3.1 Reporter: Pradeep Kamath Priority: Minor Currently there is no easy mechanism for a unit test to get hold of the compiled JobConf (or Job) for a script. This may require some design changes like having public methods

[jira] Commented: (PIG-901) InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext

2009-08-04 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739056#action_12739056 ] Pradeep Kamath commented on PIG-901: https://issues.apache.org/jira/browse/PIG-906 has

[jira] Created: (PIG-901) InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext

2009-08-03 Thread Pradeep Kamath (JIRA)
Issue Type: Bug Affects Versions: 0.3.1 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.4.0 InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext. SliceWrapper only needs ExecType - so the entire

[jira] Updated: (PIG-660) Integration with Hadoop 0.20

2009-08-03 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-660: --- Attachment: PIG-660-for-branch-0.3.patch Attached a patch for branch-0.3 based on PIG-660_5.patch

[jira] Created: (PIG-904) Conversion from double to chararray for udf input arguments does not occur

2009-08-03 Thread Pradeep Kamath (JIRA)
: Bug Affects Versions: 0.3.1 Reporter: Pradeep Kamath Script showing the problem: {noformat} a = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa:double); b = foreach a generate CONCAT(gpa, 'dummy'); dump b; Error shown: 2009-08-03 17:04:27,573 [main] ERROR

[jira] Updated: (PIG-901) InputSplit (SliceWrapper) created by Pig is big in size due to serialized PigContext

2009-08-03 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-901: --- Attachment: PIG-901-branch-0.3.patch Patch for 0.3 branch InputSplit (SliceWrapper) created by Pig

[jira] Commented: (PIG-880) Order by is borken with complex fields

2009-07-30 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12737310#action_12737310 ] Pradeep Kamath commented on PIG-880: +1 - looks good. Order by is borken with complex

[jira] Commented: (PIG-845) PERFORMANCE: Merge Join

2009-07-30 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12737377#action_12737377 ] Pradeep Kamath commented on PIG-845: Some initial comments on POMergeJoin.java: If status

[jira] Commented: (PIG-880) Order by is borken with complex fields

2009-07-30 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12737379#action_12737379 ] Pradeep Kamath commented on PIG-880: +1 to the new changes. Order by is borken

[jira] Commented: (PIG-695) Pig should not fail when error logs cannot be created

2009-07-20 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12733415#action_12733415 ] Pradeep Kamath commented on PIG-695: +1 Pig should not fail when error logs cannot

[jira] Updated: (PIG-880) Order by is borken with complex fields

2009-07-16 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-880: --- Attachment: PIG-880-bytearray-mapvalue-code-without-tests.patch Attached code only patch which changes

[jira] Created: (PIG-879) Pig should provide a way for input location string in load statement to be passed as-is to the Loader

2009-07-10 Thread Pradeep Kamath (JIRA)
/PIG-879 Project: Pig Issue Type: Bug Affects Versions: 0.3.0 Reporter: Pradeep Kamath Due to multiquery optimization, Pig always converts the filenames to absolute URIs (see http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification - section about

[jira] Commented: (PIG-880) Order by is borken with complex fields

2009-07-10 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12729882#action_12729882 ] Pradeep Kamath commented on PIG-880: The root cause of this issue is that in interpreting

[jira] Updated: (PIG-724) Treating map values in PigStorage

2009-07-10 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-724: --- Summary: Treating map values in PigStorage (was: Treating integers and strings in PigStorage

[jira] Updated: (PIG-820) PERFORMANCE: The RandomSampleLoader should be changed to allow it subsume another loader

2009-07-07 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-820: --- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available

[jira] Commented: (PIG-872) use distributed cache for the replicated data set in FR join

2009-07-06 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12727648#action_12727648 ] Pradeep Kamath commented on PIG-872: Distributed cache can be used for the case where

[jira] Commented: (PIG-820) PERFORMANCE: The RandomSampleLoader should be changed to allow it subsume another loader

2009-07-06 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12727834#action_12727834 ] Pradeep Kamath commented on PIG-820: Review comments - two observations: 1. In PigStorage

[jira] Commented: (PIG-865) Performance: Unnnecessary computation in FRJoin

2009-06-29 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12725361#action_12725361 ] Pradeep Kamath commented on PIG-865: Looks good. Minor comment: Since the constant

[jira] Commented: (PIG-820) PERFORMANCE: The RandomSampleLoader should be changed to allow it subsume another loader

2009-06-26 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12724717#action_12724717 ] Pradeep Kamath commented on PIG-820: +1 PERFORMANCE: The RandomSampleLoader should

[jira] Commented: (PIG-573) Changes to make Pig run with Hadoop 19

2009-06-26 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12724729#action_12724729 ] Pradeep Kamath commented on PIG-573: Steps to apply this patch: 1) svn co http

[jira] Commented: (PIG-820) PERFORMANCE: The RandomSampleLoader should be changed to allow it subsume another loader

2009-06-25 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12724204#action_12724204 ] Pradeep Kamath commented on PIG-820: In SampleOptimizer the following should change: {code

RE: [VOTE] Release Pig 0.3.0 (candidate 0)

2009-06-22 Thread Pradeep Kamath
+1 for release. -Pradeep -Original Message- From: Alan Gates [mailto:ga...@yahoo-inc.com] Sent: Monday, June 22, 2009 9:30 AM To: priv...@hadoop.apache.org Cc: pig-dev@hadoop.apache.org; gene...@hadoop.apache.org Subject: Re: [VOTE] Release Pig 0.3.0 (candidate 0) Downloaded, ran, ran

[jira] Updated: (PIG-846) MultiQuery optimization in some cases has an issue when there is a split in the map plan

2009-06-12 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-846: --- Attachment: PIG-846.patch MultiQuery optimization in some cases has an issue when there is a split

[jira] Created: (PIG-847) Setting twoLevelAccessRequired field in a bag schema should not be required to access fields in the tuples of the bag

2009-06-12 Thread Pradeep Kamath (JIRA)
://issues.apache.org/jira/browse/PIG-847 Project: Pig Issue Type: Improvement Affects Versions: 0.2.1 Reporter: Pradeep Kamath Currently Pig interprets the result type of a relation as a bag. However the schema of the relation directly contains the schema

[jira] Created: (PIG-848) Explain output sometimes may not match the exact plan that is executed in terms of the order in which inner plans and operators are presented - (semantically the plans are th

2009-06-12 Thread Pradeep Kamath (JIRA)
) - Key: PIG-848 URL: https://issues.apache.org/jira/browse/PIG-848 Project: Pig Issue Type: Improvement Affects Versions: 0.2.1 Reporter: Pradeep Kamath

[jira] Updated: (PIG-846) MultiQuery optimization in some cases has an issue when there is a split in the map plan

2009-06-12 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-846: --- Status: Open (was: Patch Available) Will be resubmitting a new patch - just realized that a few unit

[jira] Updated: (PIG-846) MultiQuery optimization in some cases has an issue when there is a split in the map plan

2009-06-12 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-846: --- Attachment: PIG-846-v2.patch New patch - the only change is to not add extra information

[jira] Updated: (PIG-835) Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type)

2009-06-12 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-835: --- Attachment: PIG-846-v2.patch New patch - the only change is to not add extra information

[jira] Updated: (PIG-846) MultiQuery optimization in some cases has an issue when there is a split in the map plan

2009-06-12 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-846: --- Status: Patch Available (was: Open) MultiQuery optimization in some cases has an issue when

[jira] Updated: (PIG-835) Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type)

2009-06-12 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-835: --- Comment: was deleted (was: New patch - the only change is to not add extra information

[jira] Updated: (PIG-846) MultiQuery optimization in some cases has an issue when there is a split in the map plan

2009-06-12 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-846: --- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available

[jira] Created: (PIG-841) PERFORMANCE: The sample MR job in order by implementation can use Hadoop sorting instead of doing a POSort

2009-06-09 Thread Pradeep Kamath (JIRA)
/browse/PIG-841 Project: Pig Issue Type: Improvement Affects Versions: 0.2.1 Reporter: Pradeep Kamath Fix For: 0.3.0 Currently the sample map reduce job in order by implementation does the following: - sample 100 records from each map - group

[jira] Commented: (PIG-841) PERFORMANCE: The sample MR job in order by implementation can use Hadoop sorting instead of doing a POSort

2009-06-09 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12717835#action_12717835 ] Pradeep Kamath commented on PIG-841: This mechanism can be used for any join which

[jira] Updated: (PIG-835) Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type)

2009-06-09 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-835: --- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available

[jira] Updated: (PIG-835) Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type)

2009-06-08 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-835: --- Status: Open (was: Patch Available) Multiquery optimization does not handle the case where the map

[jira] Updated: (PIG-835) Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type)

2009-06-08 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-835: --- Attachment: PIG-835-v2.patch New patch with findbugs warnings addressed - essentially findbugs wanted

[jira] Updated: (PIG-835) Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type)

2009-06-08 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-835: --- Status: Patch Available (was: Open) Multiquery optimization does not handle the case where the map

[jira] Updated: (PIG-835) Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type)

2009-06-05 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-835: --- Attachment: PIG-835.patch The root cause of the issue is that the current multiQueryOptimizer checks

[jira] Created: (PIG-838) Parser does not handle ctrl-m ('\u000d') as argument to PigStorage

2009-06-05 Thread Pradeep Kamath (JIRA)
Versions: 0.2.1 Reporter: Pradeep Kamath A script which has a = load 'input' using PigStorage('\u000d'); produces the following error: 2009-06-05 14:47:49,241 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Lexical error at line 1, column 47

[jira] Updated: (PIG-835) Multiquery optimization does not handle the case where the map keys in the split plans have different key types (tuple and non tuple key type)

2009-06-05 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-835: --- Status: Patch Available (was: Open) Multiquery optimization does not handle the case where the map

[jira] Updated: (PIG-796) support conversion from numeric types to chararray

2009-06-04 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-796: --- Resolution: Fixed Fix Version/s: 0.3.0 Hadoop Flags: [Reviewed] Status: Resolved

[jira] Updated: (PIG-796) support conversion from numeric types to chararray

2009-06-03 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-796: --- Status: Patch Available (was: Open) support conversion from numeric types to chararray

[jira] Updated: (PIG-796) support conversion from numeric types to chararray

2009-06-03 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-796: --- Status: Open (was: Patch Available) support conversion from numeric types to chararray

[jira] Updated: (PIG-796) support conversion from numeric types to chararray

2009-06-03 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-796: --- Status: Patch Available (was: Open) support conversion from numeric types to chararray

[jira] Updated: (PIG-796) support conversion from numeric types to chararray

2009-06-03 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-796: --- Status: Open (was: Patch Available) support conversion from numeric types to chararray

[jira] Commented: (PIG-796) support conversion from numeric types to chararray

2009-06-01 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12715325#action_12715325 ] Pradeep Kamath commented on PIG-796: A few comments: - In TestPOCast.java the variables

[jira] Updated: (PIG-816) PigStorage() does not accept Unicode characters in its contructor

2009-05-29 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-816: --- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available

RE: UDF with parameters?

2009-05-26 Thread Pradeep Kamath
You should be able to send the percentile rank that you want to calculate as a udf argument like the way you stated - generate Percentile(90, duration) - here 90 will be an integer constant sent as the first argument to your udf. -Original Message- From: Brian Long

[jira] Updated: (PIG-804) problem with lineage with double map redirection

2009-05-26 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-804: --- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available

[jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

2009-05-26 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12713181#action_12713181 ] Pradeep Kamath commented on PIG-802: Changes look good - still have a comment about

[jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

2009-05-21 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711769#action_12711769 ] Pradeep Kamath commented on PIG-802: Review comments: In MRCompiler, does POPackageLite

[jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

2009-05-21 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12711811#action_12711811 ] Pradeep Kamath commented on PIG-802: I think even in the future if ReadOnceBags are used

[jira] Created: (PIG-814) Make Binstorage more robust when data contains record markers

2009-05-21 Thread Pradeep Kamath (JIRA)
: 0.2.1 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.3.0 When the inputstream for BinStorage is at a position where the data has the record marker sequence, the code incorrectly assumes that it is at the beginning of a record (tuple) and calls

[jira] Updated: (PIG-804) problem with lineage with double map redirection

2009-05-13 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-804: --- Fix Version/s: 0.3.0 Affects Version/s: 0.2.1 Status: Patch Available (was: Open

[jira] Updated: (PIG-804) problem with lineage with double map redirection

2009-05-13 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-804: --- Attachment: PIG-804.patch The root cause was in the parsers, in CastExp(), a getFieldSchema() was being

[jira] Created: (PIG-808) getFieldSchema() in ExpressionOperators also sets up lineage information - this can cause issues if getFieldSchema() is called too early

2009-05-13 Thread Pradeep Kamath (JIRA)
: PIG-808 URL: https://issues.apache.org/jira/browse/PIG-808 Project: Pig Issue Type: Bug Affects Versions: 0.2.1 Reporter: Pradeep Kamath Fix For: 0.3.0 See PIG-804 for a use case which exposes this bug. We should probably

[jira] Created: (PIG-807) PERFORMANCE: Provide a way for UDFs to use read-once bags (backed by the Hadoop values iterator)

2009-05-12 Thread Pradeep Kamath (JIRA)
Project: Pig Issue Type: Improvement Affects Versions: 0.2.1 Reporter: Pradeep Kamath Fix For: 0.3.0 Currently all bags resulting from a group or cogroup are materialized as bags containing all of the contents. The issue

[jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

2009-05-12 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12708551#action_12708551 ] Pradeep Kamath commented on PIG-802: Adding some more details: A new kind of bag

[jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

2009-05-07 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12707064#action_12707064 ] Pradeep Kamath commented on PIG-802: PIG-744 is a duplicate - will be marking that one

[jira] Resolved: (PIG-775) PORelationToExprProject should create a NonSpillableDataBag to create empty bags

2009-04-24 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath resolved PIG-775. Resolution: Fixed Patch committed. PORelationToExprProject should create a NonSpillableDataBag

[jira] Commented: (PIG-627) PERFORMANCE: multi-query optimization

2009-04-23 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12702005#action_12702005 ] Pradeep Kamath commented on PIG-627: All the work till now (phase 1 and phase 2) has now

[jira] Resolved: (PIG-514) COUNT returns no results as a result of two filter statements in FOREACH

2009-04-22 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath resolved PIG-514. Resolution: Fixed Fix Version/s: 0.3.0 Hadoop Flags: [Reviewed] Patch committed

[jira] Created: (PIG-773) Empty complex constants (empty bag, empty tuple and empty map) should be supported

2009-04-21 Thread Pradeep Kamath (JIRA)
Issue Type: Bug Affects Versions: 0.2.0 Reporter: Pradeep Kamath Priority: Minor We should be able to create empty bag constant using {}, empty tuple constant using (), empty map constant using [] within a pig script

[jira] Commented: (PIG-627) PERFORMANCE: multi-query optimization

2009-04-20 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12700925#action_12700925 ] Pradeep Kamath commented on PIG-627: reviewed error_handling_0416.patch for additional

[jira] Resolved: (PIG-739) Filter in foreach seems to drop records resulting in decreased count of records

2009-04-16 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath resolved PIG-739. Resolution: Duplicate Assignee: Pradeep Kamath This issue has the same root cause as PIG-514

[jira] Commented: (PIG-514) COUNT returns no results as a result of two filter statements in FOREACH

2009-04-16 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12699801#action_12699801 ] Pradeep Kamath commented on PIG-514: I am currently working on implementing the above

[jira] Updated: (PIG-733) Order by sampling dumps entire sample to hdfs which causes dfs FileSystem closed error on large input

2009-04-09 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-733: --- Resolution: Fixed Status: Resolved (was: Patch Available) patch committed Order by sampling

[jira] Commented: (PIG-733) Order by sampling dumps entire sample to hdfs which causes dfs FileSystem closed error on large input

2009-04-06 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12696244#action_12696244 ] Pradeep Kamath commented on PIG-733: Tests are not included in this patch since

[jira] Updated: (PIG-733) Order by sampling dumps entire sample to hdfs which causes dfs FileSystem closed error on large input

2009-04-06 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-733: --- Attachment: PIG-733-v2.patch Order by sampling dumps entire sample to hdfs which causes dfs FileSystem

[jira] Commented: (PIG-627) PERFORMANCE: multi-query optimization

2009-04-06 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12696350#action_12696350 ] Pradeep Kamath commented on PIG-627: +1, patch committed. Thanks for the contribution

[jira] Updated: (PIG-733) Order by sampling dumps entire sample to hdfs which causes dfs FileSystem closed error on large input

2009-04-01 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-733: --- Fix Version/s: 0.3.0 Affects Version/s: 0.2.0 Status: Patch Available (was: Open

[jira] Updated: (PIG-733) Order by sampling dumps entire sample to hdfs which causes dfs FileSystem closed error on large input

2009-04-01 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-733: --- Attachment: PIG-733.patch Order by sampling dumps entire sample to hdfs which causes dfs FileSystem

[jira] Commented: (PIG-627) PERFORMANCE: multi-query optimization

2009-04-01 Thread Pradeep Kamath (JIRA)
[ https://issues.apache.org/jira/browse/PIG-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12694859#action_12694859 ] Pradeep Kamath commented on PIG-627: +1, patch committed - thanks for the contribution

[jira] Created: (PIG-733) Order by sampling dumps entire sample to hdfs which causes dfs FileSystem closed error on large input

2009-03-25 Thread Pradeep Kamath (JIRA)
/PIG-733 Project: Pig Issue Type: Bug Reporter: Pradeep Kamath Assignee: Pradeep Kamath Order by has a sampling job which samples the input and creates a sorted list of sample items. Currently the number of items sampled is 100 per map task. So
