[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data
[ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-953:
--

Attachment: PIG-953_missing_files.diff

The attached diff was applied to svn but not posted to the JIRA. Combine it with PIG-953-9.patch to get the full patch.

> Enable merge join in pig to work with loaders and store functions which can internally index sorted data
> -
>
> Key: PIG-953
> URL: https://issues.apache.org/jira/browse/PIG-953
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.3.0
> Reporter: Pradeep Kamath
> Assignee: Pradeep Kamath
> Fix For: 0.6.0
>
> Attachments: PIG-953-2.patch, PIG-953-3.patch, PIG-953-4.patch, PIG-953-5.patch, PIG-953-6.patch, PIG-953-7.patch, PIG-953-8.patch, PIG-953-9.patch, PIG-953.patch, PIG-953_missing_files.diff
>
>
> Currently merge join implementation in pig includes construction of an index on sorted data and use of that index to seek into the "right input" to efficiently perform the join operation. Some loaders (notably the zebra loader) internally implement an index on sorted data and can perform this seek efficiently using their index. So the use of the index needs to be abstracted in such a way that when the loader supports indexing, pig uses it (indirectly through the loader) and does not construct an index.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
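The abstraction described in the issue can be illustrated with a minimal, self-contained Java sketch. Every name in it is an assumption made for illustration: `IndexedRightInput`, `SortedListInput` and `MergeJoinSketch` are hypothetical. The sketch also assumes that keys are plain integers sorted in ascending order, that right-side keys are unique, and that `seekNear(key)` means "position at the first record whose key is >= key". The method name `seekNear` comes from this thread, but these are not Pig's actual signatures. The join asks the loader to seek once, using whatever index the loader keeps internally, instead of building an index itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified view of a right-side loader that can seek using its own index.
// Names and semantics are illustrative assumptions, not Pig's actual interface.
interface IndexedRightInput {
    // Position the reader at the first record whose key is >= key.
    void seekNear(int key);

    // Return the next key, or null once the input is exhausted.
    Integer getNext();
}

// Toy loader whose "internal index" is a binary search over already-sorted keys.
class SortedListInput implements IndexedRightInput {
    private final List<Integer> keys;
    private int pos = 0;

    SortedListInput(List<Integer> sortedKeys) {
        this.keys = sortedKeys;
    }

    @Override
    public void seekNear(int key) {
        int idx = Collections.binarySearch(keys, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        pos = idx >= 0 ? idx : -idx - 1;
        // With duplicate keys, back up to the first matching record.
        while (pos > 0 && keys.get(pos - 1) >= key) {
            pos--;
        }
    }

    @Override
    public Integer getNext() {
        return pos < keys.size() ? keys.get(pos++) : null;
    }
}

class MergeJoinSketch {
    // Inner merge join of two ascending key streams. Assumes right-side keys are unique;
    // left-side keys may repeat. Each output entry is "leftKey=rightKey".
    static List<String> join(List<Integer> left, IndexedRightInput right) {
        List<String> out = new ArrayList<>();
        if (left.isEmpty()) {
            return out;
        }
        // Let the loader's own index skip right-side records below the first left key.
        right.seekNear(left.get(0));
        Integer r = right.getNext();
        int i = 0;
        while (i < left.size() && r != null) {
            int l = left.get(i);
            if (l < r) {
                i++;
            } else if (l > r) {
                r = right.getNext();
            } else {
                out.add(l + "=" + r);
                i++; // keep r: the next left record may carry the same key
            }
        }
        return out;
    }
}
```

For example, joining left keys `1, 3, 3, 5, 9` against right keys `2, 3, 5, 7, 9` yields `3=3, 3=3, 5=5, 9=9`. In the scenario the issue describes, a loader with its own index (such as Zebra's) would take the place of the binary search.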
[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data
[ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-953:
---

Resolution: Fixed
Fix Version/s: 0.6.0
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

I ran the test-patch process and junit tests on my local machine since the hudson queue was backed up. Here are the results. I have explained the reason for the javac warnings and release audit warnings in my previous comment.

{noformat}
test-patch results

[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 6 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] -1 javac. The applied patch generated 200 javac compiler warnings (more than the trunk's current 197 warnings).
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] -1 release audit. The applied patch generated 298 release audit warnings (more than the trunk's current 291 warnings).
[exec]

core unit test results
==
...
[junit] Running org.apache.pig.test.TestUnion
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 44.03 sec

test-contrib:

BUILD SUCCESSFUL
{noformat}

Patch committed to trunk.
[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data
[ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-953:
---

Attachment: PIG-953-9.patch

- The new patch addresses all javadoc, unit test and findbugs issues.
- The release audit warnings relate to html files and are not code related.
- I tried suppressing the deprecation-related javac warnings in code, but it looks like there is an existing javac [bug|http://bugs.sun.com/view_bug.do?bug_id=6594914]. So there is no way I am aware of to suppress them in code, and we may need to live with these warnings until we move to the new hadoop api.
[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data
[ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-953:
---

Attachment: PIG-953-8.patch
[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data
[ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-953:
---

Status: Open (was: Patch Available)
[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data
[ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-953:
---

Status: Patch Available (was: Open)
[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data
[ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-953:
---

Attachment: (was: PIG-953-8.patch)
[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data
[ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-953:
---

Attachment: PIG-953-8.patch

Fixed the commit() code in PigOutputCommitter for the multi-store case, so that it correctly sets up the StoreConfig and StoreFunc in the Conf before calling commit() on the store function.
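The multi-store fix can be pictured with a small sketch: before each store's commit() is called, that store's own settings are put into the configuration it receives. This is a hypothetical illustration and not PigOutputCommitter's code. `CommittableStore`, `StoreSpec`, `MultiStoreCommitter` and the key `example.store.location` are invented names, and a `Map<String, String>` stands in for the Hadoop configuration.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical store function with a job-level commit hook; a Map stands in for the
// Hadoop job Configuration so the sketch runs without Pig or Hadoop on the classpath.
interface CommittableStore {
    void commit(Map<String, String> conf) throws IOException;
}

// Hypothetical description of one store in a multi-store job.
class StoreSpec {
    final String location;
    final CommittableStore func;

    StoreSpec(String location, CommittableStore func) {
        this.location = location;
        this.func = func;
    }
}

class MultiStoreCommitter {
    // Illustrative key name only; not a real Pig or Hadoop property.
    static final String LOCATION_KEY = "example.store.location";

    // Commits every store, giving each one a configuration populated with its own settings.
    static void commitAll(Map<String, String> jobConf, List<StoreSpec> stores) throws IOException {
        for (StoreSpec spec : stores) {
            // Copy the shared job conf so settings for one store never leak into another.
            Map<String, String> perStore = new HashMap<>(jobConf);
            perStore.put(LOCATION_KEY, spec.location);
            spec.func.commit(perStore);
        }
    }
}
```

Copying the shared configuration per store, rather than mutating it in place, keeps one store's settings from being visible when the next store commits.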
[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data
[ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-953:
---

Status: Patch Available (was: Open)
[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data
[ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-953:
---

Attachment: PIG-953-7.patch

In my previous patch I missed allowing commit() in CommittableStoreFunc and initialize() in IndexableLoadFunc to throw an IOException. Attaching a new version with just that change.
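The following minimal sketch shows why such hooks declare `IOException`. It assumes a simplified loader hook in which a `Map<String, String>` stands in for the Hadoop `Configuration`. `InitializableLoader`, `RightSideLoader` and the key `example.right.input` are hypothetical. The sketch only shows the pattern of storing the job configuration in `initialize()` and reporting setup failures as a checked `IOException`.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical loader hook: the runtime passes the job configuration (here a Map
// standing in for Hadoop's Configuration) before any seek is attempted.
interface InitializableLoader {
    // Declared to throw IOException because setup may need to touch the file system.
    void initialize(Map<String, String> conf) throws IOException;
}

class RightSideLoader implements InitializableLoader {
    // Illustrative key name only.
    static final String RIGHT_INPUT_KEY = "example.right.input";

    private Map<String, String> conf = new HashMap<>();

    @Override
    public void initialize(Map<String, String> conf) throws IOException {
        if (conf.get(RIGHT_INPUT_KEY) == null) {
            // A checked IOException lets the caller fail the task with a clear message.
            throw new IOException("missing configuration key " + RIGHT_INPUT_KEY);
        }
        // Keep a copy of the configuration so the right-side file can be opened later.
        this.conf = new HashMap<>(conf);
    }

    String rightInputPath() {
        return conf.get(RIGHT_INPUT_KEY);
    }
}
```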
[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data
[ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-953:
---

Attachment: PIG-953-6.patch

Dmitriy, by default, when the application does not set an OutputCommitter, hadoop uses FileOutputCommitter. So currently (in trunk code), since pig does not set an OutputCommitter, hadoop uses FileOutputCommitter. Hence I derived from FileOutputCommitter, so that the current cleanup continues to happen and we also do the extra commit needed by Zebra.

The new load-store redesign already has an allFinished() method in storeFunc, which is the same as this commit except that it does not take the Configuration. I have modified it to take the Configuration parameter.

It turns out zebra needs the job configuration in order to open the right side file during merge join. Hence the attached patch introduces an initialize(Configuration conf) method into the IndexableLoadFunc interface. The pig runtime can call it, allowing zebra to store this configuration and use it to open the right side file later.
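The "derive from the default committer" approach can be sketched as a plain subclass that calls the inherited cleanup first and then runs the extra commit. `DefaultFileCommitter` here is a hypothetical stand-in for the role Hadoop's `FileOutputCommitter` plays, not its real API. `JobCommitHook` and `CommittingCommitter` are likewise invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the framework's default committer (the role FileOutputCommitter
// plays in Hadoop). This is NOT Hadoop's class; method names are simplified assumptions.
class DefaultFileCommitter {
    protected final List<String> log = new ArrayList<>();

    void cleanupJob() {
        log.add("cleanup temporary output");
    }

    List<String> log() {
        return log;
    }
}

// Hypothetical hook for a store function that needs a job-level commit.
interface JobCommitHook {
    void commit(List<String> log);
}

// Subclassing keeps the default cleanup and adds the extra commit step on top of it.
class CommittingCommitter extends DefaultFileCommitter {
    private final JobCommitHook hook;

    CommittingCommitter(JobCommitHook hook) {
        this.hook = hook;
    }

    @Override
    void cleanupJob() {
        super.cleanupJob(); // existing behaviour is preserved
        hook.commit(log);   // then the store-specific commit runs
    }
}
```

Because the subclass delegates to `super.cleanupJob()`, jobs that never set a custom committer see the same cleanup they had before. Only the extra step is new.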
[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data
[ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-953:
---

Attachment: PIG-953-5.patch

Zebra needs a global commit method to be able to build an index on the sorted zebra file. Attaching a new patch which introduces a CommittableStoreFunc interface, which extends StoreFunc and adds a commit() method. The Zebra store function will implement this interface, and pig will call the commit() method on the CommittableStoreFunc at the completion of the job. This is not ideal: we could add commit() to StoreFunc itself, but that would break existing store functions. Also, if the changes in http://wiki.apache.org/pig/LoadStoreRedesignProposal are implemented soon, this would change anyway. So this new interface is being introduced to avoid breaking existing store functions until we move to the interface changes recommended in the wiki.
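The backward-compatibility argument can be shown with a sketch. Here the runtime calls `commit()` only on store functions that opt in through the sub-interface, so existing store functions compile and run unchanged. All names (`SimpleStoreFunc`, `SimpleCommittableStoreFunc`, `JobRunner`, `LegacyStore`, `IndexBuildingStore`) are hypothetical simplifications, and a `Map<String, String>` stands in for the job configuration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, stripped-down version of an existing store interface.
interface SimpleStoreFunc {
    void putNext(String record) throws IOException;
}

// New optional capability: only store functions that need a global commit implement it.
interface SimpleCommittableStoreFunc extends SimpleStoreFunc {
    void commit(Map<String, String> conf) throws IOException;
}

class JobRunner {
    // Writes all records, then calls commit() only if the store function opted in.
    static void run(SimpleStoreFunc sf, List<String> records, Map<String, String> conf) throws IOException {
        for (String r : records) {
            sf.putNext(r);
        }
        if (sf instanceof SimpleCommittableStoreFunc) {
            ((SimpleCommittableStoreFunc) sf).commit(conf);
        }
    }
}

// An "existing" store function: untouched by the new interface.
class LegacyStore implements SimpleStoreFunc {
    final List<String> written = new ArrayList<>();

    @Override
    public void putNext(String record) {
        written.add(record);
    }
}

// A store function that needs a job-level step, e.g. building an index over sorted output.
class IndexBuildingStore extends LegacyStore implements SimpleCommittableStoreFunc {
    boolean committed = false;

    @Override
    public void commit(Map<String, String> conf) {
        committed = true;
    }
}
```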
[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data
[ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-953:
---

Attachment: PIG-953-4.patch

Thanks for the review, Ashutosh. Attached an updated patch which addresses the concerns.
[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data
[ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-953:
---

Attachment: PIG-953-3.patch

Attached a patch which has the SortColInfo implementation to convey sort column information in SortInfo. This patch also addresses PIG-981.
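As a rough sketch of what conveying sort column information might look like, the classes below hold an ordered list of (name, index, order) entries. The names `SortInfoSketch` and `SortColInfoSketch`, their fields and the `isSortedFirstBy` helper are assumptions for illustration. They are not the actual SortInfo/SortColInfo API from the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified sort metadata; names and fields are illustrative assumptions.
class SortColInfoSketch {
    enum Order { ASCENDING, DESCENDING }

    final String colName; // identified by name, since Zebra cannot work with positions
    final int colIndex;
    final Order order;

    SortColInfoSketch(String colName, int colIndex, Order order) {
        this.colName = colName;
        this.colIndex = colIndex;
        this.order = order;
    }

    @Override
    public String toString() {
        return colName + (order == Order.ASCENDING ? " asc" : " desc");
    }
}

class SortInfoSketch {
    // List order matters: the first entry is the primary sort key.
    private final List<SortColInfoSketch> cols;

    SortInfoSketch(List<SortColInfoSketch> cols) {
        // Defensive, read-only copy so callers cannot reorder the sort keys.
        this.cols = Collections.unmodifiableList(new ArrayList<>(cols));
    }

    List<SortColInfoSketch> columns() {
        return cols;
    }

    // Example check a loader or store function might make before relying on the sort order.
    boolean isSortedFirstBy(String name) {
        return !cols.isEmpty() && cols.get(0).colName.equals(name);
    }
}
```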
[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data
[ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-953:
---

Attachment: PIG-953-2.patch

Attached a new patch against the latest trunk, addressing the previous two comments:

bq. [Ashutosh] What about LinkedHashMap? It provides all the properties we are seeking here, one data structure, O(1) lookup and guaranteed iteration order.

LinkedHashMap is also a good choice, but I think this internal structure does not need to be optimized for lookup at this point, so I am leaving it as is.

bq. will result in NPE when both obj1 and obj2 are null.

Fixed the NPE in Utils.checkNullAndClass. Good catch!

bq. A minor detail: Suppose obj1 is declared of type ArrayList and obj2 is declared of type ArrayList, obj1.getClass() == obj2.getClass() will return true thanks to type erasure by java compiler at compile time. Not sure if that's OK or not for the check here.

You are right. However, there is no way to work around type erasure, so this is the best we can do.

bq. Does zebra require columns to be named? If it doesn't, then SortInfo could be changed in such a way that it can provide column positions instead of names to the loader, if columns aren't named.

Zebra needs column names and cannot work with positions.

bq. Isn't this blocked on https://issues.apache.org/jira/browse/PIG-930 ?

bz2 handling needs to be fixed, but this code will be needed once it is. This does not make things any worse, since bz2 is already broken.

bq. because if status is Error, execution should be stopped and exception should be thrown as early as possible instead of continuing to do work which will be wasted. If status is Null, NPE will occur while doing join.

Fixed to throw an exception.

bq. there is no need of passing rightinputfilename from MRCompiler to POMergeJoin

We do need these calls to tell Zebra the filename. You are right that pig's DefaultIndexableLoader doesn't need them, but the code has to work with Zebra as well.

bq. Also, seekNear() doesn't sound right. How about seekToClosest() ?

I know seekNear() is vague, and intentionally so: the hope is that users will read the javadoc comments to learn how to implement it. seekToClosest would be equally vague in my opinion :)

bq. I think introducing order preserving flag on logical operator is a good idea.

I think the order preserving flag idea should be addressed in a different JIRA, as it is orthogonal to this one.

bq. Instead if we use following, we will achieve the same thing and then neither findbugs will complain, nor is there a need for our own copy method.

Fixed: removed Utils.getCopy.

bq. In POMergeJoin.java
{code}
// we should never get here!
return new Result(POStatus.STATUS_ERR, null);
could be changed to
// we should never get here!
throw new ExecException(errMsg,2176);
{code}
bq. because if we ever get there, it will result in NPE later on otherwise.

The method has to return a Result. Going by the code in getNext(), I think this will just be passed down the pipeline as an error and should not result in an NPE:
{code}
Result rightInp = getNextRightInp();
if(rightInp.returnStatus != POStatus.STATUS_OK){
    prevRightInp = null;
    return rightInp;
}
{code}

Per a previous review comment, I also changed Utils.checkNullEquals() to the following:
{code}
public static boolean checkNullEquals(Object obj1, Object obj2, boolean checkEquality) {
    if(obj1 == null || obj2 == null) {
        return obj1 == obj2;
    }
    if(checkEquality) {
        if(!obj1.equals(obj2)) {
            return false;
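The null handling and the type-erasure caveat discussed in this comment can be checked with a short sketch. `NullSafeChecks.sameClassOrBothNull` is a hypothetical helper and not the committed `Utils.checkNullAndClass`. It reuses the `obj1 == null || obj2 == null` guard that appears in the `checkNullEquals` snippet above.

```java
import java.util.ArrayList;
import java.util.List;

class NullSafeChecks {
    // Hypothetical helper illustrating the NPE fix: true when both are null,
    // false when exactly one is null, otherwise compares runtime classes.
    static boolean sameClassOrBothNull(Object obj1, Object obj2) {
        if (obj1 == null || obj2 == null) {
            return obj1 == obj2; // true only when both are null; never dereferences null
        }
        return obj1.getClass() == obj2.getClass();
    }

    // Demonstrates the type-erasure caveat: generic arguments are not visible at runtime.
    static boolean erasureHidesTypeArguments() {
        List<Integer> ints = new ArrayList<>();
        List<String> strings = new ArrayList<>();
        return sameClassOrBothNull(ints, strings); // both are plain ArrayList at runtime
    }
}
```

Because `new ArrayList<Integer>()` and `new ArrayList<String>()` share the runtime class `ArrayList`, `erasureHidesTypeArguments()` returns `true`. This is the limitation acknowledged in the reply above.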
[jira] Updated: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data
[ https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-953:
---

Summary: Enable merge join in pig to work with loaders and store functions which can internally index sorted data (was: Enable merge join in pig to work with loaders which can internally index sorted data)