[jira] [Assigned] (PIG-3000) Optimize nested foreach
[ https://issues.apache.org/jira/browse/PIG-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy reassigned PIG-3000:
---

Assignee: Rohini Palaniswamy (was: Mona Chitnis)

This is being addressed as part of PIG-5256

> Optimize nested foreach
> ---
>
> Key: PIG-3000
> URL: https://issues.apache.org/jira/browse/PIG-3000
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.10.0
> Reporter: Richard Ding
> Assignee: Rohini Palaniswamy
> Priority: Major
> Attachments: PIG-3000-6.patch, unit_tests.patch
>
>
> In this Pig script:
> {code}
> A = load 'data' as (a:chararray);
> B = foreach A { c = UPPER(a); generate ((c eq 'TEST') ? 1 : 0), ((c eq 'DEV') ? 1 : 0); }
> {code}
> The Eval function UPPER is called twice for each record.
> This should be optimized so that the UPPER is called only once for each record

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
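The behaviour reported in PIG-3000 can be modelled outside Pig. The following is a minimal Python sketch, not Pig's execution engine and not the approach taken in PIG-5256. The names `upper`, `calls`, `foreach_reevaluating` and `foreach_evaluate_once` are hypothetical and exist only for this illustration. It contrasts re-evaluating `c = UPPER(a)` at every reference with evaluating it once per record.

```python
# Illustrative model only: plain Python standing in for a nested foreach.
# None of these names come from Pig; they are assumptions for the sketch.

calls = {"upper": 0}


def upper(value):
    """Stand-in for the UPPER eval function; counts how often it runs."""
    calls["upper"] += 1
    return value.upper()


def foreach_reevaluating(records):
    """Evaluates UPPER(a) again for every reference to c in the generate."""
    out = []
    for a in records:
        out.append((1 if upper(a) == "TEST" else 0,
                    1 if upper(a) == "DEV" else 0))
    return out


def foreach_evaluate_once(records):
    """Evaluates c = UPPER(a) once per record and reuses the result."""
    out = []
    for a in records:
        c = upper(a)
        out.append((1 if c == "TEST" else 0,
                    1 if c == "DEV" else 0))
    return out
```

Both functions produce the same output tuples. Only the number of calls to the stand-in function differs: two per record in the first, one per record in the second.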
[jira] [Commented] (PIG-5336) Drop old documents from the site
[ https://issues.apache.org/jira/browse/PIG-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514130#comment-16514130 ]

Koji Noguchi commented on PIG-5336:
---

{quote}As for the redirects, it's not working. Keeping the .htaccess as-is and opened INFRA-16644 for tracking.
{quote}
It was actually my .htaccess issue. Changing it slightly made it work.

[http://svn.apache.org/viewvc/pig/site/publish/.htaccess?r1=1833614=1833417=diff_format=s]

Now, a link like {{[https://pig.apache.org/docs/r0.13.0/start.html]}} is redirected to {{[https://pig.apache.org/docs/latest/start.html]}}

I'll need to update the document that covers updating 'latest' for new releases.

> Drop old documents from the site
>
>
> Key: PIG-5336
> URL: https://issues.apache.org/jira/browse/PIG-5336
> Project: Pig
> Issue Type: Improvement
> Components: site
> Reporter: Koji Noguchi
> Assignee: Koji Noguchi
> Priority: Trivial
> Fix For: site
>
> Attachments: pig-5336-redirect.patch
>
>
> When working on PIG-5334, saw a bunch of old documents still being uploaded on svn
> {noformat}
> knoguchi@truelisten-lm site> ls publish/docs/ | sort -V
> r0.7.0/
> r0.8.1/
> r0.9.1/
> r0.9.2/
> r0.10.0/
> r0.10.1/
> r0.11.0/
> r0.11.1/
> r0.12.0/
> r0.12.1/
> r0.13.0/
> r0.14.0/
> r0.15.0/
> r0.16.0/
> r0.17.0/
> {noformat}
> Sometimes I see our users referencing old documents due to this.
> We should retire most of them and leave the recent ones.
[jira] [Updated] (PIG-5342) Add setting to turn off bloom join combiner
[ https://issues.apache.org/jira/browse/PIG-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-5342:

Description:
1) Need a new setting pig.bloomjoin.nocombiner to turn off combiner for bloom join. When the keys are all unique, the combiner is unnecessary overhead.
2) In the previous case, the keys were the bloom filter index and the values were the join key. Combining involved doing a distinct on the bag of values, which has memory issues for more than 10 million records. That needs to be flipped and a distinct combiner used to scale to billions of records.
3) Mention in documentation that bloom join is also ideal in cases of right outer join with smaller dataset on the right. Replicate join only supports left outer join.

was:
1) Need a new setting pig.bloomjoin.nocombiner to turn off combiner for bloom join. When the keys are all unique, the combiner is unnecessary overhead.
2) Mention in documentation that bloom join is also ideal in cases of right outer join with smaller dataset on the right. Replicate join only supports left outer join.

1) pkgr.setKeyType(DataType.INTEGER); should go in createBloomInMap and pkg.getPkgr().setKeyType(op.getPkgr().getKeyType()); should go in else clause. Not sure how it is working. The golden files also don't look right for the map case - key is showing as bytearray instead of int because of that.
2) if (pkg.getPkgr() instanceof BloomPackager ) should be (pkg.getPkgr() instanceof BloomPackager && pkgr.isBloomCreatedInMap())
3) Please update one of the e2e tests in join.conf with a different value for pig.bloomjoin.num.filters

> Add setting to turn off bloom join combiner
> ---
>
> Key: PIG-5342
> URL: https://issues.apache.org/jira/browse/PIG-5342
> Project: Pig
> Issue Type: Sub-task
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Priority: Major
> Attachments: PIG-5342-1.patch, PIG-5342-2.patch
>
>
> 1) Need a new setting pig.bloomjoin.nocombiner to turn off combiner for bloom join. When the keys are all unique, the combiner is unnecessary overhead.
> 2) In the previous case, the keys were the bloom filter index and the values were the join key. Combining involved doing a distinct on the bag of values, which has memory issues for more than 10 million records. That needs to be flipped and a distinct combiner used to scale to billions of records.
> 3) Mention in documentation that bloom join is also ideal in cases of right outer join with smaller dataset on the right. Replicate join only supports left outer join.
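Item 2 of the updated description contrasts two combiner layouts. The sketch below is one possible reading of that item in plain Python. It is not taken from either PIG-5342 patch, and the function names `combine_by_filter_index` and `combine_distinct_stream` are hypothetical. It assumes that in the flipped layout the join key itself is the shuffle key, so equal keys arrive next to each other in sorted order.

```python
from itertools import groupby


def combine_by_filter_index(pairs):
    """Earlier layout (as read from the issue): key = bloom filter index,
    value = join key. De-duplicating requires holding every distinct value
    for a filter index in memory at once."""
    bags = {}
    for filter_index, join_key in pairs:
        bags.setdefault(filter_index, set()).add(join_key)
    return {idx: sorted(keys) for idx, keys in bags.items()}


def combine_distinct_stream(sorted_join_keys):
    """Flipped layout (assumption): the join key is the shuffle key, so
    duplicates are adjacent in the sorted stream and can be dropped one
    at a time without building a bag."""
    for key, _group in groupby(sorted_join_keys):
        yield key
```

In the first function, memory grows with the number of distinct join keys per filter index. In the second, only the current key is held at any time, which is the property the description appears to be after.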
[jira] [Updated] (PIG-5342) Add setting to turn off bloom join combiner
[ https://issues.apache.org/jira/browse/PIG-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Satish Subhashrao Saley updated PIG-5342:
---

Attachment: PIG-5342-2.patch

> Add setting to turn off bloom join combiner
> ---
>
> Key: PIG-5342
> URL: https://issues.apache.org/jira/browse/PIG-5342
> Project: Pig
> Issue Type: Sub-task
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Priority: Major
> Attachments: PIG-5342-1.patch, PIG-5342-2.patch
>
>
> 1) Need a new setting pig.bloomjoin.nocombiner to turn off combiner for bloom join. When the keys are all unique, the combiner is unnecessary overhead.
> 2) Mention in documentation that bloom join is also ideal in cases of right outer join with smaller dataset on the right. Replicate join only supports left outer join.
[jira] [Commented] (PIG-5342) Add setting to turn off bloom join combiner
[ https://issues.apache.org/jira/browse/PIG-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514304#comment-16514304 ]

Satish Subhashrao Saley commented on PIG-5342:
---

Updated the patch.

> Add setting to turn off bloom join combiner
> ---
>
> Key: PIG-5342
> URL: https://issues.apache.org/jira/browse/PIG-5342
> Project: Pig
> Issue Type: Sub-task
> Reporter: Satish Subhashrao Saley
> Assignee: Satish Subhashrao Saley
> Priority: Major
> Attachments: PIG-5342-1.patch, PIG-5342-2.patch
>
>
> 1) Need a new setting pig.bloomjoin.nocombiner to turn off combiner for bloom join. When the keys are all unique, the combiner is unnecessary overhead.
> 2) Mention in documentation that bloom join is also ideal in cases of right outer join with smaller dataset on the right. Replicate join only supports left outer join.
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (36 issues)

Subscriber: pigdaily

Key         Summary
PIG-5342    Add setting to turn off bloom join combiner
            https://issues.apache.org/jira/browse/PIG-5342
PIG-5338    Prevent deep copy of DataBag into Jython List
            https://issues.apache.org/jira/browse/PIG-5338
PIG-5323    Implement LastInputStreamingOptimizer in Tez
            https://issues.apache.org/jira/browse/PIG-5323
PIG-5317    Upgrade old dependencies: commons-lang, hsqldb, commons-logging
            https://issues.apache.org/jira/browse/PIG-5317
PIG-5273    _SUCCESS file should be created at the end of the job
            https://issues.apache.org/jira/browse/PIG-5273
PIG-5267    Review of org.apache.pig.impl.io.BufferedPositionedInputStream
            https://issues.apache.org/jira/browse/PIG-5267
PIG-5256    Bytecode generation for POFilter and POForeach
            https://issues.apache.org/jira/browse/PIG-5256
PIG-5191    Pig HBase 2.0.0 support
            https://issues.apache.org/jira/browse/PIG-5191
PIG-5160    SchemaTupleFrontend.java is not thread safe, cause PigServer thrown NPE in multithread env
            https://issues.apache.org/jira/browse/PIG-5160
PIG-5115    Builtin AvroStorage generates incorrect avro schema when the same pig field name appears in the alias
            https://issues.apache.org/jira/browse/PIG-5115
PIG-5106    Optimize when mapreduce.input.fileinputformat.input.dir.recursive set to true
            https://issues.apache.org/jira/browse/PIG-5106
PIG-5081    Can not run pig on spark source code distribution
            https://issues.apache.org/jira/browse/PIG-5081
PIG-5080    Support store alias as spark table
            https://issues.apache.org/jira/browse/PIG-5080
PIG-5057    IndexOutOfBoundsException when pig reducer processOnePackageOutput
            https://issues.apache.org/jira/browse/PIG-5057
PIG-5029    Optimize sort case when data is skewed
            https://issues.apache.org/jira/browse/PIG-5029
PIG-4926    Modify the content of start.xml for spark mode
            https://issues.apache.org/jira/browse/PIG-4926
PIG-4913    Reduce jython function initiation during compilation
            https://issues.apache.org/jira/browse/PIG-4913
PIG-4849    pig on tez will cause tez-ui to crash,because the content from timeline server is too long.
            https://issues.apache.org/jira/browse/PIG-4849
PIG-4750    REPLACE_MULTI should compile Pattern once and reuse it
            https://issues.apache.org/jira/browse/PIG-4750
PIG-4684    Exception should be changed to warning when job diagnostics cannot be fetched
            https://issues.apache.org/jira/browse/PIG-4684
PIG-4656    Improve String serialization and comparator performance in BinInterSedes
            https://issues.apache.org/jira/browse/PIG-4656
PIG-4598    Allow user defined plan optimizer rules
            https://issues.apache.org/jira/browse/PIG-4598
PIG-4551    Partition filter is not pushed down in case of SPLIT
            https://issues.apache.org/jira/browse/PIG-4551
PIG-4539    New PigUnit
            https://issues.apache.org/jira/browse/PIG-4539
PIG-4515    org.apache.pig.builtin.Distinct throws ClassCastException
            https://issues.apache.org/jira/browse/PIG-4515
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
            https://issues.apache.org/jira/browse/PIG-3911
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587
PIG-1804    Alow Jython function to implement Algebraic and/or Accumulator interfaces
            https://issues.apache.org/jira/browse/PIG-1804

You may edit this subscription at:
https://issues.apache.org/jira/secure/EditSubscription!default.jspa?subId=16328=12322384