Hi,

inStockSkusBitSet.get(currentChildDocNumber)

Is that child a Lucene doc id? If so, does it include the segment offset? Every index segment starts at a different point in the index, but the docs inside each segment are numbered from zero. So to check them against a bitset covering the full index, I'd be doing Bitset.exists(indexBase + docid).
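A minimal sketch of the offset point, assuming (as in Lucene 5) that the doc id a collector receives is local to the current segment, while a DocSet returned by SolrIndexSearcher.getDocSet is keyed by top-level ids. Segments are modelled with a plain java.util.BitSet and a hypothetical docBase value; DocBaseSketch, localLookup and globalLookup are illustrative names, not Solr API. It shows how a lookup with the local id can both drop a matching doc and admit a non-matching one.

```java
import java.util.BitSet;

public class DocBaseSketch {

    /** Wrong: treats a segment-local id as if it were a top-level (global) id. */
    static boolean localLookup(BitSet globalBits, int localDoc) {
        return globalBits.get(localDoc);
    }

    /** Right: shifts the segment-local id by the segment's docBase first. */
    static boolean globalLookup(BitSet globalBits, int docBase, int localDoc) {
        return globalBits.get(docBase + localDoc);
    }

    public static void main(String[] args) {
        // Hypothetical index of 10 docs in two segments:
        // segment 0 holds global ids 0..4 (docBase 0), segment 1 holds 5..9 (docBase 5).
        int seg1DocBase = 5;

        // Filter bitset keyed by global ids, like a getDocSet(...) result: docs 3 and 7 match.
        BitSet globalBits = new BitSet(10);
        globalBits.set(3);
        globalBits.set(7);

        // Local doc 2 in segment 1 is global doc 7, which matches the filter.
        System.out.println("seg1 local 2: local=" + localLookup(globalBits, 2)
                + " global=" + globalLookup(globalBits, seg1DocBase, 2));
        // Local doc 3 in segment 1 is global doc 8, which does not match.
        System.out.println("seg1 local 3: local=" + localLookup(globalBits, 3)
                + " global=" + globalLookup(globalBits, seg1DocBase, 3));
    }
}
```

Running main prints `seg1 local 2: local=false global=true` and `seg1 local 3: local=true global=false`. In a real collector the offset would presumably come from the LeafReaderContext (its docBase field) handed over when the collector moves to a new segment.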
Just one thing to check.

Roman

On Aug 3, 2015 1:24 AM, "Stephen Weiss" <steve.we...@wgsn.com> wrote:
> Hi everyone,
>
> I'm trying to write a PostFilter for Solr 5.1.0 that crawls through
> grandchild documents during a search over the parents and filters out
> documents based on statistics gathered by aggregating the grandchildren.
> I've gotten the logic correct, but it does not perform well: I'm pulling
> too many documents from the index along the way. I'm trying to filter out
> grandchild documents that are not relevant to the statistics I'm
> collecting, in order to reduce the number of document objects pulled from
> the IndexReader.
>
> I've implemented the following code in my DelegatingCollector.collect:
>
> if (inStockSkusBitSet == null) {
>     SolrIndexSearcher SidxS = (SolrIndexSearcher) idxS; // type cast from IndexSearcher to expose getDocSet
>     inStockSkusDocSet = SidxS.getDocSet(inStockSkusQuery);
>     inStockSkusBitDocSet = (BitDocSet) inStockSkusDocSet; // type cast from DocSet to expose getBits
>     inStockSkusBitSet = inStockSkusBitDocSet.getBits();
> }
>
> My BitDocSet reports a size that matches a standard query for the more
> limited set of grandchildren, and the FixedBitSet (inStockSkusBitSet)
> reports the same cardinality. So the getDocSet call itself seems to work
> and to return the right number of documents. However, when I filter out
> grandchild documents using either BitDocSet.exists or BitSet.get (skipping
> any grandchild that doesn't exist in the BitDocSet, or for which the
> bitset returns false), I get about a third fewer results than I should.
> Many documents that should match the filter are being excluded, and
> documents that should not match it are being included.
>
> I'm trying to use it in either of these ways:
>
> if (!inStockSkusBitSet.get(currentChildDocNumber)) continue;
> if (!inStockSkusBitDocSet.exists(currentChildDocNumber)) continue;
>
> currentChildDocNumber is simply the doc number passed to
> DelegatingCollector.collect, decremented until I hit a document that
> doesn't belong to the parent document.
>
> I can't figure out how to actually use the BitDocSet (or its derivatives)
> to quickly eliminate document IDs, even though that seems to be how it's
> meant to be used. What am I getting wrong?
>
> Sorry if this is a newbie question. I've never written a PostFilter
> before, and frankly, the documentation out there is a little sketchy
> (mostly for version 4): so many classes have been renamed and so many of
> the better-documented techniques are deprecated or removed that it's
> tough to tell what the current best practice is. I'm using the block join
> functionality heavily, so I'm trying to stay more current than that. I'd
> be happy to send the full source privately if it would help figure this
> out, and I plan to write up more detailed instructions (updated for
> Solr 5) for the next person who decides to write a PostFilter and work
> with block joins, if I ever manage to get this performing well enough.
>
> Thanks for any pointers! I'm totally open to doing this in an entirely
> different way. I've read that DocValues might be a more elegant approach,
> but that would currently require reindexing, so I'm trying to avoid it.
>
> Also, I've been wondering whether the query above would read from the
> filter cache or not. The query is constructed like this:
>
> private Term inStockTrueTerm = new Term("sku_history.is_in_stock", "T");
> private Term objectTypeSkuHistoryTerm = new Term("object_type", "sku_history");
> ...
>
> inStockTrueTermQuery = new TermQuery(inStockTrueTerm);
> objectTypeSkuHistoryTermQuery = new TermQuery(objectTypeSkuHistoryTerm);
> inStockSkusQuery = new BooleanQuery();
> inStockSkusQuery.add(inStockTrueTermQuery, BooleanClause.Occur.MUST);
> inStockSkusQuery.add(objectTypeSkuHistoryTermQuery, BooleanClause.Occur.MUST);
> --
> Steve
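The child walk described in the quoted message, combined with the offset fix, could look roughly like the sketch below. It rests on stated assumptions: parentBits is a hypothetical segment-local bitset marking top-level parent documents, descendants are assumed to sit immediately before their parent (the order block-join indexing uses), and the global filter is a java.util.BitSet standing in for the FixedBitSet from getDocSet. ChildBlockSketch and matchingChildren are illustrative names only.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class ChildBlockSketch {

    /**
     * Walks backwards from a parent's segment-local id over its block of
     * descendants, stopping at the previous parent (or the segment start),
     * and keeps those whose GLOBAL id (docBase + local id) is in the filter.
     */
    static List<Integer> matchingChildren(int parentDoc, BitSet parentBits,
                                          BitSet globalFilter, int docBase) {
        List<Integer> matches = new ArrayList<>();
        for (int child = parentDoc - 1; child >= 0 && !parentBits.get(child); child--) {
            if (!globalFilter.get(docBase + child)) {
                continue; // filtered out: a real collector would skip loading this doc
            }
            matches.add(child); // still segment-local, usable with this segment's reader
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical segment with docBase 100: local docs 0..2 form the block of
        // parent 3, and local docs 4..5 form the block of parent 6.
        int docBase = 100;
        BitSet parentBits = new BitSet();
        parentBits.set(3);
        parentBits.set(6);

        // Global filter: in-stock sku_history docs at global ids 101, 104 and 105.
        BitSet globalFilter = new BitSet();
        globalFilter.set(101);
        globalFilter.set(104);
        globalFilter.set(105);

        System.out.println("parent 3 -> " + matchingChildren(3, parentBits, globalFilter, docBase));
        System.out.println("parent 6 -> " + matchingChildren(6, parentBits, globalFilter, docBase));
    }
}
```

Running main prints `parent 3 -> [1]` and `parent 6 -> [5, 4]`. The ids come back in the order the walk visits them (highest first), and they stay segment-local because only the filter lookup needs the docBase shift.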