Re: FacetExample.py
> This one: lucene/core/src/test/org/apache/lucene/search/TestSort.java

Yeah, I figured by comparing the size of these three...

So, to make it short -- every thread should get its own Random instance from a call to LuceneTestCase's

    public static Random random() {
      return RandomizedContext.current().getRandom();
    }

More specifically, the inside of this method returns a per-thread Random instance. The first time a thread calls it, that thread's instance is initialized with the same (master) seed.

These are not super-easy things to rewrite, Andi. I don't know if you want to emulate the entire randomized testing infrastructure or just make it work consistently with one seed.

Dawid
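The per-thread scheme described above can be sketched in plain Python, roughly as a pure-Python port might do it. This is not the randomizedtesting or PyLucene API. The names MASTER_SEED, per_thread_random and run_workers are hypothetical. A threading.local holds one random.Random per thread, created lazily and always seeded with the same master seed.

```python
import random
import threading

# Hypothetical master seed; the real framework picks one per test run.
MASTER_SEED = 0x5EED

_local = threading.local()


def per_thread_random():
    """Return the calling thread's Random, creating it on first use.

    Every thread's instance starts from the same master seed, so the
    order in which threads first call this function cannot change the
    values any of them sees.
    """
    rng = getattr(_local, "rng", None)
    if rng is None:
        rng = random.Random(MASTER_SEED)
        _local.rng = rng
    return rng


def run_workers(n):
    """Start n threads; each records its own Random and three draws from it."""
    results = [None] * n

    def work(i):
        rng = per_thread_random()
        results[i] = (rng, [rng.randint(0, 1000) for _ in range(3)])

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Every worker starts from the same seed and draws in the same pattern. As a result, all the returned draw lists are equal, even though each thread holds its own instance.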
Re: FacetExample.py
Hi folks,

> About the randomness: I think this should not be the case. If different threads try to share the same random, there should actually be an exception from the test framework saying that each thread should get its own random (e.g. initialized by a long value). So lucene-java tests should not have code that does this; otherwise it really makes test failures difficult to reproduce.

This is exactly the case. The same instance of Random (acquired from the randomized context) is bound to one thread and will throw an exception if another thread tries to use it. You should check, though -- there are places in the code where a normal Random is created with a seed provided by the framework's random. This is done to speed up computations, or in places where I couldn't propagate per-thread randoms normally.

Per-thread Random instances are always initialized with the same initial seed. Otherwise you'd have a race condition in which the order of acquiring random instances could influence the test's result. This is not to say it's a perfect method (if there are other races, there is nothing to be done about them), but at least it's something.

Which TestSort are we talking about? There seem to be three of them...

D.
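Here is a hypothetical Python sketch of a thread-bound generator like the one described above, which throws when another thread uses it. It is not the actual Java implementation. ThreadBoundRandom, derived_random and call_from_other_thread are illustrative names, and RuntimeError stands in for whatever exception the real framework throws. The derived_random function illustrates the other pattern mentioned: a plain Random seeded from the framework's random.

```python
import random
import threading


class ThreadBoundRandom:
    """Hypothetical sketch: a Random that refuses use from other threads."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self._owner = threading.get_ident()  # thread that created it

    def _check_owner(self):
        current = threading.get_ident()
        if current != self._owner:
            raise RuntimeError(
                "Random bound to thread %d was used from thread %d; "
                "obtain a per-thread instance instead" % (self._owner, current))

    def randint(self, a, b):
        self._check_owner()
        return self._rng.randint(a, b)

    def getrandbits(self, k):
        self._check_owner()
        return self._rng.getrandbits(k)


def derived_random(bound):
    """Plain Random seeded from the bound one; it carries no thread check."""
    return random.Random(bound.getrandbits(64))


def call_from_other_thread(fn):
    """Run fn in a new thread and return the exception it raised, or None."""
    outcome = []

    def target():
        try:
            fn()
        except Exception as exc:
            outcome.append(exc)
        else:
            outcome.append(None)

    t = threading.Thread(target=target)
    t.start()
    t.join()
    return outcome[0]
```

The derived generator is reproducible as long as the bound one is. Because it skips the ownership check, though, sharing it across threads reintroduces the races that the check exists to catch.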
Re: FacetExample.py
> So, if each thread gets the same seed, then they should also get the same random values, right?

They would start from the same seed, so if they're calling that Random in the same pattern, then yes -- they'd get the same values. Any real randomness will be non-reproducible. If this is needed for a test (and you know it won't be a problem if the test doesn't reproduce), then you could just do new Random() instead of calling LuceneTestCase's generator.

D.
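The point about call patterns can be checked with Python's random.Random, used here as a stand-in for java.util.Random. The seed value is arbitrary.

```python
import random

SEED = 20130214  # arbitrary seed standing in for the framework's master seed

first = random.Random(SEED)
second = random.Random(SEED)

# Same seed and same call pattern: identical values.
first_draws = [first.random() for _ in range(5)]
second_draws = [second.random() for _ in range(5)]

# Same seed, different call pattern: one extra draw shifts the stream,
# so a thread doing this would see different values from the others.
shifted = random.Random(SEED)
shifted.random()
shifted_draws = [shifted.random() for _ in range(5)]

# Rough analogue of Java's "new Random()": with no seed argument, Python
# seeds from OS randomness (or the clock), so values are not reproducible.
unreproducible = random.Random()
```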
Re: FacetExample.py
On Feb 14, 2013, at 0:30, Dawid Weiss <dawid.we...@gmail.com> wrote:
>> This one: lucene/core/src/test/org/apache/lucene/search/TestSort.java
> Yeah, I figured by comparing the size of these three...
> So, to make it short -- every thread should get its own Random instance from a call to LuceneTestCase's
>     public static Random random() { return RandomizedContext.current().getRandom(); }
> More specifically, the inside of this method returns a per-thread Random instance. The first time a thread calls it, that thread's instance is initialized with the same (master) seed.
> These are not super-easy things to rewrite, Andi. I don't know if you want to emulate the entire randomized testing infrastructure or just make it work consistently with one seed.

No, definitely not. As an alternative, I could JCC-wrap the test framework, but that would defeat some of the purpose.

So, if each thread gets the same seed, then they should also get the same random values, right?

I was generating more random values (no per-thread generator) and producing per-thread field configurations that were incompatible with each other in the field cache. I've now worked around this by caching some key random choices and reusing them for all threads.

Andi..
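Here is one way to implement the workaround just described, assuming a Python test that randomizes a few field settings per run. Every key choice is drawn once, in a fixed order, from a single seeded generator before any thread starts, and all threads then read the same values. The names draw_shared_choices, SPEC and run_threads are hypothetical, and the settings listed are made up for illustration. This is not the code in test_Sort.py.

```python
import random
import threading

# Hypothetical settings a sort test might randomize once per run.
SPEC = [
    ("sort_field_type", ["int", "long", "float", "double", "string"]),
    ("missing_first", [True, False]),
]


def draw_shared_choices(seed, spec):
    """Make each key random choice exactly once, in the fixed order of spec."""
    rng = random.Random(seed)
    return {key: rng.choice(options) for key, options in spec}


def run_threads(seed, n_threads=8):
    """Give every thread a field configuration built from the same choices."""
    choices = draw_shared_choices(seed, SPEC)  # drawn before any thread starts
    configs = [None] * n_threads

    def worker(i):
        # Each thread gets its own copy, but every copy holds the same values,
        # so the per-thread configurations cannot conflict with each other.
        configs[i] = dict(choices)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return configs
```

Drawing the choices up front, rather than lazily on first request, keeps the result independent of thread scheduling. The choices depend only on the seed and the order of SPEC.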
Re: FacetExample.py
Hi Andi,

You're right - and the API docs are wrong. Actually, both must have changed after the 4.1 release: I checked the source of java-lucene v4.1 (lucene-4.1.0-src.tgz / 21-Jan-2013) and it matches the online javadocs. So I guess you're preparing for PyLucene v4.2?

Note: I think that LUCENE_SVN=http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x is the branch where ongoing 4.x development happens (i.e. unstable), whereas http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_1/ is the stable Lucene 4.1/Solr 4.1 branch (matching the 4.1 release and API docs). So if that's right (please correct me if I'm wrong), why did you choose branch_4x?

Anyway, I have now fixed FacetExample.py for branch_4x ;-)

Some notes on API changes for those interested:
- the 'new' FacetsCollector now uses a factory method: public static FacetsCollector create(FacetSearchParams fsp, IndexReader indexReader, TaxonomyReader taxoReader)
- the order of constructor arguments for FacetSearchParams has changed!
- FacetResultNode has changed: it used to be an interface but is now a concrete class (and its getSubResults method disappeared)
- DrillDown.query() became DrillDownQuery, which has a new API

Well, at least the API docs do state it: "WARNING: This API is experimental and might change in incompatible ways in the next release." So one has been warned...

Here's the new version: https://dl.dropbox.com/u/4384120/FacetExample.py
Or as a patch against svn: https://dl.dropbox.com/u/4384120/FacetExample_patch_20130213.txt

Thanks again for your help.

regards,
Thomas
Re: FacetExample.py
On Tue, Feb 12, 2013 at 3:11 AM, Andi Vajda <va...@apache.org> wrote:
> I then found that the test case from hell, TestSort.java, has majorly changed again and test_Sort.py needs to be ported again. Sigh.
> Andi..

I'm not laughing at your expense, Andi... but this made me laugh out loud multiple times today. I've done battle with this thing several times, and I feel like I always lose!
Re: FacetExample.py
On Wed, Feb 13, 2013 at 7:54 PM, Andi Vajda <va...@apache.org> wrote:
> On Wed, 13 Feb 2013, Robert Muir wrote:
>> On Tue, Feb 12, 2013 at 3:11 AM, Andi Vajda <va...@apache.org> wrote:
>>> I then found that the test case from hell, TestSort.java, has majorly changed again and test_Sort.py needs to be ported again. Sigh.
>>> Andi..
>> I'm not laughing at your expense, Andi... but this made me laugh out loud multiple times today. I've done battle with this thing several times, and I feel like I always lose!
> I found lines 170 - 195 particularly clever :-)
> Jokes aside, I did spend a bunch of time yesterday battling with the unstated assumptions made in the Lucene random number generation code in the Lucene test framework. In particular, it seems that it expects different threads, sometimes, to get the same random values, no? (I'm using Python's random number generator in PyLucene.) The field cache sanity checker would otherwise complain, sometimes...
> Andi..

In all seriousness, I don't like that committers' time is wasted on this. Just a day or two ago I introduced a bug into this thing while merging, and Mike spent time tracking it down. I'd like to think I'm pretty careful about not breaking things when merging (I think I spent at least an hour merging this file alone, very carefully, and still screwed it up). So I opened https://issues.apache.org/jira/browse/LUCENE-4779

About the randomness: I think this should not be the case. If different threads try to share the same random, there should actually be an exception from the test framework saying that each thread should get its own random (e.g. initialized by a long value). So lucene-java tests should not have code that does this; otherwise it really makes test failures difficult to reproduce.

Unfortunately I'm not very familiar with what Python does here, but I cc'ed Dawid in case he knows off the top of his head.
Re: FacetExample.py
Hi Thomas,

On Mon, 11 Feb 2013, Thomas Koch wrote:
> First, please excuse me for not getting back to you regarding the tests - I did start on them but then got stuck and distracted by other tasks popping up. No excuse - I just failed to deliver what I promised.

Oh well. No worries.

>> Would you please port it to the new 4.x API so that it can be included with the PyLucene 4.1 release?
> Yes, will do. And yes, the Facets API has changed - mainly due to a complete rewrite of huge parts of the code by Shai Erera (as described in LUCENE-4647). I haven't worked with 4.x yet, so I had to check the documentation first (mainly the API docs). The most obvious change is that CategoryDocumentBuilder has been replaced by the FacetFields class. The method of interest is certainly FacetFields.addFields(Document doc, Iterable<CategoryPath> categories), which should be used instead of CategoryDocumentBuilder.setCategoryPaths(Iterable<CategoryPath> categoryPaths) AND CategoryDocumentBuilder.build(org.apache.lucene.document.Document doc). It should be noted (and reported - to whom?)

Either file a bug on https://issues.apache.org/jira/browse/LUCENE or send a note to d...@lucene.apache.org (subscribe first).

> that the Apache Lucene Faceted Search User's Guide at http://lucene.apache.org/core/4_1_0/facet/org/apache/lucene/facet/doc-files/userguide.html does NOT yet reflect the new API and thus is misleading (or just plain wrong). Luckily, the basic concepts of CategoryPath and CountFacetRequest have remained unchanged AFAIK, so the changes should not be that big. Actually, I have already changed FacetExample.py according to the API changes I noticed, but couldn't test it yet because I was unable to build PyLucene 4.1.
> Here's the diff against pylucene-trunk: https://dl.dropbox.com/u/4384120/FacetExample_patch.txt
> And here's what I did in order to get my local pylucene-trunk environment up to date (to 4.1), and how I failed: I first ran svn up and rebuilt JCC and PyLucene, but then noticed I still had PyLucene 4.0:

I checked in the latest Makefile for version 4.1 in rev 1445038. Be sure to have LUCENE_SVN=http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x as well.

> ... snip ...
> [javac] /Users/koch/Projekte/Python/pylucene/pylucene-trunk/java/org/apache/pylucene/search/PythonFloatParser.java:25: org.apache.pylucene.search.PythonFloatParser is not abstract and does not override abstract method termsEnum(org.apache.lucene.index.Terms) in org.apache.lucene.search.FieldCache.Parser

Indeed. I reproduced that error here. A new method was added to the FieldCache.Parser interface. I added it to the classes missing it (rev 1445048).

I then found that the test case from hell, TestSort.java, has majorly changed again and test_Sort.py needs to be ported again. Sigh.

Andi..
Re: FacetExample.py
On Tue, 12 Feb 2013, Andi Vajda wrote:
> Indeed. I reproduced that error here. A new method was added to the FieldCache.Parser interface. I added it to the classes missing it (rev 1445048).
> I then found that the test case from hell, TestSort.java, has majorly changed again and test_Sort.py needs to be ported again. Sigh.

That being said, you should be able to build PyLucene 4.1 again and proceed with FacetExample.py. The work still needed on test_Sort.py shouldn't block you.

Andi..
Re: FacetExample.py
Hi Andi,

Thanks to your hints I was now able to build PyLucene 4.1 and got further with FacetExample.py - the imports should be OK now and most of the required changes are done, I guess.

However, I have now run into another problem: I need to instantiate the class 'FacetsCollector' but get an error when doing so:

  File "samples/FacetExample.py", line 222, in searchWithRequestAndQuery
    facetsCollector = FacetsCollector(facetSearchParams, indexReader, taxoReader)
NotImplementedError: ('instantiating java class', <type 'FacetsCollector'>)

The Java example has this line:

    FacetsCollector facetsCollector = new FacetsCollector(facetSearchParams, indexReader, taxoReader);

and the javadocs state it has a public constructor: http://lucene.apache.org/core/4_1_0/facet/org/apache/lucene/facet/search/FacetsCollector.html#FacetsCollector(org.apache.lucene.facet.search.params.FacetSearchParams,%20org.apache.lucene.index.IndexReader,%20org.apache.lucene.facet.taxonomy.TaxonomyReader)

So what could be the reason for this behavior?

I have another problem with the constructor of FacetSearchParams: it expects the arguments (List<FacetRequest> facetRequests, FacetIndexingParams indexingParams), but neither FacetSearchParams(Arrays.asList([facetRequest,]), indexingParams) nor FacetSearchParams([facetRequest,], indexingParams) works here.
I get:

lucene.InvalidArgsError: (<type 'FacetSearchParams'>, '__init__', (<List: [root/a nRes=10 nLbl=10]>, <FacetIndexingParams: org.apache.lucene.facet.params.FacetIndexingParams@f97ad3c0>))

I thought that JavaList could help, but I cannot import it:

>>> from lucene.collections import JavaList
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/koch/.virtualenvs/pylucene/lib/python2.7/site-packages/lucene-4.1-py2.7-macosx-10.8-x86_64.egg/lucene/collections.py", line 17, in <module>
    from org.apache.pylucene.util import \
ImportError: No module named pylucene.util

That's probably because I had to disable these lines in the Makefile:

## JARS+=$(HIGHLIGHTER_JAR)# needs memory contrib
## JARS+=$(EXTENSIONS_JAR) # needs highlighter contrib

Do you think that's a type cast issue and that JavaList would help here? I need to define a 'typed' list, e.g. List<FacetRequest>.

FacetSearchParams API docs: http://lucene.apache.org/core/4_1_0/facet/org/apache/lucene/facet/search/params/FacetSearchParams.html
Current version of FacetExample.py: https://dl.dropbox.com/u/4384120/FacetExample.py

Any hints?

regards,
Thomas
Re: FacetExample.py
Hi Thomas,

On Tue, 12 Feb 2013, Thomas Koch wrote:
> However I now reached another problem: I need to instantiate the class 'FacetsCollector' but get an error when doing so:
>   File "samples/FacetExample.py", line 222, in searchWithRequestAndQuery
>     facetsCollector = FacetsCollector(facetSearchParams, indexReader, taxoReader)
> NotImplementedError: ('instantiating java class', <type 'FacetsCollector'>)
> ... snip ...
> So what could be the reason for this behavior?

The FacetsCollector class is declared abstract. Thus you can't instantiate it, constructor or not. I think the intent is to instantiate one of its concrete inner subclasses. See lucene-java-4.1/lucene/facet/src/java/org/apache/lucene/facet/search/FacetsCollector.java

> I have another problem with the constructor of FacetSearchParams: it is expecting arguments: (List<FacetRequest> facetRequests, FacetIndexingParams indexingParams) but neither FacetSearchParams(Arrays.asList([facetRequest,]), indexingParams) nor FacetSearchParams([facetRequest,], indexingParams) does it here. I get lucene.InvalidArgsError: (<type 'FacetSearchParams'>, '__init__', (<List: [root/a nRes=10 nLbl=10]>, <FacetIndexingParams: org.apache.lucene.facet.params.FacetIndexingParams@f97ad3c0>))

There are four constructors on FacetSearchParams, none of which seems to match your call:

    public FacetSearchParams(FacetRequest... facetRequests)
    public FacetSearchParams(List<FacetRequest> facetRequests)
    public FacetSearchParams(FacetIndexingParams indexingParams, FacetRequest... facetRequests)
    public FacetSearchParams(FacetIndexingParams indexingParams, List<FacetRequest> facetRequests)

See lucene-java-4.1/lucene/facet/src/java/org/apache/lucene/facet/params/FacetSearchParams.java

You seem to be passing FacetIndexingParams last.

Andi..
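The "declared abstract" answer above can be illustrated with a stdlib-only Python analogue. It uses the same design the later 4.x branch adopted: an abstract base that cannot be instantiated directly, plus a static create() factory that returns a concrete subclass. Collector, _CountingCollector and try_direct_instantiation are hypothetical names, and the sketch does not touch PyLucene. In PyLucene the failure surfaced as JCC's NotImplementedError, whereas Python's abc module raises TypeError. Going by the branch_4x notes in this thread, the PyLucene call would presumably become FacetsCollector.create(facetSearchParams, indexReader, taxoReader).

```python
from abc import ABC, abstractmethod


class Collector(ABC):
    """Hypothetical abstract collector with a static factory method."""

    @abstractmethod
    def collect(self, doc_id):
        """Subclasses decide what to do with each matching document."""

    @staticmethod
    def create(label):
        # The factory chooses the concrete subclass, so callers never
        # need to know (or be able to name) the implementation class.
        return _CountingCollector(label)


class _CountingCollector(Collector):
    """Concrete implementation returned by Collector.create()."""

    def __init__(self, label):
        self.label = label
        self.count = 0

    def collect(self, doc_id):
        self.count += 1


def try_direct_instantiation():
    """Return the error raised by calling the abstract class directly."""
    try:
        Collector()
    except TypeError as exc:
        return exc
    return None
```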
Re: FacetExample.py
Hi Andi,

First, please excuse me for not getting back to you regarding the tests - I did start on them but then got stuck and distracted by other tasks popping up. No excuse - I just failed to deliver what I promised.

> ...
> Would you please port it to the new 4.x API so that it can be included with the PyLucene 4.1 release?

Yes, will do. And yes, the Facets API has changed - mainly due to a complete rewrite of huge parts of the code by Shai Erera (as described in LUCENE-4647). I haven't worked with 4.x yet, so I had to check the documentation first (mainly the API docs).

The most obvious change is that CategoryDocumentBuilder has been replaced by the FacetFields class. The method of interest is certainly FacetFields.addFields(Document doc, Iterable<CategoryPath> categories), which should be used instead of CategoryDocumentBuilder.setCategoryPaths(Iterable<CategoryPath> categoryPaths) AND CategoryDocumentBuilder.build(org.apache.lucene.document.Document doc).

It should be noted (and reported - to whom?) that the Apache Lucene Faceted Search User's Guide at http://lucene.apache.org/core/4_1_0/facet/org/apache/lucene/facet/doc-files/userguide.html does NOT yet reflect the new API and thus is misleading (or just plain wrong). Luckily, the basic concepts of CategoryPath and CountFacetRequest have remained unchanged AFAIK, so the changes should not be that big.

Actually, I have already changed FacetExample.py according to the API changes I noticed, but couldn't test it yet because I was unable to build PyLucene 4.1.
Here's the diff against pylucene-trunk: https://dl.dropbox.com/u/4384120/FacetExample_patch.txt

And here's what I did in order to get my local pylucene-trunk environment up to date (to 4.1), and how I failed. I first ran svn up and rebuilt JCC and PyLucene, but then noticed I still had PyLucene 4.0:

>>> import lucene
>>> lucene.VERSION
'4.0'
>>> from org.apache.lucene.facet.taxonomy.directory import DirectoryTaxonomyReader
>>> from org.apache.lucene.facet.index import FacetFields
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name FacetFields
>>> import org.apache.lucene.facet.index
>>> dir(org.apache.lucene.facet.index)
['CategoryContainer', 'CategoryDocumentBuilder', 'CategoryListPayloadStream', 'OrdinalMappingAtomicReader', '__doc__', '__name__', '__package__', 'attributes', 'categorypolicy', 'params', 'streaming']
>>> exit()

So I had to replace the Java source directory lucene-java-4.0. I just removed it, changed LUCENE_VER=4.1 in the Makefile and ran make again, which fetched the Lucene Java sources into lucene-java-4.1 via svn:

...
A    lucene-java-4.1/lucene/queries/src/java/overview.html
A    lucene-java-4.1/lucene/queries/build.xml
Exported revision 1444751.

But then the build failed around pylucene-trunk/extensions.xml. It did build /lucene-java-4.1/build/core/lucene-core-4.1.jar and other jars before that, though. The output is captured below.

Is there anything more that needs to be changed in the Makefile for 4.1, or do you have any other local diff against the current pylucene-trunk that's needed? (I am using the SVN URL http://svn.apache.org/repos/asf/lucene/pylucene/trunk.) Please let me know how to get PyLucene 4.1 up and running locally so I can proceed with the migration of FacetExample.py. Thx.
Cheers,
Thomas
--
OUTPUT of make (the part that failed). The steps including common.compile-core did succeed, and it created lucene-java-4.1/lucene/build/highlighter/lucene-highlighter-4.1.jar before it failed in pylucene-trunk/extensions.xml:

(pylucene)ios:pylucene-trunk koch$ make
cd lucene-java-4.1/lucene; (ant ivy-fail || ant ivy-bootstrap)
Buildfile: /Users/koch/Projekte/Python/pylucene/pylucene-trunk/lucene-java-4.1/lucene/build.xml

ivy-fail:
     [echo]
     [echo] This build requires Ivy and Ivy could not be found in your ant classpath.
     [echo]
     [echo] (Due to classpath issues and the recursive nature of the Lucene/Solr
     [echo] build system, a local copy of Ivy can not be used an loaded dynamically
     [echo] by the build.xml)
     [echo]
     [echo] You can either manually install a copy of Ivy 2.3.0 in your ant classpath:
     [echo]    http://ant.apache.org/manual/install.html#optionalTasks
     [echo]
     [echo] Or this build file can do it for you by running the Ivy Bootstrap target:
     [echo]    ant ivy-bootstrap
     [echo]
     [echo] Either way you will only have to install Ivy one time.
     [echo]
     [echo] 'ant ivy-bootstrap' will install a copy of Ivy into your Ant User Library:
     [echo]    /Users/koch/.ant/lib
     [echo]
     [echo] If you would prefer, you can have it installed into an alternative
     [echo] directory using the -Divy_install_path=/some/path/you/choose option,
     [echo] but you will have to specify this path every time you build Lucene/Solr
     [echo] in the future...