Re: Problems building JCC
Hi Petrus,

On Jul 4, 2011, at 23:15, Petrus Hyvönen petrus.hyvo...@gmail.com wrote:

> This is likely another FAQ, but I've moved to a Windows 7 machine (64-bit)
> and am trying to compile JCC. The mingw32 compiler, JDK, and JRE are
> installed, and javac is available at the command prompt, but I'm getting a
> "libjcc.a - No such file or directory" error.
>
> Building with: python setup.py build --compiler=mingw32
>
> writing build\temp.win32-2.6\Release\jcc\sources\jcc.def
> C:\Program Files (x86)\pythonxy\mingw\bin\g++.exe -mno-cygwin -mdll -static --entry _DllMain@12 -Wl,--out-implib,build\lib.win32-2.6\jcc\jcc.lib --output-lib build\temp.win32-2.6\Release\jcc\sources\libjcc.a --def build\temp.win32-2.6\Release\jcc\sources\jcc.def -s build\temp.win32-2.6\Release\jcc\sources\jcc.o build\temp.win32-2.6\Release\jcc\sources\jccenv.o -LC:\Python26\libs -LC:\Python26\PCbuild -lpython26 -lmsvcr90 -o build\lib.win32-2.6\jcc.dll -LC:\Program Files (x86)\Java\jdk1.6.0_26/lib -ljvm -Wl,-S -Wl,--out-implib,jcc\jcc.lib
> g++: build\temp.win32-2.6\Release\jcc\sources\libjcc.a: No such file or directory
> error: command 'g++' failed with exit status 1
>
> Any help highly appreciated.
> /Petrus

I don't use this compiler, so I don't have an answer. Maybe someone else on
this list does? In any case, if you solve the problem, please post the
solution so that the next 64-bit mingw user facing this can find it here.

Thanks!

Andi
Re: Problems building JCC
Petrus,

> g++: build\temp.win32-2.6\Release\jcc\sources\libjcc.a: No such file or directory

If you could find a mail tool that doesn't wrap the lines of the log you're
trying to send, that would help in debugging this. But it looks to me as if
something is not creating that directory before trying to write into it.

I build regularly on Win XP with mingw. I have a Win7 machine; I'll try to
build on it this afternoon and let you know what I see.

I attach my build script; it is a piece of a larger /bin/sh script. I believe
I've used it successfully on Win7, too. The missing piece might be my
jcc-2.9-mingw-PATCH; I include a copy of it below as well, but it looks to me
as if Andi has already merged it into the sources.

Bill

echo -- jcc --
export PATH=$PATH:${javahome}/jre/bin/client
echo PATH is $PATH

cd ../pylucene-3.0.*/jcc
# note that this patch still works for 3.0.1/3.0.2
patch -p0 < ${patchesdir}/jcc-2.9-mingw-PATCH

export JCC_ARGSEP=";"
export JCC_JDK=$WINSTYLEJAVAHOME
export JCC_CFLAGS="-fno-strict-aliasing;-Wno-write-strings"
export JCC_LFLAGS="-L${WINSTYLEJAVAHOME}\\lib;-ljvm"
export JCC_INCLUDES="${WINSTYLEJAVAHOME}\\include;${WINSTYLEJAVAHOME}\\include\\win32"
export JCC_JAVAC=${WINSTYLEJAVAHOME}\\bin\\javac.exe

${python} setup.py build --compiler=mingw32 \
    install --single-version-externally-managed --root /c/ --prefix=${distdir}

if [ -f jcc/jcc.lib ]; then
    cp -p jcc/jcc.lib ${sitepackages}/jcc/jcc.lib
fi

# for 3.0.2 compiled with MinGW GCC 4.x and --shared, we also need two
# GCC libraries
if [ -f /mingw/bin/libstdc++-6.dll ]; then
    install -m 555 /mingw/bin/libstdc++-6.dll ${distdir}/bin/
    echo copied libstdc++-6.dll
fi
if [ -f /mingw/bin/libgcc_s_dw2-1.dll ]; then
    install -m 555 /mingw/bin/libgcc_s_dw2-1.dll ${distdir}/bin/
    echo copied libgcc_s_dw2-1.dll
fi
cd ..
--- jcc-2.9-mingw-PATCH

*** setup.py	2009-10-28 15:24:16.0 -0700
--- setup.py	2010-03-29 22:08:56.0 -0700
***************
*** 262,268 ****
      elif platform == 'win32':
          jcclib = 'jcc%s.lib' %(debug and '_d' or '')
          kwds["extra_link_args"] = \
!             lflags + ['/IMPLIB:%s' %(os.path.join('jcc', jcclib))]
          package_data.append(jcclib)
      else:
          kwds["extra_link_args"] = lflags
--- 262,268 ----
      elif platform == 'win32':
          jcclib = 'jcc%s.lib' %(debug and '_d' or '')
          kwds["extra_link_args"] = \
!             lflags + ['-Wl,--out-implib,%s' %(os.path.join('jcc', jcclib))]
          package_data.append(jcclib)
      else:
          kwds["extra_link_args"] = lflags
[jira] [Updated] (LUCENE-3233) HuperDuperSynonymsFilter™
[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3233:
--------------------------------

    Attachment: LUCENE-3233.patch

Fixed some bugs and added some tests, but there is a problem: I started to add a little benchmark and hit this on my largish synonyms file:

{noformat}
java.lang.IllegalStateException: max arc size is too large (445)
{noformat}

Just run TestFSTSynonymFilterFactory and you will see it. I enabled some prints and it doesn't appear that anything totally stupid is going on... giving up for the night :)

> HuperDuperSynonymsFilter™
> -------------------------
>
>                 Key: LUCENE-3233
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3233
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>         Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch
>
> The current SynonymsFilter uses a lot of RAM and CPU, especially at build time. I think yesterday I heard about huge synonyms files three times. So, I think we should use an FST-based structure, sharing the inputs and outputs. And we should be more efficient with the tokenStream API, e.g. using save/restoreState instead of cloneAttributes().

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3233) HuperDuperSynonymsFilter™
[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3233:
--------------------------------

    Attachment: synonyms.zip

Attaching the synonyms.txt test file that I was using; it's derived from WordNet.

> HuperDuperSynonymsFilter™
>                 Key: LUCENE-3233
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3233
[jira] [Commented] (LUCENE-2795) Genericize DirectIOLinuxDir - UnixDir
[ https://issues.apache.org/jira/browse/LUCENE-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060331#comment-13060331 ]

Varun Thacker commented on LUCENE-2795:
---------------------------------------

{quote}
Hey Varun - saw you asked for someone with a mac to run some code for you in IRC, but you popped off before I saw - what do you need? Just apply the patch and run the tests?
{quote}

This patch will apply to the LUCENE-2793 branch. Otherwise, in the file lucene/contrib/misc/src/java/org/apache/lucene/store/NativePosixUtil.cpp, after line 117, inside the if, add this line:

{code}
fcntl(fd, F_NOCACHE, 1);
{code}

Then run {code}ant build-native-unix{code} from the /contrib/misc folder to check whether it compiles successfully. Thanks.

> Genericize DirectIOLinuxDir - UnixDir
> -------------------------------------
>
>                 Key: LUCENE-2795
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2795
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/store
>            Reporter: Michael McCandless
>            Assignee: Varun Thacker
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>         Attachments: LUCENE-2795.patch
>
> Today DirectIOLinuxDir is tricky/dangerous to use, because you only want to use it for IndexWriter and not IndexReader (searching). It's a trap. But, once we do LUCENE-2793, we can make it fully general purpose, because then a single native Dir impl can be used. I'd also like to make it generic to other Unices, if we can, so that it becomes UnixDirectory.
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 9364 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/9364/

1 tests failed.
REGRESSION: org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull

Error Message:
CFS has no entries

Stack Trace:
java.lang.IllegalStateException: CFS has no entries
	at org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:139)
	at org.apache.lucene.store.CompoundFileDirectory.close(CompoundFileDirectory.java:181)
	at org.apache.lucene.store.DefaultCompoundFileDirectory.close(DefaultCompoundFileDirectory.java:58)
	at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:139)
	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4252)
	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3863)
	at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
	at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2715)
	at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2710)
	at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2706)
	at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3513)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2064)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2031)
	at org.apache.lucene.index.TestIndexWriterOnDiskFull.addDoc(TestIndexWriterOnDiskFull.java:539)
	at org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull(TestIndexWriterOnDiskFull.java:74)
	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)

Build Log (for compile errors):
[...truncated 10589 lines...]
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060342#comment-13060342 ]

Stefan Matheis (steffkes) commented on SOLR-2399:
-------------------------------------------------

Mark, changing the paths should be really easy, like Ryan said. So, should we use something other than {{request.getContextPath()}}? Maybe combined with a conditional?

bq. but it complains about the javascript variable class in script.js (L969) when i repackage.

[That's already changed|http://svn.apache.org/viewvc/lucene/dev/trunk/solr/src/webapp/web/js/script.js?r1=1138322&r2=1138323] - or did you use another version?

Stefan

> Solr Admin Interface, reworked
> ------------------------------
>
>                 Key: SOLR-2399
>                 URL: https://issues.apache.org/jira/browse/SOLR-2399
>             Project: Solr
>          Issue Type: Improvement
>          Components: web gui
>            Reporter: Stefan Matheis (steffkes)
>            Assignee: Ryan McKinley
>            Priority: Minor
>             Fix For: 4.0
>         Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-110702.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch
>
> *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]
>
> *Features:*
> * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
> * [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
> * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
> * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400)
> * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
> * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
> * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
> * [Replication|http://files.mathe.is/solr-admin/10_replication.png]
> * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
> * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
> ** Stub (using static data)
>
> Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI
> I've quickly created a Github repository (just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060435#comment-13060435 ]

Erik Hatcher commented on SOLR-2399:
------------------------------------

Perhaps we should close this issue and open new ones, since this one is getting incredibly long in comments and it's already been committed.

But... one issue I have is that the schema/config views don't take advantage of my browser's ability to render XML as a collapsible/expandable tree structure. It's surely nice as it is now for browsers that don't render XML like this... so maybe we leave it as it is but also provide a direct link to the show-file request handler for those files, like the old-school admin links do. Thoughts?

> Solr Admin Interface, reworked
>                 Key: SOLR-2399
>                 URL: https://issues.apache.org/jira/browse/SOLR-2399
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060441#comment-13060441 ]

Stefan Matheis (steffkes) commented on SOLR-2399:
-------------------------------------------------

bq. ... so maybe we leave it like it is but also provide a direct link to the show file request handler for those files like the old-school admin links do. Thoughts?

Either this, yes -- or we add tabs to the relevant views: the current one first, and an additional one with a raw view?

> Solr Admin Interface, reworked
>                 Key: SOLR-2399
>                 URL: https://issues.apache.org/jira/browse/SOLR-2399
Re: failonjavadocwarning to false for ant generate-maven-artifacts
Hi Steven,

Thanks for the explanation. I should have seen that from the ant targets!

Diverging from the thread title, and probably already discussed... but why are the POMs not committed and maintained in SVN? (Even if 'mvn install' works fine, I still have issues importing the modules hierarchy in Eclipse due to e.g. the src/test-framework module. Having the POMs in trunk would allow fixing this once and for all.)

Thx.

On 04/07/11 20:07, Steven A Rowe wrote:
> Hi Eric,
>
> 'ant get-maven-poms' will generate the pom.xml files for you.
>
> 'ant generate-maven-artifacts' has to generate the javadoc for each module, and javadoc generation fails on warnings. When the javadoc tool fails to download the package list from Oracle, which seems to happen often, the resulting warning fails the build.
>
> Steve
>
> -----Original Message-----
> From: Eric Charles [mailto:eric.char...@u-mangate.com]
> Sent: Monday, July 04, 2011 5:07 AM
> To: dev@lucene.apache.org
> Subject: failonjavadocwarning to false for ant generate-maven-artifacts
>
> Hi,
> In current trunk, I had to set failonjavadocwarning to false to successfully generate the POMs (via ant generate-maven-artifacts). (Invoking ant javadoc in the lucene folder also fails.)
> I was simply looking for the pom.xml generation, but much more was done. I'm not worried about that (just willing to share it).
> Thx.
> --
> Eric

--
Eric
[jira] [Created] (SOLR-2638) A CoreContainer Plugin interface to create Container level Services
A CoreContainer Plugin interface to create Container level Services
-------------------------------------------------------------------

                 Key: SOLR-2638
                 URL: https://issues.apache.org/jira/browse/SOLR-2638
             Project: Solr
          Issue Type: New Feature
          Components: multicore
            Reporter: Noble Paul
            Assignee: Noble Paul

It can help register services such as Zookeeper.

The interface:

{code:java}
public abstract class ContainerPlugin {
  /** Called before initializing any core.
   * @param container
   * @param attrs
   */
  public abstract void init(CoreContainer container, Map<String,String> attrs);

  /** Callback after all cores are initialized */
  public void postInit() {}

  /** Callback after each core is created, but before registration
   * @param core
   */
  public void onCoreCreate(SolrCore core) {}

  /** Callback for server shutdown */
  public void shutdown() {}
}
{code}

It may be specified in solr.xml as:

{code:xml}
<solr>
  <plugin name="zk" class="solr.ZookeeperService" param1="val1" param2="val2" zkClientTimeout="8000"/>
  <cores adminPath="/admin/cores" defaultCoreName="collection1" host="127.0.0.1" hostPort="${hostPort:8983}" hostContext="solr">
    <core name="collection1" shard="${shard:}" collection="${collection:collection1}" config="${solrconfig:solrconfig.xml}" instanceDir="./"/>
  </cores>
</solr>
{code}
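As a sketch of how a concrete plugin might look against this proposed interface (the stub CoreContainer and SolrCore classes below are stand-ins so the example compiles on its own; a real plugin would use Solr's actual classes, and the LoggingPlugin name is hypothetical):

```java
import java.util.Map;

// Minimal stand-ins for the Solr types, so this sketch compiles standalone.
class CoreContainer {}
class SolrCore {}

abstract class ContainerPlugin {
    public abstract void init(CoreContainer container, Map<String, String> attrs);
    public void postInit() {}
    public void onCoreCreate(SolrCore core) {}
    public void shutdown() {}
}

// Hypothetical plugin that records its lifecycle callbacks in order.
public class LoggingPlugin extends ContainerPlugin {
    final StringBuilder log = new StringBuilder();

    @Override
    public void init(CoreContainer container, Map<String, String> attrs) {
        // attrs would carry the attributes from the <plugin .../> element
        log.append("init:").append(attrs.get("param1")).append(';');
    }

    @Override
    public void onCoreCreate(SolrCore core) {
        log.append("coreCreate;");
    }

    @Override
    public void shutdown() {
        log.append("shutdown;");
    }

    public static void main(String[] args) {
        LoggingPlugin p = new LoggingPlugin();
        p.init(new CoreContainer(), Map.of("param1", "val1"));
        p.onCoreCreate(new SolrCore());
        p.shutdown();
        System.out.println(p.log); // init:val1;coreCreate;shutdown;
    }
}
```

The point of the sketch is the callback ordering: init before any core, onCoreCreate per core, shutdown last.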
[jira] [Commented] (LUCENE-3167) Make lucene/solr a OSGI bundle through Ant
[ https://issues.apache.org/jira/browse/LUCENE-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060469#comment-13060469 ]

Luca Stancapiano commented on LUCENE-3167:
------------------------------------------

Some of the OSGi information is inside the pom.xml. So if we want an automatism to create the OSGi attributes in the MANIFEST.MF, we have to read the pom.xml.template files. These are the OSGi entries to add to the MANIFEST:

Bundle-License: http://www.apache.org/licenses/LICENSE-2.0.txt (project.licenses.license.url in the parent pom.xml.template)
Bundle-SymbolicName: org.apache.lucene.misc (project.groupId + project.artifactId in the pom.xml.template)
Bundle-Name: Lucene Miscellaneous (project.name attribute in the pom.xml.template)
Bundle-Vendor: The Apache Software Foundation (from the parent pom.xml.template)
Bundle-Version: 4.0-SNAPSHOT ($version variable from ant)
Bundle-Description: Miscellaneous Lucene extensions (project.description from the pom.xml.template)
Bundle-DocURL: http://www.apache.org/ (project.documentation.url in the parent pom.xml.template)

Otherwise we would have to duplicate the information. Which is the better road?

> Make lucene/solr a OSGI bundle through Ant
> ------------------------------------------
>
>                 Key: LUCENE-3167
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3167
>             Project: Lucene - Java
>          Issue Type: New Feature
>         Environment: bndtools
>            Reporter: Luca Stancapiano
>
> We need to make a bundle through Ant, so the binary can be published and we no longer need to download the sources. Currently, to get an OSGi bundle we need to use Maven tools and build the sources. Here is the reference for the creation of the OSGi bundle through Maven: https://issues.apache.org/jira/browse/LUCENE-1344
> Bndtools could be used inside Ant.
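Assembled from the fields listed in that comment, the resulting MANIFEST.MF fragment would look roughly like this (a sketch only; the exact attribute set and version formatting would still need to be decided):

```
Bundle-License: http://www.apache.org/licenses/LICENSE-2.0.txt
Bundle-SymbolicName: org.apache.lucene.misc
Bundle-Name: Lucene Miscellaneous
Bundle-Vendor: The Apache Software Foundation
Bundle-Version: 4.0-SNAPSHOT
Bundle-Description: Miscellaneous Lucene extensions
Bundle-DocURL: http://www.apache.org/
```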
[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™
[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060471#comment-13060471 ]

Michael McCandless commented on LUCENE-3233:
--------------------------------------------

bq. java.lang.IllegalStateException: max arc size is too large (445)

Ahh -- to fix this we have to call Builder.setAllowArrayArcs(false), i.e. disable the array arcs in the FST (and thus the binary-search lookup for finding arcs!). I had to do this also for MemoryCodec, since postings encoded as the output per arc can be more than 256 bytes, in general.

This will hurt perf, i.e. the arc lookup cannot use a binary search. It's because of a silly limitation in the FST representation: we use a single byte to hold the max size of all arcs, so if any arc is over 256 bytes we are unable to encode the arcs as an array. We could fix this (e.g., use vInt); however, arcs with such widely varying sizes (due to widely varying outputs on each arc) would be very wasteful of space, because every arc would use up a fixed number of bytes when represented as an array.

For now I think we should just call the above method, and then test the resulting perf.

> HuperDuperSynonymsFilter™
>                 Key: LUCENE-3233
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3233
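The constraint Michael describes can be sketched in a few lines (illustrative only, not Lucene's actual FST code): with a single byte holding the per-node maximum arc size, the array-arc layout is only possible when the widest arc fits in that byte, and the fixed-width layout also makes every arc pay for the widest one.

```java
// Illustrative sketch of the single-byte arc-size limit; the class and
// method names are hypothetical, not Lucene's.
public class ArcArrayCheck {
    static final int MAX_ENCODABLE = 255; // largest value one unsigned byte can hold

    // Array encoding (and thus binary search over arcs) is only possible
    // when the widest arc's byte size fits in the one-byte header field.
    static boolean canUseArrayArcs(int[] arcByteSizes) {
        int widest = 0;
        for (int size : arcByteSizes) widest = Math.max(widest, size);
        return widest <= MAX_ENCODABLE;
    }

    // Fixed-width layout cost: every arc is padded to the widest arc's size.
    static int arrayLayoutBytes(int[] arcByteSizes) {
        int widest = 0;
        for (int size : arcByteSizes) widest = Math.max(widest, size);
        return widest * arcByteSizes.length;
    }

    public static void main(String[] args) {
        System.out.println(canUseArrayArcs(new int[] {12, 40, 200}));
        System.out.println(canUseArrayArcs(new int[] {12, 445})); // the 445-byte arc from the report
    }
}
```

This also shows why widening the header (e.g. to a vInt) would not be free: a node with one 445-byte arc and many 12-byte arcs would pad every arc to 445 bytes in the array layout.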
[jira] [Commented] (LUCENE-3167) Make lucene/solr a OSGI bundle through Ant
[ https://issues.apache.org/jira/browse/LUCENE-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060472#comment-13060472 ]

Gunnar Wagenknecht commented on LUCENE-3167:
--------------------------------------------

The approach I implemented in my patch (attached to LUCENE-1344) used template files for BND. I wonder if both Ant and Maven could use those files.

> Make lucene/solr a OSGI bundle through Ant
>                 Key: LUCENE-3167
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3167
[jira] [Updated] (SOLR-2638) A CoreContainer Plugin interface to create Container level Services
[ https://issues.apache.org/jira/browse/SOLR-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-2638:
-----------------------------

    Attachment: SOLR-2638.patch

First cut.

> A CoreContainer Plugin interface to create Container level Services
>                 Key: SOLR-2638
>                 URL: https://issues.apache.org/jira/browse/SOLR-2638
[jira] [Updated] (LUCENE-3216) Store DocValues per segment instead of per field
[ https://issues.apache.org/jira/browse/LUCENE-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3216:
------------------------------------

    Attachment: LUCENE-3216.patch

Here is a new patch that moves the DocValues configuration to setters. I also added a randomizeCodec(Codec) to LuceneTestCase that sets the CFS flag at random.

> Store DocValues per segment instead of per field
> ------------------------------------------------
>
>                 Key: LUCENE-3216
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3216
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>         Attachments: LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216_floats.patch
>
> Currently we store DocValues per field, which results in at least one file per field that uses DocValues (or at most two per field per segment, depending on the impl). Instead, we should try by default to pack DocValues into a single file if possible. To enable this we need to hold all DocValues in memory during indexing and write them to disk once we flush a segment.
[jira] [Commented] (LUCENE-3167) Make lucene/solr a OSGI bundle through Ant
[ https://issues.apache.org/jira/browse/LUCENE-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060522#comment-13060522 ]

Luca Stancapiano commented on LUCENE-3167:
------------------------------------------

It depends on the structure of your templates. Can you tell me how the tree of the template files is organized?

> Make lucene/solr a OSGI bundle through Ant
>                 Key: LUCENE-3167
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3167
[jira] [Created] (LUCENE-3279) Allow CFS be empty
Allow CFS be empty
------------------

                 Key: LUCENE-3279
                 URL: https://issues.apache.org/jira/browse/LUCENE-3279
             Project: Lucene - Java
          Issue Type: Improvement
          Components: core/store
    Affects Versions: 3.4, 4.0
            Reporter: Simon Willnauer
             Fix For: 3.4, 4.0

Since we changed CFS semantics slightly, closing a CFS directory on an error can lead to an exception. Yet an empty CFS is still a valid CFS, so for consistency we should allow a CFS to be empty. Here is an example:

{noformat}
1 tests failed.
REGRESSION: org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull

Error Message:
CFS has no entries

Stack Trace:
java.lang.IllegalStateException: CFS has no entries
	at org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:139)
	at org.apache.lucene.store.CompoundFileDirectory.close(CompoundFileDirectory.java:181)
	at org.apache.lucene.store.DefaultCompoundFileDirectory.close(DefaultCompoundFileDirectory.java:58)
	at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:139)
	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4252)
	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3863)
	at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
	at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2715)
	at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2710)
	at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2706)
	at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3513)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2064)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2031)
	at org.apache.lucene.index.TestIndexWriterOnDiskFull.addDoc(TestIndexWriterOnDiskFull.java:539)
	at org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull(TestIndexWriterOnDiskFull.java:74)
	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
	at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)
{noformat}
RE: failonjavadocwarning to false for ant generate-maven-artifacts
Eric,

On 7/6/2011 at 5:35 AM, Eric Charles wrote:
> Diverging from the thread title and probably already discussed... but why are the POMs not committed and maintained in SVN?

The POMs *are* committed to SVN, under dev-tools/maven/, as pom.xml.template files, which have their version filled in when they are copied over to where they can be used and renamed to pom.xml. The Maven configuration is a non-official build, and maintaining the POMs outside of the main source tree is one way in which this fact is conveyed to users.

> Even if 'mvn install' works fine, I still have issues importing the modules hierarchy in Eclipse due to e.g. the src/test-framework module.

Do you know about the Eclipse configuration available via 'ant eclipse'?

Steve
[jira] [Assigned] (LUCENE-3279) Allow CFS be empty
[ https://issues.apache.org/jira/browse/LUCENE-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer reassigned LUCENE-3279:
---------------------------------------
Assignee: Simon Willnauer

Allow CFS be empty
Key: LUCENE-3279
Assignee: Simon Willnauer
Attachments: LUCENE-3279.patch

(Issue description and stack trace identical to the creation notification above.)
[jira] [Updated] (LUCENE-3279) Allow CFS be empty
[ https://issues.apache.org/jira/browse/LUCENE-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-3279:
------------------------------------
Attachment: LUCENE-3279.patch

here is a patch

(Issue description and stack trace identical to the creation notification above.)
Re: failonjavadocwarning to false for ant generate-maven-artifacts
Hi Steven,

On 06/07/11 14:17, Steven A Rowe wrote:
> The POMs *are* committed to SVN, under dev-tools/maven/, as pom.xml.template files, which have their version filled in when they are copied over to where they can be used and renamed to pom.xml. The Maven configuration is a non-official build, and maintaining the POMs outside of the main source tree is one way in which this fact is conveyed to users.

Yes, I just saw this. It's just that they are not committed in the standard place, and they need to be generated before they can be used.

> Do you know about the Eclipse configuration available via 'ant eclipse'?

I've given 'ant eclipse' a try, and yes, it creates one Eclipse project with many src folders (the lucene, contrib, modules, solr...). Before coming to the mailing list, I looked across the Lucene website and wiki for the information. Now I see it's on http://wiki.apache.org/solr/HowToContribute#Development_Environment_Tips and in the SVN README.txt. Sorry for the noise.

I'm more used to working with m2eclipse, which allows me to directly import modules as different Eclipse projects, with snapshots resolved from the Maven repo. But that's just a developer habit, and I'm fine working with the generated Eclipse project. Nevertheless, I would have preferred having the final POMs committed, to be sure they import fine in Eclipse (right now, I'm stuck on the src/test-framework module).

-- Eric
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060537#comment-13060537 ]

Simon Willnauer commented on LUCENE-2793:
-----------------------------------------

bq. Made the necessary changes and hopefully addressed all the nocommits.

Varun, I still see lots of nocommits here. Would be good if you could address them this week. You don't need to solve them, but discuss them here with us; you can do that in a patch and add your comments to the parts where you are not sure how to resolve. I would like to commit the patches this week so we can merge to trunk soonish.

Simon

Directory createOutput and openInput should take an IOContext
-------------------------------------------------------------
Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
Labels: gsoc2011, lucene-gsoc-11, mentor
Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch

Today for merging we pass down a larger readBufferSize than for searching because we get better performance. I think we should generalize this to a class (IOContext), which would hold the buffer size, but then could hold other flags like DIRECT (bypass OS's buffer cache), SEQUENTIAL, etc. Then, we can make the DirectIOLinuxDirectory fully usable because we would only use DIRECT/SEQUENTIAL during merging. This will require fixing how IW pools readers, so that a reader opened for merging is not then used for searching, and vice versa. Really, it's only all the open file handles that need to be different -- we could in theory share del docs, norms, etc., if that were somehow possible.
Re: revisit naming for grouping/join?
On Tue, Jul 5, 2011 at 5:44 PM, Mike Sokolov soko...@ifactory.com wrote:
> : Maybe modules/nested? modules/nesteddocs? modules/subdocs modules/nesteddocs modules/nested
> : None of them scream "this is the perfect name" to me, but none of them scream "dear lord this is a terrible idea" either. Instinct says: all other factors being equal, pick the shortest name.
> : Hmm... sub feels like it undersells, ie emphasizes "under" or "inferior to" and de-emphasizes the strong cooperation w/ the parent.
>
> How about modules/superdoc? It wouldn't undersell, at least :)

I agree it's no longer underselling :) But I like this even less than sub!

First, I think it has the same problems that sub has since it's just symmetric: it's too un-equal, ie implies one side is superior and above the other side, when in fact joining use cases (XML search, product SKUs, nested docs, etc.) are really symmetric. The nested parts of the doc are just as valid a part of the document as the non-nested part.

Second, I don't like the super-ness of super (ie, in the sense of supercalifragilisticexpialidocious or superman or superwoman) -- it's too generic, ie, like best or awesome.

Mike McCandless
http://blog.mikemccandless.com
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060538#comment-13060538 ]

Michael McCandless commented on LUCENE-2793:
--------------------------------------------
+1

(Issue description identical to the earlier LUCENE-2793 notification above.)
Re: revisit naming for grouping/join?
On 07/06/2011 08:47 AM, Michael McCandless wrote:
>> How about modules/superdoc? It wouldn't undersell, at least :)
>
> I agree it's no longer underselling :) But I like this even less than sub! First, I think it has the same problems that sub has since it's just symmetric: it's too un-equal, ie implies one side is superior and above the other side,

I basically agree, although I think there is an asymmetry in that this is a many-one relation? The main improvement this name makes is removing the plural in the other options (doc vs docs). And it's shorter than huperduperdoc :)

But otoh nothing I've seen here really captures all that much about index-time vs query-time join, which seems to be the main distinction (why you can't just call it join)? If you're still in the market for names, here are a few: StructureJoin, IntrinsicJoin, TreeJoin; Branch? Just brainstorming loosely. Frankly, Nest* seems good enough.

-Sokolov
Re: Two words Terms
You'll get more responses if you ask on the users' list. This list is for the development of the Lucene library, not for user applications of the library.

On Jul 5, 2011, at 6:42 PM, jcardona7508 wrote:
> Hi everybody, I have a question. I need to create documents with two-word terms. For example, if the content of the document is "I have problems using the operating system windows 7", the terms must be:
>
> Term1: I
> Term2: have
> Term3: problems
> Term4: using
> Term5: the
> Term6: operating system
> Term7: windows 7
>
> Terms 6 and 7 must be two words, "operating system" and "windows 7", because in the program they make sense together, not "operating", "system", "windows", "7". Can I create terms with 2 words, like Term6: operating system? What can I do? Thanks
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Two-words-Terms-tp3142833p3142833.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
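For readers who land here with the same question: the simplest way to see what the poster is after is dictionary-driven phrase grouping. The sketch below is a hypothetical standalone helper, not Lucene's TokenFilter API; the class and method names are invented for illustration. A real solution would do this inside an analyzer, but the core idea is the same: greedily merge adjacent words into one term when the pair appears in a phrase dictionary.

```java
import java.util.*;

// Hypothetical sketch (not a Lucene TokenFilter): merge adjacent words
// into a single two-word term whenever the pair is in the phrase set.
public class PhraseGrouper {
    public static List<String> group(List<String> words, Set<String> phrases) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            if (i + 1 < words.size()
                    && phrases.contains(words.get(i) + " " + words.get(i + 1))) {
                terms.add(words.get(i) + " " + words.get(i + 1)); // emit two-word term
                i++; // skip the second word; it was consumed by the phrase
            } else {
                terms.add(words.get(i));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("operating system", "windows 7"));
        List<String> in = Arrays.asList("i", "have", "problems", "using",
                "the", "operating", "system", "windows", "7");
        // prints [i, have, problems, using, the, operating system, windows 7]
        System.out.println(group(in, dict));
    }
}
```

The same greedy scan is what a custom TokenFilter would do with a lookahead of one token.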
CloudStateUpdateTest too many close
I just noticed that CloudStateUpdateTest consistently generates the following log message:

SEVERE: Too many close [count:-1] on org.apache.solr.core.SolrCore@5dedb45. Please report this exception to solr-u...@lucene.apache.org

-Yonik
http://www.lucidimagination.com
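For context on what a "count:-1" means here: SolrCore is reference-counted, and the SEVERE message fires when close() is called more times than the core was opened/retained. The sketch below is a hypothetical minimal refcount, not SolrCore's actual implementation, showing how one extra close() drives the count negative.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical refcount sketch (not SolrCore's actual code): each open()
// must be balanced by exactly one close(); an unbalanced extra close()
// drives the count negative, which is what "Too many close [count:-1]" flags.
public class RefCounted {
    private final AtomicInteger refCount = new AtomicInteger(1); // creator holds one ref

    public void open() {
        refCount.incrementAndGet();
    }

    public int close() {
        int count = refCount.decrementAndGet();
        if (count < 0) {
            System.err.println("SEVERE: too many close [count:" + count + "]");
        }
        return count;
    }

    public static void main(String[] args) {
        RefCounted core = new RefCounted();
        core.close();                      // balances the creator's reference -> 0
        System.out.println(core.close());  // one close too many; prints -1
    }
}
```

In the test, the fix is making the cleanup path close the core exactly once.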
[jira] [Commented] (SOLR-2638) A CoreContainer Plugin interface to create Container level Services
[ https://issues.apache.org/jira/browse/SOLR-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060573#comment-13060573 ]

Mark Miller commented on SOLR-2638:
-----------------------------------
cool - would be nice to abstract some of this out of CoreContainer - could use some slimming.

A CoreContainer Plugin interface to create Container level Services
-------------------------------------------------------------------
Key: SOLR-2638
URL: https://issues.apache.org/jira/browse/SOLR-2638
Project: Solr
Issue Type: New Feature
Components: multicore
Reporter: Noble Paul
Assignee: Noble Paul
Attachments: SOLR-2638.patch

It can help register services such as Zookeeper.

Interface:

{code:java}
public abstract class ContainerPlugin {
  /** Called before initializing any core.
   * @param container
   * @param attrs
   */
  public abstract void init(CoreContainer container, Map<String,String> attrs);

  /** Callback after all cores are initialized */
  public void postInit() {}

  /** Callback after each core is created, but before registration
   * @param core
   */
  public void onCoreCreate(SolrCore core) {}

  /** Callback for server shutdown */
  public void shutdown() {}
}
{code}

It may be specified in solr.xml as:

{code:xml}
<solr>
  <plugin name="zk" class="solr.ZookeeperService" param1="val1" param2="val2" zkClientTimeout="8000"/>
  <cores adminPath="/admin/cores" defaultCoreName="collection1" host="127.0.0.1" hostPort="${hostPort:8983}" hostContext="solr">
    <core name="collection1" shard="${shard:}" collection="${collection:collection1}" config="${solrconfig:solrconfig.xml}" instanceDir="./"/>
  </cores>
</solr>
{code}
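To make the proposed lifecycle concrete, here is a self-contained sketch of a plugin implementing that contract. CoreContainer and SolrCore are stand-in stubs here (the real classes live in Solr), and LoggingPlugin is an invented example, not part of the patch; it only records the order in which a container would invoke the callbacks.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: CoreContainer/SolrCore are stand-in stubs, and
// LoggingPlugin is a hypothetical subclass of the proposed interface.
public class PluginDemo {
    public static class CoreContainer {}
    public static class SolrCore {}

    // The proposed container-level plugin contract from the patch.
    public abstract static class ContainerPlugin {
        public abstract void init(CoreContainer container, Map<String, String> attrs);
        public void postInit() {}
        public void onCoreCreate(SolrCore core) {}
        public void shutdown() {}
    }

    // A trivial plugin that records its lifecycle callbacks in order.
    public static class LoggingPlugin extends ContainerPlugin {
        public final StringBuilder log = new StringBuilder();
        @Override public void init(CoreContainer c, Map<String, String> attrs) {
            log.append("init(").append(attrs.get("param1")).append(");");
        }
        @Override public void postInit() { log.append("postInit;"); }
        @Override public void onCoreCreate(SolrCore core) { log.append("coreCreate;"); }
        @Override public void shutdown() { log.append("shutdown;"); }
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("param1", "val1"); // would come from the <plugin> attributes
        LoggingPlugin p = new LoggingPlugin();
        p.init(new CoreContainer(), attrs); // before any core is initialized
        p.onCoreCreate(new SolrCore());     // per core, before registration
        p.postInit();                        // after all cores are initialized
        p.shutdown();                        // server shutdown
        // prints init(val1);coreCreate;postInit;shutdown;
        System.out.println(p.log);
    }
}
```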
[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™
[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060578#comment-13060578 ]

Michael McCandless commented on LUCENE-3233:
--------------------------------------------
Actually, maybe a better general fix for FST would be for it to dynamically decide whether to make an array based on how many bytes will be wasted (in addition to the number of arcs / depth of the node). This way we could turn on arcs always, and FST would pick the right times to use it. If we stick to only 1 byte for the number of bytes per arc, the FST could simply not use the array when an arc is 256 bytes or larger.

HuperDuperSynonymsFilter™
-------------------------
Key: LUCENE-3233
URL: https://issues.apache.org/jira/browse/LUCENE-3233
Project: Lucene - Java
Issue Type: Improvement
Reporter: Robert Muir
Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip

The current synonymsfilter uses a lot of ram and cpu, especially at build time. I think yesterday I heard about huge synonyms files three times. So, I think we should use an FST-based structure, sharing the inputs and outputs. And we should be more efficient with the tokenStream api, e.g. using save/restoreState instead of cloneAttributes()
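The heuristic McCandless sketches can be illustrated in a few lines. This is an invented stand-alone function, not FST's actual code: with fixed-width arc slots, every arc occupies maxArcSize bytes, so the waste is the padding added to the smaller arcs; use the array encoding only when each arc's size still fits in one byte and the padding stays under a budget.

```java
// Illustrative heuristic only (not Lucene's FST implementation): decide
// whether fixed-width array encoding is worth the padding it introduces.
public class ArcArrayHeuristic {
    public static boolean useArray(int[] arcSizes, int maxWastedBytes) {
        int max = 0, total = 0;
        for (int size : arcSizes) {
            max = Math.max(max, size);
            total += size;
        }
        if (max >= 256) {
            return false; // per-arc size no longer fits in the 1-byte header
        }
        // Fixed slots reserve max bytes per arc; the difference is padding.
        int wasted = arcSizes.length * max - total;
        return wasted <= maxWastedBytes;
    }

    public static void main(String[] args) {
        int[] uniform = {10, 10, 10, 11};   // little padding -> array pays off
        int[] skewed = {5, 5, 5, 300};      // one huge arc ruins both criteria
        System.out.println(useArray(uniform, 64)); // true
        System.out.println(useArray(skewed, 64));  // false
    }
}
```

The real decision would also weigh node depth and arc count, as the comment notes.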
Re: CloudStateUpdateTest too many close
On Jul 6, 2011, at 9:36 AM, Yonik Seeley wrote:
> I just noticed that CloudStateUpdateTest consistently generates the following log message:
>
> SEVERE: Too many close [count:-1] on org.apache.solr.core.SolrCore@5dedb45. Please report this exception to solr-u...@lucene.apache.org

I'll fix the test cleanup.

- Mark Miller
lucidimagination.com
Re: CloudStateUpdateTest too many close
> I'll fix the test cleanup.

Or you will beat me to it - conflicts!

- Mark
[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Thacker updated LUCENE-2793:
----------------------------------
Attachment: LUCENE-2793.patch

I removed all the remaining nocommits, as I think all of them had been addressed.

(Issue description identical to the earlier LUCENE-2793 notification above.)
[jira] [Resolved] (LUCENE-3246) Invert IR.getDelDocs -> IR.getLiveDocs
[ https://issues.apache.org/jira/browse/LUCENE-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-3246.
----------------------------------------
Resolution: Fixed

Next I'll work on LUCENE-1536...

Invert IR.getDelDocs -> IR.getLiveDocs
--------------------------------------
Key: LUCENE-3246
URL: https://issues.apache.org/jira/browse/LUCENE-3246
Project: Lucene - Java
Issue Type: Improvement
Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 4.0
Attachments: LUCENE-3246-IndexSplitters.patch, LUCENE-3246.patch, LUCENE-3246.patch

Spinoff from LUCENE-1536, where we need to fix the low-level filtering we do for deleted docs to match Filters (ie, a set bit means the doc is accepted) so that filters can be pushed all the way down to the enums when possible/appropriate. This change also inverts the meaning of the first arg to TermsEnum.docs/AndPositions (renames it from skipDocs to liveDocs).
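The inverted convention can be sketched with plain java.util.BitSet; this approximates, rather than quotes, Lucene's Bits API. A set bit in liveDocs now means "document is accepted", the same polarity as a Filter, and an old-style delDocs set ("set bit means deleted") converts by inversion.

```java
import java.util.BitSet;

// Sketch of the liveDocs convention (illustrative, not Lucene's exact API):
// a set bit means "accepted"; null liveDocs means nothing is deleted.
public class LiveDocsDemo {
    public static boolean accept(BitSet liveDocs, int docID) {
        return liveDocs == null || liveDocs.get(docID);
    }

    // Convert old-style delDocs ("set bit means deleted") to liveDocs.
    public static BitSet invert(BitSet delDocs, int maxDoc) {
        BitSet liveDocs = new BitSet(maxDoc);
        liveDocs.set(0, maxDoc);   // start with every doc live
        liveDocs.andNot(delDocs);  // clear the deleted docs
        return liveDocs;
    }

    public static void main(String[] args) {
        BitSet delDocs = new BitSet();
        delDocs.set(2);                          // doc 2 is deleted
        BitSet liveDocs = invert(delDocs, 4);
        System.out.println(accept(liveDocs, 1)); // true
        System.out.println(accept(liveDocs, 2)); // false
        System.out.println(accept(null, 2));     // true: no deletions at all
    }
}
```

The benefit noted in the issue is that a cached Filter's bit set can now be handed straight down to the enums without an inversion step.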
[jira] [Commented] (LUCENE-3246) Invert IR.getDelDocs -> IR.getLiveDocs
[ https://issues.apache.org/jira/browse/LUCENE-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060583#comment-13060583 ]

Michael McCandless commented on LUCENE-3246:
--------------------------------------------
This commit changed the index format (the *.del), but the change is fully back-compat, even with trunk indices.

(Issue description identical to the resolution notification above.)
Re: CloudStateUpdateTest too many close
On Wed, Jul 6, 2011 at 9:51 AM, Mark Miller markrmil...@gmail.com wrote:
>> I'll fix the test cleanup.
>
> Or you will beat me to it - conflicts!

Heh, I should have just looked at the test first... it was easier than I thought.

-Yonik
http://www.lucidimagination.com
Re: [Lucene.Net] [jira] [Commented] (LUCENENET-431) Spatial.Net Cartesian won't find docs in radius in certain cases
I've been looking at this with Olle over on the RavenDB mailing list. Just to add that this patch https://issues.apache.org/jira/secure/attachment/12420781/LUCENE-1930.patch solves the issue also. It's from this issue: https://issues.apache.org/jira/browse/LUCENE-1930. But it's more complicated than the fix you propose. As far as I can tell it uses a completely different method of projecting locations, but I don't really know much about how it works other than that.

On 6 July 2011 14:33, Digy (JIRA) j...@apache.org wrote:

[ https://issues.apache.org/jira/browse/LUCENENET-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060571#comment-13060571 ]

Digy commented on LUCENENET-431:
--------------------------------
Hi Olle,

{code}
static double TransformLat(double lat)
{
    var PI = 3.14159265358979323846;
    return (Math.Atan(Math.Exp((lat * 180 / 20037508.34) / 180 * PI)) / PI * 360 - 90) * 10;
}

static double TransformLon(double lon)
{
    return (lon * 180 / 20037508.34) * 10;
}

private double _lat = TransformLat(55.6880508001);
private double _lng = TransformLon(13.5871808352); // This passes: 13.6271808352

private void AddData(IndexWriter writer)
{
    AddPoint(writer, "Within radius", TransformLat(55.6880508001), TransformLon(13.5717346673));
    AddPoint(writer, "Within radius", TransformLat(55.6821978456), TransformLon(13.6076183965));
    AddPoint(writer, "Within radius", TransformLat(55.673251569), TransformLon(13.5946697607));
    AddPoint(writer, "Close but not in radius", TransformLat(55.8634157297), TransformLon(13.5497731987));
    AddPoint(writer, "Faar away", TransformLat(40.7137578228), TransformLon(-74.0126901936));
    writer.Commit();
    writer.Close();
}
{code}

When I change your code as above, it seems to work (according to the above functions, your 4th point should be 11 miles away). If this works for all your cases, we can think of a patch for Spatial.Net.
(Don't ask what these two functions do, since I found them somewhere in the OpenLayers project :) ) Maybe someone can explain these projection issues (if this really is the case).

DIGY

Spatial.Net Cartesian won't find docs in radius in certain cases
----------------------------------------------------------------
Key: LUCENENET-431
URL: https://issues.apache.org/jira/browse/LUCENENET-431
Project: Lucene.Net
Issue Type: Bug
Components: Lucene.Net Contrib
Affects Versions: Lucene.Net 2.9.4
Environment: Windows 7 x64
Reporter: Olle Jacobsen
Labels: spatialsearch

To replicate, change Lucene.Net.Contrib.Spatial.Test.TestCartesian to the following, which should return 3 results:

Line 42: private double _lat = 55.6880508001;
43: private double _lng = 13.5871808352; // This passes: 13.6271808352
73: AddPoint(writer, "Within radius", 55.6880508001, 13.5717346673);
74: AddPoint(writer, "Within radius", 55.6821978456, 13.6076183965);
75: AddPoint(writer, "Within radius", 55.673251569, 13.5946697607);
76: AddPoint(writer, "Close but not in radius", 55.8634157297, 13.5497731987);
77: AddPoint(writer, "Faar away", 40.7137578228, -74.0126901936);
130: const double miles = 5.0;
156: Console.WriteLine("Distances should be 3 " + distances.Count);
157: Console.WriteLine("Results should be 3 " + results);
159: Assert.AreEqual(3, distances.Count); // fixed a store of only needed distances
160: Assert.AreEqual(3, results);
[jira] [Created] (LUCENE-3280) Add new bit set impl for caching filters
Add new bit set impl for caching filters
----------------------------------------
Key: LUCENE-3280
URL: https://issues.apache.org/jira/browse/LUCENE-3280
Project: Lucene - Java
Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 3.4, 4.0
Attachments: LUCENE-3280.patch

I think OpenBitSet is trying to satisfy too many audiences, and it's confusing/error-prone as a result. It has int/long variants of many methods. Some methods require in-bounds access, others don't; of those others, some methods auto-grow the bits, some don't. OpenBitSet doesn't always know its numBits. I'd like to factor out a more focused bit set impl whose primary target usage is a cached Lucene Filter, ie a bit set indexed by docID (int, not long) whose size is known and fixed up front (backed by a final long[]) and is always accessed in-bounds.
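The proposal above can be sketched in a few dozen lines. This is a minimal illustration under the issue's stated constraints, not the actual patch: numBits is fixed at construction, the words are a final long[], there is no auto-grow, no long-indexed variants, and every access is bounds-checked.

```java
// Minimal sketch of the proposed focused bit set (not the actual patch):
// fixed size, final long[] backing, strictly in-bounds int-indexed access.
public class SimpleFixedBitSet {
    private final long[] bits;
    private final int numBits;

    public SimpleFixedBitSet(int numBits) {
        this.numBits = numBits;
        this.bits = new long[(numBits + 63) >>> 6]; // one long per 64 bits
    }

    public int length() {
        return numBits;
    }

    public boolean get(int docID) {
        checkBounds(docID);
        return (bits[docID >> 6] & (1L << (docID & 63))) != 0;
    }

    public void set(int docID) {
        checkBounds(docID);
        bits[docID >> 6] |= 1L << (docID & 63);
    }

    private void checkBounds(int docID) {
        if (docID < 0 || docID >= numBits) {
            throw new IndexOutOfBoundsException("docID=" + docID + " numBits=" + numBits);
        }
    }

    public static void main(String[] args) {
        SimpleFixedBitSet filter = new SimpleFixedBitSet(100); // e.g. maxDoc
        filter.set(3);                       // doc 3 matches the filter
        System.out.println(filter.get(3));   // true
        System.out.println(filter.get(4));   // false
    }
}
```

The single-audience design is the point: because size is known and access is always in-bounds, every method drops the grow/clip branches that make OpenBitSet confusing.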
[jira] [Updated] (LUCENE-3280) Add new bit set impl for caching filters
[ https://issues.apache.org/jira/browse/LUCENE-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-3280:
---------------------------------------
Attachment: LUCENE-3280.patch

Initial patch w/ some nocommits still, but tests pass...

(Issue description identical to the creation notification above.)
[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™
[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060613#comment-13060613 ]

Robert Muir commented on LUCENE-3233:
-------------------------------------
Thanks Mike, I will set the option for now; we can address any potential perf hit in a number of different ways here (besides modifying FST itself).

HuperDuperSynonymsFilter™
-------------------------

Key: LUCENE-3233
URL: https://issues.apache.org/jira/browse/LUCENE-3233
Project: Lucene - Java
Issue Type: Improvement
Reporter: Robert Muir
Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip

The current synonyms filter uses a lot of RAM and CPU, especially at build time. I think yesterday I heard about huge synonyms files three times. So, I think we should use an FST-based structure, sharing the inputs and outputs. And we should be more efficient with the tokenStream API, e.g. using save/restoreState instead of cloneAttributes().
[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Thacker updated LUCENE-2793:
----------------------------------
Attachment: LUCENE-2793.patch

Directory createOutput and openInput should take an IOContext
-------------------------------------------------------------

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
Labels: gsoc2011, lucene-gsoc-11, mentor
Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch

Today for merging we pass down a larger readBufferSize than for searching because we get better performance. I think we should generalize this to a class (IOContext), which would hold the buffer size, but then could hold other flags like DIRECT (bypass the OS's buffer cache), SEQUENTIAL, etc.

Then, we can make the DirectIOLinuxDirectory fully usable because we would only use DIRECT/SEQUENTIAL during merging.

This will require fixing how IW pools readers, so that a reader opened for merging is not then used for searching, and vice versa. Really, it's only all the open file handles that need to be different -- we could in theory share del docs, norms, etc., if that were somehow possible.
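The IOContext idea from the issue description can be sketched as a small value object passed alongside every createOutput/openInput call. Everything here is illustrative (field names, hint enum, and buffer sizes are assumptions, not the committed API):

```java
// Sketch of an IOContext-style value object: a buffer size plus access
// hints such as DIRECT or SEQUENTIAL. Names and sizes are hypothetical.
public class IOContextSketch {
    public enum Hint { DEFAULT, SEQUENTIAL, DIRECT }

    public final int bufferSize;
    public final Hint hint;
    public final boolean forMerge;

    public IOContextSketch(int bufferSize, Hint hint, boolean forMerge) {
        this.bufferSize = bufferSize;
        this.hint = hint;
        this.forMerge = forMerge;
    }

    // merging reads large sequential chunks, so it gets a bigger buffer
    public static IOContextSketch merge() {
        return new IOContextSketch(4096, Hint.SEQUENTIAL, true);
    }

    // searching does small random reads; keep a smaller default buffer
    public static IOContextSketch read() {
        return new IOContextSketch(1024, Hint.DEFAULT, false);
    }

    public static void main(String[] args) {
        System.out.println(merge().bufferSize > read().bufferSize); // true
    }
}
```

A Directory implementation would then inspect the context, so DirectIOLinuxDirectory could honor DIRECT/SEQUENTIAL only on the merge path.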
[jira] [Commented] (LUCENE-3280) Add new bit set impl for caching filters
[ https://issues.apache.org/jira/browse/LUCENE-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060619#comment-13060619 ]

Yonik Seeley commented on LUCENE-3280:
--------------------------------------
I think FastBitSet should still have

{code}
/** Expert: returns the long[] storing the bits */
public long[] getBits() { return bits; }
{code}

The whole reason I had to create OpenBitSet in the first place was that you couldn't do anything custom fast (on a word-for-word basis) because the bits were locked away from you.
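To make the "word-for-word" point concrete, here is a small sketch of the kind of custom operation direct long[] access enables -- counting bits a whole word at a time with Long.bitCount instead of testing one bit per call. The helper names are illustrative:

```java
// Word-at-a-time operations over a raw bit-set word array, the style of
// custom fast code that getBits() makes possible. Names are illustrative.
public class WordLevelOps {
    // population count, one 64-bit word per loop iteration
    public static long cardinality(long[] bits) {
        long count = 0;
        for (long word : bits) {
            count += Long.bitCount(word);
        }
        return count;
    }

    // intersection size of two same-length bit sets, word-for-word
    public static long intersectionCount(long[] a, long[] b) {
        long count = 0;
        for (int i = 0; i < a.length; i++) {
            count += Long.bitCount(a[i] & b[i]);
        }
        return count;
    }

    public static void main(String[] args) {
        long[] a = { 0b1011L, 0L };
        long[] b = { 0b0011L, 1L << 63 };
        System.out.println(cardinality(a));          // 3
        System.out.println(intersectionCount(a, b)); // 2
    }
}
```

With only per-bit accessors, both loops would cost 64 calls per word; with the raw array they are a single AND plus a popcount.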
[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2793:
------------------------------------
Attachment: LUCENE-2793.patch

I took Varun's patch and cleaned a couple of things up. I think this is ready; if nobody objects I will go ahead and commit this to the branch, merge up with trunk, and upload a new patch to integrate this into trunk. Once this is on trunk we can follow up with the native stuff etc. Thoughts?
[jira] [Commented] (LUCENE-3280) Add new bit set impl for caching filters
[ https://issues.apache.org/jira/browse/LUCENE-3280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060668#comment-13060668 ]

Michael McCandless commented on LUCENE-3280:
--------------------------------------------
OK, I'll add getBits().
[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2793:
------------------------------------
Attachment: LUCENE-2793.patch

s/4069/4096
[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060680#comment-13060680 ]

Robert Muir commented on LUCENE-3220:
-------------------------------------
Hi David: I had some ideas on stats to simplify some of these sims:

# I think we can use an easier way to compute average document length: sumTotalTermFreq() / maxDoc(). This way the average is 'exact' and not skewed by index-time boosts, smallfloat quantization, or anything like that.
# To support pivoted unique normalization like lnu.ltc, I think we can solve this in a similar way: add sumDocFreq(), which is just a single long, and divide this by maxDoc. This gives us the avg # of unique terms. I think Terrier might have a similar stat (#postings or #pointers or something)?

So I think this could make for nice simplifications, especially for switching norms completely over to docvalues: we should be able to do #1 immediately right now, and change the way we compute avgdoclen for e.g. BM25 and DFR. Then in a separate issue I could revert this norm summation stuff to make the docvalues integration simpler, and open a new issue for sumDocFreq().

Implement various ranking models as Similarities
------------------------------------------------

Key: LUCENE-3220
URL: https://issues.apache.org/jira/browse/LUCENE-3220
Project: Lucene - Java
Issue Type: Sub-task
Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
Labels: gsoc
Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch
Original Estimate: 336h
Remaining Estimate: 336h

With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu.

TODO:
* {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm
* {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible
* _BM25_: the current mock implementation might be OK
* _LM_
* _DFR_

Done:
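The two averages Robert proposes above reduce to simple divisions of whole-index sums by maxDoc. A minimal sketch, where the stat values are stand-ins rather than real index API calls:

```java
// The two corpus averages proposed in the comment, computed from
// whole-index sums. Input values here are stand-ins, not real API calls.
public class AvgStats {
    // exact average document length: total token occurrences / total docs,
    // unaffected by index-time boosts or smallfloat quantization
    public static double avgDocLength(long sumTotalTermFreq, int maxDoc) {
        return (double) sumTotalTermFreq / maxDoc;
    }

    // average number of unique terms per doc, for pivoted unique
    // normalization (lnu.ltc-style)
    public static double avgUniqueTerms(long sumDocFreq, int maxDoc) {
        return (double) sumDocFreq / maxDoc;
    }

    public static void main(String[] args) {
        System.out.println(avgDocLength(100_000L, 1_000));  // 100.0
        System.out.println(avgUniqueTerms(60_000L, 1_000)); // 60.0
    }
}
```

Because both sums are single longs maintained at the index level, the averages stay exact instead of being reconstructed from quantized per-document norms.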
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060682#comment-13060682 ]

Hoss Man commented on SOLR-2399:
--------------------------------
bq. I also repackaged it... the only thing you really need to change is in index.jsp

Hmmm... is that really necessary?

SolrDispatchFilter already has the notion of a path-prefix setting that can be specified in the web.xml and defaults to null. It uses that when proxying to build up the correct URLs for things like the per-core admin pages and whatnot, anytime it proxies a request to the JSPs.

Couldn't we just make SolrDispatchFilter add the pathPrefix to the HttpServletRequest as an attribute, and then no one would ever need to modify the index.jsp ... it could just derive all the paths from request.getContextPath() and request.getAttribute("solr-path-prefix"). Right?

Solr Admin Interface, reworked
------------------------------

Key: SOLR-2399
URL: https://issues.apache.org/jira/browse/SOLR-2399
Project: Solr
Issue Type: Improvement
Components: web gui
Reporter: Stefan Matheis (steffkes)
Assignee: Ryan McKinley
Priority: Minor
Fix For: 4.0
Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-110702.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch

*The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]]

*Features:*
* [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png]
* [Query-Form|http://files.mathe.is/solr-admin/02_query.png]
* [Plugins|http://files.mathe.is/solr-admin/05_plugins.png]
* [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400)
* [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png]
* [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482)
* [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png]
* [Replication|http://files.mathe.is/solr-admin/10_replication.png]
* [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png]
* [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459)
** Stub (using static data)

Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI

I've quickly created a Github-Repository (just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin
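Hoss's suggestion boils down to deriving every admin URL from two request-scoped values instead of hardcoding them in index.jsp. A minimal sketch of that derivation (the attribute name "solr-path-prefix" is taken from the comment; the helper itself is hypothetical):

```java
// Sketch of the URL derivation proposed in the comment: build admin paths
// from the servlet context path plus the (possibly null) path prefix that
// SolrDispatchFilter would publish as a request attribute.
public class AdminUrls {
    public static String adminUrl(String contextPath, String pathPrefix, String page) {
        String prefix = (pathPrefix == null) ? "" : pathPrefix; // prefix defaults to null
        return contextPath + prefix + "/admin/" + page;
    }

    public static void main(String[] args) {
        // no prefix configured in web.xml
        System.out.println(adminUrl("/solr", null, "index.jsp"));    // /solr/admin/index.jsp
        // prefix configured as "/apps"
        System.out.println(adminUrl("/solr", "/apps", "index.jsp")); // /solr/apps/admin/index.jsp
    }
}
```

A JSP would call the equivalent of adminUrl(request.getContextPath(), (String) request.getAttribute("solr-path-prefix"), page), so repackaging the webapp never requires editing the JSP.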
[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™
[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060686#comment-13060686 ]

Robert Muir commented on LUCENE-3233:
-------------------------------------
I ran some quick numbers, using the syn file example here, just best of 3 runs:

||Impl||Build time||RAM usage||
|SynonymFilterFactory|6619 ms|207.92 MB|
|FSTSynonymFilterFactory|463 ms|3.51 MB|

I modified the builder slightly to build the FST more efficiently for this; will upload the updated patch. So I think the build time and RAM consumption are really improved; the next thing is to benchmark the runtime perf.
[jira] [Created] (LUCENE-3281) OpenBitSet should report the configured capacity/size
OpenBitSet should report the configured capacity/size
-----------------------------------------------------

Key: LUCENE-3281
URL: https://issues.apache.org/jira/browse/LUCENE-3281
Project: Lucene - Java
Issue Type: Bug
Components: core/other
Affects Versions: 3.0, 3.0.1, 3.0.2, 3.0.3, 3.1, 3.2
Reporter: Robert Ragno
Priority: Minor

OpenBitSet rounds up the capacity() to the next multiple of 64 from what was specified. However, this is particularly damaging with the new asserts, which trigger when anything above the specified capacity is used as an index. The simple fix is to return numBits for capacity().
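The rounding the report describes follows directly from word-based storage: a set sized for numBits allocates ceil(numBits / 64) long words, so a capacity derived from the word count lands on the next multiple of 64. A small demonstration:

```java
// Demonstrates why a word-derived capacity overstates numBits: storage is
// allocated in 64-bit words, so the capacity rounds up to a multiple of 64.
public class CapacityRounding {
    public static long wordDerivedCapacity(int numBits) {
        int words = (numBits + 63) >>> 6; // ceil(numBits / 64) long words
        return (long) words << 6;         // words * 64 bits
    }

    public static void main(String[] args) {
        System.out.println(wordDerivedCapacity(100)); // 128, not 100
        System.out.println(wordDerivedCapacity(64));  // 64
        System.out.println(wordDerivedCapacity(65));  // 128
    }
}
```

With numBits = 100, indices 100..127 are inside the reported capacity but outside what the caller specified, which is exactly the window where the new asserts fire.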
[jira] [Updated] (LUCENE-3233) HuperDuperSynonymsFilter™
[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3233:
--------------------------------
Attachment: LUCENE-3233.patch
[jira] [Updated] (SOLR-2535) REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-2535:
--------------------------------
Summary: REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show directory listings (was: In Solr 3.2 and trunk the admin/file handler fails to show directory listings)

REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show directory listings
-----------------------------------------------------------------------------------------

Key: SOLR-2535
URL: https://issues.apache.org/jira/browse/SOLR-2535
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 3.1, 3.2, 4.0
Environment: java 1.6, jetty
Reporter: Peter Wolanin
Fix For: 3.4, 4.0
Attachments: SOLR-2535.patch, SOLR-2535_fix_admin_file_handler_for_directory_listings.patch

In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.
[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™
[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060698#comment-13060698 ]

David Smiley commented on LUCENE-3233:
--------------------------------------
Wow, that's striking; nice work, guys. FSTs are definitely one of those killer pieces of technology in Lucene. The difference in build time is surprising to me. Any theory why SynonymFilterFactory takes so much more time to build?
[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™
[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060705#comment-13060705 ]

Robert Muir commented on LUCENE-3233:
-------------------------------------
{quote}
The difference in build time is surprising to me. Any theory why SynonymFilterFactory takes so much more time to build?
{quote}

Yes, it's the n^2 portion where you have a synonym entry like this:

a, b, c, d

in reality this is creating entries like this:

a -> a
a -> b
a -> c
a -> d
b -> a
b -> b
...

In the current impl, this is done using some inefficient data structures (like nested CharArrayMaps with Token), as well as calling merge(). In the FST impl, we don't use any nested structures (instead input and output entries are just phrases), and we explicitly deduplicate both inputs and outputs during construction; the FST output is just a List<Integer> basically pointing to ords in the deduplicated BytesRefHash.

So during construction, add() is just a hashmap lookup on the input phrase, a BytesRefHash get/put via UTF16toUTF8WithHash to get the output ord, and an append to an ArrayList. This code isn't really optimized right now and we can definitely speed it up even more in the future, but the main thing right now is to ensure the filter performance is good.
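The deduplicated build described above can be sketched with plain collections standing in for the FST and BytesRefHash (class and method names here are illustrative, not the patch's actual code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the deduplicated synonym build: each input phrase maps to a
// list of integer ords, and each distinct output phrase is stored exactly
// once. HashMaps stand in for the FST and BytesRefHash of the real impl.
public class SynonymDedupSketch {
    private final Map<String, Integer> outputOrds = new HashMap<>();
    private final List<String> outputs = new ArrayList<>();
    private final Map<String, List<Integer>> entries = new HashMap<>();

    // dedup outputs: one ord per distinct phrase
    private int ordFor(String phrase) {
        return outputOrds.computeIfAbsent(phrase, p -> {
            outputs.add(p);
            return outputs.size() - 1;
        });
    }

    public void add(String input, String output) {
        entries.computeIfAbsent(input, k -> new ArrayList<>())
               .add(ordFor(output));
    }

    public int distinctOutputs() { return outputs.size(); }

    public static void main(String[] args) {
        SynonymDedupSketch b = new SynonymDedupSketch();
        String[] group = { "a", "b", "c" };
        for (String in : group)      // the n^2 expansion of "a, b, c"
            for (String out : group)
                b.add(in, out);
        System.out.println(b.distinctOutputs()); // 3, not 9
    }
}
```

Even though the n^2 expansion still produces 9 entries, only 3 output phrases are stored; every repeated phrase collapses to an existing ord, which is why RAM stays low for large synonym groups.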
[jira] [Commented] (LUCENE-1768) NumericRange support for new query parser
[ https://issues.apache.org/jira/browse/LUCENE-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060707#comment-13060707 ]

Adriano Crestani commented on LUCENE-1768:
------------------------------------------
{quote}
For numeric fields, half open ranges are important, as it supports queries like price < 2.00 Dollar. Wasn't there not also an issue open to support other syntax for numerics like < and > operators?
{quote}

Yes, there is; I just do not recall the JIRA number now. Maybe Vinicius could try to implement it as well to fill out his task list in case he finishes his tasks before schedule, since it is also related to numeric queries. I am just not sure how complex the task would be; I know the big change for this is in the syntax parser, which will require knowing how to change javacc files.

NumericRange support for new query parser
-----------------------------------------

Key: LUCENE-1768
URL: https://issues.apache.org/jira/browse/LUCENE-1768
Project: Lucene - Java
Issue Type: New Feature
Components: core/queryparser
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Adriano Crestani
Labels: contrib, gsoc, gsoc2011, lucene-gsoc-11, mentor
Fix For: 4.0
Attachments: week1.patch, week2.patch, week3.patch, week4.patch, week5-6.patch

It would be good to specify some type of schema for the query parser in future, to automatically create NumericRangeQuery for different numeric types. It would then be possible to index a numeric value (double, float, long, int) using NumericField, and then the query parser knows which type of field this is and so it correctly creates a NumericRangeQuery for strings like [1.567..*] or (1.787..19.5]. There is currently no way to extract if a field is numeric from the index, so the user will have to configure the FieldConfig objects in the ConfigHandler. But if this is done, it will not be that difficult to implement the rest.

The only difference from the current handling of RangeQuery is then the instantiation of the correct Query type and conversion of the entered numeric values (a simple Number.valueOf(...) cast of the user-entered numbers). Everything else is identical; NumericRangeQuery also supports the MTQ rewrite modes (as it is a MTQ).

Another thing is a change in Date semantics. There are some strange flags in the current parser that tell it how to handle dates.
[jira] [Commented] (LUCENE-3281) OpenBitSet should report the configured capacity/size
[ https://issues.apache.org/jira/browse/LUCENE-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060711#comment-13060711 ]

Yonik Seeley commented on LUCENE-3281:
--------------------------------------
See LUCENE-3280; it looks like Lucene will be switching to FastBitSet for most things? OpenBitSet is meant to be expert level and not impose any additional overhead (like keeping track of the largest bit that has been set).

But yeah, the new asserts do make things a little odd w.r.t. capacity()... how about the following:

{code}
 /** Returns the current capacity in bits (1 greater than the index of the last bit) */
- public long capacity() { return bits.length << 6; }
+ public long capacity() {
+   long cap = bits.length << 6;
+   assert cap >= numBits;
+   return cap;
+ }
{code}
[Lucene.Net] [jira] [Commented] (LUCENENET-431) Spatial.Net Cartesian won't find docs in radius in certain cases
[ https://issues.apache.org/jira/browse/LUCENENET-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060709#comment-13060709 ]

Digy commented on LUCENENET-431:
--------------------------------
Thanks Olle and Matt, I committed the LUCENE-1930 patch to the 2.9.4g branch (+ added Olle's test case). (Another divergence from lucene.java, since this patch is still waiting to be applied.)

DIGY

Spatial.Net Cartesian won't find docs in radius in certain cases
----------------------------------------------------------------

Key: LUCENENET-431
URL: https://issues.apache.org/jira/browse/LUCENENET-431
Project: Lucene.Net
Issue Type: Bug
Components: Lucene.Net Contrib
Affects Versions: Lucene.Net 2.9.4
Environment: Windows 7 x64
Reporter: Olle Jacobsen
Labels: spatialsearch

To replicate, change Lucene.Net.Contrib.Spatial.Test.TestCartesian to the following, which should return 3 results:

Line 42: private double _lat = 55.6880508001;
43: private double _lng = 13.5871808352; // This passes: 13.6271808352
73: AddPoint(writer, "Within radius", 55.6880508001, 13.5717346673);
74: AddPoint(writer, "Within radius", 55.6821978456, 13.6076183965);
75: AddPoint(writer, "Within radius", 55.673251569, 13.5946697607);
76: AddPoint(writer, "Close but not in radius", 55.8634157297, 13.5497731987);
77: AddPoint(writer, "Faar away", 40.7137578228, -74.0126901936);
130: const double miles = 5.0;
156: Console.WriteLine("Distances should be 3 " + distances.Count);
157: Console.WriteLine("Results should be 3 " + results);
159: Assert.AreEqual(3, distances.Count); // fixed a store of only needed distances
160: Assert.AreEqual(3, results);
[jira] [Commented] (LUCENE-3281) OpenBitSet should report the configured capacity/size
[ https://issues.apache.org/jira/browse/LUCENE-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060714#comment-13060714 ]

Uwe Schindler commented on LUCENE-3281:
---------------------------------------
Also, to return this number, size() is the right method (at least in trunk).
[Lucene.Net] [jira] [Resolved] (LUCENENET-431) Spatial.Net Cartesian won't find docs in radius in certain cases
[ https://issues.apache.org/jira/browse/LUCENENET-431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy resolved LUCENENET-431. Resolution: Fixed Fix Version/s: Lucene.Net 2.9.4g Assignee: Digy

Spatial.Net Cartesian won't find docs in radius in certain cases Key: LUCENENET-431 URL: https://issues.apache.org/jira/browse/LUCENENET-431 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Contrib Affects Versions: Lucene.Net 2.9.4 Environment: Windows 7 x64 Reporter: Olle Jacobsen Assignee: Digy Labels: spatialsearch Fix For: Lucene.Net 2.9.4g

To replicate, change Lucene.Net.Contrib.Spatial.Test.TestCartesian to the following, which should return 3 results:

Line 42: private double _lat = 55.6880508001;
43: private double _lng = 13.5871808352; // This passes: 13.6271808352
73: AddPoint(writer, "Within radius", 55.6880508001, 13.5717346673);
74: AddPoint(writer, "Within radius", 55.6821978456, 13.6076183965);
75: AddPoint(writer, "Within radius", 55.673251569, 13.5946697607);
76: AddPoint(writer, "Close but not in radius", 55.8634157297, 13.5497731987);
77: AddPoint(writer, "Faar away", 40.7137578228, -74.0126901936);
130: const double miles = 5.0;
156: Console.WriteLine("Distances should be 3: " + distances.Count);
157: Console.WriteLine("Results should be 3: " + results);
159: Assert.AreEqual(3, distances.Count); // fixed a store of only needed distances
160: Assert.AreEqual(3, results);

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
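[Editorial aside, not part of the original thread: a quick standalone sanity check of the test data above. This is not Lucene.Net code; it just applies the standard haversine formula to the five AddPoint coordinates to confirm that exactly three of them lie within the 5.0-mile radius of the query point, matching the expected assertion values.]

```java
// Standalone sanity check (not Lucene.Net code): verify via the haversine
// formula that exactly 3 of the 5 test points fall within 5.0 miles of the
// query point (55.6880508001, 13.5871808352).
public class SpatialSanityCheck {
    static final double EARTH_RADIUS_MILES = 3958.8; // mean Earth radius

    static double haversineMiles(double lat1, double lng1, double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a));
    }

    public static int countWithinRadius() {
        double lat = 55.6880508001, lng = 13.5871808352; // query point
        double[][] points = {
            {55.6880508001, 13.5717346673},  // "Within radius"
            {55.6821978456, 13.6076183965},  // "Within radius"
            {55.673251569, 13.5946697607},   // "Within radius"
            {55.8634157297, 13.5497731987},  // "Close but not in radius" (~12 mi)
            {40.7137578228, -74.0126901936}, // "Faar away"
        };
        int count = 0;
        for (double[] p : points) {
            if (haversineMiles(lat, lng, p[0], p[1]) <= 5.0) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("points within 5 miles: " + countWithinRadius());
    }
}
```

So a search that misses any of the first three points is dropping documents that are geometrically inside the radius, which is what the bug report describes.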
[jira] [Commented] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060717#comment-13060717 ] Shawn Heisey commented on SOLR-1972: A little further info on rollingAvgRequestsPerSecond ... I have noticed that it is always different from AvgRequestsPerSecond, even when requests and rollingRequests are the same. I would expect different numbers when requests and rollingRequests diverge, but not when they are the same. I did take a look at the code, but have to admit that I haven't wrapped my brain around it enough to figure out what the problem might be. Need additional query stats in admin interface - median, 95th and 99th percentile - Key: SOLR-1972 URL: https://issues.apache.org/jira/browse/SOLR-1972 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Shawn Heisey Priority: Minor Attachments: SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch I would like to see more detailed query statistics from the admin GUI. This is what you can get now: requests : 809 errors : 0 timeouts : 0 totalTime : 70053 avgTimePerRequest : 86.59209 avgRequestsPerSecond : 0.8148785 I'd like to see more data on the time per request - median, 95th percentile, 99th percentile, and any other statistical function that makes sense to include. In my environment, the first bunch of queries after startup tend to take several seconds each. I find that the average value tends to be useless until it has several thousand queries under its belt and the caches are thoroughly warmed. The statistical functions I have mentioned would quickly eliminate the influence of those initial slow queries. The system will have to store individual data about each query. I don't know if this is something Solr does already. It would be nice to have a configurable count of how many of the most recent data points are kept, to control the amount of memory the feature uses. The default value could be something like 1024 or 4096. 
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
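[Editorial aside, not part of the original thread: an illustrative sketch of the statistics this issue requests. The class and method names are hypothetical, not Solr's actual handler-statistics API. It keeps a bounded ring buffer of the most recent request times, as the reporter suggests, and computes median/95th/99th percentiles over that window, so slow warm-up queries stop influencing the numbers once they fall out of the window.]

```java
import java.util.Arrays;

// Illustrative sketch (not actual Solr code): a bounded ring buffer of the
// most recent request times, with nearest-rank percentiles computed over it.
public class RequestTimeStats {
    private final long[] window; // ring buffer of recent request times (ms)
    private int next = 0;        // next slot to overwrite
    private int filled = 0;      // how many slots hold real data

    public RequestTimeStats(int size) { // e.g. 1024 or 4096, as suggested
        window = new long[size];
    }

    public synchronized void record(long elapsedMillis) {
        window[next] = elapsedMillis;
        next = (next + 1) % window.length;
        if (filled < window.length) filled++;
    }

    /** Percentile p in (0,100] over the current window, nearest-rank method. */
    public synchronized long percentile(double p) {
        if (filled == 0) return 0;
        long[] sorted = Arrays.copyOf(window, filled);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * filled) - 1;
        return sorted[Math.max(0, rank)];
    }

    public static void main(String[] args) {
        RequestTimeStats stats = new RequestTimeStats(1024);
        stats.record(5000); stats.record(4000); // slow warm-up queries
        for (int i = 0; i < 98; i++) stats.record(50 + i % 10); // then fast ones
        System.out.println("median=" + stats.percentile(50)
                         + " p95=" + stats.percentile(95)
                         + " p99=" + stats.percentile(99));
    }
}
```

With 100 samples of which only the first two are slow, the median and 95th percentile already reflect the fast steady state, while the 99th percentile still surfaces the outliers.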
[jira] [Commented] (LUCENE-3281) OpenBitSet should report the configured capacity/size
[ https://issues.apache.org/jira/browse/LUCENE-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060722#comment-13060722 ] Robert Ragno commented on LUCENE-3281: -- Well, size() and capacity() are currently the same. But all that is needed is actually:

/** Returns the current capacity in bits (1 greater than the index of the last bit) */
- public long capacity() { return bits.length << 6; }
+ public long capacity() { return numBits; }

That will have the same effect. You throw away the first value for cap in the above, after all. Checking for numBits to be non-negative should be done in the constructor, if added, and maybe with a documented exception instead of an assert. OpenBitSet should report the configured capacity/size - Key: LUCENE-3281 URL: https://issues.apache.org/jira/browse/LUCENE-3281 Project: Lucene - Java Issue Type: Bug Components: core/other Affects Versions: 3.0, 3.0.1, 3.0.2, 3.0.3, 3.1, 3.2 Reporter: Robert Ragno Priority: Minor Original Estimate: 2m Remaining Estimate: 2m OpenBitSet rounds up the capacity() to the next multiple of 64 from what was specified. However, this is particularly damaging with the new asserts, which trigger when anything above the specified capacity is used as an index. The simple fix is to return numBits for capacity(). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3233) HuperDuperSynonymsFilter™
[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3233: Attachment: LUCENE-3233.patch here is a patch with a little microbenchmark... so we have some tuning to do. the benchmark analyzes a short string a million times that doesn't match any synonyms (actually the Solr default):

||impl||ms||
|SynonymsFilter|1692|
|FST with array arcs|2794|
|FST with no array arcs|8823|

so, disabling the array arcs is a pretty crucial hit here. but we could pursue other options to speed up this common case, e.g. with Daciuk we could build a CharRunAutomaton of the K-prefixes of the synonyms, which would be really fast at rejecting terms that don't match any syns. or we could explicitly put our BytesRef output in a byte[] and use long pointers as outputs. or we could speed up FST! But i think it's interesting to see how important this parameter is. HuperDuperSynonymsFilter™ - Key: LUCENE-3233 URL: https://issues.apache.org/jira/browse/LUCENE-3233 Project: Lucene - Java Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip The current synonymsfilter uses a lot of ram and cpu, especially at build time. I think yesterday I heard about huge synonyms files three times. So, I think we should use an FST-based structure, sharing the inputs and outputs. And we should be more efficient with the tokenStream api, e.g. using save/restoreState instead of cloneAttributes() -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
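[Editorial aside, not part of the original thread: a toy illustration of why the array-arc encoding in the benchmark above matters so much. This is not the real FST code; it just contrasts the two lookup strategies. With arcs stored as a fixed-width array sorted by label, the next arc can be found by binary search; with variable-width packed arcs, every arc must be decoded and scanned in sequence.]

```java
import java.util.Arrays;

// Toy sketch (not Lucene's FST implementation): finding an outgoing arc
// by label, with and without the array-arc encoding.
public class ArcLookup {
    // Array-encoded arcs: labels addressable by index, so binary search works.
    static int findArcBinary(int[] sortedLabels, int label) {
        int idx = Arrays.binarySearch(sortedLabels, label);
        return idx >= 0 ? idx : -1;
    }

    // Packed arcs force a sequential decode-and-compare of each arc in turn
    // (labels are still sorted, so the scan can stop early on a miss).
    static int findArcLinear(int[] labels, int label) {
        for (int i = 0; i < labels.length; i++) {
            if (labels[i] == label) return i;
            if (labels[i] > label) return -1;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] labels = {'a', 'd', 'k', 's', 'z'};
        // Same answer either way; the binary version touches O(log n) arcs
        // per input character instead of O(n), which is the gap the
        // microbenchmark exposes when array arcs are disabled.
        System.out.println(findArcBinary(labels, 'k'));
        System.out.println(findArcLinear(labels, 'k'));
    }
}
```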
[jira] [Commented] (LUCENE-3281) OpenBitSet should report the configured capacity/size
[ https://issues.apache.org/jira/browse/LUCENE-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060757#comment-13060757 ] Michael McCandless commented on LUCENE-3281: I think the challenge here is numBits is currently not maintained unless assertions are enabled (eg, see expandingWordNum), so we can't just always return numBits from capacity()... Maybe we should just always maintain numBits (ie, even when asserts are off)? Then capacity() could return numBits. OpenBitSet should report the configured capacity/size - Key: LUCENE-3281 URL: https://issues.apache.org/jira/browse/LUCENE-3281 Project: Lucene - Java Issue Type: Bug Components: core/other Affects Versions: 3.0, 3.0.1, 3.0.2, 3.0.3, 3.1, 3.2 Reporter: Robert Ragno Priority: Minor Original Estimate: 2m Remaining Estimate: 2m OpenBitSet rounds up the capacity() to the next multiple of 64 from what was specified. However, this is particularly damaging with the new asserts, which trigger when anything above the specified capacity is used as an index. The simple fix is to return numBits for capacity(). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
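[Editorial aside, not part of the original thread: a sketch of what "always maintain numBits" could look like. This is illustrative code, not the actual OpenBitSet source; the expanding-setter name merely echoes the expandingWordNum discussion above. The idea is to grow numBits unconditionally (not just under assertions) so capacity() can report the logical size rather than the word count rounded up to a multiple of 64.]

```java
// Illustrative sketch (not org.apache.lucene.util.OpenBitSet): track numBits
// unconditionally, grow it in the expanding setter, and report it as capacity.
public class GrowableBitSet {
    private long[] bits;
    private long numBits; // maintained even with assertions disabled

    public GrowableBitSet(long numBits) {
        // documented exception rather than an assert, as suggested above
        if (numBits < 0) throw new IllegalArgumentException("numBits must be >= 0");
        this.numBits = numBits;
        this.bits = new long[(int) ((numBits + 63) >>> 6)];
    }

    /** Logical capacity in bits as configured/grown, not rounded up to 64. */
    public long capacity() { return numBits; }

    /** Set a bit, growing the backing array and logical size as needed. */
    public void expandingSet(long index) {
        int word = (int) (index >> 6);
        if (word >= bits.length) {
            bits = java.util.Arrays.copyOf(bits, word + 1);
        }
        if (index >= numBits) numBits = index + 1; // the unconditional update
        bits[word] |= 1L << (index & 63);
    }

    /** Bits past the allocated words read as zero. */
    public boolean get(long index) {
        int word = (int) (index >> 6);
        return word < bits.length && (bits[word] & (1L << (index & 63))) != 0;
    }
}
```

A `GrowableBitSet` built with `numBits = 10` then reports `capacity() == 10`, not 64, and grows to 101 after `expandingSet(100)`.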
Re: revisit naming for grouping/join?
: Also... I think we are over-thinking the name ;) We can't convey : *everything* in this name; as long as the name makes it clear that : you'll want to consider this / read its javadocs whenever doing : something with nested docs, I think that's sufficient. I think : NestedQueryWrapper (maybe NestedDocsQuery) and NestedDocsCollector are : good enough, at least better than the functional-driven names they now : have... Yeah, that's fair ... i'm not in love with NestedDocsQuery and NestedDocsCollector but i agree they are better than what we have now. : Honestly at this point I'm tempted to just stick with what we have : (the functionally driven names, instead of the dominant use case : driven name). : : At its heart, this query is performing a join (well, finishing the : join that was done during indexing), and despite our efforts to more : descriptively capture the dominant use case, I don't think we're : succeeding. We are basically struggling to find ways to explain what : a join does, into these class names. I really think it's a bad idea to use Join in the name ... i understand that to you this is a join, but as you say it's really just finishing a join that was already done at index time -- for most users join is going to have the connotation of a SQL join where you don't have to normalize the data in advance (ie: build the index with all the docs you want to join in a block) and we shouldn't use it unless we are talking about a truly generic query time join -- particularly if we are going to use examples in the doc that seem like the kind of thing you would do with a query time join in SQL. 
i know you feel like nested (or subdocs or parent) undersells the *possible* use cases of this feature, but the thing to remember is that even in the use cases where the real life data isn't something you might think of as being organized in a nested or hierarchical model, in order to use this feature the user must map their source data model to a Lucene Document model that *does* capture a hierarchy relationship so they can index their data in the appropriate way. X and Y may not be in a hierarchy, but if you want to join them like this, then the Document for X and the Document for Y must be thought of as being in a hierarchy and indexed in lock step with each other. Block just doesn't feel like it really conveys this ... but anything along the Nested, Parent, Subdoc line of terminology would at least give some point of reference to the idea that the *Document* model in Lucene needs to be organized in this way -- and i think it's really important that the name make that clear. -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™
[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060764#comment-13060764 ] Michael McCandless commented on LUCENE-3233: Wow, it's very important to allow arcs to be encoded as arrays (for the binary search on lookup). I think we should just fix FST... I'll think about it. MemoryCodec would also get big gains here. HuperDuperSynonymsFilter™ - Key: LUCENE-3233 URL: https://issues.apache.org/jira/browse/LUCENE-3233 Project: Lucene - Java Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip The current synonymsfilter uses a lot of ram and cpu, especially at build time. I think yesterday I heard about huge synonyms files three times. So, I think we should use an FST-based structure, sharing the inputs and outputs. And we should be more efficient with the tokenStream api, e.g. using save/restoreState instead of cloneAttributes() -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: revisit naming for grouping/join?
From my external POV on this debate, it seems as though the main point of contention is naming the nature of the relation between documents. Instead of doing that, a name that says that there is some form of relation, but leaving open its nature, might work: something like docrelation? (Avoiding the related documents IR concept would be important here.) Steve -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Wednesday, July 06, 2011 2:59 PM To: dev@lucene.apache.org Subject: Re: revisit naming for grouping/join? : Also... I think we are over-thinking the name ;) We can't convey : *everything* in this name; as long as the name makes it clear that : you'll want to consider this / read its javadocs whenever doing : something with nested docs, I think that's sufficient. I think : NestedQueryWrapper (maybe NestedDocsQuery) and NestedDocsCollector are : good enough, at least better than the functional-driven names they now : have... Yeah, that's fair ... i'm not in love with NestedDocsQuery and NestedDocsCollector but i agree they are better than what we have now. : Honestly at this point I'm tempted to just stick with what we have : (the functionally driven names, instead of the dominant use case : driven name). : : At its heart, this query is performing a join (well, finishing the : join that was done during indexing), and despite our efforts to more : descriptively capture the dominant use case, I don't think we're : succeeding. We are basically struggling to find ways to explain what : a join does, into these class names. I really think it's a bad idea to use Join in the name ... 
i understand that to you this is a join, but as you say it's really just finishing a join that was already done at index time -- for most users join is going to have the connotation of a SQL join where you don't have to normalize the data in advance (ie: build the index with all the docs you want to join in a block) and we shouldn't use it unless we are talking about a truly generic query time join -- particularly if we are going to use examples in the doc that seem like the kind of thing you would do with a query time join in SQL. i know you feel like nested (or subdocs or parent) undersells the *possible* use cases of this feature, but the thing to remember is that even in the use cases where the real life data isn't something you might think of as being organized in a nested or hierarchical model, in order to use this feature the user must map their source data model to a Lucene Document model that *does* capture a hierarchy relationship so they can index their data in the appropriate way. X and Y may not be in a hierarchy, but if you want to join them like this, then the Document for X and the Document for Y must be thought of as being in a hierarchy and indexed in lock step with each other. Block just doesn't feel like it really conveys this ... but anything along the Nested, Parent, Subdoc line of terminology would at least give some point of reference to the idea that the *Document* model in Lucene needs to be organized in this way -- and i think it's really important that the name make that clear. -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™
[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060781#comment-13060781 ] Robert Muir commented on LUCENE-3233: - I agree, this would be the best solution. Maybe we should just open a separate issue for that? we can let this one be for now until that is resolved, can even continue working on other parts of it. HuperDuperSynonymsFilter™ - Key: LUCENE-3233 URL: https://issues.apache.org/jira/browse/LUCENE-3233 Project: Lucene - Java Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip The current synonymsfilter uses a lot of ram and cpu, especially at build time. I think yesterday I heard about huge synonyms files three times. So, I think we should use an FST-based structure, sharing the inputs and outputs. And we should be more efficient with the tokenStream api, e.g. using save/restoreState instead of cloneAttributes() -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3281) OpenBitSet should report the configured capacity/size
[ https://issues.apache.org/jira/browse/LUCENE-3281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060786#comment-13060786 ] Robert Ragno commented on LUCENE-3281: -- Ah, good point. It seems cleaner to maintain it (which is straightforward). The other sensible alternative would be to make the asserts all refer to the up-rounded capacity. However, it seems reasonable and consistent to have an OBS present the capacity it was constructed with. I suppose there is room to split capacity() and size(), but that might confuse existing uses. Incidentally, if it were open to behavioral changes... I would find it more convenient if the asserts were replaced with assuming that the vector was infinite, filled with zeros. This seems more consistent with the set operations, anyway. And the union operation, and so on. (And it is not as if anyone can properly be relying on the current asserts to control flow.) OpenBitSet should report the configured capacity/size - Key: LUCENE-3281 URL: https://issues.apache.org/jira/browse/LUCENE-3281 Project: Lucene - Java Issue Type: Bug Components: core/other Affects Versions: 3.0, 3.0.1, 3.0.2, 3.0.3, 3.1, 3.2 Reporter: Robert Ragno Priority: Minor Original Estimate: 2m Remaining Estimate: 2m OpenBitSet rounds up the capacity() to the next multiple of 64 from what was specified. However, this is particularly damaging with the new asserts, which trigger when anything above the specified capacity is used as an index. The simple fix is to return numBits for capacity(). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-2535) REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson reassigned SOLR-2535: Assignee: Erick Erickson REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show directory listings - Key: SOLR-2535 URL: https://issues.apache.org/jira/browse/SOLR-2535 Project: Solr Issue Type: Bug Components: SearchComponents - other Affects Versions: 3.1, 3.2, 4.0 Environment: java 1.6, jetty Reporter: Peter Wolanin Assignee: Erick Erickson Fix For: 3.4, 4.0 Attachments: SOLR-2535.patch, SOLR-2535_fix_admin_file_handler_for_directory_listings.patch

In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted listing of the conf directory, like:
{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}
I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:
{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}
Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-1280) Fields used update processor
[ https://issues.apache.org/jira/browse/SOLR-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher updated SOLR-1280: --- Attachment: FieldsUsedUpdateProcessorFactory.java Updated version that allows configuration of fields used field and a field name regex for matching Fields used update processor Key: SOLR-1280 URL: https://issues.apache.org/jira/browse/SOLR-1280 Project: Solr Issue Type: New Feature Components: update Reporter: Erik Hatcher Priority: Trivial Attachments: FieldsUsedUpdateProcessorFactory.java, FieldsUsedUpdateProcessorFactory.java When dealing with highly heterogeneous documents with different fields per document, it can be very useful to know what fields are present on the result documents from a search. For example, this could be used to determine which fields make the best facets for a given query. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1280) Fields used update processor
[ https://issues.apache.org/jira/browse/SOLR-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060826#comment-13060826 ] Erik Hatcher commented on SOLR-1280: In this update the config can be something like this:
{code}
<updateRequestProcessorChain name="fields_used" default="true">
  <processor class="solr.processor.FieldsUsedUpdateProcessorFactory">
    <str name="fieldsUsedFieldName">attribute_fields</str>
    <str name="fieldNameRegex">.*_attribute</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
{code}
Regex was chosen to allow flexibility in matching field names for inclusion, but I think perhaps a better (more easily comprehended/configured) way would be a comma-separated list of field names that could contain a * for globbing, which should be about all the flexibility needed for this. Fields used update processor Key: SOLR-1280 URL: https://issues.apache.org/jira/browse/SOLR-1280 Project: Solr Issue Type: New Feature Components: update Reporter: Erik Hatcher Priority: Trivial Attachments: FieldsUsedUpdateProcessorFactory.java, FieldsUsedUpdateProcessorFactory.java When dealing with highly heterogeneous documents with different fields per document, it can be very useful to know what fields are present on the result documents from a search. For example, this could be used to determine which fields make the best facets for a given query. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
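[Editorial aside, not part of the original thread: a sketch of the simpler configuration style suggested above, i.e. a comma-separated list of field names where `*` globs, compiled once into a single regex. Class and method names here are hypothetical, not the actual FieldsUsedUpdateProcessorFactory code.]

```java
import java.util.regex.Pattern;

// Sketch: turn a comma-separated list of field-name globs ("*" wildcards)
// into one compiled alternation regex. Illustrative only, not Solr code.
public class FieldNameGlobs {
    public static Pattern compile(String commaSeparatedGlobs) {
        StringBuilder alternation = new StringBuilder();
        for (String glob : commaSeparatedGlobs.split(",")) {
            if (alternation.length() > 0) alternation.append('|');
            // Quote everything literally, then re-open the \Q...\E region
            // at each '*' so it becomes a real ".*" wildcard.
            alternation.append(Pattern.quote(glob.trim()).replace("*", "\\E.*\\Q"));
        }
        return Pattern.compile(alternation.toString());
    }

    public static void main(String[] args) {
        Pattern p = compile("*_attribute, color");
        System.out.println(p.matcher("size_attribute").matches()); // true
        System.out.println(p.matcher("color").matches());          // true
        System.out.println(p.matcher("title").matches());          // false
    }
}
```

Quoting first and splicing `.*` in at each `*` means field names containing regex metacharacters (dots, for instance) still match literally, which is the main pitfall of hand-rolled glob-to-regex conversion.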
[jira] [Commented] (LUCENE-3179) OpenBitSet.prevSetBit()
[ https://issues.apache.org/jira/browse/LUCENE-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060829#comment-13060829 ] Uwe Schindler commented on LUCENE-3179: --- Committed long versions and additional tests: rev 1143558 (trunk), rev 1143560 (3.x). I did not commit the cutover to Long.numberOfLeadingZeros, because it was not performance tested. Also, for this use case, on machines without intrinsics the JDK-provided methods are slower (see the comments in BitUtil.ntz): since in most cases the bits are shifted away (in nextSetBit), the faster approach is to invert the algorithm when calculating ntz. OpenBitSet.prevSetBit() --- Key: LUCENE-3179 URL: https://issues.apache.org/jira/browse/LUCENE-3179 Project: Lucene - Java Issue Type: Improvement Reporter: Paul Elschot Assignee: Paul Elschot Priority: Minor Fix For: 3.3, 4.0 Attachments: LUCENE-3179-fix.patch, LUCENE-3179-fix.patch, LUCENE-3179-long-ntz.patch, LUCENE-3179-long-ntz.patch, LUCENE-3179.patch, LUCENE-3179.patch, LUCENE-3179.patch, TestBitUtil.java, TestOpenBitSet.patch Find a previous set bit in an OpenBitSet. Useful for parent testing in nested document query execution LUCENE-2454 . -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
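[Editorial aside, not part of the original thread: a sketch of what the discussed cutover looks like, i.e. prevSetBit() written directly on top of Long.numberOfLeadingZeros (which HotSpot compiles to a single instruction where the intrinsic is available). Illustrative code, not the committed patch.]

```java
// Sketch (not the committed Lucene code): prevSetBit over a long[] word
// array, using Long.numberOfLeadingZeros to locate the highest set bit.
public class PrevSetBit {
    /** Index of the highest set bit <= index, or -1 if there is none. */
    static int prevSetBit(long[] words, int index) {
        int i = index >> 6;                 // word holding 'index'
        int sub = index & 0x3f;             // bit position within that word
        long word = words[i] << (63 - sub); // shift out bits above 'index'
        if (word != 0) {
            return (i << 6) + sub - Long.numberOfLeadingZeros(word);
        }
        while (--i >= 0) {                  // walk back through earlier words
            word = words[i];
            if (word != 0) {
                return (i << 6) + 63 - Long.numberOfLeadingZeros(word);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] words = new long[2];
        words[0] |= 1L << 3;  // set bit 3
        words[1] |= 1L << 6;  // set bit 70
        System.out.println(prevSetBit(words, 100)); // 70
        System.out.println(prevSetBit(words, 69));  // 3
        System.out.println(prevSetBit(words, 2));   // -1
    }
}
```

This is the mirror image of the nextSetBit/ntz case the comment describes: here the shift discards bits *above* the starting index, so counting leading rather than trailing zeros falls out naturally.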
[jira] [Updated] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nikola Tankovic updated LUCENE-2308: Attachment: LUCENE-2308-6.patch Minor fixes and more tests cutover. Separately specify a field's type - Key: LUCENE-2308 URL: https://issues.apache.org/jira/browse/LUCENE-2308 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2308-2.patch, LUCENE-2308-3.patch, LUCENE-2308-4.patch, LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, LUCENE-2308.patch, LUCENE-2308.patch This came up from discussions on IRC. I'm summarizing here... Today when you make a Field to add to a document you can set things index or not, stored or not, analyzed or not, details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc. I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields. The Field instance would still hold the actual value. We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalzyerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper). This would NOT be a schema! It's just refactoring what we already specify today. EG it's not serialized into the index. This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters... -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
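[Editorial aside, not part of the original thread: a minimal sketch of the refactoring this issue describes. All names here are hypothetical, not the committed API: the per-field indexing options move out of Field into a reusable, immutable FieldType, while Field keeps only the name and value.]

```java
// Illustrative sketch of the LUCENE-2308 idea (hypothetical names, not the
// committed API): indexing options live in a shared, immutable FieldType.
public class FieldTypeSketch {
    static final class FieldType {
        final boolean indexed, stored, tokenized, omitNorms, storeTermVectors;
        FieldType(boolean indexed, boolean stored, boolean tokenized,
                  boolean omitNorms, boolean storeTermVectors) {
            this.indexed = indexed; this.stored = stored; this.tokenized = tokenized;
            this.omitNorms = omitNorms; this.storeTermVectors = storeTermVectors;
        }
    }

    static final class Field {
        final String name, value;
        final FieldType type; // shared across many Field instances
        Field(String name, String value, FieldType type) {
            this.name = name; this.value = value; this.type = type;
        }
    }

    // One type instance reused for every document's text fields, instead of
    // repeating indexed/stored/analyzed flags on each Field constructor call.
    static final FieldType TEXT_STORED = new FieldType(true, true, true, false, false);

    public static void main(String[] args) {
        Field title = new Field("title", "Lucene in Action", TEXT_STORED);
        Field body  = new Field("body", "...", TEXT_STORED);
        System.out.println(title.type == body.type); // type is shared, not copied
    }
}
```

As the issue says, this is not a schema: nothing here is serialized into the index, it just factors out flags that today are repeated on every Field.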
[jira] [Updated] (LUCENE-3233) HuperDuperSynonymsFilter™
[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3233: --- Attachment: LUCENE-3233.patch New patch, including some optimizing to FST (which we can commit under a separate issue): array arcs can now be any size, and I re-use the BytesReader inner class that's created for parsing arcs. HuperDuperSynonymsFilter™ - Key: LUCENE-3233 URL: https://issues.apache.org/jira/browse/LUCENE-3233 Project: Lucene - Java Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip The current synonymsfilter uses a lot of ram and cpu, especially at build time. I think yesterday I heard about huge synonyms files three times. So, I think we should use an FST-based structure, sharing the inputs and outputs. And we should be more efficient with the tokenStream api, e.g. using save/restoreState instead of cloneAttributes() -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2637) Solrj support for Field Collapsing / Grouping query results parsing
[ https://issues.apache.org/jira/browse/SOLR-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Cheng updated SOLR-2637: Attachment: SOLR-2637.patch 1. fix 4 space tab to 2 2. added doc comments 3. extract ngroups when group.ngroups=true. Solrj support for Field Collapsing / Grouping query results parsing --- Key: SOLR-2637 URL: https://issues.apache.org/jira/browse/SOLR-2637 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 4.0 Reporter: Tao Cheng Priority: Minor Labels: features Fix For: 4.0 Attachments: SOLR-2637.patch, SOLR-2637.patch Original Estimate: 24h Remaining Estimate: 24h Patch ready for Field Collapsing query results parsing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Getting patches (with tests!) committed
In the past I've had to ping the dev list with an "include patch XYZ please" message. But I've just assigned it to myself; I'll see if I can get it committed. I'm new enough at the process that I need the practice. Best Erick On Wed, Jul 6, 2011 at 1:51 PM, Smiley, David W. dsmi...@mitre.org wrote: How do committers recommend that patch contributors (like me) get their patches committed? At the moment I'm thinking of this one: https://issues.apache.org/jira/browse/SOLR-2535 This is a regression bug. I found the bug, I added a patch which fixes the bug and tested that it was fixed. The tests are actually new tests that tested code that wasn't tested before. I put the fix version in JIRA as 3.3 at the time I did this, because it was ready to go. Well 3.3 came and went, and the version got bumped to 3.4. There are no processes in place for committers to recognize completed patches. I think that's a problem. It's very discouraging, as the contributor. I think prior to a release, and ideally at other occasions, issues assigned to the next release number should actually be examined. Granted there are ~250 of them on the Solr side: https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=truejqlQuery=project+%3D+SOLR+AND+resolution+%3D+Unresolved+AND+fixVersion+%3D+12316683+ORDER+BY+priority+DESC And some initial triage could separate the wheat from the chaff. ~ David Smiley - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™
[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060905#comment-13060905 ]

Robert Muir commented on LUCENE-3233:
-------------------------------------

bq. New patch, including some optimizing to FST (which we can commit under a separate issue)

works! I don't think we need to open a new issue, I didn't think you would come back with a patch in just two hours! I'll play with the patch some now and see what I can do with it.

HuperDuperSynonymsFilter™
-------------------------
                Key: LUCENE-3233
                URL: https://issues.apache.org/jira/browse/LUCENE-3233
            Project: Lucene - Java
         Issue Type: Improvement
           Reporter: Robert Muir
        Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip

The current synonymsfilter uses a lot of ram and cpu, especially at build time. I think yesterday I heard about huge synonyms files three times. So, I think we should use an FST-based structure, sharing the inputs and outputs. And we should be more efficient with the tokenStream api, e.g. using save/restoreState instead of cloneAttributes()
[jira] [Updated] (SOLR-1972) Need additional query stats in admin interface - median, 95th and 99th percentile
[ https://issues.apache.org/jira/browse/SOLR-1972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated SOLR-1972:
-------------------------------
    Attachment: SOLR-1972.patch

Hi Shawn,

I fixed the patch so that rolling statistics are now consistent with non-rolling statistics for the first requests. Average requests per second may sometimes be a little different, but ensuring rolling and non-rolling statistics have exactly the same value would require more synchronization, which is not an option in my opinion. Please let me know if you still get negative values for rollingAvgRequestsPerSecond with this patch. I hope this patch is the good one!

Need additional query stats in admin interface - median, 95th and 99th percentile
----------------------------------------------------------------------------------
                Key: SOLR-1972
                URL: https://issues.apache.org/jira/browse/SOLR-1972
            Project: Solr
         Issue Type: Improvement
   Affects Versions: 1.4
           Reporter: Shawn Heisey
           Priority: Minor
        Attachments: SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch, SOLR-1972.patch

I would like to see more detailed query statistics from the admin GUI. This is what you can get now:

requests : 809
errors : 0
timeouts : 0
totalTime : 70053
avgTimePerRequest : 86.59209
avgRequestsPerSecond : 0.8148785

I'd like to see more data on the time per request - median, 95th percentile, 99th percentile, and any other statistical function that makes sense to include. In my environment, the first bunch of queries after startup tend to take several seconds each. I find that the average value tends to be useless until it has several thousand queries under its belt and the caches are thoroughly warmed. The statistical functions I have mentioned would quickly eliminate the influence of those initial slow queries.

The system will have to store individual data about each query. I don't know if this is something Solr does already. It would be nice to have a configurable count of how many of the most recent data points are kept, to control the amount of memory the feature uses. The default value could be something like 1024 or 4096.
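The bounded-memory percentile idea requested above can be sketched in a few lines. This is not Solr's implementation — the class and method names here are invented for illustration — just a minimal version of "keep the last N request times, report percentiles over that window":

```python
from collections import deque
import math

class QueryStats:
    """Keep the most recent N request times and report percentiles.

    A simplified sketch of the requested feature (hypothetical names,
    not Solr's actual code): memory is capped by the sample window.
    """
    def __init__(self, max_samples=1024):
        # deque(maxlen=...) silently evicts the oldest sample, so memory
        # use is bounded at max_samples entries.
        self.samples = deque(maxlen=max_samples)

    def record(self, millis):
        self.samples.append(millis)

    def percentile(self, p):
        # Nearest-rank percentile over the retained window.
        ordered = sorted(self.samples)
        if not ordered:
            return None
        rank = max(0, math.ceil(p / 100.0 * len(ordered)) - 1)
        return ordered[rank]

stats = QueryStats(max_samples=1024)
# Two slow warm-up queries followed by fast warmed-cache queries:
for t in [5000, 4000] + [80] * 98:
    stats.record(t)
print(stats.percentile(50))  # 80 -- the median ignores the slow outliers
print(stats.percentile(99))  # 4000 -- the tail still shows them
```

This illustrates Shawn's point: the median stays at 80 ms despite the multi-second startup queries, while a plain average over the same window would be pulled up to ~168 ms.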
[jira] [Resolved] (LUCENE-3278) Rename contrib/queryparser project to queryparser-contrib
[ https://issues.apache.org/jira/browse/LUCENE-3278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male resolved LUCENE-3278.
--------------------------------
    Resolution: Fixed
 Fix Version/s: 4.0
      Assignee: Chris Male

Committed revision 1143615.

Rename contrib/queryparser project to queryparser-contrib
----------------------------------------------------------
                Key: LUCENE-3278
                URL: https://issues.apache.org/jira/browse/LUCENE-3278
            Project: Lucene - Java
         Issue Type: Sub-task
         Components: modules/queryparser
           Reporter: Chris Male
           Assignee: Chris Male
            Fix For: 4.0
        Attachments: LUCENE-3278.patch

Much like with contrib/queries, we should differentiate the contrib/queryparser from the queryparser module. No directory structure changes will be made, just ant and maven.
[jira] [Created] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
--------------------------------------------------------------------------------------------------
                Key: LUCENE-3282
                URL: https://issues.apache.org/jira/browse/LUCENE-3282
            Project: Lucene - Java
         Issue Type: Improvement
         Components: core/search
   Affects Versions: 3.4, 4.0
           Reporter: Shay Banon

It would be nice to allow adding a custom child collector to the BlockJoinQuery, to be called on every matching doc (so we can do things with it, like counts and such). Also, allow extending BlockJoinQuery with custom code that converts the filter bitset to an OpenBitSet.
[jira] [Updated] (LUCENE-3282) BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
[ https://issues.apache.org/jira/browse/LUCENE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shay Banon updated LUCENE-3282:
-------------------------------
    Attachment: LUCENE-3282.patch

BlockJoinQuery: Allow to add a custom child collector, and customize the parent bitset extraction
--------------------------------------------------------------------------------------------------
                Key: LUCENE-3282
                URL: https://issues.apache.org/jira/browse/LUCENE-3282
            Project: Lucene - Java
         Issue Type: Improvement
         Components: core/search
   Affects Versions: 3.4, 4.0
           Reporter: Shay Banon
        Attachments: LUCENE-3282.patch

It would be nice to allow adding a custom child collector to the BlockJoinQuery, to be called on every matching doc (so we can do things with it, like counts and such). Also, allow extending BlockJoinQuery with custom code that converts the filter bitset to an OpenBitSet.
[jira] [Updated] (LUCENE-3233) HuperDuperSynonymsFilter™
[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3233:
--------------------------------
    Attachment: LUCENE-3233.patch

Updated patch: this tableizes the first FST arcs for latin-1. Precomputing this tiny table speeds up this filter a ton (from ~3000ms to ~2000ms), and I think it's a cheap easy win for the terms index too.

HuperDuperSynonymsFilter™
-------------------------
                Key: LUCENE-3233
                URL: https://issues.apache.org/jira/browse/LUCENE-3233
            Project: Lucene - Java
         Issue Type: Improvement
           Reporter: Robert Muir
        Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip

The current synonymsfilter uses a lot of ram and cpu, especially at build time. I think yesterday I heard about huge synonyms files three times. So, I think we should use an FST-based structure, sharing the inputs and outputs. And we should be more efficient with the tokenStream api, e.g. using save/restoreState instead of cloneAttributes()
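"Tableizing" the first arcs means trading a per-lookup binary search over the root node's sorted arc list for a 256-entry table indexed directly by the byte value — O(1) for any latin-1 input. A toy sketch of the trade-off (the arc structures here are invented; Lucene's real FST arcs are considerably more involved):

```python
import bisect

# Hypothetical root-node arcs: (byte label, arc payload), kept sorted by label.
root_arcs = sorted([(ord('c'), 'arc-c'), (ord('d'), 'arc-d'), (ord('s'), 'arc-s')])
labels = [label for label, _ in root_arcs]

def first_arc_search(byte):
    # Baseline: O(log n) binary search on every lookup.
    i = bisect.bisect_left(labels, byte)
    if i < len(labels) and labels[i] == byte:
        return root_arcs[i][1]
    return None

# Precomputed table: one entry per possible byte value (covers latin-1),
# built once, then every first-arc lookup is a single array index.
first_arc_table = [None] * 256
for label, arc in root_arcs:
    first_arc_table[label] = arc

assert first_arc_table[ord('d')] == first_arc_search(ord('d')) == 'arc-d'
```

The table costs a fixed 256 slots per tableized node, which is why it pays off only for hot nodes like the root (or a terms-index root, as the comment suggests).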
[jira] [Created] (LUCENE-3283) Move core QueryParsers to queryparser module
Move core QueryParsers to queryparser module
--------------------------------------------
                Key: LUCENE-3283
                URL: https://issues.apache.org/jira/browse/LUCENE-3283
            Project: Lucene - Java
         Issue Type: Sub-task
         Components: core/queryparser, modules/queryparser
           Reporter: Chris Male

Move the contents of lucene/src/java/org/apache/lucene/queryParser to the queryparser module. To differentiate these parsers from the others, they are going to be placed in a 'classic' package. We'll rename QueryParser to ClassicQueryParser as well.
[jira] [Created] (LUCENE-3284) Move contribs/modules away from QueryParser dependency
Move contribs/modules away from QueryParser dependency
-------------------------------------------------------
                Key: LUCENE-3284
                URL: https://issues.apache.org/jira/browse/LUCENE-3284
            Project: Lucene - Java
         Issue Type: Sub-task
         Components: core/queryparser, modules/queryparser
           Reporter: Chris Male

Some contribs and modules depend on the core QueryParser just for simplicity in their tests. We should apply the same process as I did to the core tests, and move them away from using the QueryParser where possible.
[jira] [Created] (LUCENE-3285) Move QueryParsers from contrib/queryparser to queryparser module
Move QueryParsers from contrib/queryparser to queryparser module
-----------------------------------------------------------------
                Key: LUCENE-3285
                URL: https://issues.apache.org/jira/browse/LUCENE-3285
            Project: Lucene - Java
         Issue Type: Sub-task
         Components: modules/queryparser
           Reporter: Chris Male

Each of the QueryParsers will be ported across. Those which use the flexible parsing framework will be placed under the package flexible. The StandardQueryParser will be renamed to FlexibleQueryParser and surround.QueryParser will be renamed to SurroundQueryParser.
[jira] [Created] (LUCENE-3286) Move XML QueryParser to queryparser module
Move XML QueryParser to queryparser module
------------------------------------------
                Key: LUCENE-3286
                URL: https://issues.apache.org/jira/browse/LUCENE-3286
            Project: Lucene - Java
         Issue Type: Sub-task
         Components: modules/queryparser
           Reporter: Chris Male

The XML QueryParser will be ported across to queryparser module. As part of this work I want to reconsider the need for its demo, which has many addition dependencies.
[jira] [Updated] (LUCENE-3286) Move XML QueryParser to queryparser module
[ https://issues.apache.org/jira/browse/LUCENE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3286:
-------------------------------
    Description: The XML QueryParser will be ported across to queryparser module. As part of this work I want to reconsider the need for its demo, which has many additional dependencies.
           was: The XML QueryParser will be ported across to queryparser module. As part of this work I want to reconsider the need for its demo, which has many addition dependencies.

Move XML QueryParser to queryparser module
------------------------------------------
                Key: LUCENE-3286
                URL: https://issues.apache.org/jira/browse/LUCENE-3286
            Project: Lucene - Java
         Issue Type: Sub-task
         Components: modules/queryparser
           Reporter: Chris Male

The XML QueryParser will be ported across to queryparser module. As part of this work I want to reconsider the need for its demo, which has many additional dependencies.
[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™
[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061001#comment-13061001 ]

Robert Muir commented on LUCENE-3233:
-------------------------------------

Benchmark with the synonyms.zip attached to this issue (so that we are actually matching synonyms): in this case I only analyzed the text 100,000 times, as it's a lot more output. I also checked that they are emitting the same stuff:

||Impl||ms||
|SynonymsFilter|112527|
|FST|22872|

So, that's 5x faster, probably due to avoiding the expensive cloning. I think we are fine on performance.

HuperDuperSynonymsFilter™
-------------------------
                Key: LUCENE-3233
                URL: https://issues.apache.org/jira/browse/LUCENE-3233
            Project: Lucene - Java
         Issue Type: Improvement
           Reporter: Robert Muir
        Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip

The current synonymsfilter uses a lot of ram and cpu, especially at build time. I think yesterday I heard about huge synonyms files three times. So, I think we should use an FST-based structure, sharing the inputs and outputs. And we should be more efficient with the tokenStream api, e.g. using save/restoreState instead of cloneAttributes()
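The "FST-based structure, sharing the inputs" idea is that multi-token synonym entries with common prefixes share storage and are matched incrementally, token by token, instead of each entry being stored and probed independently. A toy dict-based trie makes the point — this is far simpler than Lucene's FST (no output sharing, no minimization) and every name here is invented:

```python
def build_synonym_trie(pairs):
    """Build a token-level trie from (input phrase, output list) pairs.

    Inputs that share a prefix ("united states", "united states of
    america") share trie nodes instead of being stored separately.
    """
    root = {}
    for phrase, outputs in pairs:
        node = root
        for token in phrase.split():
            node = node.setdefault(token, {})
        node['$outputs'] = outputs  # '$outputs' marks an accepted input
    return root

def lookup(trie, phrase):
    """Walk the trie one token at a time; None means no synonym entry."""
    node = trie
    for token in phrase.split():
        if token not in node:
            return None
        node = node[token]
    return node.get('$outputs')

trie = build_synonym_trie([
    ("usa", ["united states"]),
    ("united states", ["usa"]),
    ("united states of america", ["usa"]),  # shares the "united states" prefix
])
print(lookup(trie, "united states of america"))  # ['usa']
```

The same token-at-a-time walk is what lets a filter consume a token stream incrementally; a real FST additionally shares the outputs and suffixes, which is where the RAM savings over a hash-per-entry map come from.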
[jira] [Commented] (LUCENE-3286) Move XML QueryParser to queryparser module
[ https://issues.apache.org/jira/browse/LUCENE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061002#comment-13061002 ]

Robert Muir commented on LUCENE-3286:
-------------------------------------

We had an idea to expand the demo module to cover more than just the basics (e.g. including examples and such). Maybe we can put it there?

Move XML QueryParser to queryparser module
------------------------------------------
                Key: LUCENE-3286
                URL: https://issues.apache.org/jira/browse/LUCENE-3286
            Project: Lucene - Java
         Issue Type: Sub-task
         Components: modules/queryparser
           Reporter: Chris Male

The XML QueryParser will be ported across to queryparser module. As part of this work I want to reconsider the need for its demo, which has many additional dependencies.
[jira] [Commented] (LUCENE-3286) Move XML QueryParser to queryparser module
[ https://issues.apache.org/jira/browse/LUCENE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061005#comment-13061005 ]

Chris Male commented on LUCENE-3286:
------------------------------------

Sounds like a great location for it.

Move XML QueryParser to queryparser module
------------------------------------------
                Key: LUCENE-3286
                URL: https://issues.apache.org/jira/browse/LUCENE-3286
            Project: Lucene - Java
         Issue Type: Sub-task
         Components: modules/queryparser
           Reporter: Chris Male

The XML QueryParser will be ported across to queryparser module. As part of this work I want to reconsider the need for its demo, which has many additional dependencies.
[jira] [Updated] (LUCENE-3286) Move XML QueryParser to queryparser module
[ https://issues.apache.org/jira/browse/LUCENE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3286:
-------------------------------
    Description: The XML QueryParser will be ported across to queryparser module. As part of this work, we'll move the QP's demo into the demo module.
           was: The XML QueryParser will be ported across to queryparser module. As part of this work I want to reconsider the need for its demo, which has many additional dependencies.

Move XML QueryParser to queryparser module
------------------------------------------
                Key: LUCENE-3286
                URL: https://issues.apache.org/jira/browse/LUCENE-3286
            Project: Lucene - Java
         Issue Type: Sub-task
         Components: modules/queryparser
           Reporter: Chris Male

The XML QueryParser will be ported across to queryparser module. As part of this work, we'll move the QP's demo into the demo module.
[jira] [Created] (SOLR-2639) default fl in solrconfig.xml does not recognize 'score' as a field
default fl in solrconfig.xml does not recognize 'score' as a field
-------------------------------------------------------------------
                Key: SOLR-2639
                URL: https://issues.apache.org/jira/browse/SOLR-2639
            Project: Solr
         Issue Type: Bug
         Components: SearchComponents - other
   Affects Versions: 4.0
        Environment: 4.0 trunk
           Reporter: Tao Cheng
           Priority: Minor

Got the following exception when querying Solr without explicitly specifying fl. In my solrconfig.xml, I set the default value of fl to some fields including 'score'.

type: Status report
message: undefined field score
description: The request sent by the client was syntactically incorrect (undefined field score).

I didn't have this error with my March trunk build. Note that when I set fl=*, all fields except 'score' are returned. In the March trunk build, fl=* still made Solr return all the default fields specified in solrconfig.xml.
[jira] [Updated] (LUCENE-3284) Move contribs/modules away from QueryParser dependency
[ https://issues.apache.org/jira/browse/LUCENE-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3284:
-------------------------------
    Attachment: LUCENE-3284.patch

Patch which removes the QP dependency from the queries module and from highlighter. The only other module which depends on QP in its tests is analysis-common, which combines some complex analysis with some reasonably complex queries. Consequently I will leave this till later, when we can examine the QP needs of the analysis module in general.

Move contribs/modules away from QueryParser dependency
-------------------------------------------------------
                Key: LUCENE-3284
                URL: https://issues.apache.org/jira/browse/LUCENE-3284
            Project: Lucene - Java
         Issue Type: Sub-task
         Components: core/queryparser, modules/queryparser
           Reporter: Chris Male
        Attachments: LUCENE-3284.patch

Some contribs and modules depend on the core QueryParser just for simplicity in their tests. We should apply the same process as I did to the core tests, and move them away from using the QueryParser where possible.
Putting search-lucene.com back on l.a.o/solr
Hi,

I just noticed that over on http://lucene.apache.org/solr/ we are back to Lucid Find being the only search provider. Five months ago we added search-lucene.com there, but now it's gone. Google Analytics shows that search-lucene.com was removed from there on June 4. This is when Lucene 3.2 was released, so I suspect the site was somehow rebuilt and published without it.

Aha, I see: it looks like https://issues.apache.org/jira/browse/LUCENE-2660 was applied to trunk only and not branch_3x, and the site was built from the 3x branch.

As I'm about to go on vacation, I don't want to mess up the site by re-Forresting it (did it locally and it looks good, but it's past 1 AM here) and publishing it, so I'll just commit stuff in Solr's src/site after applying the patch from LUCENE-2660:

branch_3x/solr/src/site$ svn st
?       LUCENE-2660-solr.patch
M       src/documentation/skins/lucene/css/screen.css
M       src/documentation/skins/lucene/xslt/html/site-to-xhtml.xsl

It would be great if somebody could publish this.

Thanks,
Otis