RE: Can't find Japanese words ending with numbers
Please check here; you have to do it on your own:
http://lucene.apache.org/core/discussion.html#java-user-list-java-userluceneapacheorg

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Gareth Harper
> Sent: Wednesday, April 17, 2019 12:45 PM
> To: general@lucene.apache.org
> Subject: RE: Can't find Japanese words ending with numbers
>
> Could someone please take me off this mailing list.
>
> -----Original Message-----
> From: Antonio Facciorusso
> Sent: 17 April 2019 11:05
> To: us...@jackrabbit.apache.org; general@lucene.apache.org
> Subject: Can't find Japanese words ending with numbers
>
> Dear all,
>
> I'm using Jackrabbit 2.16.1 and Lucene 3.6.2.
>
> I have a node of type "mynodetype" with a property named "description"
> whose value is "横浜第2センタ". If I perform a full-text search using
> "jcr:contains" like:
>
> jcr:contains(., '*')
>
> this query returns 0 results:
>
> "//element(*,mynodetype)[(jcr:contains(., '横浜第2*'))]"
>
> while all of the following work correctly and return at least one result:
>
> "//element(*,mynodetype)[(jcr:contains(., '横浜第2センタ*'))]"
> "//element(*,mynodetype)[(jcr:contains(., '横浜第*'))]"
> "//element(*,mynodetype)[(jcr:contains(., '2センタ*'))]"
> "//element(*,mynodetype)[(jcr:contains(., 'センタ*'))]"
>
> I tried both the default analyzer and the Japanese one
> (https://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/ja/JapaneseAnalyzer.html).
>
> This is the content of my indexingConfiguration.xml file:
>
> "http://jackrabbit.apache.org/dtd/indexing-configuration-1.1.dtd"
> http://www.jcp.org/jcr/nt/1.0
> .*:[^_]+
> .*:resources_data_[^_]+
> .*:resources_data[^_]+
> .*:resources_(?!data)[^_]+
> .*:resources[^_]+_[^_]+
> .*:(?!resources)[^_]+_[^_]+
>
> Should I use a different configuration/analyzer? Is it a bug?
>
> Thank you.
>
> Best regards,
> Antonio.
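A possible way to debug the question above: since wildcard terms are matched against whole indexed tokens, the literal prefix "横浜第2" can only match if the analyzer emitted a single token starting with those characters. Dumping the tokens that JapaneseAnalyzer actually produces would show where the split falls. The sketch below targets the Lucene 3.6 API named in the thread; it assumes the lucene-core and kuromoji analyzer jars are on the classpath, so it is an illustration rather than a standalone program.

```java
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ja.JapaneseAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class DumpTokens {
    public static void main(String[] args) throws Exception {
        // Tokenize the problematic property value and print each token.
        Analyzer analyzer = new JapaneseAnalyzer(Version.LUCENE_36);
        TokenStream ts = analyzer.tokenStream("description", new StringReader("横浜第2センタ"));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.println(term.toString()); // one indexed token per line
        }
        ts.end();
        ts.close();
    }
}
```

If the output shows a token boundary between "第2" and "センタ" (or between "第" and "2"), that would explain why "横浜第2*" matches nothing while both shorter prefixes do.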
Re: Noticed performance degrade from lucene-7.5.0 to lucene-8.0.0
Without further information we can't help here. We would need the types of
queries involved (conjunction, disjunction, phrase, ...). There are
significant changes which may cause some queries to be slower, but others can
run up to 50 times faster if the exact number of results is not needed; see
https://www.elastic.co/blog/faster-retrieval-of-top-hits-in-elasticsearch-with-block-max-wand

Uwe

Am April 14, 2019 2:22:59 PM UTC schrieb Khurram Shehzad :
>Hi All,
>
>I have recently updated from lucene-7.5.0 to lucene-8.0.0, but I noticed
>considerable performance degradation. Queries that used to execute in 18 to
>24 milliseconds now take 74 to 110 milliseconds.
>
>Any suggestions please?
>
>Regards,
>Khurram

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
RE: [ANNOUNCE] Apache Lucene 8.0.0 released
Yeah! It's finally done. I am a bit sad that the new query short-circuiting
is not usable from Solr at the moment.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: jim ferenczi
> Sent: Thursday, March 14, 2019 1:16 PM
> To: general@lucene.apache.org; d...@lucene.apache.org;
> java-u...@lucene.apache.org
> Subject: [ANNOUNCE] Apache Lucene 8.0.0 released
>
> 14 March 2019, Apache Lucene™ 8.0.0 available
>
> The Lucene PMC is pleased to announce the release of Apache Lucene 8.0.0.
>
> Apache Lucene is a high-performance, full-featured text search engine
> library written entirely in Java. It is a technology suitable for nearly
> any application that requires full-text search, especially cross-platform.
>
> This release contains numerous bug fixes, optimizations, and improvements,
> some of which are highlighted below. The release is available for immediate
> download at:
>
> http://lucene.apache.org/core/mirrors-core-latest-redir.html
>
> Lucene 8.0.0 Release Highlights:
>
> Query execution
> Term queries, phrase queries and boolean queries introduce a new
> optimization that enables efficient skipping over non-competitive documents
> when the total hit count is not needed. Depending on the exact query and
> data distribution, queries might run from a few percent slower to many
> times faster, especially term queries and pure disjunctions.
> In order to support this enhancement, some API changes have been made:
> * TopDocs.totalHits is no longer a long but an object that gives a lower
> bound of the actual hit count.
> * IndexSearcher's search and searchAfter methods now only compute total
> hit counts accurately up to 1,000, in order to enable this optimization by
> default.
> * Queries are now required to produce non-negative scores.
>
> Codecs
> * Postings now index score impacts alongside skip data.
> This is how term queries optimize collection of top hits when hit counts
> are not needed.
> * Doc values introduced jump tables, so that advancing runs in constant
> time. This is especially helpful on sparse fields.
> * The terms index FST is now loaded off-heap for non-primary-key fields
> using MMapDirectory, reducing heap usage for such fields.
>
> Custom scoring
> The new FeatureField allows efficient integration of static features, such
> as a PageRank value, into the score. Furthermore, the new
> LongPoint#newDistanceFeatureQuery and LatLonPoint#newDistanceFeatureQuery
> methods allow boosting by recency and geo-distance, respectively. These new
> helpers are optimized for the case when total hit counts are not needed.
> For instance, if the PageRank has a significant weight in your scores, then
> Lucene might be able to skip over documents that have a low PageRank value.
>
> Further details of the changes are available in the change log at:
> http://lucene.apache.org/core/8_0_0/changes/Changes.html
>
> Please report any feedback to the mailing lists
> (http://lucene.apache.org/core/discussion.html)
>
> Note: The Apache Software Foundation uses an extensive mirroring network
> for distributing releases. It is possible that the mirror you are using may
> not have replicated the release yet. If that is the case, please try
> another mirror. This also applies to Maven access.
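The TopDocs.totalHits API change in the announcement can be illustrated with a short fragment. This is a hedged sketch against the Lucene 8.x API; `searcher` and `query` are assumed to exist in scope:

```java
// Lucene 8.x: totalHits is a TotalHits object, not a long. Its value is only
// a lower bound unless relation == EQUAL_TO (counting stops at 1,000 by
// default so the new skipping optimization can kick in).
TopDocs td = searcher.search(query, 10);
if (td.totalHits.relation == TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO) {
    System.out.println("at least " + td.totalHits.value + " hits");
} else {
    System.out.println("exactly " + td.totalHits.value + " hits");
}
```

Code that needs an exact count can still request one, e.g. via a collector configured with a higher threshold, at the cost of losing the speedup.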
Re: Customize Nested Query
Why not TermInSetQuery?

Uwe

Am December 23, 2018 5:10:02 AM UTC schrieb Khurram Shehzad :
>Hi,
>
>I have a requirement to match a string field against a list of 0.5M
>elements in a customized way.
>
>FunctionQuery and FunctionMatchQuery look appropriate for my need, but it
>seems they only support Double, whereas I need String support.
>
>Any ideas please?
>
>Regards,
>Khurram

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
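Uwe's suggestion might look like the following sketch. It uses the Lucene TermInSetQuery API from memory and is not standalone-runnable without the Lucene jars; `myHalfMillionValues` and the field name are placeholders:

```java
// TermInSetQuery matches documents whose field value is one of a (possibly
// very large) fixed set of terms; it scores all matches equally, which is
// usually what a set-membership filter wants.
List<BytesRef> terms = new ArrayList<>();
for (String s : myHalfMillionValues) {   // hypothetical source collection
    terms.add(new BytesRef(s));
}
Query q = new TermInSetQuery("myStringField", terms);
```

Unlike a BooleanQuery of 500,000 TermQuery clauses, this does not hit the maximum clause count and is designed for exactly this many-terms case.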
[SECURITY] CVE-2018-8026: XXE vulnerability due to Apache Solr configset upload (exchange rate provider config / enum field config / TIKA parsecontext)
CVE-2018-8026: XXE vulnerability due to Apache Solr configset upload (exchange rate provider config / enum field config / TIKA parsecontext)

Severity: High

Vendor: The Apache Software Foundation

Versions Affected:
Solr 6.0.0 to 6.6.4
Solr 7.0.0 to 7.3.1

Description:
The details of this vulnerability were reported by mail to the Apache
security mailing list. This vulnerability relates to an XML external entity
expansion (XXE) in Solr config files (currency.xml, enumsConfig.xml referred
from schema.xml, the TIKA parsecontext config file). In addition, the
XInclude functionality provided in these config files is affected in a
similar way. The vulnerability can be exploited as an XXE using the
file/ftp/http protocols in order to read arbitrary local files from the Solr
server or the internal network. The manipulated files can be uploaded as
configsets using Solr's API, allowing an attacker to exploit the
vulnerability. See [1] for more details.

Mitigation:
Users are advised to upgrade to either Solr 6.6.5 or Solr 7.4.0, both of
which address the vulnerability. Once the upgrade is complete, no other steps
are required. Those releases only allow external entities and XIncludes that
refer to local files / ZooKeeper resources below the Solr instance directory
(using Solr's ResourceLoader); usage of absolute URLs is denied. Keep in mind
that external entities and XInclude are explicitly supported to better
structure config files in large installations. Before Solr 6 this was no
problem, as config files were not accessible through the APIs.

If users are unable to upgrade to Solr 6.6.5 or Solr 7.4.0, they are advised
to make sure that Solr instances are only used locally, without access to the
public internet, so the vulnerability cannot be exploited. In addition,
reverse proxies should be guarded to not allow end users to reach the
configset APIs. Please refer to [2] on how to correctly secure Solr servers.
Solr 5.x and earlier are not affected by this vulnerability; those versions
do not allow configsets to be uploaded via the API. Nevertheless, users
should upgrade those versions as soon as possible, because there may be other
ways to inject config files through the file upload functionality of the old
web interface. Those versions are no longer maintained, so no deep analysis
was done.

Credit: Yuyang Xiao, Ishan Chattopadhyaya

References:
[1] https://issues.apache.org/jira/browse/SOLR-12450
[2] https://wiki.apache.org/solr/SolrSecurity

-----
Uwe Schindler
uschind...@apache.org
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/
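The class of fix described in the mitigation (refusing external entities and XIncludes when parsing untrusted XML) can be sketched with plain JAXP. This is an illustration using standard Xerces/JAXP feature URLs, not Solr's actual patch:

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;

public class SecureXml {
    // Build a DocumentBuilderFactory hardened against XXE: DOCTYPEs are
    // rejected outright, external general/parameter entities are disabled,
    // and XInclude processing is off.
    public static DocumentBuilderFactory hardenedFactory() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf;
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory dbf = hardenedFactory();
        System.out.println(dbf.isXIncludeAware()); // prints false
    }
}
```

A parser built from this factory throws on any document containing a `<!DOCTYPE ...>` declaration, which is the usual carrier for XXE payloads.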
[SECURITY] CVE-2018-8010: XXE vulnerability due to Apache Solr configset upload
CVE-2018-8010: XXE vulnerability due to Apache Solr configset upload

Severity: High

Vendor: The Apache Software Foundation

Versions Affected:
Solr 6.0.0 to 6.6.3
Solr 7.0.0 to 7.3.0

Description:
The details of this vulnerability were reported internally by one of Apache
Solr's committers. This vulnerability relates to an XML external entity
expansion (XXE) in Solr config files (solrconfig.xml, schema.xml,
managed-schema). In addition, the XInclude functionality provided in these
config files is affected in a similar way. The vulnerability can be exploited
as an XXE using the file/ftp/http protocols in order to read arbitrary local
files from the Solr server or the internal network. See [1] for more details.

Mitigation:
Users are advised to upgrade to either Solr 6.6.4 or Solr 7.3.1, both of
which address the vulnerability. Once the upgrade is complete, no other steps
are required. Those releases only allow external entities and XIncludes that
refer to local files / ZooKeeper resources below the Solr instance directory
(using Solr's ResourceLoader); usage of absolute URLs is denied. Keep in mind
that external entities and XInclude are explicitly supported to better
structure config files in large installations. Before Solr 6 this was no
problem, as config files were not accessible through the APIs.

If users are unable to upgrade to Solr 6.6.4 or Solr 7.3.1, they are advised
to make sure that Solr instances are only used locally, without access to the
public internet, so the vulnerability cannot be exploited. In addition,
reverse proxies should be guarded to not allow end users to reach the
configset APIs. Please refer to [2] on how to correctly secure Solr servers.

Solr 5.x and earlier are not affected by this vulnerability; those versions
do not allow configsets to be uploaded via the API. Nevertheless, users
should upgrade those versions as soon as possible, because there may be other
ways to inject config files through the file upload functionality of the old
web interface.
Those versions are no longer maintained, so no deep analysis was done.

Credit: Ananthesh, Ishan Chattopadhyaya

References:
[1] https://issues.apache.org/jira/browse/SOLR-12316
[2] https://wiki.apache.org/solr/SolrSecurity

-----
Uwe Schindler
uschind...@apache.org
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/
Re: Manipulate stored string in Lucene
Oh, it's Solr? Then it's not easily possible. Plain Lucene works like that.

Uwe

Am May 9, 2018 6:09:42 AM UTC schrieb Uwe Schindler :
>Hi,
>
>You don't need a second field name: you can add the indexed field once with
>stored=false, and then add a second instance with the same field name and
>the original stored content, but not indexed. If you want to have docvalues,
>the same can be done for docvalues. Internally, Lucene does it like that
>anyway; adding a field that is stored and indexed at the same time is just a
>convenience.
>
>Uwe
>
>Am May 9, 2018 5:57:40 AM UTC schrieb "Pachzelt, Adrian" :
>>Dear all,
>>
>>Currently I am reading text fields that contain XML text. Hence, the Solr
>>input may look like this:
>>
>><sec sec-type="Introduction" id="SECID0E4F">
>><title>Introduction</title>
>></sec>
>>
>>with all "<" and ">" escaped.
>>
>>I wrote a tokenizer that indexes the tag attributes (e.g.
>>sec-type="Introduction") at the position of the tagged word
>>("Introduction" in this case), and hence I need the HTML tags when
>>indexing. However, I want to strip the HTML in the stored string that is
>>shown to the user on a query. So far, I figured out that the index and the
>>stored string are separated. Thus, I thought it should be possible to
>>manipulate the stored string after indexing.
>>
>>Is there a way to do so? I would prefer to manipulate the stored string
>>rather than introduce a second field with the plain text in the input
>>file.
>>
>>I am glad for any help!
>>
>>Best Regards,
>>
>>Adrian
>>
>>---
>>Adrian Pachzelt
>>- Fachinformationsdienst Biodiversitaetsforschung -
>>- Hosting von Open Access-Zeitschriften -
>>Universitaetsbibliothek Johann Christian Senckenberg
>>Bockenheimer Landstr. 134-138
>>60325 Frankfurt am Main
>>Tel. 069/798-39382
>>a.pachz...@ub.uni-frankfurt.de
>>---
>
>--
>Uwe Schindler
>Achterdiek 19, 28357 Bremen
>https://www.thetaphi.de

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
Re: Manipulate stored string in Lucene
Hi,

You don't need a second field name: you can add the indexed field once with
stored=false, and then add a second instance with the same field name and the
original stored content, but not indexed. If you want to have docvalues, the
same can be done for docvalues. Internally, Lucene does it like that anyway;
adding a field that is stored and indexed at the same time is just a
convenience.

Uwe

Am May 9, 2018 5:57:40 AM UTC schrieb "Pachzelt, Adrian" :
>Dear all,
>
>Currently I am reading text fields that contain XML text. Hence, the Solr
>input may look like this:
>
><sec sec-type="Introduction" id="SECID0E4F">
><title>Introduction</title>
></sec>
>
>with all "<" and ">" escaped.
>
>I wrote a tokenizer that indexes the tag attributes (e.g.
>sec-type="Introduction") at the position of the tagged word ("Introduction"
>in this case), and hence I need the HTML tags when indexing. However, I
>want to strip the HTML in the stored string that is shown to the user on a
>query. So far, I figured out that the index and the stored string are
>separated. Thus, I thought it should be possible to manipulate the stored
>string after indexing.
>
>Is there a way to do so? I would prefer to manipulate the stored string
>rather than introduce a second field with the plain text in the input file.
>
>I am glad for any help!
>
>Best Regards,
>
>Adrian
>
>---
>Adrian Pachzelt
>- Fachinformationsdienst Biodiversitaetsforschung -
>- Hosting von Open Access-Zeitschriften -
>Universitaetsbibliothek Johann Christian Senckenberg
>Bockenheimer Landstr. 134-138
>60325 Frankfurt am Main
>Tel. 069/798-39382
>a.pachz...@ub.uni-frankfurt.de
>---

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
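In plain Lucene, the two-instance approach described above might be sketched as follows. This uses the modern Lucene field API for illustration (not Jackrabbit's internals); `rawXmlText`, `strippedPlainText` and `writer` are placeholders:

```java
// Index the raw XML (so the tokenizer sees the tags), but store only the
// stripped plain text for display — both under the same field name.
Document doc = new Document();
doc.add(new TextField("description", rawXmlText, Field.Store.NO)); // indexed, not stored
doc.add(new StoredField("description", strippedPlainText));        // stored, not indexed
writer.addDocument(doc);
```

At query time, searches run against the tokens produced from `rawXmlText`, while `doc.get("description")` on a retrieved document returns only the stripped text.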
[SECURITY] CVE-2018-1308: XXE attack through Apache Solr's DIH's dataConfig request parameter
CVE-2018-1308: XXE attack through Apache Solr's DIH's dataConfig request parameter

Severity: Major

Vendor: The Apache Software Foundation

Versions Affected:
Solr 1.2 to 6.6.2
Solr 7.0.0 to 7.2.1

Description:
The details of this vulnerability were reported to the Apache security
mailing list. This vulnerability relates to an XML external entity expansion
(XXE) in the `&dataConfig=` parameter of Solr's DataImportHandler. It can be
exploited as an XXE using the file/ftp/http protocols in order to read
arbitrary local files from the Solr server or the internal network. See [1]
for more details.

Mitigation:
Users are advised to upgrade to either Solr 6.6.3 or Solr 7.3.0, both of
which address the vulnerability. Once the upgrade is complete, no other steps
are required. Those releases disable external entities in anonymous XML files
passed through this request parameter.

If users are unable to upgrade to Solr 6.6.3 or Solr 7.3.0, they are advised
to disable the data import handler in their solrconfig.xml file and restart
their Solr instances. Alternatively, if Solr instances are only used locally
without access to the public internet, the vulnerability cannot be used
directly, so it may not be necessary to update; instead, reverse proxies or
Solr client applications should be guarded to not allow end users to inject
`dataConfig` request parameters. Please refer to [2] on how to correctly
secure Solr servers.

Credit: 麦 香浓郁

References:
[1] https://issues.apache.org/jira/browse/SOLR-11971
[2] https://wiki.apache.org/solr/SolrSecurity

-----
Uwe Schindler
uschind...@apache.org
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/
Re: Register custom TokenizerFactory with Maven
Hi,

You have to add a META-INF/services file to the JAR, listing all factories
contained in the JAR. More info is in the documentation of
java.util.ServiceLoader:
https://docs.oracle.com/javase/8/docs/api/java/util/ServiceLoader.html

The name to use for lookup is generated from the simple class name: basically
it is lowercased and the "TokenizerFactory" / "TokenFilterFactory" suffix is
removed.

Uwe

Am March 22, 2018 12:25:41 PM UTC schrieb "Pachzelt, Adrian" :
>Hi everybody,
>
>I am currently writing a custom TokenizerFactory for Lucene. However, as
>far as I understood, tokenizers are looked up by their name, like this for
>the StandardTokenizer:
>tokenizerFactory("Standard").create(newAttributeFactory());
>
>Accordingly, my code looks like this:
>tokenizerFactory("TaggedText").create(newAttributeFactory());
>
>I use Maven to compile my code. Where do I need to register my factory
>class, since, as expected, I get the error:
>
>"A SPI class of type org.apache.lucene.analysis.util.TokenizerFactory with
>name 'TaggedText' does not exist. You need to add the corresponding JAR
>file supporting this SPI to your classpath."
>
>How can I do this?
>
>I am glad for any advice!
>
>Thanks a lot!
>
>Cheers,
>
>Adrian

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
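For a Maven build, the registration described above might look like this. The package name `com.example.analysis` is a placeholder for wherever the factory actually lives:

```
# File (placed under src/main/resources so Maven packages it into the JAR):
#
#   META-INF/services/org.apache.lucene.analysis.util.TokenizerFactory
#
# Its content is one fully-qualified factory class name per line, e.g.:

com.example.analysis.TaggedTextTokenizerFactory
```

With that file on the classpath, a factory class named `TaggedTextTokenizerFactory` should become loadable under the SPI name "TaggedText" (lookup is case-insensitive).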
FOSS Backstage Micro Summit on Monday in Berlin
Hi,

It's already a bit late, but for all people who are visiting Germany next
week and want to take a short trip to Berlin: there are still slots free at
the FOSS Backstage Micro Summit. It is a mini conference on everything
related to governance, collaboration, legal and economics within the scope of
FOSS. The main event will take place as part of Berlin Buzzwords 2018. We
have a lot of speakers invited - also from the ASF!

https://www.foss-backstage.de/
Program: https://www.foss-backstage.de/news/micro-summit-program-online-now

I hope to see you there,
Uwe

-----
Uwe Schindler
uschind...@apache.org
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/
RE: RE: RE: Would docvalues be loaded into jvm?
Yes.

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: wangqinghuan [mailto:1095193...@qq.com]
> Sent: Thursday, June 15, 2017 12:21 PM
> To: general@lucene.apache.org
> Subject: Re: RE: RE: Would docvalues be loaded into jvm?
>
> Does "Hotspot" refer to the Java virtual machine?
RE: RE: Would docvalues be loaded into jvm?
Hi,

There is no design document about that. Lucene has used MMAP for all index
files since a long time ago; DocValues is just another implementation.
Basically it uses IndexInput's methods to access the underlying data, which
is memory mapped if you are on a 64-bit platform. For DocValues there are
also positional reads available. There is not much that is specific to
docvalues; it is just a file format that supports column-based access with
positional reads. The mmap implementation is separated from this and sits a
bit lower in the I/O layer of Lucene. Sorting is just a use case of
DocValues, but it does not sort directly on the mmapped files; there are
several abstractions in between (which are of course removed by the Hotspot
optimizer).

Some information (a bit older, but still valid) is here:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: wangqinghuan [mailto:1095193...@qq.com]
> Sent: Thursday, June 15, 2017 10:41 AM
> To: general@lucene.apache.org
> Subject: Re: RE: Would docvalues be loaded into jvm?
>
> hi
> Is there any design document on this aspect (the sorting algorithm working
> off mmap)?
RE: Would docvalues be loaded into jvm?
Hi,

It works directly off the mmapped files. It is not fully loaded into heap;
only some small control structures are allocated on heap. During sorting, the
TopDocsCollector uses the memory-mapped structures to uncompress and look up
the sort values.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: wangqinghuan [mailto:1095193...@qq.com]
> Sent: Thursday, June 15, 2017 4:36 AM
> To: general@lucene.apache.org
> Subject: Would docvalues be loaded into jvm?
>
> hi
> I know that data is written to disk in a column-store style if I enable
> doc-values for a certain field. But I don't understand why sorting with
> docvalues doesn't increase the load on the JVM. Whatever the sorting
> algorithm, data would have to be loaded into the JVM to sort. This should
> be a high load on the JVM when I sort the whole index, but in fact there is
> no change for the JVM. How does Lucene sort with docvalues? Can the sort
> algorithm work directly on the (mmapped) file?
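The mechanism Uwe describes — reading column values through a memory mapping instead of loading them onto the heap — can be illustrated with plain java.nio. This is an illustration of the principle only, not Lucene code:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapDemo {
    // Positional read of one int from a memory-mapped column file. The mapped
    // region lives in the OS page cache, not the Java heap, which is why
    // mmap-backed access does not grow the JVM heap with the data size.
    public static int readColumn(Path file, int index) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.getInt(index * Integer.BYTES); // no per-value object allocation
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small column of ints, then read one back through the mapping.
        Path file = Files.createTempFile("col", ".bin");
        ByteBuffer out = ByteBuffer.allocate(4 * Integer.BYTES);
        for (int v : new int[] {10, 20, 30, 40}) out.putInt(v);
        Files.write(file, out.array());
        System.out.println(readColumn(file, 2)); // prints 30
    }
}
```

A sort comparator built on such positional reads touches only the values it compares; the OS pages data in and out transparently, which matches the observation that sorting on doc values barely changes JVM heap usage.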
RE: Developing experimental "more advanced" analyzers
Hi, as you are using Elasticsearch, there is no need to implement an Analyzer instance. In general, this is never needed in Lucene, too, as there is the class CustomAnalyzer that uses a builder pattern to construct an analyzer like Elasticsearch or Solr are doing. For your use-case you need to implement a custom Tokenizer and/or several TokenFilters. In addition you need to create the corresponding factory classes and bundle everything as an Elasticsearch plugin. I'd suggest to ask on the Elasticsearch mailing lists about this. After that you can define your analyzer in the Elasticsearch mapping/index config. The Tokenizer and TokenFilters can be implemented, e.g. like Robert Muir was telling you. The sentence stuff can be done as a segmenting tokenizer subclass. Keep in mind, that many tasks can be done with already existing TokenFilters and/or Tokenizers. Lucene has no index support for POS tags, they are only used in the analysis chain. To somehow add them to the index, you may use TokenFilters as last stage that adds the POS tags to the term (e.g., term "Windmill", pos "subject" could be combined in the last TokenFilter to a term called "Windmill#subject" and indexed like that). For keeping track of POS tags during the analysis (between the tokenfilters and tokenizers), you may need to define custom attributes. Check the UIMA analysis module for more information how to do this. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Christian Becker [mailto:christian.frei...@gmail.com] > Sent: Monday, May 29, 2017 2:37 PM > To: general@lucene.apache.org > Subject: Developing experimental "more advanced" analyzers > > Hi There, > > I'm new to lucene (in fact im interested in ElasticSearch but in this case > its related to lucene) and I want to make some experiments with some > enhanced analyzers. 
> > Indeed I have an external linguistic component which I want to connect to > Lucene / EleasticSearch. So before I'm producing a bunch of useless code, I > want to make sure that I'm going the right way. > > The linguistic component needs at least a whole sentence as Input (at best > it would be the whole text at once). > > So as far as I can see I would need to create a custom Analyzer and > overrride "createComponents" and "normalize". > > Is that correct or am I on the wrong track? > > Bests > Chris
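The CustomAnalyzer builder mentioned above looks roughly like this on the Lucene side (a sketch against the Lucene 5.x/6.x API; the "standard"/"lowercase"/"stop" factory names are built-ins standing in for your own custom components, and the field name and text are made up):

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CustomAnalyzerExample {
  public static void main(String[] args) throws Exception {
    // Build an analysis chain declaratively from factory names, the same
    // way Solr and Elasticsearch assemble analyzers from configuration.
    Analyzer analyzer = CustomAnalyzer.builder()
        .withTokenizer("standard")
        .addTokenFilter("lowercase")
        .addTokenFilter("stop")   // default English stop-word set
        .build();

    try (TokenStream ts = analyzer.tokenStream("body", "The Windmill turns")) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        System.out.println(term.toString()); // "the" is removed by the stop filter
      }
      ts.end();
    }
  }
}
```

A custom Tokenizer or TokenFilter becomes available to this builder once you register its factory class via Lucene's analysis SPI (or, in Elasticsearch, via a plugin).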
RE: Java version set to 1.8 for SOLR 6.4.0
Hi, SOLR_JAVA_HOME must point to the directory of the JDK, not the "java" command: SOLR_JAVA_HOME = "/opt/wml/jdk1.8.0_66" Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Uchit Patel [mailto:uchitspa...@yahoo.com.INVALID] > Sent: Monday, February 13, 2017 11:14 AM > To: general@lucene.apache.org; solr-u...@lucene.apache.org; > jan@cominvent.com > Subject: Re: Java version set to 1.8 for SOLR 6.4.0 > > > I have updated SOLR_JAVA_HOME in following file. > /opt/wml/solr-6.4.0/bin/solr.in.sh > SOLR_JAVA_HOME = "/opt/wml/jdk1.8.0_66/jre/bin/java" > But it is not working. > Regards, > Uchit Patel > I have installed SOLR 6.4.0 on Linux box. I have Java 1.7.0 and 1.8.0 both on > the box. By default it point to 1.7.0. Some other applications using 1.7.0 > Java. > I want to set Java 1.8.0 only for SOLR 6.4.0. What should I need to update for > only SOLR 6.4.0 to hit Java 1.8.0. I don't want to remove Java 1.7.0 because > some other applications using Java 1.7.0. > Thanks. > Regards, > Uchit Patel > > From: Uchit Patel > To: "general@lucene.apache.org" ; "solr- > u...@lucene.apache.org" ; > "jan@cominvent.com" > Sent: Monday, February 13, 2017 3:38 PM > Subject: Re: Java version set to 1.8 for SOLR 6.4.0 > > Hi , > I tried SOLR_JAVA_HOME = "/opt/wml/jdk1.8.0_66/jre/bin/java" but it is not > working. > Regards, > Uchit Patel > I have installed SOLR 6.4.0 on Linux box. I have Java 1.7.0 and 1.8.0 both on > the box. By default it point to 1.7.0. Some other applications using 1.7.0 > Java. > I want to set Java 1.8.0 only for SOLR 6.4.0. What should I need to update for > only SOLR 6.4.0 to hit Java 1.8.0. I don't want to remove Java 1.7.0 because > some other applications using Java 1.7.0. > Thanks. > Regards, > Uchit Patel > > > > > > > > > >
RE: LongField when searched using classic QueryParser does not yield results
Hi, this is indeed related to this. The problem is a missing "schema" in Lucene. If you index values using several different field types (like TextField vs. IntField/Float/Double...), the information about how they were indexed is completely unknown to the query parser. The default query parser uses legacy code to create numeric ranges or numeric terms: it just treats them as text! If it searches on a numeric field using text terms, it won't find anything. Solr and Elasticsearch maintain a schema of the index. So they subclass the query parser, override the protected getRangeQuery and getFieldQuery methods, and use their schema to create the correct query types. The default is to create TermQuery and TermRangeQuery, which won't work on numeric fields. To fix this in your code you have to do something similar. YOU are the person who knows what the type of field XY is. If XY is a numeric field, the query parser must check the field name and then build the correct query (NumericRangeQuery). Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Amrit Sarkar [mailto:sarkaramr...@gmail.com] > Sent: Wednesday, January 11, 2017 9:52 AM > To: general@lucene.apache.org > Cc: java-u...@lucene.apache.org > Subject: Re: LongField when searched using classic QueryParser doesnot yield > results > > Hi Jaspreet, > > Not sure whether this helps to answer your question as I didn't try to run > the code: > > From official guide: > > > Within Lucene, each numeric value is indexed as a *trie* structure, where > > each term is logically assigned to larger and larger pre-defined brackets > > (which are simply lower-precision representations of the value). The step > > size between each successive bracket is called the precisionStep, > > measured in bits. 
Smaller precisionStep values result in larger number of > > brackets, which consumes more disk space in the index but may result in > > faster range search performance. The default value, 4, was selected for a > > reasonable tradeoff of disk space consumption versus performance > > > > If you only need to sort by numeric value, and never run range > > querying/filtering, you can index using a precisionStep of > > Integer.MAX_VALUE > > > <http://download.oracle.com/javase/6/docs/api/java/lang/Integer.html?is- > external=true#MAX_VALUE>. > > This will minimize disk space consumed.
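The subclassing approach Uwe describes can be sketched like this (against the Lucene 4.x/5.x classic QueryParser and NumericRangeQuery; the set of long-typed field names is an assumption standing in for your own schema knowledge):

```java
import java.util.Set;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;

public class NumericAwareQueryParser extends QueryParser {
  private final Set<String> longFields; // YOUR schema: fields indexed as LongField

  public NumericAwareQueryParser(String defaultField, Analyzer analyzer, Set<String> longFields) {
    super(defaultField, analyzer);
    this.longFields = longFields;
  }

  @Override
  protected Query getFieldQuery(String field, String queryText, boolean quoted) throws ParseException {
    if (longFields.contains(field)) {
      // a single-value "term" on a numeric field is a degenerate range
      long value = Long.parseLong(queryText);
      return NumericRangeQuery.newLongRange(field, value, value, true, true);
    }
    return super.getFieldQuery(field, queryText, quoted);
  }

  @Override
  protected Query getRangeQuery(String field, String part1, String part2,
                                boolean startInclusive, boolean endInclusive) throws ParseException {
    if (longFields.contains(field)) {
      // "*" or null means an open-ended bound
      Long lo = (part1 == null || "*".equals(part1)) ? null : Long.valueOf(part1);
      Long hi = (part2 == null || "*".equals(part2)) ? null : Long.valueOf(part2);
      return NumericRangeQuery.newLongRange(field, lo, hi, startInclusive, endInclusive);
    }
    return super.getRangeQuery(field, part1, part2, startInclusive, endInclusive);
  }
}
```

This is exactly what Solr and Elasticsearch do internally, except that they consult their schema instead of a hard-coded field-name set.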
RE: Lucene filter
Hi, You could use 2 query parsers, e.g., one for the user input and another one for the filters. Finally combine the 2 results into one query by combining them with an outer BooleanQuery. Having everything in one single string is quite uncommon for typical search application logic. Uwe - Uwe Schindler Achterdiek 19, D-28357 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Reda Kouba [mailto:redateksys...@gmail.com] > Sent: Friday, December 2, 2016 8:33 AM > To: general@lucene.apache.org > Subject: Re: Lucene filter > > Hi Mikhail, > > Do you have any suggestion to transform a string to a query object? > thanks, > reda > > > > On 2 Dec. 2016, at 18:26, Mikhail Khludnev wrote: > > > > Hello, > > > > I don't think # is supported in query parsers, although it would be great. > > So, far I saw them in only in toString(). > > > > On Fri, Dec 2, 2016 at 9:30 AM, Bouadjenek mohamed reda < > > redateksys...@gmail.com> wrote: > > > >> Hi All, > >> > >> > >> I wanna use a filter into a query (BooleanClause.Occur.FILTER). For > >> example, my query is: > >> > >> #repository:clinicaltrials +title:multipl > >> > >> It looks like when I build the query from this String, the filter is not > >> working. In other words, the total hits in the first example below is > >> different from total hits in the second example below. Please, does > anyone > >> know what wrong with this simple example? 
> >> > >> Example 1: > >> String query = "#repository:clinicaltrials +title:multipl"; > >> QueryParser qr = new QueryParser("", new StandardAnalyzer()); > >> TopDocs hits = is.search(qr.parse(query), 1); > >> > >> Example 2: > >> String[] fields = new String[]{"repository", "title"}; > >> BooleanClause.Occur[] allflags = new > >> BooleanClause.Occur[]{BooleanClause.Occur.FILTER, > >> BooleanClause.Occur.MUST}; > >> String[] query_text = new String[]{"clinicaltrials", "multipl"}; > >> Query finalQuery = MultiFieldQueryParser.parse(query_text, fields, > >> allflags, new StandardAnalyzer()); > >> TopDocs hits = is.search(finalQuery, 1); > >> > >> > >> thanks, > >> > >> > >> Best, > >> reda > >> > > > > > > > > -- > > Sincerely yours > > Mikhail Khludnev
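The "two query parsers" approach suggested above can be sketched like this (Lucene 6.x-era API; the field names and query strings are borrowed from the original question):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;

public class FilterAndUserQuery {
  public static void main(String[] args) throws Exception {
    StandardAnalyzer analyzer = new StandardAnalyzer();

    // parse the user input and the application-defined filter separately
    Query userQuery = new QueryParser("title", analyzer).parse("multipl");
    Query filterQuery = new QueryParser("repository", analyzer).parse("clinicaltrials");

    // combine them: the FILTER clause must match but never contributes to the score
    BooleanQuery finalQuery = new BooleanQuery.Builder()
        .add(userQuery, BooleanClause.Occur.MUST)
        .add(filterQuery, BooleanClause.Occur.FILTER)
        .build();

    System.out.println(finalQuery);
  }
}
```

Keeping the two strings separate avoids relying on query-parser syntax (like the unsupported `#` prefix) to express the filter.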
RE: Doing Range/Number queries
Hi, I don't understand your question. Filter queries no longer exist since Lucene 6! If you want to filter, use any query and pass it as BooleanClause.Occur.FILTER to a BooleanQuery. Done. Those FILTER clauses may (depending on the query type) perform better than MUST clauses, because no score is calculated. Uwe ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: lukes [mailto:mail2lok...@gmail.com] > Sent: Saturday, August 13, 2016 11:04 PM > To: general@lucene.apache.org > Subject: RE: Doing Range/NUmber queries > > Thanks Uwe, > > Quick follow up questions, would Filter query perform any better ? I hope > performance would still be same, but just out of curosity. > > Regards. > > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Doing- > Range-NUmber-queries-tp4290722p4291666.html > Sent from the Lucene - General mailing list archive at Nabble.com.
RE: Doing Range/Number queries
Hi, Since Lucene 6, Filters as a separate class are gone (deprecated in Lucene 5). Every query can now be used as a filter. There are 2 possibilities: - To apply as a filter next to other scoring queries, use a BooleanQuery with filter clauses (BooleanClause.Occur.FILTER) next to the standard scoring clauses (MUST or SHOULD). The FILTER clauses are standard queries. - To execute a single query without scoring (constant score of 1), use ConstantScoreQuery Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: lukes [mailto:mail2lok...@gmail.com] > Sent: Wednesday, August 10, 2016 1:44 AM > To: general@lucene.apache.org > Subject: Re: Doing Range/NUmber queries > > Hi Michael, > > Do you know, if Filtered Queries are supported in Lucene ? I tried to > find, but couldn't find anything relevant. So i have some queries which i > want to apply as filter, don't want to contribute in the scoring.Would > filter queries speed up the query process ? Can i combine filter and queries > together ? Also during indexing do i need to mention anything special for > fields that can be used in filters ? > > Regards. > > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Doing- > Range-NUmber-queries-tp4290722p4291058.html > Sent from the Lucene - General mailing list archive at Nabble.com.
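The two possibilities described above can be sketched like this (Lucene 6.x-era API; the "title" and "price" fields are hypothetical, and the range query assumes "price" was indexed as a LongPoint):

```java
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class FilterClauseExample {
  public static void main(String[] args) {
    Query range = LongPoint.newRangeQuery("price", 10L, 100L);

    // Possibility 1: a filter clause next to a scoring clause
    Query combined = new BooleanQuery.Builder()
        .add(new TermQuery(new Term("title", "lucene")), BooleanClause.Occur.MUST) // scored
        .add(range, BooleanClause.Occur.FILTER)                                    // matched, not scored
        .build();

    // Possibility 2: a single query executed without scoring (constant score of 1)
    Query constant = new ConstantScoreQuery(range);

    System.out.println(combined);
    System.out.println(constant);
  }
}
```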
Apache Solr and Tika used to index Panama Papers
Hi all, I just wanted to repost the following by Chris Mattmann on the TIKA list: If you have been following the news you've seen the Panama papers and how the world's rich and elite have been storing all their money offshore to hide it. Two of the ASF's key technologies were used in uncovering that story and showing the world what was going on: Apache Tika and Apache Solr. Solr was used for making the Terabytes of Panama Papers available to journalists. The preprocessing of the documents for indexing was done with Tika (maybe through the contrib/extraction module). Here is the article by Forbes about that: http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak Uwe - Uwe Schindler uschind...@apache.org ASF Member, Apache Lucene PMC / Committer Bremen, Germany http://lucene.apache.org/
RE: Solr- Fuzzy Search
Hi, see here: https://cwiki.apache.org/confluence/display/solr/Highlighting Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: akshaymall [mailto:akshay.m...@eclerx.com] > Sent: Monday, March 07, 2016 2:16 PM > To: general@lucene.apache.org > Subject: RE: Solr- Fuzzy Search > > Hi Uwe, > > Thanks for the reply. > > Could you please help us with the code also? > > We are using C# as the base language and the solr query we have built is as > follows: > >var facetResults = solr.Query(new SolrQuery("text: agree~2"), new > QueryOptions > { > Rows = 10, > Facet = new FacetParameters > { > Queries = new[]{ > new SolrFacetFieldQuery("extension"){ > Prefix = "", > Limit = 5 > } > } > } > }); > > Regards, > > Akshay Mall > > Senior Analyst, > Financial Services Product Development, > eClerx Services Limited > > Phone: +91 9173435462 | +91 9167368827 > eClerx Services Limited [www.eClerx.com] > [cid:image001.png@01CFD376.89BEAC90] > > From: Uwe Schindler [via Lucene] [mailto:ml- > node+s472066n4262093...@n3.nabble.com] > Sent: Monday, March 07, 2016 6:36 PM > To: Akshay Mall > Subject: RE: Solr- Fuzzy Search > > Hi, > > you can use the highlighter functionality. It will "mark" the hits in the > document text. > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: [hidden > email] > > > -Original Message- > > From: akshaymall [mailto:[hidden > email]] > > Sent: Monday, March 07, 2016 1:46 PM > > To: [hidden email] > > Subject: Solr- Fuzzy Search > > > > I want to try the fuzzy search in Solr with a specific query. > > > > For example, I want to search all the words that match this query: "agree > > ~2". > > > > Now using a simple query, I can find the documents that have the words > > matching the above query. But I want to know the words that Solr has > found > > in the document. > > > > Example: > > > > Search Result: > > 1. Sample1.pdf > > 2. 
Sample2.pdf > > > > What I want as a result: > > > > 1. Sample1.pdf : agree, agrea > > 2. Sample2.pdf : agref, agret > > > > > > > > > > -- > > View this message in context: http://lucene.472066.n3.nabble.com/Solr- > > Fuzzy-Search-tp4262092.html > > Sent from the Lucene - General mailing list archive at Nabble.com. > > > > If you reply to this email, your message will be added to the discussion > below: > http://lucene.472066.n3.nabble.com/Solr-Fuzzy-Search- > tp4262092p4262093.html > To unsubscribe from Solr- Fuzzy Search, click > here<http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro > =unsubscribe_by_code&node=4262092&code=YWtzaGF5Lm1hbGxAZWNsZX > J4LmNvbXw0MjYyMDkyfC00OTQ1Nzk3MjE=>. > NAML<http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?mac > ro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble. > naml.namespaces.BasicNamespace- > nabble.view.web.template.NabbleNamespace- > nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscrib > ers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml- > send_instant_email%21nabble%3Aemail.naml> > > > image001.png (21K) > <http://lucene.472066.n3.nabble.com/attachment/4262097/0/image001.png > > > > > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Solr- > Fuzzy-Search-tp4262092p4262097.html > Sent from the Lucene - General mailing list archive at Nabble.com.
RE: Solr- Fuzzy Search
Hi, you can use the highlighter functionality. It will "mark" the hits in the document text. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: akshaymall [mailto:akshay.m...@eclerx.com] > Sent: Monday, March 07, 2016 1:46 PM > To: general@lucene.apache.org > Subject: Solr- Fuzzy Search > > I want to try the fuzzy search in Solr with a specific query. > > For example, I want to search all the words that match this query: "agree > ~2". > > Now using a simple query, I can find the documents that have the words > matching the above query. But I want to know the words that Solr has found > in the document. > > Example: > > Search Result: > 1. Sample1.pdf > 2. Sample2.pdf > > What I want as a result: > > 1. Sample1.pdf : agree, agrea > 2. Sample2.pdf : agref, agret > > > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Solr- > Fuzzy-Search-tp4262092.html > Sent from the Lucene - General mailing list archive at Nabble.com.
RE: find out rank of document within a lucene query (with custom sorting)
Hi, If you only want to get that information for a single, certain document, you can use the "explain" functionality - but you need the internal document ID for that. Alternatively execute the query once (for whole result set) and an additional time with a filter on your external ID applied. The only result would be the filtered document, but with the same score as in first result set. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Anton [mailto:anton.te...@gmail.com] > Sent: Wednesday, January 06, 2016 10:27 AM > To: general@lucene.apache.org > Subject: find out rank of document within a lucene query (with custom > sorting) > > Hi, > > I am interested to find out what rank a document holds within a search > query among all documents within the index. > > In more detail: I would like to create a query with a sorting. But I am not > interested in getting, for instance, the top 10 search hits for that query > and sorting. I am only interested what rank a certain document from the > index would be in the result of that query and sorting. It could be the > 42nd from 1024 documents in that result. I could identify the document via > an ID field. > > Is there a possibility to do get that rank number efficiently? > (A simple, but probably time consuming, solution would be: Get all > documents according to query and sorting. Loop through the result list and > find the specific document. Return the counter of the loop.) > > Here is a similar question on stackoverflow (without a satisfying answer): > http://stackoverflow.com/questions/7924146/is-there-a-way-for-solr- > lucene-to-return-the-ranks-of-selected-documents-instead > > Have a nice day, > Anton.
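The second approach (run the query again, restricted to the one document) can be sketched like this (Lucene 5.x-era API; the "id" field name is a made-up stand-in for your external ID field, which should be indexed untokenized):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

public class SingleDocScore {
  // Returns the score the given document has within the full result set of
  // userQuery, or Float.NaN if it does not match the query at all. The FILTER
  // clause restricts the result without changing the score of the MUST clause.
  static float scoreOf(IndexSearcher searcher, Query userQuery, String externalId) throws Exception {
    Query restricted = new BooleanQuery.Builder()
        .add(userQuery, BooleanClause.Occur.MUST)
        .add(new TermQuery(new Term("id", externalId)), BooleanClause.Occur.FILTER)
        .build();
    TopDocs top = searcher.search(restricted, 1);
    return top.totalHits == 0 ? Float.NaN : top.scoreDocs[0].score;
  }
}
```

Comparing this score against the scores of the full result set tells you where the document would rank, without looping over all hits.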
CfP about Geospatial Track at ApacheCon, Vancouver
Hi Committers, hi Lucene users, On the next ApacheCon in Vancouver, Canada (May 9 - 13 2016), there will be a track about geospatial data. The track is organized by Chris Mattmann together with George Percivall of the OGC (Open Geospatial Consortium). As I am also a member of OGC, they invited me to ask the Lucene Community to propose talks. Apache Lucene, Solr, and Elasticsearch have great geospatial features, so this would be a good opportunity to present them. This is especially important because the current OGC standards are very RDBMS-focused (like filter definitions, services,...), so we can use the track to talk with OGC representatives about better matching OGC standards with full-text search. I am not sure if I can manage to get to Vancouver, but others are kindly invited to submit talks. It is not yet certain whether the track will be part of ApacheCon Core or ApacheCon BigData; I will keep you informed. If you have talk suggestions, please send them to me or Chris Mattmann. Alternatively, submit them to the Big Data track @ http://events.linuxfoundation.org/events/apache-big-data-north-america/program/cfp (and mention the geospatial track). Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de
RE: Lucene TypeAttribute not used during querying
Not as attributes. As said before, you have to write a separate TokenFilter at the end of your indexing chain, that collects the attributes you want to index and add them to the term: - Append the type to term like: TokenFilter's incrementToken does something like: termAtt.append('#').append(typeAtt); - Create a payload out of it: see payload package in analyzers-common for examples. After that you can query using the "extended term" or using payload queries. You may ask the question about how to query then: appending the type on the term itself (see above like "term#type") is no problem during search, because also on search side the analyzer is used. The search query gets analyzed and the last TokenFilter of the analyzer will add the type to the term as described before. The query will then hit all terms with that type. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Paul Bedaride [mailto:paul.bedar...@xilopix.com] > Sent: Wednesday, September 23, 2015 11:38 AM > To: general@lucene.apache.org > Subject: Re: Lucene TypeAttribute not used during querying > > Ok so it is not possible to store other part of information in the index ? > like > part-of-speach ? > > Thanks for the fast answer > > Paul > > On 23/09/2015 11:21, Uwe Schindler wrote: > > Hi, > > > > The type attribute is not stored in index. The main intention behind this > attribute is to use it inside the analysis chain. E.g. you have some > tokenizer/stemmer/whatever that sets the attribute. The last TokenFilter > before indexing may then change the term accordingly (e.g. adding the type > as a payload, or append it to the term itsself) to get the information into > index - but this is mainly your task. The same applies for other language > specific attributes (like Japanese ones). The keyword attribute is another > example, it is also not indexed, but is solely used to control behavior of > later > TokenFilters (e.g. 
prevent stemming). > > > > Uwe > > > > - > > Uwe Schindler > > H.-H.-Meier-Allee 63, D-28213 Bremen > > http://www.thetaphi.de > > eMail: u...@thetaphi.de > > > > > >> -Original Message- > >> From: Paul Bedaride [mailto:paul.bedar...@xilopix.com] > >> Sent: Wednesday, September 23, 2015 11:16 AM > >> To: general@lucene.apache.org > >> Subject: Lucene TypeAttribute not used during querying > >> > >> Hello, > >> > >> I wonder why the TypeAttribute is not used for queries ? > >> It seems that it is used only during analysis. > >> Why it is not used in org.apache.lucene.index.Term ? > >> > >> Paul Bédaride
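The "append the type to the term" filter described above can be sketched like this (Lucene 5.x-era API; the class name is made up). Because the same analyzer runs at query time, the last filter rewrites query terms identically, so searches hit the combined terms:

```java
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

// Last filter in the chain: rewrites a term like "windmill" with type
// "subject" into the single indexed term "windmill#subject".
public final class TypeToTermFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);

  public TypeToTermFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    // fold the (otherwise unindexed) type attribute into the term itself
    termAtt.append('#').append(typeAtt.type());
    return true;
  }
}
```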
RE: Lucene TypeAttribute not used during querying
Hi, The type attribute is not stored in the index. The main intention behind this attribute is to use it inside the analysis chain. E.g., you have some tokenizer/stemmer/whatever that sets the attribute. The last TokenFilter before indexing may then change the term accordingly (e.g., adding the type as a payload, or appending it to the term itself) to get the information into the index - but this is mainly your task. The same applies for other language-specific attributes (like Japanese ones). The keyword attribute is another example; it is also not indexed, but is solely used to control behavior of later TokenFilters (e.g. prevent stemming). Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Paul Bedaride [mailto:paul.bedar...@xilopix.com] > Sent: Wednesday, September 23, 2015 11:16 AM > To: general@lucene.apache.org > Subject: Lucene TypeAttribute not used during querying > > Hello, > > I wonder why the TypeAttribute is not used for queries ? > It seems that it is used only during analysis. > Why it is not used in org.apache.lucene.index.Term ? > > Paul Bédaride
RE: Lucene 5 Custom FieldComparator
Since 5.0 you have to index the field as doc values to do that type of sorting; FieldCache is gone, see MIGRATE.txt. MultiFields does not help here: it is more to view the whole index as a single LeafReader although it consists of multiple segments (LeafReaders). It's also used for merging, but user code should not use it. The doSetNextReader is provided in the API because the collecting of results is done per index segment (i.e., per LeafReader), and the document IDs reported to collect() are relative to those readers, not globally valid. In doSetNextReader you have to fetch the doc values from the index using the LeafReader and access them later in the compare methods using the local doc IDs. Uwe P.S.: FieldCache is still available as a reader wrapper in the misc module's 'uninverting' package, but the API no longer returns arrays. You just get back a DocValues emulation, which is random access. You still have to do this per index segment (doSetNextReader). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Pablo Mincz [mailto:pmi...@gmail.com] > Sent: Thursday, August 13, 2015 5:35 PM > To: general@lucene.apache.org > Subject: Lucene 5 Custom FieldComparator > > Hi, > > I'm doing a migration from Lucene 3.6.1 to 5.2.1 and I have a custom > FieldComparator that sort the search for availables discounts. For this, first I > check that the date range is valid and later sort by the discount amount. > > I did this in Lucene 3.6.1 but now in 5.2.1 version, the FieldComparator has > the method doSetNextReader that has a LeafReaderContext and I do not > know how to read all the fields from the LeafReader because I did not > indexed this field with DocValues. > > I tried with MultiFields but I got only one result instead of an array, and > some > values are Floats. > > Someone know how to do this? > > Thanks a lot for the help. > > Regards, > Pablo.
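A per-segment comparator along the lines described above can be sketched like this (against the Lucene 5.2-era SimpleFieldComparator API; the "discount" field name is hypothetical and assumed to be indexed as a NumericDocValuesField - the date-range validity check from the original question would also go into copy()/compareBottom()):

```java
import java.io.IOException;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.SimpleFieldComparator;

// Sorts ascending by the numeric doc-values field "discount".
public class DiscountComparator extends SimpleFieldComparator<Long> {
  private final long[] values;      // one slot per hit in the collector's queue
  private NumericDocValues current; // doc values of the current segment
  private long bottom;
  private long top;

  public DiscountComparator(int numHits) {
    values = new long[numHits];
  }

  @Override
  protected void doSetNextReader(LeafReaderContext context) throws IOException {
    // called once per segment; doc IDs passed below are relative to this reader
    current = DocValues.getNumeric(context.reader(), "discount");
  }

  @Override
  public void copy(int slot, int doc) {
    values[slot] = current.get(doc); // random-access lookup in the 5.x API
  }

  @Override
  public int compare(int slot1, int slot2) {
    return Long.compare(values[slot1], values[slot2]);
  }

  @Override
  public void setBottom(int slot) {
    bottom = values[slot];
  }

  @Override
  public int compareBottom(int doc) {
    return Long.compare(bottom, current.get(doc));
  }

  @Override
  public void setTopValue(Long value) {
    top = value;
  }

  @Override
  public int compareTop(int doc) {
    return Long.compare(top, current.get(doc));
  }

  @Override
  public Long value(int slot) {
    return values[slot];
  }
}
```

To use it, wrap it in a FieldComparatorSource and pass that to a SortField for your search.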
RE: Controlling size of matched results in Lucene
There is one possibility to get *all* documents, if you are happy with the fact that non-matching documents get a score factor of 0.0: scores of SHOULD clauses in a BooleanQuery get "added". You can combine your original query ("yourOriginalQuery", with real scores) with another query ("fakeQuery") matching all documents and having score = 0.0 in a single final BooleanQuery ("finalQuery"). You get score 0.0 for non-matching documents, and score 0.0 + realScore => realScore for all others:

BooleanQuery finalQuery = new BooleanQuery(true); // BooleanQuery without coord factors, so it just adds scores, nothing else
finalQuery.add(yourOriginalQuery, BooleanClause.Occur.SHOULD);
Query fakeQuery = new MatchAllDocsQuery();
fakeQuery.setBoost(0); // tune the boost, so this query always returns a score of 0
finalQuery.add(fakeQuery, BooleanClause.Occur.SHOULD);
// execute finalQuery!

Hope that helps! Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Susrutha Gongalla [mailto:susrutha.gonga...@gmail.com] > Sent: Tuesday, May 12, 2015 12:28 PM > To: general@lucene.apache.org > Subject: Controlling size of matched results in Lucene > > Hello, > > I am developing a matching algorithm using Lucene 4.10.0 My index consists > of about 2000 documents. > When I use the 'search' method on a query term, I get about n results that > point to the n documents in index along with their corresponding scores. > What I would like to get is - All the 2000 documents with their lucene scores > along with whether they are matched/unmatched. > I would like to control the size of results that lucene returns when I search > for > a query term. > > I have tried altering the default similarity used in lucene by overriding the > score methods. > However, this did not affect the size of results generated by lucene. > > I also tried explicitly given 'null' value for filter, when calling the > 'search' > method. 
> This also did not affect the size of results. > > I just started working with Lucene. > Would appreciate any help in this regard! > > Best, > Susrutha Gongalla
ApacheCon NA 2015 in Austin, Texas
Dear Apache Lucene/Solr enthusiast, In just a few weeks, we'll be holding ApacheCon in Austin, Texas, and we'd love to have you in attendance. You can save $300 on admission by registering NOW, since the early bird price ends on the 21st. Register at http://s.apache.org/acna2015-reg ApacheCon this year celebrates the 20th birthday of the Apache HTTP Server, and we'll have Brian Behlendorf, who started this whole thing, keynoting for us, and you'll have a chance to meet some of the original Apache Group, who will be there to celebrate with us. We also have talks about Apache Lucene and Apache Solr in 7 tracks of great talks, as well as BOFs, the Apache BarCamp, project-specific hack events, and evening events where you can deepen your connection with the larger Apache community. See the full schedule at http://apacheconna2015.sched.org/ And if you have any questions, comments, or just want to hang out with us before and during the event, follow us on Twitter - @apachecon - or drop by #apachecon on the Freenode IRC network. Hope to see you in Austin! - Uwe Schindler uschind...@apache.org Apache Lucene PMC Member / Committer Bremen, Germany http://lucene.apache.org/
RE: If I put a cached filter, do i still need to set the max clause count?
Hi, This question is about Lucene.NET, which is not part of the Apache Lucene project. Please send your questions to the Lucene.NET mailing list: u...@lucenenet.apache.org Uwe - Uwe Schindler uschind...@apache.org Apache Lucene PMC Member / Committer Bremen, Germany http://lucene.apache.org/ > -Original Message- > From: doglin82 [mailto:karen@bellmedia.ca] > Sent: Wednesday, March 11, 2015 4:21 PM > To: general@lucene.apache.org > Subject: If I put a cached filter, do i still need to set the max clause > count? > > We are using Lucene for our web site and as the index grew, we got the > following exception. > > the Message: maxClauseCount is set to 1024 Stack Trace: > at Lucene.Net.Search.BooleanQuery.Add(BooleanClause clause) at > Lucene.Net.Search.BooleanQuery.Add(Query query, Occur occur) Instead of > using Range Query, I am using a Range Filter now, and wrapped it with a > Cached Filter, > > RangeFilter dateFilter = new RangeFilter("documentpublishfrom", > "210100", > DateTime.Now.AddYears(10).ToString("MMddHHmmss"), > true, true); > > > CachingWrapperFilter cachingFilter = new CachingWrapperFilter(dateFilter ); > > > var results = _searcher.Search(bq, cachingFilter, sortBy); So, now that I > am > using filters intsead, do i need to set max clause count still? Please advise > > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/If-I-put- > a-cached-filter-do-i-still-need-to-set-the-max-clause-count-tp4192413.html > Sent from the Lucene - General mailing list archive at Nabble.com.
RE: how to use CachingWrapperFilter correctly and effectively in Lucene
Hi, This question is about Lucene.NET, which is not part of the Apache Lucene project. Please send your questions to the Lucene.NET mailing list: u...@lucenenet.apache.org Uwe - Uwe Schindler uschind...@apache.org Apache Lucene PMC Member / Committer Bremen, Germany http://lucene.apache.org/ > -Original Message- > From: doglin82 [mailto:karen@bellmedia.ca] > Sent: Wednesday, March 11, 2015 8:23 PM > To: general@lucene.apache.org > Subject: how to use CachingWrapperFilter correctly and effectively in Lucene > > We are using Lucene for our web site and as the index grew, we got the > following exception due to too many clause. > > the Message: maxClauseCount is set to 1024 Stack Trace: > at Lucene.Net.Search.BooleanQuery.Add(BooleanClause clause) at So I did > some research and added a CachingWrapperFilter , my code now looks like > this > > BooleanQuery bq = new BooleanQuery(); > > //publishedQuery is set to BooleanQuery > bq.Add(publishedQuery, BooleanClause.Occur.MUST); > > var sortBy = customSort ?? new Sort(Sort.RELEVANCE.GetSort()); > BooleanQuery.SetMaxClauseCount(4096); > >Filter filter = new QueryFilter(bq); > CachingWrapperFilter cachingFilter = new > CachingWrapperFilter(filter); > > var results = _searcher.Search(bq, cachingFilter,sortBy); I want > to know > 1) If I am using the CachingWrapperFilter correct and effectively > 2) Do I still need to set the Max Clause to 4096 if I am using CachingWrapper > Filter, default is 1024 for max clause count > > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/how-to- > use-CachingWrapperFilter-correctly-and-effectively-in-Lucene- > tp4192491.html > Sent from the Lucene - General mailing list archive at Nabble.com.
RE: Reminder: FOSDEM 2015 - Open Source Search Dev Room
Hello everyone, We have extended the deadline for submissions to the FOSDEM 2015 Open Source Search Dev Room to Monday, 9 December at 23:59 CET. We are looking forward to your talk proposals! Cheers, Uwe - Uwe Schindler uschind...@apache.org Apache Lucene PMC Member / Committer Bremen, Germany http://lucene.apache.org/ > -Original Message- > From: Uwe Schindler [mailto:uschind...@apache.org] > Sent: Monday, November 24, 2014 9:33 AM > To: d...@lucene.apache.org; java-u...@lucene.apache.org; solr-u...@lucene.apache.org; general@lucene.apache.org > Subject: Reminder: FOSDEM 2015 - Open Source Search Dev Room > > Hi, > > We host a Dev Room about "Open Source Search" at this year's FOSDEM 2015 (https://fosdem.org/2015/), taking place on January 31st and February 1st, 2015, in Brussels, Belgium. There is still one more week to submit your talks, so hurry up and submit your talk early! > > Here is the full CFP as posted a few weeks ago: > > Search has evolved to be much more than simply full-text search. We now rely on “search engines” for a wide variety of functionality: search as navigation, search as analytics and backend for data visualization and sometimes, dare we say it, as a data store. The purpose of this dev room is to explore the new world of open source search engines: their enhanced functionality, new use cases, feature and architectural deep dives, and the position of search in relation to the wider set of software tools. > > We welcome proposals from folks working with or on open source search engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.) or technologies that heavily depend upon search (e.g. NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in presentations on search algorithms, machine learning, real-world implementation/deployment stories and explorations of the future of search. > > Talks should be 30-60 minutes in length, including time for Q&A.
> > You can submit your talks to us here: > https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V3 > 8G0OxSfp84A/viewform > > Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We > cannot guarantee we will have the opportunity to review submissions made > after the deadline, so please submit early (and often)! > > Should you have any questions, you can contact the Dev Room > organizers: opensourcesearch-devr...@lists.fosdem.org > > Cheers, > LH on behalf of the Open Source Search Dev Room Program Committee* > > * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten > Curdt, Uwe Schindler > > - > Uwe Schindler > uschind...@apache.org > Apache Lucene PMC Member / Committer > Bremen, Germany > http://lucene.apache.org/ > > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org
Reminder: FOSDEM 2015 - Open Source Search Dev Room
Hi, We host a Dev Room about "Open Source Search" at this year's FOSDEM 2015 (https://fosdem.org/2015/), taking place on January 31st and February 1st, 2015, in Brussels, Belgium. There is still one more week to submit your talks, so hurry up and submit your talk early! Here is the full CFP as posted a few weeks ago: Search has evolved to be much more than simply full-text search. We now rely on “search engines” for a wide variety of functionality: search as navigation, search as analytics and backend for data visualization and sometimes, dare we say it, as a data store. The purpose of this dev room is to explore the new world of open source search engines: their enhanced functionality, new use cases, feature and architectural deep dives, and the position of search in relation to the wider set of software tools. We welcome proposals from folks working with or on open source search engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.) or technologies that heavily depend upon search (e.g. NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in presentations on search algorithms, machine learning, real-world implementation/deployment stories and explorations of the future of search. Talks should be 30-60 minutes in length, including time for Q&A. You can submit your talks to us here: https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V38G0OxSfp84A/viewform Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We cannot guarantee we will have the opportunity to review submissions made after the deadline, so please submit early (and often)!
Should you have any questions, you can contact the Dev Room organizers: opensourcesearch-devr...@lists.fosdem.org Cheers, LH on behalf of the Open Source Search Dev Room Program Committee* * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten Curdt, Uwe Schindler - Uwe Schindler uschind...@apache.org Apache Lucene PMC Member / Committer Bremen, Germany http://lucene.apache.org/
RE: FOSDEM 2015 - Open Source Search Dev Room
Hi, I forgot to mention: FOSDEM 2015 takes place in Brussels on January 31st and February 1st, 2015. See also: https://fosdem.org/2015/ I hope to see you there! Uwe > -Original Message- > From: Uwe Schindler [mailto:uschind...@apache.org] > Sent: Monday, November 03, 2014 1:29 PM > To: d...@lucene.apache.org; java-u...@lucene.apache.org; solr-u...@lucene.apache.org; general@lucene.apache.org > Subject: CFP: FOSDEM 2015 - Open Source Search Dev Room > > ***Please forward this CFP to anyone who may be interested in participating.*** > > Hi, > > Search has evolved to be much more than simply full-text search. We now rely on “search engines” for a wide variety of functionality: search as navigation, search as analytics and backend for data visualization and sometimes, dare we say it, as a data store. The purpose of this dev room is to explore the new world of open source search engines: their enhanced functionality, new use cases, feature and architectural deep dives, and the position of search in relation to the wider set of software tools. > > We welcome proposals from folks working with or on open source search engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.) or technologies that heavily depend upon search (e.g. NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in presentations on search algorithms, machine learning, real-world implementation/deployment stories and explorations of the future of search. > > Talks should be 30-60 minutes in length, including time for Q&A. > > You can submit your talks to us here: https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V38G0OxSfp84A/viewform > > Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We cannot guarantee we will have the opportunity to review submissions made after the deadline, so please submit early (and often)!
> > Should you have any questions, you can contact the Dev Room > organizers: opensourcesearch-devr...@lists.fosdem.org > > Cheers, > LH on behalf of the Open Source Search Dev Room Program Committee* > > * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten > Curdt, Uwe Schindler > > - > Uwe Schindler > uschind...@apache.org > Apache Lucene PMC Member / Committer > Bremen, Germany > http://lucene.apache.org/ > > > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org
CFP: FOSDEM 2015 - Open Source Search Dev Room
***Please forward this CFP to anyone who may be interested in participating.*** Hi, Search has evolved to be much more than simply full-text search. We now rely on “search engines” for a wide variety of functionality: search as navigation, search as analytics and backend for data visualization and sometimes, dare we say it, as a data store. The purpose of this dev room is to explore the new world of open source search engines: their enhanced functionality, new use cases, feature and architectural deep dives, and the position of search in relation to the wider set of software tools. We welcome proposals from folks working with or on open source search engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.) or technologies that heavily depend upon search (e.g. NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in presentations on search algorithms, machine learning, real-world implementation/deployment stories and explorations of the future of search. Talks should be 30-60 minutes in length, including time for Q&A. You can submit your talks to us here: https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V38G0OxSfp84A/viewform Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We cannot guarantee we will have the opportunity to review submissions made after the deadline, so please submit early (and often)! Should you have any questions, you can contact the Dev Room organizers: opensourcesearch-devr...@lists.fosdem.org Cheers, LH on behalf of the Open Source Search Dev Room Program Committee* * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten Curdt, Uwe Schindler ----- Uwe Schindler uschind...@apache.org Apache Lucene PMC Member / Committer Bremen, Germany http://lucene.apache.org/
[ANNOUNCE] [SECURITY] Recommendation to update Apache POI in Apache Solr 4.8.0, 4.8.1, and 4.9.0 installations
Hello Apache Solr users, the Apache Lucene PMC wants to make users of Solr aware of the following issue: Apache Solr versions 4.8.0, 4.8.1, and 4.9.0 bundle Apache POI 3.10-beta2 with their binary release tarballs. This version (and all previous ones) of Apache POI is vulnerable to the following issues:

= CVE-2014-3529: XML External Entity (XXE) problem in Apache POI's OpenXML parser =
Type: Information disclosure
Description: Apache POI uses Java's XML components to parse OpenXML files produced by Microsoft Office products (DOCX, XLSX, PPTX,...). Applications that accept such files from end-users are vulnerable to XML External Entity (XXE) attacks, which allow remote attackers to bypass security restrictions and read arbitrary files via a crafted OpenXML document that provides an XML external entity declaration in conjunction with an entity reference.

= CVE-2014-3574: XML Entity Expansion (XEE) problem in Apache POI's OpenXML parser =
Type: Denial of service
Description: Apache POI uses Java's XML components and Apache XMLBeans to parse OpenXML files produced by Microsoft Office products (DOCX, XLSX, PPTX,...). Applications that accept such files from end-users are vulnerable to XML Entity Expansion (XEE) attacks ("XML bombs"), which allow remote attackers to consume large amounts of CPU resources.

The Apache POI PMC released a bugfix version (3.10.1) today. Solr users are affected by these issues if they enable the "Apache Solr Content Extraction Library (Solr Cell)" contrib module from the folder "contrib/extraction" of the release tarball. Users of Apache Solr are strongly advised to keep the module disabled if they don't use it. Alternatively, users of Apache Solr 4.8.0, 4.8.1, or 4.9.0 can update the affected libraries by replacing the vulnerable JAR files in the distribution folder. Users of previous versions have to update their Solr release first; patching older versions is not possible.
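For readers wondering what hardening against this vulnerability class looks like in application code: the sketch below is not POI's internal fix, just the standard JAXP defenses against XXE and entity-expansion attacks, using only the JDK's built-in XML APIs.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

public class HardenedXml {

    // Build a DocumentBuilder that rejects DOCTYPE declarations and
    // external entities, defusing both XXE and "XML bomb" payloads.
    public static DocumentBuilder newHardenedBuilder() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Disallowing DOCTYPE entirely blocks both attack classes at once.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Belt and braces, in case a parser implementation ignores the feature above:
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf.newDocumentBuilder();
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilder b = newHardenedBuilder();
        String evil = "<!DOCTYPE foo [<!ENTITY x SYSTEM \"file:///etc/passwd\">]><foo>&x;</foo>";
        try {
            b.parse(new java.io.ByteArrayInputStream(evil.getBytes("UTF-8")));
            System.out.println("parsed (unexpected)");
        } catch (org.xml.sax.SAXException expected) {
            // The DOCTYPE is rejected before any external entity is resolved.
            System.out.println("DOCTYPE rejected: XXE payload blocked");
        }
    }
}
```

Libraries that parse XML internally (like POI) have to apply the equivalent settings themselves, which is what the 3.10.1 bugfix release does; applications cannot retrofit this from the outside, hence the need to upgrade the JARs.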
To replace the vulnerable JAR files, follow these steps:

- Download the Apache POI 3.10.1 binary release: http://poi.apache.org/download.html#POI-3.10.1
- Unzip the archive.
- Delete the following files in your "solr-4.X.X/contrib/extraction/lib" folder:
  # poi-3.10-beta2.jar
  # poi-ooxml-3.10-beta2.jar
  # poi-ooxml-schemas-3.10-beta2.jar
  # poi-scratchpad-3.10-beta2.jar
  # xmlbeans-2.3.0.jar
- Copy the following files from the base folder of the Apache POI distribution to the "solr-4.X.X/contrib/extraction/lib" folder:
  # poi-3.10.1-20140818.jar
  # poi-ooxml-3.10.1-20140818.jar
  # poi-ooxml-schemas-3.10.1-20140818.jar
  # poi-scratchpad-3.10.1-20140818.jar
- Copy "xmlbeans-2.6.0.jar" from POI's "ooxml-lib/" folder to the "solr-4.X.X/contrib/extraction/lib" folder.
- Verify that the "solr-4.X.X/contrib/extraction/lib" folder no longer contains any files with version number "3.10-beta2".
- Verify that the folder contains exactly one xmlbeans JAR file, with version 2.6.0.

If you just want to disable extraction of Microsoft Office documents, delete the files above and do not replace them. "Solr Cell" will automatically detect this and disable Microsoft Office document extraction. Coming versions of Apache Solr will have the updated libraries bundled.

Happy Searching and Extracting, The Apache Lucene Developers

PS: Thanks to Stefan Kopf, Mike Boufford, and Christian Schneider for reporting these issues!

- Uwe Schindler uschind...@apache.org Apache Lucene PMC Member / Committer Bremen, Germany http://lucene.apache.org/
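The replacement steps can be sketched as a small shell script. The version below runs as a self-contained dry run in a scratch directory with mocked file layouts; for a real installation you would point SOLR_LIB at your actual contrib/extraction/lib folder, point POI_DIST at the unpacked POI 3.10.1 binary release, and delete the mocking section (all paths here are assumptions).

```shell
#!/bin/sh
set -eu
# Dry run in a scratch directory with mocked Solr/POI layouts.
WORK=$(mktemp -d)
cd "$WORK"
SOLR_LIB="solr-4.8.1/contrib/extraction/lib"   # hypothetical install path
POI_DIST="poi-3.10.1"                          # unpacked POI 3.10.1 release

# --- mock layout (dry run only; remove for a real installation) ---
mkdir -p "$SOLR_LIB" "$POI_DIST/ooxml-lib"
for jar in poi-3.10-beta2 poi-ooxml-3.10-beta2 poi-ooxml-schemas-3.10-beta2 \
           poi-scratchpad-3.10-beta2 xmlbeans-2.3.0; do
    touch "$SOLR_LIB/$jar.jar"
done
for jar in poi-3.10.1-20140818 poi-ooxml-3.10.1-20140818 \
           poi-ooxml-schemas-3.10.1-20140818 poi-scratchpad-3.10.1-20140818; do
    touch "$POI_DIST/$jar.jar"
done
touch "$POI_DIST/ooxml-lib/xmlbeans-2.6.0.jar"

# Step 1: delete the vulnerable JARs.
rm "$SOLR_LIB"/poi-3.10-beta2.jar "$SOLR_LIB"/poi-ooxml-3.10-beta2.jar \
   "$SOLR_LIB"/poi-ooxml-schemas-3.10-beta2.jar \
   "$SOLR_LIB"/poi-scratchpad-3.10-beta2.jar "$SOLR_LIB"/xmlbeans-2.3.0.jar

# Step 2: copy the fixed JARs from the POI distribution.
cp "$POI_DIST"/poi-*-20140818.jar "$SOLR_LIB"/
cp "$POI_DIST"/ooxml-lib/xmlbeans-2.6.0.jar "$SOLR_LIB"/

# Step 3: verify no beta2 JARs remain and xmlbeans is at 2.6.0.
if ls "$SOLR_LIB" | grep -q '3\.10-beta2'; then
    echo "ERROR: vulnerable JARs still present" >&2
    exit 1
fi
ls "$SOLR_LIB"/xmlbeans-2.6.0.jar > /dev/null
echo "swap OK"
```

The final verification mirrors the two manual "Verify" steps above; keeping it in the script makes the swap fail loudly if a copy or delete was missed.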
Close of Apache Lucene's Open Relevance sub-project
Hi, the PMC decided in a vote that the Apache Lucene sub-project "Open Relevance" will be discontinued. There was no activity during the last years, and the project made no releases at all. I am sending this as the last message to the existing mailing lists, so people are aware that we will no longer provide infrastructure like mailing lists. Any discussion in the "Open Relevance" context should in the future be directed to: general@lucene.apache.org The existing data collections in SVN will be kept alive, because Subversion never forgets. Please use them if you are willing to do so. Thank you to all committers for their support in this project! I very much like the Wiki page discussing all the problems with the available collections. We should perhaps move these pages to a static web page on the "attic" Lucene project page or copy them into the Lucene Wiki. The CWIKI should then be closed, too. Uwe - Uwe Schindler uschind...@apache.org Apache Lucene PMC Chair / Committer Bremen, Germany http://lucene.apache.org/
[ANNOUNCE] Apache Lucene 4.8.0 released
28 April 2014, Apache Lucene™ 4.8.0 available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.8.0.

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.

This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-latest-redir.html See the CHANGES.txt file included with the release for a full list of details.

Lucene 4.8.0 Release Highlights:

* Apache Lucene now requires Java 7 or greater (recommended is Oracle Java 7 or OpenJDK 7, minimum update 55; earlier versions have known JVM bugs affecting Lucene).
* Apache Lucene is fully compatible with Java 8.
* All index files now store end-to-end checksums, which are now validated during merging and reading. This ensures that corruptions caused by any bit-flipping hardware problems or bugs in the JVM can be detected earlier. For full detection be sure to enable all checksums during merging (it's disabled by default).
* Lucene has a new Rescorer/QueryRescorer API to perform second-pass rescoring or reranking of search results using more expensive scoring functions after first-pass hit collection.
* AnalyzingInfixSuggester now supports near-real-time autosuggest.
* Simplified impact-sorted postings (using SortingMergePolicy and EarlyTerminatingCollector) to use Lucene's Sort class to express the sort order.
* Bulk scoring and normal iterator-based scoring were separated, so some queries can do bulk scoring more effectively.
* Switched to MurmurHash3 to hash terms during indexing.
* IndexWriter now supports updating of binary doc value fields.
* HunspellStemFilter now uses 10 to 100x less RAM. It also loads all known OpenOffice dictionaries without error.
* Lucene now also fsyncs the directory metadata on commits, if the operating system and file system allow it (Linux and MacOSX are known to work).
* Lucene now uses Java 7 file system functions under the hood, so index files can be deleted on Windows, even when readers are still open.
* A serious bug in NativeFSLockFactory was fixed, which could allow multiple IndexWriters to acquire the same lock. The lock file is no longer deleted from the index directory even when the lock is not held.
* Various bugfixes and optimizations since the 4.7.2 release.

Please read CHANGES.txt for a full list of new features. Please report any feedback to the mailing lists (http://lucene.apache.org/core/discussion.html).

Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access.

- Uwe Schindler uschind...@apache.org Apache Lucene PMC Chair / Committer Bremen, Germany http://lucene.apache.org/
[ANNOUNCE] Apache Solr 4.8.0 released
28 April 2014, Apache Solr™ 4.8.0 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.8.0.

Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites.

Solr 4.8.0 is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html See the CHANGES.txt file included with the release for a full list of details.

Solr 4.8.0 Release Highlights:

* Apache Solr now requires Java 7 or greater (recommended is Oracle Java 7 or OpenJDK 7, minimum update 55; earlier versions have known JVM bugs affecting Solr).
* Apache Solr is fully compatible with Java 8.
* The <fields> and <types> tags have been deprecated from schema.xml. There is no longer any reason to keep them in the schema file; they may be safely removed. This allows intermixing of <fieldType>, <field> and <copyField> definitions if desired.
* The new {!complexphrase} query parser supports wildcards, ORs, etc. inside phrase queries.
* The new Collections API CLUSTERSTATUS action reports the status of collections, shards, and replicas, and also lists collection aliases and cluster properties.
* Added managed synonym and stopword filter factories, which enable synonym and stopword lists to be dynamically managed via a REST API.
* JSON updates now support nested child documents, enabling {!child} and {!parent} block join queries.
* Added ExpandComponent to expand results collapsed by the CollapsingQParserPlugin, as well as the parent/child relationship of nested child documents.
* Long-running Collections API tasks can now be executed asynchronously; the new REQUESTSTATUS action provides status.
* Added an hl.qparser parameter to allow you to define a query parser for hl.q highlight queries.
* In Solr single-node mode, cores can now be created using named configsets.
* The new DocExpirationUpdateProcessorFactory supports computing an expiration date for documents from a "TTL" expression, as well as automatically deleting expired documents on a periodic basis.

Solr 4.8.0 also includes many other new features as well as numerous optimizations and bugfixes of the corresponding Apache Lucene release. Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html).

Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access.

- Uwe Schindler uschind...@apache.org Apache Lucene PMC Chair / Committer Bremen, Germany http://lucene.apache.org/
Attention: Lucene 4.8 and Solr 4.8 will require minimum Java 7
Hi, the Apache Lucene/Solr committers decided with a large majority on the vote to require Java 7 for the next minor release of Apache Lucene and Apache Solr (version 4.8)! Support for Java 6 by Oracle ended more than a year ago, and Java 8 is coming out in a few days. The next release will also contain some improvements for Java 7:

- Better file handling (especially on Windows) in the directory implementations. Files can now be deleted on Windows even while the index is still open - as was always possible on Unix environments (delete-on-last-close semantics).
- Speed improvements in sorting comparators: sorting now uses Java 7's own comparators for integer and long sorts, which are highly optimized by the Hotspot VM.

If you want to stay up-to-date with Lucene and Solr, you should upgrade your infrastructure to Java 7. Please be aware that you must use at least Java 7u1. The recommended version at the moment is Java 7u25. Later versions like 7u40, 7u45,... have a bug causing index corruption. Ideally use the Java 7u60 prerelease, which has fixed this bug. Once 7u60 is out, it will be the recommended version. In addition, there is no Oracle/BEA JRockit available for Java 7; use the official Oracle Java 7. JRockit never worked correctly with Lucene/Solr (it caused index corruption), so this should not be an issue for you. Please also review our list of JVM bugs: http://wiki.apache.org/lucene-java/JavaBugs

Apache Lucene and Apache Solr were also heavily tested with all prerelease versions of Java 8, so you can also give it a try! Looking forward to the official Java 8 release next week - I will run my indexes with that version for sure!

Uwe ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de
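For context on the sorting-comparator point above: Java 7 introduced Integer.compare and Long.compare, which avoid the integer-overflow trap of the widely used subtraction idiom. The following is plain JDK behavior, not Lucene code, just an illustration of why the built-in comparators are preferable:

```java
public class CompareDemo {

    // The classic but broken comparator idiom: the subtraction can
    // overflow, and the cast truncates the result to the low 32 bits.
    static int bySubtraction(long a, long b) {
        return (int) (a - b);
    }

    public static void main(String[] args) {
        long a = 0L;
        long b = 1L << 32;  // 4294967296
        // Java 7's overflow-safe comparator: a < b, so the result is negative.
        System.out.println(Long.compare(a, b));   // -1
        // a - b is -4294967296, whose low 32 bits are all zero, so the
        // subtraction idiom wrongly reports the two values as equal.
        System.out.println(bySubtraction(a, b));  // 0
    }
}
```

A comparator that sometimes reports "equal" or the wrong sign silently corrupts sort order, which is why the JDK's exact comparators are the safe choice for long-valued sort keys.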
RE: VOTE: Release apache-solr-ref-guide-4.7.pdf (RC1)
Hi, I did not read the whole guide, but the SHA1 checksum and signature are correct. I checked the documentation index and searched for outdated version numbers; nothing looks seriously broken. +1 to release. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de From: Cassandra Targett [mailto:casstarg...@gmail.com] Sent: Wednesday, February 26, 2014 5:59 PM To: d...@lucene.apache.org Cc: Lucene mailing list Subject: VOTE: Release apache-solr-ref-guide-4.7.pdf (RC1) I generated a new release candidate for the Solr Reference Guide. This fixes the page numbering problem and a few other minor edits folks made yesterday after I generated RC0. https://dist.apache.org/repos/dist/dev/lucene/solr/ref-guide/apache-solr-ref-guide-4.7-RC1/ +1 from me. Cassandra
September 2013 Lucene report to the ASF Board
Hi, I checked in a draft version of the board report for the September ASF board meeting: http://s.apache.org/xz3 Please change & commit if you notice anything missing or wrong. If everybody is happy, I will submit it by Wednesday, 11 September 2013 at the latest. Uwe ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de
RE: Welcome Cassandra Targett as Lucene/Solr committer
Welcome Cassandra! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Robert Muir [mailto:rcm...@gmail.com] > Sent: Thursday, August 01, 2013 12:48 AM > To: d...@lucene.apache.org; Lucene mailing list > Cc: Cassandra Targett > Subject: Welcome Cassandra Targett as Lucene/Solr committer > > I'm pleased to announce that Cassandra Targett has accepted to join our > ranks as a committer. > > Cassandra worked on the donation of the new Solr Reference Guide [1] and > getting things in order for its first official release [2]. > Cassandra, it is tradition that you introduce yourself with a brief bio. > > Welcome! > > P.S. As soon as your SVN access is setup, you should then be able to add > yourself to the committers list on the website as well. > > [1] > https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+ > Guide > [2] https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/
RE: How to start with Lucene
It depends on what you intend to do:

- If you want to write an application that uses Apache Lucene, you can use the binary tgz file. It contains all the JARs needed to build an application. Alternatively, set up a Maven project and add the Lucene dependencies (core, analyzers, queryparser,...).
- If you want to hack on Apache Lucene itself (changing the implementation, submitting patches to the developers), check out SVN and start by running "ant" to build. To set up in an IDE, use "ant eclipse" or "ant idea", which builds project files that you can import into your IDE. You only need to do this if you want to modify Lucene, not to use it.
- If you don't want to code and just want to use a server with Lucene inside (like a database server), you can start with Apache Solr or ElasticSearch.

- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Vinh Đặng [mailto:dqvin...@gmail.com] > Sent: Sunday, July 07, 2013 11:51 AM > To: general@lucene.apache.org > Subject: How to start with Lucene > > Hi everyone, > > I am new to Lucene, with basic Java programming knowledge, but not an expert. > > Now I am very confused, because there are three kinds of Lucene I can download: from SVN, a TAR.GZ file and a .ZIP file, but the problem is they seem different. > > Could you give me instructions to set up Lucene with some kind of IDE, such as Eclipse, to start
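For the Maven route mentioned above, the dependency section of pom.xml would look roughly like this (the module list and the version are illustrative - pick the Lucene release and modules your application actually needs; 4.3.1 was current at the time of this thread):

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>4.3.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>4.3.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>4.3.1</version>
  </dependency>
</dependencies>
```

Maven then pulls the JARs and their sources from Maven Central, so there is no need to manage the tgz contents by hand.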
PMC Chair change
Hi all, most of you will already know it: Since June 19, 2013, I am the new Project Management Committee Chair, replacing Steve Rowe. I am glad to manage all the legal stuff for new committers or contributions from external entities - and also preparing the board reports. All this, of course and as always, with included but not deliberate @UweSays quotations. Many thanks to all PMC members who have voted for me! Many thanks to Steve for the help and hints to all the things I have to know in my new role! Uwe - Uwe Schindler uschind...@apache.org Apache Lucene PMC Chair / Committer Bremen, Germany http://lucene.apache.org/
RE: problem with solr 4.3.1 installation
Hi, it depends on your configuration! The Jetty installation shipped with Solr is optimized for the typical usage pattern of Solr. Webapp containers like JBoss often have additional monitoring modules that may have an impact on performance (I know that JBoss often has crazy plugins in the JVM for that, which have a large impact on garbage collection). Sometimes the web app container ships with malfunctioning Java versions, so take care. Also, some containers use incorrect charsets, so the UTF-8 decoding of %-encoded query parameters is broken. We have a workaround for that in later Solr versions (4.1+), but in general all of this is not tested with foreign servlet containers, so we cannot give any support. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de From: Kuldeep Sharma [mailto:kuldeep.sha...@hcl.com] Sent: Wednesday, June 19, 2013 6:20 PM To: general@lucene.apache.org Cc: Uwe Schindler Subject: RE: problem with solr 4.3.1 installation Hi Uwe, Is there any performance degradation or other limitation if we use JBoss instead of Jetty for deploying Solr? Currently, we are using JBoss AS in production and haven't seen any issues in the last 2-3 months. Thanks! Kuldeep -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: Wednesday, June 19, 2013 10:49 AM To: general@lucene.apache.org Subject: RE: problem with solr 4.3.1 installation Hi, > Thanks Uwe for your quick reply. > > I got the problem of setting the classpath now. > > But I have a few questions based on your reply, maybe not related to the topic of the thread. > > Question 1, related to the point "*In general it is not recommended to install Solr inside a custom webapp container*": where will Solr run in production environments? I was thinking that it has to run on some web container, and that Jetty is only for playing around/testing. > Please correct me if I am wrong.
Treat Solr like a database server (MySQL or Postgres). Do you install MySQL inside a servlet container? No - it runs as a separate service in a separate process! Jetty has nothing to do with "playing around". Jetty is just the web connector of Solr and is the officially supported HTTP interface. Future versions of Solr may replace Jetty by e.g. the Netty library for high-performance select-based I/O. Your custom application talks to Solr using the HTTP protocol, but that does not mean that Solr must run in your webapp container. Solr ideally runs (like a MySQL database server) as a separate JVM instance. That's the recommended installation. > Question 2, related to the point "*Future versions of Solr may no longer ship with a WAR file because it causes too many problems, because Solr does not work well with other webapps in the same JVM*": > > How can we use Solr with any app, say my own web app (typically a Spring MVC or EE app), if it is not shipped as a WAR? Can't we use Solr for solving the problem of complex/advanced search implementation (time-consuming search queries from an RDBMS) that normally exists in any web app? Say any inventory management or warehouse management apps. See above. > My requirement is to input data to a reporting engine like Crystal/Jasper and to generate analytic charts for our dashboards. > > After seeing your reply, I started thinking that Solr is not the one for my requirement. Please clarify. > > Thanks a ton. > > Pradeep > > On Wed, Jun 19, 2013 at 8:22 PM, Uwe Schindler <u...@thetaphi.de> wrote: > > Hi, > > See http://wiki.apache.org/solr/SolrJBoss#Configuring_Solr_Home for instructions (scroll down and look for the JNDI options). The problem is that SOLR_HOME must be known to JBoss, otherwise the webapp cannot locate any files from the config.
> > > > In general it is not recommended to install Solr inside a custom > > webapp container (means installing the WAR file in tomcat, jboss or > > whatever). You should use the included web engine (provided by jetty) > > with a recent JDK version. The example folder has a start.jar. You can > > start the correctly configured Jetty engine with solr by running "java > > -jar start.jar" from the example folder. > > > > Future versions of Solr may no longer ship with a WAR file because it > > causes too many problems, because Solr does not work well with other > > webapps in the same JVM (it has very spe
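For reference, the JNDI approach discussed in this thread boils down to making the solr/home environment entry visible to the container so the webapp can locate its configuration. A typical snippet along the lines of the SolrJBoss wiki page looks like the following (the path is an example - point it at your actual Solr home directory):

```xml
<!-- In the Solr webapp's web.xml, or a container-specific descriptor override -->
<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>/opt/solr-home</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>
```

Alternatively, the same setting can be passed as a JVM system property (-Dsolr.solr.home=/opt/solr-home) in the container's startup options, which avoids editing the WAR.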
RE: problem with solr 4.3.1 installation
Hi, > Thanks Uwe for your quick reply. > > I got the problem of setting the classpath now. > > But I have a few questions based on your reply, maybe not related to the topic of the thread. > > Question 1, related to the point "*In general it is not recommended to install Solr inside a custom webapp container*": where will Solr run in production environments? I was thinking that it has to run on some web container, and that Jetty is only for playing around/testing. Please correct me if I am wrong. Treat Solr like a database server (MySQL or Postgres). Do you install MySQL inside a servlet container? No - it runs as a separate service in a separate process! Jetty has nothing to do with "playing around". Jetty is just the web connector of Solr and is the officially supported HTTP interface. Future versions of Solr may replace Jetty by e.g. the Netty library for high-performance select-based I/O. Your custom application talks to Solr using the HTTP protocol, but that does not mean that Solr must run in your webapp container. Solr ideally runs (like a MySQL database server) as a separate JVM instance. That's the recommended installation. > Question 2, related to the point "*Future versions of Solr may no longer ship with a WAR file because it causes too many problems, because Solr does not work well with other webapps in the same JVM*": > > How can we use Solr with any app, say my own web app (typically a Spring MVC or EE app), if it is not shipped as a WAR? Can't we use Solr for solving the problem of complex/advanced search implementation (time-consuming search queries from an RDBMS) that normally exists in any web app? Say any inventory management or warehouse management apps. See above. > My requirement is to input data to a reporting engine like Crystal/Jasper and to generate analytic charts for our dashboards. > > After seeing your reply, I started thinking that Solr is not the one for my requirement. Please clarify. > > Thanks a ton.
> > Pradeep > * > * > > > On Wed, Jun 19, 2013 at 8:22 PM, Uwe Schindler wrote: > > > Hi, > > > > See http://wiki.apache.org/solr/SolrJBoss#Configuring_Solr_Home for > > instructions (scroll down and look for the JNDI options). The problem > > is that SOLR_HOME must be known to JBoss, otherwise the webapp cannot > > locate any files from the config. > > > > In general it is not recommended to install Solr inside a custom > > webapp container (means installing the WAR file in tomcat, jboss or > > whatever). You should use the included web engine (provided by jetty) > > with a recent JDK version. The example folder has a start.jar. You can > > start the correctly configured Jetty engine with solr by running "java > > -jar start.jar" from the example folder. > > > > Future versions of Solr may no longer ship with a WAR file because it > > causes too many problems, because Solr does not work well with other > > webapps in the same JVM (it has very special garbage collection and > > memory requirements), so it should run as a separate server in a separate > VM. > > > > Uwe > > > > - > > Uwe Schindler > > H.-H.-Meier-Allee 63, D-28213 Bremen > > http://www.thetaphi.de > > eMail: u...@thetaphi.de > > > > > > > -Original Message- > > > From: pradeep kumar [mailto:pradeepkuma...@gmail.com] > > > Sent: Wednesday, June 19, 2013 3:56 PM > > > To: general@lucene.apache.org > > > Subject: problem with solr 4.3.1 installation > > > > > > hello all, > > > > > > I have a problem with installing solr 4.31. > > > > > > Giving you a background of what i am doing: > > > > > > I am trying to evaluate Solr as our search in engine for my project > > where we > > > have requirement multiple complex search functionality, reporting > > > and analytics. We deal with lakhs of records from RDBMS tables which > > > are > > linked. > > > Just to give you an idea, Order, item, item_details, files, etc. 
i > > proposed solr > > > and told about lucidworks to rest of my technical team and were > > impressed. > > > > > > My reasons for using solr is to achive fastrer search, input data to > > report > > > engine and analytic graphs for our dashboard. > > > > > > Other alternative to my solr approach is off-line db with star > > > schema and from that, get data fro reports or analytics. > > > > > > But, I am facing few problems in installing. I so
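Uwe's point above — the application treats a stand-alone Solr like a database server and talks to it over plain HTTP — can be sketched with nothing but the JDK. The host, port, and collection name are the stock example values and are assumptions about the local setup:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: a client webapp (Spring MVC, EE, anything) integrates with a
// separately running Solr instance purely over HTTP. Host, port, and
// collection below are the stock example defaults, not requirements.
public class SolrQueryUrl {

    // Build a /select URL for the given collection and query string.
    static String selectUrl(String baseUrl, String collection, String query) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/" + collection + "/select?q=" + q + "&wt=json";
    }

    public static void main(String[] args) {
        // An HTTP GET against this URL (e.g. via java.net.http.HttpClient)
        // is all the integration the webapp needs - no WAR deployed.
        System.out.println(selectUrl("http://localhost:8983/solr", "collection1", "*:*"));
        // -> http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json
    }
}
```

The Solr side is simply the stock `java -jar start.jar` process from the example folder, running in its own JVM.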
RE: problem with solr 4.3.1 installation
Hi, See http://wiki.apache.org/solr/SolrJBoss#Configuring_Solr_Home for instructions (scroll down and look for the JNDI options). The problem is that SOLR_HOME must be known to JBoss, otherwise the webapp cannot locate any files from the config. In general it is not recommended to install Solr inside a custom webapp container (means installing the WAR file in tomcat, jboss or whatever). You should use the included web engine (provided by jetty) with a recent JDK version. The example folder has a start.jar. You can start the correctly configured Jetty engine with solr by running "java -jar start.jar" from the example folder. Future versions of Solr may no longer ship with a WAR file because it causes too many problems, because Solr does not work well with other webapps in the same JVM (it has very special garbage collection and memory requirements), so it should run as a separate server in a separate VM. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: pradeep kumar [mailto:pradeepkuma...@gmail.com] > Sent: Wednesday, June 19, 2013 3:56 PM > To: general@lucene.apache.org > Subject: problem with solr 4.3.1 installation > > hello all, > > I have a problem with installing solr 4.31. > > Giving you a background of what i am doing: > > I am trying to evaluate Solr as our search in engine for my project where we > have requirement multiple complex search functionality, reporting and > analytics. We deal with lakhs of records from RDBMS tables which are linked. > Just to give you an idea, Order, item, item_details, files, etc. i proposed > solr > and told about lucidworks to rest of my technical team and were impressed. > > My reasons for using solr is to achive fastrer search, input data to report > engine and analytic graphs for our dashboard. > > Other alternative to my solr approach is off-line db with star schema and > from that, get data fro reports or analytics. 
> > But, I am facing few problems in installing. I some how feel that installation > guide is not clear. > > I am missing something? > > > > Downloaded solr 4.3.1 binaries. Extracted to my local drive. Set SOLR_HOME > class path in evn variables pointing C:\solr-4.3.1\example\solr and > SOLR_HOME/bin in path variable. > > Copied solr-4.3.1.war from SOLR_HOME/dist/ to my local jboss instance. > > Started my jboss server. > > Here is the log > > 16:05:13,715 ERROR [org.apache.solr.core.CoreContainer] > (coreLoadExecutor-3-thread-1) Unable to create core: collection1: > org.apache.solr.common.SolrException: Could not load config for > solrconfig.xml > > at > org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:91 > 9) > [solr-core-4.3.1.jar:4.3.1 1491148 - shalinmangar - 2013-06-09 12:15:33] > > at > org.apache.solr.core.CoreContainer.create(CoreContainer.java:984) > [solr-core-4.3.1.jar:4.3.1 1491148 - shalinmangar - 2013-06-09 12:15:33] > > at > org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597) > [solr-core-4.3.1.jar:4.3.1 1491148 - shalinmangar - 2013-06-09 12:15:33] > > at > org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592) > [solr-core-4.3.1.jar:4.3.1 1491148 - shalinmangar - 2013-06-09 12:15:33] > > at > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > [rt.jar:1.6.0_29] > > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > [rt.jar:1.6.0_29] > > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > [rt.jar:1.6.0_29] > > at > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > [rt.jar:1.6.0_29] > > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > [rt.jar:1.6.0_29] > > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecut > or.java:886) > [rt.jar:1.6.0_29] > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.ja > va:908) > [rt.jar:1.6.0_29] > > at 
java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_29] > > Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in > classpath or 'solr\collection1\conf/', cwd=C:\jboss-as-7.1.1.Final\bin > > at > org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoade > r.java:337) > [solr-core-4.3.1.jar:4.3.1 1491148 - shalinmangar - 2013-06-09 12:15:33] > > at > org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.ja > va:302) > [solr-core-4.3.1.jar:4.3.1 1491148 - shalinmangar - 2013-
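For completeness, the JNDI option from the SolrJBoss wiki page Uwe links to boils down to an env-entry in the deployed WAR's WEB-INF/web.xml. The path below mirrors the one from the report and is an assumption about the local layout:

```xml
<!-- In solr.war's WEB-INF/web.xml: tell the webapp where solr/home is,
     so solrconfig.xml can be found regardless of JBoss's working dir. -->
<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>C:/solr-4.3.1/example/solr</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>
```

Alternatively the same location can be passed as the `solr.solr.home` system property when starting the container.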
RE: Securing stored data using Lucene
Hi, > My name is Rafaela and I am just starting to work with Lucene for a project > that involves quite a few security aspects. > > I am working on an app that aims to manage data by using Lucene on a > mobile device. However, my application will require data to be confidential > (users will need to be logged in and have certain permissions regarding the > data). > I am currently trying to find a way to make this possible and still keep using > Lucene without having a very high performance drop. > > I was searching around and I found the patch from > https://issues.apache.org/jira/browse/LUCENE-2228. Since it seems to be > quite a bit old and the issue is not marked as resolved, I wanted to ask about > the status of this. Is this something that could work for securing the > information? Or is there another better solution already implemented? You can still use the Directory implementation posted in this issue with minor modifications. Lucene always had, and still has, the abstract Directory interface, and you can use it to implement block-based encryption below Lucene's storage layer. In any case, you still have to cope with the performance degradation introduced by this additional layer. Another idea is to make the encryption completely invisible to Lucene by using a Linux loop device that encrypts everything written to / read from it. Uwe
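The block-based encryption Uwe describes — a layer below Lucene's storage API that encrypts every block on the way to disk and decrypts it on the way back — can be illustrated with the JDK's own crypto classes. This is a sketch of the idea only, not the LUCENE-2228 Directory; the key handling is deliberately naive:

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Sketch of per-block encryption below a storage layer. A real custom
// Directory would apply this inside createOutput/openInput; every read
// then pays the decryption cost Uwe mentions.
public class BlockCrypto {

    private static byte[] crypt(int mode, byte[] key, byte[] iv, byte[] data) {
        try {
            Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
            c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return c.doFinal(data);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static byte[] encryptBlock(byte[] key, byte[] iv, byte[] plain) {
        return crypt(Cipher.ENCRYPT_MODE, key, iv, plain);
    }

    static byte[] decryptBlock(byte[] key, byte[] iv, byte[] enc) {
        return crypt(Cipher.DECRYPT_MODE, key, iv, enc);
    }

    public static void main(String[] args) {
        byte[] key = "0123456789abcdef".getBytes(); // 128-bit demo key only
        byte[] iv  = new byte[16];                  // fixed IV: demo only
        byte[] block = "some index bytes".getBytes();
        byte[] enc = encryptBlock(key, iv, block);
        System.out.println(java.util.Arrays.equals(
                block, decryptBlock(key, iv, enc))); // prints true
    }
}
```

The loop-device alternative achieves the same effect one layer lower, with no Lucene-side code at all.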
RE: XSS Issue
Hi, I already said that you should report your issue to priv...@lucene.apache.org. The thing I wanted to say is: Everything in Solr is insecure by default; an additional XSS or whatever XFOOBAR does not matter at all because Solr should only run on a completely secured private network. So any issue like this has no great impact at all. The main issue of triggering stateful GET requests can only be fixed by redesigning Solr's public and documented APIs. This is impossible for bug fix releases, and major releases also need to keep backwards compatibility, so fixing all issues that involve triggering stateful GET requests to the public API (through whatever mechanism) is far out of scope. > XSS is a large more problem than CSRF because you can execute JavaScript > code on the user browser that can lead to a compromission. In your original report you mentioned XSS and, in the same email, the IMG-based links a user may get in his email. I was solely referring to the latter ones - which are unfixable without changing the REST API. You were also referring to: > Yes he can do that but as I said the same problem can occur without his > consent (and without a click) > if he's on an arbitrary website which hosts a HTML IMG pointing to the > vulnerable page of the solr > administrator interface (like src="http://X.X.X.X/solr/admin/xss_vulnerable_page/> ) This is again not related to XSS at all! I was telling you to report the XSS to the above mail address; you did not do that until now, so I assume you were only talking about similar things like the funny web page I was referring to. Finally: The XSS issues are low priority, because the admin web interface of Solr should never ever be in a network where you have access from browsers that have access to the internet. This is why I referred to the SolrSecurity Wiki page. 
Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: gregory draperi [mailto:gregory.drap...@gmail.com] > Sent: Tuesday, June 18, 2013 11:04 PM > To: general > Subject: Re: XSS Issue > > I speak about XSS not CSRF. > > The way to fix XSS is to encode tainted data like user's inputs. > > For the CSRF problem there are techniques to prevent them in REST API (cf > OWASP or NSA document) but I understand that it may not be done due to > impacts http://fr.slideshare.net/johnwilander/advanced-csrf-and-stateless- > anticsrf > http://www.nsa.gov/ia/_files/support/guidelines_implementation_rest.pdf > > XSS is a large more problem than CSRF because you can execute JavaScript > code on the user browser that can lead to a compromission. > > > > > 2013/6/18 Uwe Schindler > > > The issue from the webpage I posted cannot be fixed because it would > > break all clients out there, because the REST API is the official API > > to Solr implemented by millions of clients... This is what I mean > > with: Reinvent Solr to fix this. > > The issue here is that it allows GET requests to modify the index. But > > as said before, it is unfixable unless you want to break all client > > software outside. > > > > If you want to prevent this, use e.g. ElasticSearch, which has a > > better, standards conform-designed REST API (which does not allow GET > > requests to modify anything). > > > > - > > Uwe Schindler > > H.-H.-Meier-Allee 63, D-28213 Bremen > > http://www.thetaphi.de > > eMail: u...@thetaphi.de > > > > > > > -Original Message- > > > From: gregory draperi [mailto:gregory.drap...@gmail.com] > > > Sent: Tuesday, June 18, 2013 6:43 PM > > > To: general > > > Subject: Re: XSS Issue > > > > > > Yes, it works because it exploits a CSRF issue and in my opinion it > > should also > > > be fixed like XSS vulnerabilities in the application. > > > > > > I think we don't understand each other. 
> > > > > > I'm going to send details to the private mailing list and I won't > > > waste > > your > > > time more. > > > > > > Regards, > > > > > > > > > 2013/6/18 Uwe Schindler > > > > > > > Have fun with this web page: > > > > > > > > http://www.thetaphi.de/nukeyoursolrindex.html > > > > > > > > It really works, if you have a default Solr instance running on > > > > your local machine on default port with default collection, and > > > > you open this web page > > > > -> this nukes your index. This has nothing to do with the Admin > > interface. > > > > > > > > Uwe
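Grégory's point that XSS is fixed by encoding tainted data is separate from the network-level argument. A minimal output-encoding helper (plain JDK; the class and method names here are illustrative, not from Solr) looks like:

```java
// Minimal HTML output-encoding helper: any user-controlled string must
// be escaped like this before being echoed into an HTML page, which is
// the standard fix for reflected XSS.
public class HtmlEscape {

    static String escape(String tainted) {
        StringBuilder sb = new StringBuilder(tainted.length());
        for (int i = 0; i < tainted.length(); i++) {
            char c = tainted.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<script>alert(1)</script>"));
        // -> &lt;script&gt;alert(1)&lt;/script&gt;
    }
}
```

Note that this addresses only XSS; it does nothing against the CSRF-style stateful GET requests discussed above.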
RE: XSS Issue
The issue from the webpage I posted cannot be fixed because it would break all clients out there, because the REST API is the official API to Solr implemented by millions of clients... This is what I mean by: reinvent Solr to fix this. The issue here is that it allows GET requests to modify the index. But as said before, it is unfixable unless you want to break all client software out there. If you want to prevent this, use e.g. ElasticSearch, which has a better, standards-conformant REST API (which does not allow GET requests to modify anything). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: gregory draperi [mailto:gregory.drap...@gmail.com] > Sent: Tuesday, June 18, 2013 6:43 PM > To: general > Subject: Re: XSS Issue > > Yes, it works because it exploits a CSRF issue and in my opinion it should > also > be fixed like XSS vulnerabilities in the application. > > I think we don't understand each other. > > I'm going to send details to the private mailing list and I won't waste your > time more. > > Regards, > > > 2013/6/18 Uwe Schindler > > > Have fun with this web page: > > > > http://www.thetaphi.de/nukeyoursolrindex.html > > > > It really works, if you have a default Solr instance running on your > > local machine on default port with default collection, and you open > > this web page > > -> this nukes your index. This has nothing to do with the Admin interface. > > > > Uwe
> > > > > > I'm going to send to priv...@lucene.apache.org what I have found. > > > > > > Best regards, > > > > > > Grégory > > > > > > 2013/6/18 Uwe Schindler > > > > > > > Just to show this without the admin interface: Add these two > > > > images to any web page like this: > > > > > > > > http://localhost:8983/solr/collection1/update?stream.body=%3Cdelete% > > > 3E %3Cquery%3E*:*%3C/query%3E%3C/delete%3E" > > > > /> > > > > http://localhost:8983/solr/collection1/update?stream.body=%3Ccommit/ > > > %3 > > > E" > > > > /> > > > > > > > > Anybody who visits this web page would nuke the index of his > > > > running solr server on the local machine - there is not even the > > > > admin web interface involved. Any REST API on earth has this > > > > problem, it is not specific to Solr! > > > > > > > > Uwe > > > > > > > > - > > > > Uwe Schindler > > > > H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de > > > > eMail: u...@thetaphi.de > > > > > > > > > > > > > -Original Message- > > > > > From: Uwe Schindler [mailto:u...@thetaphi.de] > > > > > Sent: Tuesday, June 18, 2013 6:01 PM > > > > > To: general@lucene.apache.org > > > > > Cc: 'gregory draperi' > > > > > Subject: RE: XSS Issue > > > > > > > > > > Hi, > > > > > > > > > > you can of course send your investigation to > > > > > priv...@lucene.apache.org, > > > > we > > > > > greatly appreciate this. > > > > > An XSS problem in the Solr Admin interface can for sure be solved > > > > somehow, > > > > > but would not help to make Solr secure. Without the admin interface > > > > > you > > > > can > > > > > still add some image into any web page that executes a "delete whole > > > > index > > > > > request" on the Solr server. > > > > > > > > > > If you want to prevent this, you can add HTTP basic authentication > > > > > to > > > > your > > > > > web container, as described in the solr wiki. > > > > > > > > > > In general:
RE: XSS Issue
Have fun with this web page: http://www.thetaphi.de/nukeyoursolrindex.html It really works, if you have a default Solr instance running on your local machine on default port with default collection, and you open this web page -> this nukes your index. This has nothing to do with the Admin interface. Uwe ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: gregory draperi [mailto:gregory.drap...@gmail.com] > Sent: Tuesday, June 18, 2013 6:27 PM > To: general > Subject: Re: XSS Issue > > This is a Cross-Site Request Forgery issue (not a XSS) and should be fixed by > example by adding an impredictible parameter to the request. > > I'm going to send to priv...@lucene.apache.org what I have found. > > Best regards, > > Grégory > > 2013/6/18 Uwe Schindler > > > Just to show this without the admin interface: Add these two images to > > any web page like this: > > > > http://localhost:8983/solr/collection1/update?stream.body=%3Cdelete%3E > %3Cquery%3E*:*%3C/query%3E%3C/delete%3E" > > /> > > http://localhost:8983/solr/collection1/update?stream.body=%3Ccommit/%3 > E" > > /> > > > > Anybody who visits this web page would nuke the index of his running > > solr server on the local machine - there is not even the admin web > > interface involved. Any REST API on earth has this problem, it is not > > specific to Solr! > > > > Uwe > > > > - > > Uwe Schindler > > H.-H.-Meier-Allee 63, D-28213 Bremen > > http://www.thetaphi.de > > eMail: u...@thetaphi.de > > > > > > > -Original Message- > > > From: Uwe Schindler [mailto:u...@thetaphi.de] > > > Sent: Tuesday, June 18, 2013 6:01 PM > > > To: general@lucene.apache.org > > > Cc: 'gregory draperi' > > > Subject: RE: XSS Issue > > > > > > Hi, > > > > > > you can of course send your investigation to > > > priv...@lucene.apache.org, > > we > > > greatly appreciate this. 
> > > An XSS problem in the Solr Admin interface can for sure be solved > > somehow, > > > but would not help to make Solr secure. Without the admin interface > > > you > > can > > > still add some image into any web page that executes a "delete whole > > index > > > request" on the Solr server. > > > > > > If you want to prevent this, you can add HTTP basic authentication > > > to > > your > > > web container, as described in the solr wiki. > > > > > > In general: If you have e.g. an EC2 coud of solr servers, add an > > > extra > > security > > > group to your cloud and limit all access from outside. Then also no > > admin can > > > access this. > > > > > > - > > > Uwe Schindler > > > H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de > > > eMail: u...@thetaphi.de > > > > > > > > > > -Original Message- > > > > From: gregory draperi [mailto:gregory.drap...@gmail.com] > > > > Sent: Tuesday, June 18, 2013 5:46 PM > > > > To: Uwe Schindler > > > > Cc: general > > > > Subject: Re: XSS Issue > > > > > > > > Yes he can do that but as I said the same problem can occur without > > > > his consent (and without a click) if he's on an arbitrary website > > > > which hosts a HTML IMG pointing to the vulnerable page of the solr > > > > administrator interface (like > > > src="http://X.X.X.X/solr/admin/xss_vulnerable_page/> ) > > > > > > > > I'm thankful for your quick responses despite I don't understand this > > > > philosophy. I note the point. > > > > > > > > Regards, > > > > > > > > Grégory DRAPERI > > > > > > > > > > > > 2013/6/18 Uwe Schindler > > > > > > > > > He can also delete his whole index with a single click on a http > > > > > link referring to his Solr server. This is his problem. Never click > > > > > on links from eMail. > > > > > Solr is, as said already, not secured at all. If you want a "secure" > > > > > Solr server, rewrite the whole thing. 
The same applies to other > > > > > Lucene based products like ElasticSearch that have no "security" > > included. > > > > > > > > > > --
RE: XSS Issue
Just to show this without the admin interface: Add these two images to any web page like this: http://localhost:8983/solr/collection1/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E"; /> http://localhost:8983/solr/collection1/update?stream.body=%3Ccommit/%3E"; /> Anybody who visits this web page would nuke the index of his running solr server on the local machine - there is not even the admin web interface involved. Any REST API on earth has this problem, it is not specific to Solr! Uwe ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Uwe Schindler [mailto:u...@thetaphi.de] > Sent: Tuesday, June 18, 2013 6:01 PM > To: general@lucene.apache.org > Cc: 'gregory draperi' > Subject: RE: XSS Issue > > Hi, > > you can of course send your investigation to priv...@lucene.apache.org, we > greatly appreciate this. > An XSS problem in the Solr Admin interface can for sure be solved somehow, > but would not help to make Solr secure. Without the admin interface you can > still add some image into any web page that executes a "delete whole index > request" on the Solr server. > > If you want to prevent this, you can add HTTP basic authentication to your > web container, as described in the solr wiki. > > In general: If you have e.g. an EC2 coud of solr servers, add an extra > security > group to your cloud and limit all access from outside. Then also no admin can > access this. 
> > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > > -----Original Message- > > From: gregory draperi [mailto:gregory.drap...@gmail.com] > > Sent: Tuesday, June 18, 2013 5:46 PM > > To: Uwe Schindler > > Cc: general > > Subject: Re: XSS Issue > > > > Yes he can do that but as I said the same problem can occur without > > his consent (and without a click) if he's on an arbitrary website > > which hosts a HTML IMG pointing to the vulnerable page of the solr > > administrator interface (like > src="http://X.X.X.X/solr/admin/xss_vulnerable_page/> ) > > > > I'm thankful for your quick responses despite I don't understand this > > philosophy. I note the point. > > > > Regards, > > > > Grégory DRAPERI > > > > > > 2013/6/18 Uwe Schindler > > > > > He can also delete his whole index with a single click on a http > > > link referring to his Solr server. This is his problem. Never click > > > on links from eMail. > > > Solr is, as said already, not secured at all. If you want a "secure" > > > Solr server, rewrite the whole thing. The same applies to other > > > Lucene based products like ElasticSearch that have no "security" included. > > > > > > - > > > Uwe Schindler > > > H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de > > > eMail: u...@thetaphi.de > > > > > > > > > > -Original Message- > > > > From: gregory draperi [mailto:gregory.drap...@gmail.com] > > > > Sent: Tuesday, June 18, 2013 5:26 PM > > > > To: Uwe Schindler > > > > Cc: general > > > > Subject: Re: XSS Issue > > > > > > > > Hi Uwe, > > > > > > > > Thank you for your quick response. > > > > > > > > I'm a little bit surprised because XSS is not a problem of making > > > > solr > > > accessible > > > > or not to Internet because this a reflected XSS. 
If an administrator > > > receives a > > > > mail with a malicious link pointing to the solr administrator > > > > interface > > > and > > > > containing a malicious payload he will execute the JavaScript if he > > > clicks on it. > > > > > > > > There also others techniques that can be used to make an solr > > > administrator > > > > executing this link without his consent (HTML IMG TAG pointing to > > > > the > > > solr > > > > administration interface and hosted on a malicious website) and > > > > that > > > will > > > > bypass network based protection. > > > > > > > > Regards, > > > > > > > > Grégory DRAPERI > > > > > > > > > > > > 2013/6/18 Uwe Schindler > > > > > > > > > Hi Grégory, > > > > >
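The two IMG URLs quoted in this thread are nothing more than percent-encoded update commands sent as GET requests, which is exactly why any page a browser renders can trigger them. Building one takes a single JDK call (URLEncoder additionally escapes ':' and '/', which the hand-written URLs leave literal; both forms are equivalent on the wire):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Demonstrates that the "nuke the index" IMG URLs are plain
// percent-encoded update commands carried in a GET request - the
// CSRF problem discussed in this thread. Host/collection are the
// stock example defaults.
public class UpdateUrl {

    static String updateUrl(String host, String command) {
        return host + "/solr/collection1/update?stream.body="
                + URLEncoder.encode(command, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(updateUrl("http://localhost:8983",
                "<delete><query>*:*</query></delete>"));
        // -> .../update?stream.body=%3Cdelete%3E%3Cquery%3E*%3A*%3C%2Fquery%3E%3C%2Fdelete%3E
        System.out.println(updateUrl("http://localhost:8983", "<commit/>"));
        // -> .../update?stream.body=%3Ccommit%2F%3E
    }
}
```

Because the command rides in an ordinary GET URL, an `<img src="...">` tag on any web page is enough to fire it against a Solr instance reachable from the victim's browser.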
RE: XSS Issue
Hi, you can of course send your investigation to priv...@lucene.apache.org, we greatly appreciate this. An XSS problem in the Solr Admin interface can for sure be solved somehow, but would not help to make Solr secure. Without the admin interface you can still add some image into any web page that executes a "delete whole index request" on the Solr server. If you want to prevent this, you can add HTTP basic authentication to your web container, as described in the Solr wiki. In general: If you have e.g. an EC2 cloud of Solr servers, add an extra security group to your cloud and limit all access from outside. Then also no admin can access this. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: gregory draperi [mailto:gregory.drap...@gmail.com] > Sent: Tuesday, June 18, 2013 5:46 PM > To: Uwe Schindler > Cc: general > Subject: Re: XSS Issue > > Yes he can do that but as I said the same problem can occur without his > consent (and without a click) if he's on an arbitrary website which hosts a > HTML IMG pointing to the vulnerable page of the solr administrator interface > (like http://X.X.X.X/solr/admin/xss_vulnerable_page/> ) > > I'm thankful for your quick responses despite I don't understand this > philosophy. I note the point. > > Regards, > > Grégory DRAPERI > > > 2013/6/18 Uwe Schindler > > > He can also delete his whole index with a single click on a http link > > referring to his Solr server. This is his problem. Never click on > > links from eMail. > > Solr is, as said already, not secured at all. If you want a "secure" > > Solr server, rewrite the whole thing. The same applies to other Lucene > > based products like ElasticSearch that have no "security" included. 
> > > > - > > Uwe Schindler > > H.-H.-Meier-Allee 63, D-28213 Bremen > > http://www.thetaphi.de > > eMail: u...@thetaphi.de > > > > > > > -Original Message- > > > From: gregory draperi [mailto:gregory.drap...@gmail.com] > > > Sent: Tuesday, June 18, 2013 5:26 PM > > > To: Uwe Schindler > > > Cc: general > > > Subject: Re: XSS Issue > > > > > > Hi Uwe, > > > > > > Thank you for your quick response. > > > > > > I'm a little bit surprised because XSS is not a problem of making > > > solr > > accessible > > > or not to Internet because this a reflected XSS. If an administrator > > receives a > > > mail with a malicious link pointing to the solr administrator > > > interface > > and > > > containing a malicious payload he will execute the JavaScript if he > > clicks on it. > > > > > > There also others techniques that can be used to make an solr > > administrator > > > executing this link without his consent (HTML IMG TAG pointing to > > > the > > solr > > > administration interface and hosted on a malicious website) and > > > that > > will > > > bypass network based protection. > > > > > > Regards, > > > > > > Grégory DRAPERI > > > > > > > > > 2013/6/18 Uwe Schindler > > > > > > > Hi Grégory, > > > > > > > > Solr should be always only listen on private networks, never make > > > > it accessible to the internet. This is officially documented; for > > > > more Information about this, see: > > > > http://wiki.apache.org/solr/SolrSecurity > > > > Solr uses HTTP as its programming API and you can do everything > > > > Java allows via HTTP, but HTTP does not mean it must be open to > > > > the internet. By opening a Solr server to the internet you are > > > > somehow wrapping everything Java allows to the internet, so it is > > > > not recommeneded. Solr also has no security features at all; > > > > managing this is all up to the front-end, sitting on internet or > > > > insecure > networks. 
> > > > > > > > There are already some issues open to limit some XSS and similar > > access: > > > > https://issues.apache.org/jira/browse/SOLR-4882 > > > > > > > > Uwe > > > > > > > > - > > > > Uwe Schindler > > > > H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de > > > > eMail: u...@thetaphi.de > > > > > > > > > > > > > -Original Message- > > > > > From: gregory draperi [mailto:gregory.drap...@gmail.com] > > > > > Sent: Tuesday, June 18, 2013 3:13 PM > > > > > To: general@lucene.apache.org > > > > > Subject: XSS Issue > > > > > > > > > > Dear Solr project members, > > > > > > > > > > I think I have found a XSS (Cross-Site Scripting) issue in the 3.6.2 > > > > version of > > > > > Solr. > > > > > > > > > > How can I give you more details? > > > > > > > > > > Regards, > > > > > > > > > > -- > > > > > Grégory Draperi > > > > > > > > > > > > > > > > > -- > > > Grégory Draperi > > > > > > > -- > Grégory Draperi
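The HTTP basic authentication Uwe mentions is standard servlet-container configuration, not a Solr feature. A sketch for the container's web.xml (the role and realm names are placeholders; users and roles still have to be defined in the container itself):

```xml
<!-- Protect every Solr URL with HTTP Basic auth (standard Servlet spec). -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr</web-resource-name>
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-admin</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr</realm-name>
</login-config>
<security-role>
  <role-name>solr-admin</role-name>
</security-role>
```

Basic auth sends credentials essentially in cleartext, so this only makes sense on a private network or behind TLS — it complements, not replaces, the network-level isolation described above.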
RE: XSS Issue
He can also delete his whole index with a single click on a http link referring to his Solr server. This is his problem. Never click on links from eMail. Solr is, as said already, not secured at all. If you want a "secure" Solr server, rewrite the whole thing. The same applies to other Lucene based products like ElasticSearch that have no "security" included. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: gregory draperi [mailto:gregory.drap...@gmail.com] > Sent: Tuesday, June 18, 2013 5:26 PM > To: Uwe Schindler > Cc: general > Subject: Re: XSS Issue > > Hi Uwe, > > Thank you for your quick response. > > I'm a little bit surprised because XSS is not a problem of making solr > accessible > or not to Internet because this a reflected XSS. If an administrator receives > a > mail with a malicious link pointing to the solr administrator interface and > containing a malicious payload he will execute the JavaScript if he clicks on > it. > > There also others techniques that can be used to make an solr administrator > executing this link without his consent (HTML IMG TAG pointing to the solr > administration interface and hosted on a malicious website) and that will > bypass network based protection. > > Regards, > > Grégory DRAPERI > > > 2013/6/18 Uwe Schindler > > > Hi Grégory, > > > > Solr should be always only listen on private networks, never make it > > accessible to the internet. This is officially documented; for more > > Information about this, see: http://wiki.apache.org/solr/SolrSecurity > > Solr uses HTTP as its programming API and you can do everything Java > > allows via HTTP, but HTTP does not mean it must be open to the > > internet. By opening a Solr server to the internet you are somehow > > wrapping everything Java allows to the internet, so it is not > > recommeneded. 
Solr also has no security features at all; managing this > > is all up to the front-end, sitting on internet or insecure networks. > > > > There are already some issues open to limit some XSS and similar access: > > https://issues.apache.org/jira/browse/SOLR-4882 > > > > Uwe > > > > - > > Uwe Schindler > > H.-H.-Meier-Allee 63, D-28213 Bremen > > http://www.thetaphi.de > > eMail: u...@thetaphi.de > > > > > > > -Original Message- > > > From: gregory draperi [mailto:gregory.drap...@gmail.com] > > > Sent: Tuesday, June 18, 2013 3:13 PM > > > To: general@lucene.apache.org > > > Subject: XSS Issue > > > > > > Dear Solr project members, > > > > > > I think I have found a XSS (Cross-Site Scripting) issue in the 3.6.2 > > version of > > > Solr. > > > > > > How can I give you more details? > > > > > > Regards, > > > > > > -- > > > Grégory Draperi > > > > > > > -- > Grégory Draperi
RE: XSS Issue
Hi Grégory, Solr should only ever listen on private networks; never make it accessible to the internet. This is officially documented; for more information about this, see: http://wiki.apache.org/solr/SolrSecurity Solr uses HTTP as its programming API and you can do everything Java allows via HTTP, but HTTP does not mean it must be open to the internet. By opening a Solr server to the internet you are somehow wrapping everything Java allows to the internet, so it is not recommended. Solr also has no security features at all; managing this is all up to the front-end, sitting on internet or insecure networks. There are already some issues open to limit some XSS and similar access: https://issues.apache.org/jira/browse/SOLR-4882 Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: gregory draperi [mailto:gregory.drap...@gmail.com] > Sent: Tuesday, June 18, 2013 3:13 PM > To: general@lucene.apache.org > Subject: XSS Issue > > Dear Solr project members, > > I think I have found a XSS (Cross-Site Scripting) issue in the 3.6.2 version > of > Solr. > > How can I give you more details? > > Regards, > > -- > Grégory Draperi
RE: Best way to construct term?
Very simple: new Term(fieldName, termText) The reason for the extra constructor and createTerm() in Lucene 3.x and before was the extra cost of interning (String.intern()) the field name. In Lucene 4.0 field names are no longer interned, because the index structure changed and field<->field comparisons in term enumerations are no longer needed. So just create a term by using the constructor. In general, Term is just a light wrapper and no longer a fundamental component of Lucene; it is kept for "backwards compatibility" with earlier versions and mainly used for constructing queries like new TermQuery(Term). From the implementation point of view, in Lucene 4.x every field is like a separate index; the terms of each field are represented by the new class BytesRef, which is a slice out of a larger byte[] array containing the data of many terms of a field in the index. Uwe ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: bbarani [mailto:bbar...@gmail.com] > Sent: Friday, May 10, 2013 7:15 PM > To: general@lucene.apache.org > Subject: Best way to construct term? > > Hi, > > I am currently constructing a term using the below steps, > > Final Static (class level): Term t=new Term(fieldName); > > Inside some function(s): > > t.createTerm(termText); > > > It seems like createTerm method has been removed from Lucene 4.3.0 API, I > just thought of checking the best / efficient way to create a Term. Can > someone please guide me on that? > > Thanks, > BB > > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Best- > way-to-construct-term-tp4062388.html > Sent from the Lucene - General mailing list archive at Nabble.com.
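The pattern described above can be sketched as follows (a minimal sketch assuming the Lucene 4.x core JAR on the classpath; the field name and term text are illustrative):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;

public class TermExample {
    public static void main(String[] args) {
        // Lucene 4.x: just use the two-argument constructor; field names
        // are no longer interned, so constructing a Term is cheap.
        Term t = new Term("title", "lucene");

        // Term is now mostly a thin wrapper used to build queries:
        TermQuery query = new TermQuery(t);
        System.out.println(query);
    }
}
```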
RE: Welcome David Smiley to the PMC
Welcome! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Steve Rowe [mailto:sar...@gmail.com] > Sent: Monday, March 18, 2013 3:14 PM > To: d...@lucene.apache.org > Cc: general@lucene.apache.org > Subject: Welcome David Smiley to the PMC > > I'm pleased to announce that David Smiley has accepted the PMC's invitation > to join. > > Welcome David! > > - Steve=
RE: Singular to plural
You have to do stemming on both the indexing and the query side! If the query submitted by the user is also stemmed, the plural "rooms" is reduced to the singular "room" and a result is found. The important rule for Lucene is: use the same Analyzer for indexing and query parsing (this holds in most cases; there are a few special cases, but none related to stemming). Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: amita [mailto:amita_bhak...@persistent.co.in] > Sent: Thursday, February 07, 2013 8:54 AM > To: general@lucene.apache.org > Subject: Singular to plural > > Hi, > > I am using Snowball Analyzer with English Stemmer. It stems the plural term > to singular and shows the proper search result however there is problem > with singular to plural. > My requirement is if document title is "guest room", it should be shown in > search result upon searching for "rooms". The terms indexed for my > document title are "guest" and "room". Since "rooms" is not indexed for this > document, it's not being shown currently. > Is there any way to achieve this? > > Regards, > Amita > > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Singular- > to-plural-tp4038931.html > Sent from the Lucene - General mailing list archive at Nabble.com.
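Concretely, the rule means sharing one analyzer between the IndexWriter configuration and the query parser — a sketch under the assumption of Lucene 3.6 with the Snowball contrib module on the classpath; the field name is illustrative:

```java
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.util.Version;

public class SameAnalyzerExample {
    public static void main(String[] args) throws Exception {
        // One analyzer instance, used on both sides:
        SnowballAnalyzer analyzer = new SnowballAnalyzer(Version.LUCENE_36, "English");

        // Indexing side: "rooms" in a document is indexed as the stem "room".
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, analyzer);

        // Query side: the user's query "rooms" is stemmed to "room" as well,
        // so it matches the indexed term.
        QueryParser parser = new QueryParser(Version.LUCENE_36, "title", analyzer);
        System.out.println(parser.parse("rooms"));
    }
}
```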
RE: Indexer search with filter and sort
This is because you tokenized/analyzed the field. The sort is then done against a single term of the field, not against the original value ("sundheimer" is the largest of all indexed terms, so with descending order it is sorted first). This is why you cannot sort against tokenized/analyzed fields. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: johnbesel [mailto:johnbe...@web.de] > Sent: Monday, January 21, 2013 2:29 PM > To: general@lucene.apache.org > Subject: Re: Indexer search with filter and sort > > I made a Junit Test with some values. I sort values DESC and get following: > > im sundheimer fort 1-9 > krammwinkel/surkampstr.78 > sundweg > bray-sur-seine-str. 1 > bray-sur-seine-str. 1 > berck-sur-mer-str. 1 > > > this is not DESC, but also not ASC. what is this > SortField field=new SortField(sortKey, SortField.STRING, true); > List luceneSortFields = new ArrayList(); > luceneSortFields.add(field); > > Sort sort = new Sort(luceneSortFields.toArray(new > SortField[luceneSortFields.size()])); > topDocs = indexSearcher.search(booleanQuery, Integer.MAX_VALUE, sort); > > has anybody any Idea?? > what is wrong in my implementation? > > > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Indexer- > search-with-filter-and-sort-tp4034733p4035047.html > Sent from the Lucene - General mailing list archive at Nabble.com.
RE: Indexer search with filter and sort
You have to index the field twice, under different names: once for searching (analyzed) and once for sorting (not analyzed). Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: johnbesel [mailto:johnbe...@web.de] > Sent: Saturday, January 19, 2013 10:55 AM > To: general@lucene.apache.org > Subject: Indexer search with filter and sort > > Hello together, > > I work since 2 weeks with lucene and developed an application which put > some values into index and get it from it. > > it works very good. > > now I want to sort values, which I put into index. > I used a SortField to sort values: > new SortField(sortKey, SortField.STRING, SortDirection.DESC); > > It didn't work. > I found in internet that when I want to use a SortField with type String, I > should build Index with Index.NOT_ANALYZED. > (http://blog.richeton.com/2009/05/12/lucene-sort-tips/) > > I tested it, so now I could sort my values correct, BUT I could not search for > them :( > > How can I search and sort String values > > thank you for your help. > > P.S. > I use StandardAnalyzer and Lucene 3.6. > I put values into index > doc.add(new Field(KEY_CUSTOMER_NUMBER, customer.getKunnr() != null ? > customer.getKunnr() : "", Store.YES, Index.ANALYZED)); > > > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Indexer- > search-with-filter-and-sort-tp4034733.html > Sent from the Lucene - General mailing list archive at Nabble.com.
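In Lucene 3.x terms, the two-field approach might look like this (a sketch; the field names are hypothetical, and the sort field reuses the SortField.STRING type from the original code):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

public class TwoFieldExample {
    static Document buildDoc(String customerNumber) {
        Document doc = new Document();
        // Analyzed copy: used for full-text search only.
        doc.add(new Field("customerNumber", customerNumber,
                Field.Store.YES, Field.Index.ANALYZED));
        // Un-analyzed copy: used for sorting only.
        doc.add(new Field("customerNumber_sort", customerNumber,
                Field.Store.NO, Field.Index.NOT_ANALYZED_NO_NORMS));
        return doc;
    }

    static Sort descendingSort() {
        // Always sort on the un-analyzed field, never the analyzed one.
        return new Sort(new SortField("customerNumber_sort", SortField.STRING, true));
    }
}
```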
RE: Welcome Sami Siren to the PMC
Welcome Sami! Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Mark Miller [mailto:markrmil...@gmail.com] > Sent: Wednesday, December 12, 2012 9:17 PM > To: d...@lucene.apache.org; general@lucene.apache.org > Subject: Welcome Sami Siren to the PMC > > I'm please to announce that Sami Siren has accepted the PMC's invitation to > join. > > Welcome Sami! > > - Mark
RE: Lucene and Solr installation problems
Lucene / Solr definitely does not work with any GCJ or other CLASSPATH-based JVM. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Robert Muir [mailto:rcm...@gmail.com] > Sent: Saturday, November 10, 2012 7:11 PM > To: general@lucene.apache.org > Subject: Re: Lucene and Solr installation problems > > On Sat, Nov 10, 2012 at 12:49 PM, David Alyea wrote: > > > > Any ideas why lucene won't run on my server? > > $ java -version > java version "1.5.0" > gij (GNU libgcj) version 4.4.6 20120305 (Red Hat 4.4.6-4) > > I don't think Lucene will work with GCJ... you will need a fully functional > JVM.
RE: Welcome Alan Woodward as Lucene/Solr committer
Welcome Alan! It's good to have you on board! Uwe ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Robert Muir [mailto:rcm...@gmail.com] > Sent: Wednesday, October 17, 2012 7:37 AM > To: d...@lucene.apache.org; Lucene mailing list > Subject: Welcome Alan Woodward as Lucene/Solr committer > > I'm pleased to announce that the Lucene PMC has voted Alan as a > Lucene/Solr committer. > > Alan has been contributing patches on various tricky stuff: positions > iterators, > span queries, highlighters, codecs, and so on. > > Alan: its tradition that you introduce yourself with your background. > > I think your account is fully working and you should be able to add yourself > to > the who we are page on the website as well. > > Congratulations! > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional > commands, e-mail: dev-h...@lucene.apache.org
RE: Custom Filter Indexing Slow
The problem is that your transformation method needs Strings, but your incrementToken method also has a serious bug: it does not respect the length of the buffer, so it may pick up trailing garbage! The easiest way to do this with much less code and without those bugs: public boolean incrementToken() throws IOException { if (!input.incrementToken()) { return false; } final String normalizedLCcallnum = getLCShelfkey(charTermAttr.toString()); charTermAttr.setEmpty().append(normalizedLCcallnum); return true; } This fixes part of your performance problem: it no longer converts the result of your transformation between char arrays and Strings twice. To further improve speed, make the method getLCShelfkey operate directly on a char[] and length. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Osullivan L. [mailto:l.osulli...@swansea.ac.uk] > Sent: Friday, September 14, 2012 11:58 AM > To: general@lucene.apache.org > Subject: Custom Filter Indexing Slow > > Hi Folks, > > I have a custom filter which does everything I need it to but it has reduced my > indexing speed to a crawl. Are there any methods I need to call to clear / clean > things up once my script (details below) has done it's work? > > Thanks, > > Luke > > public LCCNormalizeFilter(TokenStream input) > { > super(input); > this.charTermAttr = addAttribute(CharTermAttribute.class); > } > > public boolean incrementToken() throws IOException { > > if (!input.incrementToken()) { > return false; > } > > char[] buffer = charTermAttr.buffer(); > String rawLCcallnum = new String(buffer); > String normalizedLCcallnum = getLCShelfkey(rawLCcallnum); > char[] newBuffer = normalizedLCcallnum.toCharArray(); > charTermAttr.setEmpty(); > charTermAttr.copyBuffer(newBuffer, 0, newBuffer.length); > return true; > }
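The buffer-length bug described above is easy to reproduce with plain Java: a term attribute's backing buffer is usually larger than the current token, so converting the whole array into a String picks up leftover characters (buffer contents here are invented for illustration):

```java
public class BufferLengthBug {
    public static void main(String[] args) {
        // Simulates a CharTermAttribute buffer: capacity 8, token length 4.
        // The 'x' characters are stale data from a previous, longer token.
        char[] buffer = {'P', 'R', '2', '8', 'x', 'x', 'x', 'x'};
        int length = 4;

        String buggy = new String(buffer);              // converts the whole buffer
        String correct = new String(buffer, 0, length); // respects the token length

        System.out.println(buggy);   // PR28xxxx
        System.out.println(correct); // PR28
    }
}
```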
RE: charFilter
Hi, You must *implement* the protected method correct(int offset) in your own CharFilter so that it does the following: call super.correct(offset) (this is important if you chain several filters) and then return a corrected offset according to the transformations you did in your own CharFilter. If, e.g., the character at offset 3 corresponds to offset 5 in the filtered data, you must return 5 when the given offset (after calling super) is 3. Unrelated to that: catching the IOException and printing it to System.err is a suboptimal way to implement such a filter. Just make your constructor throw IOException itself, so it bubbles up to Solr. In the factory you can re-throw a SolrException. As written, your code would silently index nonsense or throw a NullPointerException later. In general, a CharFilter should *not* read the whole input up-front in the constructor and then transform it; instead it should implement the read(...) methods and transform the input on the fly. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Osullivan L. [mailto:l.osulli...@swansea.ac.uk] > Sent: Thursday, September 13, 2012 12:43 PM > To: general@lucene.apache.org > Subject: RE: charFilter > > Hi Folks, > > I'm getting the following error after using a custom filter: > > SEVERE: org.apache.solr.common.SolrException: > org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token PR > 2823.00 A0.20 S0.819880 exceeds length of provided text sized 15 > > As the error suggests, the input value is PR2823.A2S81988 (15 chars). I have > been informed that correctOffset() method of the CharFilter class can be used > to resolve this issue but as far as I can tell, all that does is return the > value - it > doesn't set it. > > I have included some details below. 
> > Kind Regards, > > Luke > > In my schema I have: > > sortMissingLast="true" omitNorms="true"> > >class="com.test.solr.analysis.LukesTestCharFilterFactory"/> > > > > > and the method is: > > public class LukesTestCharFilterFactory extends BaseCharFilterFactory { > > public CharStream create(CharStream input) { > return new LukesTestCharFilter(input); > } > } > > public final class LukesTestCharFilter extends BaseCharFilter { ... > public LukesTestCharFilter(CharStream input) { > super(input); > try { > // Load the whole input into a string > StringBuilder sb = new StringBuilder(); > char[] buf = new char[1024]; > > int len; > while ((len = input.read(buf)) >= 0) { > sb.append(buf, 0, len); > } > > String original = sb.toString(); > String modified = getLCShelfkey(original); > CharStream result = CharReader.get(new StringReader(modified)); > > this.input = result; > this.input.correctOffset(modified.length()); > } catch (IOException e) { > System.err.println("There was a problem parsing input. Skipping."); > } > } > ... > } > =
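The offset correction described in this thread is usually done via the addOffCorrectMap() helper of Lucene 3.x's BaseCharFilter, which implements correct() on top of a recorded offset map — a sketch (the filter name is hypothetical, and the read() logic that performs the actual character transformation is elided):

```java
import java.io.IOException;
import org.apache.lucene.analysis.BaseCharFilter;
import org.apache.lucene.analysis.CharStream;

// A hypothetical CharFilter that deletes the character at input offset 3.
// BaseCharFilter.correct() maps offsets in the filtered text back to offsets
// in the original input using the entries recorded with addOffCorrectMap(),
// so highlighter offsets stay valid.
public final class DeleteOneCharFilter extends BaseCharFilter {
    public DeleteOneCharFilter(CharStream in) {
        super(in);
        // From filtered offset 3 on, the original input is one char ahead:
        addOffCorrectMap(3, 1);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        // The on-the-fly transformation (skipping the deleted character)
        // would be implemented here; elided in this sketch.
        return input.read(cbuf, off, len);
    }
}
```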
RE: file handle leaks appearing on Index files
Hi, this is not a leak. lsof will report "deleted" files with file handles still open when, at the time the changes were committed by IndexWriter, another IndexReader was open in parallel that stays on an older snapshot of the files. In that case, this IndexReader still uses files not yet completely deleted by the file system (the inode is still there, but the directory entry is already deleted). You have to use IndexReader.openIfChanged() or open a completely new instance with IndexReader.open() to get an updated view on the index after committing changes to IndexWriter. Don't forget to close the old IndexReader! If you don't do this, the older snapshot view is still referenced, preventing the files from being completely deleted and disappearing from lsof. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Jiajun Chen [mailto:cjjvict...@gmail.com] > Sent: Thursday, September 13, 2012 6:36 AM > To: general@lucene.apache.org > Subject: file handle leaks appearing on Index files > > heapdump shows 164 instances of IndexReader at Thu Sep 13 12:04:00 > CST 2012. > > > $ lsof |grep deleted > > reports the following: > .. 
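The reopen pattern described above might look like this in Lucene 3.5+ (a sketch; the method name is illustrative and error handling is omitted):

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;

public class ReaderRefresher {
    // After IndexWriter.commit(), refresh the reader to the newest snapshot.
    // openIfChanged() returns null if the index has not changed.
    static IndexReader refresh(IndexReader oldReader) throws IOException {
        IndexReader newReader = IndexReader.openIfChanged(oldReader);
        if (newReader != null) {
            oldReader.close(); // releases the file handles of the old snapshot
            return newReader;
        }
        return oldReader;
    }
}
```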
> java 13436 uu 5086r REG 9,4 7573 79311898 > /var/index/full/20120910/_v68n.fdt (deleted) > java 13436 uu 5087r REG 9,4 2340 79311970 > /var/index/full/20120910/_v68n.fdx (deleted) > java 13436 uu 5088r REG 9,4 2058 79311887 > /var/index/full/20120910/_v68o.fdt (deleted) > java 13436 uu 5089r REG 9,4 636 79311854 > /var/index/full/20120910/_v68o.fdx (deleted) > java 13436 uu 5090w REG 9,4 8038 79312040 > /var/index/full/20120910/_v68p.fdt (deleted) > java 13436 uu 5091r REG 9,4 2476 79312050 > /var/index/full/20120910/_v68p.fdx (deleted) > java 13436 uu 5092r REG 9,4 7015 79312087 > /var/index/full/20120910/_v68q.fdt (deleted) > java 13436 uu 5093r REG 9,4 2332 79312091 > /var/index/full/20120910/_v68q.fdx (deleted) > java 13436 uu 5094r REG 9,4 648 79312128 > /var/index/full/20120910/_v68r.fdt (deleted) . > > $ lsof |grep deleted |wc -l ;date > > reports the following: > > 494 > Wed Sep 12 23:11:40 CST 2012 > > 506 > Wed Sep 12 23:22:57 CST 2012 > > 560 > Wed Sep 12 23:34:56 CST 2012 > > 560 > Wed Sep 12 23:46:29 CST 2012 > > 560 > Wed Sep 12 23:49:56 CST 2012 > > 566 > Wed Sep 12 23:56:08 CST 2012 > > 4275 > Thu Sep 13 12:04:00 CST 2012
RE: charFilter
You have to implement the correctOffset method to take care of deleted or added chars. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Osullivan L. [mailto:l.osulli...@swansea.ac.uk] > Sent: Wednesday, September 12, 2012 7:25 PM > To: general@lucene.apache.org > Subject: charFilter > > Hi Folks, > > I have created a custom charFilter for use in Solr which does everything I > need > it to with one exception - it kills Solr when highlighting is used. > > I am modifying the input with the following: > > public myCharFilter (ChearStream input) { super(input); > > ... > > CharStream result = CharReader.get(new StringReader(modified)); this.input = > result > > } > > Is there any way of modifying the input offset to that it doesn't throw the > error? > > Thanks, > > Luke
RE: OOM with Lucene 3.6 & Jrockit
Hi, Have you read http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html ? Especially the last part about configuring your system correctly for mmap is important. Mmap can handle index files with hundreds of Gigabytes on systems with less physical RAM, you just have to define the ulimit settings correctly. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Snehal Chennuru [mailto:snehal.ku...@gmail.com] > Sent: Tuesday, September 11, 2012 12:52 AM > To: general@lucene.apache.org > Subject: Re: OOM with Lucene 3.6 & Jrockit > > It turns out that using MMapDirectory was causing OOM exception as the index > size was over 20GB. Changed it to use SimpleFSDirectory avoids this issue. > > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/OOM-with- > Lucene-3-6-Jrockit-tp4006487p4006747.html > Sent from the Lucene - General mailing list archive at Nabble.com.
[ANNOUNCE] Apache Solr 3.6.1 released
22 July 2012, Apache Solr™ 3.6.1 available The Lucene PMC is pleased to announce the release of Apache Solr 3.6.1. Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites. This release is a bug fix release for version 3.6.0. It contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-3x-redir.html (see note below). See the CHANGES.txt file included with the release for a full list of details. Solr 3.6.1 Release Highlights: * The concurrency of MMapDirectory was improved, fixing a performance regression in comparison to Solr 3.5.0. This affected users with 64bit platforms (Linux, Solaris, Windows) or those explicitly using MMapDirectoryFactory. * ReplicationHandler "maxNumberOfBackups" was fixed to work if backups are triggered on commit. * Charset problems were fixed with HttpSolrServer, caused by an upgrade to a new Commons HttpClient version in 3.6.0. * Grouping was fixed to return correct count when not all shards are queried in the second pass. Solr no longer throws Exception when using result grouping with main=true and using wt=javabin. * Config file replication was made less error prone. * Data Import Handler threading fixes. * Various minor bugs were fixed. Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. 
This also goes for Maven access. Happy searching, Uwe Schindler (release manager) & all Lucene/Solr developers - Uwe Schindler uschind...@apache.org Apache Lucene PMC Member / Committer Bremen, Germany http://lucene.apache.org/
[ANNOUNCE] Apache Lucene 3.6.1 released
22 July 2012, Apache Lucene™ 3.6.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 3.6.1. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. This release is a bug fix release for version 3.6.0. It contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/core/mirrors-core-3x-redir.html (see note below). See the CHANGES.txt file included with the release for a full list of details. Lucene 3.6.1 Release Highlights: * The concurrency of MMapIndexInput.clone() was improved, fixing a performance regression in comparison to Lucene 3.5.0. * MappingCharFilter was fixed to return correct final token positions. * QueryParser now supports +/- operators with any amount of whitespace. * DisjunctionMaxScorer now implements visitSubScorers(). * Changed the visibility of Scorer#visitSubScorers() to public; otherwise it's impossible to implement Scorers outside the Lucene package. This is a small backwards break, affecting a few users who implemented custom Scorers. * Various analyzer bugs were fixed: Kuromoji no longer produces an invalid token graph due to UNK with punctuation being decompounded, invalid position length in SynonymFilter, loading of Hunspell dictionaries that use aliasing, consistent closing of streams when loading Hunspell affix files. * Various bugs in FST components were fixed: offline sorter minimum buffer size, integer overflow in sorter, FSTCompletionLookup missed to close its sorter. * Fixed a synchronization bug in handling taxonomies in the facet module. * Various minor bugs were fixed: BytesRef/CharsRef copy methods with nonzero offsets and subSequence off-by-one, TieredMergePolicy returned wrong-scaled floor segment setting. 
Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access. Happy searching, Uwe Schindler (release manager) & all Lucene/Solr developers - Uwe Schindler uschind...@apache.org Apache Lucene PMC Member / Committer Bremen, Germany http://lucene.apache.org/
RE: Error message running: ant clean test
You don't need to download asm at all. Just use a plain default ANT installation as extracted from ANT's zip file. Also nuke your user's ~/.ant directory if possible. ANT does that automatically when building Lucene/Solr. The problem you had was a preexisting version in a global lib folder that should not be there. ANT always prefers global lib folders over local ones, so it did not respect Lucene's requirements. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: jellyman [mailto:colm_r...@hotmail.com] > Sent: Thursday, July 19, 2012 4:51 PM > To: general@lucene.apache.org > Subject: RE: Error message running: ant clean test > > Hey Uwe, > >Thanks. I found that I'm running asm version 2.2.3! I'll uninstall and download > something 4.0.0+ > >I'm working on a windows box btw. I guess I can install ant anywhere... > > Thanks > jellym > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Error- > message-running-ant-clean-test-tp3995956p3995981.html > Sent from the Lucene - General mailing list archive at Nabble.com.
RE: Error message running: ant clean test
Hi, > I got latest version and rebuilt. That changed the error message a bit > (included below). Can you please explain your second point in a bit more detail > please. I'm very new to ASM (in fact I don't even know what this means). >For example how do I know that the ~/.ant/lib folder contains outdated > and old ASM versions? This is not immediately obvious to me due to my > ignorance of Java tech. Can you hand-hold me for a bit? That is obvious from the exception messages. ASM 4.0 completely changed the API in a backwards-incompatible way. The only chance that this can hit you is when you have customized your ANT installation with extension modules (I have no idea which). Those modules could be installed in: ~/.ant/lib $ANT_HOME/lib I cannot say more; look into those directories and look for asm-xxx.jar files. If there is any version < 4.0, you cannot build Lucene with this configuration. I would recommend uninstalling ANT, reinstalling a new ANT version (downloaded from Apache), and cleaning up your ~/.ant folder. 
> BUILD FAILED > C:\trunk\build.xml:55: The following error occurred while executing this > line: > C:\trunk\lucene\build.xml:176: java.lang.IncompatibleClassChangeError: class > org.apache.lucene.validation.ForbiddenApisCheckTask$ClassSignatureLookup$1 > has interface org.objectweb.asm.ClassVisitor as super class > at java.lang.ClassLoader.defineClass1(Native Method) > at java.lang.ClassLoader.defineClass(Unknown Source) > at > org.apache.tools.ant.AntClassLoader.defineClassFromData(AntClassLoader.java:1128) > at > org.apache.tools.ant.AntClassLoader.getClassFromStream(AntClassLoader.java:1299) > at > org.apache.tools.ant.AntClassLoader.findClassInComponents(AntClassLoader.java:1355) > at > org.apache.tools.ant.AntClassLoader.findClass(AntClassLoader.java:1315) > at > org.apache.tools.ant.AntClassLoader.loadClass(AntClassLoader.java:1068) > at java.lang.ClassLoader.loadClass(Unknown Source) > at > org.apache.lucene.validation.ForbiddenApisCheckTask$ClassSignatureLookup.<init>(ForbiddenApisCheckTask.java:457) > at > org.apache.lucene.validation.ForbiddenApisCheckTask.getClassFromClassLoader(ForbiddenApisCheckTask.java:92) > at > org.apache.lucene.validation.ForbiddenApisCheckTask.addSignature(ForbiddenApisCheckTask.java:133) > at > org.apache.lucene.validation.ForbiddenApisCheckTask.parseApiFile(ForbiddenApisCheckTask.java:170) > at > org.apache.lucene.validation.ForbiddenApisCheckTask.execute(ForbiddenApisCheckTask.java:353) > at > org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) > at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) > at org.apache.tools.ant.Task.perform(Task.java:348) > at org.apache.tools.ant.Target.execute(Target.java:392) > at org.apache.tools.ant.Target.performTasks(Target.java:413) > at > 
org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399) > at > org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38) > at org.apache.tools.ant.Project.executeTargets(Project.java:1251) > at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:442) > at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:303) > at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:221) > at > org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) > at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) > at org.apache.tools.ant.Task.perform(Task.java:348) > at org.apache.tools.ant.taskdefs.Sequential.execute(Sequential.java:68) > at > org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) > at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) > at org.apache.tools.ant.Task.perform(Task.java:348) > at org.apache.tools.ant.Target.execute(Target.java:392) > at org.apache.tools.ant.Target.performTasks(Target.java:413) > at > org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399) > at org.apache.tools.ant.Project.executeTarget(Project.java:1368) > at > org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41) > at org.apache.
RE: Error message running: ant clean test
You should update your SVN checkout, this class is no longer used in trunk! There seems to be some classpath + source checkout confusion. If you still get those errors, please check if your ~/.ant/lib folder contains outdated and old ASM versions. Maybe you installed some ANT plugin in your classpath that ships with outdated ASM. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: jellyman [mailto:colm_r...@hotmail.com] > Sent: Thursday, July 19, 2012 3:26 PM > To: general@lucene.apache.org > Subject: Error message running: ant clean test > > Hi, > >I have downloaded the Solr and Lucene source code. I'm running the > command: "ant clean test" from C:\trunk and am getting the following error > message: > > > BUILD FAILED > C:\trunk\build.xml:55: The following error occurred while executing this > line: > C:\trunk\lucene\build.xml:180: java.lang.NoSuchMethodError: > org.objectweb.asm.tree.ClassNode.<init>(I)V > at > org.apache.lucene.validation.ForbiddenApisCheckTask.addSignature(ForbiddenApisCheckTask.java:128) > at > org.apache.lucene.validation.ForbiddenApisCheckTask.parseApiFile(ForbiddenApisCheckTask.java:175) > at > org.apache.lucene.validation.ForbiddenApisCheckTask.execute(ForbiddenApisCheckTask.java:301) > at > org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) > at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) > at org.apache.tools.ant.Task.perform(Task.java:348) > at org.apache.tools.ant.Target.execute(Target.java:392) > at org.apache.tools.ant.Target.performTasks(Target.java:413) > at > org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399) > at > org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38) 
> at org.apache.tools.ant.Project.executeTargets(Project.java:1251) > at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:442) > at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:303) > at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:221) > at > org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) > at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) > at org.apache.tools.ant.Task.perform(Task.java:348) > at org.apache.tools.ant.taskdefs.Sequential.execute(Sequential.java:68) > at > org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291) > at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) > at java.lang.reflect.Method.invoke(Unknown Source) > at > org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106) > at org.apache.tools.ant.Task.perform(Task.java:348) > at org.apache.tools.ant.Target.execute(Target.java:392) > at org.apache.tools.ant.Target.performTasks(Target.java:413) > at > org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399) > at org.apache.tools.ant.Project.executeTarget(Project.java:1368) > at > org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41) > at org.apache.tools.ant.Project.executeTargets(Project.java:1251) > at org.apache.tools.ant.Main.runBuild(Main.java:811) > at org.apache.tools.ant.Main.startAnt(Main.java:217) > at org.apache.tools.ant.launch.Launcher.run(Launcher.java:280) > at org.apache.tools.ant.launch.Launcher.main(Launcher.java:109) > > Total time: 2 minutes 7 seconds > > > I have checked that the file is there, as is the method (though it is private, should > that matter?). 
> > Does anyone have any ideas/suggestions to help me fix this? Would > greatly appreciate any comments. > > Struggling, > jellyman > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Error- > message-running-ant-clean-test-tp3995956.html > Sent from the Lucene - General mailing list archive at Nabble.com.
RE: Just getting started with Eclipse & lucene source code -- please help
Hi, To compile your own project that uses Lucene as a library, you don't need to import the whole Lucene source code. Just pull in the distribution JAR files of Lucene by adding a Maven or Ivy dependency (lucene-core.jar) to your project. Downloading and installing the source distribution of Lucene is not meant for your own projects; it is for developing Lucene/Solr itself. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: jellyman [mailto:colm_r...@hotmail.com] > Sent: Wednesday, July 11, 2012 6:22 PM > To: general@lucene.apache.org > Subject: Re: Just getting started with Eclipse & lucene source code -- please > help > > Actually I'm seeing a lot of red underline errors like: "the import > org.apache.lucene cannot be resolved" > > What's going on? I pointed the project to C:\trunk\solr > > Slightly worried, > jellyman > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Just-getting- > started-with-Eclipse-lucene-source-code-please-help-tp3994398p3994423.html > Sent from the Lucene - General mailing list archive at Nabble.com.
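For the Maven route described in the reply, a dependency entry along these lines pulls in lucene-core; the 3.6.x version shown is just an example from the era of this thread, not a recommendation:

```xml
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-core</artifactId>
  <version>3.6.2</version>
</dependency>
```

Ivy users declare the same org.apache.lucene#lucene-core module; either way the JAR lands on the classpath without any source checkout.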
RE: TermFreqVector cannot be resolved to a type
You should fix your classpath, it may contain different Lucene versions! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Aoi Morida [mailto:xu.xum...@gmail.com] > Sent: Monday, May 14, 2012 10:19 PM > To: general@lucene.apache.org > Subject: TermFreqVector cannot be resolved to a type > > Hi all, > > I am using lucene to do index and I want to get the term frequency vector. > > I use this code: > Directory directory = FSDirectory.getDirectory(INDEX_DIRECTORY); > IndexReader indexReader = IndexReader.open(directory); TermFreqVector > vector=indexReader.getTermFreqVector(1, "subject"); > > But eclipse always tells me that TermFreqVector cannot be resolved to a type. > > I cannot figure out what's wrong. > > Regards, > > Aoi > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/TermFreqVector-cannot-be-resolved-to-a- > type-tp3983748.html > Sent from the Lucene - General mailing list archive at Nabble.com.
RE: CouchDB-Lucene Integration
Hi, I have no idea what you are talking about. Maybe that question is better asked on the CouchDB mailing list: we only provide the search infrastructure, and CouchDB is just one of many *users* of Lucene / Solr. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Fabian Seitz [mailto:se...@fzi.de] > Sent: Tuesday, April 24, 2012 1:39 PM > To: general@lucene.apache.org > Subject: CouchDB-Lucene Integration > > Hi, > > > I installed Apache CouchDB ("version":"1.2.0") on my system (Windows XP 32- > bit) and downloaded the Lucene files (version: 3.6.0). I tried to integrate Lucene > with CouchDB but I didn't manage to get it to work. > > I need the Lucene extension to be able to run an existing project. I've > already pushed the project files to CouchDB but without Lucene it won't work. > > Since I'm new to CouchDB and Lucene and I've already tried several things, I > don't know how to move on. Can anyone help me with that? > > Fabian
RE: Welcome Martijn van Groningen to the Lucene PMC
Welcome! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Robert Muir [mailto:rcm...@gmail.com] > Sent: Wednesday, February 08, 2012 2:17 AM > To: Lucene mailing list; d...@lucene.apache.org > Subject: Welcome Martijn van Groningen to the Lucene PMC > > Hello, > > I'm pleased to announce that the Lucene PMC has voted to add Martijn as a > PMC Member. > > Congratulations Martijn! > > -- > lucidimagination.com
RE: indexSearcher using NumericRangeQuery doesn't gives result. Any help ?
To get the value back you have to enable storing on the NumericField; by default it is only indexed: new NumericField(name, Field.Store.YES, true). You can then get the value as a string with doc.get(name) from search results - or, since Lucene 3.4, as a Number instance: ((NumericField) doc.getFieldable(name)).getNumericValue().intValue() - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: gubs [mailto:gub...@gmail.com] > Sent: Thursday, December 08, 2011 5:47 PM > To: general@lucene.apache.org > Subject: RE: indexSearcher using NumericRangeQuery doesn't gives result. Any > help ? > > Hi Uwe, > > Thanks for your reply. I followed the same steps as suggested by you in the NF > and NRQ object creation. Still, I can see the count of the hits in docs. > But I am not able to fetch the value successfully and print it. Do you see any other > ways to get the value from the docs ? > > Gubs > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/indexSearcher-using- > NumericRangeQuery-doesn-t-gives-result-Any-help-tp3569338p3570762.html > Sent from the Lucene - General mailing list archive at Nabble.com.
RE: indexSearcher using NumericRangeQuery doesn't gives result. Any help ?
Hi, > I am new to the lucene library. I need to write some numeric field in the doc using > indexwriter and using index searcher i need to search if the value range (age > > 40) as example. > > IndexWriter Snippet below. The value is available in the doc properly. > (Document 0>>) > Document doc = new Document(); > NumericField numericField = new NumericField("title", > Integer.parseInt(value)); > numericField.setIntValue(Integer.parseInt(value)); > doc.add(numericField); > iw.addDocument(doc); >iw.optimize(); > iw.close(); This code is wrong. The 2nd value passed to NumericField's ctor is a configuration constant (the precisionStep), not the value. Just leave the default, so use 'new NumericField("title")' - otherwise RTFM. > indexSearcher snippet below. IndexSearcher prints the hits length correctly. > Having said that, the result is not getting printed from the doc. Any help ? I spent so > much time and failed to find it. > > Query queryParser = NumericRangeQuery.newIntRange("title", 40, 6000, true, > true); This one is correct and uses the default precisionStep. Because of the mismatch between precision steps on NumericField and NumericRangeQuery, no results are returned. For simple use cases it's better to use the default precisionStep (so don't pass it to either NRQ or NF). > // 3. Search > int hitsPerPage = 10; > IndexSearcher indexSearcher = new IndexSearcher(index, true); > TopScoreDocCollector collector = > TopScoreDocCollector.create( > hitsPerPage, true); > indexSearcher.search(queryParser, collector); > > ScoreDoc[] hits = collector.topDocs().scoreDocs; > > // 4. Display result > log.info("List of docs found : " + hits.length); > for (int i = 0; i < hits.length; i++) { > int docId = hits[i].doc; > System.out.println(docId); > Document doc = indexSearcher.doc(docId); > log.info(i + 1 + " . " + doc.get("title")); > } Uwe
RE: MaxFieldLength in Lucene 3.4
Hi, This option is a safety thing for the case where you cannot trust your input data. Maybe you suddenly tokenize a binary file and produce millions of random tokens; in that case only maybe the first 10,000 are indexed. If your input data is trusted and text-based (e.g. read from elements in XML files, databases,...), then you don't need this filter. > Maybe I am too far behind the times. I was updating some pretty old stuff. > I think it was written originally with Lucene 1.4. I seem to recall that Lucene > v1.x had analyzers where the default was "limited", because I learned pretty > early that I had to set that option during indexing. Perhaps at some point the The limiting option was almost always on IndexWriter, but it defaulted to 10,000 tokens from the beginning. The analyzers had nothing to do with this option. The recent change removed the token counting from IndexWriter (as it only made the already complicated code more unreadable); it was moved to a simple TokenFilter because it's much more reasonable to do this during analysis. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Joe MA [mailto:mrj...@comcast.net] > Sent: Thursday, December 01, 2011 9:24 AM > To: general@lucene.apache.org > Subject: RE: MaxFieldLength in Lucene 3.4 > > > > "of course all other analyzers are unlimited" > > Maybe I am too far behind the times. I was updating some pretty old stuff. > I think it was written originally with Lucene 1.4. I seem to recall that Lucene > v1.x had analyzers where the default was "limited", because I learned pretty > early that I had to set that option during indexing. Perhaps at some point the > switch was made to default unlimited. Thanks, your answer clears it up. > > One question - why even have this option now? Are things more efficient with a > limited token field? If you know your data is 'bounded', should you always limit > the token field to improve performance? > > Thanks!
> > > -Original Message- > From: Uwe Schindler [mailto:u...@thetaphi.de] > Sent: Monday, November 28, 2011 2:41 AM > To: general@lucene.apache.org > Subject: RE: MaxFieldLength in Lucene 3.4 > > Hi, > > The move is simple - LimitTokenCountAnalyzer is just a wrapper around any > other Analyzer, so I don't really understand your question - of course all other > analyzers are unlimited. If you have myAnalyzer with myMaxFieldLengthValue > used before, you can change your code as follows: > > Before: > new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_34, > myAnalyzer).setFoo().setBar().setMaxFieldLength(myMaxFieldLengthValue)); > > After: > new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_34, new > LimitTokenCountAnalyzer(myAnalyzer, > myMaxFieldLengthValue)).setFoo().setBar()); > > You only have to do this on the indexing side, on the query side > (QueryParser) just use myAnalyzer without wrapping. With the new code, the > responsibilities for cutting the field after a specific number of tokens was > moved out out the indexing code in Lucene. This is now just an analysis feature > not a indexing feature anymore. > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: Joe MA [mailto:mrj...@comcast.net] > > Sent: Monday, November 28, 2011 8:09 AM > > To: general@lucene.apache.org > > Subject: MaxFieldLength in Lucene 3.4 > > > > While upgrading to Lucene 3.4, I noticed the MaxFieldLength values on the > > indexers are deprecated. There appears to be a LimitTokenCountAnalyzer > > that limits the tokens - so does that mean the default for all other > analyzers is > > unlimited? > > > > Thanks in advance - > > JM
RE: MaxFieldLength in Lucene 3.4
Hi, The move is simple - LimitTokenCountAnalyzer is just a wrapper around any other Analyzer, so I don't really understand your question - of course all other analyzers are unlimited. If you used myAnalyzer with myMaxFieldLengthValue before, you can change your code as follows: Before: new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_34, myAnalyzer).setFoo().setBar().setMaxFieldLength(myMaxFieldLengthValue)); After: new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_34, new LimitTokenCountAnalyzer(myAnalyzer, myMaxFieldLengthValue)).setFoo().setBar()); You only have to do this on the indexing side; on the query side (QueryParser) just use myAnalyzer without wrapping. With the new code, the responsibility for cutting off a field after a specific number of tokens was moved out of the indexing code in Lucene. This is now an analysis feature, not an indexing feature anymore. ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Joe MA [mailto:mrj...@comcast.net] > Sent: Monday, November 28, 2011 8:09 AM > To: general@lucene.apache.org > Subject: MaxFieldLength in Lucene 3.4 > > While upgrading to Lucene 3.4, I noticed the MaxFieldLength values on the > indexers are deprecated. There appears to be a LimitTokenCountAnalyzer > that limits the tokens - so does that mean the default for all other analyzers is > unlimited? > > Thanks in advance - > JM
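The point that limiting is now an analysis feature rather than an indexing feature can be sketched in plain Java with no Lucene dependency: a decorator that caps a token stream, the same shape LimitTokenCountAnalyzer gives a wrapped Analyzer. This is only an illustration; real Lucene works on TokenStreams, not Iterators.

```java
import java.util.Iterator;

// Token limiting expressed as a plain decorator over any token iterator.
// Sketch only: LimitTokenCountAnalyzer applies the same idea to the
// TokenStream produced by the Analyzer it wraps.
class LimitTokens implements Iterator<String> {
    private final Iterator<String> in;
    private int remaining;

    LimitTokens(Iterator<String> in, int maxTokens) {
        this.in = in;
        this.remaining = maxTokens;
    }

    @Override
    public boolean hasNext() {
        // stop after maxTokens, even if the wrapped stream has more
        return remaining > 0 && in.hasNext();
    }

    @Override
    public String next() {
        remaining--;
        return in.next();
    }
}
```

Because the cap lives in the decorator, the indexing code never needs to count tokens itself, which is exactly the simplification the reply describes.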
RE: Index Corruption in Lucene 2.9.3
Hi, In general it's a bad idea to use Lucene on network-mounted drives. E.g., NFS is heavily broken with the file locking used by Lucene (NIO does not work at all, and file-based lock support fails because directory updates may not be visible at all times, or become visible before files are flushed - happens-before is violated). This can lead to index corruption; you should use local disks, especially as they are much faster. ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Nishesh [mailto:nishesh.gu...@emc.com] > Sent: Monday, November 14, 2011 8:47 PM > To: general@lucene.apache.org > Subject: Index Corruption in Lucene 2.9.3 > > We are seeing index corruption very often with version 2.9.3. Our indexing > process is on Linux (CentOS 5). The index is created on a mounted drive which is a > shared drive from a Windows 2008 server running in a VM. We generally see > index corruption in merge or optimize after indexing runs continuously for > 6-7 hrs with index size reaching around 7-8GB. To reproduce the corruption > sooner, I have placed the merge ( maybemerge ) call immediately after > addIndex is called. We have a final index which is in the mounted drive; we > always add documents to a local intermediate index and then call add index > and merge to the final index.
> > The exception that I get - > > 2011-11-11 15:19:16,929 [MC:10.10.176.148-1321045422606-204 > FS:emag_393219_0] ERROR indexer - MergeWithFiler: MC: 393219, shard 0, > guid > 10.10.176.148-1321045422606-204: Error in addIndex()/kazMaybeMerge(): > /sideline/fs_393219/cas/search/index_0/primary, java.io.IOException: > background merge hit exception: _1t:c262436 > _10.10.176.148-1321045422606-204_0:cx4000 into _1u [optimize] > [mergeDocStores] % STACK: > org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2932) > org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2867) > org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2837) > org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3652) > > com.kazeon.search.indexingengine.context.MergerContext.mergeWithFilerInde > x(MergerContext.java:1004) > > com.kazeon.search.indexingengine.context.MergerContext.mergeWithFilerInde > x(MergerContext.java:1094) > > com.kazeon.search.indexingengine.context.MergerContext.mergeWithFiler(Me > rgerContext.java:1140) > > com.kazeon.search.indexingengine.statemachine.modifiers.merger.LocalIndex > OptimizeAndCompressModifier.modifyStateAux(LocalIndexOptimizeAndCompr > essModifier.java:375) > > com.kazeon.search.indexingengine.statemachine.modifiers.merger.LocalIndex > OptimizeAndCompressModifier.mergeAllICs(LocalIndexOptimizeAndCompress > Modifier.java:181) > > com.kazeon.search.indexingengine.statemachine.modifiers.merger.LocalIndex > OptimizeAndCompressModifier.modifyState(LocalIndexOptimizeAndCompress > Modifier.java:106) > com.kazeon.util.scoreboard.WorkerThread.run(WorkerThread.java:31) > > > CheckIndex command shows the following output - > > NOTE: testing will be more thorough if you run java with '- > ea:org.apache.lucene...', so assertions are enabled > > Opening index @ . 
> > Segments file=segments_23 numSegments=1 version=FORMAT_DIAGNOSTICS > [Lucene 2.9] > 1 of 1: name=_1t docCount=262436 > compound=true > hasProx=true > numFiles=1 > size (MB)=937.835 > diagnostics = {optimize=true, mergeFactor=2, os.version=2.6.18-92.1.18.el5, > os=Linux, mergeDocStores=true, lucene.version=2.9.3-dev, source=merge, > os.arch=i386, java.version=1.6.0_02, java.vendor=Sun Microsystems Inc.} > no deletions > test: open reader.OK > test: fields..OK [79 fields] > test: field norms.OK [79 fields] > test: terms, freq, prox...ERROR [term fulltext:creativecommons: doc > 262603 >= maxDoc 262436] > java.lang.RuntimeException: term fulltext:creativecommons: doc 262603 >= > maxDoc 262436 > at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:646) > at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530) > at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903) > test: stored fields...OK [524872 total field count; avg 2 fields per doc] > test: term vectorsOK [0 total vector count; avg 0 term/freq vector fields > per doc] FAILED > WARNING: fixIndex() would remove reference to this segment; full > exception: > java.lang.RuntimeException: Term Index test failed > at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543) > at org.apache.lucene.index.CheckIndex.main(Chec
Java 7u1 fixes index corruption and crash bugs in Apache Lucene Core and Apache Solr
Hi users of Apache Lucene Core and Apache Solr, Oracle released Java 7u1 [1] on October 19. According to the release notes and tests done by the Lucene committers, all bugs reported on July 28 are fixed in this release, so code using the Porter stemmer no longer crashes with SIGSEGV. We could no longer reproduce any index corruption, so it is safe to use Java 7u1 with Lucene Core and Solr. On the same day, Oracle released Java 6u29 [2], fixing the same problems that occurred with Java 6 if the JVM switches -XX:+AggressiveOpts or -XX:+OptimizeStringConcat were used. Of course, you should not use experimental JVM options like -XX:+AggressiveOpts in production environments! We recommend that everybody upgrade to this latest version, 6u29. In case you upgrade to Java 7, remember that you may have to reindex, as the Unicode version shipped with Java 7 changed and tokenization behaves differently (e.g. lowercasing). For more information, read JRE_VERSION_MIGRATION.txt in your distribution package! On behalf of the Apache Lucene/Solr committers, Uwe Schindler [1] http://www.oracle.com/technetwork/java/javase/7u1-relnotes-507962.html [2] http://www.oracle.com/technetwork/java/javase/6u29-relnotes-507960.html - Uwe Schindler uschind...@apache.org Apache Lucene PMC Member / Committer Bremen, Germany http://lucene.apache.org/
RE: Question about file format
It is the number of encoded bytes in recent Lucene versions (2.4 and later); older index formats wrote the number of characters instead. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Lorenzo Luengo [mailto:lolue...@gmail.com] > Sent: Tuesday, October 04, 2011 7:23 PM > To: general@lucene.apache.org > Subject: Question about file format > > Hi all, > > I'm trying to make my own reader for lucene files, in pure python (i haven't > found a suitable library for windows x64). And while reading docs, a question > arises. > > In http://lucene.apache.org/java/3_4_0/fileformats.html#String it says that the > string is composed of a VInt and a sequence of modified UTF-8 encoded chars. > My question is: Is that VInt the length of the string before encoding, or the > number of encoded bytes? > > Regards. > > -- > Lorenzo Luengo C. > Ingeniero Civil Electrónico > Cel: 98270385
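A minimal plain-Java decoder for the 2.4+ encoding, with no Lucene dependency. readVInt mirrors what Lucene's IndexInput.readVInt does; the class and helper names here are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Decoder for Lucene's VInt and the 2.4+ String encoding:
// a VInt byte count followed by that many UTF-8 bytes.
class VIntStringReader {

    // A VInt stores 7 bits per byte, low-order bits first; the high bit
    // of each byte signals that another byte follows.
    static int readVInt(byte[] buf, int[] pos) {
        byte b = buf[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // In the 2.4+ format the VInt is the number of ENCODED BYTES, not chars.
    static String readString(byte[] buf, int[] pos) {
        int numBytes = readVInt(buf, pos);
        String s = new String(buf, pos[0], numBytes, StandardCharsets.UTF_8);
        pos[0] += numBytes;
        return s;
    }
}
```

For example, the VInt bytes 0xAC 0x02 decode to 300 (44 from the first byte plus 2 << 7 from the second), and a string record for "abc" is the byte 0x03 followed by the three UTF-8 bytes.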
RE: Having trouble getting QueryParser to work...
Hi, I would also use NumericField + NumericRangeQuery for date fields. There are several possibilities to incorporate those; the easiest is to use a LONG numeric field and encode Date.getTime() [milliseconds since 1970-01-01T00:00:00.000] into it. The flexible QueryParser in contrib can directly use those encoded fields and parse the entered query string to dates (with the corresponding configuration). See the testcase for a usage example. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Michael Remijan [mailto:mjremi...@yahoo.com] > Sent: Wednesday, September 21, 2011 7:34 PM > To: general@lucene.apache.org > Subject: Re: Having trouble getting QueryParser to work... > > OK, that worked great. > > Now, when I am indexing a date, I see that DateField is deprecated so I am > using just a Field for this... > > > new Field( > "day" // The name of the field > ,DateTools.dateToString(day, Resolution.DAY) // The string to > process > ,Field.Store.NO // Whether value should be > stored in the index > ,Field.Index.ANALYZED // Whether the field should > be indexed, and if so, if it should be tokenized before indexing > ) > > so when I'm searching a date field range, I would use a TermRangeQuery > correct? > > > > > From: Uwe Schindler > To: general@lucene.apache.org; 'Michael Remijan' > Sent: Wednesday, September 21, 2011 11:24 AM > Subject: RE: Having trouble getting QueryParser to work... > > Lucene 3.4 has NumericField support in its flexible QueryParser > (contrib/queryparser). The core QueryParser has no idea about numeric fields > and always produces TermQuery/TermRangeQuery. > > To your code: In general you should use NumericRangeQuery always and not > TermQuery with NumericUtils (which is an internal expert class) on numeric fields. > Just use upper=lower, speed is the same and it does not wrongly rank the results.
> > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -Original Message- > > From: Michael Remijan [mailto:mjremi...@yahoo.com] > > Sent: Wednesday, September 21, 2011 6:20 PM > > To: general@lucene.apache.org > > Subject: Having trouble getting QueryParser to work... > > > > Sorry if this is an obvious question. > > > > I have the following BooleanQuery set up which works fine, and when I > > say "fine" I mean if I change the search values to values which I know > > are not > in the > > index then the search returns no results. So this works OK. > > > > Query > > dinnerQuery = new TermQuery(new Term("entry", > > "dinner")) > > ,accountIdQuery = new TermQuery(new Term("accountid", > > NumericUtils.intToPrefixCoded(1))) > > ; > > BooleanQuery > > query = new BooleanQuery(); > > query.add(accountIdQuery, Occur.MUST); > > query.add(dinnerQuery, Occur.MUST); > > > > When I run the above code I get 1 result I am expecting: > > > > Found 1 hits > > HIT #1 > > accountid = 1 > > journalid = 1 > > id = 306 > > > > > > > > Now I've been trying to convert this to use a QueryParser expression > > but I > have > > not had any luck. Here is my first attempt. > > > > String str = > > "accountid:1 AND entry:dinner" > > ; > > Query query > > = parser.parse(str); > > > > When I execute this, I get no results: > > > > Found 0 hits > > > > So I changed the query to use NumericUtils thinking that might be the > > problem... > > > > String str = > > "accountid:" +NumericUtils.intToPrefixCoded(1)+ " AND > > entry:dinner" > > ; > > Query query > > = parser.parse(str); > > > > When I execute this, I thought I got the results I was looking for > > because > the > > query found the 1 hit it was suppose to, however, during testing I > > found I > could > > put any value i want in for accountid and the search will always > > return > the 1 > > hit. 
> > > > So I'm not sure what I'm doing wrong and why QueryParser is giving the > results > > it is.
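The LONG-millis date encoding suggested in the reply above can be sketched like this. The class and helper names are made up for illustration; in real Lucene 3.x the millis value would go into NumericField.setLongValue(...) and the range into NumericRangeQuery.newLongRange(...).

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

// Sketch of the LONG date encoding: store epoch millis (Date.getTime())
// in the numeric field, then range-match over plain long values, which is
// what a numeric range query does inside the index.
class DateMillis {
    static long toMillis(LocalDate day) {
        // midnight UTC, i.e. "milliseconds since 1970-01-01T00:00:00.000"
        return day.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    static boolean inRange(long value, long lower, long upper) {
        return lower <= value && value <= upper;
    }
}
```

Because the field holds a plain long, a date-range search is just a numeric comparison, which is why NumericRangeQuery is preferred over TermRangeQuery on string-encoded dates.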
RE: Having trouble getting QueryParser to work...
Lucene 3.4 has NumericField support in its flexible QueryParser (contrib/queryparser). The core QueryParser has no idea about numeric fields and always produces TermQuery/TermRangeQuery. To your code: In general you should always use NumericRangeQuery and not TermQuery with NumericUtils (which is an internal expert class) on numeric fields. Just use upper=lower, speed is the same and it does not wrongly rank the results. Uwe ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Michael Remijan [mailto:mjremi...@yahoo.com] > Sent: Wednesday, September 21, 2011 6:20 PM > To: general@lucene.apache.org > Subject: Having trouble getting QueryParser to work... > > Sorry if this is an obvious question. > > I have the following BooleanQuery set up which works fine, and when I say > "fine" I mean if I change the search values to values which I know are not in the > index then the search returns no results. So this works OK. > > Query > dinnerQuery = new TermQuery(new Term("entry", "dinner")) > ,accountIdQuery = new TermQuery(new Term("accountid", > NumericUtils.intToPrefixCoded(1))) > ; > BooleanQuery > query = new BooleanQuery(); > query.add(accountIdQuery, Occur.MUST); > query.add(dinnerQuery, Occur.MUST); > > When I run the above code I get 1 result I am expecting: > > Found 1 hits > HIT #1 > accountid = 1 > journalid = 1 > id = 306 > > > > Now I've been trying to convert this to use a QueryParser expression but I have > not had any luck. Here is my first attempt. > > String str = > "accountid:1 AND entry:dinner" > ; > Query query > = parser.parse(str); > > When I execute this, I get no results: > > Found 0 hits > > So I changed the query to use NumericUtils thinking that might be the > problem...
> > String str = > "accountid:" +NumericUtils.intToPrefixCoded(1)+ " AND > entry:dinner" > ; > Query query > = parser.parse(str); > > When I execute this, I thought I got the results I was looking for because the > query found the 1 hit it was suppose to, however, during testing I found I could > put any value i want in for accountid and the search will always return the 1 > hit. > > So I'm not sure what I'm doing wrong and why QueryParser is giving the results > it is.
RE: Upgrade solr
Hi SN, The latest stable Solr release is Solr 3.3, with 3.4 coming this month. Solr 4.0 and Lucene 4.0 are both not yet stable, as we are still changing APIs and optimizing things like the index format and algorithms. You can use a Solr 4.0 snapshot if you really need the new features, and report back what you find out. But you need to know that we might change index formats from one day to the next, so after an upgrade to a later trunk version, your indexes may no longer be readable and may throw scary Exceptions when being read. Upgrading 3.x index formats to 4.0 unstable is always possible, just not 4.0-old to 4.0-new, so be prepared to reindex your stuff after an upgrade. Otherwise, Lucene/Solr 4.0 seems quite stable if you don't upgrade between snapshots. The Solr version Lucid Imagination ships with LucidWorks also uses the "unstable" 4.0 version internally, but they guarantee that you can upgrade between different LucidWorks versions (they provide an index upgrade tool). About the Geo features: I am not familiar with the current status of Solr's Geo support. Maybe somebody else can answer whether the mentioned query type works with the stable version 3.x. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: solrnovice [mailto:manisha...@yahoo.com] > Sent: Friday, September 09, 2011 5:21 AM > To: general@lucene.apache.org > Subject: RE: Upgrade solr > > hi Uwe, i havent heard from Lucid, so can you please let me know what is the > latest stable version of SOLR. On apache's site its mentioned that SOLR 4.0 is > not ready, but the nightly build is available ( but i dont know how stable that > version is ) . I want to make the geodist work, but with a stable release. Looks > like the geodist doesnt work in prior releases of solr. > When i mean, it doesnt work, i mean returning as a pseudo column, even if i do > not perform a longitude / latitude query.
> > I had the geodist returned when i did a City search by using SOLR4 revision > from the nightly build. What is the latest Release candidate for SOLR. Can you > please point me to the right download site? > > > thanks > SN > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Upgrade- > solr-tp3311066p3321837.html > Sent from the Lucene - General mailing list archive at Nabble.com.
RE: Upgrade solr
Hi, Please ask this question to Lucid Imagination support staff. The Lucene/Solr community is not responsible for releases of LucidWorks Enterprise and we don't know which versions of Lucene/Solr they use. Uwe ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: solrnovice [mailto:manisha...@yahoo.com] > Sent: Monday, September 05, 2011 4:40 PM > To: general@lucene.apache.org > Subject: Upgrade solr > > hi , We are trying to use LucidImagination ( lucidimagination.com) for our > search, it comes with a version of SOLR. When i use geodist(), or pseudo > columns, they dont seem to be working. > > Now my question is, can i upgrade just the solr under lucid imagination install? > Has anybody tried that, if so can you please share some information. I > downloaded SOLR 4 from the nightly build, but the lucidimagination, > schema.xml, doesnt work in that solr, as its closely tied to Lucid. I had it > working after i removed references to Lucid's classes...etc. But I lose some of > Lucid's datatypes like "comma-seperated" > analyzers...etc. If anybody tried upgrading the Solr to the latest version, please > share your thoughts. > > > thanks > sN > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Upgrade- > solr-tp3311066p3311066.html > Sent from the Lucene - General mailing list archive at Nabble.com.
[WARNING] Index corruption and crashes in Apache Lucene Core / Apache Solr with Java 7
Hello Apache Lucene & Apache Solr users, Hello users of other Java-based Apache projects, Oracle released Java 7 today. Unfortunately it contains hotspot compiler optimizations that miscompile some loops. This can affect the code of several Apache projects. Sometimes JVMs only crash, but in several cases the calculated results can be incorrect, leading to bugs in applications (see Hotspot bugs 7070134 [1], 7044738 [2], 7068051 [3]). Apache Lucene Core and Apache Solr are two Apache projects which are affected by these bugs, namely all versions released until today. Solr users with the default configuration will have Java crashing with SIGSEGV as soon as they start to index documents, as one affected part is the well-known Porter stemmer (see LUCENE-3335 [4]). Other loops in Lucene may be miscompiled, too, leading to index corruption (especially on Lucene trunk with the pulsing codec; other loops may be affected, too - LUCENE-3346 [5]). These problems were detected only 5 days before the official Java 7 release, so Oracle had no time to fix those bugs, which also affect many more applications. In response to our questions, they proposed to include the fixes in service release u2 (possibly already in service release u1, see [6]). This means you cannot use Apache Lucene/Solr with Java 7 releases before Update 2! If you do, please don't open bug reports; it is not the committers' fault! At least disable loop optimizations using the -XX:-UseLoopPredicate JVM option to not risk index corruption. Please note: Java 6 users are also affected if they use one of these JVM options, which are not enabled by default: -XX:+OptimizeStringConcat or -XX:+AggressiveOpts It is strongly recommended not to use any hotspot optimization switches in any Java version without extensive testing! In case you upgrade to Java 7, remember that you may have to reindex, as the Unicode version shipped with Java 7 changed and tokenization behaves differently (e.g. lowercasing).
For more information, read JRE_VERSION_MIGRATION.txt in your distribution package! On behalf of the Lucene project, Uwe [1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134 [2] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7044738 [3] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7068051 [4] https://issues.apache.org/jira/browse/LUCENE-3335 [5] https://issues.apache.org/jira/browse/LUCENE-3346 [6] http://s.apache.org/StQ - Uwe Schindler uschind...@apache.org Apache Lucene PMC Member / Committer Bremen, Germany http://lucene.apache.org/
RE: Experiencing High CPU load with 3.3
Hi Alan,

We don't see such problems here on our machines, so first we need to know some things: Do you open a new index reader before each search and close it afterwards? In normal Lucene usage you keep the IndexReaders open all the time and only reopen them when your underlying index has changed. If you keep the IndexReader open, such close() calls with doPrivileged will not happen so often.

The change in Lucene 3.3 is that the default on 64-bit Linux is to use memory mapping, which is implemented by the operating system like a swap file mapped into virtual address space. This makes searches very fast, as the whole index is effectively in cache RAM and can be accessed like memory swapped out to a swap file. Setting up this memory mapping is much more expensive than simply opening a file and closing it after usage. If you reopen the IndexReader all the time, this expensive mapping cost is what you see all the time.

So I would suggest you check:
a) Only open/close IndexReaders when you really need to. You can use one IndexReader and even handle multiple parallel searches on it (it's thread safe).
b) If you need to reopen quite often, don't use the default FSDirectory.open(), but instead choose a Directory implementation yourself. To get the behavior of Lucene 3.2, use new NIOFSDirectory(...). But on 64-bit Linux, searches will be slower than with MMap (if you keep readers open).
c) Is your JVM somehow locked down in terms of security? These doPrivileged mappings are only effective when you have a web application container that restricts web applications to run with low privileges (something like Windows Vista/7 UAC). In this case, doPrivileged may be expensive and is not simply a no-op. For Lucene/Solr it's always recommended not to run it in restricted Java environments. Maybe OpenBD does this - I have no idea.
Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Alan Williamson (aw2.0 cloud experts) [mailto:a...@aw20.co.uk] > Sent: Saturday, July 23, 2011 9:42 AM > To: general@lucene.apache.org > Subject: Experiencing High CPU load with 3.3 > > Good day all. > > We are a long-term user of Lucene, with it powering our search engine inside > of the Java CFML runtime OpenBD (http://openbd.org). We've had no > complaints from it whatsoever. Until now that is. > > Since moving to 3.3 we have been experiencing extremely high CPU load > when searching against an index. The index is only 19MB in size with all of > 40,000 items in it. > > We have produced some very useful stack traces with CPU % load times. > Some other data points. This is not happening to just one machine, this is > happening to all the machines in the web farm. The actual traffic on the > machines is low; in the stack traces I give you, the only threads that are actually > doing anything are the ones where Lucene has gone into a tizzy. The > machines all have 3.5GB of memory.
Running Java 1.6.23
>
> Thanks
>
> alan
>
> --- Sample #1 -
>
> "qtp274064735-1462" prio=10 tid=0x2aaacaa62800 nid=19115 runnable [0x4174d000]
>    java.lang.Thread.State: RUNNABLE
>    at java.security.AccessController.doPrivileged(Native Method)
>    at org.apache.lucene.store.MMapDirectory.cleanMapping(MMapDirectory.java:158)
>    at org.apache.lucene.store.MMapDirectory$MMapIndexInput.close(MMapDirectory.java:383)
>    at org.apache.lucene.index.CompoundFileReader.close(CompoundFileReader.java:137)
>    - locked <0x000795b86520> (a org.apache.lucene.index.CompoundFileReader)
>    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:302)
>    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:359)
>    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:750)
>    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:589)
>    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:355)
>    at org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:476)
>    at org.apache.lucene.index.DirectoryReader.initialize(DirectoryReader.java:347)
>    at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:130)
>    at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:83)
>    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:750)
>    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
>    at org.apache.lucene.index.IndexReader.open(IndexReader.java:428)
>    at
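For the archives: the reader-reuse pattern suggested in the reply above (points a and b) can be sketched roughly as follows against the Lucene 3.x API. This is only an illustration; the index path is a placeholder and the reopen handling is deliberately simplified.

```java
import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.NIOFSDirectory;

public class ReaderReuse {
    public static void main(String[] args) throws Exception {
        // (b) Pick the Directory implementation explicitly instead of
        // FSDirectory.open(), which defaults to MMapDirectory on 64-bit Linux.
        Directory dir = new NIOFSDirectory(new File("/path/to/index")); // placeholder path

        // (a) Open the reader ONCE and share it across searches;
        // IndexReader is thread safe.
        IndexReader reader = IndexReader.open(dir, true); // true = read-only
        IndexSearcher searcher = new IndexSearcher(reader);
        try {
            // ... run many searches against the same searcher ...
        } finally {
            searcher.close();
            reader.close(); // close only when the application shuts down
        }

        // When the index has changed, prefer reader.reopen() over a fresh
        // open(): it reuses the unchanged segments and only maps new ones.
    }
}
```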
RE: Special Board Report for May 2011
> > I think the current state of logging only #lucene-dev is good. > > Yeah, except no one is on it other than a few people, even though many of > them (committers, that is) are on #lucene I haven't seen any technical discussions on #lucene anymore. I was discussing with Simon and Mike on #lucene-dev in the past days and had some work going on for the IndexUpgrader tool and MergePolicies. The discussions were even linked on JIRA issues. > > I go to #lucene-dev now. I think only IRC channel(s) that are Lucene/Solr > internal development in nature need to be logged -- and that's just #lucene- > dev. So just because you have observed many developers are on #lucene > instead of #lucene-dev doesn't indicate a problem, so long as no design > decisions for Lucene/Solr take place on #lucene or #solr. #lucene and #solr are > where users get to ask questions, much like how it is on the user mailing lists. > So *if* (I don't know if it happens) internal Lucene / Solr design decisions are > taking place on #lucene or #solr then obviously that must stop. I'd rather > these channels not get logged so that we can have an expectation of a single > place for these discussions on IRC and have that place be clear of user > support questions. > > > > RE refactoring / modularization, it's good to finally see a sense of > agreement on how to move forward. Yeah, that's OK, I have nothing to add to that (and don't want to anymore, it's a soap opera). > >> 3. Put in the automated patch checking system that Hadoop uses. > Volunteers? Perhaps we can knock this out at Lucene Revolution? Who logs the stuff there? In my opinion, a meeting at Lucene Revolution is also "private" - or is this somehow different? What's the difference compared to a private talk between two or three people in a bar at Lucene Revolution, with nobody writing down a log? A log could just as well be written when somebody talks with me in a private Skype chat! Uwe
RE: Querying in Solr
Hi, http://wiki.apache.org/solr/Solrj This is the Java client that talks either to an embedded solr server or via http to a separate installation. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Romi [mailto:romijain3...@gmail.com] > Sent: Tuesday, May 10, 2011 11:55 AM > To: general@lucene.apache.org > Subject: Re: Querying in Solr > > I have to use Solr with Java > > - > Romi > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Querying-in-Solr- > tp2922058p2922495.html > Sent from the Lucene - General mailing list archive at Nabble.com.
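For the archives, a minimal SolrJ sketch in the spirit of the linked wiki page, against a Solr 1.4/3.x-era server. The URL and field names are placeholders, and the example assumes a running Solr instance.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class SolrJExample {
    public static void main(String[] args) throws Exception {
        // Placeholder URL - point this at your own Solr installation.
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Build and run a simple query ("title" is a hypothetical field).
        SolrQuery query = new SolrQuery("title:lucene");
        query.setRows(10);
        QueryResponse response = server.query(query);

        // Print the unique key of each hit ("id" is a hypothetical field).
        for (SolrDocument doc : response.getResults()) {
            System.out.println(doc.getFieldValue("id"));
        }
    }
}
```

The same code works against an embedded server by swapping CommonsHttpSolrServer for EmbeddedSolrServer; the SolrServer API is the same.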
RE: [VOTE] Create Solr TLP
Hi,

Strong -1 to unmerge. Many of you know that I was originally against the merge, but once I saw the possibilities (especially refactoring the analysis stuff), I started to actively support it, too. Together with lots of other previously-Lucene-only committers, I helped to merge the svn repositories and rewrite parts of the build system. After that we started to move the analyzers to one place, added Solr factories for *all* available Lucene analyzers, and vice versa opened Solr's analyzers to Lucene users. We removed lots of deprecated code usage (which moved Solr from Lucene 2.9 to 3.0). This was especially the work of the Lucene committers who originally developed the new analysis API. Solr did not have many active developers at this time, so help from the Lucene side was welcome. So at this point, the merge helped both projects.

Problems started when some of us suggested "removing" features from Solr and moving them to Lucene Core - meaning faceting (I first mentioned that publicly at a conference, which some people disagreed with), function queries, schema support, clustering, dismax. My point of view, as originally only a "Lucene committer", is that Solr was and still is somehow dominated by one person who is afraid of losing functionality in Solr that he originally developed, and that this could reduce Solr's power on the market (yes, there is also a company behind it that mainly wants to sell consulting to Solr users [as this is of course easier to do], but that's just a side note).

I think that instead of splitting again, the Lucene TLP should think about better communication between the committers, allow different opinions on Solr's further development, and maybe elect a new PMC (as the current PMC was simply merged from Solr's and Lucene's, where conflicts are preprogrammed).
If the merged Lucene+Solr is not what the dominating person wants to have, he is free to fork Solr away from Apache (yes, it's open source, and you can sell/provide a forked version to customers with only super-duper features that separate it from Lucene - but I think this is already done - LW-Enterprise). But if most committers here want to help bring both Lucene and Solr to the top of search engines, they are free to do it at the ASF, with discussion and also lots of code refactoring - we are using SVN, so we always have a record of what was done. Reverting or not reverting is only political, nothing technical. And disagreement is also valid in an open source project, but disagreeing people should sometimes revise their opinions - this applies to a few more people here; I am also not always the best discussion partner (police is the executive... *g*). Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik > Seeley > Sent: Tuesday, April 26, 2011 8:50 PM > To: general@lucene.apache.org > Subject: [VOTE] Create Solr TLP > > A single merged project works only when people are relatively on the same > page, and when people feel it's mutually beneficial. Recent events make it > clear that that is no longer the case. > > Improvements to Solr have recently been blocked and reverted on the > grounds that the new functionality was not immediately available to non-Solr > users. > This was obviously never part of the original idea (well actually - it was > considered but rejected as too onerous). But the past doesn't matter as > much as the present - about how people choose to act and interpret things > today. > > https://issues.apache.org/jira/browse/SOLR-2272 > http://markmail.org/message/unrvjfudcbgqatsy > > Some people warned us against merging at the start, and I guess it turns out > they were right.
> > I no longer feel it's in Solr's best interests to remain under the same PMC as > Lucene-Java, and I know some other committers who have said they feel like > Lucene got the short end of the stick. But rather than arguing about who's > right (maybe both?) since enough of us feel it's no longer mutually beneficial, > we should stop fighting and just go our separate ways. > > Please VOTE to create a new Apache Solr TLP. > > Here's my +1 > > -Yonik
RE: Unable to download Snowball !
Hi, Lucene does not contain a class sbStemmer. You have to choose one of the following: http://lucene.apache.org/java/3_1_0/api/contrib-analyzers/org/tartarus/snowball/ext/package-summary.html Your configuration of Snowball seems to be wrong, so it tries to load a class that does not exist (sbStemmer). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Neil Ghosh [mailto:neil.gh...@gmail.com] > Sent: Sunday, April 17, 2011 3:27 PM > To: general@lucene.apache.org; Jakarta General List > Subject: Re: Unable to download Snowball ! > > I added the library in contrib directory > (lucene-3.1.0/contrib/analyzers/common/lucene-analyzers-3.1.0.jar) > > But getting the following runtime exception > > Exception in thread "main" java.lang.RuntimeException: > java.lang.ClassNotFoundException: org.tartarus.snowball.ext.sbStemmer > > On Sun, Apr 17, 2011 at 6:43 PM, Neil Ghosh wrote: > > > Rahul, What is this link about ? > > > > > > On Sun, Apr 17, 2011 at 6:42 PM, Rahul Akolkar > wrote: > > > >> On Sun, Apr 17, 2011 at 8:49 AM, Neil Ghosh > wrote: > >> > I have already downloaded lucene but where is the snowball analyzer > >> > in > >> that > >> > ? > >> > The one in contrib directory is throwing runtime exception > >> > > >> > >> > >> http://lucene.apache.org/mail.html > >> > >> -Rahul > >> > >> > >> > On Sun, Apr 17, 2011 at 6:16 PM, Rahul Akolkar > >> > >> > > >> > wrote: > >> >> > >> >> On Sun, Apr 17, 2011 at 8:34 AM, Neil Ghosh > >> wrote: > >> >> > Hi, > >> >> > > >> >> > Unable to download snowball analyzer I am trying to use snowball > >> >> > analyzer for my search engine but unable > >> to > >> >> > download the library.
> >> >> > > >> >> > >> >> > >> >> http://lucene.apache.org/ > >> >> > >> >> -Rahul > >> >> > >> >> > >> >> > Please help > >> >> > > >> >> > -- > >> >> > Thanks and Regards > >> >> > Neil > >> >> > http://neilghosh.com > >> >> > > >> > > >> > > >> > > >> > -- > >> > Thanks and Regards > >> > Neil > >> > http://neilghosh.com > >> > > >> > > >> > > >> > > >> > > > > > > > > -- > > Thanks and Regards > > Neil > > http://neilghosh.com > > > > > > > > > > > -- > Thanks and Regards > Neil > http://neilghosh.com
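To illustrate the class names the answer above points at: a hedged sketch using EnglishStemmer, one of the real stemmer classes in org.tartarus.snowball.ext shipped in the contrib-analyzers jar (`sbStemmer` does not exist; the word "running" is just an example input).

```java
import org.tartarus.snowball.ext.EnglishStemmer;

public class StemmerDemo {
    public static void main(String[] args) {
        // Stem one word with the English (Porter2) Snowball stemmer.
        EnglishStemmer stemmer = new EnglishStemmer();
        stemmer.setCurrent("running");
        stemmer.stem();
        System.out.println(stemmer.getCurrent());
    }
}
```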
RE: Range query rewrite incorrect
That's all correct: the rewrite is done like this, and what comes out after rewrite is purely internal and may change from query to query or index to index (as the best rewrite method is chosen based on your index contents). Your query has one problem: the way you use it, Lucene only works with "string" ranges, so a pure numeric range without padding the terms cannot work. If you want real numeric queries, use NumericRangeQuery in combination with NumericField. But QueryParser cannot handle these, so you have to build the queries from code (instantiate NumericRangeQuery in your code). Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: jpogorman [mailto:jp.ogor...@documatics.com] > Sent: Monday, March 21, 2011 6:14 PM > To: general@lucene.apache.org > Subject: Range query rewrite incorrect > > Hello, > > I have a range query like this... > TYPE:DOCUMENT AND docSize:[0 TO 1048576] > > I'm using the MultiFieldQueryParser and when I call Rewrite on this I get back > the following text... > +TYPE:document +ConstantScore(QueryWrapperFilter()) > > This text will not return any results and I am not sure where the > QueryWrapperFilter() portion of text has come from. It has something to do > with the 1048576 value and similar values. > > If I search using TYPE:DOCUMENT AND docSize:[0 TO 500] the rewrite will > generate a query that works... > +TYPE:document +ConstantScore(docSize:[0 TO 500]) > > Can anyone shed some light on why this is happening and how to avoid it? > > Thanks, > JP > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Range- > query-rewrite-incorrect-tp2710807p2710807.html > Sent from the Lucene - General mailing list archive at Nabble.com.
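A sketch of the numeric approach suggested above, against the Lucene 3.x API. The field name "docSize" is taken from the question; the indexed value and everything else are illustrative.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.search.NumericRangeQuery;

public class NumericRangeExample {
    public static void main(String[] args) {
        // At index time: store the size as a true numeric (trie-encoded)
        // field instead of a plain string term.
        Document doc = new Document();
        doc.add(new NumericField("docSize").setLongValue(19L * 1024 * 1024));

        // At search time: build the range query in code - QueryParser
        // cannot produce a NumericRangeQuery from query syntax.
        NumericRangeQuery<Long> query =
            NumericRangeQuery.newLongRange("docSize", 0L, 1048576L, true, true);
        System.out.println(query);
    }
}
```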
RE: Extending Lucene Query Parser for XML-based queries
Hi, Maybe you are looking for that: http://lucene.apache.org/java/3_0_3/api/contrib-xml-query-parser/index.html By the way: Lucene 2.3.2 is very old (you referenced it in [1]). Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: aneuryzma [mailto:patrick.divia...@gmail.com] > Sent: Monday, February 28, 2011 10:28 AM > To: general@lucene.apache.org > Subject: Extending Lucene Query Parser for XML-based queris > > I need to extend Lucene Query Parser[1] to deal with XML-based queries. > > Is there any ready implementation I can use for it? > > thanks > > [1]: http://lucene.apache.org/java/2_3_2/queryparsersyntax.html > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Extending-Lucene-Query-Parser-for- > XML-based-queris-tp2593571p2593571.html > Sent from the Lucene - General mailing list archive at Nabble.com.
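For the archives, a hedged sketch of the linked contrib XML query parser. The class and method names follow the linked javadocs; the XML snippet and field names are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;
import org.apache.lucene.xmlparser.CoreParser;

public class XmlQueryDemo {
    public static void main(String[] args) throws Exception {
        // An XML query in the contrib parser's syntax (illustrative fields).
        String xml =
            "<BooleanQuery>" +
            "<Clause occurs='must'>" +
            "<TermQuery fieldName='title'>lucene</TermQuery>" +
            "</Clause>" +
            "</BooleanQuery>";

        // CoreParser translates the XML into a regular Lucene Query object.
        CoreParser parser = new CoreParser("contents",
                new StandardAnalyzer(Version.LUCENE_30));
        Query q = parser.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        System.out.println(q);
    }
}
```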
RE: Can I use MatchAllDocsQuery and specify terms to search in multiple fields of my documents?
A MatchAllDocsQuery simply matches all non-deleted documents, exactly as the name suggests, so what are you trying to do? It makes no sense to add a restriction to this query, as it always returns everything. The parameter is only used for scoring, to sort the documents according to the boost factor of the given field name. If you want to search for multiple terms in different fields, combine one or more TermQuerys in a BooleanQuery. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: aneuryzma [mailto:patrick.divia...@gmail.com] > Sent: Monday, February 28, 2011 2:13 PM > To: general@lucene.apache.org > Subject: Can I use MatchAllDocsQuery and and specify terms to search in > multiple fields of my documents ? > > Can I use MatchAllDocsQuery and and specify terms to search in multiple > fields of my documents ? > > I've seen that this query takes only 1 parameter: MatchAllDocsQuery(String > normsField), so I was wondering if I can search for multiple terms on multiple > fields. > > thanks > > > -- > View this message in context: http://lucene.472066.n3.nabble.com/Can-I- > use-MatchAllDocsQuery-and-and-specify-terms-to-search-in-multiple-fields- > of-my-documents-tp2594905p2594905.html > Sent from the Lucene - General mailing list archive at Nabble.com.
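A sketch of the suggested combination, against the Lucene 3.x API. The field names "title" and "body" are invented for illustration.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class MultiFieldTerms {
    public static void main(String[] args) {
        // Require one term in each of two different fields
        // (hypothetical field names).
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("title", "lucene")), BooleanClause.Occur.MUST);
        query.add(new TermQuery(new Term("body", "search")), BooleanClause.Occur.MUST);
        System.out.println(query);
    }
}
```

Use Occur.SHOULD instead of Occur.MUST if any one of the terms matching should be enough.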