RE: Can't find Japanese words ending with numbers

2019-04-17 Thread Uwe Schindler
Please check here; you have to do it on your own:
http://lucene.apache.org/core/discussion.html#java-user-list-java-userluceneapacheorg

-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Gareth Harper 
> Sent: Wednesday, April 17, 2019 12:45 PM
> To: general@lucene.apache.org
> Subject: RE: Can't find Japanese words ending with numbers
> 
> Could someone please take me off this mailing list.
> 
> -Original Message-
> From: Antonio Facciorusso 
> Sent: 17 April 2019 11:05
> To: us...@jackrabbit.apache.org; general@lucene.apache.org
> Subject: Can't find Japanese words ending with numbers
> 
> Dear all,
> 
> I'm using Jackrabbit 2.16.1 and Lucene 3.6.2.
> 
> I have a node of type "mynodetype" with a property named "description"
> whose value is "横浜第2センタ". If I perform a full-text search using
> "jcr:contains" with the term '横浜第2*', the following query returns 0
> results:
> 
> "//element(*,mynodetype)[(jcr:contains(., '横浜第2*'))]"
> 
> while all of the following work correctly and return at least one result:
> 
> "//element(*,mynodetype)[(jcr:contains(., '横浜第2センタ*'))]"
> "//element(*,mynodetype)[(jcr:contains(., '横浜第*'))]"
> "//element(*,mynodetype)[(jcr:contains(., '2センタ*'))]"
> "//element(*,mynodetype)[(jcr:contains(., 'センタ*'))]"
> 
> I tried using both the default analyzer and the Japanese one
> (https://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/ja/JapaneseAnalyzer.html).
> 
> This is the content of my indexingConfiguration.xml file:
> 
> <?xml version="1.0"?>
> <!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.1.dtd">
> <configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
>   <index-rule nodeType="mynodetype">
>     <property isRegexp="true">.*:[^_]+</property>
>     <property isRegexp="true">.*:resources_data_[^_]+</property>
>     <property isRegexp="true">.*:resources_data[^_]+</property>
>     <property isRegexp="true">.*:resources_(?!data)[^_]+</property>
>     <property isRegexp="true">.*:resources[^_]+_[^_]+</property>
>     <property isRegexp="true">.*:(?!resources)[^_]+_[^_]+</property>
>   </index-rule>
> </configuration>
> 
> Should I use a different configuration/analyzer? Is it a bug?
> 
> Thank you.
> 
> Best regards,
> Antonio.
> 
> Antonio Facciorusso
> WebRainbow(r) Software Analyst & Developer
> 
> P +39 051 8550 562
> M +39 335 1219330
> E a.faccioru...@westpole.it
> W https://westpole.webex.com/meet/A.Facciorusso
> A Via Ettore Cristoni, 84 - 40033 Casalecchio di Reno
> 
> 
> 



Re: Noticed performance degrade from lucene-7.5.0 to lucene-8.0.0

2019-04-14 Thread Uwe Schindler
Without further information we can't help here. We would need to know the type of 
queries (conjunction, disjunction, phrase, ...). There are significant changes 
that may cause some queries to be slower, while others can run up to 50 times 
faster if the exact number of results is not needed; see 
https://www.elastic.co/blog/faster-retrieval-of-top-hits-in-elasticsearch-with-block-max-wand
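
For illustration, a minimal sketch (assuming Lucene 8.x and an IndexSearcher 
named "searcher"): you can get exact hit counts back, trading away the new 
skipping optimization:

TopScoreDocCollector collector = TopScoreDocCollector.create(10, Integer.MAX_VALUE);
searcher.search(query, collector);
TopDocs topDocs = collector.topDocs(); // totalHits is now exact, but queries may be slower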

Uwe

On April 14, 2019, 2:22:59 PM UTC, Khurram Shehzad wrote:
>Hi All,
>
>I have recently updated from lucene-7.5.0 to lucene-8.0.0, but I
>noticed a considerable performance degradation. Queries that used to
>execute in 18 to 24 milliseconds now take 74 to 110 milliseconds.
>
>Any suggestion please?
>
>Regards,
>Khurram

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de

RE: [ANNOUNCE] Apache Lucene 8.0.0 released

2019-03-14 Thread Uwe Schindler
Yeah! It's finally done. I am a bit sad that the new query short-circuiting 
is not usable from Solr at the moment.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: jim ferenczi 
> Sent: Thursday, March 14, 2019 1:16 PM
> To: general@lucene.apache.org; d...@lucene.apache.org; java-
> u...@lucene.apache.org
> Subject: [ANNOUNCE] Apache Lucene 8.0.0 released
> 
> 14 March 2019, Apache Lucene™ 8.0.0 available
> 
> The Lucene PMC is pleased to announce the release of Apache Lucene 8.0.0.
> 
> Apache Lucene is a high-performance, full-featured text search engine
> library written entirely in Java. It is a technology suitable for nearly
> any application that requires full-text search, especially cross-platform.
> 
> This release contains numerous bug fixes, optimizations, and improvements,
> some of which are highlighted below. The release is available for immediate
> download at:
> 
> http://lucene.apache.org/core/mirrors-core-latest-redir.html
> 
> Lucene 8.0.0 Release Highlights:
> 
> Query execution
> Term queries, phrase queries and boolean queries introduced a new
> optimization that enables efficient skipping over non-competitive documents
> when the total hit count is not needed. Depending on the exact query and
> data distribution, queries might run anywhere from a few percent slower to
> many times faster, especially term queries and pure disjunctions.
> In order to support this enhancement, some API changes have been made:
>  * TopDocs.totalHits is no longer a long but an object that gives a lower
> bound of the actual hit count.
>  * IndexSearcher's search and searchAfter methods now only compute total
> hit counts accurately up to 1,000 in order to enable this optimization by
> default.
>  * Queries are now required to produce non-negative scores.
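> 
> For illustration, a minimal sketch of reading the new hit count (index and
> searcher setup omitted; "searcher" and "query" are assumptions):
> 
> TopDocs topDocs = searcher.search(query, 10);
> long count = topDocs.totalHits.value;   // exact up to 1,000, else a lower bound
> boolean exact = topDocs.totalHits.relation == TotalHits.Relation.EQUAL_TO;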
> 
> Codecs
>  * Postings now index score impacts alongside skip data. This is how term
> queries optimize collection of top hits when hit counts are not needed.
>  * Doc values introduced jump tables, so that advancing runs in constant
> time. This is especially helpful on sparse fields.
>  * The terms index FST is now loaded off-heap for non-primary-key fields
> using MMapDirectory, reducing heap usage for such fields.
> 
> Custom scoring
> The new FeatureField allows efficient integration of static features such
> as a pagerank into the score. Furthermore, the new
> LongPoint#newDistanceFeatureQuery and LatLonPoint#newDistanceFeatureQuery
> methods allow boosting by recency and geo-distance respectively. These new
> helpers are optimized for the case when total hit counts are not needed.
> For instance, if the pagerank has a significant weight in your scores, then
> Lucene might be able to skip over documents that have a low pagerank value.
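> 
> For illustration, a minimal sketch (field and feature names are made up):
> 
> // index time: store the static feature alongside the document
> doc.add(new FeatureField("features", "pagerank", pagerankValue));
> // search time: let the feature boost the score of the main query
> BooleanQuery.Builder b = new BooleanQuery.Builder();
> b.add(mainQuery, BooleanClause.Occur.MUST);
> b.add(FeatureField.newSaturationQuery("features", "pagerank"),
>     BooleanClause.Occur.SHOULD);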
> 
> Further details of changes are available in the change log available at:
> http://lucene.apache.org/core/8_0_0/changes/Changes.html
> 
> Please report any feedback to the mailing lists (
> http://lucene.apache.org/core/discussion.html)
> 
> Note: The Apache Software Foundation uses an extensive mirroring network
> for distributing releases. It is possible that the mirror you are using may
> not have replicated the release yet. If that is the case, please try
> another mirror. This also applies to Maven access.



Re: Customize Nested Query

2018-12-23 Thread Uwe Schindler
Why not TermInSetQuery?
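
For illustration, a minimal sketch (the field name "id" and the string list 
are assumptions):

List<BytesRef> terms = new ArrayList<>();
for (String s : myHalfMillionStrings) {
  terms.add(new BytesRef(s));
}
Query q = new TermInSetQuery("id", terms); // constant-score match against the set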

Uwe

On December 23, 2018, 5:10:02 AM UTC, Khurram Shehzad wrote:
>Hi,
>
>I have a requirement for a customized match of a string field against a
>list of 0.5M elements.
>
>FunctionQuery and FunctionMatchQuery look appropriate for my need, but
>they seem to only support Double whereas I need String support.
>
>Any idea please?
>
>Regards,
>Khurram

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de

[SECURITY] CVE-2018-8026: XXE vulnerability due to Apache Solr configset upload (exchange rate provider config / enum field config / TIKA parsecontext)

2018-07-04 Thread Uwe Schindler
CVE-2018-8026: XXE vulnerability due to Apache Solr configset upload
(exchange rate provider config / enum field config / TIKA parsecontext)

Severity: High

Vendor:
The Apache Software Foundation

Versions Affected:
Solr 6.0.0 to 6.6.4
Solr 7.0.0 to 7.3.1

Description:
The details of this vulnerability were reported by mail to the Apache
security mailing list.
This vulnerability relates to an XML external entity expansion (XXE) in Solr
config files (currency.xml, enumsConfig.xml referred from schema.xml,
TIKA parsecontext config file). In addition, Xinclude functionality provided
in these config files is also affected in a similar way. The vulnerability can
be used as XXE using file/ftp/http protocols in order to read arbitrary
local files from the Solr server or the internal network. The manipulated
files can be uploaded as configsets using Solr's API, allowing the
vulnerability to be exploited. See [1] for more details.

Mitigation:
Users are advised to upgrade to either the Solr 6.6.5 or Solr 7.4.0 release, both
of which address the vulnerability. Once the upgrade is complete, no other steps
are required. Those releases only allow external entities and XIncludes that
refer to local files / ZooKeeper resources below the Solr instance directory
(using Solr's ResourceLoader); usage of absolute URLs is denied. Keep in
mind that external entities and XInclude are explicitly supported to better
structure config files in large installations. Before Solr 6 this was not a
problem, as config files were not accessible through the APIs.

If users are unable to upgrade to Solr 6.6.5 or Solr 7.4.0 then they are
advised to make sure that Solr instances are only used locally without access
to public internet, so the vulnerability cannot be exploited. In addition,
reverse proxies should be guarded to not allow end users to reach the
configset APIs. Please refer to [2] on how to correctly secure Solr servers.

Solr 5.x and earlier are not affected by this vulnerability; those versions
do not allow uploading configsets via the API. Nevertheless, users should
upgrade those versions as soon as possible, because there may be other ways
to inject config files through file upload functionality of the old web
interface. Those versions are no longer maintained, so no deep analysis was
done.

Credit:
Yuyang Xiao, Ishan Chattopadhyaya

References:
[1] https://issues.apache.org/jira/browse/SOLR-12450
[2] https://wiki.apache.org/solr/SolrSecurity

-
Uwe Schindler
uschind...@apache.org 
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/




[SECURITY] CVE-2018-8010: XXE vulnerability due to Apache Solr configset upload

2018-05-21 Thread Uwe Schindler
CVE-2018-8010: XXE vulnerability due to Apache Solr configset upload

Severity: High

Vendor:
The Apache Software Foundation

Versions Affected:
Solr 6.0.0 to 6.6.3
Solr 7.0.0 to 7.3.0

Description:
The details of this vulnerability were reported internally by one of Apache
Solr's committers.
This vulnerability relates to an XML external entity expansion (XXE) in Solr
config files (solrconfig.xml, schema.xml, managed-schema). In addition,
Xinclude functionality provided in these config files is also affected in a
similar way. The vulnerability can be used as XXE using file/ftp/http
protocols in order to read arbitrary local files from the Solr server or the
internal network. See [1] for more details.

Mitigation:
Users are advised to upgrade to either the Solr 6.6.4 or Solr 7.3.1 release, both
of which address the vulnerability. Once the upgrade is complete, no other steps
are required. Those releases only allow external entities and XIncludes that
refer to local files / ZooKeeper resources below the Solr instance directory
(using Solr's ResourceLoader); usage of absolute URLs is denied. Keep in
mind that external entities and XInclude are explicitly supported to better
structure config files in large installations. Before Solr 6 this was not a
problem, as config files were not accessible through the APIs.

If users are unable to upgrade to Solr 6.6.4 or Solr 7.3.1 then they are
advised to make sure that Solr instances are only used locally without access
to public internet, so the vulnerability cannot be exploited. In addition,
reverse proxies should be guarded to not allow end users to reach the
configset APIs. Please refer to [2] on how to correctly secure Solr servers.

Solr 5.x and earlier are not affected by this vulnerability; those versions
do not allow uploading configsets via the API. Nevertheless, users should
upgrade those versions as soon as possible, because there may be other ways
to inject config files through file upload functionality of the old web
interface. Those versions are no longer maintained, so no deep analysis was
done.

Credit:
Ananthesh, Ishan Chattopadhyaya

References:
[1] https://issues.apache.org/jira/browse/SOLR-12316
[2] https://wiki.apache.org/solr/SolrSecurity

-
Uwe Schindler
uschind...@apache.org 
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/




Re: Manipulate stored string in Lucene

2018-05-08 Thread Uwe Schindler
Oh, it's Solr? Then it's not easily possible. Plain Lucene works like that.

Uwe

On May 9, 2018, 6:09:42 AM UTC, Uwe Schindler wrote:
>Hi,
>
>You don't need a second field name: you can first add the indexed
>field with stored=false and then add a second instance with the same
>field name holding the original stored content, but not indexed. If you
>want to have docvalues, the same can be done for docvalues. Internally,
>Lucene does it like that anyway; adding a field that is stored and
>indexed at the same time is just a convenience.
>
>Uwe
>
>On May 9, 2018, 5:57:40 AM UTC, "Pachzelt, Adrian" wrote:
>>Dear all,
>>
>>currently I am reading text fields that contain xml text. Hence, the
>>solr input may look like this:
>>
>><sec sec-type="Introduction"
>>id="SECID0E4F">
>><title>Introduction</title>
>></sec>
>>
>>
>>With all “<” and “>” escaped.
>>I wrote a tokenizer that indexes the tag attributes (e.g.
>>sec-type=”Introduction”) on the position of the tagged word
>>(“Introduction” in this case) and hence I need the HTML tags when
>>indexing. However, I want to strip the HTML in the stored string that
>>is shown to the user on a query. So far, I figured out that the index
>>and the stored string are separated. Thus, I thought it should be
>>possible to manipulate the stored string after indexing.
>>
>>Is there a way to do so? I would prefer to manipulate the stored
>string
>>and not introduce a second field with the plain text in the input
>file.
>>
>>I am glad for any help!
>>
>>Best Regards,
>>
>>Adrian
>>
>>---
>>Adrian Pachzelt
>>- Fachinformationsdienst Biodiversitaetsforschung -
>>- Hosting von Open Access-Zeitschriften -
>>Universitaetsbibliothek Johann Christian Senckenberg
>>Bockenheimer Landstr. 134-138
>>60325 Frankfurt am Main
>>Tel. 069/798-39382
>>a.pachz...@ub.uni-frankfurt.de<mailto:a.pachz...@ub.uni-frankfurt.de>
>>---
>
>--
>Uwe Schindler
>Achterdiek 19, 28357 Bremen
>https://www.thetaphi.de

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de

Re: Manipulate stored string in Lucene

2018-05-08 Thread Uwe Schindler
Hi,

You don't need a second field name: you can first add the indexed field with 
stored=false and then add a second instance with the same field name holding the 
original stored content, but not indexed. If you want to have docvalues, the 
same can be done for docvalues. Internally, Lucene does it like that anyway; 
adding a field that is stored and indexed at the same time is just a convenience.
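
For illustration, a minimal sketch (field name and values are assumptions):

Document doc = new Document();
doc.add(new TextField("body", originalXmlText, Field.Store.NO)); // indexed, not stored
doc.add(new StoredField("body", strippedPlainText));             // stored, not indexed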

Uwe

On May 9, 2018, 5:57:40 AM UTC, "Pachzelt, Adrian" wrote:
>Dear all,
>
>currently I am reading text fields that contain xml text. Hence, the
>solr input may look like this:
>
><sec sec-type="Introduction"
>id="SECID0E4F">
><title>Introduction</title>
></sec>
>
>
>With all “<” and “>” escaped.
>I wrote a tokenizer that indexes the tag attributes (e.g.
>sec-type=”Introduction”) on the position of the tagged word
>(“Introduction” in this case) and hence I need the HTML tags when
>indexing. However, I want to strip the HTML in the stored string that
>is shown to the user on a query. So far, I figured out that the index
>and the stored string are separated. Thus, I thought it should be
>possible to manipulate the stored string after indexing.
>
>Is there a way to do so? I would prefer to manipulate the stored string
>and not introduce a second field with the plain text in the input file.
>
>I am glad for any help!
>
>Best Regards,
>
>Adrian
>
>---
>Adrian Pachzelt
>- Fachinformationsdienst Biodiversitaetsforschung -
>- Hosting von Open Access-Zeitschriften -
>Universitaetsbibliothek Johann Christian Senckenberg
>Bockenheimer Landstr. 134-138
>60325 Frankfurt am Main
>Tel. 069/798-39382
>a.pachz...@ub.uni-frankfurt.de<mailto:a.pachz...@ub.uni-frankfurt.de>
>---

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de

[SECURITY] CVE-2018-1308: XXE attack through Apache Solr's DIH's dataConfig request parameter

2018-04-08 Thread Uwe Schindler
CVE-2018-1308: XXE attack through Apache Solr's DIH's dataConfig request 
parameter

Severity: Major

Vendor:
The Apache Software Foundation

Versions Affected:
Solr 1.2 to 6.6.2
Solr 7.0.0 to 7.2.1

Description:
The details of this vulnerability were reported to the Apache Security mailing 
list. 

This vulnerability relates to an XML external entity expansion (XXE) in the
`&dataConfig=` parameter of Solr's DataImportHandler. It can be
used as XXE using file/ftp/http protocols in order to read arbitrary local
files from the Solr server or the internal network. See [1] for more details.

Mitigation:
Users are advised to upgrade to either the Solr 6.6.3 or Solr 7.3.0 release, both
of which address the vulnerability. Once the upgrade is complete, no other steps
are required. Those releases disable external entities in anonymous XML files
passed through this request parameter.

If users are unable to upgrade to Solr 6.6.3 or Solr 7.3.0 then they are
advised to disable data import handler in their solrconfig.xml file and
restart their Solr instances. Alternatively, if Solr instances are only used
locally without access to public internet, the vulnerability cannot be used
directly, so it may not be required to update, and instead reverse proxies or
Solr client applications should be guarded to not allow end users to inject
`dataConfig` request parameters. Please refer to [2] on how to correctly
secure Solr servers.

Credit:
麦 香浓郁

References:
[1] https://issues.apache.org/jira/browse/SOLR-11971
[2] https://wiki.apache.org/solr/SolrSecurity

-
Uwe Schindler
uschind...@apache.org 
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/




Re: Register custom TokenizerFactory with Maven

2018-03-22 Thread Uwe Schindler
Hi,

You have to add a META-INF/services file to the JAR file listing all factories 
contained in that JAR. More info is in the documentation of 
java.util.ServiceLoader: 
https://docs.oracle.com/javase/8/docs/api/java/util/ServiceLoader.html

The name to use for lookup is derived from the simple class name: basically, the 
"TokenizerFactory" (or "TokenFilterFactory"/"CharFilterFactory") suffix is removed 
and the remaining name is matched case-insensitively.
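
For illustration, a minimal sketch (package and class names are hypothetical): 
with Maven, add a file 
src/main/resources/META-INF/services/org.apache.lucene.analysis.util.TokenizerFactory 
containing the single line

com.example.TaggedTextTokenizerFactory

After building the JAR, tokenizerFactory("TaggedText") can resolve the factory 
by its SPI name.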

Uwe

On March 22, 2018, 12:25:41 PM UTC, "Pachzelt, Adrian" wrote:
>Hi everybody,
>
>I am currently writing a custom TokenizerFactory for Lucene. However,
>as far as I understood, Tokenizer are called by their name like this
>for the StandardTokenizer:
>tokenizerFactory("Standard").create(newAttributeFactory());
>
>Accordingly, my code looks like this:
>tokenizerFactory("TaggedText").create(newAttributeFactory());
>
>I apply Maven for compiling my code. Where do I need to register my
>Factory class, since, as expected, I get the compiling error:
>
>“ A SPI class of type org.apache.lucene.analysis.util.TokenizerFactory
>with name ‘TaggedText’ does not exist. You need to add the
>corresponding JAR file supporting this SPI to your classpath.”
>
>How can I do this?
>
>I am glad for any advice!
>
>Thanks a lot!
>
>Cheers,
>
>Adrian

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de

FOSS Backstage Micro Summit on Monday in Berlin

2017-11-17 Thread Uwe Schindler
Hi,

It's already a bit late, but for all people who are visiting Germany next week and 
want to make a short trip to Berlin: there are still slots free at the FOSS 
Backstage Micro Summit. It is a mini conference on everything 
related to governance, collaboration, legal and economics within the scope of 
FOSS. The main event will take place as part of berlinbuzzwords 2018. We have 
invited a lot of speakers - also from the ASF!

https://www.foss-backstage.de/

Program:
https://www.foss-backstage.de/news/micro-summit-program-online-now

I hope to see you there,
Uwe

-----
Uwe Schindler
uschind...@apache.org 
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/




RE: RE: RE: Would docvalues be loaded into jvm?

2017-06-15 Thread Uwe Schindler
Yes.

-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: wangqinghuan [mailto:1095193...@qq.com]
> Sent: Thursday, June 15, 2017 12:21 PM
> To: general@lucene.apache.org
> Subject: Re: RE: RE: Would docvalues be loaded into jvm?
> 
> Does "hotspot" refer to the Java virtual machine?
> 
> 
> 
> ---Original---
> From: "Uwe Schindler [via
> Lucene]"
> Date: 2017/6/15 17:03:46
> To: "wangqinghuan"<1095193...@qq.com>;
> Subject: RE: RE: Would docvalues be loaded into jvm?
> 
> 
>  Hi,
> 
> There is no design document about that. Lucene uses MMAP for all index
> files since a long time ago. DocValues is just another implementation.
> Basically it uses IndexInput's methods to access the underlying data, which is
> memory mapped if you are on 64 bit platforms. For DocValues there are also
> positional reads available. There is not much stuff specifically for 
> docvalues,
> it is just a file format that supports column based access with positional
> reads. The mmap implementation is separated from this and a bit lower in
> the I/O layer of Lucene. Sorting is just a use case of DocValues, but it does
> not sort directly on the mmapped files, there are several abstractions
> inbetween (which are of course removed by the Hotspot optimizer).
> 
> Some information (a bit older, but still valid) is here:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> 
> Uwe
> 
> -
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de
> eMail: [hidden email]
> 
> > -Original Message-
> > From: wangqinghuan [mailto:[hidden email]]
> > Sent: Thursday, June 15, 2017 10:41 AM
> > To: [hidden email]
> > Subject: Re: RE: Would docvalues be loaded into jvm?
> >
> > hi
> > Is there any  design document on this aspect (sorting algorithm off mmap)?
> >
> >
> >
> > ---Original---
> > From: "Uwe Schindler [via
> > Lucene]"<[hidden email]>
> > Date: 2017/6/15 14:39:30
> > To: "wangqinghuan"<[hidden email]>;
> > Subject: RE: Would docvalues be loaded into jvm?
> >
> >
> >  Hi
> >
> > It works directly off  the mmapped files. It is not fully loaded into heap,
> only
> > some small control structures are allocated on heap. During sorting the
> > TopDocsCollector uses the memory mapped structures to uncompress and
> > lookup the sort values.
> >
> > Uwe
> >
> > -
> > Uwe Schindler
> > Achterdiek 19, D-28357 Bremen
> > http://www.thetaphi.de
> > eMail: [hidden email]
> >
> > > -Original Message-
> > > From: wangqinghuan [mailto:[hidden email]]
> > > Sent: Thursday, June 15, 2017 4:36 AM
> > > To: [hidden email]
> > > Subject: Would docvalues be loaded into jvm?
> > >
> > > hi
> > > I know that data is written into disk with the style of column-store if I
> > > enable doc-values for certain field.
> > > But I don't understand why sorting with docvalues doesn't increase the
> > load
> > > of jvm. whatever sorting algorithm , data would be loaded into  jvm to
> sort.
> > > This should be a high load for jvm when I sort all index  , but  no change
> > > for jvm in fact.  How does lucene sort with docvalues ? Can sort algorithm
> > > work directly based on the file (Mmap) ?
> > >
> > >
> > >



RE: RE: Would docvalues be loaded into jvm?

2017-06-15 Thread Uwe Schindler
Hi,

There is no design document about that. Lucene has used MMAP for all index files 
for a long time; DocValues is just another implementation. Basically it 
uses IndexInput's methods to access the underlying data, which is memory mapped 
if you are on 64-bit platforms. For DocValues there are also positional reads 
available. There is not much that is specific to docvalues; it is just a 
file format that supports column-based access with positional reads. The mmap 
implementation is separated from this and sits a bit lower in the I/O layer of 
Lucene. Sorting is just a use case of DocValues, but it does not sort directly 
on the mmapped files; there are several abstractions in between (which are of 
course removed by the Hotspot optimizer).

Some information (a bit older, but still valid) is here: 
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: wangqinghuan [mailto:1095193...@qq.com]
> Sent: Thursday, June 15, 2017 10:41 AM
> To: general@lucene.apache.org
> Subject: Re: RE: Would docvalues be loaded into jvm?
> 
> hi
> Is there any  design document on this aspect (sorting algorithm off mmap)?
> 
> 
> 
> ---Original---
> From: "Uwe Schindler [via
> Lucene]"
> Date: 2017/6/15 14:39:30
> To: "wangqinghuan"<1095193...@qq.com>;
> Subject: RE: Would docvalues be loaded into jvm?
> 
> 
>  Hi
> 
> It works directly off  the mmapped files. It is not fully loaded into heap, 
> only
> some small control structures are allocated on heap. During sorting the
> TopDocsCollector uses the memory mapped structures to uncompress and
> lookup the sort values.
> 
> Uwe
> 
> -
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> http://www.thetaphi.de
> eMail: [hidden email]
> 
> > -Original Message-
> > From: wangqinghuan [mailto:[hidden email]]
> > Sent: Thursday, June 15, 2017 4:36 AM
> > To: [hidden email]
> > Subject: Would docvalues be loaded into jvm?
> >
> > hi
> > I know that data is written into disk with the style of column-store if I
> > enable doc-values for certain field.
> > But I don't understand why sorting with docvalues doesn't increase the
> load
> > of jvm. whatever sorting algorithm , data would be loaded into  jvm to sort.
> > This should be a high load for jvm when I sort all index  , but  no change
> > for jvm in fact.  How does lucene sort with docvalues ? Can sort algorithm
> > work directly based on the file (Mmap) ?
> >
> >
> >



RE: Would docvalues be loaded into jvm?

2017-06-14 Thread Uwe Schindler
Hi

It works directly off the mmapped files. It is not fully loaded into heap; 
only some small control structures are allocated on heap. During sorting, the 
TopDocsCollector uses the memory-mapped structures to uncompress and look up the 
sort values.
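
For illustration, a minimal sketch of how sort values are read per index segment 
(the numeric docvalues field "price" is an assumption; Lucene 6.x random-access API):

for (LeafReaderContext leaf : reader.leaves()) {
  NumericDocValues dv = leaf.reader().getNumericDocValues("price");
  if (dv == null) continue;           // field missing in this segment
  for (int doc = 0; doc < leaf.reader().maxDoc(); doc++) {
    long value = dv.get(doc);         // decoded on demand from the mmapped file
  }
}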

Uwe

-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: wangqinghuan [mailto:1095193...@qq.com]
> Sent: Thursday, June 15, 2017 4:36 AM
> To: general@lucene.apache.org
> Subject: Would docvalues be loaded into jvm?
> 
> hi
> I know that data is written to disk in a column-store layout if I
> enable doc-values for a certain field.
> But I don't understand why sorting with docvalues doesn't increase the load
> on the JVM. Whatever the sorting algorithm, data would have to be loaded into
> the JVM to sort. This should put a high load on the JVM when I sort the whole
> index, but in fact the JVM shows no change. How does Lucene sort with
> docvalues? Can the sort algorithm work directly on the (mmapped) file?
> 
> 
> 



RE: Developing experimental "more advanced" analyzers

2017-05-30 Thread Uwe Schindler
Hi,

as you are using Elasticsearch, there is no need to implement an Analyzer 
instance. In general, this is never needed in plain Lucene either, as there is the 
class CustomAnalyzer that uses a builder pattern to construct an analyzer the 
same way Elasticsearch or Solr do.
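
For illustration, a minimal sketch of the builder (component names are examples):

Analyzer analyzer = CustomAnalyzer.builder()
    .withTokenizer("standard")
    .addTokenFilter("lowercase")
    .addTokenFilter("stop")
    .build();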

For your use case you need to implement a custom Tokenizer and/or several 
TokenFilters. In addition, you need to create the corresponding factory classes 
and bundle everything as an Elasticsearch plugin. I'd suggest asking on the 
Elasticsearch mailing lists about this. After that you can define your analyzer 
in the Elasticsearch mapping/index config.

The Tokenizer and TokenFilters can be implemented, e.g., as Robert Muir was 
telling you. The sentence handling can be done as a segmenting tokenizer subclass. 
Keep in mind that many tasks can be done with already existing TokenFilters 
and/or Tokenizers.

Lucene has no index support for POS tags; they are only used in the analysis 
chain. To somehow add them to the index, you may use a TokenFilter as the last 
stage that adds the POS tags to the term (e.g., term "Windmill" with POS 
"subject" could be combined in the last TokenFilter into a term called 
"Windmill#subject" and indexed like that). For keeping track of POS tags during 
analysis (between the tokenizers and token filters), you may need to define 
custom attributes.

Check the UIMA analysis module for more information how to do this.

Uwe

-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Christian Becker [mailto:christian.frei...@gmail.com]
> Sent: Monday, May 29, 2017 2:37 PM
> To: general@lucene.apache.org
> Subject: Developing experimental "more advanced" analyzers
> 
> Hi There,
> 
> I'm new to lucene (in fact im interested in ElasticSearch but in this case
> its related to lucene) and I want to make some experiments with some
> enhanced analyzers.
> 
> Indeed I have an external linguistic component which I want to connect to
> Lucene / EleasticSearch. So before I'm producing a bunch of useless code, I
> want to make sure that I'm going the right way.
> 
> The linguistic component needs at least a whole sentence as Input (at best
> it would be the whole text at once).
> 
> So as far as I can see I would need to create a custom Analyzer and
> overrride "createComponents" and "normalize".
> 
> Is that correct or am I on the wrong track?
> 
> Bests
> Chris



RE: Java version set to 1.8 for SOLR 6.4.0

2017-02-13 Thread Uwe Schindler
Hi,

SOLR_JAVA_HOME must point to the top-level directory of the JDK, not to the 
"java" command - and shell syntax does not allow spaces around the "=":

SOLR_JAVA_HOME="/opt/wml/jdk1.8.0_66"

Uwe

-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Uchit Patel [mailto:uchitspa...@yahoo.com.INVALID]
> Sent: Monday, February 13, 2017 11:14 AM
> To: general@lucene.apache.org; solr-u...@lucene.apache.org;
> jan@cominvent.com
> Subject: Re: Java version set to 1.8 for SOLR 6.4.0
> 
> 
> I have updated SOLR_JAVA_HOME in the following file:
> /opt/wml/solr-6.4.0/bin/solr.in.sh
> SOLR_JAVA_HOME = "/opt/wml/jdk1.8.0_66/jre/bin/java"
> But it is not working.
> Regards,
> Uchit Patel
> I have installed SOLR 6.4.0 on a Linux box. I have both Java 1.7.0 and 1.8.0 on
> the box. By default it points to 1.7.0; some other applications are using Java
> 1.7.0.
> I want to set Java 1.8.0 only for SOLR 6.4.0. What do I need to update for
> only SOLR 6.4.0 to use Java 1.8.0? I don't want to remove Java 1.7.0 because
> some other applications are using it.
> Thanks.
> Regards,
> Uchit Patel
> 
>   From: Uchit Patel 
>  To: "general@lucene.apache.org" ; "solr-
> u...@lucene.apache.org" ;
> "jan@cominvent.com" 
>  Sent: Monday, February 13, 2017 3:38 PM
>  Subject: Re: Java version set to 1.8 for SOLR 6.4.0
> 
> Hi ,
> I tried SOLR_JAVA_HOME = "/opt/wml/jdk1.8.0_66/jre/bin/java" but it is not
> working.
> Regards,
> Uchit Patel
> I have installed SOLR 6.4.0 on Linux box. I have Java 1.7.0 and 1.8.0 both on
> the box. By default it point to 1.7.0. Some other applications using 1.7.0 
> Java.
> I want to set Java 1.8.0 only for SOLR 6.4.0. What should I need to update for
> only SOLR 6.4.0 to hit Java 1.8.0. I don't want to remove Java 1.7.0 because
> some other applications using Java 1.7.0.
> Thanks.
> Regards,
> Uchit Patel
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 



RE: LongField when searched using classic QueryParser doesnot yield results

2017-01-11 Thread Uwe Schindler
Hi,

this is indeed related to that.

The problem is a missing "schema" in Lucene. If you index values using several 
different field types (like TextField vs. IntField/FloatField/DoubleField...), the 
information about how they were indexed is completely unknown to the query parser. 
The default query parser uses legacy code to create numeric ranges or 
numeric terms: it just treats them as text! If it searches on a numeric 
field using text terms, it won't find anything.

Solr and Elasticsearch maintain a schema of the index. So they subclass 
the query parser, override the protected getRangeQuery and getFieldQuery methods, 
and use their schema to create the correct query types. The default is to create 
TermQuery and TermRangeQuery, which won't work on numeric fields.

To fix this in your code you have to do something similar. YOU are the person 
who knows what the type of Field XY is. If XY is a numeric field, the query 
parser must check the field name and then build the correct query 
(NumericRangeQuery).
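
For illustration, a minimal sketch (assuming Lucene 5.x and a numeric long field 
named "price"):

QueryParser qp = new QueryParser("text", analyzer) {
  @Override
  protected Query getRangeQuery(String field, String part1, String part2,
      boolean startInclusive, boolean endInclusive) throws ParseException {
    if ("price".equals(field)) {
      return NumericRangeQuery.newLongRange(field, Long.valueOf(part1),
          Long.valueOf(part2), startInclusive, endInclusive);
    }
    return super.getRangeQuery(field, part1, part2, startInclusive, endInclusive);
  }
};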

Uwe

-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Amrit Sarkar [mailto:sarkaramr...@gmail.com]
> Sent: Wednesday, January 11, 2017 9:52 AM
> To: general@lucene.apache.org
> Cc: java-u...@lucene.apache.org
> Subject: Re: LongField when searched using classic QueryParser doesnot yield
> results
> 
> Hi Jaspreet,
> 
> Not sure whether this helps to answer your question as I didn't try to run
> the code:
> 
> From official guide:
> 
> > Within Lucene, each numeric value is indexed as a *trie* structure, where
> > each term is logically assigned to larger and larger pre-defined brackets
> > (which are simply lower-precision representations of the value). The step
> > size between each successive bracket is called the precisionStep,
> > measured in bits. Smaller precisionStep values result in larger number of
> > brackets, which consumes more disk space in the index but may result in
> > faster range search performance. The default value, 4, was selected for a
> > reasonable tradeoff of disk space consumption versus performance
> 
> 
> > If you only need to sort by numeric value, and never run range
> > querying/filtering, you can index using a precisionStep of
> > Integer.MAX_VALUE
> >
> <http://download.oracle.com/javase/6/docs/api/java/lang/Integer.html?is-
> external=true#MAX_VALUE>.
> > This will minimize disk space consumed.



RE: Lucene filter

2016-12-02 Thread Uwe Schindler
Hi,

You could use two query parsers, e.g., one for the user input and another one for 
the filters. Then combine the two parsed queries into a single query 
with an outer BooleanQuery. Having everything in one single string is quite 
uncommon for typical search application logic.
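
For illustration, a minimal sketch (field names and analyzer are assumptions):

Query userQuery = new QueryParser("title", analyzer).parse(userInput);
Query filterQuery = new QueryParser("repository", analyzer).parse("clinicaltrials");
BooleanQuery.Builder b = new BooleanQuery.Builder();
b.add(userQuery, BooleanClause.Occur.MUST);
b.add(filterQuery, BooleanClause.Occur.FILTER);
TopDocs hits = searcher.search(b.build(), 10);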

Uwe

-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Reda Kouba [mailto:redateksys...@gmail.com]
> Sent: Friday, December 2, 2016 8:33 AM
> To: general@lucene.apache.org
> Subject: Re: Lucene filter
> 
> Hi Mikhail,
> 
> Do you have any suggestion to transform a string to a query object?
> thanks,
> reda
> 
> 
> > On 2 Dec. 2016, at 18:26, Mikhail Khludnev  wrote:
> >
> > Hello,
> >
> > I don't think # is supported in query parsers, although it would be great.
> > So far, I have only seen it in toString().
> >
> > On Fri, Dec 2, 2016 at 9:30 AM, Bouadjenek mohamed reda <
> > redateksys...@gmail.com> wrote:
> >
> >> Hi All,
> >>
> >>
> >> I wanna use a filter into a query (BooleanClause.Occur.FILTER). For
> >> example, my query is:
> >>
> >> #repository:clinicaltrials +title:multipl
> >>
> >> It looks like when I build the query from this String, the filter is not
> >> working. In other words, the total hits in the first example below is
> >> different from total hits in the second example below. Please, does
> anyone
> >> know what wrong with this simple example?
> >>
> >> Example 1:
> >> String query = "#repository:clinicaltrials +title:multipl";
> >> QueryParser qr = new QueryParser("", new StandardAnalyzer());
> >> TopDocs hits = is.search(qr.parse(query), 1);
> >>
> >> Example 2:
> >> String[] fields = new String[]{"repository", "title"};
> >> BooleanClause.Occur[] allflags = new
> >> BooleanClause.Occur[]{BooleanClause.Occur.FILTER,
> >> BooleanClause.Occur.MUST};
> >> String[] query_text = new String[]{"clinicaltrials", "multipl"};
> >> Query finalQuery = MultiFieldQueryParser.parse(query_text, fields,
> >> allflags, new StandardAnalyzer());
> >> TopDocs hits = is.search(finalQuery, 1);
> >>
> >>
> >> thanks,
> >>
> >>
> >> Best,
> >> reda
> >>
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev



RE: Doing Range/NUmber queries

2016-08-15 Thread Uwe Schindler
Hi,

I don't understand your question. Filter queries no longer exist since Lucene 6! 
If you want to filter, use any query and pass it as BooleanClause.Occur.FILTER 
to a BooleanQuery. Done.

Those FILTER clauses may (depending on the query type) perform better than MUST 
clauses, because no score is calculated.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: lukes [mailto:mail2lok...@gmail.com]
> Sent: Saturday, August 13, 2016 11:04 PM
> To: general@lucene.apache.org
> Subject: RE: Doing Range/NUmber queries
> 
> Thanks Uwe,
> 
>   Quick follow up questions, would Filter query perform any better ? I hope
> performance would still be same, but just out of curosity.
> 
> Regards.
> 
> 
> 



RE: Doing Range/NUmber queries

2016-08-10 Thread Uwe Schindler
Hi,

Since Lucene 6, Filters as a separate class are gone (deprecated in Lucene 5). 
Every query can now be used as a filter. There are 2 possibilities:

- To apply as a filter next to other scoring queries, use a BooleanQuery with 
filter clauses (BooleanClause.Occur.FILTER) next to the standard scoring 
clauses (MUST or SHOULD). The FILTER clauses are standard queries.
- To execute a single query without scoring (constant score of 1), use 
ConstantScoreQuery
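
For illustration, a minimal sketch of both options (field names and terms are 
assumptions):

// option 1: a non-scoring FILTER clause next to a scoring clause
BooleanQuery.Builder b = new BooleanQuery.Builder();
b.add(new TermQuery(new Term("title", "lucene")), BooleanClause.Occur.MUST);
b.add(IntPoint.newRangeQuery("year", 2010, 2016), BooleanClause.Occur.FILTER);

// option 2: a single query with a constant score of 1
Query q = new ConstantScoreQuery(IntPoint.newRangeQuery("year", 2010, 2016));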

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: lukes [mailto:mail2lok...@gmail.com]
> Sent: Wednesday, August 10, 2016 1:44 AM
> To: general@lucene.apache.org
> Subject: Re: Doing Range/NUmber queries
> 
> Hi Michael,
> 
>   Do you know, if Filtered Queries are supported in Lucene ? I tried to
> find, but couldn't find anything relevant. So i have some queries which i
> want to apply as filter, don't want to contribute in the scoring.Would
> filter queries speed up the query process ? Can i combine filter and queries
> together ? Also during indexing do i need to mention anything special for
> fields that can be used in filters ?
> 
> Regards.
> 
> 
> 



Apache Solr and Tika used to index Panama Papers

2016-04-06 Thread Uwe Schindler
Hi all,

I just wanted to repost the following by Chris Mattmann from the TIKA list:

If you have been following the news you’ve seen the Panama papers and how the 
world’s rich and elite have been storing all their money offshore to hide it. 
Two of the ASF’s key technologies were used in uncovering that story and 
showing the world what was going on: Apache Tika and Apache Solr.

Solr was used for making the Terabytes of Panama Papers available to 
journalists. The preprocessing of the documents for indexing was done with Tika 
(maybe through the contrib/extraction module).

Here is the article by Forbes about that:
http://www.forbes.com/sites/thomasbrewster/2016/04/05/panama-papers-amazon-encryption-epic-leak

Uwe

-
Uwe Schindler
uschind...@apache.org 
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/




RE: Solr- Fuzzy Search

2016-03-07 Thread Uwe Schindler
Hi,

see here: https://cwiki.apache.org/confluence/display/solr/Highlighting
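
For illustration, a minimal sketch of the request parameters (the field name 
"text" is taken from your query):

q=text:agree~2&hl=true&hl.fl=text

The response then contains a "highlighting" section per matching document, in 
which the actually matched word variants (e.g. "agrea", "agref") are marked up.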

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: akshaymall [mailto:akshay.m...@eclerx.com]
> Sent: Monday, March 07, 2016 2:16 PM
> To: general@lucene.apache.org
> Subject: RE: Solr- Fuzzy Search
> 
> Hi Uwe,
> 
> Thanks for the reply.
> 
> Could you please help us with the code also?
> 
> We are using C# as the base language and the solr query we have built is as
> follows:
> 
>var facetResults = solr.Query(new SolrQuery("text: agree~2"), new
> QueryOptions
> {
> Rows = 10,
> Facet = new FacetParameters
> {
> Queries = new[]{
> new SolrFacetFieldQuery("extension"){
> Prefix = "",
> Limit = 5
> }
> }
> }
> });
> 
> Regards,
> 
> Akshay Mall
> 
> Senior Analyst,
> Financial Services Product Development,
> eClerx Services Limited
> 
> Phone: +91 9173435462 | +91 9167368827
> eClerx Services Limited [www.eClerx.com]
> 
> From: Uwe Schindler [via Lucene] [mailto:ml-
> node+s472066n4262093...@n3.nabble.com]
> Sent: Monday, March 07, 2016 6:36 PM
> To: Akshay Mall 
> Subject: RE: Solr- Fuzzy Search
> 
> Hi,
> 
> you can use the highlighter functionality. It will "mark" the hits in the
> document text.
> 
> Uwe
> 
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [hidden
> email]
> 
> > -Original Message-
> > From: akshaymall [mailto:[hidden
> email]]
> > Sent: Monday, March 07, 2016 1:46 PM
> > To: [hidden email]
> > Subject: Solr- Fuzzy Search
> >
> > I want to try the fuzzy search in Solr with a specific query.
> >
> > For example, I want to search all the words that match this query: "agree
> > ~2".
> >
> > Now using a simple query, I can find the documents that have the words
> > matching the above query. But I want to know the words that Solr has
> found
> > in the document.
> >
> > Example:
> >
> > Search Result:
> > 1. Sample1.pdf
> > 2. Sample2.pdf
> >
> > What I want as a result:
> >
> > 1. Sample1.pdf : agree, agrea
> > 2. Sample2.pdf : agref, agret
> >
> >
> >
> >
> 
> 
> 



RE: Solr- Fuzzy Search

2016-03-07 Thread Uwe Schindler
Hi,

you can use the highlighter functionality. It will "mark" the hits in the 
document text.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: akshaymall [mailto:akshay.m...@eclerx.com]
> Sent: Monday, March 07, 2016 1:46 PM
> To: general@lucene.apache.org
> Subject: Solr- Fuzzy Search
> 
> I want to try the fuzzy search in Solr with a specific query.
> 
> For example, I want to search all the words that match this query: "agree
> ~2".
> 
> Now using a simple query, I can find the documents that have the words
> matching the above query. But I want to know the words that Solr has found
> in the document.
> 
> Example:
> 
> Search Result:
> 1. Sample1.pdf
> 2. Sample2.pdf
> 
> What I want as a result:
> 
> 1. Sample1.pdf : agree, agrea
> 2. Sample2.pdf : agref, agret
> 
> 
> 
> 



RE: find out rank of document within a lucene query (with custom sorting)

2016-01-06 Thread Uwe Schindler
Hi,

If you only want to get that information for a single, specific document, you 
can use the "explain" functionality - but you need the internal document ID for 
that. Alternatively, execute the query once (for the whole result set) and an 
additional time with a filter on your external ID applied. The only result 
would be the filtered document, but with the same score as in the first result set.
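
For illustration, a minimal sketch of the first option (obtaining the internal 
docID, e.g. from a TopDocs hit, is up to you):

Explanation expl = searcher.explain(query, internalDocId);
float score = expl.getValue(); // score of that single document for the query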

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Anton [mailto:anton.te...@gmail.com]
> Sent: Wednesday, January 06, 2016 10:27 AM
> To: general@lucene.apache.org
> Subject: find out rank of document within a lucene query (with custom
> sorting)
> 
> Hi,
> 
> I am interested to find out what rank a document holds within a search
> query among all documents within the index.
> 
> In more detail: I would like to create a query with a sorting. But I am not
> interested in getting, for instance, the top 10 search hits for that query
> and sorting. I am only interested what rank a certain document from the
> index would be in the result of that query and sorting. It could be the
> 42nd from 1024 documents in that result. I could identify the document via
> an ID field.
> 
> Is there a possibility to do get that rank number efficiently?
> (A simple, but probably time consuming, solution would be: Get all
> documents according to query and sorting. Loop through the result list and
> find the specific document. Return the counter of the loop.)
> 
> Here is a similar question on stackoverflow (without a satisfying answer):
> http://stackoverflow.com/questions/7924146/is-there-a-way-for-solr-
> lucene-to-return-the-ranks-of-selected-documents-instead
> 
> Have a nice day,
> Anton.



CfP about Geospatial Track at ApacheCon, Vancouver

2016-01-04 Thread Uwe Schindler
Hi Committers, hi Lucene users,

On the next ApacheCon in Vancouver, Canada (May 9 - 13 2016), there will be a 
track about geospatial data. The track is organized by Chris Mattmann together 
with George Percivall of the OGC (Open Geospatial Consortium). As I am also a 
member of OGC, they invited me to ask the Lucene Community to propose talks. 
Apache Lucene, Solr, and Elasticsearch have great geospatial features, so this 
would be a good opportunity to present them. This is especially important because the 
current OGC standards are very RDBMS-focused (like filter definitions, 
services,...), so we can use the track to talk with OGC representatives to 
better match OGC standards with full text search.

I am not sure if I can manage to get to Vancouver, but the others are kindly 
invited to submit talks. It is not yet sure if the track will be part of 
ApacheCon Core or ApacheCon BigData. I will keep you informed. If you have talk 
suggestions, please send them to me or Chris Mattmann. Alternatively, submit 
them to the Big Data track @ 
http://events.linuxfoundation.org/events/apache-big-data-north-america/program/cfp
 (and mention geospatial track).

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de





RE: Lucene TypeAttribute not used during querying

2015-09-23 Thread Uwe Schindler
Not as attributes.

As said before, you have to write a separate TokenFilter at the end of your 
indexing chain that collects the attributes you want to index and adds them to 
the term:
- Append the type to the term: the TokenFilter's incrementToken does something 
like termAtt.append('#').append(typeAtt.type()); (see the sketch below)
- Create a payload out of it: see the payload package in analyzers-common for 
examples.

After that you can query using the "extended term" or using payload queries.

You may ask how to query then: appending the type to the 
term itself (as in "term#type" above) is no problem during search, because 
the analyzer is also used on the search side. The search query gets analyzed, and 
the last TokenFilter of the analyzer will add the type to the term as described 
before. The query will then hit all terms with that type.
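
For illustration, a minimal sketch of the first option as a complete TokenFilter:

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

public final class TypeAppendingFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);

  public TypeAppendingFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    // e.g. term "Windmill" with type "subject" becomes "Windmill#subject"
    termAtt.append('#').append(typeAtt.type());
    return true;
  }
}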

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Paul Bedaride [mailto:paul.bedar...@xilopix.com]
> Sent: Wednesday, September 23, 2015 11:38 AM
> To: general@lucene.apache.org
> Subject: Re: Lucene TypeAttribute not used during querying
> 
> OK, so it is not possible to store other pieces of information in the index, like
> part-of-speech tags?
> part-of-speach ?
> 
> Thanks for the fast answer
> 
> Paul
> 
> On 23/09/2015 11:21, Uwe Schindler wrote:
> > Hi,
> >
> > The type attribute is not stored in index. The main intention behind this
> attribute is to use it inside the analysis chain. E.g. you have some
> tokenizer/stemmer/whatever that sets the attribute. The last TokenFilter
> before indexing may then change the term accordingly (e.g. adding the type
> as a payload, or append it to the term itsself) to get the information into
> index - but this is mainly your task. The same applies for other language
> specific attributes (like Japanese ones). The keyword attribute is another
> example, it is also not indexed, but is solely used to control behavior of 
> later
> TokenFilters (e.g. prevent stemming).
> >
> > Uwe
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> >> -Original Message-
> >> From: Paul Bedaride [mailto:paul.bedar...@xilopix.com]
> >> Sent: Wednesday, September 23, 2015 11:16 AM
> >> To: general@lucene.apache.org
> >> Subject: Lucene TypeAttribute not used during querying
> >>
> >> Hello,
> >>
> >> I wonder why the TypeAttribute is not used for queries ?
> >> It seems that it is used only during analysis.
> >> Why it is not used in org.apache.lucene.index.Term ?
> >>
> >> Paul Bédaride



RE: Lucene TypeAttribute not used during querying

2015-09-23 Thread Uwe Schindler
Hi,

The type attribute is not stored in the index. The main intention behind this 
attribute is to use it inside the analysis chain. E.g., you have some 
tokenizer/stemmer/whatever that sets the attribute. The last TokenFilter before 
indexing may then change the term accordingly (e.g., adding the type as a 
payload, or appending it to the term itself) to get the information into the 
index - but this is mainly your task. The same applies to other language-specific 
attributes (like the Japanese ones). The keyword attribute is another example: it 
is also not indexed, but is solely used to control the behavior of later 
TokenFilters (e.g., to prevent stemming).

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Paul Bedaride [mailto:paul.bedar...@xilopix.com]
> Sent: Wednesday, September 23, 2015 11:16 AM
> To: general@lucene.apache.org
> Subject: Lucene TypeAttribute not used during querying
> 
> Hello,
> 
> I wonder why the TypeAttribute is not used for queries ?
> It seems that it is used only during analysis.
> Why it is not used in org.apache.lucene.index.Term ?
> 
> Paul Bédaride



RE: Lucene 5 Custom FieldComparator

2015-08-13 Thread Uwe Schindler
Since 5.0 you have to index the field as DocValues to do that type of query. 
FieldCache is gone, see MIGRATE.txt.
MultiFields does not help here; it is more for viewing the whole index as a single 
LeafReader although it consists of multiple segments (LeafReaders). It's also 
used for merging, but user code should not use it.

doSetNextReader is provided in the API because the collecting of results is 
done per index segment (i.e., per LeafReader), and the document IDs reported to 
collect() are relative to those readers, not valid globally. In doSetNextReader 
you have to fetch the docvalues from the index using the LeafReader and access them 
later in the compare methods using the local docids.
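
For illustration, a minimal sketch inside a SimpleFieldComparator subclass (the 
docvalues field "discount" and the "bottom" bookkeeping are assumptions):

private NumericDocValues values;

@Override
protected void doSetNextReader(LeafReaderContext context) throws IOException {
  values = DocValues.getNumeric(context.reader(), "discount"); // per segment
}

@Override
public int compareBottom(int doc) throws IOException {
  return Long.compare(bottom, values.get(doc)); // doc is segment-local
}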

Uwe

P.S.: FieldCache is still available as a reader wrapper in the misc module's 
'uninverting' package, but the API no longer returns arrays. You just get back 
a DocValues emulation, which is random access. You still have to do this per 
index segment (doSetNextReader).

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Pablo Mincz [mailto:pmi...@gmail.com]
> Sent: Thursday, August 13, 2015 5:35 PM
> To: general@lucene.apache.org
> Subject: Lucene 5 Custom FieldComparator
> 
> Hi,
> 
> I'm doing a migration from Lucene 3.6.1 to 5.2.1 and I have a custom
> FieldComparator that sorts the search results by available discounts. For
> this, first I check that the date range is valid and later sort by the
> discount amount.
> 
> I did this in Lucene 3.6.1, but now in version 5.2.1 the FieldComparator has
> the method doSetNextReader that has a LeafReaderContext, and I do not
> know how to read all the fields from the LeafReader because I did not
> index this field with DocValues.
> 
> I tried with MultiFields but I got only one result instead of an array, and
> some values are Floats.
> 
> Does someone know how to do this?
> 
> Thanks a lot for the help.
> 
> Regards,
> Pablo.



RE: Controlling size of matched results in Lucene

2015-05-13 Thread Uwe Schindler
There is one possibility to get *all* documents, if you are happy with the fact 
that non-matching documents get a score factor of 0.0:

Scores of SHOULD clauses in a Boolean query get "added". You can combine your 
original query ("yourOriginalQuery", with real scores) with another query 
("fakeQuery") matching all documents and having score = 0.0 in a single final 
BooleanQuery ("finalQuery"): you get score 0.0 for non-matching documents, and 
score 0.0 + realScore => realScore for all others:

BooleanQuery finalQuery = new BooleanQuery(true); // BooleanQuery without coord 
factors, so it just adds scores, nothing else
finalQuery.add(yourOriginalQuery, BooleanClause.Occur.SHOULD);

Query fakeQuery = new MatchAllDocsQuery();
fakeQuery.setBoost(0); // tune the boost, so this query always returns a score 
of 0
finalQuery.add(fakeQuery, BooleanClause.Occur.SHOULD);

// execute finalQuery!
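
For example (a sketch, not from the original mail; "searcher" and "reader" are 
illustrative), you can then request as many hits as the index contains to get 
all documents back:

TopDocs allHits = searcher.search(finalQuery, reader.maxDoc());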

Hope that helps!

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Susrutha Gongalla [mailto:susrutha.gonga...@gmail.com]
> Sent: Tuesday, May 12, 2015 12:28 PM
> To: general@lucene.apache.org
> Subject: Controlling size of matched results in Lucene
> 
> Hello,
> 
> I am developing a matching algorithm using Lucene 4.10.0. My index consists
> of about 2000 documents.
> When I use the 'search' method on a query term, I get about n results that
> point to the n documents in the index along with their corresponding scores.
> What I would like to get is: all the 2000 documents with their Lucene scores,
> along with whether they are matched/unmatched.
> I would like to control the size of results that Lucene returns when I search
> for a query term.
> 
> I have tried altering the default similarity used in Lucene by overriding the
> score methods.
> However, this did not affect the size of results generated by Lucene.
> 
> I also tried explicitly giving a 'null' value for the filter when calling the
> 'search' method.
> This also did not affect the size of results.
> 
> I just started working with Lucene.
> Would appreciate any help in this regard!
> 
> Best,
> Susrutha Gongalla



ApacheCon NA 2015 in Austin, Texas

2015-03-19 Thread Uwe Schindler
Dear Apache Lucene/Solr enthusiast,

In just a few weeks, we'll be holding ApacheCon in Austin, Texas, and we'd love 
to have you in attendance. You can save $300 on admission by registering NOW, 
since the early bird price ends on the 21st.

Register at http://s.apache.org/acna2015-reg

ApacheCon this year celebrates the 20th birthday of the Apache HTTP Server, and 
we'll have Brian Behlendorf, who started this whole thing, keynoting for us, 
and you'll have a chance to meet some of the original Apache Group, who will be 
there to celebrate with us.

We also have talks about Apache Lucene and Apache Solr among 7 tracks of great 
content, as well as BOFs, the Apache BarCamp, project-specific hack events, and 
evening events where you can deepen your connection with the larger Apache 
community. See the full schedule at http://apacheconna2015.sched.org/

And if you have any questions, comments, or just want to hang out with us 
before and during the event, follow us on Twitter - @apachecon - or drop by 
#apachecon on the Freenode IRC network.

Hope to see you in Austin!

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/




RE: If I put a cached filter, do i still need to set the max clause count?

2015-03-11 Thread Uwe Schindler
Hi,

This question is about Lucene.NET, which is not part of the Apache Lucene 
project. Please send your questions to the Lucene.NET mailing list: 
u...@lucenenet.apache.org

Uwe

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/


> -Original Message-
> From: doglin82 [mailto:karen@bellmedia.ca]
> Sent: Wednesday, March 11, 2015 4:21 PM
> To: general@lucene.apache.org
> Subject: If I put a cached filter, do i still need to set the max clause 
> count?
> 
> We are using Lucene for our web site and as the index grew, we got the
> following exception.
> 
> The message: maxClauseCount is set to 1024
> Stack trace:
>   at Lucene.Net.Search.BooleanQuery.Add(BooleanClause clause)
>   at Lucene.Net.Search.BooleanQuery.Add(Query query, Occur occur)
> 
> Instead of using a RangeQuery, I am using a RangeFilter now, and wrapped it
> with a caching filter:
> 
>   RangeFilter dateFilter = new RangeFilter("documentpublishfrom",
>       "210100",
>       DateTime.Now.AddYears(10).ToString("MMddHHmmss"),
>       true, true);
> 
>   CachingWrapperFilter cachingFilter = new CachingWrapperFilter(dateFilter);
> 
>   var results = _searcher.Search(bq, cachingFilter, sortBy);
> 
> So, now that I am using filters instead, do I need to set the max clause
> count still? Please advise.
> 
> 
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/If-I-put-
> a-cached-filter-do-i-still-need-to-set-the-max-clause-count-tp4192413.html
> Sent from the Lucene - General mailing list archive at Nabble.com.



RE: how to use CachingWrapperFilter correctly and effectively in Lucene

2015-03-11 Thread Uwe Schindler
Hi,

This question is about Lucene.NET, which is not part of the Apache Lucene 
project. Please send your questions to the Lucene.NET mailing list: 
u...@lucenenet.apache.org

Uwe

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/

> -Original Message-
> From: doglin82 [mailto:karen@bellmedia.ca]
> Sent: Wednesday, March 11, 2015 8:23 PM
> To: general@lucene.apache.org
> Subject: how to use CachingWrapperFilter correctly and effectively in Lucene
> 
> We are using Lucene for our web site and as the index grew, we got the
> following exception due to too many clauses.
> 
> The message: maxClauseCount is set to 1024
> Stack trace:
>   at Lucene.Net.Search.BooleanQuery.Add(BooleanClause clause)
> 
> So I did some research and added a CachingWrapperFilter; my code now looks
> like this:
> 
>   BooleanQuery bq = new BooleanQuery();
> 
>   // publishedQuery is set to a BooleanQuery
>   bq.Add(publishedQuery, BooleanClause.Occur.MUST);
> 
>   var sortBy = customSort ?? new Sort(Sort.RELEVANCE.GetSort());
>   BooleanQuery.SetMaxClauseCount(4096);
> 
>   Filter filter = new QueryFilter(bq);
>   CachingWrapperFilter cachingFilter = new CachingWrapperFilter(filter);
> 
>   var results = _searcher.Search(bq, cachingFilter, sortBy);
> 
> I want to know:
> 1) if I am using the CachingWrapperFilter correctly and effectively;
> 2) whether I still need to set the max clause count to 4096 if I am using a
> CachingWrapperFilter (the default max clause count is 1024).
> 
> 
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/how-to-
> use-CachingWrapperFilter-correctly-and-effectively-in-Lucene-
> tp4192491.html
> Sent from the Lucene - General mailing list archive at Nabble.com.



RE: Reminder: FOSDEM 2015 - Open Source Search Dev Room

2014-12-03 Thread Uwe Schindler
Hello everyone,

We have extended the deadline for submissions to the FOSDEM 2015 Open Source 
Search Dev
Room to Monday, 9 December at 23:59 CET.

We are looking forward to your talk proposal!

Cheers,
Uwe

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/

> -Original Message-
> From: Uwe Schindler [mailto:uschind...@apache.org]
> Sent: Monday, November 24, 2014 9:33 AM
> To: d...@lucene.apache.org; java-u...@lucene.apache.org; solr-
> u...@lucene.apache.org; general@lucene.apache.org
> Subject: Reminder: FOSDEM 2015 - Open Source Search Dev Room
> 
> Hi,
> 
> We host a Dev-Room about "Open Source Search" on this year's FOSDEM
> 2015 (https://fosdem.org/2015/), taking place on January 31st and February
> 1st, 2015, in Brussels, Belgium. There is still one more week to submit your
> talks, so hurry up and submit your talk early!
> 
> Here is the full CFP as posted a few weeks ago:
> 
> Search has evolved to be much more than simply full-text search. We now
> rely on “search engines” for a wide variety of functionality:
> search as navigation, search as analytics and backend for data visualization
> and sometimes, dare we say it, as a data store. The purpose of this dev room
> is to explore the new world of open source search engines: their enhanced
> functionality, new use cases, feature and architectural deep dives, and the
> position of search in relation to the wider set of software tools.
> 
> We welcome proposals from folks working with or on open source search
> engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.)
> or technologies that heavily depend upon search (e.g.
> NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in
> presentations on search algorithms, machine learning, real-world
> implementation/deployment stories and explorations of the future of
> search.
> 
> Talks should be 30-60 minutes in length, including time for Q&A.
> 
> You can submit your talks to us here:
> https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V3
> 8G0OxSfp84A/viewform
> 
> Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We
> cannot guarantee we will have the opportunity to review submissions made
> after the deadline, so please submit early (and often)!
> 
> Should you have any questions, you can contact the Dev Room
> organizers: opensourcesearch-devr...@lists.fosdem.org
> 
> Cheers,
> LH on behalf of the Open Source Search Dev Room Program Committee*
> 
> * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten
> Curdt, Uwe Schindler
> 
> -
> Uwe Schindler
> uschind...@apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org



Reminder: FOSDEM 2015 - Open Source Search Dev Room

2014-11-24 Thread Uwe Schindler
Hi,

We host a Dev-Room about "Open Source Search" on this year's FOSDEM 2015 
(https://fosdem.org/2015/), taking place on January 31st and February 1st, 
2015, in Brussels, Belgium. There is still one more week to submit your talks, 
so hurry up and submit your talk early!

Here is the full CFP as posted a few weeks ago:

Search has evolved to be much more than simply full-text search. We now rely on 
“search engines” for a wide variety of functionality:
search as navigation, search as analytics and backend for data visualization 
and sometimes, dare we say it, as a data store. The purpose of this dev room is 
to explore the new world of open source search engines: their enhanced 
functionality, new use cases, feature and architectural deep dives, and the 
position of search in relation to the wider set of software tools.

We welcome proposals from folks working with or on open source search engines 
(e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.) or 
technologies that heavily depend upon search (e.g.
NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in 
presentations on search algorithms, machine learning, real-world 
implementation/deployment stories and explorations of the future of search.

Talks should be 30-60 minutes in length, including time for Q&A.

You can submit your talks to us here:
https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V38G0OxSfp84A/viewform

Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We 
cannot guarantee we will have the opportunity to review submissions made after 
the deadline, so please submit early (and often)!

Should you have any questions, you can contact the Dev Room
organizers: opensourcesearch-devr...@lists.fosdem.org

Cheers,
LH on behalf of the Open Source Search Dev Room Program Committee*

* Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten Curdt, 
Uwe Schindler

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/




RE: FOSDEM 2015 - Open Source Search Dev Room

2014-11-03 Thread Uwe Schindler
Hi,

forgot to mention:
FOSDEM 2015 takes place in Brussels on January 31st and February 1st, 2015. See 
also: https://fosdem.org/2015/

I hope to see you there!
Uwe

> -Original Message-
> From: Uwe Schindler [mailto:uschind...@apache.org]
> Sent: Monday, November 03, 2014 1:29 PM
> To: d...@lucene.apache.org; java-u...@lucene.apache.org; solr-
> u...@lucene.apache.org; general@lucene.apache.org
> Subject: CFP: FOSDEM 2015 - Open Source Search Dev Room
> 
> ***Please forward this CFP to anyone who may be interested in
> participating.***
> 
> Hi,
> 
> Search has evolved to be much more than simply full-text search. We now
> rely on “search engines” for a wide variety of functionality:
> search as navigation, search as analytics and backend for data visualization
> and sometimes, dare we say it, as a data store. The purpose of this dev room
> is to explore the new world of open source search engines: their enhanced
> functionality, new use cases, feature and architectural deep dives, and the
> position of search in relation to the wider set of software tools.
> 
> We welcome proposals from folks working with or on open source search
> engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.)
> or technologies that heavily depend upon search (e.g.
> NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in
> presentations on search algorithms, machine learning, real-world
> implementation/deployment stories and explorations of the future of
> search.
> 
> Talks should be 30-60 minutes in length, including time for Q&A.
> 
> You can submit your talks to us here:
> https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V3
> 8G0OxSfp84A/viewform
> 
> Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We
> cannot guarantee we will have the opportunity to review submissions made
> after the deadline, so please submit early (and often)!
> 
> Should you have any questions, you can contact the Dev Room
> organizers: opensourcesearch-devr...@lists.fosdem.org
> 
> Cheers,
> LH on behalf of the Open Source Search Dev Room Program Committee*
> 
> * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten
> Curdt, Uwe Schindler
> 
> -
> Uwe Schindler
> uschind...@apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
> 
> 
> 
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org



CFP: FOSDEM 2015 - Open Source Search Dev Room

2014-11-03 Thread Uwe Schindler
***Please forward this CFP to anyone who may be interested in participating.***

Hi,

Search has evolved to be much more than simply full-text search. We now rely on 
“search engines” for a wide variety of functionality:
search as navigation, search as analytics and backend for data visualization 
and sometimes, dare we say it, as a data store. The purpose of this dev room is 
to explore the new world of open source search engines: their enhanced 
functionality, new use cases, feature and architectural deep dives, and the 
position of search in relation to the wider set of software tools.

We welcome proposals from folks working with or on open source search engines 
(e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.) or 
technologies that heavily depend upon search (e.g.
NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in 
presentations on search algorithms, machine learning, real-world 
implementation/deployment stories and explorations of the future of search.

Talks should be 30-60 minutes in length, including time for Q&A.

You can submit your talks to us here:
https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V38G0OxSfp84A/viewform

Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We 
cannot guarantee we will have the opportunity to review submissions made after 
the deadline, so please submit early (and often)!

Should you have any questions, you can contact the Dev Room
organizers: opensourcesearch-devr...@lists.fosdem.org

Cheers,
LH on behalf of the Open Source Search Dev Room Program Committee*

* Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten Curdt, 
Uwe Schindler

-----
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/




[ANNOUNCE] [SECURITY] Recommendation to update Apache POI in Apache Solr 4.8.0, 4.8.1, and 4.9.0 installations

2014-08-18 Thread Uwe Schindler
Hallo Apache Solr Users,

the Apache Lucene PMC wants to make the users of Solr aware of the following 
issue:

Apache Solr versions 4.8.0, 4.8.1, 4.9.0 bundle Apache POI 3.10-beta2 with its 
binary release tarball. This version (and all previous ones) of Apache POI are 
vulnerable to the following issues:

= CVE-2014-3529: XML External Entity (XXE) problem in Apache POI's OpenXML 
parser =
Type: Information disclosure
Description: Apache POI uses Java's XML components to parse OpenXML files 
produced by Microsoft Office products (DOCX, XLSX, PPTX,...). Applications that 
accept such files from end-users are vulnerable to XML External Entity (XXE) 
attacks, which allows remote attackers to bypass security restrictions and read 
arbitrary files via a crafted OpenXML document that provides an XML external 
entity declaration in conjunction with an entity reference.

= CVE-2014-3574: XML Entity Expansion (XEE) problem in Apache POI's OpenXML 
parser =
Type: Denial of service
Description: Apache POI uses Java's XML components and Apache Xmlbeans to parse 
OpenXML files produced by Microsoft Office products (DOCX, XLSX, PPTX,...). 
Applications that accept such files from end-users are vulnerable to XML Entity 
Expansion (XEE) attacks ("XML bombs"), which allows remote hackers to consume 
large amounts of CPU resources.

The Apache POI PMC released a bugfix version (3.10.1) today.

Solr users are affected by these issues, if they enable the "Apache Solr 
Content Extraction Library (Solr Cell)" contrib module from the folder 
"contrib/extraction" of the release tarball.

Users of Apache Solr are strongly advised to keep the module disabled if they 
don't use it. Alternatively, users of Apache Solr 4.8.0, 4.8.1, or 4.9.0 can 
update the affected libraries by replacing the vulnerable JAR files in the 
distribution folder. Users of previous versions have to update their Solr 
release first, patching older versions is impossible.

To replace the vulnerable JAR files follow these steps:

- Download the Apache POI 3.10.1 binary release: 
http://poi.apache.org/download.html#POI-3.10.1
- Unzip the archive
- Delete the following files in your "solr-4.X.X/contrib/extraction/lib" 
folder: 
# poi-3.10-beta2.jar
# poi-ooxml-3.10-beta2.jar
# poi-ooxml-schemas-3.10-beta2.jar
# poi-scratchpad-3.10-beta2.jar
# xmlbeans-2.3.0.jar
- Copy the following files from the base folder of the Apache POI distribution 
to the "solr-4.X.X/contrib/extraction/lib" folder: 
# poi-3.10.1-20140818.jar
# poi-ooxml-3.10.1-20140818.jar
# poi-ooxml-schemas-3.10.1-20140818.jar
# poi-scratchpad-3.10.1-20140818.jar
- Copy "xmlbeans-2.6.0.jar" from POI's "ooxml-lib/" folder to the 
"solr-4.X.X/contrib/extraction/lib" folder.
- Verify that the "solr-4.X.X/contrib/extraction/lib" no longer contains any 
files with version number "3.10-beta2".
- Verify that the folder contains one xmlbeans JAR file with version 2.6.0.

If you just want to disable extraction of Microsoft Office documents, delete 
the files above and don't replace them. "Solr Cell" will automatically detect 
this and disable Microsoft Office document extraction.

Coming versions of Apache Solr will have the updated libraries bundled.

Happy Searching and Extracting,
The Apache Lucene Developers

PS: Thanks to Stefan Kopf, Mike Boufford, and Christian Schneider for reporting 
these issues!

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/





Close of Apache Lucene's Open Relevance sub-project

2014-06-11 Thread Uwe Schindler
Hi,

the PMC decided in a vote that the Apache Lucene sub-project "Open Relevance" 
will be discontinued. There was no activity during the last years and the 
project made no releases at all. I am sending this as the last message to the 
already existing mailing lists, so people are aware that we will no longer 
provide infrastructure like mailing lists. Any discussion in the "Open 
Relevance" context should in the future be directed to: general@lucene.apache.org

The already existing data collections in SVN will be kept alive, because 
Subversion never forgets. Please use them, if you are willing to do so.

Thank you to all committers for their support in this project! I very much like 
the Wiki page discussing all the problems with available collections. We should 
maybe move over these pages to a static web page on the "attic" Lucene project 
page or copy them into the Lucene Wiki. The CWIKI should then be closed, too.

Uwe

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Chair / Committer
Bremen, Germany
http://lucene.apache.org/





[ANNOUNCE] Apache Lucene 4.8.0 released

2014-04-28 Thread Uwe Schindler
28 April 2014, Apache Lucene™ 4.8.0 available

The Lucene PMC is pleased to announce the release of Apache Lucene 4.8.0

Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires full-text search, especially cross-platform.

This release contains numerous bug fixes, optimizations, and
improvements, some of which are highlighted below. The release
is available for immediate download at:
  http://lucene.apache.org/core/mirrors-core-latest-redir.html

See the CHANGES.txt file included with the release for a full list of
details.

Lucene 4.8.0 Release Highlights:

* Apache Lucene now requires Java 7 or greater (recommended is
  Oracle Java 7 or OpenJDK 7, minimum update 55; earlier versions
  have known JVM bugs affecting Lucene).

* Apache Lucene is fully compatible with Java 8.

* All index files now store end-to-end checksums, which are
  now validated during merging and reading. This ensures that
  corruptions caused by any bit-flipping hardware problems or bugs
  in the JVM can be detected earlier.  For full detection be sure
  to enable all checksums during merging (it's disabled by default).

* Lucene has a new Rescorer/QueryRescorer API to perform second-pass
  rescoring or reranking of search results using more expensive scoring
  functions after first-pass hit collection (see the sketch after this list).

* AnalyzingInfixSuggester now supports near-real-time autosuggest.

* Simplified impact-sorted postings (using SortingMergePolicy and
  EarlyTerminatingCollector) to use Lucene's Sort class
  to express the sort order.

* Bulk scoring and normal iterator-based scoring were separated,
  so some queries can do bulk scoring more effectively.

* Switched to MurmurHash3 to hash terms during indexing.

* IndexWriter now supports updating of binary doc value fields.

* HunspellStemFilter now uses 10 to 100x less RAM. It also loads
  all known OpenOffice dictionaries without error.

* Lucene now also fsyncs the directory metadata on commits, if the
  operating system and file system allow it (Linux, MacOSX are
  known to work).

* Lucene now uses Java 7 file system functions under the hood,
  so index files can be deleted on Windows, even when readers are
  still open.

* A serious bug in NativeFSLockFactory was fixed, which could
  allow multiple IndexWriters to acquire the same lock.  The
  lock file is no longer deleted from the index directory
  even when the lock is not held.

* Various bugfixes and optimizations since the 4.7.2 release.
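
A minimal sketch of the new rescoring API mentioned above (not part of the
original announcement; queries, weight, and result sizes are illustrative):

  import java.io.IOException;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.QueryRescorer;
  import org.apache.lucene.search.TopDocs;

  public class RescoreExample {
    // First pass with a cheap query, then rerank the top 20 hits by adding
    // 2.0 * the score of a more expensive second-pass query.
    static TopDocs searchAndRerank(IndexSearcher searcher, Query cheap,
        Query expensive) throws IOException {
      TopDocs firstPass = searcher.search(cheap, 100);
      return QueryRescorer.rescore(searcher, firstPass, expensive, 2.0, 20);
    }
  }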

Please read CHANGES.txt for a full list of new features.

Please report any feedback to the mailing lists
(http://lucene.apache.org/core/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases.  It is possible that the mirror you are using
may not have replicated the release yet.  If that is the case, please
try another mirror.  This also goes for Maven access.

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Chair / Committer
Bremen, Germany
http://lucene.apache.org/





[ANNOUNCE] Apache Solr 4.8.0 released

2014-04-28 Thread Uwe Schindler
28 April 2014, Apache Solr™ 4.8.0 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.8.0

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search.  Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

Solr 4.8.0 is available for immediate download at:
  http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

See the CHANGES.txt file included with the release for a full list of
details.

Solr 4.8.0 Release Highlights:

* Apache Solr now requires Java 7 or greater (recommended is
  Oracle Java 7 or OpenJDK 7, minimum update 55; earlier versions
  have known JVM bugs affecting Solr).

* Apache Solr is fully compatible with Java 8.

* <fields> and <types> tags have been deprecated from schema.xml.
  There is no longer any reason to keep them in the schema file,
  they may be safely removed. This allows intermixing of <fieldType>,
  <field> and <copyField> definitions if desired.

* The new {!complexphrase} query parser supports wildcards, ORs etc.
  inside Phrase Queries. 

* New Collections API CLUSTERSTATUS action reports the status of
  collections, shards, and replicas, and also lists collection
  aliases and cluster properties.
 
* Added managed synonym and stopword filter factories, which enable
  synonym and stopword lists to be dynamically managed via REST API.

* JSON updates now support nested child documents, enabling {!child}
  and {!parent} block join queries. 

* Added ExpandComponent to expand results collapsed by the
  CollapsingQParserPlugin, as well as the parent/child relationship
  of nested child documents.

* Long-running Collections API tasks can now be executed
  asynchronously; the new REQUESTSTATUS action provides status.

* Added a hl.qparser parameter to allow you to define a query parser
  for hl.q highlight queries.

* In Solr single-node mode, cores can now be created using named
  configsets.

* New DocExpirationUpdateProcessorFactory supports computing an
  expiration date for documents from the "TTL" expression, as well as
  automatically deleting expired documents on a periodic basis. 

Solr 4.8.0 also includes many other new features as well as numerous
optimizations and bugfixes of the corresponding Apache Lucene release.

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases.  It is possible that the mirror you are using
may not have replicated the release yet.  If that is the case, please
try another mirror.  This also goes for Maven access.

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Chair / Committer
Bremen, Germany
http://lucene.apache.org/




Attention: Lucene 4.8 and Solr 4.8 will require minimum Java 7

2014-03-12 Thread Uwe Schindler
Hi,

the Apache Lucene/Solr committers decided with a large majority on the vote to 
require Java 7 for the next minor release of Apache Lucene and Apache Solr 
(version 4.8)!
Support for Java 6 by Oracle already ended more than a year ago, and Java 8 is 
coming out in a few days.

The next release will also contain some improvements for Java 7:
- Better file handling (especially on Windows) in the directory 
implementations. Files can now be deleted on windows, although the index is 
still open - like it was always possible on Unix environments (delete on last 
close semantics).
- Speed improvements in sorting comparators: Sorting now uses Java 7's own 
comparators for integer and long sorts, which are highly optimized by the 
Hotspot VM.

If you want to stay up-to-date with Lucene and Solr, you should upgrade your 
infrastructure to Java 7. Please be aware that you must use at least Java 7u1.
The recommended version at the moment is Java 7u25. Later versions like 7u40, 
7u45,... have a bug causing index corruption. Ideally use the Java 7u60 
prerelease, which has fixed this bug. Once 7u60 is out, this will be the 
recommended version.
In addition, there is no Oracle/BEA JRockit available for Java 7; use the 
official Oracle Java 7. JRockit never worked correctly with Lucene/Solr 
(causing index corruption), so this should not be an issue for you. Please 
also review our list of JVM bugs: http://wiki.apache.org/lucene-java/JavaBugs

Apache Lucene and Apache Solr were also heavily tested with all prerelease 
versions of Java 8, so you can also give it a try! Looking forward to the 
official Java 8 release next week - I will run my indexes with that version for 
sure!

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de





RE: VOTE: Release apache-solr-ref-guide-4.7.pdf (RC1)

2014-03-02 Thread Uwe Schindler
Hi,

I did not read the whole guide, but the SHA1 and signature are correct. I 
checked the documentation index and searched for outdated version numbers; 
it looks like nothing is seriously broken.

+1 to release.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

From: Cassandra Targett [mailto:casstarg...@gmail.com] 
Sent: Wednesday, February 26, 2014 5:59 PM
To: d...@lucene.apache.org
Cc: Lucene mailing list
Subject: VOTE: Release apache-solr-ref-guide-4.7.pdf (RC1)

 

I generated a new release candidate for the Solr Reference Guide. This fixes 
the page numbering problem and a few other minor edits folks made yesterday 
after I generated RC0.

https://dist.apache.org/repos/dist/dev/lucene/solr/ref-guide/apache-solr-ref-guide-4.7-RC1/

+1 from me.

Cassandra



September 2013 Lucene report to the ASF Board

2013-09-02 Thread Uwe Schindler
Hi,

I checked in a draft version of the board report for the September ASF board 
meeting: http://s.apache.org/xz3

Please change & commit, if you notice anything missing/wrong. If everybody is 
happy, I will submit this at latest: Wednesday, 11 September 2013.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de




RE: Welcome Cassandra Targett as Lucene/Solr committer

2013-07-31 Thread Uwe Schindler
Welcome Cassandra!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Thursday, August 01, 2013 12:48 AM
> To: d...@lucene.apache.org; Lucene mailing list
> Cc: Cassandra Targett
> Subject: Welcome Cassandra Targett as Lucene/Solr committer
> 
> I'm pleased to announce that Cassandra Targett has accepted to join our
> ranks as a committer.
> 
> Cassandra worked on the donation of the new Solr Reference Guide [1] and
> getting things in order for its first official release [2].
> Cassandra, it is tradition that you introduce yourself with a brief bio.
> 
> Welcome!
> 
> P.S. As soon as your SVN access is setup, you should then be able to add
> yourself to the committers list on the website as well.
> 
> [1]
> https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+
> Guide
> [2] https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/



RE: How to start with Lucene

2013-07-07 Thread Uwe Schindler
It depends what you are intending to do:
- If you want to write an application that uses Apache Lucene, you can use the 
binary tgz file. It will contain all needed JARs to build an application. 
Alternatively, set up a Maven project and add the Lucene dependencies (core, 
analyzers, queryparser,...). See the example below this list.
- If you want to hack on Apache Lucene itself (changing the implementation, 
submitting patches to the developers), check out SVN and start with running 
"ant" to build. To set it up in an IDE, use "ant eclipse" or "ant idea", which 
builds project files that you can import into your IDE. You only need to do 
this if you want to modify Lucene, not to use it.
- If you don't want to code and just want a server with Lucene (like a database 
server), you can start with Apache Solr or ElasticSearch.
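
A minimal "hello Lucene" sketch for the first option (not part of the original 
mail; Lucene 4.x API, all names illustrative):

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field.Store;
  import org.apache.lucene.document.TextField;
  import org.apache.lucene.index.DirectoryReader;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.queryparser.classic.QueryParser;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.TopDocs;
  import org.apache.lucene.store.RAMDirectory;
  import org.apache.lucene.util.Version;

  public class HelloLucene {
    public static void main(String[] args) throws Exception {
      RAMDirectory dir = new RAMDirectory();
      StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_43);

      // Index a single document with one analyzed text field.
      IndexWriter writer = new IndexWriter(dir,
          new IndexWriterConfig(Version.LUCENE_43, analyzer));
      Document doc = new Document();
      doc.add(new TextField("body", "Hello Lucene", Store.YES));
      writer.addDocument(doc);
      writer.close();

      // Search it back with the classic query parser.
      IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dir));
      QueryParser parser = new QueryParser(Version.LUCENE_43, "body", analyzer);
      TopDocs hits = searcher.search(parser.parse("hello"), 10);
      System.out.println(hits.totalHits + " hit(s)");
    }
  }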

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Vinh Đặng [mailto:dqvin...@gmail.com]
> Sent: Sunday, July 07, 2013 11:51 AM
> To: general@lucene.apache.org
> Subject: How to start with Lucene
> 
> Hi everyone,
> 
> I am new to Lucene, with basic Java programming knowledge, but not an
> expert.
> 
> Now I am very confused, because there are three kinds of Lucene I can
> download: from SVN, a TAR.GZ file and a .ZIP file, but the problem is they
> seem different.
> 
> Could you give me instructions to set up Lucene with some kind of IDE,
> such as Eclipse, to start?



PMC Chair change

2013-06-22 Thread Uwe Schindler
Hi all,

most of you will already know it: Since June 19, 2013, I am the new Project 
Management Committee Chair, replacing Steve Rowe. I am glad to manage all the 
legal stuff for new committers or contributions from external entities - and 
also preparing the board reports. All this, of course and as always, with 
included but not deliberate @UweSays quotations.

Many thanks to all PMC members who have voted for me!
Many thanks to Steve for the help and hints to all the things I have to know in 
my new role!

Uwe

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Chair / Committer
Bremen, Germany
http://lucene.apache.org/




RE: problem with solr 4.3.1 installation

2013-06-19 Thread Uwe Schindler
Hi,

It depends on your configuration! The installation of Jetty shipped with Solr 
is optimized for the typical usage pattern of Solr.

Webapp containers like JBoss often have additional monitoring modules that may 
have an impact on performance (I know that JBoss often has crazy plugins in 
the JVM for that, which have a large impact on garbage collection). Sometimes 
the web app container ships with malfunctioning Java versions, so take care.

Also, some containers use incorrect charsets, so the UTF-8 decoding of 
%-encoded query parameters is broken. We have a workaround for that in later 
Solr versions (4.1+), but in general all this is not tested with foreign 
servlet containers, so we cannot give any support.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

From: Kuldeep Sharma [mailto:kuldeep.sha...@hcl.com] 
Sent: Wednesday, June 19, 2013 6:20 PM
To: general@lucene.apache.org
Cc: Uwe Schindler
Subject: RE: problem with solr 4.3.1 installation

 

Hi Uwe,

Is there any performance degradation or other limitation if we use JBoss 
instead of Jetty for deploying Solr?
Currently, we are using JBoss AS in production and haven't seen any issue 
in the last 2-3 months.

Thanks!
Kuldeep

-Original Message-
From: Uwe Schindler [mailto:u...@thetaphi.de] 
Sent: Wednesday, June 19, 2013 10:49 AM
To: general@lucene.apache.org
Subject: RE: problem with solr 4.3.1 installation

 

Hi,

 

 

> Thanks Uwe for your quick reply.

> 

> I got the problem of setting classpath now.

> 

> But i have few question based on your reply. May not be related to the topic

> of the thread.

> 

> Question 1 related to the point - *"In general it is not recommended to 
> install

> Solr inside a custom webapp container"* where will solr runs in production

> environments? i was thinking that it has to run on some web containers.

> Jetty is only for playing around/testing.

> Please correct me if i am wrong.

 

See Solr like a database server (MySQL or Postgres). Do you install MySQL 
inside a servlet container? - no it runs as a separate service in a separate 
process!

Jetty has nothing to do with "playing around". Jetty is just the web connector 
of Solr and is the officially supported HTTP interface. Future versions of Solr 
may replace jetty by e.g. the netty library for high performance select-based 
I/O.

 

Your custom application talks to Solr using the HTTP protocol, but that does 
not mean that Solr must run in your webapp container. Solr runs (like a MySQL 
database server) ideally as a separate JVM instance. That's the recommended 
installation.

 

> Question 2 related to the point - "*Future versions of Solr may no longer ship

> with a WAR file because it causes too many problems, because Solr does not

> work well with other webapps in the same JVM"*


> how can we use solr with any app say my own web app (typically, a spring

> mvc or any EE app) if it is not shipped as war? Can't we use Solr for solving

> problem of complex/advance search implementation (time consuming

> search queries from RDBMS) that normally exists in any web app? say any

> inventory management or warehouse management apps.

 

See above.

 

> My requirement is for inputting data to reporting engine like crystal/jasper

> and to generate analytic chars for our dashboards.

> 

> After seeing your reply, i started thinking that solr is not the one for my

> requirement. Please clarify.

> 

> Thanks a ton.

> 

> Pradeep


> 

> 

> On Wed, Jun 19, 2013 at 8:22 PM, Uwe Schindler <u...@thetaphi.de> wrote:

> 

> > Hi,

> >

> > See http://wiki.apache.org/solr/SolrJBoss#Configuring_Solr_Home for

> > instructions (scroll down and look for the JNDI options). The problem

> > is that SOLR_HOME must be known to JBoss, otherwise the webapp cannot

> > locate any files from the config.

> >

> > In general it is not recommended to install Solr inside a custom

> > webapp container (means installing the WAR file in tomcat, jboss or

> > whatever). You should use the included web engine (provided by jetty)

> > with a recent JDK version. The example folder has a start.jar. You can

> > start the correctly configured Jetty engine with solr by running "java

> > -jar start.jar" from the example folder.

> >

> > Future versions of Solr may no longer ship with a WAR file because it

> > causes too many problems, because Solr does not work well with other

> > webapps in the same JVM (it has very spe

RE: problem with solr 4.3.1 installation

2013-06-19 Thread Uwe Schindler
Hi,


> Thanks Uwe for your quick reply.
> 
> I got the problem of setting classpath now.
> 
> But I have a few questions based on your reply. They may not be related to
> the topic of the thread.
> 
> Question 1, related to the point *"In general it is not recommended to
> install Solr inside a custom webapp container"*: where will Solr run in
> production environments? I was thinking that it has to run on some web
> containers. Jetty is only for playing around/testing.
> Please correct me if I am wrong.

Think of Solr like a database server (MySQL or Postgres). Do you install MySQL 
inside a servlet container? No - it runs as a separate service in a separate 
process!
Jetty has nothing to do with "playing around". Jetty is just the web connector 
of Solr and is the officially supported HTTP interface. Future versions of Solr 
may replace Jetty with e.g. the Netty library for high-performance select-based 
I/O.

Your custom application talks to Solr using the HTTP protocol, but that does 
not mean that Solr must run in your webapp container. Solr runs (like a MySQL 
database server) ideally as a separate JVM instance. That's the recommended 
installation.

> Question 2, related to the point *"Future versions of Solr may no longer ship
> with a WAR file because it causes too many problems, because Solr does not
> work well with other webapps in the same JVM"*:
> how can we use Solr with any app, say my own web app (typically a Spring
> MVC or any EE app), if it is not shipped as a WAR? Can't we use Solr for
> solving the problem of complex/advanced search implementations (time-consuming
> search queries from an RDBMS) that normally exist in any web app, say any
> inventory management or warehouse management apps?

See above.

> My requirement is to feed data to a reporting engine like Crystal/Jasper
> and to generate analytic charts for our dashboards.
> 
> After seeing your reply, I started thinking that Solr is not the one for my
> requirement. Please clarify.
> 
> Thanks a ton.
> 
> Pradeep
> 
> 
> On Wed, Jun 19, 2013 at 8:22 PM, Uwe Schindler  wrote:
> 
> > Hi,
> >
> > See http://wiki.apache.org/solr/SolrJBoss#Configuring_Solr_Home  for
> > instructions (scroll down and look for the JNDI options). The problem
> > is that SOLR_HOME must be known to JBoss, otherwise the webapp cannot
> > locate any files from the config.
> >
> > In general it is not recommended to install Solr inside a custom
> > webapp container (means installing the WAR file in tomcat, jboss or
> > whatever). You should use the included web engine (provided by jetty)
> > with a recent JDK version. The example folder has a start.jar. You can
> > start the correctly configured Jetty engine with solr by running "java
> > -jar start.jar" from the example folder.
> >
> > Future versions of Solr may no longer ship with a WAR file because it
> > causes too many problems, because Solr does not work well with other
> > webapps in the same JVM (it has very special garbage collection and
> > memory requirements), so it should run as a separate server in a separate
> VM.
> >
> > Uwe
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> > > -Original Message-
> > > From: pradeep kumar [mailto:pradeepkuma...@gmail.com]
> > > Sent: Wednesday, June 19, 2013 3:56 PM
> > > To: general@lucene.apache.org
> > > Subject: problem with solr 4.3.1 installation
> > >
> > > hello all,
> > >
> > > I have a problem with installing solr 4.31.
> > >
> > > Giving you a background of what i am doing:
> > >
> > > I am trying to evaluate Solr as our search in engine for my project
> > where we
> > > have requirement multiple complex search functionality, reporting
> > > and analytics. We deal with lakhs of records from RDBMS tables which
> > > are
> > linked.
> > > Just to give you an idea, Order, item,  item_details, files, etc. i
> > proposed  solr
> > > and told about lucidworks to rest of my technical team and were
> > impressed.
> > >
> > > My reasons for using solr is to achive fastrer search, input data to
> > report
> > > engine and analytic graphs for our dashboard.
> > >
> > > Other alternative to my solr approach is off-line db with star
> > > schema and from that, get data fro reports or analytics.
> > >
> > > But, I am facing few problems in installing. I so

RE: problem with solr 4.3.1 installation

2013-06-19 Thread Uwe Schindler
Hi,

See http://wiki.apache.org/solr/SolrJBoss#Configuring_Solr_Home for 
instructions (scroll down and look for the JNDI options). The problem is that 
SOLR_HOME must be known to JBoss, otherwise the webapp cannot locate any files 
from the config.

In general it is not recommended to install Solr inside a custom webapp 
container (means installing the WAR file in tomcat, jboss or whatever). You 
should use the included web engine (provided by jetty) with a recent JDK 
version. The example folder has a start.jar. You can start the correctly 
configured Jetty engine with solr by running "java -jar start.jar" from the 
example folder.

Future versions of Solr may no longer ship with a WAR file because it causes 
too many problems, because Solr does not work well with other webapps in the 
same JVM (it has very special garbage collection and memory requirements), so 
it should run as a separate server in a separate VM.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: pradeep kumar [mailto:pradeepkuma...@gmail.com]
> Sent: Wednesday, June 19, 2013 3:56 PM
> To: general@lucene.apache.org
> Subject: problem with solr 4.3.1 installation
> 
> hello all,
> 
> I have a problem with installing Solr 4.3.1.
> 
> Giving you a background of what I am doing:
> 
> I am trying to evaluate Solr as our search engine for my project, where we
> have requirements for multiple complex search functions, reporting and
> analytics. We deal with lakhs of records from RDBMS tables which are linked.
> Just to give you an idea: Order, item, item_details, files, etc. I proposed
> Solr and told the rest of my technical team about LucidWorks, and they were
> impressed.
> and told about lucidworks to rest of my technical team and were impressed.
> 
> My reasons for using solr is to achive fastrer search, input data to report
> engine and analytic graphs for our dashboard.
> 
> Other alternative to my solr approach is off-line db with star schema and
> from that, get data fro reports or analytics.
> 
> But, I am facing few problems in installing. I some how feel that installation
> guide is not clear.
> 
> I am missing something?
> 
> 
> 
> Downloaded solr 4.3.1 binaries. Extracted to my local drive. Set SOLR_HOME
> class path in evn variables pointing C:\solr-4.3.1\example\solr and
> SOLR_HOME/bin in path variable.
> 
> Copied solr-4.3.1.war from SOLR_HOME/dist/ to my local jboss instance.
> 
> Started my jboss server.
> 
>  Here is the log
> 
>  16:05:13,715 ERROR [org.apache.solr.core.CoreContainer]
> (coreLoadExecutor-3-thread-1) Unable to create core: collection1:
> org.apache.solr.common.SolrException: Could not load config for
> solrconfig.xml
> 
> at
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:91
> 9)
> [solr-core-4.3.1.jar:4.3.1 1491148 - shalinmangar - 2013-06-09 12:15:33]
> 
> at
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
> [solr-core-4.3.1.jar:4.3.1 1491148 - shalinmangar - 2013-06-09 12:15:33]
> 
> at
> org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
> [solr-core-4.3.1.jar:4.3.1 1491148 - shalinmangar - 2013-06-09 12:15:33]
> 
> at
> org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
> [solr-core-4.3.1.jar:4.3.1 1491148 - shalinmangar - 2013-06-09 12:15:33]
> 
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> [rt.jar:1.6.0_29]
> 
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> [rt.jar:1.6.0_29]
> 
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> [rt.jar:1.6.0_29]
> 
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> [rt.jar:1.6.0_29]
> 
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> [rt.jar:1.6.0_29]
> 
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecut
> or.java:886)
> [rt.jar:1.6.0_29]
> 
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.ja
> va:908)
> [rt.jar:1.6.0_29]
> 
> at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_29]
> 
> Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in
> classpath or 'solr\collection1\conf/', cwd=C:\jboss-as-7.1.1.Final\bin
> 
> at
> org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoade
> r.java:337)
> [solr-core-4.3.1.jar:4.3.1 1491148 - shalinmangar - 2013-06-09 12:15:33]
> 
> at
> org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.ja
> va:302)
> [solr-core-4.3.1.jar:4.3.1 1491148 - shalinmangar - 2013-

RE: Securing stored data using Lucene

2013-06-18 Thread Uwe Schindler
Hi,

> My name is Rafaela and I am just starting to work with Lucene for a project
> that involves quite a few security aspects.
> 
> I am working on an app that aims to manage data by using Lucene on a
> mobile device. However, my application will require data to be confidential
> (users will need to be logged in and have certain permissions regarding the
> data).
> I am currently trying to find a way to make this possible and still keep using
> Lucene without having a very high performance drop.
> 
> I was searching around and I found the patch from
> https://issues.apache.org/jira/browse/LUCENE-2228. Since it seems to be
> quite a bit old and the issue is not marked as resolved, I wanted to ask about
> the status of this. Is this something that could work for securing the
> information? Or is there another better solution already implemented?

You can still use the Directory implementation posted in this issue with minor 
modifications. Lucene always had and still has the abstract Directory 
interface, and yes, you can use it to implement block-based encryption below 
Lucene's storage layer.

In any case, you still have to cope with the performance degradation introduced 
by this additional layer. Another idea is to make the encryption completely 
invisible to Lucene by using a Linux loop device that encrypts everything 
written to / read from it.

Uwe



RE: XSS Issue

2013-06-18 Thread Uwe Schindler
Hi,

I already said that you should report your issue to priv...@lucene.apache.org
The thing I wanted to say is: Everything in Solr is insecure by default, an 
additional XSS or whatever XFOOBAR does not matter at all because Solr should 
only run on a completely secured private network. So any issue like this has no 
great impact at all.

The main issue of triggering stateful GET requests can only be fixed by 
redesigning Solr's public and documented APIs. This is impossible for bug fix 
releases, and major releases also need to keep backwards compatibility, so 
fixing all issues that involve triggering stateful GET requests to the public 
API (through whatever mechanism) is far out of reach.

> XSS is a larger problem than CSRF because you can execute JavaScript
> code in the user's browser, which can lead to a compromise.

In your original report you were talking about XSS, and also in the same email 
about the IMG-based links a user may get in his email. I was solely referring 
to the latter, which are unfixable without changing the REST API.

You were also referring to:

> Yes he can do that but as I said the same problem can occur without his 
> consent (and without a click)
> if he's on an arbitrary website which hosts a HTML IMG pointing to the 
> vulnerable page of the solr
> administrator interface (like <img src="http://X.X.X.X/solr/admin/xss_vulnerable_page/"> )

This is again not related to XSS at all!

I told you to report the XSS to the above mail address; you have not done that 
yet, so I assume you were only talking about things similar to the funny web 
page I was referring to.

Finally: the XSS issues are low priority, because the admin web interface of 
Solr should never ever be in a network where it is accessible from browsers 
that also have access to the internet. This is why I referred to the 
SolrSecurity Wiki page.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: gregory draperi [mailto:gregory.drap...@gmail.com]
> Sent: Tuesday, June 18, 2013 11:04 PM
> To: general
> Subject: Re: XSS Issue
> 
> I speak about XSS not CSRF.
> 
> The way to fix XSS is to encode tainted data like user's inputs.
> 
> For the CSRF problem there are techniques to prevent them in REST API (cf
> OWASP or NSA document) but I understand that it may not be done due to
> impacts http://fr.slideshare.net/johnwilander/advanced-csrf-and-stateless-
> anticsrf
> http://www.nsa.gov/ia/_files/support/guidelines_implementation_rest.pdf
> 
> XSS is a larger problem than CSRF because you can execute JavaScript
> code in the user's browser, which can lead to a compromise.
> 
> 
> 
> 
> 2013/6/18 Uwe Schindler 
> 
> > The issue from the webpage I posted cannot be fixed because it would
> > break all clients out there, because the REST API is the official API
> > to Solr implemented by millions of clients... This is what I mean
> > with: Reinvent Solr to fix this.
> > The issue here is that it allows GET requests to modify the index. But
> > as said before, it is unfixable unless you want to break all client
> > software outside.
> >
> > If you want to prevent this, use e.g. ElasticSearch, which has a
> > better, standards conform-designed REST API (which does not allow GET
> > requests to modify anything).
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> > > -Original Message-
> > > From: gregory draperi [mailto:gregory.drap...@gmail.com]
> > > Sent: Tuesday, June 18, 2013 6:43 PM
> > > To: general
> > > Subject: Re: XSS Issue
> > >
> > > Yes, it works because it exploits a CSRF issue and in my opinion it
> > should also
> > > be fixed like XSS vulnerabilities in the application.
> > >
> > > I think we don't understand each other.
> > >
> > > I'm going to send details to the private mailing list and I won't
> > > waste
> > your
> > > time more.
> > >
> > > Regards,
> > >
> > >
> > > 2013/6/18 Uwe Schindler 
> > >
> > > > Have fun with this web page:
> > > >
> > > > http://www.thetaphi.de/nukeyoursolrindex.html
> > > >
> > > > It really works, if you have a default Solr instance running on
> > > > your local machine on default port with default collection, and
> > > > you open this web page
> > > > -> this nukes your index. This has nothing to do with the Admin
> > interface.
> > > >
> > > > Uwe

RE: XSS Issue

2013-06-18 Thread Uwe Schindler
The issue from the webpage I posted cannot be fixed because it would break all 
clients out there, because the REST API is the official API to Solr implemented 
by millions of clients... This is what I mean with: Reinvent Solr to fix this.
The issue here is that it allows GET requests to modify the index. But as said 
before, it is unfixable unless you want to break all client software outside.

If you want to prevent this, use e.g. ElasticSearch, which has a better, 
standards-conformant REST API (which does not allow GET requests to modify 
anything).
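
For illustration, a minimal sketch of the kind of safeguard discussed in this 
thread: a servlet filter, placed in front of Solr in the web container, that 
rejects state-changing GET requests to the update handler. This is not part of 
Solr; the class name and URL pattern are made up, and deploying such a filter 
would break all clients that rely on GET updates, exactly as described above.

  import java.io.IOException;
  import javax.servlet.*;
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;

  // Hypothetical guard: forbid GET requests to the /update handler.
  public class RejectGetUpdatesFilter implements Filter {
    public void init(FilterConfig cfg) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
        throws IOException, ServletException {
      HttpServletRequest http = (HttpServletRequest) req;
      if ("GET".equals(http.getMethod()) && http.getRequestURI().contains("/update")) {
        // State-changing operations then require POST, so a cross-site <img> fails.
        ((HttpServletResponse) res).sendError(HttpServletResponse.SC_METHOD_NOT_ALLOWED);
        return;
      }
      chain.doFilter(req, res); // everything else passes through unchanged
    }
  }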

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: gregory draperi [mailto:gregory.drap...@gmail.com]
> Sent: Tuesday, June 18, 2013 6:43 PM
> To: general
> Subject: Re: XSS Issue
> 
> Yes, it works because it exploits a CSRF issue, and in my opinion it should
> also be fixed, like the XSS vulnerabilities in the application.
> 
> I think we don't understand each other.
> 
> I'm going to send details to the private mailing list, and I won't waste any
> more of your time.
> 
> Regards,
> 
> 
> 2013/6/18 Uwe Schindler 
> 
> > Have fun with this web page:
> >
> > http://www.thetaphi.de/nukeyoursolrindex.html
> >
> > It really works: if you have a default Solr instance running on your
> > local machine on the default port with the default collection and you
> > open this web page, it nukes your index. This has nothing to do with the
> > Admin interface.
> >
> > Uwe
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> > > -Original Message-
> > > From: gregory draperi [mailto:gregory.drap...@gmail.com]
> > > Sent: Tuesday, June 18, 2013 6:27 PM
> > > To: general
> > > Subject: Re: XSS Issue
> > >
> > > This is a Cross-Site Request Forgery issue (not an XSS) and should be
> > > fixed, for example, by adding an unpredictable parameter to the request.
> > >
> > > I'm going to send to priv...@lucene.apache.org what I have found.
> > >
> > > Best regards,
> > >
> > > Grégory
> > >
> > > 2013/6/18 Uwe Schindler 
> > >
> > > > Just to show this without the admin interface: Add these two
> > > > images to any web page like this:
> > > >
> > > > <img src="http://localhost:8983/solr/collection1/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E" />
> > > > <img src="http://localhost:8983/solr/collection1/update?stream.body=%3Ccommit/%3E" />
> > > >
> > > > Anybody who visits this web page would nuke the index of his
> > > > running solr server on the local machine - there is not even the
> > > > admin web interface involved. Any REST API on earth has this
> > > > problem, it is not specific to Solr!
> > > >
> > > > Uwe
> > > >
> > > > -
> > > > Uwe Schindler
> > > > H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
> > > > eMail: u...@thetaphi.de
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Uwe Schindler [mailto:u...@thetaphi.de]
> > > > > Sent: Tuesday, June 18, 2013 6:01 PM
> > > > > To: general@lucene.apache.org
> > > > > Cc: 'gregory draperi'
> > > > > Subject: RE: XSS Issue
> > > > >
> > > > > Hi,
> > > > >
> > > > > you can of course send your investigation to
> > > > > priv...@lucene.apache.org,
> > > > we
> > > > > greatly appreciate this.
> > > > > An XSS problem in the Solr Admin interface can for sure be solved
> > > > > somehow, but it would not help to make Solr secure. Without the
> > > > > admin interface you can still add some image into any web page that
> > > > > executes a "delete whole index" request on the Solr server.
> > > > >
> > > > > If you want to prevent this, you can add HTTP basic authentication
> > > > > to your web container, as described in the Solr wiki.
> > > > >
> > > > > In general: 

RE: XSS Issue

2013-06-18 Thread Uwe Schindler
Have fun with this web page:

http://www.thetaphi.de/nukeyoursolrindex.html

It really works: if you have a default Solr instance running on your local 
machine on the default port with the default collection and you open this web 
page, it nukes your index. This has nothing to do with the Admin interface.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: gregory draperi [mailto:gregory.drap...@gmail.com]
> Sent: Tuesday, June 18, 2013 6:27 PM
> To: general
> Subject: Re: XSS Issue
> 
> This is a Cross-Site Request Forgery issue (not an XSS) and should be fixed,
> for example, by adding an unpredictable parameter to the request.
> 
> I'm going to send to priv...@lucene.apache.org what I have found.
> 
> Best regards,
> 
> Grégory
> 
> 2013/6/18 Uwe Schindler 
> 
> > Just to show this without the admin interface: Add these two images to
> > any web page like this:
> >
> > <img src="http://localhost:8983/solr/collection1/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E" />
> > <img src="http://localhost:8983/solr/collection1/update?stream.body=%3Ccommit/%3E" />
> >
> > Anybody who visits this web page would nuke the index of his running
> > solr server on the local machine - there is not even the admin web
> > interface involved. Any REST API on earth has this problem, it is not
> > specific to Solr!
> >
> > Uwe
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> > > -Original Message-
> > > From: Uwe Schindler [mailto:u...@thetaphi.de]
> > > Sent: Tuesday, June 18, 2013 6:01 PM
> > > To: general@lucene.apache.org
> > > Cc: 'gregory draperi'
> > > Subject: RE: XSS Issue
> > >
> > > Hi,
> > >
> > > you can of course send your investigation to
> > > priv...@lucene.apache.org,
> > we
> > > greatly appreciate this.
> > > An XSS problem in the Solr Admin interface can for sure be solved
> > somehow,
> > > but would not help to make Solr secure. Without the admin interface
> > > you
> > can
> > > still add some image into any web page that executes a "delete whole
> > index
> > > request" on the Solr server.
> > >
> > > If you want to prevent this, you can add HTTP basic authentication
> > > to
> > your
> > > web container, as described in the solr wiki.
> > >
> > > In general: If you have e.g. an EC2 cloud of Solr servers, add an extra
> > > security group to your cloud and limit all access from outside. Then
> > > also no admin can access this.
> > >
> > > -
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
> > > eMail: u...@thetaphi.de
> > >
> > >
> > > > -Original Message-
> > > > From: gregory draperi [mailto:gregory.drap...@gmail.com]
> > > > Sent: Tuesday, June 18, 2013 5:46 PM
> > > > To: Uwe Schindler
> > > > Cc: general
> > > > Subject: Re: XSS Issue
> > > >
> > > > Yes, he can do that, but as I said the same problem can occur without
> > > > his consent (and without a click) if he's on an arbitrary website
> > > > which hosts an HTML IMG pointing to the vulnerable page of the Solr
> > > > administrator interface (like <img
> > > > src="http://X.X.X.X/solr/admin/xss_vulnerable_page/"> )
> > > >
> > > > I'm thankful for your quick responses, even though I don't understand
> > > > this philosophy. I take note of the point.
> > > >
> > > > Regards,
> > > >
> > > > Grégory DRAPERI
> > > >
> > > >
> > > > 2013/6/18 Uwe Schindler 
> > > >
> > > > > He can also delete his whole index with a single click on an HTTP
> > > > > link referring to his Solr server. This is his problem: never click
> > > > > on links from email.
> > > > > Solr is, as said already, not secured at all. If you want a "secure"
> > > > > Solr server, rewrite the whole thing. The same applies to other
> > > > > Lucene-based products like ElasticSearch that have no "security"
> > > > > included.
> > > > >
> > > > > --

RE: XSS Issue

2013-06-18 Thread Uwe Schindler
Just to show this without the admin interface: Add these two images to any web 
page like this:

<img src="http://localhost:8983/solr/collection1/update?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E" />
<img src="http://localhost:8983/solr/collection1/update?stream.body=%3Ccommit/%3E" />

Anybody who visits this web page would nuke the index of his running Solr 
server on the local machine - there is not even the admin web interface 
involved. Any REST API on earth has this problem; it is not specific to Solr!

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Tuesday, June 18, 2013 6:01 PM
> To: general@lucene.apache.org
> Cc: 'gregory draperi'
> Subject: RE: XSS Issue
> 
> Hi,
> 
> you can of course send your investigation to priv...@lucene.apache.org, we
> greatly appreciate this.
> An XSS problem in the Solr Admin interface can for sure be solved somehow,
> but it would not help to make Solr secure. Without the admin interface you
> can still add some image into any web page that executes a "delete whole
> index" request on the Solr server.
> 
> If you want to prevent this, you can add HTTP basic authentication to your
> web container, as described in the solr wiki.
> 
> In general: If you have e.g. an EC2 cloud of Solr servers, add an extra
> security group to your cloud and limit all access from outside. Then also no
> admin can access this.
> 
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
> 
> 
> > -----Original Message-
> > From: gregory draperi [mailto:gregory.drap...@gmail.com]
> > Sent: Tuesday, June 18, 2013 5:46 PM
> > To: Uwe Schindler
> > Cc: general
> > Subject: Re: XSS Issue
> >
> > Yes, he can do that, but as I said the same problem can occur without
> > his consent (and without a click) if he's on an arbitrary website
> > which hosts an HTML IMG pointing to the vulnerable page of the Solr
> > administrator interface (like <img
> > src="http://X.X.X.X/solr/admin/xss_vulnerable_page/"> )
> >
> > I'm thankful for your quick responses, even though I don't understand
> > this philosophy. I take note of the point.
> >
> > Regards,
> >
> > Grégory DRAPERI
> >
> >
> > 2013/6/18 Uwe Schindler 
> >
> > > He can also delete his whole index with a single click on an HTTP
> > > link referring to his Solr server. This is his problem: never click
> > > on links from email.
> > > Solr is, as said already, not secured at all. If you want a "secure"
> > > Solr server, rewrite the whole thing. The same applies to other
> > > Lucene-based products like ElasticSearch that have no "security" included.
> > >
> > > -
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
> > > eMail: u...@thetaphi.de
> > >
> > >
> > > > -Original Message-
> > > > From: gregory draperi [mailto:gregory.drap...@gmail.com]
> > > > Sent: Tuesday, June 18, 2013 5:26 PM
> > > > To: Uwe Schindler
> > > > Cc: general
> > > > Subject: Re: XSS Issue
> > > >
> > > > Hi Uwe,
> > > >
> > > > Thank you for your quick response.
> > > >
> > > > I'm a little bit surprised, because XSS is not a problem of making
> > > > Solr accessible or not to the Internet; this is a reflected XSS. If
> > > > an administrator receives a mail with a malicious link pointing to
> > > > the Solr administrator interface and containing a malicious payload,
> > > > he will execute the JavaScript if he clicks on it.
> > > >
> > > > There are also other techniques that can be used to make a Solr
> > > > administrator execute this link without his consent (an HTML IMG tag
> > > > pointing to the Solr administration interface and hosted on a
> > > > malicious website), and that will bypass network-based protection.
> > > >
> > > > Regards,
> > > >
> > > > Grégory DRAPERI
> > > >
> > > >
> > > > 2013/6/18 Uwe Schindler 
> > > >
> > > > > Hi Grégory,
> > > > >

RE: XSS Issue

2013-06-18 Thread Uwe Schindler
Hi,

you can of course send your investigation to priv...@lucene.apache.org, we 
greatly appreciate this.
An XSS problem in the Solr Admin interface can for sure be solved somehow, but 
would not help to make Solr secure. Without the admin interface you can still 
add some image into any web page that executes a "delete whole index request" 
on the Solr server.

If you want to prevent this, you can add HTTP basic authentication to your web 
container, as described in the solr wiki.

In general: If you have e.g. an EC2 coud of solr servers, add an extra security 
group to your cloud and limit all access from outside. Then also no admin can 
access this.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: gregory draperi [mailto:gregory.drap...@gmail.com]
> Sent: Tuesday, June 18, 2013 5:46 PM
> To: Uwe Schindler
> Cc: general
> Subject: Re: XSS Issue
> 
> Yes, he can do that, but as I said the same problem can occur without his
> consent (and without a click) if he's on an arbitrary website which hosts an
> HTML IMG pointing to the vulnerable page of the Solr administrator interface
> (like <img src="http://X.X.X.X/solr/admin/xss_vulnerable_page/"> )
> 
> I'm thankful for your quick responses, even though I don't understand this
> philosophy. I take note of the point.
> 
> Regards,
> 
> Grégory DRAPERI
> 
> 
> 2013/6/18 Uwe Schindler 
> 
> > He can also delete his whole index with a single click on an HTTP link
> > referring to his Solr server. This is his problem: never click on
> > links from email.
> > Solr is, as said already, not secured at all. If you want a "secure"
> > Solr server, rewrite the whole thing. The same applies to other
> > Lucene-based products like ElasticSearch that have no "security" included.
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> > > -Original Message-
> > > From: gregory draperi [mailto:gregory.drap...@gmail.com]
> > > Sent: Tuesday, June 18, 2013 5:26 PM
> > > To: Uwe Schindler
> > > Cc: general
> > > Subject: Re: XSS Issue
> > >
> > > Hi Uwe,
> > >
> > > Thank you for your quick response.
> > >
> > > I'm a little bit surprised, because XSS is not a problem of making
> > > Solr accessible or not to the Internet; this is a reflected XSS. If an
> > > administrator receives a mail with a malicious link pointing to the
> > > Solr administrator interface and containing a malicious payload, he
> > > will execute the JavaScript if he clicks on it.
> > >
> > > There are also other techniques that can be used to make a Solr
> > > administrator execute this link without his consent (an HTML IMG tag
> > > pointing to the Solr administration interface and hosted on a
> > > malicious website), and that will bypass network-based protection.
> > >
> > > Regards,
> > >
> > > Grégory DRAPERI
> > >
> > >
> > > 2013/6/18 Uwe Schindler 
> > >
> > > > Hi Grégory,
> > > >
> > > > Solr should always listen only on private networks; never make it
> > > > accessible to the internet. This is officially documented; for more
> > > > information about this, see:
> > > > http://wiki.apache.org/solr/SolrSecurity
> > > > Solr uses HTTP as its programming API and you can do everything
> > > > Java allows via HTTP, but HTTP does not mean it must be open to
> > > > the internet. By opening a Solr server to the internet you are
> > > > effectively exposing everything Java allows to the internet, so it
> > > > is not recommended. Solr also has no security features at all;
> > > > managing this is all up to the front-end, sitting on the internet or
> > > > insecure networks.
> > > >
> > > > There are already some issues open to limit some XSS and similar
> > > > access: https://issues.apache.org/jira/browse/SOLR-4882
> > > >
> > > > Uwe
> > > >
> > > > -
> > > > Uwe Schindler
> > > > H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
> > > > eMail: u...@thetaphi.de
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: gregory draperi [mailto:gregory.drap...@gmail.com]
> > > > > Sent: Tuesday, June 18, 2013 3:13 PM
> > > > > To: general@lucene.apache.org
> > > > > Subject: XSS Issue
> > > > >
> > > > > Dear Solr project members,
> > > > >
> > > > > I think I have found an XSS (Cross-Site Scripting) issue in the
> > > > > 3.6.2 version of Solr.
> > > > >
> > > > > How can I give you more details?
> > > > >
> > > > > Regards,
> > > > >
> > > > > --
> > > > > Grégory Draperi
> > > >
> > > >
> > >
> > >
> > > --
> > > Grégory Draperi
> >
> >
> 
> 
> --
> Grégory Draperi



RE: XSS Issue

2013-06-18 Thread Uwe Schindler
He can also delete his whole index with a single click on an HTTP link referring 
to his Solr server. This is his problem: never click on links from email.
Solr is, as said already, not secured at all. If you want a "secure" Solr 
server, rewrite the whole thing. The same applies to other Lucene-based 
products like ElasticSearch that have no "security" included.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: gregory draperi [mailto:gregory.drap...@gmail.com]
> Sent: Tuesday, June 18, 2013 5:26 PM
> To: Uwe Schindler
> Cc: general
> Subject: Re: XSS Issue
> 
> Hi Uwe,
> 
> Thank you for your quick response.
> 
> I'm a little bit surprised, because XSS is not a problem of making Solr
> accessible or not to the Internet; this is a reflected XSS. If an
> administrator receives a mail with a malicious link pointing to the Solr
> administrator interface and containing a malicious payload, he will execute
> the JavaScript if he clicks on it.
> 
> There are also other techniques that can be used to make a Solr administrator
> execute this link without his consent (an HTML IMG tag pointing to the Solr
> administration interface and hosted on a malicious website), and that will
> bypass network-based protection.
> 
> Regards,
> 
> Grégory DRAPERI
> 
> 
> 2013/6/18 Uwe Schindler 
> 
> > Hi Grégory,
> >
> > Solr should always listen only on private networks; never make it
> > accessible to the internet. This is officially documented; for more
> > information about this, see: http://wiki.apache.org/solr/SolrSecurity
> > Solr uses HTTP as its programming API and you can do everything Java
> > allows via HTTP, but HTTP does not mean it must be open to the
> > internet. By opening a Solr server to the internet you are effectively
> > exposing everything Java allows to the internet, so it is not
> > recommended. Solr also has no security features at all; managing this
> > is all up to the front-end, sitting on the internet or insecure networks.
> >
> > There are already some issues open to limit some XSS and similar access:
> > https://issues.apache.org/jira/browse/SOLR-4882
> >
> > Uwe
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> > > -Original Message-
> > > From: gregory draperi [mailto:gregory.drap...@gmail.com]
> > > Sent: Tuesday, June 18, 2013 3:13 PM
> > > To: general@lucene.apache.org
> > > Subject: XSS Issue
> > >
> > > Dear Solr project members,
> > >
> > > I think I have found an XSS (Cross-Site Scripting) issue in the 3.6.2
> > > version of Solr.
> > >
> > > How can I give you more details?
> > >
> > > Regards,
> > >
> > > --
> > > Grégory Draperi
> >
> >
> 
> 
> --
> Grégory Draperi



RE: XSS Issue

2013-06-18 Thread Uwe Schindler
Hi Grégory,

Solr should always listen only on private networks; never make it accessible 
to the internet. This is officially documented; for more information about 
this, see: http://wiki.apache.org/solr/SolrSecurity
Solr uses HTTP as its programming API and you can do everything Java allows via 
HTTP, but HTTP does not mean it must be open to the internet. By opening a Solr 
server to the internet you are effectively exposing everything Java allows to 
the internet, so it is not recommended. Solr also has no security features at 
all; managing this is all up to the front-end, sitting on the internet or 
insecure networks.

There are already some issues open to limit some XSS and similar access: 
https://issues.apache.org/jira/browse/SOLR-4882

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: gregory draperi [mailto:gregory.drap...@gmail.com]
> Sent: Tuesday, June 18, 2013 3:13 PM
> To: general@lucene.apache.org
> Subject: XSS Issue
> 
> Dear Solr project members,
> 
> I think I have found an XSS (Cross-Site Scripting) issue in the 3.6.2
> version of Solr.
> 
> How can I give you more details?
> 
> Regards,
> 
> --
> Grégory Draperi



RE: Best way to construct term?

2013-05-10 Thread Uwe Schindler
Very simple:
new Term(fieldName, termText)

The reason for the extra constructor and createTerm() in Lucene 3.x and before 
was the extra cost of interning (String.intern()) the field name. In Lucene 4.0 
field names are no longer interned, because the index structure changed and 
field<->field comparisons in term enumerations are no longer needed. So just 
create a term by using the constructor.

In general, Term is just a light wrapper and no longer a fundamental component 
of Lucene; it is kept for "backwards compatibility" with earlier versions and 
mainly used for constructing queries like new TermQuery(Term). From the 
implementation point of view, in Lucene 4.x every field is like a separate 
index, and the terms of each field are represented by the new class BytesRef, 
which is a slice out of a larger byte[] array containing the data of many terms 
of a field in the index.
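
For illustration, a minimal sketch of the 4.x usage described above (the field 
name and term text are made up):

  // Lucene 4.x: no interning and no createTerm() - just use the constructor.
  Term term = new Term("title", "lucene");    // org.apache.lucene.index.Term
  Query query = new TermQuery(term);          // org.apache.lucene.search.TermQuery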

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: bbarani [mailto:bbar...@gmail.com]
> Sent: Friday, May 10, 2013 7:15 PM
> To: general@lucene.apache.org
> Subject: Best way to construct term?
> 
> Hi,
> 
> I am currently constructing a term using the steps below:
> 
> Final Static (class level): Term t=new Term(fieldName);
> 
> Inside some function(s):
> 
> t.createTerm(termText);
> 
> 
> It seems like the createTerm method has been removed from the Lucene 4.3.0
> API. I just thought of checking the best / most efficient way to create a
> Term. Can someone please guide me on that?
> 
> Thanks,
> BB
> 
> 
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Best-
> way-to-construct-term-tp4062388.html
> Sent from the Lucene - General mailing list archive at Nabble.com.



RE: Welcome David Smiley to the PMC

2013-03-18 Thread Uwe Schindler
Welcome!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Steve Rowe [mailto:sar...@gmail.com]
> Sent: Monday, March 18, 2013 3:14 PM
> To: d...@lucene.apache.org
> Cc: general@lucene.apache.org
> Subject: Welcome David Smiley to the PMC
> 
> I'm pleased to announce that David Smiley has accepted the PMC's invitation
> to join.
> 
> Welcome David!
> 
> - Steve



RE: Singular to plural

2013-02-07 Thread Uwe Schindler
You have to do stemming on both the indexing and the query side! If the query 
submitted by the user is also stemmed, the plural becomes singular and a result 
is found. The important rule for Lucene is: use the same Analyzer for indexing 
and query parsing (which is in most cases true; there are some special cases, 
but they are not related to stemming).
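
A minimal sketch of that rule, assuming Lucene 3.6 and the Snowball English 
stemmer mentioned in the question (directory setup and exception handling are 
omitted):

  Analyzer analyzer = new SnowballAnalyzer(Version.LUCENE_36, "English");

  // Index side: the analyzer stems "room"/"rooms" to the same term.
  IndexWriter writer = new IndexWriter(directory,
      new IndexWriterConfig(Version.LUCENE_36, analyzer));
  Document doc = new Document();
  doc.add(new Field("title", "guest room", Field.Store.YES, Field.Index.ANALYZED));
  writer.addDocument(doc);
  writer.close();

  // Query side: the SAME analyzer stems the user's input, so "rooms" matches.
  QueryParser parser = new QueryParser(Version.LUCENE_36, "title", analyzer);
  Query query = parser.parse("rooms");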

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: amita [mailto:amita_bhak...@persistent.co.in]
> Sent: Thursday, February 07, 2013 8:54 AM
> To: general@lucene.apache.org
> Subject: Singular to plural
> 
> Hi,
> 
> I am using the Snowball Analyzer with the English Stemmer. It stems a plural
> term to singular and shows the proper search result; however, there is a
> problem going from singular to plural.
> My requirement is if document title is "guest room", it should be shown in
> search result upon searching for "rooms". The terms indexed for my
> document title are "guest" and "room". Since "rooms" is not indexed for this
> document, it's not being shown currently.
> Is there any way to achieve this?
> 
> Regards,
> Amita
> 
> 
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Singular-
> to-plural-tp4038931.html
> Sent from the Lucene - General mailing list archive at Nabble.com.



RE: Indexer search with filter and sort

2013-01-22 Thread Uwe Schindler
This is because you tokenized/analyzed the field. Sorting then works against 
single terms of the field, not against the whole value ("sundheimer" is the 
largest of all terms, so that document is sorted first in descending order). 
This is why you cannot tokenize/analyze fields you want to sort against.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: johnbesel [mailto:johnbe...@web.de]
> Sent: Monday, January 21, 2013 2:29 PM
> To: general@lucene.apache.org
> Subject: Re: Indexer search with filter and sort
> 
> I made a JUnit test with some values. I sort values DESC and get the following:
> 
> im sundheimer fort 1-9
> krammwinkel/surkampstr.78
> sundweg
> bray-sur-seine-str. 1
> bray-sur-seine-str. 1
> berck-sur-mer-str. 1
> 
> 
> This is not DESC, but also not ASC. What is this?
> SortField field=new SortField(sortKey, SortField.STRING, true);
> List luceneSortFields = new ArrayList();
> luceneSortFields.add(field);
> 
> Sort sort = new Sort(luceneSortFields.toArray(new
> SortField[luceneSortFields.size()]));
> topDocs = indexSearcher.search(booleanQuery, Integer.MAX_VALUE, sort);
> 
> Has anybody any idea?
> What is wrong in my implementation?
> 
> 
> 
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Indexer-
> search-with-filter-and-sort-tp4034733p4035047.html
> Sent from the Lucene - General mailing list archive at Nabble.com.



RE: Indexer search with filter and sort

2013-01-22 Thread Uwe Schindler
You have to index the field twice, with different names: once for searching 
(analyzed) and once for sorting (not analyzed).
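
A minimal sketch of this, using the Lucene 3.6 API from the question (the 
field names are made up):

  // One analyzed field for full-text search...
  doc.add(new Field("street", value, Field.Store.YES, Field.Index.ANALYZED));
  // ...and a second, not-analyzed copy of the same value only for sorting.
  doc.add(new Field("street_sort", value, Field.Store.NO, Field.Index.NOT_ANALYZED));

  // Search against "street", sort against "street_sort" (true = descending):
  Sort sort = new Sort(new SortField("street_sort", SortField.STRING, true));
  TopDocs topDocs = indexSearcher.search(booleanQuery, Integer.MAX_VALUE, sort);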

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: johnbesel [mailto:johnbe...@web.de]
> Sent: Saturday, January 19, 2013 10:55 AM
> To: general@lucene.apache.org
> Subject: Indexer search with filter and sort
> 
> Hello together,
> 
> I have been working with Lucene for two weeks and have developed an
> application which puts some values into an index and gets them back out.
> 
> It works very well.
> 
> Now I want to sort the values which I put into the index.
> I used a SortField to sort values:
> new SortField(sortKey, SortField.STRING, SortDirection.DESC);
> 
> It didn't work.
> I found in internet that when I want to use a SortField with type String, I
> should build Index with Index.NOT_ANALYZED.
> (http://blog.richeton.com/2009/05/12/lucene-sort-tips/)
> 
> I tested it, so now I can sort my values correctly, BUT I cannot search for
> them :(
> 
> How can I search and sort String values?
> 
> thank you for your help.
> 
> P.S.
> I use StandardAnalyzer and Lucene 3.6.
> I put values into index
> doc.add(new Field(KEY_CUSTOMER_NUMBER, customer.getKunnr() != null ?
> customer.getKunnr() : "", Store.YES, Index.ANALYZED));
> 
> 
> 
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Indexer-
> search-with-filter-and-sort-tp4034733.html
> Sent from the Lucene - General mailing list archive at Nabble.com.



RE: Welcome Sami Siren to the PMC

2012-12-13 Thread Uwe Schindler
Welcome Sami!

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Mark Miller [mailto:markrmil...@gmail.com]
> Sent: Wednesday, December 12, 2012 9:17 PM
> To: d...@lucene.apache.org; general@lucene.apache.org
> Subject: Welcome Sami Siren to the PMC
> 
> I'm pleased to announce that Sami Siren has accepted the PMC's invitation to
> join.
> 
> Welcome Sami!
> 
> - Mark



RE: Lucene and Solr installation problems

2012-11-10 Thread Uwe Schindler
Lucene / Solr definitely does not work with GCJ or any other CLASSPATH-based 
JVM.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Saturday, November 10, 2012 7:11 PM
> To: general@lucene.apache.org
> Subject: Re: Lucene and Solr installation problems
> 
> On Sat, Nov 10, 2012 at 12:49 PM, David Alyea  wrote:
> >
> > Any ideas why lucene won't run on my server?
> 
> $ java -version
> java version "1.5.0"
> gij (GNU libgcj) version 4.4.6 20120305 (Red Hat 4.4.6-4)
> 
> I don't think Lucene will work with GCJ... you will need a fully functional 
> JVM.



RE: Welcome Alan Woodward as Lucene/Solr committer

2012-10-16 Thread Uwe Schindler
Welcome Alan!

It's good to have you on board!

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Wednesday, October 17, 2012 7:37 AM
> To: d...@lucene.apache.org; Lucene mailing list
> Subject: Welcome Alan Woodward as Lucene/Solr committer
> 
> I'm pleased to announce that the Lucene PMC has voted Alan as a
> Lucene/Solr committer.
> 
> Alan has been contributing patches on various tricky stuff: positions 
> iterators,
> span queries, highlighters, codecs, and so on.
> 
> Alan: its tradition that you introduce yourself with your background.
> 
> I think your account is fully working and you should be able to add yourself 
> to
> the who we are page on the website as well.
> 
> Congratulations!
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
> commands, e-mail: dev-h...@lucene.apache.org



RE: Custom Filter Indexing Slow

2012-09-14 Thread Uwe Schindler
The problem is that your transformation method needs Strings, but your 
incrementToken method also has a serious bug: it does not respect the length of 
the buffer, so it may pick up additional garbage!


The easiest way to do this in lots less code and not having those bugs:

  public boolean incrementToken() throws IOException {
    if (!input.incrementToken()) {
      return false;
    }
    // toString() respects the term's length, so no buffer garbage is picked up
    final String normalizedLCcallnum = getLCShelfkey(charTermAttr.toString());
    charTermAttr.setEmpty().append(normalizedLCcallnum);
    return true;
  }

This fixes part of your performance problem: it no longer converts the result 
of your transformation twice between char arrays and Strings.

To further improve speed, make the method getLCShelfkey directly operate on a 
char[] and a length.

Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Osullivan L. [mailto:l.osulli...@swansea.ac.uk]
> Sent: Friday, September 14, 2012 11:58 AM
> To: general@lucene.apache.org
> Subject: Custom Filter Indexing Slow
> 
> Hi Folks,
> 
> I have a custom filter which does everything I need it to, but it has reduced
> my indexing speed to a crawl. Are there any methods I need to call to clear /
> clean things up once my script (details below) has done its work?
> 
> Thanks,
> 
> Luke
> 
>   public LCCNormalizeFilter(TokenStream input)
> {
> super(input);
> this.charTermAttr = addAttribute(CharTermAttribute.class);
> }
> 
> public boolean incrementToken() throws IOException {
> 
>   if (!input.incrementToken()) {
>   return false;
>   }
> 
>   char[] buffer = charTermAttr.buffer();
>   String rawLCcallnum = new String(buffer);
>   String normalizedLCcallnum = getLCShelfkey(rawLCcallnum);
>   char[] newBuffer = normalizedLCcallnum.toCharArray();
> charTermAttr.setEmpty();
> charTermAttr.copyBuffer(newBuffer, 0, newBuffer.length);
> return true;
> }



RE: charFilter

2012-09-13 Thread Uwe Schindler
Hi,

You must *implement* the protected method correct(int offset) in your own 
CharFilter. It should do the following: call super.correct(offset) (this is 
important if you chain several filters) and then return a corrected offset 
according to the transformations you did in your own CharFilter. If e.g. the 
character at offset 3 corresponds to offset 5 in the filtered data, you must 
return 5 when the given offset (after calling super) is 3.

Unrelated to that: catching the IOException and printing it to standard error 
is a suboptimal way to implement such a filter. Just make your constructor 
throw IOException itself, so it bubbles up to Solr; in the factory you can 
re-throw a SolrException. As written, your code would silently index nonsense 
or throw an NPE later.

In general, a CharFilter should *not* read the whole input up-front in the 
constructor and then transform it; instead, it should implement the read(...) 
methods and transform the input on-the-fly.
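
As an illustration only, a skeleton of the shape described above, assuming the 
Lucene/Solr 3.x BaseCharFilter API; the class name and the concrete offset 
mapping are made up:

  import java.io.IOException;
  import org.apache.lucene.analysis.BaseCharFilter;
  import org.apache.lucene.analysis.CharStream;

  public final class MyCharFilter extends BaseCharFilter {

    public MyCharFilter(CharStream in) {
      super(in);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
      // Transform the input on-the-fly here instead of reading everything
      // up-front in the constructor (identity transform in this sketch).
      return input.read(cbuf, off, len);
    }

    @Override
    protected int correct(int currentOff) {
      // First honor the corrections of chained filters, then apply our own.
      int off = super.correct(currentOff);
      // Hypothetical mapping: pretend this filter removed two characters
      // before offset 3, so offsets at or beyond 3 are shifted by two.
      return (off >= 3) ? off + 2 : off;
    }
  }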

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Osullivan L. [mailto:l.osulli...@swansea.ac.uk]
> Sent: Thursday, September 13, 2012 12:43 PM
> To: general@lucene.apache.org
> Subject: RE: charFilter
> 
> Hi Folks,
> 
> I'm getting the following error after using a custom filter:
> 
> SEVERE: org.apache.solr.common.SolrException:
> org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token PR
> 2823.00 A0.20 S0.819880 exceeds length of provided text sized 15
> 
> As the error suggests, the input value is PR2823.A2S81988 (15 chars). I have
> been informed that the correctOffset() method of the CharFilter class can be
> used to resolve this issue, but as far as I can tell, all that does is return
> the value - it doesn't set it.
> 
> I have included some details below.
> 
> Kind Regards,
> 
> Luke
> 
> In my schema I have:
> 
> <fieldType ... sortMissingLast="true" omitNorms="true">
>   <analyzer>
>     <charFilter class="com.test.solr.analysis.LukesTestCharFilterFactory"/>
>     ...
>   </analyzer>
> </fieldType>
> 
> and the method is:
> 
> public class LukesTestCharFilterFactory extends BaseCharFilterFactory {
> 
>   public CharStream create(CharStream input) {
>   return new LukesTestCharFilter(input);
>   }
> }
> 
> public final class LukesTestCharFilter extends BaseCharFilter {  ...
>   public LukesTestCharFilter(CharStream input)  {
> super(input);
> try {
>   // Load the whole input into a string
>   StringBuilder sb = new StringBuilder();
>   char[] buf = new char[1024];
> 
>   int len;
>   while ((len = input.read(buf)) >= 0) {
>   sb.append(buf, 0, len);
>   }
> 
>   String original = sb.toString();
>   String modified = getLCShelfkey(original);
>   CharStream result = CharReader.get(new StringReader(modified));
> 
>   this.input = result;
>   this.input.correctOffset(modified.length());
>   } catch (IOException e) {
>   System.err.println("There was a problem parsing input.  Skipping.");
>   }
>   }
>  ...
> }
> 



RE: file handle leaks appearing on Index files

2012-09-13 Thread Uwe Schindler
Hi,

this is not a leak. lsof will report "deleted" files with file handles still 
open when, at the time the changes were committed by IndexWriter, another 
IndexReader was open in parallel that stays on an older snapshot of the files. 
In that case, this IndexReader still uses files that are not yet completely 
deleted by the file system (the inode is still there, but the directory entry 
is already deleted).
You have to use IndexReader.openIfChanged() or open a completely new instance 
with IndexReader.open() to get an updated view on the index after committing 
changes with IndexWriter. Don't forget to close the old IndexReader! If you 
don't do this, the older snapshot view is still referenced, preventing the 
files from being completely deleted and disappearing from lsof.
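
A minimal sketch of the reopen pattern described above, assuming the Lucene 
3.5+ API (reader is the application's currently open IndexReader):

  // After committing changes with IndexWriter, refresh the searching view:
  IndexReader newReader = IndexReader.openIfChanged(reader);
  if (newReader != null) {   // null means the index has not changed
    reader.close();          // releases the old snapshot and its file handles
    reader = newReader;
  }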

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Jiajun Chen [mailto:cjjvict...@gmail.com]
> Sent: Thursday, September 13, 2012 6:36 AM
> To: general@lucene.apache.org
> Subject: file handle leaks appearing on Index files
> 
> heapdump show the instance of IndexReader hava 164 at Thu Sep 13 12:04:00
> CST 2012.
> 
> 
> $ lsof  |grep deleted
> 
> reports the following:
> ..
> java  13436 uu 5086r  REG9,4   7573  79311898
> /var/index/full/20120910/_v68n.fdt (deleted)
> java  13436 uu 5087r  REG9,4   2340  79311970
> /var/index/full/20120910/_v68n.fdx (deleted)
> java  13436 uu 5088r  REG9,4   2058  79311887
> /var/index/full/20120910/_v68o.fdt (deleted)
> java  13436 uu 5089r  REG9,4636  79311854
> /var/index/full/20120910/_v68o.fdx (deleted)
> java  13436 uu 5090w REG9,4   8038  79312040
> /var/index/full/20120910/_v68p.fdt (deleted)
> java  13436 uu 5091r  REG9,4   2476  79312050
> /var/index/full/20120910/_v68p.fdx (deleted)
> java  13436 uu 5092r  REG9,4   7015  79312087
> /var/index/full/20120910/_v68q.fdt (deleted)
> java  13436 uu 5093r  REG9,4   2332  79312091
> /var/index/full/20120910/_v68q.fdx (deleted)
> java  13436 uu 5094r  REG9,4648  79312128
> /var/index/full/20120910/_v68r.fdt (deleted) .
> 
> $ lsof  |grep deleted |wc -l ;date
> 
> reports the following:
> 
> 494
> Wed Sep 12 23:11:40 CST 2012
> 
> 506
> Wed Sep 12 23:22:57 CST 2012
> 
> 560
> Wed Sep 12 23:34:56 CST 2012
> 
> 560
> Wed Sep 12 23:46:29 CST 2012
> 
> 560
> Wed Sep 12 23:49:56 CST 2012
> 
> 566
> Wed Sep 12 23:56:08 CST 2012
> 
> 4275
> Thu Sep 13 12:04:00 CST 2012



RE: charFilter

2012-09-12 Thread Uwe Schindler
You have to implement the offset-correction method (protected correct(int 
offset)) to take care of deleted or added chars.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Osullivan L. [mailto:l.osulli...@swansea.ac.uk]
> Sent: Wednesday, September 12, 2012 7:25 PM
> To: general@lucene.apache.org
> Subject: charFilter
> 
> Hi Folks,
> 
> I have created a custom charFilter for use in Solr which does everything I 
> need
> it to with one exception - it kills Solr when highlighting is used.
> 
> I am modifying the input with the following:
> 
> public myCharFilter (CharStream input) { super(input);
> 
> ...
> 
> CharStream result = CharReader.get(new StringReader(modified)); this.input =
> result
> 
> }
> 
> Is there any way of modifying the input offset so that it doesn't throw the
> error?
> 
> Thanks,
> 
> Luke



RE: OOM with Lucene 3.6 & Jrockit

2012-09-10 Thread Uwe Schindler
Hi,

Have you read 
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html ? 
Especially the last part about configuring your system correctly for mmap is 
important. mmap can handle index files of hundreds of gigabytes on systems 
with less physical RAM; you just have to define the ulimit settings correctly.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Snehal Chennuru [mailto:snehal.ku...@gmail.com]
> Sent: Tuesday, September 11, 2012 12:52 AM
> To: general@lucene.apache.org
> Subject: Re: OOM with Lucene 3.6 & Jrockit
> 
> It turns out that using MMapDirectory was causing the OOM exception, as the
> index size was over 20GB. Changing it to use SimpleFSDirectory avoids this
> issue.
> 
> 
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/OOM-with-
> Lucene-3-6-Jrockit-tp4006487p4006747.html
> Sent from the Lucene - General mailing list archive at Nabble.com.



[ANNOUNCE] Apache Solr 3.6.1 released

2012-07-22 Thread Uwe Schindler
22 July 2012, Apache Solr™ 3.6.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 3.6.1.

Solr is the popular, blazing fast open source enterprise search platform
from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search, dynamic clustering, database
integration, rich document (e.g., Word, PDF) handling, and geospatial
search.
Solr is highly scalable, providing distributed search and index replication,
and it powers the search and navigation features of many of the world's
largest internet sites.

This release is a bug fix release for version 3.6.0. It contains numerous
bug fixes, optimizations, and improvements, some of which are highlighted
below.  The release is available for immediate download at:
   http://lucene.apache.org/solr/mirrors-solr-3x-redir.html (see
note below).

See the CHANGES.txt file included with the release for a full list of
details.

Solr 3.6.1 Release Highlights:

 * The concurrency of MMapDirectory was improved; it had caused
   a performance regression in comparison to Solr 3.5.0. This affected
   users on 64-bit platforms (Linux, Solaris, Windows) and those
   explicitly using MMapDirectoryFactory.

 * ReplicationHandler "maxNumberOfBackups" was fixed to work if backups are
   triggered on commit.

 * Charset problems were fixed with HttpSolrServer, caused by an upgrade to
   a new Commons HttpClient version in 3.6.0.

 * Grouping was fixed to return correct count when not all shards are
   queried in the second pass. Solr no longer throws Exception when using
   result grouping with main=true and using wt=javabin.

 * Config file replication was made less error prone.

 * Data Import Handler threading fixes.

 * Various minor bugs were fixed.

Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases.  It is possible that the mirror you are using may not
have replicated the release yet.  If that is the case, please try another
mirror.  This also goes for Maven access.

Happy searching,

Uwe Schindler (release manager)
& all Lucene/Solr developers

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/





[ANNOUNCE] Apache Lucene 3.6.1 released

2012-07-22 Thread Uwe Schindler
22 July 2012, Apache Lucene™ 3.6.1 available

The Lucene PMC is pleased to announce the release of Apache Lucene 3.6.1.

Apache Lucene is a high-performance, full-featured text search engine
library written entirely in Java. It is a technology suitable for nearly
any application that requires full-text search, especially cross-platform.

This release is a bug fix release for version 3.6.0. It contains numerous
bug fixes, optimizations, and improvements, some of which are highlighted
below.  The release is available for immediate download at:
   http://lucene.apache.org/core/mirrors-core-3x-redir.html (see
note below).

See the CHANGES.txt file included with the release for a full list of
details.

Lucene 3.6.1 Release Highlights:

 * The concurrency of MMapIndexInput.clone() was improved; it had caused
   a performance regression in comparison to Lucene 3.5.0.

 * MappingCharFilter was fixed to return correct final token positions.

 * QueryParser now supports +/- operators with any amount of whitespace.

 * DisjunctionMaxScorer now implements visitSubScorers().

 * Changed the visibility of Scorer#visitSubScorers() to
   public, otherwise it's impossible to implement Scorers outside
   the Lucene package. This is a small backwards break, affecting a few
   users who implemented custom Scorers.

 * Various analyzer bugs were fixed: Kuromoji no longer produces an invalid
   token graph due to UNK with punctuation being decompounded, an invalid
   position length in SynonymFilter was corrected, loading of Hunspell
   dictionaries that use aliasing was fixed, and streams are now closed
   consistently when loading Hunspell affix files.

 * Various bugs in FST components were fixed: Offline sorter minimum
   buffer size, integer overflow in sorter, FSTCompletionLookup missed
   to close its sorter.

 * Fixed a synchronization bug in handling taxonomies in facet module.

 * Various minor bugs were fixed: BytesRef/CharsRef copy methods
   with nonzero offsets and subSequence off-by-one, TieredMergePolicy
   returned wrong-scaled floor segment setting.

Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases.  It is possible that the mirror you are using may not
have replicated the release yet.  If that is the case, please try another
mirror.  This also goes for Maven access.

Happy searching,

Uwe Schindler (release manager)
& all Lucene/Solr developers

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/




RE: Error message running: ant clean test

2012-07-19 Thread Uwe Schindler
You don't need to download ASM at all. Just use a plain default ANT
installation as extracted from ANT's zip file, and nuke your user's ~/.ant
directory if possible. ANT downloads the required ASM version automatically
when building Lucene/Solr. The problem you had was a preexisting version in a
global lib folder that should not be there: ANT always prefers global lib
folders over local ones, so it did not respect Lucene's requirements.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: jellyman [mailto:colm_r...@hotmail.com]
> Sent: Thursday, July 19, 2012 4:51 PM
> To: general@lucene.apache.org
> Subject: RE: Error message running: ant clean test
> 
> Hey Uwe,
> 
>    Thanks. I found that I'm running ASM version 2.2.3! I'll uninstall it and
> download something 4.0.0+.
> 
>I'm working on a windows box btw. I guess I can install ant anywhere...
> 
> Thanks
> jellym
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Error-
> message-running-ant-clean-test-tp3995956p3995981.html
> Sent from the Lucene - General mailing list archive at Nabble.com.



RE: Error message running: ant clean test

2012-07-19 Thread Uwe Schindler
Hi,

>   I got the latest version and rebuilt. That changed the error message a bit
> (included below). Can you please explain your second point in a bit more
> detail? I'm very new to ASM (in fact I don't even know what this means).
>   For example, how do I know whether the ~/.ant/lib folder contains an
> outdated ASM version? This is not immediately obvious to me due to my
> ignorance of Java tech. Can you hand-hold me for a bit?

That is obvious from the exception messages. ASM 4.0 completely changed the
API in a backwards-incompatible way. The only way this can hit you is if you
have customized your ANT installation with extension modules (I have no idea
which).

Those modules could be installed in:
~/.ant/lib
$ANT_HOME/lib

I cannot say more: look into those directories for asm-xxx.jar files. If there
is any version < 4.0, you cannot build Lucene with this configuration. I would
recommend uninstalling ANT, reinstalling a new ANT version (downloaded from
Apache), and cleaning up your ~/.ant folder.

> BUILD FAILED
> C:\trunk\build.xml:55: The following error occurred while executing this
> line:
> C:\trunk\lucene\build.xml:176: java.lang.IncompatibleClassChangeError:
class
> org.apache.lucene.validation.ForbiddenApisCheckTask$ClassSignatureLookup$1
> has interface org.objectweb.asm.ClassVisitor as super class
>   at java.lang.ClassLoader.defineClass1(Native Method)
>   at java.lang.ClassLoader.defineClass(Unknown Source)
>   at
>
org.apache.tools.ant.AntClassLoader.defineClassFromData(AntClassLoader.java
> :1128)
>   at
>
org.apache.tools.ant.AntClassLoader.getClassFromStream(AntClassLoader.java:
> 1299)
>   at
>
org.apache.tools.ant.AntClassLoader.findClassInComponents(AntClassLoader.ja
> va:1355)
>   at
> org.apache.tools.ant.AntClassLoader.findClass(AntClassLoader.java:1315)
>   at
> org.apache.tools.ant.AntClassLoader.loadClass(AntClassLoader.java:1068)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at
>
org.apache.lucene.validation.ForbiddenApisCheckTask$ClassSignatureLookup.<init>(ForbiddenApisCheckTask.java:457)
>   at
> org.apache.lucene.validation.ForbiddenApisCheckTask.getClassFromClassLoade
> r(ForbiddenApisCheckTask.java:92)
>   at
> org.apache.lucene.validation.ForbiddenApisCheckTask.addSignature(Forbidden
> ApisCheckTask.java:133)
>   at
>
org.apache.lucene.validation.ForbiddenApisCheckTask.parseApiFile(ForbiddenA
> pisCheckTask.java:170)
>   at
> org.apache.lucene.validation.ForbiddenApisCheckTask.execute(ForbiddenApisC
> heckTask.java:353)
>   at
> org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
>   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at
>
org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
>   at org.apache.tools.ant.Task.perform(Task.java:348)
>   at org.apache.tools.ant.Target.execute(Target.java:392)
>   at org.apache.tools.ant.Target.performTasks(Target.java:413)
>   at
> org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399)
>   at
>
org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckE
> xecutor.java:38)
>   at org.apache.tools.ant.Project.executeTargets(Project.java:1251)
>   at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:442)
>   at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:303)
>   at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:221)
>   at
> org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
>   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at
>
org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
>   at org.apache.tools.ant.Task.perform(Task.java:348)
>   at
org.apache.tools.ant.taskdefs.Sequential.execute(Sequential.java:68)
>   at
> org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
>   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at
>
org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
>   at org.apache.tools.ant.Task.perform(Task.java:348)
>   at org.apache.tools.ant.Target.execute(Target.java:392)
>   at org.apache.tools.ant.Target.performTasks(Target.java:413)
>   at
> org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399)
>   at org.apache.tools.ant.Project.executeTarget(Project.java:1368)
>   at
>
org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.j
a
> va:41)
>   at org.apache.

RE: Error message running: ant clean test

2012-07-19 Thread Uwe Schindler
You should update your SVN checkout; this class is no longer used in trunk!
There seems to be some classpath + source checkout confusion.

If you still get those errors, please check if you ~/.ant/lib folder contain
outdated and old ASM versions. Maybe you installed some ANT plugin in your
classpath that ships with outdated ASM.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: jellyman [mailto:colm_r...@hotmail.com]
> Sent: Thursday, July 19, 2012 3:26 PM
> To: general@lucene.apache.org
> Subject: Error message running: ant clean test
> 
> Hi,
> 
>I have downloaded the Solr and Lucene source code. I'm running the
> command: "ant clean test" from C:\trunk and am getting the following error
> message:
> 
> 
> BUILD FAILED
> C:\trunk\build.xml:55: The following error occurred while executing this
> line:
> C:\trunk\lucene\build.xml:180: java.lang.NoSuchMethodError:
> org.objectweb.asm.tree.ClassNode.(I)V
>   at
> org.apache.lucene.validation.ForbiddenApisCheckTask.addSignature(Forbidden
> ApisCheckTask.java:128)
>   at
>
org.apache.lucene.validation.ForbiddenApisCheckTask.parseApiFile(ForbiddenA
> pisCheckTask.java:175)
>   at
> org.apache.lucene.validation.ForbiddenApisCheckTask.execute(ForbiddenApisC
> heckTask.java:301)
>   at
> org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
>   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at
>
org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
>   at org.apache.tools.ant.Task.perform(Task.java:348)
>   at org.apache.tools.ant.Target.execute(Target.java:392)
>   at org.apache.tools.ant.Target.performTasks(Target.java:413)
>   at
> org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399)
>   at
>
org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckE
> xecutor.java:38)
>   at org.apache.tools.ant.Project.executeTargets(Project.java:1251)
>   at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:442)
>   at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:303)
>   at org.apache.tools.ant.taskdefs.SubAnt.execute(SubAnt.java:221)
>   at
> org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
>   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at
>
org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
>   at org.apache.tools.ant.Task.perform(Task.java:348)
>   at
org.apache.tools.ant.taskdefs.Sequential.execute(Sequential.java:68)
>   at
> org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
>   at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.lang.reflect.Method.invoke(Unknown Source)
>   at
>
org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
>   at org.apache.tools.ant.Task.perform(Task.java:348)
>   at org.apache.tools.ant.Target.execute(Target.java:392)
>   at org.apache.tools.ant.Target.performTasks(Target.java:413)
>   at
> org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399)
>   at org.apache.tools.ant.Project.executeTarget(Project.java:1368)
>   at
>
org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.j
a
> va:41)
>   at org.apache.tools.ant.Project.executeTargets(Project.java:1251)
>   at org.apache.tools.ant.Main.runBuild(Main.java:811)
>   at org.apache.tools.ant.Main.startAnt(Main.java:217)
>   at org.apache.tools.ant.launch.Launcher.run(Launcher.java:280)
>   at org.apache.tools.ant.launch.Launcher.main(Launcher.java:109)
> 
> Total time: 2 minutes 7 seconds
> 
> 
> I have checked that the file is there, as is the method (though it is
> private; should that matter?).
> 
> Does anyone have any ideas/suggestions/ideas to help me fix this? Would
> greatly appreciate any comments.
> 
> Struggling,
> jellyman
> 
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Error-
> message-running-ant-clean-test-tp3995956.html
> Sent from the Lucene - General mailing list archive at Nabble.com.



RE: Just getting started with Eclipse & lucene source code -- please help

2012-07-11 Thread Uwe Schindler
Hi,

To compile your own project that uses Lucene as a library, you don't need to
import the whole Lucene source code. Just download the distribution JAR
files of Lucene by adding Maven or Ivy dependencies (lucene-core.jar) to
your project.
Downloading and installing the source distribution of Lucene is not for your
own projects; it is for developing Lucene/Solr itself.
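
With Maven, for example, this boils down to a single dependency in the
project's pom.xml (version 3.6.0 shown here; pick the release you need):

  <dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>3.6.0</version>
  </dependency>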

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: jellyman [mailto:colm_r...@hotmail.com]
> Sent: Wednesday, July 11, 2012 6:22 PM
> To: general@lucene.apache.org
> Subject: Re: Just getting started with Eclipse & lucene source code --
please
> help
> 
> Actually I'm seeing a lot of red underline errors like: "the import
> org.apache.lucene cannot be resolved"
> 
> What's going on? I pointed the project at C:\trunk\solr
> 
> Slightly worried,
> jellyman
> 
> --
> View this message in context:
http://lucene.472066.n3.nabble.com/Just-getting-
> started-with-Eclipse-lucene-source-code-please-help-tp3994398p3994423.html
> Sent from the Lucene - General mailing list archive at Nabble.com.



RE: TermFreqVector cannot be resolved to a type

2012-05-15 Thread Uwe Schindler
You should fix your classpath; it may contain different Lucene versions!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Aoi Morida [mailto:xu.xum...@gmail.com]
> Sent: Monday, May 14, 2012 10:19 PM
> To: general@lucene.apache.org
> Subject: TermFreqVector cannot be resolved to a type
> 
> Hi all,
> 
> I am using Lucene for indexing and I want to get the term frequency vector.
> 
> I use this code:
> Directory directory = FSDirectory.getDirectory(INDEX_DIRECTORY);
> IndexReader indexReader = IndexReader.open(directory);
> TermFreqVector vector = indexReader.getTermFreqVector(1, "subject");
> 
> But Eclipse always tells me that TermFreqVector cannot be resolved to a
> type.
> 
> I cannot figure out what's wrong.
> 
> Regards,
> 
> Aoi
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/TermFreqVector-cannot-be-resolved-to-a-
> type-tp3983748.html
> Sent from the Lucene - General mailing list archive at Nabble.com.



RE: CouchDB-Lucene Integration

2012-04-24 Thread Uwe Schindler
Hi,

I have no idea what you are talking about. That question is probably better
asked on the CouchDB mailing list: we only provide the search
infrastructure, and CouchDB is just one of many *users* of Lucene / Solr.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Fabian Seitz [mailto:se...@fzi.de]
> Sent: Tuesday, April 24, 2012 1:39 PM
> To: general@lucene.apache.org
> Subject: CouchDB-Lucene Integration
> 
> Hi,
> 
> 
> I installed Apache CouchDB ("version":"1.2.0") on my system (Windows XP
> 32-bit) and downloaded the Lucene files (version: 3.6.0). I tried to
> integrate Lucene with CouchDB, but I didn't manage to get it to work.
> 
> I need the Lucene extension to be able to run an existing project. I've
> already pushed the project files to CouchDB, but without Lucene it won't
> work.
> 
> Since I'm new to CouchDB and Lucene and I've already tried several things,
> I don't know how to move on. Can anyone help me with that?
> 
> Fabian



RE: Welcome Martijn van Groningen to the Lucene PMC

2012-02-08 Thread Uwe Schindler
Welcome!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Wednesday, February 08, 2012 2:17 AM
> To: Lucene mailing list; d...@lucene.apache.org
> Subject: Welcome Martijn van Groningen to the Lucene PMC
> 
> Hello,
> 
> I'm pleased to announce that the Lucene PMC has voted to add Martijn as a
> PMC Member.
> 
> Congratulations Martijn!
> 
> --
> lucidimagination.com



RE: indexSearcher using NumericRangeQuery doesn't gives result. Any help ?

2011-12-08 Thread Uwe Schindler
To get the value back you have to enable storing on the NumericField. By
default it is only indexed: new NumericField(name, Field.Store.YES, true).
You can then get the value back as a string with doc.get(name) from search
results - or, since Lucene 3.4, as a Number instance: ((NumericField)
doc.getFieldable(name)).getNumericValue().intValue()
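
Put together, a minimal sketch of both sides (Lucene 3.4 API; the field name
"age", the IndexWriter "iw" and the ScoreDoc "hit" are assumptions for
illustration):

// indexing side: store the value in addition to indexing it
Document doc = new Document();
doc.add(new NumericField("age", Field.Store.YES, true).setIntValue(42));
iw.addDocument(doc);

// search side: read the stored value back from a hit
Document result = indexSearcher.doc(hit.doc);
String asString = result.get("age");            // stored value as string
int asInt = ((NumericField) result.getFieldable("age"))
    .getNumericValue().intValue();              // Number instance, since 3.4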

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: gubs [mailto:gub...@gmail.com]
> Sent: Thursday, December 08, 2011 5:47 PM
> To: general@lucene.apache.org
> Subject: RE: indexSearcher using NumericRangeQuery doesn't gives result.
Any
> help ?
> 
> HI Uwe,
> 
> Thanks for your reply. I followed the same steps as you suggested for the
> NF and NRQ object creation. I can still see the count of the hits in docs,
> but I am not able to fetch the value successfully and print it. Do you see
> any other way to get the value from the docs?
> 
> Gubs
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/indexSearcher-using-
> NumericRangeQuery-doesn-t-gives-result-Any-help-tp3569338p3570762.html
> Sent from the Lucene - General mailing list archive at Nabble.com.



RE: indexSearcher using NumericRangeQuery doesn't gives result. Any help ?

2011-12-08 Thread Uwe Schindler
Hi,

> I am new to the Lucene library. I need to write some numeric fields into
> the doc using IndexWriter, and using IndexSearcher I need to search for a
> value range (age > 40), for example.
> 
> IndexWriter Snippet below. The value is available in the doc properly.
> (Document 0>>)
> Document doc = new Document();
>   NumericField numericField = new NumericField("title",
>   Integer.parseInt(value));
>   numericField.setIntValue(Integer.parseInt(value));
>   doc.add(numericField);
>   iw.addDocument(doc);
>iw.optimize();
>   iw.close();

This code is wrong. The 2nd value passed to NumericField's ctor is a
configuration constant (the precisionStep), not the value. Just leave the
default, so use 'new NumericField("title")' - otherwise RTFM.

> IndexSearcher snippet below. IndexSearcher prints the hits length
> correctly. That said, the result is not getting printed from the doc. Any
> help? I spent so much time and failed to find the cause.
> 
> Query queryParser = NumericRangeQuery.newIntRange("title", 40, 6000, true,
> true);

This one is correct and uses the default precisionStep. Because of the
mismatch between the precision steps on NumericField and NumericRangeQuery,
no results are returned. For simple use cases it's better to use the default
precisionStep (so don't pass one to either NRQ or NF).
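
A minimal sketch of both sides with matching defaults (using "value" and
"iw" from your snippet):

// indexing: default precisionStep, value set via setIntValue()
Document doc = new Document();
doc.add(new NumericField("title").setIntValue(Integer.parseInt(value)));
iw.addDocument(doc);

// searching: default precisionStep again, so the indexed trie terms line up
Query query = NumericRangeQuery.newIntRange("title", 40, 6000, true, true);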

>   // 3. Search
>   int hitsPerPage = 10;
>   IndexSearcher indexSearcher = new IndexSearcher(index,
true);
>   TopScoreDocCollector collector =
> TopScoreDocCollector.create(
>   hitsPerPage, true);
>   indexSearcher.search(queryParser, collector);
> 
>   ScoreDoc[] hits = collector.topDocs().scoreDocs;
> 
>   // 4. Display result
>   log.info("List of docs found : " + hits.length);
>   for (int i = 0; i < hits.length; i++) {
>   int docId = hits[i].doc;
>   System.out.println(docId);
>   Document doc = indexSearcher.doc(docId);
>   log.info(i + 1 + " . " + doc.get("title"));
>   }


Uwe



RE: MaxFieldLength in Lucene 3.4

2011-12-01 Thread Uwe Schindler
Hi,

This option is a safety net for the case where you cannot trust your input
data. Maybe you suddenly tokenize a binary file and produce millions of
random tokens; with the limit in place, only the first 10,000 or so get
indexed. If your input data is trusted and text-based (e.g. read from
elements in XML files, databases,...), then you don't need this filter.

> Maybe I am too far behind the times.  I was updating some pretty old
stuff.
> I think it was written originally with Lucene 1.4.  I seem to recall that
Lucene
> v1.x had analyzers where the default was "limited", because I learned
pretty
> early that I had to set that option during indexing.  Perhaps at some
point the

The limiting option was almost always on IndexWriter, but it defaulted to
10,000 tokens from the beginning. The analyzers had nothing to do with this
option.

The recent change removed the token counting from IndexWriter (it only made
the already complicated code more unreadable) and moved it to a simple
TokenFilter, because it's much more reasonable to do this during analysis.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Joe MA [mailto:mrj...@comcast.net]
> Sent: Thursday, December 01, 2011 9:24 AM
> To: general@lucene.apache.org
> Subject: RE: MaxFieldLength in Lucene 3.4
> 
> 
> > "of course all other analyzers are unlimited"
> 
> Maybe I am too far behind the times.  I was updating some pretty old
stuff.
> I think it was written originally with Lucene 1.4.  I seem to recall that
Lucene
> v1.x had analyzers where the default was "limited", because I learned
pretty
> early that I had to set that option during indexing.  Perhaps at some
point the
> switch was made to default unlimited.  Thanks your answer clears it up.
> 
> One question - why even have this option now? Are things more efficient
with a
> limited token field?  If you know your data is 'bounded', should you
always limit
> the token field to improve performance?
> 
> Thanks!
> 
> 
> -Original Message-
> From: Uwe Schindler [mailto:u...@thetaphi.de]
> Sent: Monday, November 28, 2011 2:41 AM
> To: general@lucene.apache.org
> Subject: RE: MaxFieldLength in Lucene 3.4
> 
> Hi,
> 
> The move is simple - LimitTokenCountAnalyzer is just a wrapper around any
> other Analyzer, so I don't really understand your question - of course all
other
> analyzers are unlimited. If you have myAnalyzer with myMaxFieldLengthValue
> used before, you can change your code as follows:
> 
> Before:
> new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_34,
> myAnalyzer).setFoo().setBar().setMaxFieldLength(myMaxFieldLengthValue));
> 
> After:
> new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_34, new
> LimitTokenCountAnalyzer(myAnalyzer,
> myMaxFieldLengthValue)).setFoo().setBar());
> 
> You only have to do this on the indexing side; on the query side
> (QueryParser) just use myAnalyzer without wrapping. With the new code, the
> responsibility for cutting off the field after a specific number of tokens
> was moved out of the indexing code in Lucene. This is now just an analysis
> feature, not an indexing feature anymore.
> 
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
> 
> > -Original Message-
> > From: Joe MA [mailto:mrj...@comcast.net]
> > Sent: Monday, November 28, 2011 8:09 AM
> > To: general@lucene.apache.org
> > Subject: MaxFieldLength in Lucene 3.4
> >
> > While upgrading to Lucene 3.4, I noticed the MaxFieldLength values on
the
> > indexers are deprecated.   There appears to be a LimitTokenCountAnalyzer
> > that limits the tokens - so does that mean the default for all other
> analyzers is
> > unlimited?
> >
> > Thanks in advance -
> > JM




RE: MaxFieldLength in Lucene 3.4

2011-11-27 Thread Uwe Schindler
Hi,

The move is simple - LimitTokenCountAnalyzer is just a wrapper around any
other Analyzer, so I don't really understand your question - of course all
other analyzers are unlimited. If you have myAnalyzer with
myMaxFieldLengthValue used before, you can change your code as follows:
 
Before:
new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_34,
myAnalyzer).setFoo().setBar().setMaxFieldLength(myMaxFieldLengthValue));

After:
new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_34, new
LimitTokenCountAnalyzer(myAnalyzer,
myMaxFieldLengthValue)).setFoo().setBar());

You only have to do this on the indexing side; on the query side
(QueryParser) just use myAnalyzer without wrapping. With the new code, the
responsibility for cutting off the field after a specific number of tokens
was moved out of the indexing code in Lucene. This is now just an analysis
feature, not an indexing feature anymore.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Joe MA [mailto:mrj...@comcast.net]
> Sent: Monday, November 28, 2011 8:09 AM
> To: general@lucene.apache.org
> Subject: MaxFieldLength in Lucene 3.4
> 
> While upgrading to Lucene 3.4, I noticed the MaxFieldLength values on the
> indexers are deprecated.   There appears to be a LimitTokenCountAnalyzer
> that limits the tokens - so does that mean the default for all other
analyzers is
> unlimited?
> 
> Thanks in advance -
> JM




RE: Index Corruption in Lucene 2.9.3

2011-11-14 Thread Uwe Schindler
Hi,

In general it's a bad idea to use Lucene on network-mounted drives. E.g.,
NFS is heavily broken with the file locking used by Lucene (NIO does not
work at all, and file-based lock support fails because directory updates may
not be visible at all times, or become visible before files are flushed -
happens-before is violated).

This can lead to index corruption; you should use local disks, especially as
they are much faster.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Nishesh [mailto:nishesh.gu...@emc.com]
> Sent: Monday, November 14, 2011 8:47 PM
> To: general@lucene.apache.org
> Subject: Index Corruption in Lucene 2.9.3
> 
> We are seeing index corruption very often with version 2.9.3. Our indexing
> process is on Linux (CentOS 5). The index is created on a mounted drive
> which is a shared drive from a Windows 2008 server running in a VM. We
> generally see index corruption in merge or optimize after indexing runs
> continuously for 6-7 hrs with the index size reaching around 7-8GB. To
> reproduce the corruptions sooner, I have placed the merge (maybeMerge)
> call immediately after addIndex is called. We have a final index which is
> on the mounted drive; we always add documents to a local intermediate
> index and then call add index and merge to the final index.
> 
> The exception that I get -
> 
> 2011-11-11 15:19:16,929 [MC:10.10.176.148-1321045422606-204
> FS:emag_393219_0] ERROR indexer - MergeWithFiler: MC: 393219, shard 0, guid
> 10.10.176.148-1321045422606-204: Error in addIndex()/kazMaybeMerge():
> /sideline/fs_393219/cas/search/index_0/primary, java.io.IOException:
> background merge hit exception: _1t:c262436
> _10.10.176.148-1321045422606-204_0:cx4000 into _1u [optimize]
> [mergeDocStores] % STACK:
>  org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2932)
>  org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2867)
>  org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2837)
>  org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3652)
>  com.kazeon.search.indexingengine.context.MergerContext.mergeWithFilerIndex(MergerContext.java:1004)
>  com.kazeon.search.indexingengine.context.MergerContext.mergeWithFilerIndex(MergerContext.java:1094)
>  com.kazeon.search.indexingengine.context.MergerContext.mergeWithFiler(MergerContext.java:1140)
>  com.kazeon.search.indexingengine.statemachine.modifiers.merger.LocalIndexOptimizeAndCompressModifier.modifyStateAux(LocalIndexOptimizeAndCompressModifier.java:375)
>  com.kazeon.search.indexingengine.statemachine.modifiers.merger.LocalIndexOptimizeAndCompressModifier.mergeAllICs(LocalIndexOptimizeAndCompressModifier.java:181)
>  com.kazeon.search.indexingengine.statemachine.modifiers.merger.LocalIndexOptimizeAndCompressModifier.modifyState(LocalIndexOptimizeAndCompressModifier.java:106)
>  com.kazeon.util.scoreboard.WorkerThread.run(WorkerThread.java:31)
> 
> 
> CheckIndex command shows the following output -
> 
> NOTE: testing will be more thorough if you run java with
> '-ea:org.apache.lucene...', so assertions are enabled
> 
> Opening index @ .
> 
> Segments file=segments_23 numSegments=1 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
>   1 of 1: name=_1t docCount=262436
> compound=true
> hasProx=true
> numFiles=1
> size (MB)=937.835
> diagnostics = {optimize=true, mergeFactor=2, os.version=2.6.18-92.1.18.el5,
> os=Linux, mergeDocStores=true, lucene.version=2.9.3-dev, source=merge,
> os.arch=i386, java.version=1.6.0_02, java.vendor=Sun Microsystems Inc.}
> no deletions
> test: open reader.OK
> test: fields..OK [79 fields]
> test: field norms.OK [79 fields]
> test: terms, freq, prox...ERROR [term fulltext:creativecommons: doc 262603 >= maxDoc 262436]
> java.lang.RuntimeException: term fulltext:creativecommons: doc 262603 >= maxDoc 262436
> at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:646)
> at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:530)
> at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:903)
> test: stored fields...OK [524872 total field count; avg 2 fields per doc]
> test: term vectorsOK [0 total vector count; avg 0 term/freq vector fields per doc]
> FAILED
> WARNING: fixIndex() would remove reference to this segment; full exception:
> java.lang.RuntimeException: Term Index test failed
> at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:543)
> at org.apache.lucene.index.CheckIndex.main(Chec

Java 7u1 fixes index corruption and crash bugs in Apache Lucene Core and Apache Solr

2011-10-26 Thread Uwe Schindler
Hi users of Apache Lucene Core and Apache Solr,

Oracle released Java 7u1 [1] on October 19. According to the release notes,
and to tests done by the Lucene committers, all bugs reported on July 28 are
fixed in this release, so code using the Porter stemmer no longer crashes
with SIGSEGV. We could also no longer reproduce any index corruption, so it
is safe to use Java 7u1 with Lucene Core and Solr.

On the same day, Oracle released Java 6u29 [2] fixing the same problems
occurring with Java 6, if the JVM switches -XX:+AggressiveOpts or
-XX:+OptimizeStringConcat were used. Of course, you should not use
experimental JVM options like -XX:+AggressiveOpts in production
environments! We recommend everybody to upgrade to this latest version 6u29.

In case you upgrade to Java 7, remember that you may have to reindex, as the
unicode version shipped with Java 7 changed and tokenization behaves
differently (e.g. lowercasing). For more information, read
JRE_VERSION_MIGRATION.txt in your distribution package!

On behalf of the Apache Lucene/Solr committers,
Uwe Schindler

[1] http://www.oracle.com/technetwork/java/javase/7u1-relnotes-507962.html
[2] http://www.oracle.com/technetwork/java/javase/6u29-relnotes-507960.html

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/




RE: Question about file format

2011-10-04 Thread Uwe Schindler
It is the number of encoded bytes in all recent Lucene versions (2.4 and
later); before 2.4 it was the character count.
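
In other words, a minimal sketch of reading such a string with the Lucene
3.x API itself (the IndexInput "in" is an assumption; your Python reader
would do the equivalent):

int numBytes = in.readVInt();                 // number of encoded bytes, not chars
byte[] bytes = new byte[numBytes];
in.readBytes(bytes, 0, numBytes);
String value = new String(bytes, java.nio.charset.Charset.forName("UTF-8"));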

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Lorenzo Luengo [mailto:lolue...@gmail.com]
> Sent: Tuesday, October 04, 2011 7:23 PM
> To: general@lucene.apache.org
> Subject: Question about file format
> 
> Hi all,
> 
> I'm trying to make my own reader for lucene files, in pure python (i
haven't
> found a suitable library for windows x64). And while reading docs, a
question
> arises.
> 
> In http://lucene.apache.org/java/3_4_0/fileformats.html#String it says
that the
> string is composed of an VInt and a sequence of modified UTF-8 encoded
chars.
> My question is: That VInt is the length of the string before encoding or
is the
> number of encoded bytes?
> 
> Regards.
> 
> --
> Lorenzo Luengo C.
> Ingeniero Civil Electrónico
> Cel: 98270385



RE: Having trouble getting QueryParser to work...

2011-09-21 Thread Uwe Schindler
Hi,

I would also use NumericField + NumericRangeQuery for date fields. There are
several possibilities to incorporate those; the easiest is to use a LONG
numeric field and encode Date.getTime() [milliseconds since
1970-01-01T00:00:00.000] into it. The flexible QueryParser in contrib can
directly use those encoded fields and parse the entered query string into
dates (with the corresponding configuration). See the testcase for a usage
example.
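
A minimal sketch of the long-encoding approach (Lucene 3.x; the field name
"day" is from your snippet, "from" and "to" are assumed java.util.Date
bounds):

// indexing: encode the date as milliseconds since epoch
Document doc = new Document();
doc.add(new NumericField("day", Field.Store.NO, true).setLongValue(day.getTime()));

// searching: a date range becomes a plain long range
Query range = NumericRangeQuery.newLongRange("day",
    from.getTime(), to.getTime(), true, true);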

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Michael Remijan [mailto:mjremi...@yahoo.com]
> Sent: Wednesday, September 21, 2011 7:34 PM
> To: general@lucene.apache.org
> Subject: Re: Having trouble getting QueryParser to work...
> 
> OK, that worked great.
> 
> Now, when I am indexing a date, I see that DateField is deprecated so I am
> using just a Field for this...
> 
> 
>                 new Field(
>                     "day",                                       // the name of the field
>                     DateTools.dateToString(day, Resolution.DAY), // the string to process
>                     Field.Store.NO,                              // whether the value should be stored in the index
>                     Field.Index.ANALYZED                         // whether the field should be indexed and, if so, tokenized before indexing
>                 )
> 
> so when I'm searching a date field range, I would use a TermRangeQuery
> correct?
> 
> 
> 
> 
> From: Uwe Schindler 
> To: general@lucene.apache.org; 'Michael Remijan' 
> Sent: Wednesday, September 21, 2011 11:24 AM
> Subject: RE: Having trouble getting QueryParser to work...
> 
> Lucene 3.4 has NumericField support in its flexible QueryParser
> (contrib/queryparser). The core QueryParser has no idea about numeric
> fields and always produces TermQuery/TermRangeQuery.
> 
> To your code: in general you should always use NumericRangeQuery, not
> TermQuery with NumericUtils (which is an internal expert class), on
> numeric fields. Just use upper = lower; speed is the same and it does not
> wrongly rank the results.
> 
> Uwe
> 
> -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
> 
> > -Original Message-
> > From: Michael Remijan [mailto:mjremi...@yahoo.com]
> > Sent: Wednesday, September 21, 2011 6:20 PM
> > To: general@lucene.apache.org
> > Subject: Having trouble getting QueryParser to work...
> >
> > Sorry if this is an obvious question.
> >
> > I have the following BooleanQuery set up which works fine, and when I
> > say "fine" I mean if I change the search values to values which I know
> > are not
> in the
> > index then the search returns no results. So this works OK.
> >
> >         Query
> >              dinnerQuery     = new TermQuery(new Term("entry",
> > "dinner"))
> >                ,accountIdQuery = new TermQuery(new Term("accountid",
> > NumericUtils.intToPrefixCoded(1)))
> >         ;
> >         BooleanQuery
> >             query = new BooleanQuery();
> >               query.add(accountIdQuery, Occur.MUST);
> >               query.add(dinnerQuery, Occur.MUST);
> >
> > When I run the above code I get 1 result I am expecting:
> >
> > Found 1 hits
> > HIT #1
> >   accountid = 1
> >   journalid = 1
> >   id = 306
> >
> >
> >
> > Now I've been trying to convert this to use a QueryParser expression
> > but I
> have
> > not had any luck.  Here is my first attempt.
> >
> >         String str =
> >         "accountid:1 AND entry:dinner"
> >         ;
> >         Query query
> >             = parser.parse(str);
> >
> > When I execute this, I get no results:
> >
> > Found 0 hits
> >
> > So I changed the query to use NumericUtils thinking that might be the
> > problem...
> >
> >         String str =
> >             "accountid:" +NumericUtils.intToPrefixCoded(1)+ " AND
> > entry:dinner"
> >         ;
> >         Query query
> >             = parser.parse(str);
> >
> > When I execute this, I thought I got the results I was looking for
> > because
> the
> > query found the 1 hit it was suppose to, however, during testing I
> > found I
> could
> > put any value i want in for accountid and the search will always
> > return
> the 1
> > hit.
> >
> > So I'm not sure what I'm doing wrong and why QueryParser is giving the
> results
> > it is.



RE: Having trouble getting QueryParser to work...

2011-09-21 Thread Uwe Schindler
Lucene 3.4 has NumericField support in its flexible QueryParser
(contrib/queryparser). The core QueryParser has no idea about numeric fields
and always produces TermQuery/TermRangeQuery.

To your code: in general you should always use NumericRangeQuery, not
TermQuery with NumericUtils (which is an internal expert class), on numeric
fields. Just use upper = lower; speed is the same and it does not wrongly
rank the results.
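
A minimal sketch for your accountid + entry example (this assumes accountid
was indexed as a NumericField with the default precisionStep):

// numeric "equality" expressed as a range with upper == lower
BooleanQuery query = new BooleanQuery();
query.add(NumericRangeQuery.newIntRange("accountid", 1, 1, true, true),
    BooleanClause.Occur.MUST);
query.add(new TermQuery(new Term("entry", "dinner")), BooleanClause.Occur.MUST);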

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Michael Remijan [mailto:mjremi...@yahoo.com]
> Sent: Wednesday, September 21, 2011 6:20 PM
> To: general@lucene.apache.org
> Subject: Having trouble getting QueryParser to work...
> 
> Sorry if this is an obvious question.
> 
> I have the following BooleanQuery set up which works fine, and when I say
> "fine" I mean if I change the search values to values which I know are not
in the
> index then the search returns no results. So this works OK.
> 
>         Query
>              dinnerQuery     = new TermQuery(new Term("entry", "dinner"))
>                ,accountIdQuery = new TermQuery(new Term("accountid",
> NumericUtils.intToPrefixCoded(1)))
>         ;
>         BooleanQuery
>             query = new BooleanQuery();
>               query.add(accountIdQuery, Occur.MUST);
>               query.add(dinnerQuery, Occur.MUST);
> 
> When I run the above code I get 1 result I am expecting:
> 
> Found 1 hits
> HIT #1
>   accountid = 1
>   journalid = 1
>   id = 306
> 
> 
> 
> Now I've been trying to convert this to use a QueryParser expression but I
have
> not had any luck.  Here is my first attempt.
> 
>         String str =
>         "accountid:1 AND entry:dinner"
>         ;
>         Query query
>             = parser.parse(str);
> 
> When I execute this, I get no results:
> 
> Found 0 hits
> 
> So I changed the query to use NumericUtils thinking that might be the
> problem...
> 
>         String str =
>             "accountid:" +NumericUtils.intToPrefixCoded(1)+ " AND
> entry:dinner"
>         ;
>         Query query
>             = parser.parse(str);
> 
> When I execute this, I thought I got the results I was looking for because
the
> query found the 1 hit it was suppose to, however, during testing I found I
could
> put any value i want in for accountid and the search will always return
the 1
> hit.
> 
> So I'm not sure what I'm doing wrong and why QueryParser is giving the
results
> it is.



RE: Upgrade solr

2011-09-09 Thread Uwe Schindler
Hi SN,

The latest stable Solr release is Solr 3.3, with 3.4 coming this month. Solr
4.0 and Lucene 4.0 are both not yet stable, as we are still changing APIs
and optimizing things like the index format and algorithms. You can use a
Solr 4.0 snapshot if you really need the new features, and report back what
you find out. But you need to know that we might change index formats from
one day to the next, so after an upgrade to a later trunk version your
indexes may no longer be readable and may throw scary Exceptions on opening.
Upgrading 3.x index formats to 4.0 unstable is always possible, just not
4.0-old to 4.0-new, so be prepared to reindex your data after an upgrade.
Apart from the index format churn, Lucene/Solr 4.0 seems quite stable in
operation.

The Solr version Lucid Imagination ships with LucidWorks also uses the
"unstable" 4.0 version internally, but they guarantee that you can upgrade
between different LucidWorks versions (they provide an index upgrade tool).

About the Geo features: I am not familiar with the current status of Solr's
Geo support. Maybe somebody else can answer whether the mentioned query type
works with the stable version 3.x.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: solrnovice [mailto:manisha...@yahoo.com]
> Sent: Friday, September 09, 2011 5:21 AM
> To: general@lucene.apache.org
> Subject: RE: Upgrade solr
> 
> hi Uwe, I haven't heard from Lucid, so can you please let me know what is
> the latest stable version of SOLR? On Apache's site it's mentioned that
> SOLR 4.0 is not ready, but the nightly build is available (though I don't
> know how stable that version is). I want to make geodist work, but with a
> stable release. It looks like geodist doesn't work in prior releases of
> Solr. When I say it doesn't work, I mean returning it as a pseudo column,
> even if I do not perform a longitude / latitude query.
> 
> I had the geodist returned when I did a City search using a SOLR 4
> revision from the nightly build. What is the latest release candidate for
> SOLR? Can you please point me to the right download site?
> 
> 
> thanks
> SN
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Upgrade-
> solr-tp3311066p3321837.html
> Sent from the Lucene - General mailing list archive at Nabble.com.



RE: Upgrade solr

2011-09-05 Thread Uwe Schindler
Hi,

Please ask this question to Lucid Imagination support staff. The Lucene/Solr
community is not responsible for releases of LucidWorks Enterprise and we
don't know which versions of Lucene/Solr they use.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: solrnovice [mailto:manisha...@yahoo.com]
> Sent: Monday, September 05, 2011 4:40 PM
> To: general@lucene.apache.org
> Subject: Upgrade solr
> 
> hi, We are trying to use LucidImagination (lucidimagination.com) for our
> search; it comes with a version of SOLR. When I use geodist() or pseudo
> columns, they don't seem to be working.
> 
> Now my question is: can I upgrade just the Solr under a Lucid Imagination
> install? Has anybody tried that? If so, can you please share some
> information. I downloaded SOLR 4 from the nightly build, but the
> LucidImagination schema.xml doesn't work in that Solr, as it's closely
> tied to Lucid. I had it working after I removed references to Lucid's
> classes...etc. But I lose some of Lucid's datatypes like "comma-separated"
> analyzers...etc. If anybody has tried upgrading the Solr to the latest
> version, please share your thoughts.
> 
> 
> thanks
> sN
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Upgrade-
> solr-tp3311066p3311066.html
> Sent from the Lucene - General mailing list archive at Nabble.com.



[WARNING] Index corruption and crashes in Apache Lucene Core / Apache Solr with Java 7

2011-07-28 Thread Uwe Schindler
Hello Apache Lucene & Apache Solr users,
Hello users of other Java-based Apache projects,

Oracle released Java 7 today. Unfortunately it contains hotspot compiler
optimizations, which miscompile some loops. This can affect code of several
Apache projects. Sometimes JVMs only crash, but in several cases, results
calculated can be incorrect, leading to bugs in applications (see Hotspot
bugs 7070134 [1], 7044738 [2], 7068051 [3]).

Apache Lucene Core and Apache Solr are two Apache projects, which are
affected by these bugs, namely all versions released until today. Solr users
with the default configuration will have Java crashing with SIGSEGV as soon
as they start to index documents, as one affected part is the well-known
Porter stemmer (see LUCENE-3335 [4]). Other loops in Lucene may be
miscompiled, too, leading to index corruption (especially on Lucene trunk
with pulsing codec; other loops may be affected, too - LUCENE-3346 [5]).

These problems were detected only 5 days before the official Java 7 release,
so Oracle had no time to fix those bugs, which also affect many more
applications. In response to our questions, they proposed to include the
fixes in service release u2 (possibly already in service release u1, see [6]).
This means you cannot use Apache Lucene/Solr with Java 7 releases before
Update 2! If you do, please don't open bug reports, it is not the
committers' fault! At least disable loop optimizations using the
-XX:-UseLoopPredicate JVM option to not risk index corruptions.

Please note: Also Java 6 users are affected, if they use one of those JVM
options, which are not enabled by default: -XX:+OptimizeStringConcat or
-XX:+AggressiveOpts

It is strongly recommended not to use any hotspot optimization switches in
any Java version without extensive testing!

In case you upgrade to Java 7, remember that you may have to reindex, as the
unicode version shipped with Java 7 changed and tokenization behaves
differently (e.g. lowercasing). For more information, read
JRE_VERSION_MIGRATION.txt in your distribution package!

On behalf of the Lucene project,
Uwe

[1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134
[2] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7044738
[3] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7068051
[4] https://issues.apache.org/jira/browse/LUCENE-3335
[5] https://issues.apache.org/jira/browse/LUCENE-3346
[6] http://s.apache.org/StQ

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/




RE: Experiencing High CPU load with 3.3

2011-07-23 Thread Uwe Schindler
Hi Alan,

We don't see such problems here on our machines, so first we need to know a
few things:

Do you open a new IndexReader before each search and close it afterwards? In
normal Lucene usage you keep the IndexReaders open all the time and only
reopen them when your underlying index has changed. If you keep the
IndexReader open, such close() calls with doPrivileged will not happen so
often.

The problem in Lucene 3.3 is that the default on 64-bit Linux is to use
memory mapping, which is implemented by the operating system like a swap
file mapped into the virtual address space. This makes searches very fast,
as the whole index is effectively in cache RAM and can be accessed like
memory swapped out into a swap file. But this memory mapping is much more
expensive than simply opening a file and closing it after usage. If you
reopen the IndexReader all the time, this expensive mapping cost is what you
see all the time.

So I would suggest you check:
a) Only open/close IndexReaders when you really need to. You can use one
IndexReader and even handle multiple parallel searches on it (it's thread
safe).
b) If you need to reopen quite often, don't use the default
FSDirectory.open(), but instead choose a Directory implementation yourself.
To get the behavior of Lucene 3.2, use new NIOFSDirectory(...). But on
64-bit Linux, searches will be slower than with MMap (if you keep readers
open). See the sketch after this list.
c) Is your JVM somehow locked down in terms of security? These doPrivileged
mappings are only effective when you have a web application container that
restricts web applications to run with low privileges (something like
Windows Vista/7 UAC). In this case, doPrivileged may be expensive and is not
simply a no-op. For Lucene/Solr it's always recommended not to run it in
restricted Java environments. Maybe OpenBD does this - I have no idea.
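
A minimal sketch of a) and b) together (Lucene 3.3 API; the index path is a
placeholder):

// choose the Directory implementation explicitly instead of FSDirectory.open()
Directory dir = new NIOFSDirectory(new File("/path/to/index"));

// open one read-only IndexReader and share it across searches (it's thread safe)
IndexReader reader = IndexReader.open(dir, true);

// later, when the index may have changed, reopen instead of open/close:
IndexReader newReader = reader.reopen();
if (newReader != reader) {
  reader.close();   // releases the old reader (and its mappings)
  reader = newReader;
}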

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Alan Williamson (aw2.0 cloud experts) [mailto:a...@aw20.co.uk]
> Sent: Saturday, July 23, 2011 9:42 AM
> To: general@lucene.apache.org
> Subject: Experiencing High CPU load with 3.3
> 
> Good day all.
> 
> We area a long term user of Lucene with it powering our search engine
inside
> of the Java CFML runtime OpenBD (http://openbd.org).  We've had no
> complaints from it whatsoever.  Until now that is.
> 
> Since moving to 3.3 we have been experiencing extremely high CPU load
> when searching against an index.  The index is only 19MB in size with all
of
> 40,000 items in it.
> 
> We have produced some very useful stack traces with CPU % load times.
> Some other data points.  This is not happening to just one machine, this
is
> happening to all the machines in the web farm.  The actual traffic on the
> machine is low, the stack traces i give you, the only threads that are
actually
> doing anything, are the ones were Lucene has gone into a tizzy.  The
> machines have all 3.5GB of memory.  Running Java 1.6.23
> 
> Thanks
> 
> alan
> 
> --- Sample #1 -
> 
> "qtp274064735-1462" prio=10 tid=0x2aaacaa62800 nid=19115 runnable
> [0x4174d000]
> java.lang.Thread.State: RUNNABLE
>   at java.security.AccessController.doPrivileged(Native Method)
>   at
> org.apache.lucene.store.MMapDirectory.cleanMapping(MMapDirectory.jav
> a:158)
>   at
> org.apache.lucene.store.MMapDirectory$MMapIndexInput.close(MMapDir
> ectory.java:383)
>   at
> org.apache.lucene.index.CompoundFileReader.close(CompoundFileReader.j
> ava:137)
>   - locked<0x000795b86520>  (a
> org.apache.lucene.index.CompoundFileReader)
>   at
> org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:302)
>   at
> org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:359)
>   at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfo
> s.java:750)
>   at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfo
> s.java:589)
>   at
> org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:355)
>   at
> org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.j
> ava:476)
>   at
>
org.apache.lucene.index.DirectoryReader.initialize(DirectoryReader.java:347
> )
>   at
> org.apache.lucene.index.DirectoryReader.(DirectoryReader.java:130)
>   at
> org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:8
> 3)
>   at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfo
> s.java:750)
>   at
> org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
>   at
> org.apache.lucene.index.IndexReader.open(IndexReader.java:428)
>   at

RE: Special Board Report for May 2011

2011-05-12 Thread Uwe Schindler
> > I think the current state of logging only #lucene-dev is good.
> 
> Yeah, except no one is on it other than a few people even though many of
> them (committers that is) are on #lucene

I haven't seen any technical discussions on #lucene in a while. I was
discussing with Simon and Mike on #lucene-dev over the past days and had
some work going on for the IndexUpgrader tool and MergePolicies. The
discussions were even linked from JIRA issues.

> >  I go to #lucene-dev now. I think only IRC channel(s) that are
Lucene/Solr
> internal development in nature need to be logged -- and that's just
#lucene-
> dev. So just because you have observed many developers are on #lucene
> instead of #lucene-dev doesn't indicate a problem, so long as no design
> decisions for Lucene/Solr take place on #lucene or #solr.  #lucene and
#solr is
> where users get to ask questions, much like how it is on the user mailing
lists.
> So *if* (I don't know if it happens) internal Lucene / Solr design
decisions are
> taking place on #lucene or #solr then obviously that must stop. I'd rather
> these channels not get logged so that we can have an expectation of a
single
> place for these discussions on IRC and have that place be clear of user
> support questions.
> >
> > RE refactoring / modularization, it's good to finally see a sense of
> agreement on how to move forward.

Yeah, that's ok; I have nothing to add to that (and don't want to anymore,
it's a soap opera).

> >> 3. Put in the automated patch checking system that Hadoop uses.
> Volunteers?  Perhaps we can knock this out at Lucene Revolution?

Who logs the stuff there? In my opinion, a meeting at Lucene Revolution is
also "private" - or is this somehow different? What's the difference from a
private talk between two or three people in a bar at Lucene Revolution with
nobody writing down a log? A log can also be written if somebody talks with
me in a private Skype chat!

Uwe



RE: Querying in Solr

2011-05-10 Thread Uwe Schindler
Hi,

http://wiki.apache.org/solr/Solrj

This is the Java client that talks either to an embedded Solr server or via
HTTP to a separate installation.
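
A minimal sketch of the HTTP variant (SolrJ 3.x API; the URL and query
string are placeholders):

// talk to a standalone Solr instance over HTTP
SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
SolrQuery query = new SolrQuery("name:lucene");
QueryResponse response = server.query(query);
System.out.println(response.getResults().getNumFound() + " hits");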

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Romi [mailto:romijain3...@gmail.com]
> Sent: Tuesday, May 10, 2011 11:55 AM
> To: general@lucene.apache.org
> Subject: Re: Querying in Solr
> 
> I have to use Solr with Java
> 
> -
> Romi
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Querying-in-Solr-
> tp2922058p2922495.html
> Sent from the Lucene - General mailing list archive at Nabble.com.



RE: [VOTE] Create Solr TLP

2011-04-26 Thread Uwe Schindler
Hi,

Strong -1 to unmerge.

Many of you know that I was originally against the merge, but once I saw the
possibilities (especially refactoring the analysis stuff), I started to
actively support it, too. Together with lots of other previously-Lucene-only
committers, I helped to move the svn repositories together and rewrite parts
of the build system. After that we started to move analyzers to one place,
added factories in Solr for *all* available Lucene analyzers, and vice versa
opened Solr analyzers to Lucene users. We removed lots of deprecated code
usage (which made Solr move from Lucene 2.9 to 3.0). This was especially the
work of the Lucene committers who originally developed the new analysis API.
Solr did not have many active developers at this time, so help from the
Lucene side was welcome. So at that time, the merge helped both projects.

Problems started when some of us suggested to "remove" features from Solr
and move them to Lucene Core: faceting (I mentioned that first in public at
a conference, which some people disagreed with), function queries, schema
support, clustering, dismax. My point of view, as originally only a "Lucene
committer", is that Solr was and still is somehow dominated by one person
who is afraid of losing functionality in Solr that was originally developed
by him, because this could reduce the power of Solr on the market (yes,
there is also a company behind it that mainly wants to sell consulting to
Solr users [as this is of course easier to do], but that's just a side
note).

I think instead of splitting again, the Lucene TLP should consider improving
communication between the committers, allow different opinions on Solr's
future development, and maybe vote in a new PMC (as the current PMC was
simply merged from Solr and Lucene, so conflicts are preprogrammed).

If the merged Lucene+Solr is not what the dominating person wants to have,
he is free to fork Solr away from Apache (yes, it's open source, and you can
sell/provide a forked version to customers with super-duper features that
separate it from Lucene, but I think this is already done - LW-Enterprise).
But if most committers here want to help bring both Lucene+Solr to the top
of search engines, they are free to do it at the ASF with discussion and
also lots of code refactoring - we are using SVN, so we always have a record
of what was done. Reverting or not reverting is only political, nothing
technical. And disagreement is also valid in an open source project, but
disagreeing people should sometimes revise their opinions - this applies to
a few more people here; I am also not always the best discussion partner
(police is the executive... *g*).

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
> Seeley
> Sent: Tuesday, April 26, 2011 8:50 PM
> To: general@lucene.apache.org
> Subject: [VOTE] Create Solr TLP
> 
> A single merged project works only when people are relatively on the same
> page, and when people feel it's mutually beneficial.  Recent events make
it
> clear that that is no longer the case.
> 
> Improvements to Solr have been recently blocked and reverted on the
> grounds that the new functionality was not immediately available to
non-Solr
> users.
> This was obviously never part of the original idea (well actually - it was
> considered but rejected as too onerous).  But the past doesn't matter as
> much as the present - about how people chose to act and interpret things
> today.
> 
> https://issues.apache.org/jira/browse/SOLR-2272
> http://markmail.org/message/unrvjfudcbgqatsy
> 
> Some people warned us against merging at the start, and I guess it turns
out
> they were right.
> 
> I no longer feel it's in Solr's best interests to remain under the same
PMC as
> Lucene-Java, and I know some other committers who have said they feel like
> Lucene got the short end of the stick.  But rather than arguing about
who's
> right (maybe both?) since enough of us feel it's no longer mutually
beneficial,
> we should stop fighting and just go our separate ways.
> 
> Please VOTE to create a new Apache Solr TLP.
> 
> Here's my +1
> 
> -Yonik



RE: Unable to download Snowball !

2011-04-17 Thread Uwe Schindler
Hi,

Lucene does not contain a class sbStemmer. You have to choose one of the
following:
http://lucene.apache.org/java/3_1_0/api/contrib-analyzers/org/tartarus/snowball/ext/package-summary.html

Your Snowball configuration seems to be wrong, so it tries to load a class
that does not exist (sbStemmer).
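
A minimal sketch with one of the real stemmer classes, via the Snowball
analyzer from contrib (the language name "English" is just an example; it
maps to org.tartarus.snowball.ext.EnglishStemmer):

Analyzer analyzer = new SnowballAnalyzer(Version.LUCENE_31, "English");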

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: Neil Ghosh [mailto:neil.gh...@gmail.com]
> Sent: Sunday, April 17, 2011 3:27 PM
> To: general@lucene.apache.org; Jakarta General List
> Subject: Re: Unable to download Snowball !
> 
> I added the library in contrib directory
> (lucene-3.1.0/contrib/analyzers/common/lucene-analyzers-3.1.0.jar)
> 
> But getting the following runtime exception
> 
> Exception in thread "main" java.lang.RuntimeException:
> java.lang.ClassNotFoundException: org.tartarus.snowball.ext.sbStemmer
> 
> On Sun, Apr 17, 2011 at 6:43 PM, Neil Ghosh  wrote:
> 
> > Rahul,  What is this link about ?
> >
> >
> > On Sun, Apr 17, 2011 at 6:42 PM, Rahul Akolkar
> wrote:
> >
> >> On Sun, Apr 17, 2011 at 8:49 AM, Neil Ghosh 
> wrote:
> >> > I have already downloaded lucene but where is the snowball analyzer
> >> > in
> >> that
> >> > ?
> >> > The one in contrib directory is throwing runtime exception
> >> >
> >> 
> >>
> >> http://lucene.apache.org/mail.html
> >>
> >> -Rahul
> >>
> >>
> >> > On Sun, Apr 17, 2011 at 6:16 PM, Rahul Akolkar
> >> >  >> >
> >> > wrote:
> >> >>
> >> >> On Sun, Apr 17, 2011 at 8:34 AM, Neil Ghosh 
> >> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > Unable to download snowball analyzer I am trying to use snowball
> >> >> > analyzer for my search engine but unable
> >> to
> >> >> > download the library.
> >> >> >
> >> >> 
> >> >>
> >> >> http://lucene.apache.org/
> >> >>
> >> >> -Rahul
> >> >>
> >> >>
> >> >> > Please help
> >> >> >
> >> >> > --
> >> >> > Thanks and Regards
> >> >> > Neil
> >> >> > http://neilghosh.com
> >> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks and Regards
> >> > Neil
> >> > http://neilghosh.com
> >> >
> >> >
> >> >
> >> >
> >>
> >
> >
> >
> > --
> > Thanks and Regards
> > Neil
> > http://neilghosh.com
> >
> >
> >
> >
> 
> 
> --
> Thanks and Regards
> Neil
> http://neilghosh.com



RE: Range query rewrite incorrect

2011-03-21 Thread Uwe Schindler
That's all correct; the rewrite is done like this, and what comes out after
rewrite is purely internal and may change from query to query or index to
index (as the best rewrite method is chosen based on your index contents).

Your query has one problem: used this way, Lucene only works with "string"
ranges, so a purely numeric range without zero-padding the terms cannot work
(lexicographically, the term "1048576" sorts before "500").

If you want real numeric queries, use NumericRangeQuery in combination with
NumericField. But QueryParser cannot handle these, so you have to build the
queries from code (instantiate NumericRangeQuery in your code).
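
A minimal sketch of the numeric variant (this assumes docSize is indexed as
a NumericField with the default precisionStep; "sizeInBytes" is a
placeholder):

// indexing side
doc.add(new NumericField("docSize").setIntValue(sizeInBytes));

// query side: built in code, then combined with the TYPE clause
BooleanQuery query = new BooleanQuery();
query.add(new TermQuery(new Term("TYPE", "document")), BooleanClause.Occur.MUST);
query.add(NumericRangeQuery.newIntRange("docSize", 0, 1048576, true, true),
    BooleanClause.Occur.MUST);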

Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: jpogorman [mailto:jp.ogor...@documatics.com]
> Sent: Monday, March 21, 2011 6:14 PM
> To: general@lucene.apache.org
> Subject: Range query rewrite incorrect
> 
> Hello,
> 
> I have a range query like this...
> TYPE:DOCUMENT AND docSize:[0 TO 1048576]
> 
> I'm using the MultiFieldQueryParser and when I call Rewrite on this I get
back
> the following text...
> +TYPE:document +ConstantScore(QueryWrapperFilter())
> 
> This text will not return any results and I am not sure where the
> QueryWrapperFilter() portion of text has come from. It has something to do
> with the 1048576 value and similar values.
> 
> If I search using TYPE:DOCUMENT AND docSize:[0 TO 500] the rewrite will
> generate a query that works...
> +TYPE:document +ConstantScore(docSize:[0 TO 500])
> 
> Can anyone shed some light on why this is happening and how to avoid it?
> 
> Thanks,
> JP
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Range-
> query-rewrite-incorrect-tp2710807p2710807.html
> Sent from the Lucene - General mailing list archive at Nabble.com.



RE: Extending Lucene Query Parser for XML-based queris

2011-02-28 Thread Uwe Schindler
Hi,

Maybe you are looking for that:
http://lucene.apache.org/java/3_0_3/api/contrib-xml-query-parser/index.html

By the way: Lucene 2.3.2 is very old (you referenced it in [1]).

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: aneuryzma [mailto:patrick.divia...@gmail.com]
> Sent: Monday, February 28, 2011 10:28 AM
> To: general@lucene.apache.org
> Subject: Extending Lucene Query Parser for XML-based queris
> 
> I need to extend Lucene Query Parser[1] to deal with XML-based queries.
> 
> Is there any ready implementation I can use for it?
> 
> thanks
> 
>   [1]: http://lucene.apache.org/java/2_3_2/queryparsersyntax.html
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Extending-Lucene-Query-Parser-for-
> XML-based-queris-tp2593571p2593571.html
> Sent from the Lucene - General mailing list archive at Nabble.com.



RE: Can I use MatchAllDocsQuery and and specify terms to search in multiple fields of my documents ?

2011-02-28 Thread Uwe Schindler
A MatchAllDocsQuery simply matches all non-deleted documents, exactly as the
name suggests, so what are you trying to do? It makes no sense to add a
restriction to this Query, as it always returns everything. The parameter is
only used for scoring, to sort the documents according to the boost factor
of the given field name.

If you want to search for multiple terms in different fields, combine one or
more TermQuerys in a BooleanQuery.
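
A minimal sketch (the field and term names are made up for illustration):

BooleanQuery query = new BooleanQuery();
query.add(new TermQuery(new Term("title", "lucene")), BooleanClause.Occur.MUST);
query.add(new TermQuery(new Term("body", "search")), BooleanClause.Occur.MUST);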

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: aneuryzma [mailto:patrick.divia...@gmail.com]
> Sent: Monday, February 28, 2011 2:13 PM
> To: general@lucene.apache.org
> Subject: Can I use MatchAllDocsQuery and and specify terms to search in
> multiple fields of my documents ?
> 
> Can I use MatchAllDocsQuery and and specify terms to search in multiple
> fields of my documents ?
> 
> I've seen that this query takes only 1 parameter: MatchAllDocsQuery(String
> normsField), so I was wondering if I can search for multiple terms on
multiple
> fields.
> 
> thanks
> 
> 
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Can-I-
> use-MatchAllDocsQuery-and-and-specify-terms-to-search-in-multiple-fields-
> of-my-documents-tp2594905p2594905.html
> Sent from the Lucene - General mailing list archive at Nabble.com.


