RE: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

2021-02-20 Thread Uwe Schindler
Congrats Jan!

 

Uwe

 

-

Uwe Schindler

Achterdiek 19, D-28357 Bremen

https://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Anshum Gupta  
Sent: Thursday, February 18, 2021 7:55 PM
To: Lucene Dev ; solr-user@lucene.apache.org
Subject: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

 

Hi everyone,

 

I’d like to inform everyone that the newly formed Apache Solr PMC nominated and 
elected Jan Høydahl for the position of the Solr PMC Chair and Vice President. 
This decision was approved by the board in its February 2021 meeting.

 

Congratulations Jan! 

 

-- 

Anshum Gupta



[SECURITY] CVE-2018-8026: XXE vulnerability due to Apache Solr configset upload (exchange rate provider config / enum field config / TIKA parsecontext)

2018-07-04 Thread Uwe Schindler
CVE-2018-8026: XXE vulnerability due to Apache Solr configset upload
(exchange rate provider config / enum field config / TIKA parsecontext)

Severity: High

Vendor:
The Apache Software Foundation

Versions Affected:
Solr 6.0.0 to 6.6.4
Solr 7.0.0 to 7.3.1

Description:
The details of this vulnerability were reported by mail to the Apache
security mailing list.
This vulnerability relates to an XML external entity expansion (XXE) in Solr
config files (currency.xml, enumsConfig.xml referenced from schema.xml, and
the TIKA parsecontext config file). In addition, the XInclude functionality
provided in these config files is affected in a similar way. The vulnerability
can be exploited as an XXE using the file/ftp/http protocols to read arbitrary
local files from the Solr server or the internal network. The manipulated
files can be uploaded as configsets using Solr's API, allowing attackers to
exploit the vulnerability. See [1] for more details.

Mitigation:
Users are advised to upgrade to either the Solr 6.6.5 or Solr 7.4.0 release,
both of which address the vulnerability. Once the upgrade is complete, no
other steps are required. Those releases only allow external entities and
XIncludes that refer to local files / ZooKeeper resources below the Solr
instance directory (using Solr's ResourceLoader); absolute URLs are denied.
Keep in mind that external entities and XInclude are explicitly supported to
better structure config files in large installations. Before Solr 6 this was
not a problem, as config files were not accessible through the APIs.

If users are unable to upgrade to Solr 6.6.5 or Solr 7.4.0, they are advised
to make sure that Solr instances are only used locally, without access to the
public internet, so the vulnerability cannot be exploited. In addition,
reverse proxies should be configured so that end users cannot reach the
configset APIs. Please refer to [2] on how to correctly secure Solr servers.

Solr 5.x and earlier are not affected by this vulnerability; those versions
do not allow uploading configsets via the API. Nevertheless, users should
upgrade those versions as soon as possible, because there may be other ways
to inject config files through the file upload functionality of the old web
interface. Those versions are no longer maintained, so no deep analysis was
done.

Credit:
Yuyang Xiao, Ishan Chattopadhyaya

References:
[1] https://issues.apache.org/jira/browse/SOLR-12450
[2] https://wiki.apache.org/solr/SolrSecurity

-
Uwe Schindler
uschind...@apache.org 
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/




[SECURITY] CVE-2018-8010: XXE vulnerability due to Apache Solr configset upload

2018-05-21 Thread Uwe Schindler
CVE-2018-8010: XXE vulnerability due to Apache Solr configset upload

Severity: High

Vendor:
The Apache Software Foundation

Versions Affected:
Solr 6.0.0 to 6.6.3
Solr 7.0.0 to 7.3.0

Description:
The details of this vulnerability were reported internally by one of Apache
Solr's committers.
This vulnerability relates to an XML external entity expansion (XXE) in Solr
config files (solrconfig.xml, schema.xml, managed-schema). In addition, the
XInclude functionality provided in these config files is affected in a
similar way. The vulnerability can be exploited as an XXE using the
file/ftp/http protocols to read arbitrary local files from the Solr server or
the internal network. See [1] for more details.

Mitigation:
Users are advised to upgrade to either the Solr 6.6.4 or Solr 7.3.1 release,
both of which address the vulnerability. Once the upgrade is complete, no
other steps are required. Those releases only allow external entities and
XIncludes that refer to local files / ZooKeeper resources below the Solr
instance directory (using Solr's ResourceLoader); absolute URLs are denied.
Keep in mind that external entities and XInclude are explicitly supported to
better structure config files in large installations. Before Solr 6 this was
not a problem, as config files were not accessible through the APIs.

If users are unable to upgrade to Solr 6.6.4 or Solr 7.3.1, they are advised
to make sure that Solr instances are only used locally, without access to the
public internet, so the vulnerability cannot be exploited. In addition,
reverse proxies should be configured so that end users cannot reach the
configset APIs. Please refer to [2] on how to correctly secure Solr servers.

Solr 5.x and earlier are not affected by this vulnerability; those versions
do not allow uploading configsets via the API. Nevertheless, users should
upgrade those versions as soon as possible, because there may be other ways
to inject config files through the file upload functionality of the old web
interface. Those versions are no longer maintained, so no deep analysis was
done.

Credit:
Ananthesh, Ishan Chattopadhyaya

References:
[1] https://issues.apache.org/jira/browse/SOLR-12316
[2] https://wiki.apache.org/solr/SolrSecurity

-
Uwe Schindler
uschind...@apache.org 
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/




[SECURITY] CVE-2018-1308: XXE attack through Apache Solr's DIH's dataConfig request parameter

2018-04-08 Thread Uwe Schindler
CVE-2018-1308: XXE attack through Apache Solr's DIH's dataConfig request 
parameter

Severity: Major

Vendor:
The Apache Software Foundation

Versions Affected:
Solr 1.2 to 6.6.2
Solr 7.0.0 to 7.2.1

Description:
The details of this vulnerability were reported to the Apache Security mailing 
list. 

This vulnerability relates to an XML external entity expansion (XXE) in the
`dataConfig` request parameter of Solr's DataImportHandler. It can be
exploited as an XXE using the file/ftp/http protocols to read arbitrary local
files from the Solr server or the internal network. See [1] for more details.

Mitigation:
Users are advised to upgrade to either the Solr 6.6.3 or Solr 7.3.0 release,
both of which address the vulnerability. Once the upgrade is complete, no
other steps are required. Those releases disable external entities in
anonymous XML files passed through this request parameter.

If users are unable to upgrade to Solr 6.6.3 or Solr 7.3.0, they are advised
to disable the DataImportHandler in their solrconfig.xml file and restart
their Solr instances. Alternatively, if Solr instances are only used locally
without access to the public internet, the vulnerability cannot be used
directly, so updating may not be required; instead, reverse proxies or Solr
client applications should be guarded so that end users cannot inject
`dataConfig` request parameters (see the sketch below). Please refer to [2]
on how to correctly secure Solr servers.
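As one illustration of that guarding step, a servlet filter (or an
equivalent reverse proxy rule) in front of Solr can simply reject requests
that carry the parameter. This is a hypothetical minimal sketch using the
standard javax.servlet API; the class name and error message are invented
for illustration:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

// Hypothetical guard: reject any request that carries a dataConfig parameter.
public class DataConfigBlockingFilter implements Filter {
    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        if (req.getParameter("dataConfig") != null) {
            ((HttpServletResponse) res).sendError(HttpServletResponse.SC_FORBIDDEN,
                "dataConfig request parameter is not allowed");
            return;
        }
        chain.doFilter(req, res);
    }

    @Override public void init(FilterConfig config) {}
    @Override public void destroy() {}
}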

Credit:
麦 香浓郁

References:
[1] https://issues.apache.org/jira/browse/SOLR-11971
[2] https://wiki.apache.org/solr/SolrSecurity

-
Uwe Schindler
uschind...@apache.org 
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/




FOSS Backstage Micro Summit on Monday in Berlin

2017-11-17 Thread Uwe Schindler
Hi,

It's already a bit late, but for all people who are visiting Germany next week
and want to take a short trip to Berlin: there are still slots free at the FOSS
Backstage Micro Summit. It is a mini-conference on everything related to
governance, collaboration, legal matters, and economics within the scope of
FOSS. The main event will take place as part of berlinbuzzwords 2018. We have a
lot of speakers invited - also from the ASF!

https://www.foss-backstage.de/

Program:
https://www.foss-backstage.de/news/micro-summit-program-online-now

I hope to see you there,
Uwe

-
Uwe Schindler
uschind...@apache.org 
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/




RE: TIKA OCR not working

2015-04-27 Thread Uwe Schindler
Yes, that is fixed.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
 Sent: Monday, April 27, 2015 4:29 PM
 To: u...@tika.apache.org
 Cc: trung...@anlab.vn; solr-user@lucene.apache.org
 Subject: Re: TIKA OCR not working
 
 It should work out of the box in Solr as long as Tesseract is installed and on
 the class path. Solr had an issue with it since Tika sends 2 startDocument 
 calls,
 but I fixed that with Uwe and it was shipped in 4.10.4 and in 5.x I think?
 
 ++
 
 Chris Mattmann, Ph.D.
 Chief Architect
 Instrument Software and Science Data Systems Section (398) NASA Jet
 Propulsion Laboratory Pasadena, CA 91109 USA
 Office: 168-519, Mailstop: 168-527
 Email: chris.a.mattm...@nasa.gov
 WWW:  http://sunset.usc.edu/~mattmann/
 ++
 
 Adjunct Associate Professor, Computer Science Department University of
 Southern California, Los Angeles, CA 90089 USA
 ++
 
 
 
 
 
 
 
 -Original Message-
 From: Allison, Timothy B. talli...@mitre.org
 Reply-To: u...@tika.apache.org u...@tika.apache.org
 Date: Monday, April 27, 2015 at 10:26 AM
 To: u...@tika.apache.org u...@tika.apache.org
 Cc: trung...@anlab.vn trung...@anlab.vn, solr-
 u...@lucene.apache.org
 solr-user@lucene.apache.org
 Subject: FW: TIKA OCR not working
 
 Trung,
 
 I haven't experimented with our OCR parser yet, but this should give a
 good start: https://wiki.apache.org/tika/TikaOCR .
 
 Have you installed tesseract?
 
 Tika colleagues,
   Any other tips?  What else has to be configured and how?
 
 -Original Message-
 From: trung.ht [mailto:trung...@anlab.vn]
 Sent: Friday, April 24, 2015 11:22 PM
 To: solr-user@lucene.apache.org
 Subject: Re: TIKA OCR not working
 
 Hi everyone,
 
 Does anyone have the answer for this problem :)?
 
 
 I saw the Tika documentation. Tika 1.7 supports OCR and Solr 5.0 uses
 Tika 1.7, but it looks like it does not work. Does anyone know whether
 TIKA OCR works automatically with Solr, or do I have to change some
 settings?
 
 
 Trung.
 
 
  It's not clear if OCR would happen automatically in Solr Cell, or if
  changes to Solr would be needed.
 
  For Tika OCR info, see:
 
  https://issues.apache.org/jira/browse/TIKA-93
  https://wiki.apache.org/tika/TikaOCR
 
 
 
  -- Jack Krupansky
 
  On Thu, Apr 23, 2015 at 9:14 AM, Alexandre Rafalovitch 
  arafa...@gmail.com
  wrote:
 
   I think OCR is in Tika 1.8, so might be in Solr 5.?. But I haven't
 seen
  it
   in use yet.
  
   Regards,
   Alex
   On 23 Apr 2015 10:24 pm, Ahmet Arslan
   iori...@yahoo.com.invalid
  wrote:
  
Hi Trung,
   
I didn't know about OCR capabilities of tika.
    Someone who is familiar with solr-cell can inform us whether
    this functionality is added to Solr or not.
   
Ahmet
   
   
   
On Thursday, April 23, 2015 2:06 PM, trung.ht
trung...@anlab.vn
  wrote:
Hi Ahmet,
   
I used a png file, not a pdf file. From the document, I
understand
  that
solr will post the file to tika, and since tika 1.7, OCR is
 included.
  Is
there something I misunderstood?
   
Trung.
   
   
On Thu, Apr 23, 2015 at 5:59 PM, Ahmet Arslan
  iori...@yahoo.com.invalid
   
wrote:
   
 Hi Trung,

 solr-cell (tika) does not do OCR. It cannot extract text from image-based
 PDFs.

 Ahmet



 On Thursday, April 23, 2015 7:33 AM, trung.ht
 trung...@anlab.vn
   wrote:



 Hi,

 I want to use solr to index some scanned documents. After setting up a
 solr document with two fields, content and filename, I tried to upload
 the attached file, but it seems that the content of the file is only
 "\n \n \n".
 But if I use tesseract from the command line I get the result correctly.

 The log when solr receive my request:
 ---
 INFO  - 2015-04-23 03:49:25.941;
 org.apache.solr.update.processor.LogUpdateProcessor;
 [collection1]
 webapp=/solr path=/update/extract
 params={literal.groupid=2&json.nl=flat&resource.name=phplNiPrs&literal.id=4&commit=true&extractOnly=false&literal.historyid=4&omitHeader=true&literal.userid=3&literal.createddate=2015-04-22T15:00:00Z&fmap.content=content&wt=json&literal.filename=\\trunght\test\tesseract_3.png}

 

 The document when I check on solr admin page:
 -
 { "groupid": 2, "id": "4", "historyid": 4, "userid": 3,
   "createddate": "2015-04-22T15:00:00Z",
   "filename": "trunght\\test\\tesseract_3.png",
   "autocomplete_text": [ "trunght\\test\\tesseract_3.png" ],
   "content": "\n \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n \n

RE: TIKA OCR not working

2015-04-27 Thread Uwe Schindler
Hi,
TIKA OCR definitely works automatically with Solr 5.x.

It is just important to have Tesseract OCR (the native tool that does the
actual work) installed and on the PATH. On Ubuntu Linux, this should be quite
simple (apt-get install tesseract-ocr or similar). You may also need to
install additional language packs for better results.

As long as the native tools are installed, it should work out of the box with
no configuration needed - unless you are on a Turkish-localized machine, which
triggers a bug in the JDK when spawning external processes. Please also check
the log files.
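To verify the Tesseract side independently of Solr, a quick standalone Tika
check helps isolate the problem. This is a minimal sketch assuming the Tika
core and parsers jars (1.7 or later) are on the classpath; the class name is
invented:

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

// Feed an image straight into Tika: if Tesseract is installed and on the
// PATH, the OCR'd text is printed; otherwise the output stays empty.
public class OcrSmokeTest {
    public static void main(String[] args) throws Exception {
        AutoDetectParser parser = new AutoDetectParser();
        // Note: the default BodyContentHandler caps output at 100k characters.
        BodyContentHandler handler = new BodyContentHandler();
        Metadata metadata = new Metadata();
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            parser.parse(in, handler, metadata, new ParseContext());
        }
        System.out.println(handler.toString());
    }
}

If this prints text for the PNG but Solr Cell still returns only whitespace,
the problem is on the Solr side; if it prints nothing, fix the Tesseract
installation first.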

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Allison, Timothy B. [mailto:talli...@mitre.org]
 Sent: Monday, April 27, 2015 4:27 PM
 To: u...@tika.apache.org
 Cc: trung...@anlab.vn; solr-user@lucene.apache.org
 Subject: FW: TIKA OCR not working
 
 Trung,
 
 I haven't experimented with our OCR parser yet, but this should give a good
 start: https://wiki.apache.org/tika/TikaOCR .
 
 Have you installed tesseract?
 
 Tika colleagues,
   Any other tips?  What else has to be configured and how?
 
 -Original Message-
 From: trung.ht [mailto:trung...@anlab.vn]
 Sent: Friday, April 24, 2015 11:22 PM
 To: solr-user@lucene.apache.org
 Subject: Re: TIKA OCR not working
 
 Hi everyone,
 
 Does anyone have the answer for this problem :)?
 
 
 I saw the Tika documentation. Tika 1.7 supports OCR and Solr 5.0 uses
 Tika 1.7, but it looks like it does not work. Does anyone know whether
 TIKA OCR works automatically with Solr, or do I have to change some settings?
 
 
 Trung.
 
 
  It's not clear if OCR would happen automatically in Solr Cell, or if
  changes to Solr would be needed.
 
  For Tika OCR info, see:
 
  https://issues.apache.org/jira/browse/TIKA-93
  https://wiki.apache.org/tika/TikaOCR
 
 
 
  -- Jack Krupansky
 
  On Thu, Apr 23, 2015 at 9:14 AM, Alexandre Rafalovitch 
  arafa...@gmail.com
  wrote:
 
   I think OCR is in Tika 1.8, so might be in Solr 5.?. But I haven't
   seen
  it
   in use yet.
  
   Regards,
   Alex
   On 23 Apr 2015 10:24 pm, Ahmet Arslan iori...@yahoo.com.invalid
  wrote:
  
Hi Trung,
   
I didn't know about OCR capabilities of tika.
 Someone who is familiar with solr-cell can inform us whether this
 functionality is added to Solr or not.
   
Ahmet
   
   
   
On Thursday, April 23, 2015 2:06 PM, trung.ht trung...@anlab.vn
  wrote:
Hi Ahmet,
   
I used a png file, not a pdf file. From the document, I
understand
  that
solr will post the file to tika, and since tika 1.7, OCR is included.
  Is
there something I misunderstood.
   
Trung.
   
   
On Thu, Apr 23, 2015 at 5:59 PM, Ahmet Arslan
  iori...@yahoo.com.invalid
   
wrote:
   
 Hi Trung,

 solr-cell (tika) does not do OCR. It cannot extract text from image-based
 PDFs.

 Ahmet



 On Thursday, April 23, 2015 7:33 AM, trung.ht
 trung...@anlab.vn
   wrote:



 Hi,

 I want to use solr to index some scanned documents. After setting up a
 solr document with two fields, content and filename, I tried to upload
 the attached file, but it seems that the content of the file is only
 "\n \n \n".
 But if I use tesseract from the command line I get the result correctly.

 The log when solr receive my request:
 ---
 INFO  - 2015-04-23 03:49:25.941;
 org.apache.solr.update.processor.LogUpdateProcessor;
 [collection1] webapp=/solr path=/update/extract
 params={literal.groupid=2&json.nl=flat&resource.name=phplNiPrs&literal.id=4&commit=true&extractOnly=false&literal.historyid=4&omitHeader=true&literal.userid=3&literal.createddate=2015-04-22T15:00:00Z&fmap.content=content&wt=json&literal.filename=\\trunght\test\tesseract_3.png}

 

 The document when I check on solr admin page:
 -
 { "groupid": 2, "id": "4", "historyid": 4, "userid": 3,
   "createddate": "2015-04-22T15:00:00Z",
   "filename": "trunght\\test\\tesseract_3.png",
   "autocomplete_text": [ "trunght\\test\\tesseract_3.png" ],
   "content": "\n \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n \n \n \n \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n \n \n ",
   "_version_": 1499213034586898400 }

 ---

 Since I am a solr newbie I do not know where to look. Can anyone give me
 advice on where to look for errors or which settings to change to make it
 work? Thanks in advance.

 Trung.

   
  
 
 
 



ApacheCon NA 2015 in Austin, Texas

2015-03-19 Thread Uwe Schindler
Dear Apache Lucene/Solr enthusiast,

In just a few weeks, we'll be holding ApacheCon in Austin, Texas, and we'd love 
to have you in attendance. You can save $300 on admission by registering NOW, 
since the early bird price ends on the 21st.

Register at http://s.apache.org/acna2015-reg

ApacheCon this year celebrates the 20th birthday of the Apache HTTP Server, and 
we'll have Brian Behlendorf, who started this whole thing, keynoting for us, 
and you'll have a chance to meet some of the original Apache Group, who will be 
there to celebrate with us.

We also have talks about Apache Lucene and Apache Solr across 7 tracks, as
well as BOFs, the Apache BarCamp, project-specific hack events, and evening
events where you can deepen your connection with the larger Apache community.
See the full schedule at http://apacheconna2015.sched.org/

And if you have any questions, comments, or just want to hang out with us 
before and during the event, follow us on Twitter - @apachecon - or drop by 
#apachecon on the Freenode IRC network.

Hope to see you in Austin!

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/




RE: Reminder: FOSDEM 2015 - Open Source Search Dev Room

2014-12-03 Thread Uwe Schindler
Hello everyone,

We have extended the deadline for submissions to the FOSDEM 2015 Open Source 
Search Dev
Room to Monday, 9 December at 23:59 CET.

We are looking forward to your talk proposal!

Cheers,
Uwe

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/

 -Original Message-
 From: Uwe Schindler [mailto:uschind...@apache.org]
 Sent: Monday, November 24, 2014 9:33 AM
 To: d...@lucene.apache.org; java-u...@lucene.apache.org; solr-
 u...@lucene.apache.org; gene...@lucene.apache.org
 Subject: Reminder: FOSDEM 2015 - Open Source Search Dev Room
 
 Hi,
 
 We host a Dev-Room about Open Source Search on this year's FOSDEM
 2015 (https://fosdem.org/2015/), taking place on January 31st and February
 1st, 2015, in Brussels, Belgium. There is still one more week to submit your
 talks, so hurry up and submit your talk early!
 
 Here is the full CFP as posted a few weeks ago:
 
 Search has evolved to be much more than simply full-text search. We now
 rely on “search engines” for a wide variety of functionality:
 search as navigation, search as analytics and backend for data visualization
 and sometimes, dare we say it, as a data store. The purpose of this dev room
 is to explore the new world of open source search engines: their enhanced
 functionality, new use cases, feature and architectural deep dives, and the
 position of search in relation to the wider set of software tools.
 
 We welcome proposals from folks working with or on open source search
 engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.)
 or technologies that heavily depend upon search (e.g.
 NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in
 presentations on search algorithms, machine learning, real-world
 implementation/deployment stories and explorations of the future of
 search.
 
 Talks should be 30-60 minutes in length, including time for Q&A.
 
 You can submit your talks to us here:
 https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V3
 8G0OxSfp84A/viewform
 
 Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We
 cannot guarantee we will have the opportunity to review submissions made
 after the deadline, so please submit early (and often)!
 
 Should you have any questions, you can contact the Dev Room
 organizers: opensourcesearch-devr...@lists.fosdem.org
 
 Cheers,
 LH on behalf of the Open Source Search Dev Room Program Committee*
 
 * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten
 Curdt, Uwe Schindler
 
 -
 Uwe Schindler
 uschind...@apache.org
 Apache Lucene PMC Member / Committer
 Bremen, Germany
 http://lucene.apache.org/
 
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org



Reminder: FOSDEM 2015 - Open Source Search Dev Room

2014-11-24 Thread Uwe Schindler
Hi,

We host a Dev-Room about Open Source Search on this year's FOSDEM 2015 
(https://fosdem.org/2015/), taking place on January 31st and February 1st, 
2015, in Brussels, Belgium. There is still one more week to submit your talks, 
so hurry up and submit your talk early!

Here is the full CFP as posted a few weeks ago:

Search has evolved to be much more than simply full-text search. We now rely on 
“search engines” for a wide variety of functionality:
search as navigation, search as analytics and backend for data visualization 
and sometimes, dare we say it, as a data store. The purpose of this dev room is 
to explore the new world of open source search engines: their enhanced 
functionality, new use cases, feature and architectural deep dives, and the 
position of search in relation to the wider set of software tools.

We welcome proposals from folks working with or on open source search engines 
(e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.) or 
technologies that heavily depend upon search (e.g.
NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in 
presentations on search algorithms, machine learning, real-world 
implementation/deployment stories and explorations of the future of search.

Talks should be 30-60 minutes in length, including time for Q&A.

You can submit your talks to us here:
https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V38G0OxSfp84A/viewform

Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We 
cannot guarantee we will have the opportunity to review submissions made after 
the deadline, so please submit early (and often)!

Should you have any questions, you can contact the Dev Room
organizers: opensourcesearch-devr...@lists.fosdem.org

Cheers,
LH on behalf of the Open Source Search Dev Room Program Committee*

* Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten Curdt, 
Uwe Schindler

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/




CFP: FOSDEM 2015 - Open Source Search Dev Room

2014-11-03 Thread Uwe Schindler
***Please forward this CFP to anyone who may be interested in participating.***

Hi,

Search has evolved to be much more than simply full-text search. We now rely on 
“search engines” for a wide variety of functionality:
search as navigation, search as analytics and backend for data visualization 
and sometimes, dare we say it, as a data store. The purpose of this dev room is 
to explore the new world of open source search engines: their enhanced 
functionality, new use cases, feature and architectural deep dives, and the 
position of search in relation to the wider set of software tools.

We welcome proposals from folks working with or on open source search engines 
(e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.) or 
technologies that heavily depend upon search (e.g.
NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in 
presentations on search algorithms, machine learning, real-world 
implementation/deployment stories and explorations of the future of search.

Talks should be 30-60 minutes in length, including time for Q&A.

You can submit your talks to us here:
https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V38G0OxSfp84A/viewform

Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We 
cannot guarantee we will have the opportunity to review submissions made after 
the deadline, so please submit early (and often)!

Should you have any questions, you can contact the Dev Room
organizers: opensourcesearch-devr...@lists.fosdem.org

Cheers,
LH on behalf of the Open Source Search Dev Room Program Committee*

* Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten Curdt, 
Uwe Schindler

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/




RE: FOSDEM 2015 - Open Source Search Dev Room

2014-11-03 Thread Uwe Schindler
Hi,

forgot to mention:
FOSDEM 2015 takes place in Brussels on January 31st and February 1st, 2015. See 
also: https://fosdem.org/2015/

I hope to see you there!
Uwe

 -Original Message-
 From: Uwe Schindler [mailto:uschind...@apache.org]
 Sent: Monday, November 03, 2014 1:29 PM
 To: d...@lucene.apache.org; java-u...@lucene.apache.org; solr-
 u...@lucene.apache.org; gene...@lucene.apache.org
 Subject: CFP: FOSDEM 2015 - Open Source Search Dev Room
 
 ***Please forward this CFP to anyone who may be interested in
 participating.***
 
 Hi,
 
 Search has evolved to be much more than simply full-text search. We now
 rely on “search engines” for a wide variety of functionality:
 search as navigation, search as analytics and backend for data visualization
 and sometimes, dare we say it, as a data store. The purpose of this dev room
 is to explore the new world of open source search engines: their enhanced
 functionality, new use cases, feature and architectural deep dives, and the
 position of search in relation to the wider set of software tools.
 
 We welcome proposals from folks working with or on open source search
 engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.)
 or technologies that heavily depend upon search (e.g.
 NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in
 presentations on search algorithms, machine learning, real-world
 implementation/deployment stories and explorations of the future of
 search.
 
 Talks should be 30-60 minutes in length, including time for Q&A.
 
 You can submit your talks to us here:
 https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V3
 8G0OxSfp84A/viewform
 
 Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We
 cannot guarantee we will have the opportunity to review submissions made
 after the deadline, so please submit early (and often)!
 
 Should you have any questions, you can contact the Dev Room
 organizers: opensourcesearch-devr...@lists.fosdem.org
 
 Cheers,
 LH on behalf of the Open Source Search Dev Room Program Committee*
 
 * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten
 Curdt, Uwe Schindler
 
 -
 Uwe Schindler
 uschind...@apache.org
 Apache Lucene PMC Member / Committer
 Bremen, Germany
 http://lucene.apache.org/
 
 
 
 -
 To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
 For additional commands, e-mail: java-user-h...@lucene.apache.org



[ANNOUNCE] [SECURITY] Recommendation to update Apache POI in Apache Solr 4.8.0, 4.8.1, and 4.9.0 installations

2014-08-18 Thread Uwe Schindler
Hallo Apache Solr Users,

the Apache Lucene PMC wants to make the users of Solr aware of the following 
issue:

Apache Solr versions 4.8.0, 4.8.1, 4.9.0 bundle Apache POI 3.10-beta2 with its 
binary release tarball. This version (and all previous ones) of Apache POI are 
vulnerable to the following issues:

= CVE-2014-3529: XML External Entity (XXE) problem in Apache POI's OpenXML 
parser =
Type: Information disclosure
Description: Apache POI uses Java's XML components to parse OpenXML files 
produced by Microsoft Office products (DOCX, XLSX, PPTX,...). Applications that 
accept such files from end-users are vulnerable to XML External Entity (XXE) 
attacks, which allows remote attackers to bypass security restrictions and read 
arbitrary files via a crafted OpenXML document that provides an XML external 
entity declaration in conjunction with an entity reference.

= CVE-2014-3574: XML Entity Expansion (XEE) problem in Apache POI's OpenXML 
parser =
Type: Denial of service
Description: Apache POI uses Java's XML components and Apache Xmlbeans to parse 
OpenXML files produced by Microsoft Office products (DOCX, XLSX, PPTX,...). 
Applications that accept such files from end-users are vulnerable to XML Entity 
Expansion (XEE) attacks (XML bombs), which allows remote hackers to consume 
large amounts of CPU resources.

The Apache POI PMC released a bugfix version (3.10.1) today.

Solr users are affected by these issues if they enable the Apache Solr
Content Extraction Library (Solr Cell) contrib module from the folder
contrib/extraction of the release tarball.

Users of Apache Solr are strongly advised to keep the module disabled if they
don't use it. Alternatively, users of Apache Solr 4.8.0, 4.8.1, or 4.9.0 can
update the affected libraries by replacing the vulnerable JAR files in the
distribution folder. Users of previous versions have to update their Solr
release first; patching older versions is impossible.

To replace the vulnerable JAR files follow these steps:

- Download the Apache POI 3.10.1 binary release: 
http://poi.apache.org/download.html#POI-3.10.1
- Unzip the archive
- Delete the following files in your solr-4.X.X/contrib/extraction/lib 
folder: 
# poi-3.10-beta2.jar
# poi-ooxml-3.10-beta2.jar
# poi-ooxml-schemas-3.10-beta2.jar
# poi-scratchpad-3.10-beta2.jar
# xmlbeans-2.3.0.jar
- Copy the following files from the base folder of the Apache POI distribution 
to the solr-4.X.X/contrib/extraction/lib folder: 
# poi-3.10.1-20140818.jar
# poi-ooxml-3.10.1-20140818.jar
# poi-ooxml-schemas-3.10.1-20140818.jar
# poi-scratchpad-3.10.1-20140818.jar
- Copy xmlbeans-2.6.0.jar from POI's ooxml-lib/ folder to the 
solr-4.X.X/contrib/extraction/lib folder.
- Verify that the solr-4.X.X/contrib/extraction/lib folder no longer contains
any files with version number 3.10-beta2.
- Verify that the folder contains one xmlbeans JAR file with version 2.6.0.
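The two verification steps can also be scripted. The following is a
hypothetical helper (class name and messages invented) that flags leftover
vulnerable jars when pointed at the contrib/extraction/lib folder:

import java.io.File;

// Hypothetical check for the verification steps above: flag leftover
// 3.10-beta2 POI jars and report which xmlbeans JAR is present.
public class CheckExtractionLibs {
    public static void main(String[] args) {
        File lib = new File(args.length > 0 ? args[0] : "contrib/extraction/lib");
        File[] files = lib.listFiles();
        if (files == null) {
            System.err.println("Not a folder: " + lib);
            return;
        }
        for (File f : files) {
            String name = f.getName();
            if (name.contains("3.10-beta2")) {
                System.out.println("VULNERABLE jar still present: " + name);
            } else if (name.startsWith("xmlbeans-")) {
                System.out.println("xmlbeans JAR found: " + name
                    + ("xmlbeans-2.6.0.jar".equals(name) ? " (expected)" : " (unexpected version)"));
            }
        }
    }
}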

If you just want to disable extraction of Microsoft Office documents, delete 
the files above and don't replace them. Solr Cell will automatically detect 
this and disable Microsoft Office document extraction.

Coming versions of Apache Solr will have the updated libraries bundled.

Happy Searching and Extracting,
The Apache Lucene Developers

PS: Thanks to Stefan Kopf, Mike Boufford, and Christian Schneider for reporting 
these issues!

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/





[ANNOUNCE] Apache Solr 4.8.0 released

2014-04-28 Thread Uwe Schindler
28 April 2014, Apache Solr™ 4.8.0 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.8.0

Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, dynamic
clustering, database integration, rich document (e.g., Word, PDF)
handling, and geospatial search.  Solr is highly scalable, providing
fault tolerant distributed search and indexing, and powers the search
and navigation features of many of the world's largest internet sites.

Solr 4.8.0 is available for immediate download at:
  http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

See the CHANGES.txt file included with the release for a full list of
details.

Solr 4.8.0 Release Highlights:

* Apache Solr now requires Java 7 or greater (recommended is
  Oracle Java 7 or OpenJDK 7, minimum update 55; earlier versions
  have known JVM bugs affecting Solr).

* Apache Solr is fully compatible with Java 8.

* The <fields> and <types> tags have been deprecated in schema.xml.
  There is no longer any reason to keep them in the schema file;
  they may be safely removed. This allows intermixing of <fieldType>,
  <field> and <copyField> definitions if desired.

* The new {!complexphrase} query parser supports wildcards, ORs etc.
  inside Phrase Queries. 

* New Collections API CLUSTERSTATUS action reports the status of
  collections, shards, and replicas, and also lists collection
  aliases and cluster properties.
 
* Added managed synonym and stopword filter factories, which enable
  synonym and stopword lists to be dynamically managed via REST API.

* JSON updates now support nested child documents, enabling {!child}
  and {!parent} block join queries (see the sketch after this list).

* Added ExpandComponent to expand results collapsed by the
  CollapsingQParserPlugin, as well as the parent/child relationship
  of nested child documents.

* Long-running Collections API tasks can now be executed
  asynchronously; the new REQUESTSTATUS action provides status.

* Added a hl.qparser parameter to allow you to define a query parser
  for hl.q highlight queries.

* In Solr single-node mode, cores can now be created using named
  configsets.

* New DocExpirationUpdateProcessorFactory supports computing an
  expiration date for documents from the TTL expression, as well as
  automatically deleting expired documents on a periodic basis. 
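As a rough illustration of the nested child document highlight above, here is
a SolrJ sketch that indexes a parent with one child and notes the matching
block-join query. It is a sketch under assumptions: a SolrJ 4.8-era client, a
core reachable at the given URL, a schema containing the _root_ field that
block join requires, and invented field names.

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Sketch: index one parent document with a nested child, then query the
// parent back via the {!parent} block-join parser.
public class BlockJoinSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrInputDocument book = new SolrInputDocument();
        book.addField("id", "book-1");
        book.addField("type_s", "book");

        SolrInputDocument review = new SolrInputDocument();
        review.addField("id", "review-1");
        review.addField("type_s", "review");
        book.addChildDocument(review);  // indexed as one block with the parent

        server.add(book);
        server.commit();

        // Matching query: q={!parent which="type_s:book"}type_s:review
        // returns the parent document book-1.
        server.shutdown();
    }
}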

Solr 4.8.0 also includes many other new features as well as numerous
optimizations and bugfixes of the corresponding Apache Lucene release.

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases.  It is possible that the mirror you are using
may not have replicated the release yet.  If that is the case, please
try another mirror.  This also goes for Maven access.

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Chair / Committer
Bremen, Germany
http://lucene.apache.org/




Attention: Lucene 4.8 and Solr 4.8 will require minimum Java 7

2014-03-12 Thread Uwe Schindler
Hi,

the Apache Lucene/Solr committers decided with a large majority to require
Java 7 for the next minor release of Apache Lucene and Apache Solr (version
4.8)!
Oracle's support for Java 6 already ended more than a year ago, and Java 8 is
coming out in a few days.

The next release will also contain some improvements for Java 7:
- Better file handling (especially on Windows) in the directory
implementations. Files can now be deleted on Windows while the index is still
open - as was always possible on Unix environments (delete-on-last-close
semantics).
- Speed improvements in sorting comparators: sorting now uses Java 7's own
comparators for integer and long sorts, which are highly optimized by the
Hotspot VM.

If you want to stay up-to-date with Lucene and Solr, you should upgrade your
infrastructure to Java 7. Please be aware that you must use at least Java 7u1.
The recommended version at the moment is Java 7u25. Later versions like 7u40,
7u45,... have a bug causing index corruption. Ideally use the Java 7u60
prerelease, which has this bug fixed. Once 7u60 is out, it will be the
recommended version.
In addition, there is no Oracle/BEA JRockit available for Java 7; use the
official Oracle Java 7. JRockit never worked correctly with Lucene/Solr
(causing index corruption), so this should not be an issue for you. Please
also review our list of JVM bugs: http://wiki.apache.org/lucene-java/JavaBugs

Apache Lucene and Apache Solr were also heavily tested with all prerelease 
versions of Java 8, so you can also give it a try! Looking forward to the 
official Java 8 release next week - I will run my indexes with that version for 
sure!

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de





RE: solr bug feedback

2013-02-20 Thread Uwe Schindler
This is already fixed in Solr 4.1!

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: 虛客 [mailto:itemdet...@qq.com] 
Sent: Wednesday, February 20, 2013 11:17 AM
To: solr-user
Subject: solr bug feedback

 

solr: 3.6.1 --- Class: SolrRequestParsers --- line 75 has a manual mistake:

"long uploadLimitKB = 1048;  // 2MB default" should be "long uploadLimitKB =
2048;  // 2MB default".

Thanks for open source!!!



RE: How to setup SimpleFSDirectoryFactory

2012-07-23 Thread Uwe Schindler
Hi Geetha Anjali,

Lucene will not use MMapDirectory by default on 32 bit platforms or if you
are not using an Oracle/Sun JVM. On 64 bit platforms, Lucene will use it, but
accepts the risk of segfaulting when unmapping the buffers - Lucene does try
its best to prevent this. It is a risk, but one accepted by the Lucene
developers.

To come back to your issue: It is perfectly fine on Solr/Lucene to not unmap
all buffers as long as the index is open. The number of open file handles is
another discussion, but not related at all to MMap. If you are using an old
Lucene version (like 3.0.2), you should upgrade in any case; the most recent
one is 3.6.1.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: geetha anjali [mailto:anjaliprabh...@gmail.com]
 Sent: Monday, July 23, 2012 4:28 AM
 Subject: Re: How to setup SimpleFSDirectoryFactory
 
 Hi Uwe,
 Thanks Uwe. Have you checked the bug in the JRE for MMapDirectory? I was
 mentioning this; it is posted on the Oracle site and in the API doc.
 They accept this as a bug - have you seen this?
 
 MMapDirectory
 (http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/store/MMapDirectory.html)
 uses memory-mapped IO when reading. This is a good choice if you have plenty
 of virtual memory relative to your index size, eg if you are running on a 64
 bit JRE, or you are running on a 32 bit JRE but your index sizes are small
 enough to fit into the virtual memory space. Java has currently the
 limitation of not being able to unmap files from user code. The files are
 unmapped, when GC releases the byte buffers. *Due to this bug
 (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4724038) in Sun's JRE,
 MMapDirectory's IndexInput.close()
 (http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/store/IndexInput.html#close%28%29)
 is unable to close the underlying OS file handle. Only when GC finally
 collects the underlying objects, which could be quite some time later, will
 the file handle be closed*. *This will consume additional transient disk
 usage*: on Windows, attempts to delete or overwrite the files will result in
 an exception; on other platforms, which typically have a delete on last
 close semantics, while such operations will succeed, the bytes are still
 consuming space on disk. For many applications this limitation is not a
 problem (e.g. if you have plenty of disk space, and you don't rely on
 overwriting files on Windows) but it's still an important limitation to be
 aware of. This class supplies a (possibly dangerous) workaround mentioned in
 the bug report, which may fail on non-Sun JVMs.
 
 
 Thanks,
 
 
 On Mon, Jul 23, 2012 at 4:13 AM, Uwe Schindler u...@thetaphi.de wrote:
 
  It is hopeless to talk to both of you, you don't understand virtual memory:
 
   I get a similar situation using Windows 2008 and Solr 3.6. Memory
   using mmap is never released. Even if I turn off traffic and commit
   and do a manual gc. If the size of the index is 3gb then memory used
   will be heap + 3gb of shared used. If I use a 6gb index I get heap + 6gb.
 
  That is expected, but we are not talking about allocated physical
  memory, we are talking about allocated ADDRESS SPACE and you have 2^47
  of that on 64bit platforms. There is no physical memory wasted or
  allocated - please read the blog post a third, fourth, fifth... or
  tenth time, until it is obvious. You should also go back to school and
  take a course on system programming and operating system kernels.
  Every CS student gets that taught in his first year (at least in
  Germany).
 
  Java's GC has nothing to do with that - as long as the index is open,
  ADDRESS SPACE is assigned. We are talking about neither memory nor Java
  heap space.
 
   If I turn off MMapDirectoryFactory it goes back down. When is the
   MMap supposed to release memory? It only does it on JVM restart now.
 
  Can you please stop spreading nonsense about MMapDirectory with no
  knowledge behind it? http://www.linuxatemyram.com/ - also applies to
  Windows.
 
  Uwe
 
   Bill Bell
   Sent from mobile
  
  
   On Jul 22, 2012, at 6:21 AM, geetha anjali
   anjaliprabh...@gmail.com wrote:
It happens in 3.6, for this reason I thought of moving to solandra.
If I do a commit, all the documents are persisted without any
issues. There are no issues in terms of any functionality, but
only this happens: increase in physical RAM, goes higher and
higher and stops at maximum and it never comes down.
   
Thanks
   
On Sun, Jul 22, 2012 at 3:38 AM, Lance Norskog goks...@gmail.com
   wrote:
   
Interesting. Which version of Solr is this? What happens if you
do a commit?
   
On Sat, Jul 21, 2012 at 8:01 AM, geetha anjali
   anjaliprabh...@gmail.com= wrote:
Hi uwe,
Great to know. We have files indexing 1/min. After 30 mins I
see all my physical memory say its 100 percentage
used (windows

[ANNOUNCE] Apache Solr 3.6.1 released

2012-07-22 Thread Uwe Schindler
22 July 2012, Apache Solr™ 3.6.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 3.6.1.

Solr is the popular, blazing fast open source enterprise search platform
from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search, dynamic clustering, database
integration, rich document (e.g., Word, PDF) handling, and geospatial
search.
Solr is highly scalable, providing distributed search and index replication,
and it powers the search and navigation features of many of the world's
largest internet sites.

This release is a bug fix release for version 3.6.0. It contains numerous
bug fixes, optimizations, and improvements, some of which are highlighted
below.  The release is available for immediate download at:
   http://lucene.apache.org/solr/mirrors-solr-3x-redir.html (see
note below).

See the CHANGES.txt file included with the release for a full list of
details.

Solr 3.6.1 Release Highlights:

 * The concurrency of MMapDirectory was improved, fixing a performance
   regression (introduced in Solr 3.6.0) in comparison to Solr 3.5.0.
   This affected users on 64bit platforms (Linux, Solaris, Windows) and
   those explicitly using MMapDirectoryFactory.

 * ReplicationHandler maxNumberOfBackups was fixed to work if backups are
   triggered on commit.

 * Charset problems were fixed with HttpSolrServer, caused by an upgrade to
   a new Commons HttpClient version in 3.6.0.

 * Grouping was fixed to return correct count when not all shards are
   queried in the second pass. Solr no longer throws Exception when using
   result grouping with main=true and using wt=javabin.

 * Config file replication was made less error prone.

 * Data Import Handler threading fixes.

 * Various minor bugs were fixed.

Note: The Apache Software Foundation uses an extensive mirroring network for
distributing releases.  It is possible that the mirror you are using may not
have replicated the release yet.  If that is the case, please try another
mirror.  This also goes for Maven access.

Happy searching,

Uwe Schindler (release manager)
 all Lucene/Solr developers

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/





RE: RE: How to setup SimpleFSDirectoryFactory

2012-07-22 Thread Uwe Schindler
Hi,

It seems that both of you simply don't understand what's happening in your
operating system kernel. Please read the blog post again!

 It happens in 3.6, for this reason I thought of moving to solandra.
 If I do a commit, all the documents are persisted without any issues.
 There are no issues in terms of any functionality, but the only thing
 that happens is an increase in physical RAM usage, which goes higher and
 higher, stops at the maximum, and never comes down.

This is perfectly fine on Windows and Linux (and any other operating
system). If an operating system did not use *all* available physical memory,
it would waste costly hardware resources. Why not use resources that are
otherwise unused? As said before:

The O/S kernel uses *all* available physical RAM for caching file system
accesses. The memory used for that is always reported as not free, because
it is used (very simple, right?). But if some other application wants to use
it, it's free for malloc(), so it is not permanently occupied. That's always
the case, whether you use MMapDirectory or not (same for SimpleFSDirectory
or NIOFSDirectory).

Of course, when you have freshly booted your kernel, it reports free memory,
but definitely not on a server running 24/7 for weeks.

For all people who don't want to understand that, here is the easy
explanation page:
http://www.linuxatemyram.com/

   all my physical memory says it's 100 percent used (windows). On deep
   investigation I found that mmap is not releasing os file handles. Do
   you find this behaviour?

One comment: The file handles are not freed as long as the index is open.
Used file handles have nothing to do with memory mapping; the two are
completely unrelated.

Uwe

 On Sun, Jul 22, 2012 at 3:38 AM, Lance Norskog goks...@gmail.com wrote:
 
  Interesting. Which version of Solr is this? What happens if you do a
  commit?
 
  On Sat, Jul 21, 2012 at 8:01 AM, geetha anjali
  anjaliprabh...@gmail.com
  wrote:
   Hi uwe,
   Great to know. We have files indexing 1/min. After 30 mins I see
   all my physical memory say its 100 percentage used(windows). On deep
   investigation found that mmap is not releasing os files handles. Do
   you find this behaviour?
  
   Thanks
  
   On 20 Jul 2012 14:04, Uwe Schindler u...@thetaphi.de wrote:
  
   Hi Bill,
  
   MMapDirectory uses the file system cache of your operating system,
   which has the following consequences: In Linux, top & free should
   normally report only *little* free memory, because the O/S uses all
   memory not allocated by
   applications to cache disk I/O (and shows it as allocated, so having
   0%
  free
   memory is just normal on Linux and also Windows). If you have other
   applications or Lucene/Solr itself that allocate lot's of heap space
   or
   malloc() a lot, then you are reducing free physical memory, so
   reducing
  fs
   cache. This depends also on your swappiness parameter (if swappiness
   is higher, inactive processes are swapped out easier, default is 60%
   on
  linux -
   freeing more space for FS cache - the backside is of course that
   maybe in-memory structures of Lucene and other applications get pages
 out).
  
   You will only see no paging at all if all memory allocated by all
   applications + all mmapped files fits into memory. But paging in/out
   the mmapped Lucene index is much cheaper than using SimpleFSDirectory
   or NIOFSDirectory. If you use SimpleFS or NIO and your index is not in
   FS cache, it will also read it from physical disk again, so where is
   the difference. Paging is actually cheaper as no syscalls are involved.
  
   If you want as much as possible of your index in physical RAM, copy
   it to /dev/null regularly and buy more RAM :-)
  
  
   -
   Uwe Schindler
   H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
   eMail: uwe@thetaphi...
  
   From: Bill Bell [mailto:billnb...@gmail.com]
   Sent: Friday, July 20, 2012 5:17 AM
   Subject: Re: ...
   stop using it? The least used memory will be removed from the OS
   automatically? I see some paging. Wouldn't paging slow down the
   querying?
  
  
   My index is 10gb and every 8 hours we get most of it in shared memory.
   The memory is 99 percent used, and that does not leave any room for
   other apps.
  
   Other implications?
  
   Sent from my mobile device
   720-256-8076
  
   On Jul 19, 2012, at 9:49 A...
  Heap space or free system RAM:
  
   
   
  http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.htm
l
   
Uwe
   ...
 use it since you might run out of memory on large indexes right?
  
   
Here is how I got iSimpleFSDirectoryFactory to work. Just set -
Dsolr.directoryFactor...
 set it all up with a helper in solrconfig.xml...
  
   
if (Constants.WINDOWS) {
 if (MMapDirectory.UNMAP_SUPPORTED && Constants.JRE_IS_64...
 
 
 
  --
  Lance Norskog
  goks...@gmail.com
 



RE: How to setup SimpleFSDirectoryFactory

2012-07-22 Thread Uwe Schindler
It is hopeless to talk to both of you; you don't understand virtual memory:

 I get a similar situation using Windows 2008 and Solr 3.6. Memory using
 mmap is never released. Even if I turn off traffic and commit and do a
 manual gc. If the size of the index is 3gb then memory used will be heap +
 3gb of shared used. If I use a 6gb index I get heap + 6gb.

That is expected, but we are not talking about allocated physical memory; we
are talking about allocated ADDRESS SPACE, and you have 2^47 of that on 64bit
platforms. There is no physical memory wasted or allocated - please read the
blog post a third, fourth, fifth... or tenth time, until it is obvious. You
should also go back to school and take a course on system programming and
operating system kernels. Every CS student gets that taught in his first
year (at least in Germany).

Java's GC has nothing to do with that - as long as the index is open,
ADDRESS SPACE is assigned. We are talking about neither memory nor Java heap
space.

 If I turn off MMapDirectoryFactory it goes back down. When is the MMap
 supposed to release memory? It only does it on JVM restart now.

Can you please stop spreading nonsense about MMapDirectory with no knowledge
behind it? http://www.linuxatemyram.com/ - also applies to Windows.

Uwe

 Bill Bell
 Sent from mobile
 
 
  On Jul 22, 2012, at 6:21 AM, geetha anjali anjaliprabh...@gmail.com
  wrote:
   It happens in 3.6, for this reason I thought of moving to solandra.
   If I do a commit, all the documents are persisted without any issues.
   There are no issues in terms of any functionality, but only this
   happens: increase in physical RAM, goes higher and higher and stops
   at maximum and it never comes down.
 
  Thanks
 
  On Sun, Jul 22, 2012 at 3:38 AM, Lance Norskog goks...@gmail.com
 wrote:
 
  Interesting. Which version of Solr is this? What happens if you do a
  commit?
 
  On Sat, Jul 21, 2012 at 8:01 AM, geetha anjali
 anjaliprabh...@gmail.com= wrote:
  Hi uwe,
  Great to know. We have files indexing 1/min. After 30 mins I see
  all my physical memory say its 100 percentage used (windows). On
  deep investigation I found that mmap is not releasing os file handles.
  Do you find this behaviour?
 
  Thanks
 
  On 20 Jul 2012 14:04, Uwe Schindler u...@thetaphi.de wrote:
 
  Hi Bill,
 
  MMapDirectory uses the file system cache of your operating system,
  which has the following consequences: In Linux, top & free should
  normally report only *little* free memory, because the O/S uses all
  memory not allocated by applications to cache disk I/O (and shows it
  as allocated, so having 0%
  free
  memory is just normal on Linux and also Windows). If you have other
  applications or Lucene/Solr itself that allocate lot's of heap space
  or
  malloc() a lot, then you are reducing free physical memory, so
  reducing
  fs
  cache. This depends also on your swappiness parameter (if swappiness
  is higher, inactive processes are swapped out easier, default is 60%
  on
  linux -
  freeing more space for FS cache - the backside is of course that
  maybe in-memory structures of Lucene and other applications get pages
 out).
 
  You will only see no paging at all if all memory allocated by all
  applications + all mmapped files fits into memory. But paging in/out
  the mmapped Lucene index is much cheaper than using SimpleFSDirectory
  or NIOFSDirectory. If you use SimpleFS or NIO and your index is not in
  FS cache, it will also read it from physical disk again, so where is
  the difference. Paging is actually cheaper as no syscalls are involved.
 
  If you want as much as possible of your index in physical RAM, copy
  it to /dev/null regularly and buy more RAM :-)
 
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
  eMail: uwe@thetaphi...
 
  From: Bill Bell [mailto:billnb...@gmail.com]
  Sent: Friday, July 20, 2012 5:17 AM
  Subject: Re: ...
  stop using it? The least used memory will be removed from the OS
  automatically? I see some paging. Wouldn't paging slow down the
  querying?
 
 
  My index is 10gb and every 8 hours we get most of it in shared memory.
  The memory is 99 percent used, and that does not leave any room for
  other apps.
 
  Other implications?
 
  Sent from my mobile device
  720-256-8076
 
  On Jul 19, 2012, at 9:49 A...
  Heap space or free system RAM:
 
 
 
  http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.ht
  m
  l
 
  Uwe
  ...
  use it since you might run out of memory on large indexes right?
 
 
  Here is how I got iSimpleFSDirectoryFactory to work. Just set -
  Dsolr.directoryFactor...
  set it all up with a helper in solrconfig.xml...
 
 
  if (Constants.WINDOWS) {
  if (MMapDirectory.UNMAP_SUPPORTED && Constants.JRE_IS_64...
 
 
 
  --
  Lance Norskog
  goks...@gmail.com
 




RE: How to setup SimpleFSDirectoryFactory

2012-07-20 Thread Uwe Schindler
Hi Bill,

MMapDirectory uses the file system cache of your operating system, which has
the following consequences: On Linux, top & free should normally report only
*little* free memory, because the O/S uses all memory not allocated by
applications to cache disk I/O (and shows it as allocated, so having 0% free
memory is just normal on Linux and also Windows). If you have other
applications or Lucene/Solr itself that allocate lots of heap space or
malloc() a lot, then you are reducing free physical memory, and with it the
fs cache. This also depends on your swappiness parameter (if swappiness is
higher, inactive processes are swapped out more easily; the default is 60% on
Linux - freeing more space for the FS cache - the downside is of course that
the in-memory structures of Lucene and other applications may get paged out).

You will only see no paging at all if the memory allocated by all
applications plus all mmapped files fits into physical memory. But paging
in/out the mmapped Lucene index is much cheaper than using SimpleFSDirectory
or NIOFSDirectory: if you use SimpleFS or NIO and your index is not in the FS
cache, it will also be read from physical disk again, so where is the
difference? Paging is actually cheaper, as no syscalls are involved.

If you want as much as possible of your index in physical RAM, copy it to
/dev/null regularly and buy more RAM :-)
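
To make the directory choice explicit instead of relying on
FSDirectory.open()'s platform default, a Lucene 3.x-era selection can look
like the sketch below. The class name and the fallback choice are
illustrative; MMapDirectory, NIOFSDirectory and Constants are real Lucene
3.x APIs.

import java.io.File;
import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NIOFSDirectory;
import org.apache.lucene.util.Constants;

// Sketch (Lucene 3.x API): prefer mmap where virtual address space is
// plentiful; mmap consumes address space, not Java heap or physical RAM.
public final class DirectoryChooser {
    public static Directory open(File indexPath) throws IOException {
        if (Constants.JRE_IS_64BIT && MMapDirectory.UNMAP_SUPPORTED) {
            return new MMapDirectory(indexPath);
        }
        // 32-bit JVMs: positional reads avoid exhausting the address space.
        return new NIOFSDirectory(indexPath);
    }
}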

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Bill Bell [mailto:billnb...@gmail.com]
 Sent: Friday, July 20, 2012 5:17 AM
 Subject: Re: How to setup SimpleFSDirectoryFactory
 
 Thanks. Are you saying that if we run low on memory, the MMapDirectory will
 stop using it? The least used memory will be removed from the OS
 automatically? I see some paging. Wouldn't paging slow down the querying?
 
 My index is 10gb and every 8 hours we get most of it in shared memory. The
 memory is 99 percent used, and that does not leave any room for other apps.
 Other implications?
 
 Sent from my mobile device
 720-256-8076
 
 On Jul 19, 2012, at 9:49 AM, Uwe Schindler u...@thetaphi.de wrote:
 
  Read this, then you will see that MMapDirectory will use 0% of your Java
 Heap space or free system RAM:
 
  http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
 
  Uwe
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: William Bell [mailto:billnb...@gmail.com]
  Sent: Tuesday, July 17, 2012 6:05 AM
  Subject: How to setup SimpleFSDirectoryFactory
 
  We all know that MMapDirectory is fastest. However, we cannot always
  use it, since you might run out of memory on large indexes, right?
 
  Here is how I got SimpleFSDirectoryFactory to work. Just set
  -Dsolr.directoryFactory=solr.SimpleFSDirectoryFactory.
 
  Your solrconfig.xml:
 
  <directoryFactory name="DirectoryFactory"
  class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
 
  You can check it with http://localhost:8983/solr/admin/stats.jsp
 
  Notice that the default for 64-bit Windows is MMapDirectory, else
  NIOFSDirectory (except on 32-bit Windows, which falls back to
  SimpleFSDirectory, as the snippet below shows). It would be nicer if we
  just set it all up with a helper in solrconfig.xml...
 
  if (Constants.WINDOWS) {
    if (MMapDirectory.UNMAP_SUPPORTED && Constants.JRE_IS_64BIT)
      return new MMapDirectory(path, lockFactory);
    else
      return new SimpleFSDirectory(path, lockFactory);
  } else {
    return new NIOFSDirectory(path, lockFactory);
  }
 
 
 
  --
  Bill Bell
  billnb...@gmail.com
  cell 720-256-8076
 
 




RE: How to setup SimpleFSDirectoryFactory

2012-07-19 Thread Uwe Schindler
Read this, then you will see that MMapDirectory will use 0% of your Java Heap 
space or free system RAM:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: William Bell [mailto:billnb...@gmail.com]
 Sent: Tuesday, July 17, 2012 6:05 AM
 Subject: How to setup SimpleFSDirectoryFactory
 
 We all know that MMapDirectory is fastest. However, we cannot always use it,
 since you might run out of memory on large indexes, right?
 
 Here is how I got SimpleFSDirectoryFactory to work. Just set
 -Dsolr.directoryFactory=solr.SimpleFSDirectoryFactory.
 
 Your solrconfig.xml:
 
 <directoryFactory name="DirectoryFactory"
 class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
 
 You can check it with http://localhost:8983/solr/admin/stats.jsp
 
 Notice that the default for 64-bit Windows is MMapDirectory, else
 NIOFSDirectory (except on 32-bit Windows, which falls back to
 SimpleFSDirectory, as the snippet below shows). It would be nicer if we just
 set it all up with a helper in solrconfig.xml...
 
 if (Constants.WINDOWS) {
   if (MMapDirectory.UNMAP_SUPPORTED && Constants.JRE_IS_64BIT)
     return new MMapDirectory(path, lockFactory);
   else
     return new SimpleFSDirectory(path, lockFactory);
 } else {
   return new NIOFSDirectory(path, lockFactory);
 }
 
 
 
 --
 Bill Bell
 billnb...@gmail.com
 cell 720-256-8076




Java 7u1 fixes index corruption and crash bugs in Apache Lucene Core and Apache Solr

2011-10-26 Thread Uwe Schindler
Hi users of Apache Lucene Core and Apache Solr,

Oracle released Java 7u1 [1] on October 19. According to the release notes,
and tests done by the Lucene committers, all bugs reported on July 28 are
fixed in this release, so code using the Porter stemmer no longer crashes
with SIGSEGV. We were no longer able to reproduce any index corruption, so it
is safe to use Java 7u1 with Lucene Core and Solr.

On the same day, Oracle released Java 6u29 [2], fixing the same problems with
Java 6 when the JVM switches -XX:+AggressiveOpts or
-XX:+OptimizeStringConcat were used. Of course, you should not use
experimental JVM options like -XX:+AggressiveOpts in production
environments! We recommend that everybody upgrade to this latest version, 6u29.
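
As an illustrative safeguard (not part of this announcement; the class name
is made up), one can inspect the JVM's startup flags through the standard
java.lang.management API and warn when the experimental switches are present:

import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmFlagCheck {
  public static void main(String[] args) {
    // getInputArguments() returns the flags the JVM was started with.
    List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
    for (String arg : jvmArgs) {
      if (arg.equals("-XX:+AggressiveOpts") || arg.equals("-XX:+OptimizeStringConcat")) {
        System.err.println("WARNING: experimental JVM option " + arg
            + " is enabled; upgrade to 6u29 or remove the switch.");
      }
    }
  }
}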

In case you upgrade to Java 7, remember that you may have to reindex, as the
Unicode version shipped with Java 7 has changed and tokenization behaves
differently (e.g. lowercasing). For more information, read
JRE_VERSION_MIGRATION.txt in your distribution package!

On behalf of the Apache Lucene/Solr committers,
Uwe Schindler

[1] http://www.oracle.com/technetwork/java/javase/7u1-relnotes-507962.html
[2] http://www.oracle.com/technetwork/java/javase/6u29-relnotes-507960.html

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/




[WARNING] Index corruption and crashes in Apache Lucene Core / Apache Solr with Java 7

2011-07-28 Thread Uwe Schindler
Hello Apache Lucene & Apache Solr users,
Hello users of other Java-based Apache projects,

Oracle released Java 7 today. Unfortunately it contains HotSpot compiler
optimizations that miscompile some loops. This can affect the code of several
Apache projects. Sometimes JVMs only crash, but in several cases the
calculated results can be incorrect, leading to bugs in applications (see
HotSpot bugs 7070134 [1], 7044738 [2], 7068051 [3]).

Apache Lucene Core and Apache Solr are two Apache projects affected by these
bugs - namely, all versions released until today. Solr users with the default
configuration will see Java crash with SIGSEGV as soon as they start to index
documents, as one affected part is the well-known Porter stemmer (see
LUCENE-3335 [4]). Other loops in Lucene may be miscompiled too, leading to
index corruption (especially on Lucene trunk with the pulsing codec -
LUCENE-3346 [5]).

These problems were detected only 5 days before the official Java 7 release,
so Oracle had no time to fix those bugs, which also affect many other
applications. In response to our questions, they proposed to include the
fixes in service release u2 (possibly already in service release u1, see [6]).
This means you cannot use Apache Lucene/Solr with Java 7 releases before
Update 2! If you do, please don't open bug reports; it is not the committers'
fault! At the very least, disable loop optimizations using the
-XX:-UseLoopPredicate JVM option so as not to risk index corruption.
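
A hedged sketch of a startup guard one could add on top of that advice (the
class name is made up; version-string parsing is simplified and may not cover
every vendor format):

public class Java7Guard {
  public static void main(String[] args) {
    String v = System.getProperty("java.version"); // e.g. "1.7.0" or "1.7.0_02"
    if (v != null && v.startsWith("1.7.0")) {
      int update = 0;
      int idx = v.indexOf('_');
      if (idx >= 0) {
        // Strip trailing non-digits such as "-ea" before parsing.
        String digits = v.substring(idx + 1).replaceAll("[^0-9].*$", "");
        if (digits.length() > 0) update = Integer.parseInt(digits);
      }
      if (update < 2) {
        System.err.println("WARNING: Java 7 before Update 2 is known to "
            + "miscompile loops; run with -XX:-UseLoopPredicate or stay on Java 6.");
      }
    }
  }
}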

Please note: Java 6 users are also affected if they use one of these JVM
options, which are not enabled by default: -XX:+OptimizeStringConcat or
-XX:+AggressiveOpts

It is strongly recommended not to use any HotSpot optimization switches in
any Java version without extensive testing!

In case you upgrade to Java 7, remember that you may have to reindex, as the
Unicode version shipped with Java 7 has changed and tokenization behaves
differently (e.g. lowercasing). For more information, read
JRE_VERSION_MIGRATION.txt in your distribution package!

On behalf of the Lucene project,
Uwe

[1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134
[2] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7044738
[3] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7068051
[4] https://issues.apache.org/jira/browse/LUCENE-3335
[5] https://issues.apache.org/jira/browse/LUCENE-3346
[6] http://s.apache.org/StQ

-
Uwe Schindler
uschind...@apache.org 
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/




RE: Solr 3.1 / Java 1.5: Exception regarding analyzer implementation

2011-05-10 Thread Uwe Schindler
Hi,

 On 09.05.11 11:04, Martin Jansen wrote:
  I just attempted to set up an instance of Solr 3.1 in Tomcat 5.5
  running in Java 1.5.  It fails with the following exception on start-up:
 
  java.lang.AssertionError: Analyzer implementation classes or at least
  their tokenStream() and reusableTokenStream() implementations must be
  final at
  org.apache.lucene.analysis.Analyzer.assertFinal(Analyzer.java:57)
 
 In the meantime I solved the issue by installing Java 1.6. Works without a
 problem now, but I'm wondering if Solr 3.1 is intentionally incompatible
 with Java 1.5 or if it happened by mistake.

Solr 3.1 is compatible with Java 1.5 and runs fine with it. The exception
you are seeing should not happen for Analyzers that ship with Solr/Lucene;
it can only happen if you wrote your own Analyzers/TokenStreams that are not
declared final as required. In that case the error will also happen with
Java 6.
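
For reference, a minimal sketch (hypothetical class name and token chain,
Lucene 3.1 API) of a custom Analyzer that satisfies the assertion; declaring
the whole class final is the simplest fix, and marking only tokenStream() and
reusableTokenStream() final works as well:

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

// The final modifier on the class is what makes the assertion pass here.
public final class MyAnalyzer extends Analyzer {
  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    // Illustrative token chain: whitespace tokenization, then lowercasing.
    return new LowerCaseFilter(Version.LUCENE_31,
        new WhitespaceTokenizer(Version.LUCENE_31, reader));
  }
}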

BUT: This is only an assertion to make development and debugging easier.
Assertions should not run in production mode, as they may (seriously!) affect
performance. You should check your java command line for -ea parameters and
remove them in production.

The reason why this assertion hits you in one of your Tomcat installations
could also be related to instrumentation tools enabled in that Tomcat. Lots
of instrumentation tools dynamically change class bytecode and may, e.g.,
remove the final modifier from classes. In that case the assertion of course
fails (with assertions enabled). Before concluding that Solr 3.1 is not
compatible with Java 1.5:

- Disable assertions in production (by removing -ea command line parameters,
see http://download.oracle.com/javase/1.4.2/docs/guide/lang/assert.html)
- Check your configuration to see if you have instrumentation enabled.

Both of the above points may simply not apply on the other server that runs
fine with Java 6.

Uwe