Efficient title sorting on large result sets.

2011-11-21 Thread Andrew Ingram
Hi everyone, We have a large product catalogue (currently 9 million products, but soon to grow to around 25 million), with each product having a Unicode title. We're offering the facility to sort by title, but often within quite large result sets, e.g. 1 million fiction books (we are correctly using

handling bad query input

2011-11-21 Thread Alan Miller
I'm new to Solr and just got things working. I can query my index and retrieve JSON results via HTTP GET using the wt=json and q=num_cpu parameters, e.g.: http://127.0.0.1:8080/solr/select?indent=on&version=2.2&q=num_cpu%3A16&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=json&explainOther=&debugQuery=on When the

Report about Solr and multilingual Thesaurus

2011-11-21 Thread Bernd Fehling
Dear list, just in case you are planning to integrate or combine a thesaurus with Solr, the following report might help you. BASE - Solr and the multilingual EuroVoc Thesaurus http://www.ub.uni-bielefeld.de/~befehl/base/solr/eurovoc.html In brief: it explains how a working solution is possible

Re: wild card search and lower-casing

2011-11-21 Thread Erick Erickson
It may be. The tricky bit is that there is a constant governing the behavior of this that restricts it to 3.6 and above. You'll have to change it after applying the patch for this to work for you. Should be trivial; I'll leave a note in the code about this. Look for SOLR-2438 in the 3.x code line

Re: Solr Performance/Architecture

2011-11-21 Thread Shawn Heisey
On 11/21/2011 12:41 AM, Husain, Yavar wrote: Number of rows in the SQL table (indexed till now using Solr): 1 million; total size of data in the table: 4 GB; total index size: 3.5 GB; total number of rows that I have to index: 20 million (approximately 100 GB of data) and growing. What is the best

Re: TikaEntityProcessor not working?

2011-11-21 Thread kumar8anuj
Erick, Need your help on this. Waiting for resolution. Please help ... -- View this message in context: http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-tp856965p3524881.html

Re: TikaEntityProcessor not working?

2011-11-21 Thread Erick Erickson
Sorry, but I don't really have that info. Erick On Mon, Nov 21, 2011 at 9:37 AM, kumar8anuj kumar.an...@gmail.com wrote: Erick, Need your help on this. Waiting for resolution. Please help ... -- View this message in context:

Re: TikaEntityProcessor not working?

2011-11-21 Thread kumar8anuj
So where can I get some information on this issue? Can you please help? On Mon, Nov 21, 2011 at 8:17 PM, Erick Erickson [via Lucene] ml-node+s472066n3524905...@n3.nabble.com wrote: Sorry, but I don't really have that info. Erick On Mon, Nov 21, 2011 at 9:37 AM, kumar8anuj [hidden

Re: TikaEntityProcessor not working?

2011-11-21 Thread Gora Mohanty
On Mon, Nov 21, 2011 at 8:45 PM, kumar8anuj kumar.an...@gmail.com wrote: So where can I get some information on this issue? Can you please help? Have you tried simple things like searching Google, using the Tika site, and, failing these, asking on a Tika-specific mailing list? No offence, but

RE: Efficient title sorting on large result sets.

2011-11-21 Thread Young, Cody
Hi Andrew, When you request a sort on a field, Lucene stores every unique value in a field cache, which stays in RAM. If you have a large index and you're sorting on a Unicode string field, this can be very memory intensive. The way that I've solved this in the past is to make a field
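
A minimal schema.xml sketch of that approach, assuming a dedicated title_sort field and a 30-character cutoff (the field name, type name, and cutoff are illustrative, not the poster's actual setup): KeywordTokenizerFactory keeps the whole title as a single token and PatternReplaceFilterFactory truncates it before the value reaches the field cache.

    <fieldType name="string_sort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- keep only the first 30 characters so the sort cache holds short strings -->
        <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{0,30}).*$" replacement="$1" replace="first"/>
      </analyzer>
    </fieldType>

    <field name="title_sort" type="string_sort" indexed="true" stored="false"/>
    <copyField source="title" dest="title_sort"/>

Queries would then sort with sort=title_sort asc instead of sorting on the full title field.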

help with optimizing

2011-11-21 Thread Michael Long
We're trying to limit disk space when we optimize, since we often hit out-of-disk-space errors. We plan to add more disks, but in the meantime I am pursuing a software solution... in the past we have done multiple passes by looking at the number of segments and then optimizing down like 16, 8, 4,
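
For reference, a sketch of how such stepwise passes are typically issued as XML update messages against /update (the maxSegments steps are just the ones mentioned above; whether this actually bounds temporary disk usage depends on segment sizes):

    <!-- sent one at a time; each pass merges the index down to the given segment count -->
    <optimize maxSegments="16"/>
    <optimize maxSegments="8"/>
    <optimize maxSegments="4"/>
    <optimize maxSegments="1"/>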

Re: Architecture and Capacity planning for large Solr index

2011-11-21 Thread Rahul Warawdekar
Thanks! My business requirements have changed a bit. We need one year of rolling data in production. The index size for the same comes to approximately 200 - 220 GB. I am planning to address this using Solr distributed search as follows. 1. Whole index to be split across 3 shards, with 3

Re: Architecture and Capacity planning for large Solr index

2011-11-21 Thread Rahul Warawdekar
Thanks Otis! Please ignore my earlier email, which does not have all the information. My business requirements have changed a bit. We now need one year of rolling data in production, with the following details - number of records - 1.2 million - Solr index size for these records comes to
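
A sketch of the usual Solr 3.x wiring for a three-shard plan like this: a search handler on the aggregating node whose shards default fans each query out to the shards (host names and handler name are placeholders, not the actual deployment):

    <requestHandler name="/distrib" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="shards">shard1:8983/solr,shard2:8983/solr,shard3:8983/solr</str>
      </lst>
    </requestHandler>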

RE: Solr filterCache size settings...

2011-11-21 Thread Andrew Lundgren
Thank you for your reply. One clarification: is the maxdocs the max docs in the set, or the matched docs from the set? If there are 1000 docs and 19 of them match, is the maxdocs 1000, or 19? -- Andrew -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent:

Re: Solr filterCache size settings...

2011-11-21 Thread Markus Jelsma
Each fq will create a bitmap that is bounded by (maxDocs / 8) bytes. You can think of the entries in the filterCache as a map where the key is the filter query you specify and the value is the aforementioned bitmap. The number of entries specified in the config file is the number of entries
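
A rough worked example of that bound, together with the solrconfig.xml element it applies to (the 25 million document count is borrowed from the title-sorting thread above, and the cache sizes are illustrative rather than recommendations):

    <!-- each entry is roughly maxDocs / 8 bytes: 25,000,000 / 8 = ~3 MB, so 512 entries = ~1.6 GB of heap -->
    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>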

Re: Solr filterCache size settings...

2011-11-21 Thread Markus Jelsma
Ignore, I misread :) Each fq will create a bitmap that is bounded by (maxDocs / 8) bytes. You can think of the entries in the filterCache as a map where the key is the filter query you specify and the value is the aforementioned bitmap. The number of entries specified in the config file

Re: DataImportHandler Streaming XML Parse

2011-11-21 Thread Chris Hostetter
: We're using DIH to import flat XML files. We're getting heap memory : exceptions due to the file size. Is there any way to force DIH to do a : streaming parse rather than a DOM parse? I really don't want to chunk my : files up or increase the heap size. The XPathEntityProcessor is using a
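
For context, a minimal data-config.xml sketch of XPathEntityProcessor reading one flat XML file (the file path, forEach expression, and field names are assumptions; the stream flag shown here is the attribute the DIH wiki suggests for very large files and is worth verifying for your version):

    <dataConfig>
      <dataSource type="FileDataSource" encoding="UTF-8"/>
      <document>
        <entity name="records"
                processor="XPathEntityProcessor"
                url="/data/feed.xml"
                forEach="/records/record"
                stream="true">
          <field column="id"    xpath="/records/record/id"/>
          <field column="title" xpath="/records/record/title"/>
        </entity>
      </document>
    </dataConfig>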

RE: Solr filterCache size settings...

2011-11-21 Thread Chris Hostetter
: One clarification, is the maxdocs the max docs in the set, or the matched docs from the set? : : If there are 1000 docs and 19 of them match, is the maxdocs 1000, or 19? Erick meant the maxDocs of the index -- but that's really just a rule-of-thumb approximation that applies when many docs

RE: Efficient title sorting on large result sets.

2011-11-21 Thread Chris Hostetter
: The way that I've solved this in the past is to make a field : specifically for sorting and then truncate the string to a small number : of characters and sort on that. You have to accept that in some cases Something to consider is the ICUCollationKeyFilterFactory. As noted on the wiki...
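
A sketch of the collated-sort-field pattern from the wiki, assuming the analysis-extras contrib jars are on the classpath (locale and strength values are illustrative): collation keys are compact and locale-aware, which also keeps the sort field cache small.

    <fieldType name="title_collated" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.ICUCollationKeyFilterFactory" locale="en" strength="primary"/>
      </analyzer>
    </fieldType>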

Returning and faceting on some of the field's values

2011-11-21 Thread Jeff Schmidt
Hello: Solr version: 3.4.0 I'm trying to figure out if it's possible to both return (retrieve) and facet on certain values of a multivalued field. The scenario is a life science app composed of a graph of nodes (genes, chemicals etc.), and each node has a neighborhood consisting of

Error Handling of deliberately missing required field

2011-11-21 Thread Greg Pelly
Hi, I'm trying to implement error handling in a PHP client (through the PHP Solr Plugin). I'm doing so by temporarily making a missing field mandatory. When the update is sent through without the field made mandatory, I get a response back with a status code of 0, which is great. In the situation

Painfully slow transfer speed from Solr

2011-11-21 Thread Stephen Powis
I'm running Solr 1.4.1 with Jetty. When I make requests against Solr that have a large response (~1 MB of data), I'm getting super slow transfer times back to the client. I'm hoping you guys can help shed some light on this issue for me. Some more information about my setup: - The qTime header in

how to use term proxymity queries with apache solr

2011-11-21 Thread Rahul Mehta
Hello, Proximity queries only work using a sloppy phrase query (e.g. "catalyst polymer"~5) but do not allow wildcards. I want to use proximity queries between any terms (e.g. (poly* NEAR *lyst)). Is this possible using additional query parsers like Surround? If yes, please suggest how

Solr real time update

2011-11-21 Thread yu shen
Hi All, I try to do a 'nearly real time update' to Solr. My Solr version is 1.4.1. I read the Solr CommitWithin wiki http://wiki.apache.org/solr/CommitWithin, and a related thread http://lucene.472066.n3.nabble.com/Solr-real-time-update-taking-time-td3472709.html mostly on the difficulty to do

Re: Painfully slow transfer speed from Solr

2011-11-21 Thread Shawn Heisey
On 11/21/2011 8:45 PM, Stephen Powis wrote: I'm running Solr 1.4.1 with Jetty. When I make requests against Solr that have a large response (~1 MB of data), I'm getting super slow transfer times back to the client. I'm hoping you guys can help shed some light on this issue for me. Some more

Matching + and &

2011-11-21 Thread Tomasz Wegrzanowski
Hi, I've been trying to match some phrases with + and & (like c++, google+, r&d etc.), but the tokenizer gets rid of them before I can do anything with synonym filters. So I tried using CharFilters like this: <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
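
One common workaround (a sketch, not necessarily what this thread settled on) is to map the symbols to token-safe text with MappingCharFilterFactory before the tokenizer runs; mapping-chars.txt is a hypothetical file containing lines such as "+" => "plus" and "&" => "and".

    <fieldType name="text_symbols" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- runs before tokenization, so + and & survive long enough to be matched -->
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-chars.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>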

Re: Painfully slow transfer speed from Solr

2011-11-21 Thread Stephen Powis
Thanks for the reply Shawn. The Solr server currently has 8 GB of RAM and the total size of the dataDir is around 30 GB. I start Solr and give the Java heap up to 4 GB of RAM, so that leaves 4 GB for the OS; there are no other running services on the box. So from what you are saying, we are way

Re: TikaEntityProcessor not working?

2011-11-21 Thread kumar8anuj
Thanks for the reply Gora, I tried Googling but didn't find anything on this. I didn't try the Tika mailing list; I will post this to the Tika mailing list now. Thanks for the suggestion On Mon, Nov 21, 2011 at 9:10 PM, Gora Mohanty-3 [via Lucene] ml-node+s472066n3525046...@n3.nabble.com

Re: Painfully slow transfer speed from Solr

2011-11-21 Thread Yonik Seeley
On Tue, Nov 22, 2011 at 12:19 AM, Stephen Powis stephen.po...@pardot.com wrote: Just trying to get a better understanding of this. Wouldn't the indexes not being in the disk cache make the queries themselves slow as well (high qTime), not just fetching the results? What happens in

Re: Painfully slow transfer speed from Solr

2011-11-21 Thread Shawn Heisey
On 11/21/2011 10:19 PM, Stephen Powis wrote: Thanks for the reply Shawn. The Solr server currently has 8 GB of RAM and the total size of the dataDir is around 30 GB. I start Solr and give the Java heap up to 4 GB of RAM, so that leaves 4 GB for the OS; there are no other running services on the

Re: Solr real time update

2011-11-21 Thread yu shen
Hi All, After some study, I used the snippet below. It seems the documents are updated, but it still takes a long time. It feels like the parameter does not take effect. Any comments? UpdateRequest req = new UpdateRequest(); req.add(solrDocs); req.setCommitWithin(5000);
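
For comparison, a sketch of the same request expressed as a raw XML update message; in this version of Solr the commitWithin value travels as an attribute on the add element (the document fields below are made up):

    <add commitWithin="5000">
      <doc>
        <field name="id">example-1</field>
        <field name="title">example title</field>
      </doc>
    </add>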

Re: Painfully slow transfer speed from Solr

2011-11-21 Thread Walter Underwood
When you ask for a large response (~1 MB of data), you are asking Solr to do tons of disk accesses and sorting before it sends the first response. That is going to be slow. I strongly recommend requesting smaller results. One of those requests may be using most of the caching resources in
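
One way to keep responses small by default is to page with start and rows; a sketch of handler defaults (handler name and page size are arbitrary), with clients then walking pages via start=0, start=20, and so on:

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <int name="rows">20</int>
      </lst>
    </requestHandler>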

Integrating Surround Query Parser

2011-11-21 Thread Rahul Mehta
Hello, I want to run a surround query. 1. Downloaded it from http://www.java2s.com/Code/Jar/JKL/Downloadlucenesurround241jar.htm 2. Moved the lucene-surround-2.4.1.jar to /apache-solr-3.1.0/example/lib 3. Edited solrconfig.xml with: 1. <queryParser name="SurroundQParser" class=
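
A sketch of what that solrconfig.xml entry typically looks like once a wrapper QParserPlugin exists; Solr 3.1 does not ship a surround parser, so the class name below is a placeholder for custom plugin code, and the query syntax shown assumes the plugin passes the standard surround grammar through unchanged.

    <queryParser name="surround" class="com.example.SurroundQParserPlugin"/>

Queries would then use the local-params syntax, e.g. q={!surround}3w(poly*, catalyst).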

Re: Question About Writing Custom Query Parser Plugin

2011-11-21 Thread rahul23134
Have you made that class? I want to integrate the surround plugin with Solr. -- View this message in context: http://lucene.472066.n3.nabble.com/Question-About-Writing-Custom-Query-Parser-Plugin-tp2360751p3527092.html