Hi everyone,
We have a large product catalogue (currently 9 million products, soon to grow to
around 25 million), each product having a Unicode title. We offer the
facility to sort by title, but often within quite large result sets, e.g. 1
million fiction books (we are correctly using
I'm new to Solr and just got things working.
I can query my index and retrieve JSON results via HTTP GET using the wt=json
and q=num_cpu parameters:
e.g.:
http://127.0.0.1:8080/solr/select?indent=on&version=2.2&q=num_cpu%3A16&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=json&explainOther=&debugQuery=on
When the
Dear list,
just in case you are planning to integrate or combine a thesaurus with Solr
the following report might help you.
BASE - Solr and the multilingual EuroVoc Thesaurus
http://www.ub.uni-bielefeld.de/~befehl/base/solr/eurovoc.html
In brief:
It explains how a working solution is possible
It may be. The tricky bit is that there is a constant governing the behavior of
this that restricts it to 3.6 and above. You'll have to change it after applying
the patch for this to work for you. Should be trivial; I'll leave a note in the
code about this. Look for SOLR-2438 in the 3x code line.
On 11/21/2011 12:41 AM, Husain, Yavar wrote:
Number of rows in SQL Table (Indexed till now using Solr): 1 million
Total Size of Data in the table: 4GB
Total Index Size: 3.5 GB
Total Number of Rows that I have to index: 20 Million (approximately 100 GB
Data) and growing
What is the best
Erick,
Need your help on this. Waiting for resolution. Please help ...
--
View this message in context:
http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-tp856965p3524881.html
Sent from the Solr - User mailing list archive at Nabble.com.
Sorry, but I don't really have that info.
Erick
On Mon, Nov 21, 2011 at 9:37 AM, kumar8anuj kumar.an...@gmail.com wrote:
Erick,
Need your help on this. Waiting for resolution. Please help ...
So where can I get some information on this issue? Can you please help?
On Mon, Nov 21, 2011 at 8:17 PM, Erick Erickson [via Lucene]
ml-node+s472066n3524905...@n3.nabble.com wrote:
Sorry, but I don't really have that info.
Erick
On Mon, Nov 21, 2011 at 9:37 AM, kumar8anuj [hidden
On Mon, Nov 21, 2011 at 8:45 PM, kumar8anuj kumar.an...@gmail.com wrote:
So where can I get some information on this issue? Can you please help?
Have you tried simple things like searching Google, using the Tika
site, and, failing these, asking on a Tika-specific mailing list? No
offence, but
Hi Andrew,
When you request a sort on a field, Lucene stores every unique value in
a field cache, which stays in RAM. If you have a large index and you're
sorting on a Unicode string field, this can be very memory intensive.
The way that I've solved this in the past is to make a field
We're trying to limit disk space when we optimize since we often hit out
of disk space errors. We plan to add more disks but in the meantime I am
pursuing a software solution... in the past we have done multiple passes
by looking at the number of segments and then optimizing down like 16,
8, 4,
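The stepped-down optimize described above can be sketched in SolrJ. This is an illustrative sketch, not code from this thread: the `schedule` helper and the `server` variable are assumptions, and the relevant SolrJ call is `optimize(waitFlush, waitSearcher, maxSegments)`.

```java
// Sketch of the multi-pass optimize described above: step the segment
// target down (16, 8, 4, ...) so each pass needs less free disk space
// than one single big optimize would.
public class OptimizeSchedule {
    // Halving schedule of maxSegments targets, ending at 1.
    static int[] schedule(int start) {
        int n = 0;
        for (int s = start; s >= 1; s /= 2) n++;
        int[] out = new int[n];
        int i = 0;
        for (int s = start; s >= 1; s /= 2) out[i++] = s;
        return out;
    }

    public static void main(String[] args) {
        for (int segs : schedule(16)) {
            // server.optimize(true, true, segs); // one pass per target count
            System.out.println("optimize down to " + segs + " segments");
        }
    }
}
```

Each pass rewrites less data than a single optimize to one segment, which is why it can stay under a tight disk budget.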
Thanks !
My business requirements have changed a bit.
We need one year rolling data in Production.
The index size for the same comes to approximately 200 - 220 GB.
I am planning to address this using Solr distributed search as follows.
1. Whole index to be split up between 3 shards, with 3
Thanks Otis !
Please ignore my earlier email which does not have all the information.
My business requirements have changed a bit.
We now need one year rolling data in Production, with the following details
- Number of records - 1.2 million
- Solr index size for these records comes to
Thank you for your reply.
One clarification, is the maxdocs the max docs in the set, or the matched docs
from the set?
If there are 1000 docs and 19 of them match, is the maxdocs 1000, or 19?
--
Andrew
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent:
Each fq will create a bitmap that is bounded by (maxdocs / 8) bytes.
You can think of the entries in the filterCache as a map where the key is
the filter query you specify and the value is the aforementioned bitmap.
The number of entries specified in the config file is the number of
entries
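To put numbers on the rule of thumb above, here is a small sketch; the 9M-document index size and the 512-entry cache setting are made-up illustrations, not figures from this message.

```java
// Worst-case filterCache memory under the (maxDocs / 8) bytes-per-entry
// rule of thumb above: one bit per document in the index, per cached fq.
public class FilterCacheMath {
    static long bitmapBytes(long maxDocs) {
        return maxDocs / 8;
    }

    public static void main(String[] args) {
        long maxDocs = 9_000_000L; // hypothetical index size
        int entries = 512;         // hypothetical filterCache size= setting
        long perEntry = bitmapBytes(maxDocs); // 1,125,000 bytes, roughly 1.1 MB
        long worstCase = perEntry * entries;  // if the cache fills completely
        System.out.println(perEntry + " bytes per entry, " + worstCase + " bytes worst case");
    }
}
```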
Ignore, I misread :)
Each fq will create a bitmap that is bounded by (maxdocs / 8) bytes.
You can think of the entries in the filterCache as a map where the key is
the filter query you specify and the value is the aforementioned bitmap.
The number of entries specified in the config file
: We're using DIH to import flat xml files. We're getting Heap memory
: exceptions due to the file size. Is there any way to force DIH to do a
: streaming parse rather than a DOM parse? I really don't want to chunk my
: files up or increase the heap size.
The XPathEntityProcessor is using a
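For what it's worth, the DIH documentation describes a stream attribute on XPathEntityProcessor for exactly this case. A hedged data-config sketch follows; the entity name, url, forEach path and field columns are invented examples, not the poster's config:

```xml
<!-- Hedged sketch: XPathEntityProcessor with stream="true" so the XML
     is parsed as a stream instead of being loaded into a DOM. -->
<entity name="records"
        processor="XPathEntityProcessor"
        stream="true"
        url="big-file.xml"
        forEach="/catalog/product">
  <field column="id" xpath="/catalog/product/id"/>
  <field column="title" xpath="/catalog/product/title"/>
</entity>
```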
: One clarification, is the maxdocs the max docs in the set, or the matched
docs from the set?
:
: If there are 1000 docs and 19 of them match, is the maxdocs 1000, or 19?
Erick meant the maxDocs of the index -- but that's really just a rule of
thumb approximation that applies when many docs
: The way that I've solved this in the past is to make a field
: specifically for sorting and then truncate the string to a small number
: of characters and sort on that. You have to accept that in some cases
Something to consider is the ICUCollationKeyFilterFactory. As noted on
the wiki...
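From memory, the wiki's suggested schema setup looks roughly like the following. Treat it as a sketch to verify against the wiki; the field type name, locale and strength values here are illustrative:

```xml
<!-- Sketch of a collated sort field: the collation key replaces the raw
     Unicode string, giving locale-correct ordering with a compact cache. -->
<fieldType name="sort_title" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ICUCollationKeyFilterFactory"
            locale="en"
            strength="primary"/>
  </analyzer>
</fieldType>
```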
Hello:
Solr version: 3.4.0
I'm trying to figure out if it's possible to both return (retrieve) and
facet on certain values of a multivalued field. The scenario is a life science
app comprised of a graph of nodes (genes, chemicals etc.) and each node has a
neighborhood consisting of
Hi,
I'm trying to implement error handling in a PHP client (through the PHP
SOLR Plugin), I'm doing so by making a missing field mandatory temporarily.
When the update is sent through without the field made mandatory I get a
response back with a status code of 0 which is great. In the situation
I'm running Solr 1.4.1 with Jetty. When I make requests against solr that
have a large response (~1mb of data) I'm getting super slow transfer times
back to the client, I'm hoping you guys can help shed some light on this
issue for me.
Some more information about my setup:
- The qTime header in
Hello,
Proximity queries only work using a sloppy phrase query (e.g.
"catalyst polymer"~5) but do not allow wildcards.
I want to use proximity queries between any terms (e.g. (poly* NEAR *lyst)).
Is this possible using additional query parsers like Surround?
If yes, please suggest how.
Hi All,
I'm trying to do a 'nearly real time update' to Solr. My Solr version is 1.4.1.
I read the Solr CommitWithin wiki
http://wiki.apache.org/solr/CommitWithin and a related thread
http://lucene.472066.n3.nabble.com/Solr-real-time-update-taking-time-td3472709.html, mostly
on the difficulty to do
On 11/21/2011 8:45 PM, Stephen Powis wrote:
I'm running Solr 1.4.1 with Jetty. When I make requests against solr that
have a large response (~1mb of data) I'm getting super slow transfer times
back to the client, I'm hoping you guys can help shed some light on this
issue for me.
Some more
Hi,
I've been trying to match some phrases with + and & (like c++,
google+, r&d etc.),
but the tokenizer gets rid of them before I can do anything with synonym filters.
So I tried using CharFilters like this:
<fieldType name="text" class="solr.TextField"
positionIncrementGap="100"
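One approach people use for this (a sketch, untested; the mapping file name and replacement tokens are invented) is a MappingCharFilterFactory that rewrites the symbols before the tokenizer can strip them:

```xml
<!-- Hedged sketch: map "+" and "&" to placeholder tokens ahead of
     tokenization; file name and mappings are invented examples. -->
<fieldType name="text_sym" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-symbols.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Here mapping-symbols.txt would contain lines such as "+" => "_plus_" and "&" => "_and_", so that c++ survives analysis as a distinct token.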
Thanks for the reply Shawn.
The solr server currently has 8gb of ram and the total size of the dataDir
is around 30gb. I start solr and give the java heap up to 4gb of ram, so
that leaves 4gb for the OS, there are no other running services on the
box. So from what you are saying, we are way
Thanks for the reply Gora, I tried Googling but didn't find anything on
this. I didn't try this on the Tika mailing list; I will post this to the Tika
mailing list now. Thanks for the suggestion.
On Mon, Nov 21, 2011 at 9:10 PM, Gora Mohanty-3 [via Lucene]
ml-node+s472066n3525046...@n3.nabble.com
On Tue, Nov 22, 2011 at 12:19 AM, Stephen Powis
stephen.po...@pardot.com wrote:
Just trying to get a better understanding of this. Wouldn't the indexes
not being in the disk cache make the queries themselves slow as well (high
qTime), not just fetching the results?
What happens in
On 11/21/2011 10:19 PM, Stephen Powis wrote:
Thanks for the reply Shawn.
The solr server currently has 8gb of ram and the total size of the dataDir
is around 30gb. I start solr and give the java heap up to 4gb of ram, so
that leaves 4gb for the OS, there are no other running services on the
Hi All,
After some study, I used the below snippet. It seems the documents are updated,
but it still takes a long time. Feels like the parameter does not take
effect. Any comments?
UpdateRequest req = new UpdateRequest();
req.add(solrDocs); // the batch of SolrInputDocuments to index
req.setCommitWithin(5000); // ask Solr to commit within 5000 ms
req.process(server); // send the request (assuming a SolrServer named server)
When you ask for a large response (~1mb of data), you are asking for Solr to
do tons of disk accesses and sorting before it sends the first response. That
is going to be slow.
I strongly recommend requesting smaller results.
One of those requests may be using most of the caching resources in
Hello,
I want to run a surround query.
1. Downloaded it from
http://www.java2s.com/Code/Jar/JKL/Downloadlucenesurround241jar.htm
2. Moved lucene-surround-2.4.1.jar to /apache-solr-3.1.0/example/lib
3. Edited solrconfig.xml with
queryParser name="SurroundQParser" class=
Have you made that class? I want to integrate the surround plugin with Solr.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Question-About-Writing-Custom-Query-Parser-Plugin-tp2360751p3527092.html