Have you tried to reindex using DocValues? Fields used for faceting are
stored on disk instead of in RAM via the FieldCache. If you have enough
memory they will be loaded into the OS cache, but not onto the Java heap.
This is also good for GC when committing.
http://wiki.apache.org/solr/DocValues
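For reference, enabling that is just a flag on the field definition — a minimal schema.xml sketch (the field name `category` is only an example):

```xml
<!-- Faceting field backed by DocValues: values live on disk / OS cache
     instead of the Java-heap FieldCache -->
<field name="category" type="string" indexed="true" stored="false"
       docValues="true"/>
```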
This is totally deprecated, but maybe it can be helpful if you want to
re-sort some documents:
https://issues.apache.org/jira/browse/SOLR-1311
--
View this message in context:
http://lucene.472066.n3.nabble.com/Tweaking-boosts-for-more-search-results-variety-tp4088302p4089044.html
Sent from the Solr - User mailing list archive at Nabble.com.
Hey there,
I'm testing a custom similarity which loads data from an external file
located in solr_home/core_name/conf/. I load the data from the file into a Map
in the init method of the SimilarityFactory. I would like to reload that Map
every time a commit happens, or every X hours.
To do that I've
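One sketch of that reload pattern in plain, self-contained Java (the class name and the term=weight file format are my assumptions, not the actual code): keep the map in a volatile field and swap it atomically on each reload.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical reloader: parses "term=weight" lines into a map and
// republishes it; a commit listener or a scheduler can call reload().
public class BoostFileReloader {
    private volatile Map<String, Float> boosts = Collections.emptyMap();
    private final Path file;

    public BoostFileReloader(Path file) {
        this.file = file;
    }

    public Map<String, Float> current() {
        return boosts;
    }

    // Build a fresh map, then publish it with a single volatile write,
    // so concurrent readers see either the old or the new map, never a mix.
    public void reload() throws IOException {
        Map<String, Float> fresh = new HashMap<>();
        for (String line : Files.readAllLines(file)) {
            String[] kv = line.split("=", 2);
            if (kv.length == 2) {
                fresh.put(kv[0].trim(), Float.parseFloat(kv[1].trim()));
            }
        }
        boosts = fresh;
    }

    // "every X hours" variant; swallowing the IOException keeps the
    // previous map in place if the file is temporarily unreadable.
    public ScheduledFuture<?> scheduleEvery(ScheduledExecutorService ses, long hours) {
        return ses.scheduleAtFixedRate(() -> {
            try {
                reload();
            } catch (IOException ignored) {
            }
        }, hours, hours, TimeUnit.HOURS);
    }
}
```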
Not a solution for the short term, but it sounds like a good use case to
migrate to Solr 4.x and use DocValues instead of the FieldCache for faceting.
I have a doubt about how NRTCachingDirectory works.
As far as I've seen, it receives a delegate Directory and caches newly
created segments. So, if MMapDirectory is the default:
1.- Does NRTCachingDirectory work as a sort of wrapper around MMapDirectory,
caching the new segments?
2.- If I have
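As I understand it, yes, it wraps a delegate directory. On the Solr side it can be switched on with a directoryFactory entry like this (a sketch; in recent 4.x versions this factory is already the default):

```xml
<!-- Wraps the underlying directory (e.g. MMapDirectory) and keeps small,
     newly flushed segments in RAM for fast near-real-time reopens -->
<directoryFactory name="DirectoryFactory"
                  class="solr.NRTCachingDirectoryFactory"/>
```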
Has someone noticed this problem and solved it somehow? (without using
LUCENE_33 in the solrconfig.xml)
https://issues.apache.org/jira/browse/LUCENE-3668
Thanks in advance
Well an example would be:
synonyms.txt:
huge,big size
Then I have the docs:
1- The huge fox attacks first
2- The big size fox attacks first
Then if I query for huge, the highlights for each document are:
1- The <strong>huge</strong> <strong>fox</strong> attacks first
2- The <strong>big size</strong> fox
http://lucene.472066.n3.nabble.com/Multiple-Facet-Dates-td495480.html
http://wiki.apache.org/solr/Deduplication
timeAllowed can be used outside distributed search. It is used by the
TimeLimitingCollector. When the search time reaches timeAllowed it will
stop searching and return the results it could find until then.
This can be a problem when using incremental indexing. Lucene starts
searching
As far as I know there's no issue about this. You have to reindex and that's
it.
In which kind of field are you changing the norms? (You will only see
changes in text fields.)
Using debugQuery=true you can see how norms affect the score (in case you
haven't omitted them).
Replication is easier to manage and a bit faster. See the performance
numbers: http://wiki.apache.org/solr/SolrReplication
Hey there,
I'm wondering if there's a cleaner way to do this:
I've written a SearchComponent that runs as last-component. In the prepare
method I build a DocSet (SortedIntDocSet) based on whether some values in the
FieldCache of a given field satisfy some rules (if the rules are
satisfied,
Deduplication uses Lucene's indexWriter.updateDocument with the signature
term. I don't think it's possible, as a default feature, to choose which
document to index; the original should always be the last one indexed.
IndexWriter.updateDocument:
Updates a document by first deleting the document(s)
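To illustrate the idea (this is not Solr's actual implementation, which lives in SignatureUpdateProcessorFactory): a signature is just a stable hash of the chosen field values, so identical content always maps to the same term, and updateDocument on that term replaces the earlier duplicate.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Illustrative dedup-style signature: hash the selected field values so
// documents with identical content produce identical signature terms.
public class Signature {
    public static String of(String... fieldValues) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        for (String v : fieldValues) {
            md5.update(v.getBytes(StandardCharsets.UTF_8));
        }
        // hex-encode the digest; this string would become the signature term
        return new BigInteger(1, md5.digest()).toString(16);
    }
}
```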
You have different options here. You can give more boost at indexing time to
the documents that have the fields you want set. For this to take effect you
will have to reindex and set omitNorms=false on the fields you are going
to search. This same concept can be applied to boost single fields.
Are you indexing with full-import? If so, and the resulting index has a
similar number of docs (to the one you had before), try setting reopenReaders
to false in solrconfig.xml.
* You have to send the commit, of course.
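In solrconfig.xml that setting lives in the main index section — a minimal sketch:

```xml
<mainIndex>
  <!-- false: open a brand-new IndexReader on commit instead of
       reopening the existing one -->
  <reopenReaders>false</reopenReaders>
</mainIndex>
```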
Hey there,
I've noticed a very odd behaviour with the snapinstaller and commit (using
collectionDistribution scripts). The first time I install a new index
everything works fine. But when installing a new one, I can't see the new
documents. Checking the status page of the core tells me that the
Tests are done on Solr 1.4.
The simplest way to reproduce my problem is having 2 indexes and a Solr box
with just one core. Both indexes must have been created with the same schema.
1- Remove the index dir of the core and start the server (core is up with an
empty index)
2- check status page of the
I don't know if this could have something to do with the problem, but some of
the files of the indexes have the same size and name (in all the indexes but
not in the empty one).
I have also realized that when moving back to the empty index and
committing, numDocs and maxDocs change. Once I'm with the
I have some more info!
I've built another index, bigger than the others, so the file names are
not the same. This way, if I move from any of the other indexes to the bigger
one or vice versa it works (I can see the changes in the version, numDocs and
maxDocs)! So, I think it is related to the name
I've found the problem in case someone is interested.
It's because of the indexReader.reopen(). If it is enabled, when opening a
new searcher due to the commit, this code is executed (in
SolrCore.getSearcher(boolean forceNew, boolean returnSearcher, final
Future[] waitSearcher)):
...
Any suggestion about this issue?
That's true. But the degradation is so big. If you launch concurrent
requests against a web app that doesn't use Solr, the time per request won't
degrade that much. For me, it looks more like a synchronized block somewhere
in Solr or Lucene is causing this.
In case you need to create lots of indexes and register/unregister them fast,
there is work under way: http://wiki.apache.org/solr/LotsOfCores
I need to dive into search grouping / field collapsing again. I've seen there
are lots of issues about it now.
Can someone point me to the minimum patches I need to run this feature in
trunk? I want to see the code of the most optimised version and what's being
done in distributed search. I
and I index data on the basis of these fields. Now, in case I need to add a
new field, is there a way I can add the field without corrupting the
previous data? Is there any feature which adds a new field with a
default value to the existing records?
You just have to add the new field in the
As far as I know, in the core admin page you can find when an index was last
modified and committed by checking lastModified.
But what do startTime and uptime mean?
Thanks in advance
To create the core, the folder with the confs must already exist and has to
be placed in the proper place (inside the Solr home). Once you run the
create core action, this core will be added to solr.xml and dynamically
loaded.
You have to create the core's folder with its conf inside the Solr home.
Once done you can call the create action of the admin handler:
http://wiki.apache.org/solr/CoreAdmin#CREATE
If you need to dynamically create, start and stop lots of cores there's this
patch, but don't know about its
Well, these are pretty different things. SolrCloud is meant to handle
distributed search in an easier way than raw Solr distributed search.
You have to build the shards in your own way.
Solr+hadoop is a way to build these shards/indexes in parallel.
I noticed that long ago.
Fixed it by doing this in HighlightComponent.finishStage:
@Override
public void finishStage(ResponseBuilder rb) {
  boolean hasHighlighting = true;
  if (rb.doHighlights && rb.stage == ResponseBuilder.STAGE_GET_FIELDS) {
    Map.Entry<String, Object>[] arr = new
http://www.lucidimagination.com/blog/2009/09/19/java-garbage-collection-boot-camp-draft/
I need to load a FieldCache for a field which is a Solr integer type and has
at most 3 digits. Let's say my index has 10M docs.
I am wondering what is more optimal and less memory consuming: loading a
FieldCache.DEFAULT.getInts or a FieldCache.DEFAULT.getStringIndex.
The second one will have a
As far as I know, the higher you set the value, the faster the indexing
process will be (because more things are kept in memory). But depending on
which are your needs, it may not be the best option. If you set a high
mergeFactor and you want to optimize the index once the process is done,
this
Hey there,
I've done some tests with a custom java app using EmbeddedSolrServer to
create an index.
It works OK and I am able to build the index, but I've noticed that after the
commit and optimize are done, the app never terminates.
How should I end it? Is there any way to tell the EmbeddedSolrServer
Seems that coreContainer.shutdown() solves the problem.
Anyone doing it in a different way?
I suppose you use batchSize=-1 to index that amount of data. From version
5.1.7 of the connector onwards there's this param:
netTimeoutForStreamingResults
The default value is 600. Increasing that maybe can help (2400, for example?)
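The param goes on the JDBC URL of the DIH dataSource — a sketch (host, db name and credentials are placeholders):

```xml
<!-- netTimeoutForStreamingResults is in seconds; 600 is the driver default -->
<dataSource driver="com.mysql.jdbc.Driver"
    url="jdbc:mysql://localhost:3306/mydb?netTimeoutForStreamingResults=2400"
    user="reader" password="secret" batchSize="-1"/>
```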
*There are lots of docs with the same value. I mention that because I
supposed that the same value has nothing to do with the number of un-inverted
term instances.
It does matter; I've been able to reproduce the error by setting different
values for each field:
HTTP Status 500 - there are more terms
Hi Otis, just out of curiosity, which strategy do you use? Index on the map
or the reduce side?
Do you use it to build shards or a single monolithic index?
Thanks
Thanks, that's very useful info. However, I can't reproduce the error. I've
created an index where all documents have a multivalued date field and each
document has a minimum of one value in that field (most of the docs have 2
or 3). So, the number of un-inverted term instances is greater than
Well, sorting requires that all the unique values in the target field
get loaded into memory
That's what I thought, thanks.
But a larger question is whether what you're doing is worthwhile
even as just a measurement. You say
This is good for me, I don't care for my tests. I claim that
you do care
I
I think there are people using this patch in production:
https://issues.apache.org/jira/browse/SOLR-1301
I have tested it myself, indexing data from CSV and from HBase, and it works
properly.
I think a good solution could be to use hadoop with SOLR-1301 to build Solr
shards and then use Solr distributed search against these shards (you will
have to copy them from HDFS to local disk to search against them).
Well, the patch consumes the data from a CSV. You have to modify the input to
use TableInputFormat (I don't remember if it's called exactly like that) and
it will work.
Once you've done that, you have to specify as many reducers as shards you
want.
I know 2 ways to index using hadoop
method 1
Maybe this helps:
http://wiki.apache.org/solr/SolrPlugins#QParserPlugin
Hey Erik,
I am currently sorting by a multiValued field. It appears that you may not
know which of the values of the multiValued field makes the document end up
in that position. This is good for me; I don't care for my tests.
What I need to know is whether there is any performance issue in all of this.
hey there!
can someone explain to me the impact of multivalued fields when sorting?
I have read in other threads how it affects faceting but couldn't
find any info on the impact when sorting.
Thanks in advance
I mean sorting the query results, not facets.
I am asking because I have added a multivalued field that has at most 10
values. But 70% of the docs have just 1 or 2 values in this multiValued
field. I am not doing faceting.
Since I added the multiValued field, the Java old gen seems to get full
I normally use jmeter, jconsole and iostat. Recently
http://www.newrelic.com/solr.html has been released
Hey there,
I am facing a problem related to query analysis and stopwords. I have some
ideas how to sort it out but would like to do it in the cleanest way
possible.
I am using dismax and I query 3 fields. These fields are defined as
text this way:
<fieldType name="text" class="solr.TextField">
With the uninverted algorithm it will be very fast regardless of the number
of unique terms. But be careful with the memory, because it uses quite a lot.
With the older facet algorithm, if you have a lot of different terms it
will be slow.
Since Solr 1.4 I think the uninverted method is on by default. Anyway, you
can choose which to use with the method param:
facet.method=fc/enum (where fc is the uninverted one)
http://wiki.apache.org/solr/SimpleFacetParameters
You can use deduplication to do that. Create the signature based on the
unique field or any field you want.
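A sketch of the corresponding solrconfig.xml chain, close to the wiki example (the `fields` list and `signatureField` are whatever suits your schema):

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <!-- true: delete previously indexed docs with the same signature -->
    <bool name="overwriteDupes">true</bool>
    <!-- fields the signature is computed from -->
    <str name="fields">url</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```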
I have noticed that when using q.alt, even if hl=true, highlights are not
returned.
When using distributed search, q.alt and hl, HighlightComponent.finishStage
expects the highlighting NamedList from each shard (if hl=true)
but it will never be returned. It will end up with a NullPointerException.
I
Should I avoid omitting norms on any fields that I would like to boost
via a boost-query/function query?
You don't have to set norms to use boost queries or functions. You just have
to set them when you want to boost docs or fields at indexing time.
What about sortable fields? Facetable fields?
I don't think there's a way to do what has come to my mind but I want to be
sure.
Let's say I have a doc with 2 fields, one of them multiValued:
doc1:
name-john
year-2009;year-2010;year-2011
And I query for:
q=john&fq=-year:2010
Doc1 won't be in the matching results. Is there a way to make it appear
I am debugging a 2-word query built using dismax. So it's built from
DisjunctionMaxQueries, with minShouldMatch 100% and tie breaker
multiplier = 0.3:
+((DisjunctionMaxQuery((content:john | title:john)~0.3)
DisjunctionMaxQuery((content:malone | title:malone)~0.3))~2)
And a 3 words one (with
Hey there,
I am testing date facets in trunk with a huge index. Apparently, as the
default solrconfig.xml shows, the fastest way to run date facet queries is
to index the field with this data type:
<!-- A Trie based date field for faster date range queries and date
faceting. -->
<fieldType
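For reference, the example schema of that era defines it roughly like this (treat it as a sketch; exact attributes such as precisionStep can differ between versions):

```xml
<!-- Trie-coded date field: lower precisionStep = more terms indexed,
     faster range queries and date faceting -->
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
           precisionStep="6" positionIncrementGap="0"/>
```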
I'll give you an example of how to configure your default SearchHandler to
do highlighting, but I strongly recommend you read the wiki properly.
Everything is really well explained there:
http://wiki.apache.org/solr/HighlightingParameters
<str name="hl">true</str>
<str
If you shut down the server properly it's weird that you get an error when
starting up again.
How did you delete the index? I experienced something similar a long time
ago because I was removing the content of the index folder but not the
folder itself. The correct way to do it was to
Are you sure you don't have a folder called exampledocs with xml files
inside? These are the files to index as a first example:
apache-solr-1.5-dev/example/exampledocs
Check the
/home/marc/Desktop/data/apache-solr-1.5-dev/example/solr/conf/schema.xml and
solrconfig.xml and you will see how to
I think you can handle that writing a custom transformer. There's a good
explanation in the wiki:
http://wiki.apache.org/solr/DIHCustomTransformer
KshamaPai wrote:
Hi,
Am new to solr.
I am trying location-aware search with spatial Lucene in the Solr 1.5 nightly
build.
My table in mysql has
There's no problem with having the same warming in both cases. First queries
are used to warm the index when you start the Solr instance. New queries warm
the index when a commit is executed, for example.
For first-query warming there was no previous IndexSearcher open. For new
queries there
As far as I know it's not supported by default. I think you should implement
your own custom Lucene Similarity class and plug it into Solr via the schema.
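A minimal sketch of wiring it in (the class name here is hypothetical): in Solr the similarity is declared at the bottom of schema.xml, outside the fields and types sections.

```xml
<!-- custom Similarity subclass on the classpath (e.g. in a lib dir);
     com.example.MySimilarity is a placeholder name -->
<similarity class="com.example.MySimilarity"/>
```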
pcmanprogrammeur wrote:
Hello all (sorry if my english is bad, i'm french) !
I have a Solr Index with ads which contain a title and a
Hey there, I am experiencing concurrent performance problems in trunk. Does
it open readers in readOnly mode?
Thanks in advance
or something? I am quite lost and surprised about the behaviour I
have noticed...
markrmiller wrote:
Yeah it does - I take it you're not on Windows?
- Mark
http://www.lucidimagination.com (mobile)
On Feb 20, 2010, at 4:39 PM, Marc Sturlese marc.sturl...@gmail.com
wrote:
Hey there, I am
I have noticed that in the class UninvertedField.java there is a synchronized
access to the FieldValueCache.
I would like to know why this access is synchronized. Could this end up in a
loss of performance when there are concurrent search requests?
I am doing as much research as I can as I have
Hey there,
I have been doing some stress with a 2 physical CPU (with 4 cores each)
server.
After some reading about GC performance tuning I have configured it this
way:
/usr/lib/jvm/java-6-sun/bin/java -server -Xms7000m -Xmx7000m
-XX:ReservedCodeCacheSize=10m -XX:NewSize=1000m
Hey there,
I see that when Solr gives me back the scores in the response, they are the
same for many different documents.
I have built a simple index for testing purposes, with just documents with
one field indexed with the standard analyzer and containing pieces of text.
I have done the same with a self
?
On Mon, Feb 1, 2010 at 8:04 AM, Marc Sturlese marc.sturl...@gmail.com
wrote:
I already asked about this long ago but the answer doesn't seem to
work...
I am trying to set a negative query boost to send the results that match
field_a: 54 to a lower position. I have tried it in 2 different
: bq=(*:* -field_a:54^1)
I think what you want there is bq=(*:* -field_a:54)^1
...you are boosting things that don't match field_a:54
Thanks Hoss. I've updated the Wiki, the content of the bq param was wrong:
I like to use JMeter with a large queries file. This way you can measure
response times with lots of requests at the same time. With JConsole
open at the same time you can check the memory status.
James liu-2 wrote:
before the stress test, should I close SolrCache?
which tool do you use?
I already asked about this long ago but the answer doesn't seem to work...
I am trying to set a negative query boost to send the results that match
field_a: 54 to a lower position. I have tried it in 2 different ways:
bq=(*:* -field_a:54^1)
bq=-field_a:54^1
None of them seem to work.
I am testing trunk and have seen a different behaviour when loading
updateProcessors, which I don't know if it's normal (at least with multicore).
Before, I used to use an updateProcessorChain this way:
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
<lst name="defaults">
<str
have a look:
http://issues.apache.org/jira/browse/SOLR-1395
V SudershanReddy wrote:
Hi,
Can we Integrate solr with katta?
In order to overcome the limitations of Solr in distributed search, I need
to integrate katta with Solr, without losing any features of Solr.
In case you are going to use a core per user, take a look at this patch:
http://wiki.apache.org/solr/LotsOfCores
Trey-13 wrote:
Hi Matt,
In most cases you are going to be better off going with the userid method
unless you have a very small number of users and a very large number of
Check out this patch, which solves the distributed IDF problem:
https://issues.apache.org/jira/browse/SOLR-1632
I think it fixes what you are explaining. The price you pay is that there
are 2 requests per shard. If I am not wrong, the first is to get term
frequencies and the needed info, and the second
If you want to retrieve a huge volume of rows you will end up with an
OutOfMemoryError due to the JDBC driver. Setting batchSize to -1 in your
data-config.xml (which internally sets it to Integer.MIN_VALUE) will make
the query execute in streaming mode, avoiding the memory error.
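A minimal data-config.xml sketch of that setting (table and columns are placeholders):

```xml
<dataConfig>
  <!-- batchSize="-1" becomes Statement.setFetchSize(Integer.MIN_VALUE),
       which tells the MySQL driver to stream rows instead of buffering
       the whole result set in memory -->
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" batchSize="-1"/>
  <document>
    <entity name="item" query="SELECT id, name FROM item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
    </entity>
  </document>
</dataConfig>
```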
Should sortMissingLast param be working on trie-fields?
Hey there,
I need to be able to decide, once a document has been created, whether I want
it to be indexed or not. I have thought of implementing an
UpdateRequestProcessor to do that, but I don't know how to tell Solr in the
processAdd method to skip the document.
If I deleted all the fields, would it be skipped?
{
    LOG.debug("Doc skipped!");
}
}
Thanks in advance
Chris Male wrote:
Hi,
If your UpdateRequestProcessor does not forward the AddUpdateCommand onto
the RunUpdateProcessor, I believe the document will not be indexed.
Cheers
On Thu, Dec 10, 2009 at 12:09 PM, Marc
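For reference, a sketch of how such a chain could be wired in solrconfig.xml (the factory class name is hypothetical; its processAdd simply declines to call super.processAdd for documents that should be skipped):

```xml
<updateRequestProcessorChain name="skipdocs">
  <!-- hypothetical custom factory: only forwards the AddUpdateCommand
       down the chain when the document should actually be indexed -->
  <processor class="com.example.SkipDocProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```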
Yes, it did
Cheers
Chris Male wrote:
Hi,
Yeah thats what I was suggesting. Did that work?
On Thu, Dec 10, 2009 at 12:24 PM, Marc Sturlese
marc.sturl...@gmail.comwrote:
Do you mean something like?:
@Override
public void processAdd(AddUpdateCommand cmd) throws IOException
I am tracing QueryComponent.java and would like to know the purpose of the
doFSV function. I don't understand what fsv are for.
I have tried some queries with fsv=true and some extra info appears in the
response:
<lst name="sort_values"/>
But I don't know what it is for and can't find much info out there. I
And what about:
<fieldtype name="sint" class="solr.SortableIntField"
sortMissingLast="true"/>
vs.
<fieldtype name="bcdint" class="solr.BCDIntField" sortMissingLast="true"/>
Which is the difference between the two? Is bcdint just always better?
Thanks in advance
Yonik Seeley-2 wrote:
On Fri, Dec 4, 2009 at
With 1.4
-Add the log4j jars to Solr
-Configure the SyslogAppender with something like:
log4j.appender.solrLog=org.apache.log4j.net.SyslogAppender
log4j.appender.solrLog.Facility=LOCAL0
log4j.appender.solrLog.SyslogHost=127.0.0.1
log4j.appender.solrLog.layout=org.apache.log4j.PatternLayout
Hey there,
I am using Solr 1.4 out of the box and am trying to create a core at runtime
using the CREATE action.
I am getting this error when executing:
http://localhost:8983/solr/admin/cores?action=CREATE&name=x&instanceDir=x&persist=true&config=solrconfig.xml&schema=schema.xml&dataDir=data
Hey there,
I am thinking of developing date facets for distributed search but I don't
know exactly where to start. I am familiar with the facet dates source code
and I think that if I could understand how distributed facet queries work it
shouldn't be that difficult.
I have read
Are you using one single solr instance with multicore or multiple solr
instances with one index each?
Erik_l wrote:
Hi,
Currently we're running 10 Solr indexes inside a single Tomcat6 instance.
In the near future we would like to add another 30-40 indexes to every
Tomcat instance we
to
hold you will suffer from slow response times.
Erik_l wrote:
We're not using multicore. Today, one Tomcat instance host a number of
indexes in form of 10 Solr indexes (10 individual war files).
Marc Sturlese wrote:
Are you using one single solr instance with multicore or multiple solr
Hey there,
I am trying to set up the Katta integration plugin. I would like to know if
Katta's ranking algorithm is used when searching among shards. If so,
would it mean it solves the problem with IDFs of distributed Solr?
I think it doesn't make sense to enable warming if your Solr instance is just
for indexing purposes (it changes if you use it for search as well). You
could comment out the caches as well in solrconfig.xml.
Setting queryResultWindowSize and queryResultMaxDocsCached to zero maybe
could help... (but if
Hey there, I am using DIH to import a db table and have written a custom
transformer following the example:
package foo;
public class CustomTransformer1 {
public Object transformRow(Map<String, Object> row) {
String artist = (String) row.get("artist");
if
Hey there,
I need a query to get the total number of documents in my index. I can get
it if I do this using the DismaxRequestHandler:
q.alt=*:*&facet=false&hl=false&rows=0
I have noticed this query is very memory consuming. Is there any more
optimized way in trunk to get the total number of documents of my
Hey there, I need to sort my query results alphabetically on a particular
field called town. This field is analyzed with a KeywordAnalyzer and isn't
multiValued. Add that some docs don't have this field.
Doing just:
.
On Mon, Aug 24, 2009 at 11:58 AM, Marc Sturlese
marc.sturl...@gmail.comwrote:
Hey there, I need to sort my query results alphabetically on a
particular
field called town. This field is analyzed with a KeywordAnalyzer and
isn't
multiValued. Add that some docs don't have
happens ;)
On Mon, Aug 24, 2009 at 12:24 PM, Marc Sturlese
marc.sturl...@gmail.comwrote:
Yes, but I thought it was just for sortable fields:
sint, sfloat, sdouble, slong.
Can I apply sortMissingLast to text fields analyzed with
KeywordAnalyzer?
Constantijn Visinescu wrote:
There's
As far as I know you cannot do that with DIH. What size is your index?
Probably the best you can do is index from scratch again with a full-import.
clico wrote:
I hope it could be a solution.
But I think I understood that you can use deletedPkQuery like this
select document_id from
: the only way to negative boost is to positively boost the inverse...
:
: (*:* -field1:value_to_penalize)^10
This will do the job as well, as bq supports pure negative queries (at least
in trunk):
bq=-field1:value_to_penalize^10