Re: Lucene FieldCache - Out of memory exception

2012-05-02 Thread Rahul R
Here is one sample query that I picked up from the log file :

Re: Removing old documents

2012-05-02 Thread Paul Libbrecht
With which client? Paul. On 2 May 2012 at 01:29, alx...@aim.com wrote: All caching is disabled and I restarted Jetty. The same results.

Re: Solr: extracting/indexing HTML via cURL

2012-05-02 Thread Lance Norskog
You can have two fields: one which is stripped, and another which stores the original data. You can use copyField directives and make the stripped field indexed but not stored, and the original field stored but not indexed. You only have to upload the file once, and only store the text once. If

RE: Solr Merge during off peak times

2012-05-02 Thread Prakashganesh, Prabhu
Ok, thanks Otis. Another question on merging: what is the best way to monitor merging? Is there something in the log file that I can look for? It seems like I have to monitor the system resources (read/write IOPS etc.) and work out when a merge happened. It would be great if I can do it by looking

Re: should slave replication be turned off / on during master clean and re-index?

2012-05-02 Thread Erick Erickson
Simply turn off replication during your rebuild-from-scratch. See the disablereplication command at http://wiki.apache.org/solr/SolrReplication#HTTP_API. The autocommit thing was, I think, in reference to keeping a partial rebuild from being replicated. Autocommit is usually a
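
The advice above can be sketched in Java as a small helper that builds the replication command URLs and sends them before and after a rebuild. This is only a sketch: the class and method names are hypothetical, the base URL is a placeholder, and it assumes the replication handler is registered under its usual `/replication` path on the master.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class ReplicationToggle {
    /** Builds the URL for the enablereplication / disablereplication command. */
    static String commandUrl(String masterCoreUrl, boolean enable) {
        String command = enable ? "enablereplication" : "disablereplication";
        // Tolerate a trailing slash on the base URL.
        String base = masterCoreUrl.endsWith("/")
                ? masterCoreUrl.substring(0, masterCoreUrl.length() - 1)
                : masterCoreUrl;
        return base + "/replication?command=" + command;
    }

    /** Sends the command with a plain GET and returns the HTTP status code. */
    static int send(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Placeholder master URL; pass the real one as the first argument.
        String master = args.length > 0 ? args[0] : "http://localhost:8983/solr";
        // Only prints the URLs; call send(...) around the actual rebuild.
        System.out.println(commandUrl(master, false));
        System.out.println(commandUrl(master, true));
    }
}
```

In use, `send(commandUrl(master, false))` would run before the clean and re-index, and `send(commandUrl(master, true))` after the final commit.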

Re: Solr Merge during off peak times

2012-05-02 Thread Erick Erickson
Why do you care? Merging is generally a background process, or are you doing heavy indexing? In a master/slave setup, it's usually not really relevant except that (with 3.x), massive merges may temporarily stop indexing. Is that the problem? Look at the merge policies, there are configurations

Re: Lucene FieldCache - Out of memory exception

2012-05-02 Thread Jack Krupansky
The FieldCache gets populated the first time a given field is referenced as a facet and then will stay around forever. So, as additional queries get executed with different facet fields, the number of FieldCache entries will grow. If I understand what you have said, these faceted queries do

RE: Solr Merge during off peak times

2012-05-02 Thread Prakashganesh, Prabhu
We have a fairly large scale system - about 200 million docs and fairly high indexing activity - about 300k docs per day with peak ingestion rates of about 20 docs per sec. I want to work out what a good mergeFactor setting would be by testing with different mergeFactor settings. I think the

Re: Solr Merge during off peak times

2012-05-02 Thread Erick Erickson
But again, with a master/slave setup merging should be relatively benign. And at 200M docs, having a M/S setup is probably indicated. Here's a good writeup of mergepolicy http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/ If you're indexing and searching on a single machine,

Null Pointer Exception in SOLR

2012-05-02 Thread mechravi25
Hi, When I tried to remove some data from the UI (which will in turn hit SOLR), the whole application got stuck. When we took the log files of the UI, we could see that this set of requests did not reach SOLR itself. In the SOLR log file, we were able to find the following exception occurring at the

Re: Newbie question on sorting

2012-05-02 Thread Jacek
Erick, I'll do that. Thank you very much. Regards, Jacek On Tue, May 1, 2012 at 7:19 AM, Erick Erickson erickerick...@gmail.com wrote: The easiest way is to do that in the app. That is, return the top 10 to the app (by score) then re-order them there. There's nothing in Solr that I know of
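
The app-side re-ordering suggested above might look like the following sketch. The `Hit` class and the choice of `title` as the secondary sort key are hypothetical, since the excerpt does not say which field the top 10 should be re-ordered by; the list passed in is assumed to be the top 10 already returned by Solr in score order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TopHitsReorder {
    /** Hypothetical holder for one search result. */
    static final class Hit {
        final String id;
        final float score;
        final String title;

        Hit(String id, float score, String title) {
            this.id = id;
            this.score = score;
            this.title = title;
        }
    }

    /** Returns a copy of the score-ordered hits, re-sorted by title. */
    static List<Hit> reorderByTitle(List<Hit> topByScore) {
        List<Hit> copy = new ArrayList<>(topByScore);
        copy.sort(Comparator.comparing((Hit h) -> h.title));
        return copy;
    }

    public static void main(String[] args) {
        List<Hit> top = Arrays.asList(
                new Hit("a", 3.0f, "zebra"),
                new Hit("b", 2.0f, "apple"),
                new Hit("c", 1.0f, "mango"));
        for (Hit h : reorderByTitle(top)) {
            System.out.println(h.id + " " + h.title);
        }
    }
}
```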

RE: Solr Merge during off peak times

2012-05-02 Thread Prakashganesh, Prabhu
Actually we are not thinking of a M/S setup. We are planning to have x number of shards on N number of servers, each shard handling both indexing and searching. The expected query volume is not that high, so I don't think we would need to replicate to slaves. We think each shard will be able

ExtractRH: How to strip metadata

2012-05-02 Thread Joseph Hagerty
Greetings Solr folk, How can I instruct the extract request handler to ignore metadata/headers etc. when it constructs the content of the document I send to it? For example, I created an MS Word document containing just the word SEARCHWORD and nothing else. However, when I ship this doc to my

Re: Solr Merge during off peak times

2012-05-02 Thread Erick Erickson
Optimizing is much less important query-speed wise than historically, essentially it's not recommended much any more. A significant effect of optimize _used_ to be purging obsolete data (i.e. that from deleted docs) from the index, but that is now done on merge. There's no harm in optimizing on

Dumb question: Streaming collector /query results

2012-05-02 Thread vybe3142
I doubt that SOLR has this capability, given that it is based on a RESTful architecture, but I wanted to ask in case I'm mistaken. In Lucene, it is easier to gain a direct handle to the collector / scorer and access all the results as they're collected (as opposed to the SOLR query call that

Re: Dumb question: Streaming collector /query results

2012-05-02 Thread vybe3142
In other words, as an alternative, what's the most efficient way to gain access to all of the document ids that match a query?

Re: ExtractRH: How to strip metadata

2012-05-02 Thread Jack Krupansky
Check to see if you have a CopyField for a wildcard pattern that copies to meta, which would copy all of the Tika-generated fields to meta. -- Jack Krupansky -Original Message- From: Joseph Hagerty Sent: Wednesday, May 02, 2012 9:56 AM To: solr-user@lucene.apache.org Subject:

Re: ExtractRH: How to strip metadata

2012-05-02 Thread Joseph Hagerty
I do not. I commented out all of the copyFields provided in the default schema.xml that ships with 3.5. My schema is rather minimal. Here is my fields block, if this helps: <fields> <field name="cust" type="string" indexed="true" stored="true" required="true" /> <field name="asset" type="string"

question about dates

2012-05-02 Thread G.Long
Hi :) I'm starting to use Solr and I'm facing a little problem with dates. My documents have a date property which is of type 'yyyyMMdd'. To index these dates, I use the following code: String dateString = "20101230"; SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd"); Date date =

Re: question about dates

2012-05-02 Thread Jack Krupansky
The trailing Z is required in your input data to be indexed, but the Z is not actually stored. Your query must have the trailing Z though, unless you are doing a wildcard or prefix query. -- Jack Krupansky -Original Message- From: G.Long Sent: Wednesday, May 02, 2012 11:18 AM To:

SOLRJ: Is there a way to obtain a quick count of total results for a query

2012-05-02 Thread vybe3142
I can achieve this by building a query with start and rows = 0, and using queryResponse.getResults().getNumFound(). Are there any more efficient approaches to this? Thanks
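
For comparison with the SolrJ call described above, the same count-only request can be sketched over plain HTTP using nothing but the JDK. The class and method names here are hypothetical, the base URL is a placeholder, and the regex-based parsing assumes the standard JSON response writer (`wt=json`), where the total appears as a `numFound` field; a real client would use a proper JSON parser.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountOnlyQuery {
    private static final Pattern NUM_FOUND =
            Pattern.compile("\"numFound\"\\s*:\\s*(\\d+)");

    /** Builds a select URL that asks for zero rows, so only the count comes back. */
    static String countUrl(String coreUrl, String query) {
        try {
            return coreUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&rows=0&wt=json";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    /** Extracts numFound from a JSON response body. */
    static long parseNumFound(String json) {
        Matcher m = NUM_FOUND.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no numFound in response");
        }
        return Long.parseLong(m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(countUrl("http://localhost:8983/solr", "*:*"));
        // Sample response body, made up for illustration.
        String sample = "{\"response\":{\"numFound\":42,\"start\":0,\"docs\":[]}}";
        System.out.println(parseNumFound(sample));
    }
}
```

Either way, requesting zero rows means no stored fields are fetched, so the response carries little beyond the count.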

Re: question about dates

2012-05-02 Thread Jack Krupansky
Oops... I meant to say that Solr doesn't *index* the trailing Z, but it is stored (the stored value, not the indexed value.) The query must match the indexed value, not the stored value. -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Wednesday, May 02, 2012 11:55 AM

Re: question about dates

2012-05-02 Thread Jack Krupansky
That wasn't right either... the query must have the trailing Z, which Solr will strip off to match the indexed value which doesn't have the Z. So, my corrected original statement is: The trailing Z is required in your input data to be indexed, but the Z is not actually indexed by Solr (it is

Re: Error with distributed search and Suggester component (Solr 3.4)

2012-05-02 Thread Ken Krugler
Hi Robert, On May 1, 2012, at 7:07pm, Robert Muir wrote: On Tue, May 1, 2012 at 6:48 PM, Ken Krugler kkrugler_li...@transpac.com wrote: Hi list, Does anybody know if the Suggester component is designed to work with shards? I'm not really sure it is? They would probably have to override

Solr 3.5 - Elevate.xml causing issues when placed under /data directory

2012-05-02 Thread Noordeen, Roxy
Hello, I just started using elevation for solr. I am on solr 3.5, running with Drupal 7, Linux. 1. I updated my solrconfig.xml from <dataDir>${solr.data.dir:./solr/data}</dataDir> to <dataDir>/usr/local/tomcat2/data/solr/dev_d7/data</dataDir> 2. I placed my elevate.xml in my solr's data directory.

Re: ExtractRH: How to strip metadata

2012-05-02 Thread Jack Krupansky
I did some testing, and evidently the meta field is treated specially by the ERH. I copied the example schema, and added both meta and metax fields and set fmap.content=metax, and lo and behold only the doc content appears in metax, but all the doc metadata appears in meta. Although, I

Re: Dumb question: Streaming collector /query results

2012-05-02 Thread Mikhail Khludnev
I did a little research, with a fairly modest result: https://github.com/m-khl/solr-patches/tree/streaming You can start exploring it from the trivial test

Re: Removing old documents

2012-05-02 Thread alxsss
I use jetty that comes with solr. I use solr's dedupe: <updateRequestProcessorChain name="dedupe"> <processor class="solr.processor.SignatureUpdateProcessorFactory"> <bool name="enabled">true</bool> <str name="signatureField">id</str> <bool name="overwriteDupes">true</bool>

Re: ExtractRH: How to strip metadata

2012-05-02 Thread Joseph Hagerty
How interesting! You know, I did at one point consider that perhaps the fieldname meta may be treated specially, but I talked myself out of it. I reasoned that a field name in my local schema should have no bearing on how a plugin such as solr-cell/Tika behaves. I should have tested my hypothesis;

Dynamic core creation works in 3.5.0 fails in 3.6.0: At least one core definition required at run-time for Solr 3.6.0?

2012-05-02 Thread Emes, Matthew (US - Irvine)
Hi: I have been working on an integration project involving Solr 3.5.0 that dynamically registers cores as needed at run-time, but does not contain any cores by default. The current solr.xml configuration file is: <?xml version="1.0" encoding="UTF-8" ?> <solr persistent="false" sharedLib="lib"> <cores

Re: question about dates

2012-05-02 Thread Chris Hostetter
: String dateString = "20101230"; : SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd"); : Date date = sdf.parse(dateString); : doc.addField("date", date); : : In the index, the date 20101230 is saved as 2010-12-29T23:00:00Z ( because : of GMT). "because of GMT" is misleading and vague ... what
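
The one-hour shift quoted above can be reproduced with the JDK alone. The sketch below parses the same string once in a GMT+1 zone and once in UTC, then prints both instants in Solr's UTC date syntax. Europe/Paris is an assumption standing in for the original poster's default time zone, which the excerpt does not state; the class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class SolrDateDemo {
    /** Parses a yyyyMMdd string as midnight in the given time zone. */
    static Date parse(String yyyymmdd, TimeZone zone) {
        SimpleDateFormat in = new SimpleDateFormat("yyyyMMdd");
        in.setTimeZone(zone);
        in.setLenient(false);
        try {
            return in.parse(yyyymmdd);
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a yyyyMMdd date: " + yyyymmdd, e);
        }
    }

    /** Formats an instant the way Solr writes dates: UTC with a trailing Z. */
    static String toSolrString(Date date) {
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        out.setTimeZone(TimeZone.getTimeZone("UTC"));
        return out.format(date);
    }

    public static void main(String[] args) {
        // Midnight in a GMT+1 zone is 23:00 UTC on the previous day.
        System.out.println(toSolrString(parse("20101230", TimeZone.getTimeZone("Europe/Paris"))));
        // prints 2010-12-29T23:00:00Z

        // Parsing in UTC keeps the calendar date intact.
        System.out.println(toSolrString(parse("20101230", TimeZone.getTimeZone("UTC"))));
        // prints 2010-12-30T00:00:00Z
    }
}
```

Setting the parser's time zone explicitly, rather than relying on the JVM default, is what keeps the indexed value on the intended day.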

Re: Error with distributed search and Suggester component (Solr 3.4)

2012-05-02 Thread Robert Muir
On Wed, May 2, 2012 at 12:16 PM, Ken Krugler kkrugler_li...@transpac.com wrote: What confuses me is that Suggester says it's based on SpellChecker, which supposedly does work with shards. It is based on spellchecker apis, but spellchecker's ranking is based on simple comparators like string

need some help with a multicore config of solr3.6.0+tomcat7. mine reports: Severe errors in solr configuration.

2012-05-02 Thread locuse
i've installed tomcat7 and solr 3.6.0 on linux/64 i'm trying to get a single webapp + multicore setup working. my efforts have gone off the rails :-/ i suspect i've followed too many of the wrong examples. i'd appreciate some help/direction getting this working. so far, i've configured

synonyms

2012-05-02 Thread Carlos Andres Garcia
Hello everybody, I have a question about synonyms in Solr. In our company we are looking for a solution that resolves synonyms from a database rather than from a text file as SynonymFilterFactory does. The idea is to save all the synonyms in the database, index them, and they will be ready to

Re: synonyms

2012-05-02 Thread Jack Krupansky
I'm not sure I completely follow, but are you simply saying that you want to have a synonym filter that reads the synonym table from a database rather than the current text file? If so, sure, you could develop a replacement for the current synonym filter which loads its table from a database,

RE: synonyms

2012-05-02 Thread Noordeen, Roxy
Another solution is to write a script to read the database and create the synonyms.txt file, dump the file to solr and reload the core. This gives you the custom synonym solution. -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Wednesday, May 02, 2012

RE: need some help with a multicore config of solr3.6.0+tomcat7. mine reports: Severe errors in solr configuration.

2012-05-02 Thread Robert Petersen
I don't know if this will help but I usually add a dataDir element to each core's solrconfig.xml to point at a local data folder for the core like this: <!-- Used to specify an alternate directory to hold all index data other than the default ./data under the Solr home. If

Re: need some help with a multicore config of solr3.6.0+tomcat7. mine reports: Severe errors in solr configuration.

2012-05-02 Thread vybe3142
I chronicled exactly what I had to configure to slay this dragon at http://vinaybalamuru.wordpress.com/2012/04/12/solr4-tomcat-multicor/ Hope that helps

Re: Phrase Slop problem

2012-05-02 Thread Jack Krupansky
You are missing the pf, pf2, and pf3 request parameters, which say which fields to do phrase proximity boosting on. pf boosts using the whole query as a phrase, pf2 boosts bigrams, and pf3 boosts trigrams. You can use any combination of them, but if you use none of them, ps appears to be

RE: synonyms

2012-05-02 Thread Carlos Andres Garcia
Thanks for your answers. Now I have another question: if I develop a filter to replace the current synonym filter, I understand that this process would happen at index time, because synonym expansion at query time has a lot of known problems. If so, how can I create my index file?

Re: Solr Merge during off peak times

2012-05-02 Thread Otis Gospodnetic
Hello Prabhu, Look at SPM for Solr (URL in sig below).  It includes Index Statistics graphs, and from these graphs you can tell: * how many docs are in your index * how many docs are deleted * size of index on disk * number of index segments * number of index files * maybe something else I'm

solr broke a pipe

2012-05-02 Thread Robert Petersen
Anyone have any clues about this exception? It happened during the course of normal indexing. This is new to me (we're running solr 3.6 on tomcat 6/redhat RHEL) and we've been running smoothly for some time now until this showed up: Red Hat Enterprise Linux Server release 5.3 (Tikanga)

Re: syntax for negative query OR something

2012-05-02 Thread Chris Hostetter
: How do I search for things that have no value or a specified value? Things with no value... (*:* -fieldName:[* TO *]) Things with a specific value... fieldName:A Things with no value or a specific value... (*:* -fieldName:[* TO *]) fieldName:A ...or if you aren't using

Re: syntax for negative query OR something

2012-05-02 Thread Jack Krupansky
Sounds good. OR in the negation of any query that matches any possible value in a field. The Solr query parser doc lists the open range as you used: -field:[* TO *] finds all documents without a value for field See: http://wiki.apache.org/solr/SolrQuerySyntax This also includes pure

Re: syntax for negative query OR something

2012-05-02 Thread Jack Krupansky
Oops... that is: (-fname:*) OR fname:(A B C) or (-fname:[* TO *]) OR fname:(A B C) -- Jack Krupansky -Original Message- From: Jack Krupansky Sent: Wednesday, May 02, 2012 7:48 PM To: solr-user@lucene.apache.org Subject: Re: syntax for negative query OR something Sounds good. OR

Re: syntax for negative query OR something

2012-05-02 Thread Jack Krupansky
Hmmm... I thought that worked in edismax. And I thought that pure negative queries were allowed in SolrQueryParser. Oh well. In any case, in the Lucene or Solr query parser, add *:* to select all docs before negating the docs that have any value in the field: (*:* -fname:*) OR fname:(A B C)
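
The pattern settled on in this thread, selecting all docs with `*:*` before negating those that have any value, can be wrapped in a small query-string builder. This is only a sketch: the class and method names are hypothetical, values are inserted verbatim (anything containing spaces or query-syntax characters would need quoting or escaping first), and the space-separated value list relies on the default operator being OR.

```java
class MissingOrValueQuery {
    /** Matches documents with no value for the field, or with any of the given values. */
    static String missingOrAnyOf(String field, String... values) {
        // *:* selects every document, then the open range removes those with a value.
        String missing = "(*:* -" + field + ":[* TO *])";
        if (values.length == 0) {
            return missing;
        }
        return missing + " OR " + field + ":(" + String.join(" ", values) + ")";
    }

    public static void main(String[] args) {
        System.out.println(missingOrAnyOf("fname", "A", "B", "C"));
    }
}
```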

Re: synonyms

2012-05-02 Thread Jack Krupansky
There are lots of different strategies for dealing with synonyms, depending on what exactly is most important and what exactly you are willing to tolerate. In your latest example, you seem to be using string fields, which is somewhat different from the text synonyms we talk about in Solr.

Re: Solr 3.5 - Elevate.xml causing issues when placed under /data directory

2012-05-02 Thread Koji Sekiguchi
(12/05/03 1:39), Noordeen, Roxy wrote: Hello, I just started using elevation for solr. I am on solr 3.5, running with Drupal 7, Linux. 1. I updated my solrconfig.xml from <dataDir>${solr.data.dir:./solr/data}</dataDir> to <dataDir>/usr/local/tomcat2/data/solr/dev_d7/data</dataDir> 2. I placed my

Re: synonyms

2012-05-02 Thread Sohail Aboobaker
I think a regular sync of the database table with the synonym text file is the simplest of the solutions. It will allow you to use Solr natively without any customization, and it is not a very complicated operation to update the synonyms file with entries from the database.
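
The export step of such a sync could be sketched as below. The class and method names are hypothetical, and the database read is left out: each inner list stands for one group of equivalent terms already fetched from the table (for example over JDBC). The output uses the comma-separated equivalent-terms form of synonyms.txt; terms that themselves contain commas would need escaping, which this sketch does not handle.

```java
import java.util.Arrays;
import java.util.List;

class SynonymsFileWriter {
    /** Renders synonym groups as synonyms.txt content, one group per line. */
    static String toSynonymsTxt(List<List<String>> groups) {
        StringBuilder sb = new StringBuilder();
        for (List<String> group : groups) {
            // A group with fewer than two terms defines no synonym, so skip it.
            if (group.size() < 2) {
                continue;
            }
            sb.append(String.join(", ", group)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample rows, made up for illustration.
        List<List<String>> groups = Arrays.asList(
                Arrays.asList("tv", "television", "telly"),
                Arrays.asList("orphan"),
                Arrays.asList("laptop", "notebook"));
        System.out.print(toSynonymsTxt(groups));
    }
}
```

After the file is written into the core's conf directory, the core has to be reloaded for the new synonyms to take effect, as noted earlier in the thread.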

Re: syntax for negative query OR something

2012-05-02 Thread Ryan McKinley
thanks! On Wed, May 2, 2012 at 4:43 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : How do I search for things that have no value or a specified value? Things with no value...        (*:* -fieldName:[* TO *]) Things with a specific value...        fieldName:A Things with no value

Re: Lucene FieldCache - Out of memory exception

2012-05-02 Thread Rahul R
Jack, Yes, the queries work fine till I hit the OOM. The fields that start with S_* are strings, F_* are floats, I_* are ints and so on. The dynamic field definitions from schema.xml: <dynamicField name="S_*" type="string" indexed="true" stored="true" omitNorms="true"/> <dynamicField name="I_*"