Why Jboss server is stopped due to SOLR

2011-10-27 Thread kiran.bodigam
I am trying to connect to Solr from Java code using URLConnection. I have
deployed the Solr WAR file in a JBoss server (assume the server machine is in
some other, remote location). It works fine as long as no exception is raised,
but if an exception such as a connection failure occurs on the server side, it
stops the JBoss client (the client machine where my Java code resides).
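For illustration, a minimal sketch of posting an update over HttpURLConnection
that treats a refused connection as a recoverable error instead of halting the
JVM (the class name, URL and payload below are placeholders):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class SolrPost {
    public static void main(String[] args) {
        try {
            URL url = new URL("http://xx.yy.zzz:8080/solr/update");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setDoOutput(true);
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
            OutputStream out = conn.getOutputStream();
            out.write("<commit/>".getBytes("UTF-8"));
            out.close();
            System.out.println("Solr responded: " + conn.getResponseCode());
        } catch (java.net.ConnectException e) {
            // Connection refused: log it and keep the client JVM running;
            // do NOT call System.exit() or Runtime.halt() here.
            System.err.println("Is Solr running? " + e.getMessage());
        } catch (Exception e) {
            System.err.println("Update failed: " + e.getMessage());
        }
    }
}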


11:49:38,345 INFO  [STDOUT] [2011-10-27 11:49:38.345] class =
com.dstsystems.adc.efs.rs.util.SimplePost,method = fatal(),level = SEVERE:
,message = Connection error (is Solr running at
http://xx.yy.zzz:8080/solr/update ?): java.net.ConnectException: Connection
refused: connect
11:49:38,361 INFO  [Server] Runtime shutdown hook called, forceHalt: true
11:49:38,376 INFO  [Server] JBoss SHUTDOWN: Undeploying all packages
11:49:48,018 INFO  [TransactionManagerService] Stopping recovery manager
11:49:48,128 INFO  [Server] Shutdown complete
Shutdown complete..



Re: Query/Delete performance difference between straight HTTP and SolrJ

2011-10-27 Thread Michael Kuhlmann
On 26.10.2011 18:29, Shawn Heisey wrote:
 For inserting, I do use a Collection of SolrInputDocuments.  The delete
 process grabs values from idx_delete, does a query like the above (the
 part that's slow in Java), then if any documents are found, issues a
 deleteByQuery with the same string.

Why do you first query for these documents? Why don't you just delete
them? It won't harm Solr if no documents are affected by your delete query,
and you'll get the number of affected documents in your response anyway.

When deleting, SolrJ does nearly nothing on its own; it just sends the
POST request and analyzes the simple response. The behaviour for a get
request is similar. We do thousands of update, delete and get requests
per minute using SolrJ without problems; your timing problems must come
from somewhere else.
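A minimal SolrJ delete-by-query sketch (Solr 1.4/3.x-era API; the URL and
query string are placeholders):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DeleteExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // No need to query first; deleting zero documents is harmless.
        server.deleteByQuery("idx_delete:(1 OR 2 OR 3)");
        server.commit();
    }
}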

-Kuli


Re: Query/Delete performance difference between straight HTTP and SolrJ

2011-10-27 Thread Michael Kuhlmann
Sorry, I was wrong.

On 27.10.2011 09:36, Michael Kuhlmann wrote:
 and you'll get the number of affected documents in your response anyway.

That's not true; you don't get the affected document count. Still, it
remains true that you don't need to check for documents first, at least
not when you don't need this information somewhere else.

-Kuli


Re: DisMax search

2011-10-27 Thread Ahmet Arslan
 I am searching for 9065 , so its not
 about case sensitivity. My search is
 searching across all the field names and not limiting it to
 one
 field(specified in the qf param and using deftype dismax)

By case sensitivity, Erik was referring to the def*T*ype parameter name
itself (not the value of the query).

http://wiki.apache.org/solr/CommonQueryParameters#defType


Re: Get results ordered by field content starting with specific word

2011-10-27 Thread Ahmet Arslan


--- On Wed, 10/26/11, darul daru...@gmail.com wrote:

 From: darul daru...@gmail.com
 Subject: Get results ordered by field content starting with specific word
 To: solr-user@lucene.apache.org
 Date: Wednesday, October 26, 2011, 11:36 PM
 I have seen many threads talking
 about it but not found any way on how to
 resolve it.
 
 In my schema 2 fields :
 
 
 
 Results are sorted by field2 desc like in the following
 listing when looking
 for word1 as query pattern:
 
 
 
 I would like to get Doc3 at the end because word1 is not
 at the beginning
 of the field content.
 
 Have you any idea ? 
 
 I have seen SpanNearQuery, tried FuzzySearch with no
 success etc...maybe
 making a special QueryParserPlugin, but I am lost ;)

Maybe you can make use of SpanFirstQuery.

http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/search/spans/SpanFirstQuery.html

However, I would insert an artificial token (e.g. BEGIN_OF_DOC) before indexing 
my fields, and use a phrase query or something similar to boost documents that 
start with word1, e.g. BEGIN_OF_DOC word1.
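A sketch of SpanFirstQuery with the Lucene 3.x span API (the field and term
names are placeholders):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class SpanFirstExample {
    // Matches documents where "word1" occurs at the very first token position.
    public static Query firstPositionQuery() {
        return new SpanFirstQuery(new SpanTermQuery(new Term("field1", "word1")), 1);
    }
}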




Re: Optimization /Commit memory

2011-10-27 Thread Sujatha Arun
Thanks Simon and Jay. That was helpful.

So what we are looking at during optimize is 2 or 3 times the index size in
free disk space to recreate the index.

Regards
Sujatha



On Wed, Oct 26, 2011 at 12:26 AM, Simon Willnauer 
simon.willna...@googlemail.com wrote:

 RAM costs during optimize / merge are generally low. Optimize is
 basically a merge of all segments into one, though there are
 exceptions. Lucene streams existing segments from disk and serializes
 the new segment on the fly. When you optimize, or in general when you
 merge segments, you need disk space for the source segments and the
 target (merged) segment.

 If you use the CompoundFileSystem (CFS) you need additional space once
 the merge is done and your files are packed into the CFS, which is
 basically the size of the target (merged) segment. Once the merge is
 done Lucene can free the disk space, unless you have an IndexReader open
 that references those segments (Lucene keeps track of these files and
 frees disk space once possible).
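 As a quick worked example (illustrative numbers only): merging a 10 GB
 index down to a single segment needs roughly 10 GB for the source
 segments plus 10 GB for the target, and with CFS up to another 10 GB
 while the files are packed - which is where the usual "2 or 3 times the
 index size" free-disk-space rule of thumb comes from.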

 That said, I think you should use optimize very, very rarely. If your
 document collection is rarely changing, optimize is useful and
 reasonable once in a while. If your collection is constantly changing
 you should rely on the merge policy to balance the number of segments
 for you in the background. Lucene 3.4 has a nicely improved
 TieredMergePolicy that does a great job (previous versions are also
 good - just saying).

 A commit is basically flushing the segment you have in memory
 (IndexWriter memory) to disk. The compression ratio can be up to 30% of
 the RAM cost or even more, depending on your data. The actual commit
 doesn't need a notable amount of memory.

 hope this helps

 simon

 On Mon, Oct 24, 2011 at 7:38 PM, Jaeger, Jay - DOT
 jay.jae...@dot.wi.gov wrote:
  I have not spent a lot of time researching it, but one would expect that
 the OS RAM requirement for optimization of an index to be minimal.
 
  My understanding is that during optimization an essentially new index is
 built.  Once complete it switches out the indexes and will throw away the
 old one.  (In Windows it may not throw away the old one until the next
 Commit).
 
  JRJ
 
  -Original Message-
  From: Sujatha Arun [mailto:suja.a...@gmail.com]
  Sent: Friday, October 21, 2011 12:10 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Optimization /Commit memory
 
  Just one more thing: when we are talking about optimization, we
  are referring to free HD space for replicating the index (2 or 3 times
  the index size). What is the role of RAM (OS) here?
 
  Regards
  Sujatha
 
  On Fri, Oct 21, 2011 at 10:12 AM, Sujatha Arun suja.a...@gmail.com
 wrote:
 
  Thanks that helps.
 
  Regards
  Sujatha
 
 
  On Thu, Oct 20, 2011 at 6:23 PM, Jaeger, Jay - DOT 
 jay.jae...@dot.wi.gov wrote:
 
  Well, since the OS RAM includes the JVM RAM, that is part of your
  requirement, yes?  Aside from the JVM and normal OS requirements, all you
  need OS RAM for is file caching.  Thus, for updates, the OS RAM is not a
  major factor.  For searches, you want sufficient OS RAM to cache enough of
  the index to get the query performance you need, and to cache queries inside
  the JVM if you get a lot of repeat queries (see solrconfig.xml for the
  various caches: we have not played with them much).  So, the amount of RAM
  necessary for that is very much dependent upon the size of your index, so I
  cannot give you a simple number.
 
  You seem to believe that you have to have sufficient memory to have the
  entire index in memory.  Except where extremely high performance is
  required, I have not found that to be the case.
 
  This is just one of those "your mileage may vary" things.  There is not a
  single answer or formula that fits every situation.
 
  JRJ
 
  -Original Message-
  From: Sujatha Arun [mailto:suja.a...@gmail.com]
  Sent: Wednesday, October 19, 2011 11:58 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Optimization /Commit memory
 
  Thanks  Jay ,
 
  I was trying to compute the *OS RAM requirement*, *not JVM RAM*, for a
  14 GB index [cumulative index size of all instances]. And I put it thus -

  Requirement of operating system RAM for an index of 14GB is: index size
  + 3 times the maximum index size of an individual instance, for optimize.

  That is to say, I have several instances, with a combined index size of
  14GB. The maximum individual index size is 2.5GB, so my requirement for
  OS RAM is 14GB + 3 * 2.5GB ~= 22GB.
 
  Correct?
 
  Regards
  Sujatha
 
 
 
  On Thu, Oct 20, 2011 at 3:45 AM, Jaeger, Jay - DOT 
 jay.jae...@dot.wi.gov
  wrote:
 
   Commit does not particularly spike disk or memory usage, unless you are
   adding a very large number of documents between commits.  A commit can
   cause a need to merge indexes, which can increase disk space temporarily.
   An optimize is *likely* to merge indexes, which will usually increase
   disk space temporarily.
  
   How much disk space depends very much 

Re: Get results ordered by field content starting with specific word

2011-10-27 Thread darul
Well, at index time I cannot touch anything because we do not have the data
to index anymore.

To use SpanFirstQuery, do I need to write a custom QueryParser?



Re: Get results ordered by field content starting with specific word

2011-10-27 Thread Ahmet Arslan
 Well, at index time I cannot touch anything because we do not have the
 data to index anymore.

 To use SpanFirstQuery, do I need to write a custom QueryParser?

If re-indexing is not an option, then writing a custom query parser is
necessary to use SpanFirstQuery. You need to add it as an optional clause
(with a high boost) to your whole boolean query.

Also, you can try out and vote for SOLR-839. With it, it may be possible to
use SpanFirstQuery.

https://issues.apache.org/jira/browse/SOLR-839


Search calendar avaliability

2011-10-27 Thread Anatoli Matuskova
hello,
I want to filter search results by calendar availability. For each document
I know the days on which it is not available.
How could I build my fields to filter the documents that are available in a
range of dates?
For example, document A is available from 1-9-2011 to 5-9-2011 and is
available from 17-9-2011 to 22-9-2011 too (it's not available in the gap in
between).
If the filter query asks for availability from 2-9-2011 to 4-9-2011, docA
would be a match.
If the filter query asks for availability from 2-9-2011 to 20-9-2011, docA
wouldn't be a match: even though the start and end are available, there's a
gap of no availability between them.
Is this possible with Solr?



RE: Can dynamic fields defined by a prefix be used with LatLonType?

2011-10-27 Thread Tom Cooke
It appears that the solution to this is to make the pattern for
your component field longer than the pattern for your dynamic
parent field, so that the component field takes precedence.

For example, *__coordinate is longer than OBJECT_LL_*, so it will take
precedence.
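For illustration, the two declarations might look like this (a sketch only:
the location/tdouble type names follow the stock example schema, and
subFieldSuffix is assumed to be set to __coordinate on the LatLonType):

   <fieldType name="location" class="solr.LatLonType" subFieldSuffix="__coordinate"/>

   <dynamicField name="OBJECT_LL_*"   type="location" indexed="true" stored="true"/>
   <!-- the longer pattern wins, so the generated sub-fields land here -->
   <dynamicField name="*__coordinate" type="tdouble"  indexed="true" stored="false"/>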

-Original Message-
From: Tom Cooke [mailto:tom.co...@gossinteractive.com]
Sent: 26 October 2011 20:06
To: solr-user@lucene.apache.org
Subject: Can dynamic fields defined by a prefix be used with LatLonType?

Hi,



I'm adding support for lat/lon data to an existing schema which uses
prefix-based dynamic fields, e.g. OBJECT_I_*.  I would like to add
OBJECT_LL_* as a dynamic field for LatLonType data, but it seems that
LatLonType always needs to add suffixes for the dynamically created
subfields, which leads to a generated field name that not only
matches the subfield suffix (e.g. *_coordinate) but also matches
OBJECT_LL_*, leading to a clash.



Is there any way around this other than always using a suffix-based
approach to define any dynamic fields that contain LatLonType data?



Thanks,



Tom










Re: help needed on solr-uima integration

2011-10-27 Thread Koji Sekiguchi

(11/10/27 9:12), Xue-Feng Yang wrote:

Hi,

 From the Solr Info page, I can see my solr-uima core is there, but
updateRequestProcessorChain is not there. What is the reason?


Because UpdateRequestProcessor (and Chain) is not a type of SolrInfoMBean.
(The classes you can see on that page implement SolrInfoMBean, which is why
they appear.)

koji
--
Check out Query Log Visualizer for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/


Limit by score? sort by other field

2011-10-27 Thread Robert Brown
When we display search results to our users we include a percentage score.

Top result being 100%, then all others normalised against the maxScore,
calculated outside of Solr.

We now want to limit returned docs to those with a percentage score higher
than, say, 50%.

E.g. we want to search but only return docs scoring above 80%, but we want
to sort by date - hence not being able to just sort by score.




Re: Search calendar avaliability

2011-10-27 Thread Per Newgro

what you are looking for is imho not really Solr-specific.
The topic would be solr as a temporal database.
In your case, if you have a timeline from 0 to 10 and you have two
documents, one from 1 to 6 and one from 5 to 13, you can get all documents
within 0 - 10 by querying document.end >= 0 and document.start <= 10.
Whether to use greater/less or the -or-equal variants depends on your
definition of outside and inside the interval. But beware of the swapped
fields end and start.

Hth
Per

On 27.10.2011 12:06, Anatoli Matuskova wrote:

hello,
I want to filter search results by calendar availability. For each document
I know the days on which it is not available.
How could I build my fields to filter the documents that are available in a
range of dates?
For example, document A is available from 1-9-2011 to 5-9-2011 and is
available from 17-9-2011 to 22-9-2011 too (it's not available in the gap in
between).
If the filter query asks for availability from 2-9-2011 to 4-9-2011, docA
would be a match.
If the filter query asks for availability from 2-9-2011 to 20-9-2011, docA
wouldn't be a match: even though the start and end are available, there's a
gap of no availability between them.
Is this possible with Solr?






Re: Search calendar avaliability

2011-10-27 Thread lee carroll
do your docs have daily availability ?
if so you could index each doc for each day (rather than have some
logic embedded in your data)

so instead of doc1 (1/9/2011 - 5/9/2011)
you have
doc1 1/9/2011
doc1 2/9/2011
doc1 3/9/2011
doc1 4/9/2011
doc1 5/9/2011

this makes search much easier and more flexible. If needed you can collapse
on doc id to present results to the user at doc level,
or even group by date.

The problem you have is that you have logic and data in one field;
get rid of the logic and just store the data.
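For illustration, the day-per-document check then becomes a plain range
filter, optionally grouped back to doc level (the field names here are
hypothetical, and result grouping needs Solr 3.3+):

select?q=*:*&fq=available_date:[2011-09-02T00:00:00Z TO 2011-09-04T00:00:00Z]&group=true&group.field=doc_id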

Cheers Lee c


On 27 October 2011 12:36, Per Newgro per.new...@gmx.ch wrote:
 what you are looking for is imho not really Solr-specific.
 The topic would be solr as a temporal database.
 In your case, if you have a timeline from 0 to 10 and you have two
 documents, one from 1 to 6 and one from 5 to 13, you can get all documents
 within 0 - 10 by querying document.end >= 0 and document.start <= 10.
 Whether to use greater/less or the -or-equal variants depends on your
 definition of outside and inside the interval. But beware of the swapped
 fields end and start.

 Hth
 Per

 Am 27.10.2011 12:06, schrieb Anatoli Matuskova:

 hello,
 I want to filter search results by calendar availability. For each document
 I know the days on which it is not available.
 How could I build my fields to filter the documents that are available in a
 range of dates?
 For example, document A is available from 1-9-2011 to 5-9-2011 and is
 available from 17-9-2011 to 22-9-2011 too (it's not available in the gap in
 between).
 If the filter query asks for availability from 2-9-2011 to 4-9-2011, docA
 would be a match.
 If the filter query asks for availability from 2-9-2011 to 20-9-2011, docA
 wouldn't be a match: even though the start and end are available, there's a
 gap of no availability between them.
 Is this possible with Solr?






Re: MoreLikeThis - To many hits

2011-10-27 Thread Erick Erickson
Have you tried varying mintf and mindf? Setting them higher than 1
seems like it would reduce the number of docs returned.


Best
Erick

On Tue, Oct 25, 2011 at 2:57 AM, vraa allanv...@gmail.com wrote:
 Hi

 I'm using the MoreLikeThis functionality
 http://wiki.apache.org/solr/MoreLikeThis , and it works almost perfectly for
 my situation.

 But I get too many hits, and maybe that's the whole idea of MoreLikeThis,
 but I'm gonna ask anyway.

 My query looks like this:
 /select/?q=id:11&mlt=true&mlt.match.include=true&mlt.fl=make,model,variant&mlt.mindf=1&mlt.mintf=1&fl=id,score,make,model,variant

 The id is a Lamborghini. There are only 8 Lamborghinis in my database and
 still I get a lot more hits.
 Is it possible to make Solr return only 8 results for this query? That
 would mean Solr must interpret the query so that there must be a hit on all
 of the mlt.fl fields. If not, then remove the last of the mlt.fl (variant)
 and try again. If no hits, then remove model, and so forth.

 Does it make sense?




Re: Query/Delete performance difference between straight HTTP and SolrJ

2011-10-27 Thread Michael Sokolov
From everything you've said, it certainly sounds like a low-level I/O 
problem in the client, not a server slowdown of any sort.  Maybe Perl is 
using the same connection over and over (keep-alive) and Java is not; I 
really don't know.  One thing I've heard is that 
StreamingUpdateSolrServer (I think that's what it's called) can give 
better throughput for large request batches.  If you're not using that, 
you may be having problems with closing and re-opening connections.
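A minimal sketch of switching to StreamingUpdateSolrServer (SolrJ 1.4/3.x);
it keeps connections open and batches requests over several worker threads
(the URL, queue size and thread count are placeholders):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;

public class StreamingExample {
    public static SolrServer create() throws Exception {
        // Queue up to 100 requests and drain them with 4 background threads.
        return new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 4);
    }
}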


-Mike

On 10/26/2011 9:56 PM, Shawn Heisey wrote:

On 10/26/2011 6:16 PM, Michael Sokolov wrote:
Have you checked to see when you are committing?  Is the pattern the 
same in both instances?  If you are committing after each delete 
request in Java, but not in Perl, that could slow things down.


Due to the multithreading of delete requests, I now have the full 
delete down to 10-15 seconds instead of a minute or more.  This is now 
an acceptable time, but I am completely mystified as to why the Perl 
code can do it without multithreading just as fast, and often faster.  
The Java code is long-running, and the Perl code is started by cron.  
If you look back to the first message in the thread, you'll see commit 
messages in the Perl log, but those commits are done with the wait 
options set to false.  That's an extra step the Java code isn't doing 
- and it's STILL faster.




Regarding Solr Query

2011-10-27 Thread Sahoo, Jayanta
I have one query regarding Solr search. I have keywords like wireleess
mobilty kit that I need to search for, but I am not able to get results when
I do the search. But when I have manually added entries in synonyms.txt like
[wirelss, wireless access, etc.] I am able to find the products related to
this. Please help me out: without putting any input in synonyms.txt, how am I
able to search?

Hints: the wireleess mobilty kit product is already indexed in Solr when I
am searching.
Is synonyms.txt checked when the search happens?

Please let me know any solution for this ASAP, it's an urgent requirement for me

Regards,
Jayanta



Re: Queries suggestion (not the suggester :P)

2011-10-27 Thread Erick Erickson
I've seen something like this done with an index of queries. That is, you
index actual user queries in some new core where each document is
a query. Then you issue the terms of the new query against this index
and get back similar documents (that are really queries). You'll want
to take some care about what is actually indexed, but that's an
exercise for the reader.

You might wind up using edismax as your request handler in order to
use some of the tuning parameters for how tight/loose you want your
responses to be, or maybe just ORing the terms together and counting
on the ranking to be OK.

Best
Erick

On Tue, Oct 25, 2011 at 6:21 AM, Simone Tripodi
simonetrip...@apache.org wrote:
 Hi all guys,
 I'm working on a search service that uses solr as search engine and
 the results are provided in the Atom form, containing some OpenSearch
 tags.

 What I'm interested to understand is whether it is possible, via solr,
 to have in the response some suggestions for other queries in order to
 enrich our opensearch info, i.e. a user submits `General Motors annual
 report` and solr answers the results with information to form a
 `General Motors annual report 2005` subset or a `General Motors`
 superset, so the reply can be transformed to:

   <opensearch:Query role="request" searchTerms="General Motors annual report" />
   <opensearch:Query role="subset" searchTerms="General Motors annual report 2005" />
   <opensearch:Query role="superset" searchTerms="General Motors" />

 So my question is: is this possible? And if yes... how? :)

 Many thanks in advance, every suggestion would be really appreciated!
 Have a nice day, all the best,
 Simo

 http://people.apache.org/~simonetripodi/
 http://simonetripodi.livejournal.com/
 http://twitter.com/simonetripodi
 http://www.99soft.org/



Re: Search for the single hash # character never returns results

2011-10-27 Thread Erick Erickson
Take a look at your admin/analysis page and put your tokens in for both
index and query times. What I think you'll see is that the # is being
stripped at query time due to the first PatternReplaceFilterFactory.

You probably want to split your analyzers into an index-time and query-time
pair and do the appropriate replacements to keep # at query time.
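An untested sketch of what such a query-time chain might look like for this
field (the tokenizer pattern is loosened to keep any first character, and '#'
is added to the cleanup filter's keep-list):

   <analyzer type="query">
     <tokenizer class="solr.PatternTokenizerFactory" pattern="^(.).*" group="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.TrimFilterFactory"/>
     <!-- keep '#' so a literal hash survives query analysis -->
     <filter class="solr.PatternReplaceFilterFactory"
             pattern="([^a-z0-9#])" replacement="" replace="all"/>
     <filter class="solr.PatternReplaceFilterFactory"
             pattern="([0-9])" replacement="#" replace="all"/>
   </analyzer>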


Best
Erick

On Tue, Oct 25, 2011 at 12:27 PM, Daniel Bradley
daniel.brad...@adfero.co.uk wrote:
 When running a search such as:
  field_name:#
  field_name:"#"
  field_name:\#

 where there is a record with the value of exactly #, solr returns 0 rows.

 The workaround we are having to use is a range query on the
 field, such as:
  field_name:[# TO #]
 and this returns the correct documents.

 Use case details:
 We have a field that indexes a text field and calculates a letter
 group. This keeps only the first significant character from a value
 (number or letter), and if it is a number it simply stores # as we
 want all numbered items grouped together.

 I'm also aware that we could fix this by using a specific number
 instead of the hash character; however, I thought I'd raise this to see
 if there is a wider issue. I've listed some specific details below.

 Thanks for your time,

 Daniel Bradley


 Field definition:
    <fieldType name="letterGrouping" class="solr.TextField"
        sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.PatternTokenizerFactory"
            pattern="^([a-zA-Z0-9]).*" group="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory"
            pattern="([^a-z0-9])" replacement="" replace="all"/>
        <filter class="solr.PatternReplaceFilterFactory"
            pattern="([0-9])" replacement="#" replace="all"/>
      </analyzer>
    </fieldType>

 Server information:
 Solr Specification Version: 3.2.0
 Solr Implementation Version: 3.2.0 1129474 - rmuir - 2011-05-30 23:07:15
 Lucene Specification Version: 3.2.0
 Lucene Implementation Version: 3.2.0 1129474 - 2011-05-30 23:08:57



Re: DisMax and WordDelimiterFilterFactory

2011-10-27 Thread Erick Erickson
What happens if you change your WDDF definition in the query part of
your analysis chain to NOT split on case change? Then your index should
contain the right fragments (and combined words) and your queries would
match.

I admit I haven't thought this through entirely, but I think this would
work for your example. Unfortunately I suspect it would break other
cases... I suspect you're in a lesser-of-two-evils situation.

But I can't imagine a 100% solution here. You're effectively asking to
compensate for any fat-fingered thing a user does. Impossible, I think...

Best
Erick

On Tue, Oct 25, 2011 at 1:13 PM, Demian Katz demian.k...@villanova.edu wrote:
 I've seen a couple of threads related to this subject (for example, 
 http://www.mail-archive.com/solr-user@lucene.apache.org/msg33400.html), but I 
 haven't found an answer that addresses the aspect of the problem that 
 concerns me...

 I have a field type set up like this:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.ICUTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
         generateNumberParts="1" catenateWords="1" catenateNumbers="1"
         catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
         words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.ICUTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
         ignoreCase="true" expand="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
         generateNumberParts="1" catenateWords="0" catenateNumbers="0"
         catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
         words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

 The important feature here is the use of WordDelimiterFilterFactory, which 
 allows a search for "WiFi" to match an indexed term of "wi fi" (for example).

 The problem, of course, is that if a user accidentally introduces a case 
 change in their query, the query analyzer chain breaks it into multiple words 
 and no hits are found...  so a search for "exaMple" will look for "exa mple" 
 and fail.

 I've found two solutions that resolve this problem in the admin panel field 
 analysis tool:


 1.)    Turn on catenateWords and catenateNumbers in the query analyzer - this 
 reassembles the user's broken word and allows a match.

 2.)    Turn on preserveOriginal in the query analyzer - this passes through 
 the user's original query, which then gets cleaned up by the 
 ICUFoldingFilterFactory and allows a match.
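 For reference, option 2 is a one-attribute change on the query-time filter 
 (a sketch; all other attributes stay as above):

     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" 
      generateNumberParts="1" catenateWords="0" catenateNumbers="0" 
      catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>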

 The problem is that in my real-world application, which uses DisMax, neither 
 of these solutions works.  It appears that even though (if I understand 
 correctly) the WordDelimiterFilterFactory is returning ALTERNATIVE tokens, 
 the DisMax handler is combining them in a way that requires all of them to 
 match in an inappropriate way...  for example, here's partial debugQuery 
 output for the "exaMple" search using DisMax and solution #2 above:

    parsedquery: +DisjunctionMaxQuery((genre:\"(exampl exa) mple\"^300.0 | 
 title_new:\"(exampl exa) mple\"^100.0 | topic:\"(exampl exa) mple\"^500.0 | 
 series:\"(exampl exa) mple\"^50.0 | title_full_unstemmed:\"(example exa) 
 mple\"^600.0 | geographic:\"(exampl exa) mple\"^300.0 | contents:\"(exampl 
 exa) mple\"^10.0 | fulltext_unstemmed:\"(example exa) mple\"^10.0 | 
 allfields_unstemmed:\"(example exa) mple\"^10.0 | title_alt:\"(exampl exa) 
 mple\"^200.0 | series2:\"(exampl exa) mple\"^30.0 | title_short:\"(exampl 
 exa) mple\"^750.0 | author:\"(example exa) mple\"^300.0 | title:\"(exampl 
 exa) mple\"^500.0 | topic_unstemmed:\"(example exa) mple\"^550.0 | 
 allfields:\"(exampl exa) mple\" | author_fuller:\"(example exa) mple\"^150.0 
 | title_full:\"(exampl exa) mple\"^400.0 | fulltext:\"(exampl exa) mple\")) 
 ()

 Obviously, that is not what I want - ideally it would be something like 
 'exampl OR "ex ample"'.

 I also read about the autoGeneratePhraseQueries setting, but that seems to 
 take things way too far in the opposite direction - if I set that to false, 
 then I get matches for any individual token; i.e. "example OR ex OR ample" - 
 not good at all!

 I have a sinking suspicion that there is not an easy solution to my problem, 
 but this seems to be a fairly basic need; splitOnCaseChange is a useful 
 feature to have, but it's more 

Re: solr.PatternReplaceFilterFactory AND endoffset

2011-10-27 Thread Erick Erickson
What does your admin/analysis page show? And how about the
results with debugQuery=on?

Best
Erick

On Wed, Oct 26, 2011 at 5:34 AM, roySolr royrutten1...@gmail.com wrote:
 Hi,

 I have some problems with the PatternReplaceFilter. I can't use the
 WordDelimiter filter because I only want to replace some special chars
 chosen by myself.

 Some examples:

 Tottemham-hotspur (london)
 Arsenal (london)

 I want this:
 replace - with a space
 replace ( or ) with an empty string

 In the analysis page I see this:

 position        1
 term text       tottemham hotspur london
 startOffset     0
 endOffset       26

 So the replace filter works. Now I want to search for tottemham hotspur
 london. This gives no results.

 position        1
 term text       tottemham hotspur london
 startOffset     0
 endOffset       24

 It works when I search for tottemham-hotspur (london).
 I think the problem is the difference in offset (24 vs 26). I need some
 help...






Faceting on multiple fields, with multiple where clauses

2011-10-27 Thread Rubinho
hi,

I have the following situation:
- A dropdownlist to search trips by country
- A dropdownlist to search trips by departure period (range/month)

I want to have facet results on these fields.
When I select a value in one of the dropdownlists, I receive the correct
numbers (facets).
If Country = Belgium, then I receive the original number of trips per
country and the number of trips per departure date for Belgium.

But when I combine this search with a country and a departure period, I
expect to receive:
- the number of trips per country in the selected departure period (for the
first dropdownlist)
AND
- the number of trips in the selected country (for the second dropdownlist)

But for some reason, I can't get the correct values when I combine these 2
filters.
I receive the correct number of trips/period, but the countries aren't
filtered by this period anymore.
Can somebody explain to me what I'm doing wrong?

This is the query for the combined search:
http://localhost:8080/solr/select/?facet=true&facet.date={!ex=SD}StartDate&f.StartDate.facet.date.start=2011-10-1T00:00:00Z&f.StartDate.facet.date.end=2012-09-30T00:00:00Z&facet.field={!ex=CC}CountryCode&f.StartDate.facet.date.gap=%2B1MONTH&rows=0&version=2.2&q={!tag=CC}CountryCode:ID&q={!tag=SD}StartDate:[2011-11-01T00:00:00Z TO 2011-11-30T00:00:00Z]




Re: Faceting on multiple fields, with multiple where clauses

2011-10-27 Thread Erik Hatcher
You've got two q parameters.  For filtering on facet values, you're better off 
using fq parameters instead (and if there is no other query, set q=*:*; or if 
using dismax, set q.alt=*:* and leave q empty/unspecified).  Only one q 
parameter is used, but any number of fq parameters may be specified.
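For illustration, the combined request above rewritten with fq (a sketch
reusing the same tags and fields):

http://localhost:8080/solr/select/?q=*:*&fq={!tag=CC}CountryCode:ID&fq={!tag=SD}StartDate:[2011-11-01T00:00:00Z TO 2011-11-30T00:00:00Z]&facet=true&facet.field={!ex=CC}CountryCode&facet.date={!ex=SD}StartDate&rows=0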

Erik

On Oct 27, 2011, at 08:09 , Rubinho wrote:

 hi,
 
 I have the following situation:
 - A dropdownlist to search trips by Country
 - A dropdownlist to search trips by departureperiod (range/month)
 
 I want to have facetresults on these fields.
 When i select a value in 1 of the dropdownlists, i receive the correct
 numbers (facets)
 If Country = Belgium, then i receive the original number of trips per
 country and the number of trips per departuredate for Belgium.
 
 But, when i combine this search with a country and a departureperiod, then i
 expect to receive:
 - the number of trips per country in the selected departureperiod (for first
 dropdownlist)
 AND
 - the number of trips in the selected country (for second dropdownlist)
 
 But for some reason, i can't get the correct values when i combine these 2
 filters.
 I receive the correct number of trips/period, but the countries aren't
 filtered by this period anymore.
 Can somebody explain me what i'm doing wrong?
 
  This is the query for the combined search:
  http://localhost:8080/solr/select/?facet=true&facet.date={!ex=SD}StartDate&f.StartDate.facet.date.start=2011-10-1T00:00:00Z&f.StartDate.facet.date.end=2012-09-30T00:00:00Z&facet.field={!ex=CC}CountryCode&f.StartDate.facet.date.gap=%2B1MONTH&rows=0&version=2.2&q={!tag=CC}CountryCode:ID&q={!tag=SD}StartDate:[2011-11-01T00:00:00Z TO 2011-11-30T00:00:00Z]
 
 



Re: Search for the single hash # character never returns results

2011-10-27 Thread Daniel Bradley
Fantastic, thanks - yes, I completely overlooked that case; separating the
analysers worked a treat.

Had also posted on stack overflow but the mailing list proved to be
superior!

Many thanks,

Daniel

On 27 October 2011 13:09, Erick Erickson erickerick...@gmail.com wrote:

 Take a look at your admin/analysis page and put your tokens in for both
 index and query times. What I think you'll see is that the # is being
 stripped at query time due to the first PatternReplaceFilterFactory.

  You probably want to split your analyzers into an index-time and query-time
  pair and do the appropriate replacements to keep # at query time.


 Best
 Erick

 On Tue, Oct 25, 2011 at 12:27 PM, Daniel Bradley
 daniel.brad...@adfero.co.uk wrote:
   When running a search such as:
    field_name:#
    field_name:"#"
    field_name:\#
 
  where there is a record with the value of exactly #, solr returns 0
 rows.
 
  The workaround we are having to use is to use a range query on the
  field such as:
   field_name:[# TO #]
  and this returns the correct documents.
 
  Use case details:
  We have a field that indexes a text field and calculates a letter
  group. This keeps only the first significant character from a value
   (number or letter), and if it is a number it simply stores # as we
  want all numbered items grouped together.
 
  I'm also aware that we could also fix this by using a specific number
  instead of the hash character, however, I though I'd raise this to see
  if there is a wider issue. I've listed some specific details below.
 
  Thanks for your time,
 
  Daniel Bradley
 
 
  Field definition:
      <fieldType name="letterGrouping" class="solr.TextField"
          sortMissingLast="true" omitNorms="true">
        <analyzer>
          <tokenizer class="solr.PatternTokenizerFactory"
              pattern="^([a-zA-Z0-9]).*" group="1"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.TrimFilterFactory"/>
          <filter class="solr.PatternReplaceFilterFactory"
              pattern="([^a-z0-9])" replacement="" replace="all"/>
          <filter class="solr.PatternReplaceFilterFactory"
              pattern="([0-9])" replacement="#" replace="all"/>
        </analyzer>
      </fieldType>
 
  Server information:
  Solr Specification Version: 3.2.0
  Solr Implementation Version: 3.2.0 1129474 - rmuir - 2011-05-30 23:07:15
  Lucene Specification Version: 3.2.0
  Lucene Implementation Version: 3.2.0 1129474 - 2011-05-30 23:08:57
  



Re: Faceting on multiple fields, with multiple where clauses

2011-10-27 Thread Rubinho
Hi Erik,

Thank you very much.
Your hint solved the problem.

Actually, I don't understand why (I read about the difference between q and
fq, but it's still not clear to me why it didn't work with q).

But it's solved, and that's the most important thing :)


Thanks,
Ruben



Re: Upgratding the Index from 1.4.1 to 3.4 using replication

2011-10-27 Thread Tommaso Teofili
I don't think it'll work. I've tried this approach myself, and the blocking
issue was that Solr 1.4.1 uses a different javabin version than Solr 3.4 (I
think it's 1 vs 2), so the master and the slave(s) can't communicate using the
standard replication handler and thus can't exchange information and data
about the index.
My 2 cents.
Tommaso

2011/10/26 Jaeger, Jay - DOT jay.jae...@dot.wi.gov

 I very much doubt that would work:  different versions of Lucene involved,
 and Solr replication does just a streamed file copy, nothing fancy.

 JRJ

 -Original Message-
 From: Nemani, Raj [mailto:raj.nem...@turner.com]
 Sent: Wednesday, October 26, 2011 12:55 PM
 To: solr-user@lucene.apache.org
 Subject: Upgratding the Index from 1.4.1 to 3.4 using replication

 All,



 We are planning to upgrade our Solr instance from 1.4.1 to 3.4.  We
 understand that we need to re-index all the documents given the changes
 to the index structure.  If we set up a replication pipe with 1.4.1 as
 the master and 3.4 as the slave (with an empty index), would the
 replication process convert the index from 1.4.1 format to 3.4 format?



 Thanks so much in advance for your time and help.

 Raj






RE: DisMax and WordDelimiterFilterFactory (limitations of MultiPhraseQuery)

2011-10-27 Thread Demian Katz
If we change the query chain to not split on case change, then we lose half the 
benefit of that feature -- if a user types "WiFi" and the source record 
contains "wi fi", we fail to get a hit.  As you say, that may be worth 
considering if it comes down to picking the lesser evil, but I still think 
there should be a complete solution to my problem -- I'm not trying to 
compensate for every fat-fingered user behavior... just one specific one!

Ultimately, I think my problem relates to this note from the documentation 
about using phrases in the SynonymFilterFactory:

"Phrase searching (i.e. "sea biscit") will cause the QueryParser to pass the 
entire string to the analyzer, but if the SynonymFilter is configured to expand 
the synonyms, then when the QueryParser gets the resulting list of tokens back 
from the Analyzer, it will construct a MultiPhraseQuery that will not have the 
desired effect. This is because of the limited mechanism available for the 
Analyzer to indicate that two terms occupy the same position: there is no way 
to indicate that a phrase occupies the same position as a term. For our 
example the resulting MultiPhraseQuery would be "(sea | sea | seabiscuit) 
(biscuit | biscit)" which would not match the simple case of "seabiscuit" 
occurring in a document."

So I suppose I'm just running up against a fundamental limitation of Solr...  
but this seems like a fundamental limitation that might be worth overcoming -- 
I'm sure my use case is not the only one where this could matter.  Has anyone 
given this any thought?

- Demian

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Thursday, October 27, 2011 8:21 AM
 To: solr-user@lucene.apache.org
 Subject: Re: DisMax and WordDelimiterFilterFactory
 
 What happens if you change your WDDF definition in the query part of
 your analysis
 chain to NOT split on case change? Then your index should contain the
 right
 fragments (and combined words) and your queries would match.
 
 I admit I haven't thought this through entirely, but this would work
 for your example I
 think. Unfortunately I suspect it would break other cases I
 suspect you're in a
 lesser of two evils situation.
 
 But I can't imagine a 100% solution here. You're effectively asking to
 compensate for
 any fat-fingered thing a user does. Impossible I think...
 
 Best
 Erick
 
 On Tue, Oct 25, 2011 at 1:13 PM, Demian Katz
 demian.k...@villanova.edu wrote:
  I've seen a couple of threads related to this subject (for example,
 http://www.mail-archive.com/solr-user@lucene.apache.org/msg33400.html),
 but I haven't found an answer that addresses the aspect of the problem
 that concerns me...
 
  I have a field type set up like this:
 
      <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
          <tokenizer class="solr.ICUTokenizerFactory"/>
          <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
           generateNumberParts="1" catenateWords="1" catenateNumbers="1"
           catenateAll="0" splitOnCaseChange="1"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="stopwords.txt" enablePositionIncrements="true"/>
          <filter class="solr.ICUFoldingFilterFactory"/>
          <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
          <filter class="solr.SnowballPorterFilterFactory" language="English"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
        <analyzer type="query">
          <tokenizer class="solr.ICUTokenizerFactory"/>
          <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
           ignoreCase="true" expand="true"/>
          <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
           generateNumberParts="1" catenateWords="0" catenateNumbers="0"
           catenateAll="0" splitOnCaseChange="1"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="stopwords.txt" enablePositionIncrements="true"/>
          <filter class="solr.ICUFoldingFilterFactory"/>
          <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
          <filter class="solr.SnowballPorterFilterFactory" language="English"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
      </fieldType>
 
  The important feature here is the use of WordDelimiterFilterFactory,
 which allows a search for "WiFi" to match an indexed term of "wi fi"
 (for example).

  The problem, of course, is that if a user accidentally introduces a
 case change in their query, the query analyzer chain breaks it into
 multiple words and no hits are found...  so a search for "exaMple" will
 look for "exa mple" and fail.
 
  I've found two solutions that resolve this problem in the admin panel
 field analysis tool:
 
 
  1.)    Turn on catenateWords and catenateNumbers in the query
 analyzer - this reassembles the user's broken word and allows a match.
 
  2.)    Turn on preserveOriginal in the query analyzer - this passes
 through the user's original query, which then gets cleaned up bythe
 

Re: Regarding Solr Query

2011-10-27 Thread Alireza Salimi
Can you explain more: what's the fieldType, and what's the actual content of
the field in the document? Why are you trying to use synonyms?

Regards

On Thu, Oct 27, 2011 at 7:55 AM, Sahoo, Jayanta jayanta.sa...@hp.com wrote:

 I have one query regarding Solr search. I have keywords like wireleess
 mobilty kit that I need to search for, but I am not able to get results
 when I do the search. But when I have manually added entries in
 synonyms.txt like [wirelss, wireless access, etc.] I am able to find the
 products related to this. Please help me out: without putting any input in
 synonyms.txt, how am I able to search?

 Hints: the wireleess mobilty kit product is already indexed in Solr when
 I am searching.
 Is synonyms.txt checked when the search happens?

 Please let me know any solution for this ASAP, it's an urgent requirement
 for me

 Regards,
 Jayanta




-- 
Alireza Salimi
Java EE Developer


RE: Difficulties Installing Solr with Jetty 7.x

2011-10-27 Thread Jaeger, Jay - DOT
OK, so it sounds like the index.jsp welcome page setting is not the issue.  
That is not a big surprise.  (WebSphere does not have that as a global default, 
but Jetty 6 certainly did, and it looks like Jetty 7 does as well).

BTW, that should be  /solr/admin/index.jsp, as I indicated, not 
/solr/admin.index/jsp as appears in your message.  I am guessing that was just 
a typo.

If that was a typo, then your issue has nothing to do with Solr proper, but 
instead probably means that the WAR was not properly installed into Jetty or 
that the release of Jetty you snagged itself has issues.

All you *should* need to do is copy the solr.war file from the solr 
distribution into $JETTY_HOME/webapps and (perhaps) restart Jetty.  But it 
sounds like you did that, and your logs indicate that you did that.

As an aside, the web.xml originates in the WAR, and once that is deployed in 
Jetty it lives in  $JETTY_HOME/work/Jetty.solr.war.../webapp/WEB-INF .  
At least in Jetty 6.  In your case, in your logs, I saw:

 2011-10-25 16:44:51.564:INFO:oejw.WebInfConfiguration:Extract
 jar:file:/var/jetty/webapps/solr.war!/ to
 /tmp/jetty-0.0.0.0-8080-solr.war-_-any-/webapp   <-- expanded
 WAR lives here.

So it would live under the .../webapp directory as shown above.   That 
exclamation point puzzles me a little, but doesn't seem to be a real issue.

You should see, in the above .../webapp directory a file index.jsp, a folder 
admin, and a file admin/index.jsp, among other things.  If those are not 
present, then Jetty was unable to properly extract the WAR (that seems 
unlikely, but worth checking).  If they are there, then Jetty ought to be able 
to find them.

My complete WAG is that the fix will lie somewhere in the contexts/ 
directory.

Well, Jetty 6 has no such directory that I can see.  So maybe that is something 
new to Jetty 7, and it doesn't automatically take a root context from an 
application from the WAR file.  Could be.  However, in reading 
http://wiki.eclipse.org/Jetty/Howto/Deploy_Web_Applications, it looks like it 
ought to work the same as it ever did.  I do note, however, that if you 
pre-created a directory named solr/, that could mess things up.  But a simple 
touch solr.war followed by a restart of Jetty ought to cause Jetty to 
redeploy the application.

In any event, it looks to me like you might do better to post your question to 
the Jetty folks - if you can't get to /solr/index.jsp or /solr/admin/index.jsp, 
and get a 404, then that points to a Jetty related issue.

JRJ

-Original Message-
From: Scott Vanderbilt [mailto:li...@datagenic.com]
Sent: Wednesday, October 26, 2011 5:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Difficulties Installing Solr with Jetty 7.x

Jay:

Thanks for the response.

$JETTY_HOME/etc/webdefault.xml is the unmodified file that came with
Jetty, and it has a welcome-file-list referencing index.jsp,
index.html, and index.htm.

Attempting to load /solr/admin.index.jsp generates a 404. All other URLs
generate a 404 also, except /, which returns the Jetty test app home
page. Not sure if this is useful, but that page contains the following info:

   This webapp is deployed in $JETTY_HOME/webapp/test and configured
   by $JETTY_HOME/contexts/test.xml

You refer to Solr's web.xml. I have no such file, or any other config
files which are Solr-specific, so far as I can tell. I followed the Solr
wiki page instructions http://wiki.apache.org/solr/SolrInstall, so
apart from copying the solr.war into $JETTY_HOME/webapps/, the only
other thing I copied over from the Solr example distribution was the
directory apache-solr-3.4.0/example/solr/ as $JETTY_HOME/solr/.

My complete WAG is that the fix will lie somewhere in the contexts/
directory. I really see no other place to do Solr-specific configuration
apart from $JETTY_HOME/etc/, and my intuition is that these files
shouldn't be messed with unless the intention is to affect global
container-wide behavior. Which I don't. I'm only trying to get Solr
running. I may want to run other apps, so I'd rather leave Jetty's
config files as is.



On 10/26/2011 2:05 PM, Jaeger, Jay - DOT wrote:
 ERRATA: that should be the *SOLR* web.xml (not the Jetty web.xml)

 Sorry for the confusion.

 -Original Message-
 From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
 Sent: Wednesday, October 26, 2011 4:02 PM
 To: 'solr-user@lucene.apache.org'
 Subject: RE: Difficulties Installing Solr with Jetty 7.x

 From your logs, it looks like the Solr library is being found just fine, and 
 that the servlet is initing OK.

 Does your Jetty configuration specify index.jsp in a welcome list?

 We had that problem in WebSphere:  we got 404's the same way, and the cure 
 was to modify the Jetty web.xml to include:

 <welcome-file-list>
    <welcome-file>index.jsp</welcome-file>
 </welcome-file-list>

 In our Solr web.xml, and submitted a JIRA on the issue (I don't have the 
 number 

Re: Limit by score? sort by other field

2011-10-27 Thread karsten-solr
Hi Robert,

take a look to
http://lucene.472066.n3.nabble.com/How-to-cut-off-hits-with-score-below-threshold-td3219064.html#a3219117
and
http://lucene.472066.n3.nabble.com/Filter-by-relevance-td1837486.html

So will
sort=date+desc&q={!frange l=0.85}query($qq)&qq=<the original relevancy query>
help?
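Assembled into a full request, that might look like this (a sketch; the core
URL and the qq value are placeholders):

http://localhost:8983/solr/select?q={!frange l=0.85}query($qq)&qq=text:solr&sort=date+desc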


Best regards
  Karsten 

 Original Message 
 Date: Thu, 27 Oct 2011 12:30:31 +0100
 From: Robert Brown r...@intelcompute.com
 To: solr-user@lucene.apache.org
 Subject: Limit by score? sort by other field

 When we display search results to our users we include a percentage 
 score.
 
 Top result being 100%, then all others normalised based on the 
 maxScore, calculated outside of Solr.
 
 We now want to limit returned docs with a percentage score higher than 
 say, 50%.
 
 e.g. We want to search but only return docs scoring above 80%, but 
 want to sort by date, hence not being able to just sort by score.
 


Re: Limit by score? sort by other field

2011-10-27 Thread Robert Stewart
Sounds like a custom sorting collector would work - one that throws away docs 
with less than some minimum score, so that it only collects/sorts documents 
with some minimum score.  AFAIK score is calculated even if you sort by some 
other field.

On Oct 27, 2011, at 9:49 AM, karsten-s...@gmx.de wrote:

 Hi Robert,
 
 take a look to
 http://lucene.472066.n3.nabble.com/How-to-cut-off-hits-with-score-below-threshold-td3219064.html#a3219117
 and
 http://lucene.472066.n3.nabble.com/Filter-by-relevance-td1837486.html
 
 So will
 sort=date+descq={!frange l=0.85}query($qq)
 qq=the original relevancy query
 help?
 
 
 Best regards
  Karsten 
 
   Original Message 
  Date: Thu, 27 Oct 2011 12:30:31 +0100
  From: Robert Brown r...@intelcompute.com
  To: solr-user@lucene.apache.org
  Subject: Limit by score? sort by other field
 
 When we display search results to our users we include a percentage 
 score.
 
 Top result being 100%, then all others normalised based on the 
 maxScore, calculated outside of Solr.
 
 We now want to limit returned docs with a percentage score higher than 
 say, 50%.
 
 e.g. We want to search but only return docs scoring above 80%, but 
 want to sort by date, hence not being able to just sort by score.
 



Re: Limit by score? sort by other field

2011-10-27 Thread Robert Stewart
BTW, this would be a good standard feature for Solr, as I've run into this 
requirement more than once.


On Oct 27, 2011, at 9:49 AM, karsten-s...@gmx.de wrote:

 Hi Robert,
 
 take a look to
 http://lucene.472066.n3.nabble.com/How-to-cut-off-hits-with-score-below-threshold-td3219064.html#a3219117
 and
 http://lucene.472066.n3.nabble.com/Filter-by-relevance-td1837486.html
 
 So will
 sort=date+descq={!frange l=0.85}query($qq)
 qq=the original relevancy query
 help?
 
 
 Best regards
  Karsten 
 
   Original Message 
  Date: Thu, 27 Oct 2011 12:30:31 +0100
  From: Robert Brown r...@intelcompute.com
  To: solr-user@lucene.apache.org
  Subject: Limit by score? sort by other field
 
 When we display search results to our users we include a percentage 
 score.
 
 Top result being 100%, then all others normalised based on the 
 maxScore, calculated outside of Solr.
 
 We now want to limit returned docs with a percentage score higher than 
 say, 50%.
 
 e.g. We want to search but only return docs scoring above 80%, but 
 want to sort by date, hence not being able to just sort by score.
 



Re: Search calendar avaliability

2011-10-27 Thread Anatoli Matuskova
I don't like the idea of indexing a doc per value; the dataset can grow
a lot. I have thought that something like this could work:
At indexing time, if I know the dates of non-availability, I can gather the
available ones (unknown will be considered available). So I index 4 fields:
aval_yes_start, aval_yes_end, aval_no_start, aval_no_end (all are
multiValued).
If the user asks for availability from $start to $end I filter like:

fq=aval_yes_start:[$start TO $end]&fq=aval_yes_end:[$start TO $end]&fq=-aval_no_start:[$start TO $end]&fq=-aval_no_end:[$start TO $end]

This way I make sure the start date is available, the end date too, and there
are no unavailable gaps in between.
As I store ranges rather than concrete days, the number of multiValued values
shouldn't grow much, and using trie fields I think these range queries should
be fast.
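For illustration, those four fields might be declared like this (a sketch;
tdate is the Trie-based date type from the stock example schema):

<field name="aval_yes_start" type="tdate" indexed="true" stored="false" multiValued="true"/>
<field name="aval_yes_end"   type="tdate" indexed="true" stored="false" multiValued="true"/>
<field name="aval_no_start"  type="tdate" indexed="true" stored="false" multiValued="true"/>
<field name="aval_no_end"    type="tdate" indexed="true" stored="false" multiValued="true"/>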

Any better idea?
 



Re: Search calendar avaliability

2011-10-27 Thread Ted Dunning
On Thu, Oct 27, 2011 at 7:13 AM, Anatoli Matuskova 
anatoli.matusk...@gmail.com wrote:

 I don't like the idea of indexing a doc per value; the dataset can grow
 a lot.


What does a lot mean?  How high is the sky?

A million people with 3 year schedules is a billion tiny documents.

That doesn't sound like such an enormous number.


 I have thought that something like this could work:
 At indexing time, if I know the dates of non-availability, I can gather
 the available ones (unknown will be considered available). So I index 4
 fields: aval_yes_start, aval_yes_end, aval_no_start, aval_no_end (all are
 multiValued).
 If the user asks for availability from $start to $end I filter like:

 fq=aval_yes_start:[$start TO $end]&fq=aval_yes_end:[$start TO $end]&fq=-aval_no_start:[$start TO $end]&fq=-aval_no_end:[$start TO $end]


This can be done.  But given that you want long stretches of availability,
what happens when a reservation is canceled?  You have to coalesce
intervals.  That isn't impossible, but it is a pain.

Would this count as premature optimization?

Simply retrieving the days in the range and counting them gets the right
answer more directly.  Additions, deletions and modifications all just work.

If you want to drive down to a resolution of seconds, the document time slot
model doesn't work.  But for days, it probably does.


Re: Search calendar avaliability

2011-10-27 Thread Anatoli Matuskova
 What does a lot mean?  How high is the sky? 
If I have 3 milion docs I would end up with 3 milion * days avaliable

 This can be done.  And given that you want long stretches of availability, 
 but what happens when a reservation is canceled?  You have to coalesce 
 intervals.  That isn't impossible, but it is a pain. 

 Would this count as premature optimization? 

I always build the index from scratch, indexing from an external datasource,
getting the availability from there (and all the other data for a document)

 If you want to drive down to a resolution of seconds, the document time
 slot 
 model doesn't work.  But for days, it probably does. 

Yes, the availability is defined per day, not per second.

I'm trying to find a way to make this perform as well as possible.
I've found this and it's interesting too:
https://issues.apache.org/jira/browse/SOLR-1913
But the only way I see to use it is to generate dynamic fields per month and
filter using them. The problem here is that for each month I want to filter on
in a search request, I would have to load a FieldCache.getInts, and that will
quickly run out of memory.





How can I force the threshold for a fuzzy query?

2011-10-27 Thread Gustavo Falco
Hi guys,

I'm new to Solr (as you may guess for the subject). I'd like to force the
threshold for fuzzy queries to, say, 0.7. I've read that fuzzy queries are
expensive, but limiting its threshold to a number near 1 would help.

So my question is: Is this possible to configure in some of the xml
configuration files? and if that's so, if I use this query:

myField:myQuery~0.2

Would Solr use the configured threshold instead, preventing anyone from
forcing a lower value than what I've set in the xml file? Would that do
what I want?



Thanks in advance!


Re: Query/Delete performance difference between straight HTTP and SolrJ

2011-10-27 Thread Shawn Heisey

On 10/27/2011 1:36 AM, Michael Kuhlmann wrote:
Why do you first query for these documents? Why don't you just delete 
them? Solr won't harm if no documents are affected by your delete 
query, and you'll get the number of affected documents in your 
response anyway. When deleting, Solrj nearly does nothing on its own, 
it just sends the POST request and analyzes the simple response. The 
behaviour in a get request is similar. We do thousands of update, 
delete and get requests in a minute using Solrj without problems, your 
timing problems must come from somewhere else. -Kuli 


When you do a delete blind, you have to follow it up with a commit.  On 
my larger shards containing data older than approximately one week, a 
commit is resource intensive and takes 10 to 30 seconds.  As much as 75% 
of the time, there are no updates to my larger shards (10.7 million 
records each), most of the activity happens on the small shard with the 
newest data (usually under 50 records), which I call the 
incremental.  On almost every update run, there are changes to the 
incremental, but doing a commit on that shard rarely takes more than a 
second or two.
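
For illustration, a hedged SolrJ sketch of that check-then-delete flow, assuming Solr 3.x SolrJ; the field name "did" in the example query and the bare "throws Exception" are simplifications, not taken from the actual build program:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;

  // Only pay for the expensive commit when the delete query actually matches.
  void deleteIfPresent(SolrServer server, String deleteQuery) throws Exception {
    SolrQuery q = new SolrQuery(deleteQuery);  // e.g. "did:(101 OR 102 OR 103)"
    q.setRows(0);                              // we only need numFound, not documents
    if (server.query(q).getResults().getNumFound() > 0) {
      server.deleteByQuery(deleteQuery);
      server.commit();                         // the slow, cache-warming step
    }
  }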


The long commit times on the larger indexes are a result of cache 
warming, and almost all of the time is spent warming the filter cache.  
The answer to the next obvious question: autowarmCount=4 on that cache, 
with a maximum size of 64.  We are working as fast as we can on reducing 
the complexity and size of our filter queries.  It will require 
significant changes in our application.


Thanks,
Shawn



Re: Limit by score? sort by other field

2011-10-27 Thread Jason Toy
I have a similar problem except I need to filter scores that are too high. 


Robert Stewart bstewart...@gmail.com wrote on Oct 27, 2011 at 7:04 AM:

 BTW, this would be good standard feature for SOLR, as I've run into this 
 requirement more than once.
 
 
 On Oct 27, 2011, at 9:49 AM, karsten-s...@gmx.de wrote:
 
 Hi Robert,
 
 take a look to
 http://lucene.472066.n3.nabble.com/How-to-cut-off-hits-with-score-below-threshold-td3219064.html#a3219117
 and
 http://lucene.472066.n3.nabble.com/Filter-by-relevance-td1837486.html
 
 So will
 sort=date+desc&q={!frange l=0.85}query($qq)
 &qq=<the original relevancy query>
 help?
 
 
 Best regards
 Karsten 
 
  -------- Original Message --------
 Date: Thu, 27 Oct 2011 12:30:31 +0100
 From: Robert Brown r...@intelcompute.com
 To: solr-user@lucene.apache.org
 Subject: Limit by score? sort by other field
 
 When we display search results to our users we include a percentage 
 score.
 
 Top result being 100%, then all others normalised based on the 
 maxScore, calculated outside of Solr.
 
 We now want to limit returned docs with a percentage score higher than 
 say, 50%.
 
 e.g. We want to search but only return docs scoring above 80%, but 
 want to sort by date, hence not being able to just sort by score.
 
 


Re: DisMax search

2011-10-27 Thread jyn7
Sorry, my bad :( Thanks for the help, it worked. I completely overlooked
the defType.



bbox issue

2011-10-27 Thread Christopher Gross
I'm using the geohash field to store points for my data.  When I do a
bounding box like:

localhost:8080/solr/select?q=point:[-45,-80%20TO%20-24,-39]

I get a data point that falls outside the box: (-73.03358 -50.46815)

The Spatial Search (http://wiki.apache.org/solr/SpatialSearch) page says:
Exact distance calculations can be somewhat expensive and it can often
make sense to use a quick approximation instead. The bbox filter is
guaranteed to encompass all of the points of interest, but it may also
include other points that are slightly outside of the required
distance.

I had sort of assumed that doing a ranged point search would just keep
it to those points, but I'm getting items outside my requested range.

Is there a way that I can only include items within the box via a
configuration change?

Worst case, I'll store a lat/long pair and do the ranged search
myself, but then I'll have to reindex all my data and make some coding
changes in order for it to work.

Any input would be greatly appreciated!  Thanks!

-- Chris


Collection Distribution vs Replication in Solr

2011-10-27 Thread Alireza Salimi
Hi guys,

If we ignore the features that Replication provides (
http://wiki.apache.org/solr/SolrReplication#Features),
which approach is better?
Are there any performance problems with Replication?

Replication seems much easier (no special configuration, no ssh setup, no cron
setup), while rsync is a robust protocol.

Which one do you recommend?

Thanks

-- 
Alireza Salimi
Java EE Developer


Re: How can I force the threshold for a fuzzy query?

2011-10-27 Thread Simon Willnauer
I am not sure if there is such an option, but you might be able to
override your query parser and reset that value if it is too fuzzy.
Look for "protected Query newFuzzyQuery(Term term, float
minimumSimilarity, int prefixLength)"; there you can change the actual
value used for minimumSimilarity.

simon
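
A hedged sketch of that override, assuming Solr 3.x's SolrQueryParser; the class name and the 0.7 floor are invented, and you would still need to expose it through a custom QParserPlugin registered in solrconfig.xml:

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.Query;
  import org.apache.solr.schema.IndexSchema;
  import org.apache.solr.search.SolrQueryParser;

  public class ClampedFuzzyQueryParser extends SolrQueryParser {
    private static final float MIN_SIMILARITY = 0.7f;

    public ClampedFuzzyQueryParser(IndexSchema schema, String defaultField) {
      super(schema, defaultField);
    }

    @Override
    protected Query newFuzzyQuery(Term term, float minimumSimilarity, int prefixLength) {
      // Ignore looser user-supplied values such as myQuery~0.2; never go below the floor.
      return super.newFuzzyQuery(term, Math.max(minimumSimilarity, MIN_SIMILARITY), prefixLength);
    }
  }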


On Thu, Oct 27, 2011 at 4:54 PM, Gustavo Falco
comfortablynum...@gmail.com wrote:
 Hi guys,

 I'm new to Solr (as you may guess for the subject). I'd like to force the
 threshold for fuzzy queries to, say, 0.7. I've read that fuzzy queries are
 expensive, but limiting it's threshold to a number near 1 would help.

 So my question is: Is this possible to configure in some of the xml
 configuration files? and if that's so, if I use this query:

 myField:myQuery~0.2

 Would Solr use the configured threshold instead, preventing indeed that
 anyone force a minor value than what I've set in the xml file? Would it help
 for what I want to do?



 Thanks in advance!



Re: bbox issue

2011-10-27 Thread Yonik Seeley
On Thu, Oct 27, 2011 at 2:34 PM, Christopher Gross cogr...@gmail.com wrote:
 I'm using the geohash field to store points for my data.  When I do a
 bounding box like:

 localhost:8080/solr/select?q=point:[-45,-80%20TO%20-24,-39]

 I get a data point that falls outside the box: (-73.03358 
 -50.46815)

Is there a reason you're using geohash and not LatLonType?
The SpatialSearch page is really only applicable to LatLonType - other
methods are currently not supported or well tested (and geohash is not
mentioned on that page, except in reference to a "things in
development" page).

-Yonik
http://www.lucidimagination.com


 The Spatial Search (http://wiki.apache.org/solr/SpatialSearch) page says:
 Exact distance calculations can be somewhat expensive and it can often
 make sense to use a quick approximation instead. The bbox filter is
 guaranteed to encompass all of the points of interest, but it may also
 include other points that are slightly outside of the required
 distance.

 I had sort of assumed that doing a ranged point search would just keep
 it to those points, but I'm getting items outside my requested range.

 Is there a way that I can only include items within the box via a
 configuration change?

 Worst case, I'll store a lat/long pair and do the ranged search
 myself, but then I'll have to reindex all my data and make some coding
 changes in order for it to work.

 Any input would be greatly appreciated!  Thanks!

 -- Chris



Re: bbox issue

2011-10-27 Thread Christopher Gross
True -- I found the geohash on a separate page.  I was using it
because it can allow for multiple points, and I was hoping to be ahead
of the curve for allowing that feature for the data I'm managing.

I can roll back and use the LatLon type -- but then I'm still
concerned about the bounding box giving results outside the specified
range.  Or would I be better off just indexing a lat & lon in separate
fields, then making a normal numeric range search against them?

-- Chris



On Thu, Oct 27, 2011 at 3:09 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Thu, Oct 27, 2011 at 2:34 PM, Christopher Gross cogr...@gmail.com wrote:
 I'm using the geohash field to store points for my data.  When I do a
 bounding box like:

 localhost:8080/solr/select?q=point:[-45,-80%20TO%20-24,-39]

 I get a data point that falls outside the box: (-73.03358 
 -50.46815)

 Is there a reason you're using geohash and not LatLonType?
 The SpatialSearch page is really only applicable to LatLonType - other
 methods are currently not supported or well tested (and geohash is not
 mentioned on that page, except in reference to a "things in
 development" page).

 -Yonik
 http://www.lucidimagination.com


 The Spatial Search (http://wiki.apache.org/solr/SpatialSearch) page says:
 Exact distance calculations can be somewhat expensive and it can often
 make sense to use a quick approximation instead. The bbox filter is
 guaranteed to encompass all of the points of interest, but it may also
 include other points that are slightly outside of the required
 distance.

 I had sort of assumed that doing a ranged point search would just keep
 it to those points, but I'm getting items outside my requested range.

 Is there a way that I can only include items within the box via a
 configuration change?

 Worst case, I'll store a lat/long pair and do the ranged search
 myself, but then I'll have to reindex all my data and make some coding
 changes in order for it to work.

 Any input would be greatly appreciated!  Thanks!

 -- Chris




Re: How can I force the threshold for a fuzzy query?

2011-10-27 Thread Gustavo Falco
Great! I didn't think there was a way to do it. I was about to remove this
feature from my app for that reason. I'll give your advice a try.


Thanks a lot!

2011/10/27 Simon Willnauer simon.willna...@googlemail.com

 I am not sure if there is such an option but you might be able to
 override your query parser and reset that value if it is too fuzzy.
 look for   protected Query newFuzzyQuery(Term term, float
 minimumSimilarity, int prefixLength)  there you can change the actual
 value used for minimumSimilarity

 simon


 On Thu, Oct 27, 2011 at 4:54 PM, Gustavo Falco
 comfortablynum...@gmail.com wrote:
  Hi guys,
 
  I'm new to Solr (as you may guess for the subject). I'd like to force the
  threshold for fuzzy queries to, say, 0.7. I've read that fuzzy queries
 are
  expensive, but limiting it's threshold to a number near 1 would help.
 
  So my question is: Is this possible to configure in some of the xml
  configuration files? and if that's so, if I use this query:
 
  myField:myQuery~0.2
 
  Would Solr use the configured threshold instead, preventing indeed that
  anyone force a minor value than what I've set in the xml file? Would it
 help
  for what I want to do?
 
 
 
  Thanks in advance!
 



Re: bbox issue

2011-10-27 Thread Yonik Seeley
On Thu, Oct 27, 2011 at 3:22 PM, Christopher Gross cogr...@gmail.com wrote:
 I can roll back and use the LatLon type -- but then I'm still
 concerned about the bounding box giving results outside the specified
 range.

The implementation of things like bbox is intimately tied to the
field type (i.e., normally completely different code).
LatLon bbox should work fine, but please let us know if it doesn't!

-Yonik
http://www.lucidimagination.com
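
For reference, a hedged LatLonType sketch along the lines of the SpatialSearch wiki (the type and dynamic-field names follow the wiki's conventions, not this thread):

  <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
  <field name="point" type="location" indexed="true" stored="true"/>
  <dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

with the same range syntax as before, which on LatLonType behaves as a true lat/lon box:

  q=point:[-45,-80 TO -24,-39]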


Re: Get results ordered by field content starting with specific word

2011-10-27 Thread darul
Meaning I need to implement my own QueryParser ?



Re: questions about autocommit committing documents

2011-10-27 Thread darul
While sending documents with the SolrJ HTTP API, at the end I am never sure
the documents are indexed.

I would like to store them somewhere and resend them in case the commit has
failed.

If the commit occurred every 10 minutes, for example, and 100 documents are
waiting to be committed when the server crashes or stops, those 100 documents
won't be indexed later because my business logic won't send them again...

So I would like to create a cron job which looks into a table (or somewhere)
for documents which may not have been indexed due to a problem during the
commit process.
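
A hedged SolrJ sketch of that idea; markAsIndexed and the backing store are placeholders for the table the cron job would read, not a real API:

  import java.util.List;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.common.SolrInputDocument;

  // Clear the backing store only after a successful commit; on any failure the
  // documents stay pending and the next cron run retries them.
  void indexPending(SolrServer server, List<SolrInputDocument> pending) {
    try {
      server.add(pending);
      server.commit();
      markAsIndexed(pending);   // hypothetical: remove them from the pending table
    } catch (Exception e) {
      // leave them pending for the next cron run
    }
  }

  void markAsIndexed(List<SolrInputDocument> docs) {
    // hypothetical: delete the corresponding rows from the pending table
  }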





changing omitNorms on an already built index

2011-10-27 Thread Jonathan Rochkind
So Solr 1.4.  I decided I wanted to change a field to have 
omitNorms=true that didn't previously.


So I changed the schema to have omitNorms=true.  And I reindexed all 
documents.


But it seems to have had absolutely no effect. All relevancy rankings 
seem to be the same.


Now, I could have a mistake somewhere else, maybe I didn't do what I 
thought.


But I'm wondering if there are any known issues related to this, is 
there something special you have to do to change a field from 
omitNorms=false to omitNorms=true on an already built index, other than 
re-indexing everything?  Any known issues relevant here?


Thanks for any help,

Jonathan


Re: changing omitNorms on an already built index

2011-10-27 Thread Marc Sturlese
As far as I know there's no issue about this. You have to reindex and that's
it.
In which kind of field are you changing the norms? (You will only see
changes in text fields.)
Using debugQuery=true you can see how norms affect the score (in case you
have not omitted them).



Re: Collection Distribution vs Replication in Solr

2011-10-27 Thread Marc Sturlese
Replication is easier to manage and a bit faster. See the performance
numbers: http://wiki.apache.org/solr/SolrReplication



Re: Collection Distribution vs Replication in Solr

2011-10-27 Thread Alireza Salimi
I can't see those benchmarks, can you?

On Thu, Oct 27, 2011 at 5:20 PM, Marc Sturlese marc.sturl...@gmail.comwrote:

 Replication is easier to manage and a bit faster. See the performance
 numbers: http://wiki.apache.org/solr/SolrReplication





-- 
Alireza Salimi
Java EE Developer


Passing system parameters to solr at runtime

2011-10-27 Thread Michael Dodd
I've been given the project of setting up a CentOS-based solr replication slave 
for a project here at work.  I think it's configured correctly, and replication 
seems to be happening correctly.

I've got some CentOS experience, but I'm having to get up to speed on Solr in a 
short period of time.  The guy who was working on this piece of the project is 
no longer available and I'm not sure he knew what he was doing anyway.

The main problem I'm having is that the project lead wants to make sure the 
slaves have master disabled for all cores without changing the solrconfig.xml 
for every core.

This link seems to apply to what I'm trying to do:

http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node

My first question is this:
In the tomcat/solr implementation I'm using, I can't tell where/how to pass 
system parameters (-Denable.master=false) to solr.  I've found how to pass this 
sort of thing into tomcat, but it doesn't seem like this is the same thing.

Next question:
The link references setting these properties in a solrcore.properties file.  
I've created the file and landed it next to the applicable solrconfig.xml but 
it doesn't seem to apply its settings.
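
For reference, a hedged sketch of the pattern the wiki describes; the solrcore.properties values only take effect if the core's solrconfig.xml actually references them, so a replication handler along these lines is needed (the concrete values here are assumptions):

  # conf/solrcore.properties, per core
  enable.master=false
  enable.slave=true

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="enable">${enable.master:false}</str>
      <str name="replicateAfter">commit</str>
    </lst>
    <lst name="slave">
      <str name="enable">${enable.slave:false}</str>
      <str name="masterUrl">http://master:8080/solr/corename/replication</str>
    </lst>
  </requestHandler>

(For the -Denable.master=false route under Tomcat, system properties normally go into JAVA_OPTS or CATALINA_OPTS, e.g. in bin/setenv.sh.)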

Both the master and the slave node are on solr v3.3
The master is running on windows server 2008 r2, and was set up well before my 
involvement
The slave is running  CentOS 6.0

Thanks for reading.  I'm happy to provide more info as needed.


Re: data import in 4.0

2011-10-27 Thread Erick Erickson
Two things:
1 Look at http://wiki.apache.org/solr/DataImportHandler, the
interactive Development Mode section. There's a page that helps you
debug this kind of thing. But I suspect your SQL is not correct. You
should be able to form a single SQL query that does what you want,
something like (and I haven't tested this and my SQL is rusty)
SELECT ID, Status, Title, last_name FROM project, person where
project.pi_pid = person.pid

2> Please start a new thread when changing the subject; from
hossman's apache page:
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email.  Even if you change the
subject line of your email, other mail headers still track which thread
you replied to and your question is hidden in that thread and gets less
attention.   It makes following discussions in the mailing list archives
particularly difficult.

See:http://people.apache.org/~hossman/#threadhijack

Best
Erick
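
For concreteness, a hedged DIH sketch of that flattened approach, reusing the table and column names from the config quoted below (a LEFT JOIN keeps projects that have no matching person; whether you want that is a data question):

  <document>
    <entity name="project"
            query="SELECT proj.ID, proj.Status, proj.Title, per.last_name
                   FROM project proj LEFT JOIN person per ON proj.pi_pid = per.pid">
      <field column="ID" name="id" />
      <field column="Status" name="status_s" />
      <field column="Title" name="title_t" />
      <field column="last_name" name="pi_s" />
    </entity>
  </document>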

On Wed, Oct 26, 2011 at 10:46 AM, Adeel Qureshi adeelmahm...@gmail.com wrote:
 Any comments .. please

 I am able to do the bulkimport without nested query but with nested query it
 just keeps working on it and never seems to end ..

 I would appreciate any help

 Thanks
 Adeel


 On Sat, Oct 22, 2011 at 11:12 AM, Adeel Qureshi adeelmahm...@gmail.comwrote:

 yup that was it .. my data import files version was not the same as solr
 war .. now I am having another problem though

 I tried doing a simple data import

 <document>
     <entity name="p" query="SELECT ID, Status, Title FROM project">
       <field column="ID" name="id" />
       <field column="Status" name="status_s" />
       <field column="Title" name="title_t" />
    </entity>
   </document>

 simple in terms of just pulling up three fields from a table and adding to
 index and this worked fine but when I add a nested or joined table ..

 <document>
     <entity name="project" query="SELECT ID, Status, Title FROM project">
       <field column="ID" name="id" />
       <field column="Status" name="status_s" />
       <field column="Title" name="title_t" />
       <entity name="related" query="select last_name FROM person per inner
 join project proj on proj.pi_pid = per.pid where proj.ID = ${project.ID}">
           <field column="last_name" name="pi_s" />
       </entity>
    </entity>
   </document>

 this data import doesn't seem to end .. it just keeps going .. I only have
 about 15000 records in the main table and about 22000 in the joined table ..
 but the Fetch count in the dataimport handler status indicator shows that
 it has fetched close to half a million records .. I'm not sure
 what those records are .. is there a way to see exactly what queries are
 being run by the dataimport handler .. is there something wrong with my nested
 query ..

 THanks
 Adeel


 On Fri, Oct 21, 2011 at 3:05 PM, Alireza Salimi 
 alireza.sal...@gmail.comwrote:

 So to me it heightens the probability of classloader conflicts,
 I haven't worked with Solr 4.0, so I don't know if set of JAR files
 are the same with Solr 3.4. Anyway, make sure that there is only
 ONE instance of apache-solr-dataimporthandler-***.jar in your
 whole tomcat+webapp.

 Maybe you have this jar file in CATALINA_HOME\lib folder.

 On Fri, Oct 21, 2011 at 3:06 PM, Adeel Qureshi adeelmahm...@gmail.com
 wrote:

  its deployed on a tomcat server ..
 
  On Fri, Oct 21, 2011 at 12:49 PM, Alireza Salimi
  alireza.sal...@gmail.comwrote:
 
   Hi,
  
   How do you start Solr, through start.jar or you deploy it to a web
   container?
   Sometimes problems like this are because of different class loaders.
   I hope my answer would help you.
  
   Regards
  
  
   On Fri, Oct 21, 2011 at 12:47 PM, Adeel Qureshi 
 adeelmahm...@gmail.com
   wrote:
  
Hi I am trying to setup the data import handler with solr 4.0 and
  having
some unexpected problems. I have a multi-core setup and only one
 core
needed
the dataimport handler so I have added the request handler to it and
   added
the lib imports in config file
   
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-\d.*\.jar" />
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-extras-\d.*\.jar" />
   
for some reason this doesnt works .. it still keeps giving me
   ClassNoFound
error message so I moved the jars files to the shared lib folder and
  then
atleast I was able to see the admin screen with the dataimport
 plugin
loaded. But when I try to do the import its throwing this error
 message
   
INFO: Starting Full Import
Oct 21, 2011 11:35:41 AM org.apache.solr.core.SolrCore execute
INFO: [DW] webapp=/solr path=/select
   params={command=statusqt=/dataimport}
status=0 QTime=0
Oct 21, 2011 11:35:41 AM
 org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties
WARNING: Unable to read: dataimport.properties
Oct 21, 2011 11:35:41 AM
  org.apache.solr.handler.dataimport.DataImporter
doFullImport
SEVERE: Full Import failed
java.lang.NoSuchMethodError:
  

Re: solr break up word

2011-10-27 Thread Erick Erickson
Hmmm, I'm not sure what happens when you specify an
<analyzer> (without type="index") together with an
<analyzer type="query">. I have no clue which one
is used.
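
If the intent is NGrams at index time only, the unambiguous form is to label both analyzers explicitly; a sketch based on the config quoted below:

  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>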

Look at the admin/analysis page to understand how things are
broken up.

Did you re-index after you added the ngram filter?

You'll get better help if you include example queries with
debugQuery=on appended, it'll give us a lot more to
work with.

Best
Erick

On Wed, Oct 26, 2011 at 4:14 PM, Boris Quiroz boris.qui...@menco.it wrote:
 Hi,

 I've got Solr running on a CentOS server working OK, but sometimes my application 
 needs to search on parts of a word. For example, if I search 'dislike' it works 
 fine, but if I search 'disl' it returns zero. Also, searching 'disl*' 
 returns some values (the same as if I search for 'dislike'), but if I search 
 'dislike*' it returns zero too.

 So, I've two questions:

 1. How exactly does the asterisk work as a wildcard?

 2. What can I do to properly index parts of a word? I added these lines to my 
 schema.xml:

 <fieldType name="text" class="solr.TextField" omitNorms="false">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" 
 maxGramSize="15"/>
      </analyzer>

      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
 </fieldType>

 But I can't get it to work. Is what I did OK, or am I wrong?

 Thanks.

 --
 Boris Quiroz
 boris.qui...@menco.it




Re: changing omitNorms on an already built index

2011-10-27 Thread Simon Willnauer
We are not actively removing norms. If you set omitNorms=true and
index documents, they won't have norms for this field. Yet, other
segments still have norms until they get merged with a segment that has
no norms for that field, i.e. omits norms. omitNorms is anti-viral, so
once you set it to true it will become true for other segments eventually.
If you optimize your index you should see the norms go away.

simon

On Thu, Oct 27, 2011 at 11:17 PM, Marc Sturlese marc.sturl...@gmail.com wrote:
 As far as I know there's no issue about this. You have to reindex and that's
 it.
 In which kind of field are you changing the norms? (You just will see
 changes in text fields)
 Using debugQuery=true you can see how norms affect the score (in case you
 have them not omited)




Re: changing omitNorms on an already built index

2011-10-27 Thread Robert Muir
On Thu, Oct 27, 2011 at 6:00 PM, Simon Willnauer
simon.willna...@googlemail.com wrote:
 we are not actively removing norms. if you set omitNorms=true and
 index documents they won't have norms for this field. Yet, other
 segment still have norms until they get merged with a segment that has
 no norms for that field ie. omits norms. omitNorms is anti-viral so
 once you set it to true it will be true for other segment eventually.
 If you optimize you index you should see that norms go away.


This is only true in trunk (4.x!)
https://issues.apache.org/jira/browse/LUCENE-2846

-- 
lucidimagination.com


Re: Query/Delete performance difference between straight HTTP and SolrJ

2011-10-27 Thread Shawn Heisey

On 10/27/2011 5:56 AM, Michael Sokolov wrote:
From everything you've said, it certainly sounds like a low-level I/O 
problem in the client, not a server slowdown of any sort.  Maybe Perl 
is using the same connection over and over (keep-alive) and Java is 
not.  I really don't know.  One thing I've heard is that 
StreamingUpdateSolrServer (I think that's what it's called) can give 
better throughput for large request batches.  If you're not using 
that, you may be having problems w/closing and re-opening connections?


Although I can't claim to know for sure, I'm fairly sure that the simple 
LWP classes I'm using don't do keepalive unless you specifically 
configure the user agent to do so.  I'll look into it some more.


The StreamingUpdateSolrServer docs say it is only recommended for use 
with the /update handler, not for queries.  I'm not having a problem 
with the deletes themselves, they go pretty fast.  It's all of the 
queries before each delete that are relatively slow.  Doing those 
queries really adds up.  With multithreading, it does all the shards at 
once, but it still can only query for a limited number of values at a 
time due to maxBooleanClauses.  Now I'm checking and deleting 1000 
values at a time, on all shards simultaneously.  I use 
CommonsHttpSolrServer, and each of those objects is created only once, 
when the program first starts up.
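
A hedged sketch of that chunking; the field name "did" and the plain string join are illustrative only:

  import java.util.List;

  // Build one boolean query per chunk of ids so each request stays under
  // maxBooleanClauses (1024 by default).
  static String chunkQuery(List<String> ids, int from, int size) {
    StringBuilder sb = new StringBuilder("did:(");
    int to = Math.min(from + size, ids.size());
    for (int i = from; i < to; i++) {
      if (i > from) sb.append(" OR ");
      sb.append(ids.get(i));
    }
    return sb.append(')').toString();
  }

  // usage: for (int i = 0; i < ids.size(); i += 1000) { ... chunkQuery(ids, i, 1000) ... }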


I figure there are three possibilities:

1) A glaring inefficiency in CommonsHttpSolrServer queries as compared 
to a straight HTTP POST request.
2) The compartmentalization provided by the virtual machine architecture 
creates an odd synergy that is not present when there are only two Solr 
instances on physical machines instead of eight of them (seven shards 
plus a search broker) on virtual machines.
3) The extra physical memory on the servers with virtualization is 
granting more of a disk-cache-related performance improvement than the 
lack of virtualization on the others.


Only the first of those possible problems is something that can be 
determined or fixed without migrating the other servers to my new 
system.  I'm having one other problem with the new build program.  I 
haven't figured out exactly what that problem is, so I am very reluctant 
to switch everything over.  So far it seems to be related to the MySQL 
JDBC connector or my attempt at threading, not Solr.


I mentioned that the hardware is identical except for memory.  That's 
not quite true - the servers accessed by the java program are better.  
One of them has a slightly faster CPU than its counterpart with 
virtualization, and they all have 1TB hard drives as opposed to the 
mixed 500GB & 750GB drives in the other servers.  All of the servers are 
Dell 2950 with six-drive RAID10 arrays.





Re: joins and filter queries effecting scoring

2011-10-27 Thread Jason Toy
Does anyone have any idea on this issue?

On Tue, Oct 25, 2011 at 11:40 AM, Jason Toy jason...@gmail.com wrote:

 Hi Yonik,

 Without a Join I would normally query user docs with:
  q=data_text:test&fq=is_active_boolean:true

 With joining users with posts, I get no no results:
 q={!join from=self_id_i
  to=user_id_i}data_text:test&fq=is_active_boolean:true&fq=posts_text:hello



  I am able to use this query, but it gives me the results in an order that I
  don't want (nor do I understand its order):
 q={!join from=self_id_i to=user_id_i}data_text:test AND
  is_active_boolean:true&fq=posts_text:hello

 I want the order to be the same as I would get from my original
  q=data_text:test&fq=is_active_boolean:true, but with the ability to join
 with the Posts docs.





 On Tue, Oct 25, 2011 at 11:30 AM, Yonik Seeley yo...@lucidimagination.com
  wrote:

 Can you give an example of the request (URL) you are sending to Solr?

 -Yonik
 http://www.lucidimagination.com



 On Mon, Oct 24, 2011 at 3:31 PM, Jason Toy jason...@gmail.com wrote:
  I have 2 types of docs, users and posts.
  I want to view all the docs that belong to certain users by joining
 posts
  and users together.  I have to filter the users with a filter query of
   is_active_boolean:true so that the score is not affected, but since I
 do a
  join, I have to move the filter query to the query parameter so that I
 can
  get the filter applied. The problem is that since the is_active_boolean
 is
  moved to the query, the score is affected which returns back an order
 that I
  don't want.
   If I leave the is_active_boolean:true in the fq paramater, I get no
  results back.
 
  My question is how can I apply a filter query to users so that the score
 is
   not affected?
 




 --
 - sent from my mobile





-- 
- sent from my mobile


Re: Search for the single hash # character never returns results

2011-10-27 Thread Erick Erickson
NP. By the way, kudos for posting enough information to diagnose
the problem first time round!

Erick

On Thu, Oct 27, 2011 at 8:46 AM, Daniel Bradley
daniel.brad...@adfero.co.uk wrote:
 Fantastic, thanks, yes I completely overlooked that case, separating the
 analysers worked a treat.

 Had also posted on stack overflow but the mailing list proved to be
 superior!

 Many thanks,

 Daniel

 On 27 October 2011 13:09, Erick Erickson erickerick...@gmail.com wrote:

 Take a look at your admin/analysis page and put your tokens in for both
 index and query times. What I think you'll see is that the # is being
 stripped at query time due to the first PatternReplaceFilterFactory.

 You probably want to split your analyzers into an index-time and query-time
pair and do the appropriate replacements to keep # at query time.


 Best
 Erick

 On Tue, Oct 25, 2011 at 12:27 PM, Daniel Bradley
 daniel.brad...@adfero.co.uk wrote:
  When running a search such as:
   field_name:#
   field_name:"#"
   field_name:\#
 
  where there is a record with the value of exactly #, solr returns 0
 rows.
 
  The workaround we are having to use is to use a range query on the
  field such as:
   field_name:[# TO #]
  and this returns the correct documents.
 
  Use case details:
  We have a field that indexes a text field and calculates a letter
  group. This keeps only the first significant character from a value
  (number or letter), and if it is a number then simply stores # as we
  want all numbered items grouped together.
 
  I'm also aware that we could also fix this by using a specific number
  instead of the hash character; however, I thought I'd raise this to see
  if there is a wider issue. I've listed some specific details below.
 
  Thanks for your time,
 
  Daniel Bradley
 
 
  Field definition:
     <fieldType name="letterGrouping" class="solr.TextField"
  sortMissingLast="true" omitNorms="true">
       <analyzer>
         <tokenizer class="solr.PatternTokenizerFactory"
  pattern="^([a-zA-Z0-9]).*" group="1"/>
         <filter class="solr.LowerCaseFilterFactory" />
         <filter class="solr.TrimFilterFactory" />
         <filter class="solr.PatternReplaceFilterFactory"
                 pattern="([^a-z0-9])" replacement="" replace="all"
         />
         <filter class="solr.PatternReplaceFilterFactory"
                 pattern="([0-9])" replacement="#" replace="all"
         />
       </analyzer>
     </fieldType>
 
  Server information:
  Solr Specification Version: 3.2.0
  Solr Implementation Version: 3.2.0 1129474 - rmuir - 2011-05-30 23:07:15
  Lucene Specification Version: 3.2.0
  Lucene Implementation Version: 3.2.0 1129474 - 2011-05-30 23:08:57
  




Re: Faceting on multiple fields, with multiple where clauses

2011-10-27 Thread Erick Erickson
Hmmm, this may be one of those things that's so ingrained it's not
mentioned. Certainly the CommonQueryParameters page never
explicitly says that there can only be one q parameter.

But the problem is: how would multiple q params be combined?
An implied AND? OR? NOT? The syntax would be a mess.

The rule for fq is that they are intersections, that is, an implied AND.
Also, the results of fq clauses can be cached. And fqs don't contribute
to the scores of documents; they just contribute a yes/no.
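
As a hedged illustration (the field names come from the stock example schema, not this thread): one q carries the single scored query, and any number of fq parameters carry the cached yes/no filters:

  q=name:ipod&fq=inStock:true&fq=manu:Apple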


FWIW
Erick

On Thu, Oct 27, 2011 at 9:03 AM, Rubinho ru...@gekiere.com wrote:
 Hi Erik,

 Thank you very much.
 Your hint did solve the problem.

 Actually, I don't understand why (I read the difference between Q and QF,
 but it's still not clear to me why it didn't work with Q).

 But it's solved, that's the most important :)


 Thanks,
 Ruben





Re: Passing system parameters to solr at runtime

2011-10-27 Thread Erick Erickson
Would it be acceptable to change a central slave config? Because
it's possible to
have the replication process distribute solrconfig.xml files to the
slaves that are
different from the master.

That way, your master has its own solrconfig.xml, and a solrconfig_slave.xml in
the conf directory. At replication, the solrconfig_slave.xml is what's
sent to the
slave as solrconfig.xml; presumably this file has the whole master
section removed.

See: http://wiki.apache.org/solr/SolrReplication, the
"Replicating solrconfig.xml" section

which is another way of saying that I have no clue how to
do what you asked, but this solution seems like it might do.

Best
Erick

On Thu, Oct 27, 2011 at 5:34 PM, Michael Dodd md...@vocus.com wrote:
 I've been given the project of setting up a CentOS-based solr replication 
 slave for a project here at work.  I think it's configured correctly, and 
 replication seems to be happening correctly.

 I've got some CentOS experience, but I'm having to get up to speed on Solr in 
 a short period of time.  The guy who was working on this piece of the project 
 is no longer available and I'm not sure he knew what he was doing anyway.

 The main problem I'm having is that the project lead wants to make sure the 
 slaves have master disabled for all cores without changing the solrconfig.xml 
 for every core.

 This link seems to apply to what I'm trying to do:

 http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node

 My first question is this:
 In the tomcat/solr implementation I'm using, I can't tell where/how to pass 
 system parameters (-Denable.master=false) to solr.  I've found how to pass 
 this sort of thing into tomcat, but it doesn't seem like this is the same 
 thing.

 Next question:
 The link references setting these properties in a solrcore.properties file.  
 I've created the file and landed it next to the applicable solrconfig.xml but 
 it doesn't seem to apply its settings.

 Both the master and the slave node are on solr v3.3
 The master is running on windows server 2008 r2, and was set up well before 
 my involvement
 The slave is running  CentOS 6.0

 Thanks for reading.  I'm happy to provide more info as needed.



Re: help needed on solr-uima integration

2011-10-27 Thread Xue-Feng Yang


Thanks Koji,

I finally found a method-not-found error in SOLR 3.4. The method 
resolveUpdateChainParam(SolrParams params, org.slf4j.Logger log) is not in the 
class org.apache.solr.util.SolrPluginUtils. It was very strange that there was 
no error message; I found the problem after loading the source code into Eclipse.


Then I checked both SOLR 4.0 and 3.5. Both have this method, and, again strange 
to me, it is deprecated. When I tried 4.0, the number of errors was shown in the 
new admin page, but no details.

When I tried 3.5, I met the errors attached below.

Xue-Feng


message null java.lang.NullPointerException
  at org.apache.solr.uima.processor.SolrUIMAConfigurationReader.readAEOverridingParameters(SolrUIMAConfigurationReader.java:101)
  at org.apache.solr.uima.processor.SolrUIMAConfigurationReader.readSolrUIMAConfiguration(SolrUIMAConfigurationReader.java:42)
  at org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory.getInstance(UIMAUpdateRequestProcessorFactory.java:44)
  at org.apache.solr.update.processor.UpdateRequestProcessorChain.createProcessor(UpdateRequestProcessorChain.java:74)
  at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:199)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1369)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:256)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:217)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:279)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
  at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:655)
  at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:595)
  at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:98)
  at com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:91)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:162)
  at org.apache.catalina.connector.CoyoteAdapter.doService(CoyoteAdapter.java:330)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:231)
  at com.sun.enterprise.v3.services.impl.ContainerMapper.service(ContainerMapper.java:174)
  at com.sun.grizzly.http.ProcessorTask.invokeAdapter(ProcessorTask.java:828)
  at com.sun.grizzly.http.ProcessorTask.doProcess(ProcessorTask.java:725)
  at com.sun.grizzly.http.ProcessorTask.process(ProcessorTask.java:1019)
  at com.sun.grizzly.http.DefaultProtocolFilter.execute(DefaultProtocolFilter.java:225)
  at com.sun.grizzly.DefaultProtocolChain.executeProtocolFilter(DefaultProtocolChain.java:137)
  at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:104)
  at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:90)
  at com.sun.grizzly.http.HttpProtocolChain.execute(HttpProtocolChain.java:79)
  at com.sun.grizzly.ProtocolChainContextTask.doCall(ProtocolChainContextTask.java:54)
  at com.sun.grizzly.SelectionKeyContextTask.call(SelectionKeyContextTask.java:59)
  at com.sun.grizzly.ContextTask.run(ContextTask.java:71)
  at com.sun.grizzly.util.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:532)
  at com.sun.grizzly.util.AbstractThreadPool$Worker.run(AbstractThreadPool.java:513)
  at java.lang.Thread.run(Thread.java:662)



From: Koji Sekiguchi k...@r.email.ne.jp
To: solr-user@lucene.apache.org
Sent: Thursday, October 27, 2011 7:25:09 AM
Subject: Re: help needed on solr-uima integration

(11/10/27 9:12), Xue-Feng Yang wrote:
 Hi,

  From Solr Info page, I can see my solr-uima core is there, but 
updateRequestProcessorChain is not there. What is the reason?

Because UpdateRequestProcessor(and Chain) is not type of SolrInfoMBean.
(As those classes in the page implement SolrInfoMBean, you can see them)

koji
-- 
Check out Query Log Visualizer for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/

Re: changing omitNorms on an already built index

2011-10-27 Thread Erick Erickson
Well, this could be explained if your fields are very short. Norms
are encoded into (part of?) a byte, so your ranking may be unaffected.

Try adding debugQuery=on and looking at the explanation. If you've
really omitted norms, I think you should see clauses like:

1.0 = fieldNorm(field=features, doc=1)
in the output, never something like

0.25 = fieldNorm(field=features, doc=1)
i.e. in the absence of norm information, 1 is used.

Also, in your index, see if your *.nrm files change in size.

And I recommend that, when you're experimenting, you remove
your entire <solr home>/data/index directory (the directory too,
not just sub-files) before re-indexing. As Simon and Robert say,
eventually the norm data will be purged, but by removing the
directory first, you can look at things like the .nrm file with
confidence that you're not seeing remnants that haven't been
cleaned up quite yet.

Best
Erick


On Thu, Oct 27, 2011 at 5:00 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 So Solr 1.4.  I decided I wanted to change a field to have omitNorms=true
 that didn't previously.

 So I changed the schema to have omitNorms=true.  And I reindexed all
 documents.

 But it seems to have had absolutely no effect. All relevancy rankings seem
 to be the same.

 Now, I could have a mistake somewhere else, maybe I didn't do what I
 thought.

 But I'm wondering if there are any known issues related to this, is there
 something special you have to do to change a field from omitNorms=false to
 omitNorms=true on an already built index?  Other than re-indexing
 everything?    Any known issues relevant here?

 Thanks for any help,

 Jonathan



Re: Analyzers from schema.xml with custom parser

2011-10-27 Thread Erick Erickson
You've really got to give a lot more information about what you're trying to do
here, what you've tried, and what you mean by "associate". Please review:

http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Wed, Oct 26, 2011 at 6:29 PM, Milan Dobrota mi...@milandobrota.com wrote:
 I created a custom plugin parser, and it seems like it is ignoring analyzers
 from schema.xml. Is there any way to associate the two?



Applying hl.requireFieldMatch to groups of fields

2011-10-27 Thread Michael Ryan
I am trying to highlight FieldA when a user searches on either FieldA or FieldB,
but I do not want to highlight FieldA when a user searches on FieldC.

To explain further: I have a field named content and a field named
contentCS. The content field is a stored text field that uses
LowerCaseFilterFactory (i.e., case-insensitive). The contentCS field is a copy
of the content field, but is not stored and does not use LowerCaseFilterFactory
(i.e., case-sensitive).
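
In schema terms, a hedged sketch of that pairing (the type names are assumptions; the point is the copyField plus the stored/indexed split):

  <field name="content"   type="text_lowercased"    indexed="true" stored="true"/>
  <field name="contentCS" type="text_casesensitive" indexed="true" stored="false"/>
  <copyField source="content" dest="contentCS"/>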

My query looks like q=...&fl=content&hl.requireFieldMatch=true&hl.fl=content.
I use requireFieldMatch because I do not want certain other things I put in the
query to be highlighted in the content field.

When I search on either the content or the contentCS fields, I would like the
content field to be highlighted. But when searching on any other fields, I do
not want the terms for those fields to be highlighted in the content field.

I was thinking I could hack this into DefaultSolrHighlighter, QueryTermScorer,
and QueryTermExtractor. Perhaps the syntax could look like
hl.content.useMatchesFromTheseFields=content,contentCS, and then I would
pass an array of field names down into QueryTermExtractor. Anyone have any
tips/comments on this? I never looked at the highlighting code before, so not
sure what I'm getting myself into...

-Michael