Re: FunctionQuery score=0

2011-11-18 Thread Andre Bois-Crettez
Definitely worked for me, with a classic full text search on ipod and 
such.

Changing the lower bound changed the number of results.

Follow Chris's advice, and give more details.


John wrote:

Doesn't seem to work.
I thought that filter queries run before the search is performed and not
after... no?

Debug output doesn't include the filter query, only the below (changed a bit):

BoostedQuery(boost(+fieldName:,boostedFunction(ord(fieldName),query)))


On Thu, Nov 17, 2011 at 5:04 PM, Andre Bois-Crettez
andre.b...@kelkoo.com wrote:

  

John wrote:



Some of the results are receiving score=0 in my function and I would like
them not to appear in the search results.


  

you can use frange, and filter by score:

q=ipod&fq={!frange l=0 incl=false}query($q)

--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/





  


--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/
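Andre's frange trick above can be sketched as a complete request. A minimal sketch, assuming a local Solr at port 8983 with the default /select handler (hypothetical host and query term):

```python
from urllib.parse import urlencode

def build_score_filter_request(base_url, user_query, lower_bound=0):
    # Exclude documents whose score is <= lower_bound by wrapping
    # the main query in an frange filter over query($q).
    params = {
        "q": user_query,
        "fq": "{!frange l=%s incl=false}query($q)" % lower_bound,
        "wt": "json",
    }
    return base_url + "/select?" + urlencode(params)

url = build_score_filter_request("http://localhost:8983/solr", "ipod")
print(url)
```

Changing the lower bound (the `l` local param) changes the number of results, as Andre observed.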



Re: fieldCache problem OOM exception

2011-11-18 Thread topcat
Dear erolagnab,
Is your code in the Solr server?
Which class should I put it in?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/fieldCache-problem-OOM-exception-tp3067057p3517780.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: delta-import of rich documents like word and pdf files!

2011-11-18 Thread neuron005
When I set my fileSize field to type string, it shows the error I have posted
above. Then I changed it to slong and the results were severe. Here is the log:
18 Nov, 2011 3:00:54 PM
org.apache.solr.response.BinaryResponseWriter$Resolver getDoc
WARNING: Error reading a field from document : SolrDocument[{}]
java.lang.StringIndexOutOfBoundsException: String index out of range: 3
at java.lang.String.charAt(String.java:694)
at 
org.apache.solr.util.NumberUtils.SortableStr2long(NumberUtils.java:152)
at
org.apache.solr.schema.SortableLongField.toObject(SortableLongField.java:70)
at
org.apache.solr.schema.SortableLongField.toObject(SortableLongField.java:37)
at
org.apache.solr.response.BinaryResponseWriter$Resolver.getDoc(BinaryResponseWriter.java:148)
at
org.apache.solr.response.BinaryResponseWriter$Resolver.writeDocList(BinaryResponseWriter.java:122)
at
org.apache.solr.response.BinaryResponseWriter$Resolver.resolve(BinaryResponseWriter.java:86)
at 
org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:144)
at
org.apache.solr.common.util.JavaBinCodec.writeNamedList(JavaBinCodec.java:134)
at
org.apache.solr.common.util.JavaBinCodec.writeKnownType(JavaBinCodec.java:222)
at 
org.apache.solr.common.util.JavaBinCodec.writeVal(JavaBinCodec.java:139)
at 
org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:87)
at
org.apache.solr.response.BinaryResponseWriter.getParsedResponse(BinaryResponseWriter.java:191)
at
org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:57)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:343)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
at
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:405)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:964)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:515)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:304)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)


what does this mean?


--
View this message in context: 
http://lucene.472066.n3.nabble.com/delta-import-of-rich-documents-like-word-and-pdf-files-tp3502039p3518171.html
Sent from the Solr - User mailing list archive at Nabble.com.


write-lock issue

2011-11-18 Thread Husain, Yavar
Environment: Solr 1.4 on Windows/MS SQL Server

A write lock is getting created whenever I try to do a full-import of
documents using DIH. The logs say Creating a connection with the database.
and the process does not go forward (it never gets a database connection), so
the indexes are not created. Note that no other process is accessing the
index, and I even restarted my MS SQL Server service. However, I still see a
write.lock file in my index directory.

What could be the reason for this? Even though I have set the unlockOnStartup
flag in solrconfig to true, the indexing is still not happening.
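A minimal sketch of where that flag lives in a Solr 1.4 solrconfig.xml (surrounding elements elided; verify against your own config):

```xml
<mainIndex>
  <!-- If true, any write.lock left behind by an unclean shutdown is
       removed when the core starts up. Use with care: never enable it
       while another process may still hold the lock. -->
  <unlockOnStartup>true</unlockOnStartup>
</mainIndex>
```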




Multivalued Boolean Search

2011-11-18 Thread Ahson Iqbal
Hi

I have a multivalued field, say MulField, in my index that has values in a
document like:

<str name="DocID">1</str>
<arr name="MulField">
    <str>Auto Mobiles</str>
    <str>Toyota Corolla</str>
</arr>

Now let's say I specify a search criteria of

+MulField:Mobiles +MulField:Toyota

Now my question: is it possible to keep this document from appearing in the
search results?

Regards
Ahsan

wild card search and lower-casing

2011-11-18 Thread Dmitry Kan
Hello,

Here is one puzzle I couldn't yet find a key for:

for the wild-card query:

*ocvd

SOLR 3.4 returns hits. But for

*OCVD

it doesn't

On the indexing side two following tokenizers/filters are defined:

<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>

On the query side:
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>

The Solr analysis tool shows that OCVD gets lower-cased to ocvd. Does Solr
skip the lower-casing step when doing the actual wild-card search?

BTW, the same issue for a trailing wild-card:

mocv*

produces hits, while

MOCV*

doesn't. Appreciate any help or pointers.


-- 
Regards,

Dmitry Kan


Only a subset of edismax pf fields are used for the phrase part DisjunctionMaxQuery

2011-11-18 Thread Jean-Claude Dauphin
Hello,

The parsedQuery is displayed as follow:

parsedquery=+(DisjunctionMaxQuery((title:responsable^4.0 |
keywords:responsable^3.0 | organizationName:responsable |
location:responsable | formattedDescription:responsable^2.0 |
nafCodeText:responsable^2.0 | jobCodeText:responsable^3.0 |
categoryPayloads:responsable | labelLocation:responsable)~0.1)
DisjunctionMaxQuery((title:boutique^4.0 | keywords:boutique^3.0 |
organizationName:boutique | location:boutique |
formattedDescription:boutique^2.0 | nafCodeText:boutique^2.0 |
jobCodeText:boutique^3.0 | categoryPayloads:boutique |
labelLocation:boutique)~0.1) DisjunctionMaxQuery((title:lingerie^4.0 |
keywords:lingerie^3.0 | organizationName:lingerie | location:lingerie |
formattedDescription:lingerie^2.0 | nafCodeText:lingerie^2.0 |
jobCodeText:lingerie^3.0 | categoryPayloads:lingerie |
labelLocation:lingerie)~0.1))

*DisjunctionMaxQuery*((title:"responsable boutique lingerie"~10^4.0 |
formattedDescription:"responsable boutique lingerie"~10^2.0 |
categoryPayloads:"responsable boutique lingerie"~10)~0.1)

The search query is 'responsable boutique lingerie'
The qf and pf fields are the same:

qf= title^4.0 formattedDescription^2.0 nafCodeText^2.0 jobCodeText^3.0
organizationName^1.0 keywords^3.0 location^1.0 labelLocation^1.0
categoryPayloads^1.0,

pf= title^4.0 formattedDescription^2.0 nafCodeText^2.0 jobCodeText^3.0
organizationName^1.0 keywords^3.0 location^1.0 labelLocation^1.0
categoryPayloads^1.0,

I would have expected the whole set of pf fields to appear in the phrase
part of the parsed query!

Is it coming from the field definitions in schema.xml?

Best,

Jean-Claude Dauphin



-- 
Jean-Claude Dauphin

jc.daup...@gmail.com
jc.daup...@afus.unesco.org

http://kenai.com/projects/j-isis/
http://www.unesco.org/isis/
http://www.unesco.org/idams/
http://www.greenstone.org


handling query errors

2011-11-18 Thread Alan Miller
I'm new to Solr and just got things working.

I can query my index and retrieve JSON results via HTTP GET using the wt=json
and q=num_cpu parameters, e.g.:
http://127.0.0.1:8080/solr/select?indent=on&version=2.2&q=num_cpu%3A16&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=json&explainOther=&debugQuery=on

When the query is syntactically correct, Solr responds with HTTP status 200
and my JSON data.

When the query is incorrect (e.g. it references an undefined field), Solr
responds with HTTP status 400 and HTML that explains the error. For example,
when I submit number_cpu instead of num_cpu I get:

  <h1>HTTP Status 400 - undefined field number_cpu</h1>
  <HR size="1" noshade="noshade">
  <p><b>type</b> Status report</p>
  <p><b>message</b> <u>undefined field number_cpu</u></p>
  <p><b>description</b> <u>The request sent by the client was syntactically
incorrect (undefined field number_cpu).</u></p>

What's the best way to programmatically access the error message undefined
field number_cpu?
Is it possible to configure Solr to always return error messages in a
different way?

Thanks
Alan
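Until the error reaches you in a structured format, one workaround is to scrape the message out of the servlet container's HTML error page on the client side. A minimal sketch, assuming the Tomcat-style 400 page shown above (not an official Solr API):

```python
import re

def extract_solr_error(html):
    # Tomcat error pages carry the message inside a
    # "<p><b>message</b> <u>...</u></p>" block.
    m = re.search(r"<b>message</b>\s*<u>(.*?)</u>", html, re.S)
    return m.group(1).strip() if m else None

sample = """<h1>HTTP Status 400 - undefined field number_cpu</h1>
<p><b>type</b> Status report</p>
<p><b>message</b> <u>undefined field number_cpu</u></p>"""
print(extract_solr_error(sample))  # undefined field number_cpu
```

This is brittle (it depends on the container's page layout), so treat it as a stopgap rather than a stable interface.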


Re: wild card search and lower-casing

2011-11-18 Thread Ahmet Arslan
 Here is one puzzle I couldn't yet find a key for:
 
 for the wild-card query:
 
 *ocvd
 
 SOLR 3.4 returns hits. But for
 
 *OCVD
 
 it doesn't

This is a FAQ. Please see 

http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F


Re: Multivalued Boolean Search

2011-11-18 Thread Ahmet Arslan
 I have a multivalued field say MulField in my index that
 have values in a document like 
 
 <str name="DocID">1</str>
 <arr name="MulField">
     <str>Auto Mobiles</str>
     <str>Toyota Corolla</str>
 </arr>
 
 No let say I specified a search criteria as 
 
 +MulField:Mobiles +MulField:Toyota 
 
 now my question is it is possible that this document should
 not appear in the search results.


No. You should index two separate documents in your example, i.e. normalize
your data.

However, there is one trick you can use: issuing a phrase query may satisfy
your needs, e.g. q=MulField:"Mobiles Toyota"~99

Use a big positionIncrementGap value in your field definition.  

http://lucene.472066.n3.nabble.com/positionIncrementGap-in-schema-xml-td488338.html
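The positionIncrementGap Ahmet mentions is set on the field type in schema.xml; a sketch of what such a definition might look like (the type name and analyzer are made up for illustration):

```xml
<fieldType name="text_gap" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="MulField" type="text_gap" indexed="true" stored="true"
       multiValued="true"/>
```

With a gap of 100, a phrase query with slop below 100 (such as ~99) cannot match across two different values of the multivalued field, because the analyzer leaves a 100-position hole between them.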




Re: Can files be faceted based on their size ?

2011-11-18 Thread Ahmet Arslan
 *Can fileSize be faceted? *I tried to
 facet them, but fileSize is of type
 string and can not be faceted.
 I want to facet my doc and pdf files according to their
 size. I can
 calculate file Size but they are of type string.
 What should I do in order to achieve that?
 Thanks in advance

Since you want to apply range faceting, you need to use trie-based types.
http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range
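Once fileSize is indexed as a trie type (e.g. tlong), a range-facet request can be sketched like this (hypothetical field name and bucket sizes):

```python
from urllib.parse import urlencode

# Bucket documents by fileSize into 1 MB ranges from 0 to 10 MB.
params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.range", "fileSize"),
    ("facet.range.start", "0"),
    ("facet.range.end", str(10 * 1024 * 1024)),
    ("facet.range.gap", str(1024 * 1024)),
]
query_string = urlencode(params)
print(query_string)
```

Appending this query string to your /select URL should return counts per size bucket instead of one facet value per distinct string.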


Re: wild card search and lower-casing

2011-11-18 Thread Dmitry Kan
Hi Ahmet,

Thanks for the link.

I'm a bit puzzled by the explanation found there regarding lower casing:

"These queries are case-insensitive anyway because QueryParser makes them
lowercase."

That's exactly what I want to achieve, but somehow the queries *are*
case-sensitive. Probably I should play around with the code of a query parser.

On Fri, Nov 18, 2011 at 2:50 PM, Ahmet Arslan iori...@yahoo.com wrote:

  Here is one puzzle I couldn't yet find a key for:
 
  for the wild-card query:
 
  *ocvd
 
  SOLR 3.4 returns hits. But for
 
  *OCVD
 
  it doesn't

 This is a FAQ. Please see


 http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F




-- 
Regards,

Dmitry Kan


NewSolrCloud/ES Comparison

2011-11-18 Thread darren
Hi,
  I know it's too early for this, given that the NewSolrCloud for Solr 4 is
still in development, but what is interesting for those of us anxiously
awaiting NewSolrCloud 4 is to understand how it compares to existing
cloud-like search engines such as ElasticSearch (which I only recently
learned about).

I use Solr in my project and am hoping to gain some of the distributed
features mentioned in ES with Solr.

Does anyone know what the similarities/differences might be?

Many thanks,
Darren


Re: Implications of setting catenateAll=1

2011-11-18 Thread Erick Erickson
The main one is that you can get an explosion in the number of terms,
depending on your input, especially if you have things that aren't
regular text. Imagine
partone-1
partone-2
partone-3

parttwo-1
parttwo-2
parttwo-3

if catenateAll is set to 0, you'd get 5 tokens here. If it was set to
1 you'd get 11 tokens.

Which doesn't seem like a lot until you have hundreds of thousands of
patterns like this.

So, give it a whirl and see what pops out with your particular corpus,
but keep an eye on the number of unique terms that end up in the
field.

Best
Erick

On Thu, Nov 17, 2011 at 12:18 PM, Brendan Grainger
brendan.grain...@gmail.com wrote:
 Hi,

 The default for catenateAll is 0 which we've been using on the 
 WordDelimiterFilter. What would be the possibly negative implications of 
 setting this to 1? So that:

 wi-fi-800

 would produce the tokens:

 wi, fi, wifi, 800, wifi800

 for example?

 Thanks
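Erick's arithmetic above can be checked with a toy simulation of the word-delimiter split. This is only a sketch of the counting, not the real WordDelimiterFilter:

```python
import re

def unique_terms(values, catenate_all=False):
    # Split each value on non-alphanumeric delimiters, as the
    # word-delimiter filter roughly does, and collect unique terms.
    terms = set()
    for v in values:
        parts = [p for p in re.split(r"[^A-Za-z0-9]+", v) if p]
        terms.update(parts)
        if catenate_all:
            terms.add("".join(parts))  # catenated form, e.g. partone1
    return terms

corpus = ["partone-1", "partone-2", "partone-3",
          "parttwo-1", "parttwo-2", "parttwo-3"]
print(len(unique_terms(corpus)))                     # 5 unique terms
print(len(unique_terms(corpus, catenate_all=True)))  # 11 unique terms
```

Each distinct pattern adds one extra catenated term, which is why the term count grows quickly over hundreds of thousands of such values.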


Re: wild card search and lower-casing

2011-11-18 Thread Ahmet Arslan
 Hi Ahmet,
 
 Thanks for the link.
 
 I'm a bit puzzled with the explanation found there
 regarding lower casing:
 
 These queries are case-insensitive anyway because
 QueryParser makes them
 lowercase.
 
 that's exactly what I want to achieve, but somehow the
 queries *are*
 case-sensitive. Probably I should play around with code of
 a query parser.

There is an effort for this:
https://issues.apache.org/jira/browse/SOLR-218
You can vote for this issue. For the time being you can lowercase them on the
client side.
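Ahmet's client-side workaround can be sketched as follows, assuming all indexed text is lower-cased (as in Dmitry's analyzer chain) and that non-wildcard terms are left for the query-time analyzer to handle:

```python
def lowercase_wildcard_terms(query):
    # Lower-case only wildcard/prefix terms, since the query parser
    # will not run them through the analyzer.
    return " ".join(tok.lower() if ("*" in tok or "?" in tok) else tok
                    for tok in query.split())

print(lowercase_wildcard_terms("*OCVD"))   # *ocvd
print(lowercase_wildcard_terms("MOCV*"))   # mocv*
```

Note this naive tokenization ignores field prefixes and quoted phrases; a real client would need to be more careful.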


Re: wild card search and lower-casing

2011-11-18 Thread Dmitry Kan
OK.

Actually I have just checked the source code of Lucene's QueryParser, and
lowercaseExpandedTerms is set to true by default there (version 3.4), so the
code does lower-casing by default. In that sense I don't need to do anything
in the client code. Is something wrong here?

On Fri, Nov 18, 2011 at 3:49 PM, Ahmet Arslan iori...@yahoo.com wrote:

  Hi Ahmet,
 
  Thanks for the link.
 
  I'm a bit puzzled with the explanation found there
  regarding lower casing:
 
  These queries are case-insensitive anyway because
  QueryParser makes them
  lowercase.
 
  that's exactly what I want to achieve, but somehow the
  queries *are*
  case-sensitive. Probably I should play around with code of
  a query parser.

 There is an effort for this :
 https://issues.apache.org/jira/browse/SOLR-218
 You can vote this issue. For the time being you can lowercase them in the
 client side.




-- 
Regards,

Dmitry Kan


Re: wild card search and lower-casing

2011-11-18 Thread Ahmet Arslan
 Actually I have just checked the source code of Lucene's
 QueryParser and
 lowercaseExpandedTerms there is set to true by default
 (version 3.4). The
 code there does lower-casing by default. So in that sense I
 don't need to
 do anything in the client code. Is something wrong here?

But SolrQueryParser extends that, and the default behavior may be different.
For clarification see the source code of SolrQueryParser.


Re: wild card search and lower-casing

2011-11-18 Thread Dmitry Kan
You're right:

public SolrQueryParser(IndexSchema schema, String defaultField) {
...
setLowercaseExpandedTerms(false);
...
}

OK, thanks for pointing that out.

On Fri, Nov 18, 2011 at 4:12 PM, Ahmet Arslan iori...@yahoo.com wrote:

  Actually I have just checked the source code of Lucene's
  QueryParser and
  lowercaseExpandedTerms there is set to true by default
  (version 3.4). The
  code there does lower-casing by default. So in that sense I
  don't need to
  do anything in the client code. Is something wrong here?

 But SolrQueryParser extends that, and the default behavior may be different.
 For clarification see the source code of SolrQueryParser.




-- 
Regards,

Dmitry Kan


Re: wild card search and lower-casing

2011-11-18 Thread Ahmet Arslan

 You're right:
 
 public SolrQueryParser(IndexSchema schema, String
 defaultField) {
 ...
 setLowercaseExpandedTerms(false);
 ...
 }

Please note that lowercaseExpandedTerms uses String.toLowerCase() (with the
default Locale), which is a Locale-sensitive operation.

In Lucene, AnalyzingQueryParser exists for this purpose, but I am not sure if
it has been ported to Solr.

  
http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/queryParser/analyzing/AnalyzingQueryParser.html


Re: Boosting is slow

2011-11-18 Thread Brian Lamb
Any ideas on this one?

On Thu, Nov 17, 2011 at 3:53 PM, Brian Lamb
brian.l...@journalexperts.com wrote:

 Sorry, the query is actually:

 http://localhost:8983/solr/mycore/search/?q=test{!boost
 b=product(sum(log(sum(myfield,1)),1),recip(ms(NOW,mydate_field),3.16e-11,1,8))}&start=&sort=score+desc,mydate_field+desc&wt=xslt&tr=mysite.xsl


 On Thu, Nov 17, 2011 at 2:59 PM, Brian Lamb brian.l...@journalexperts.com
  wrote:

 Hi all,

 I have about 20 million records in my solr index. I'm running into a
 problem now where doing a boost drastically slows down my search
 application. A typical query for me looks something like:

 http://localhost:8983/solr/mycore/search/?q=test {!boost
 b=product(sum(log(sum(myfield,1)),1),recip(ms(NOW,mydate_field),3.16e-11,1,8))}

 I've tried several variations on the boost to see if that was the problem
 but even when doing something simple like:

 http://localhost:8983/solr/mycore/search/?q=test {!boost b=2}

 it is still really slow. Is there a different approach I should be taking?

 Thanks,

 Brian Lamb





Re: Boosting is slow

2011-11-18 Thread Yonik Seeley
On Thu, Nov 17, 2011 at 2:59 PM, Brian Lamb
brian.l...@journalexperts.com wrote:
 http://localhost:8983/solr/mycore/search/?q=test {!boost b=2}

 it is still really slow. Is there a different approach I should be taking?

I just tried something similar to this (a non-boosted query vs a
simple boosted query)
on a 10M document test index.

#Non boosted
q=myfield:[* TO *] dummy_i:1

#Boosted
q={!boost b=2 v=$qq}
qq=myfield:[* TO *] dummy_i:1

Notes:
  - the dummy was just used to change the query so there would be no
cache hit (I set it to something different for each try)
  - myfield is a single valued field that only has 10 unique terms
(so the range query should be fast), and does select all 10M docs

My results:  normal=386ms   boosted=481ms

-Yonik
http://www.lucidimagination.com
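Yonik's parameter-dereferencing pattern (q={!boost ...} with the inner query in $qq) translates to request parameters like this; a sketch with the same hypothetical field names:

```python
from urllib.parse import urlencode

# Keep the boost wrapper constant and vary only the qq parameter,
# mirroring the test setup above.
params = {
    "q": "{!boost b=2 v=$qq}",
    "qq": "myfield:[* TO *] dummy_i:1",
}
query_string = urlencode(params)
print(query_string)
```

Separating the boost wrapper from the user query this way also makes it easy to change the boost function without touching the query itself.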


Solr filterCache size settings...

2011-11-18 Thread Andrew Lundgren
I am new to solr in general and am trying to get a handle on the memory
requirements for caching. Specifically I am looking at the filterCache right
now. The documentation on the size setting seems to indicate that it is the
number of entries to be cached. Did I read that correctly, or is it really
the amount of memory that will be set aside for the cache?

How do you determine how much cache each fq will consume?

Thank you!

--
Andrew Lundgren
lundg...@familysearch.org
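As a rule of thumb (an assumption worth verifying against your Solr version): size is a number of entries, and a cached filter that matches many documents is stored as a bitset of roughly maxDoc/8 bytes, while sparse filters are stored more compactly. A back-of-envelope calculation:

```python
def filter_cache_bytes(max_doc, entries):
    # Worst case: every cached entry is a full bitset,
    # one bit per document in the index.
    per_entry = max_doc / 8  # bytes per cached bitset
    return per_entry * entries

# For a 20M-document index with filterCache size=512:
per_entry_mb = 20_000_000 / 8 / 1024 / 1024
print(round(per_entry_mb, 2))                         # MB per entry
print(filter_cache_bytes(20_000_000, 512) / 1024**3)  # worst-case GiB total
```

Actual consumption depends on how many of your fq values match large document sets.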






RE: How to read values from dataimport.properties in a production environment

2011-11-18 Thread Ephraim Ofir
You can add it yourself in admin-extra.html

Ephraim Ofir

-Original Message-
From: Nico Luna [mailto:nicolaslun...@gmail.com] 
Sent: Friday, November 11, 2011 7:57 PM
To: solr-user@lucene.apache.org
Subject: How to read values from dataimport.properties in a production 
environment

I'm trying to view the values stored in the dataimport.properties file in a
production environment using the solr admin feature, so I copied the same
behaviour as [Schema] and [Config], but changing the contentType property
(from contentType=text/xml to contentType=text):

http://localhost:8080/solr/admin/file/?contentType=text;charset=utf-8&file=dataimport.properties

I want to know if there is another way to see the dataimport.properties
values using the solr admin, and if that solution might be a possible feature
to add to the solr admin. For example, adding into the index.jsp file:

[ <a href="file/?contentType=text;charset=utf-8&file=dataimport.properties">dataimport.properties</a> ]

Thanks, Nicolás
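Once the file is exposed over HTTP, its Java-properties format can be parsed on the client side; a sketch (the sample key matches DIH's usual last_index_time entry, but your file may contain others):

```python
def parse_properties(text):
    # Minimal Java-properties parser: key=value lines,
    # '#' comments, and DIH's '\:' escaping in timestamps.
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip().replace("\\:", ":")
    return props

sample = "#Fri Nov 18 01:00:00 CET 2011\nlast_index_time=2011-11-18 01\\:00\\:00\n"
print(parse_properties(sample))
```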

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-read-values-from-dataimport-properties-in-a-production-environment-tp3500453p3500453.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: keeping master server indexes in sync after failover recovery

2011-11-18 Thread Uomesh
I have a similar scenario: we have one primary master and one backup master.
Load switching happens using a BigIP load balancer. When the primary master
goes down, the backup master becomes the active primary master.

We have added a health check API in Solr: when the primary master is back to
normal, it compares the last delta/full import completion time with the
server startup time, and the health API only returns Active Status if the
delta/full import completion time is greater than the server startup time.

Hope this makes sense.

Thanks,
Umesh
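Umesh's health check reduces to a timestamp comparison; a sketch with hypothetical names for the two timestamps:

```python
from datetime import datetime

def health_status(server_startup, last_import_completed):
    # Report Active only once an import has finished after startup,
    # so the load balancer does not route traffic to a stale index.
    return "Active" if last_import_completed > server_startup else "Inactive"

startup = datetime(2011, 11, 18, 8, 0, 0)
print(health_status(startup, datetime(2011, 11, 18, 8, 30, 0)))  # Active
print(health_status(startup, datetime(2011, 11, 17, 23, 0, 0)))  # Inactive
```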


 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/keeping-master-server-indexes-in-sync-after-failover-recovery-tp3497417p3520451.html
Sent from the Solr - User mailing list archive at Nabble.com.


Index Update Strategy

2011-11-18 Thread David T. Webb
What are the general schools of thought on how to update an index?

 

I have a medium-volume OLTP SaaS system.  I think my options are:

 

1)  Run the DIH delta-query every minute to pull in changes

2)  Use update events in the app to asynchronously create a bean
that represents my Solr doc, then use SolrJ to add/update the doc.

 

Are there any other options?  Any advice from the veterans who have been
down this road before?

 

Thank you.

 

--

Sincerely,

David Webb