Re: Boosting non-synonym results

2011-05-18 Thread Paul Libbrecht

I do the same, but I don't use the dismax query parser, which is far too inflexible.

In CurrikiSolr, I have my own QueryComponent which does all sorts of query 
expansion:
- it expands a simple term query into a query against the text in both the
stemmed variant and the unstemmed variant, with a higher boost on the
unstemmed one (a sketch follows below)
- it pre-parses to make sure that phrase-queries remain phrase queries and thus 
become unstemmed queries
- it converts prefix queries to queries in the unstemmed field only
- it uses parameters (used in the advanced search) to add queries (e.g. only 
resources with that topic)
- it applies some rights protections
- it would be the place to expand across multiple languages, if each language
were indexed in a separate field, as I would do it
- it applies some application specific quality boosting (higher-ranked 
resources go higher)

I find that such a component is something of a best practice: it gives you a
server that can apply business logic (independently of hackers in the client),
and it gives me Java for deep query processing instead of fragile string
processing in JavaScript.
I guess there could be a way to extend dismax intelligently instead, but I have
not found it.
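
For the first bullet, a minimal sketch of what such an expansion can look like
(Solr 3.x API; the field names text_stemmed and text_exact and the boost of 2
are placeholders, and the class would be registered in solrconfig.xml in place
of the standard query component):

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.handler.component.QueryComponent;
import org.apache.solr.handler.component.ResponseBuilder;

public class ExpandingQueryComponent extends QueryComponent {
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        super.prepare(rb); // default parsing fills in rb.getQuery()
        Query q = rb.getQuery();
        if (q instanceof TermQuery) {
            String text = ((TermQuery) q).getTerm().text();
            BooleanQuery expanded = new BooleanQuery();
            // stemmed variant, normal weight
            expanded.add(new TermQuery(new Term("text_stemmed", text)), Occur.SHOULD);
            // unstemmed variant, boosted so exact forms rank higher
            TermQuery exact = new TermQuery(new Term("text_exact", text));
            exact.setBoost(2.0f);
            expanded.add(exact, Occur.SHOULD);
            rb.setQuery(expanded);
        }
    }
}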

paul


On 18 May 2011, at 00:52, Jonathan Rochkind wrote:

 I do it with two fields exactly how you say, but then use dismax to boost the
 non-synonym field higher than the synonym field.  That is a lot easier
 than trying to use a function query, which I'm not sure how to do either.
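 For example (field names are hypothetical; the synonym filter would only be
 on name_syn's query analyzer):
 
 defType=dismax&q=john smith&qf=name_nosyn^10 name_syn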
 
 On 5/17/2011 6:45 PM, Dmitriy Shvadskiy wrote:
 Hello,
 Is there a way to boost a result that is an exact match, as opposed to a
 synonym match, when using query-time synonyms?
 Given the query John Smith and synonyms
 Jonathan,Jonathan,John,Jon,Nat,Nathan
 
 I'd like a result containing John Smith to be ranked higher than Jonathan
 Smith.
 My thinking was to do it by defining 2 fields: 1 with query time synonyms
 and 1 without and sort by a function query of a non-synonym field. Is it
 even possible? I can't quite figure out the syntax for this.
 I'm using Solr 3.1.
 
 Thanks,
 Dmitriy
 



solr/home property setting

2011-05-18 Thread kun xiong
Hi all,

I am wondering how I could set a path for the solr/home property.

Our Solr home is inside the solr.war, so I don't want an absolute path (we
will deploy to different boxes).

Currently I hard-code a relative path as the solr/home property in web.xml.

  <!-- People who want to hardcode their Solr Home directly into the
   WAR File can set the JNDI property here...
   -->
   <env-entry>
      <env-entry-name>solr/home</env-entry-name>
      <env-entry-value>../webapps/solr/home</env-entry-value>
      <env-entry-type>java.lang.String</env-entry-type>
   </env-entry>

But in this way I have to start Tomcat under bin/; the root for the relative
path seems to be the startup directory.

How can I set the solr/home property so that it does not depend on the Tomcat
start path?

Thanks

Kun


Updating a multi-valued field

2011-05-18 Thread karanveer singh
I've been using ExternalFileField for external scoring so far, so that
the external field gets updated rather than deleted and re-added.

Now, I have a field which is multivalued. I cannot use
ExternalFileField as I need this field in the suggest component too.

Is there something other than ExternalFileField which will help me in
doing this?

Thanks!


Re: solr/home property setting

2011-05-18 Thread Grijesh
Why have you put your Solr home inside Tomcat's webapps directory? That is not
the correct way. Put your Solr home somewhere outside the servlet container
and set your solr/home path accordingly:

<env-entry>
   <env-entry-name>solr/home</env-entry-name>
   <env-entry-value>/opt/solr/home</env-entry-value>
   <env-entry-type>java.lang.String</env-entry-type>
</env-entry>
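
If you'd rather not touch web.xml at all, the home can also be set at the
container level, either as a JVM system property or in a Tomcat context
fragment (paths here are examples):

# in JAVA_OPTS when starting Tomcat:
-Dsolr.solr.home=/opt/solr/home

<!-- or in conf/Catalina/localhost/solr.xml: -->
<Context docBase="/opt/solr.war" debug="0" crossContext="true">
   <Environment name="solr/home" type="java.lang.String"
                value="/opt/solr/home" override="true"/>
</Context>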

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 
--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-home-property-seting-tp2956003p2956047.html
Sent from the Solr - User mailing list archive at Nabble.com.


questions about request logging

2011-05-18 Thread Bernd Fehling

Dear list,
the poll about solr logging directed my interest to my log files.

Right out of the box the jetty request logs have all information
needed for the GET requests but only the path of POST requests.
Is it possible to have the POST requests logged the same way the
GET requests are logged?

If not, maybe with a different logger?

The console is redirected to a file, and both GET and POST requests are logged
there, but they are mixed with all kinds of log messages, so the request logs
are not usable with webalizer or other log analyzers.
Is it somehow possible to get a useful log file from the console output?

Regards
Bernd


Re: filter cache and negative filter query

2011-05-18 Thread Juan Antonio Farré Basurte
Mmm... I had wondered whether Solr reused filters this way (not keeping both the
positive and negative versions) and I'm glad to see it does indeed reuse them.
What I don't like is that it systematically uses the positive version.
Sometimes the negative version will give far fewer results (for example, in
some cases I filter by documents not having a given field, and there are very
few of them).
I think it would be much better if Solr performed exactly the query requested
and, only if more than 50% of the documents match, stored the negated one. I
think (without knowing much about how things are implemented) this shouldn't be
a problem.
Is there any place where you can post a suggestion for improvement? :)
Anyway, it would be very useful to know exactly how the current versions work
(I think the info in the message I'm answering is about version 1.1 and could
have changed), because knowing it, one can sometimes manage to write, for
example, a positive query that in fact returns the negative results. As a
simple example, I believe that, for a boolean field, -field:true is exactly the
same as +field:false, but the former is a negative query and the latter is a
positive one.
So, knowing the exact behaviour of Solr can help you write optimized filters
when you know that one version will give far fewer hits than the other.
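
For example (the boolean field name is made up, and this assumes every
document has a value for it):

fq=-hasImage:true    negative filter; cached via the big positive set hasImage:true
fq=+hasImage:false   positive filter that selects the same few documents directly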

On 18/05/2011, at 00:26, Yonik Seeley wrote:

 On Tue, May 17, 2011 at 6:17 PM, Markus Jelsma
 markus.jel...@openindex.io wrote:
 I'm not sure. The filter cache uses your filter as a key, and a negation is a
 different key. You can check this easily in a controlled environment by
 issuing these queries and watching the filter cache statistics.
 
 Gotta hate crossing emails ;-)
 Anyway, this goes back to Solr 1.1
 
 5. SOLR-80: Negative queries are now allowed everywhere.  Negative queries
are generated and cached as their positive counterpart, speeding
generation and generally resulting in smaller sets to cache.
Set intersections in SolrIndexSearcher are more efficient,
starting with the smallest positive set, subtracting all negative
sets, then intersecting with all other positive sets.  (yonik)
 
 -Yonik
 http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
 25-26, San Francisco
 
 
 
 If I have a query with a filter query such as: q=art&fq=history, and
 then run a second query q=art&fq=-history, will Solr realize that it
 can use the cached results of the previous filter query "history" (in the
 filter cache), or will it not realize this and have to actually do a second
 filter query against the index for "not history"?
 
 Tom
 



Re: How to test Solr Integration - how to get EmbeddedSolrServer?

2011-05-18 Thread Gabriele Kahlout
Thinking more about it, I can solve my immediate problem by just
copy-pasting the classes I need into my own project packages (KISS, like
here: https://github.com/Filirom1/solr-test-exemple).

I'd however suggest refactoring the Solr code structure to be much more
defaults-compliant, making it easier for external developers to understand
and hopefully easier to maintain for committers (with fewer special-needs
configurations). I've done some of those refactorings on my local copy of
Solr and would be glad to contribute.

For this particular problem the KISS solution would be to create yet one
more module for tests, which depends on Solr Core and on the Test Framework.
I believe the ease of build configuration outweighs the organizational
burden of that extra module.



On Tue, May 17, 2011 at 7:11 PM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:


 http://stackoverflow.com/questions/6034513/can-i-avoid-a-dependency-cycle-with-one-edge-being-a-test-dependency


 On Tue, May 17, 2011 at 6:49 PM, Gabriele Kahlout 
 gabri...@mysimpatico.com wrote:




 On Tue, May 17, 2011 at 3:52 PM, Gabriele Kahlout 
 gabri...@mysimpatico.com wrote:



 On Tue, May 17, 2011 at 3:44 PM, Steven A Rowe sar...@syr.edu wrote:

 Hi Gabriele,

 On 5/17/2011 at 9:34 AM, Gabriele Kahlout wrote:
  Solr Core should declare a test dependency on Solr Test Framework.

 I agree:

 - Solr Core should have a test-scope dependency on Solr Test Framework.
 - Solr Test Framework should have a compile-scope dependency on Solr
 Core.

 But Maven views this as a circular dependency.


 I've seen, but adding it with <scope>test</scope> works. The logic:
 the src is compiled first and then re-used (I'm assuming Maven does
 something smart about not including the full jar).


 Not quite. I've tried a demo and the reactor complains. I'll try to see if
 maven could become 'smarter', or if the 2-build phase solution will work.

 The projects in the reactor contain a cyclic reference: Edge between
 'Vertex{label='com.mysimpatico:TestFramework:1.0-SNAPSHOT'}' and
 'Vertex{label='org.apache:DummyCore:1.0-SNAPSHOT'}' introduces to cycle in
 the graph org.apache:DummyCore:1.0-SNAPSHOT -->
 com.mysimpatico:TestFramework:1.0-SNAPSHOT -->
 org.apache:DummyCore:1.0-SNAPSHOT -> [Help 1]







 The workaround: Solr Core includes the source of Solr Test Framework as
 part of its test source code.  It's not pretty, but it works.

 I'd be happy to entertain other (functional) approaches.


 In the dp4j.com pom.xml I build in 2 phases to compile with the same
 annotations in the project itself (but I don't think we need that here)



 Steve




 --
 Regards,
 K. Gabriele

 --- unchanged since 20/9/10 ---
 P.S. If the subject contains [LON] or the addressee acknowledges the
 receipt within 48 hours then I don't resend the email.
 subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
  time(x) < Now + 48h) ⇒ ¬resend(I, this).

 If an email is sent by a sender that is not a trusted contact or the
 email does not contain a valid code then the email is not received. A valid
 code starts with a hyphen and ends with X.
 ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
 L(-[a-z]+[0-9]X)).




 --
 Regards,
 K. Gabriele

 --- unchanged since 20/9/10 ---
 P.S. If the subject contains [LON] or the addressee acknowledges the
 receipt within 48 hours then I don't resend the email.
 subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
  time(x) < Now + 48h) ⇒ ¬resend(I, this).

 If an email is sent by a sender that is not a trusted contact or the email
 does not contain a valid code then the email is not received. A valid code
 starts with a hyphen and ends with X.
 ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
 L(-[a-z]+[0-9]X)).




 --
 Regards,
 K. Gabriele

 --- unchanged since 20/9/10 ---
 P.S. If the subject contains [LON] or the addressee acknowledges the
 receipt within 48 hours then I don't resend the email.
 subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
  time(x) < Now + 48h) ⇒ ¬resend(I, this).

 If an email is sent by a sender that is not a trusted contact or the email
 does not contain a valid code then the email is not received. A valid code
 starts with a hyphen and ends with X.
 ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
 L(-[a-z]+[0-9]X)).




-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with X.
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


How to list/see all the indexed terms of a particular field in a document?

2011-05-18 Thread Gnanakumar
Hi,

I'm using Apache Solr v3.1.

How do I list/see all the indexed terms of a particular field in a
document (by passing the unique key ID of the document)?

For example, I've the following field definition in schema.xml:

<field name="mydocumentid" type="string" indexed="true" stored="true" required="true" />
<field name="mytextcontent" type="text" indexed="true" stored="true" required="true" />

In this case, I expect/want to list/see all the indexed terms of a
particular document (mydocumentid:x) for the document field
mytextcontent.

Regards,
Gnanam



Re: How to list/see all the indexed terms of a particular field in a document?

2011-05-18 Thread Gabriele Kahlout
ant luke?

On Wed, May 18, 2011 at 11:47 AM, Gnanakumar gna...@zoniac.com wrote:

 Hi,

 I'm using Apache Solr v3.1.

 How do I list/see all the indexed terms of a particular field in a
 document (by passing the unique key ID of the document)?

 For example, I've the following field definition in schema.xml:

 <field name="mydocumentid" type="string" indexed="true" stored="true" required="true" />
 <field name="mytextcontent" type="text" indexed="true" stored="true" required="true" />

 In this case, I expect/want to list/see all the indexed terms of a
 particular document (mydocumentid:x) for the document field
 mytextcontent.

 Regards,
 Gnanam




-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with X.
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Using solandra

2011-05-18 Thread karanveer singh
I've recently switched from solr+cassandra to solandra.

When I try to run solandra using java -jar start.jar in solandra-app, it
gives me the following error:


java.lang.ExceptionInInitializerError
at lucandra.CassandraUtils.startupServer(CassandraUtils.java:249)
at solandra.SolandraInitializer.initialize(SolandraInitializer.java:45)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
at org.mortbay.jetty.Server.doStart(Server.java:224)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.mortbay.start.Main.invokeMain(Main.java:183)
at org.mortbay.start.Main.start(Main.java:497)
at org.mortbay.start.Main.main(Main.java:115)
Caused by: java.lang.RuntimeException: Couldn't figure out log4j configuration.
at org.apache.cassandra.service.AbstractCassandraDaemon.<clinit>(AbstractCassandraDaemon.java:75)

How exactly do I configure the log4j configuration?

Karan


Re: Using solandra

2011-05-18 Thread Stefan Matheis
Karan,

following the Readme (https://github.com/tjake/Solandra#readme) it's:

From the Solandra base directory:
$ mkdir /tmp/cassandra-data
$ ant
$ cd solandra-app
$ ./start-solandra.sh

Regards
Stefan

On Wed, May 18, 2011 at 12:40 PM, karanveer singh
karan.korn...@gmail.com wrote:
 I've recently switched from solr+cassandra to solandra.

 When I try to run solandra using java -jar start.jar in solandra-app, it
 gives me the following error:


 java.lang.ExceptionInInitializerError
 at lucandra.CassandraUtils.startupServer(CassandraUtils.java:249)
 at solandra.SolandraInitializer.initialize(SolandraInitializer.java:45)
 at
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
 at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at
 org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
 at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
 at
 org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
 at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
 at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at
 org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
 at
 org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at
 org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
 at org.mortbay.jetty.Server.doStart(Server.java:224)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at org.mortbay.start.Main.invokeMain(Main.java:183)
 at org.mortbay.start.Main.start(Main.java:497)
 at org.mortbay.start.Main.main(Main.java:115)
 Caused by: java.lang.RuntimeException: Couldn't figure out log4j
 configuration.
 at
 org.apache.cassandra.service.AbstractCassandraDaemon.<clinit>(AbstractCassandraDaemon.java:75)

 How exactly do I configure the log4j configuration?

 Karan



sorting on date field in facet query

2011-05-18 Thread Dmitry Kan
Hello list,

Is it possible to sort on date field in a facet query in SOLR 3.1?

-- 
Regards,

Dmitry Kan


Re: Using solandra

2011-05-18 Thread karanveer singh
Thanks Stefan! I got it started.

Also, is there a way to import xml documents?
When I run 2-import-data.sh with only xml documents in the data directory,
it gives me the following :

Loading data to solandra, note: this importer uses a slow xml parser
Exception in thread "main" java.lang.RuntimeException: Directory doesn't
contain sgml files!
at
org.apache.solr.solrjs.sgml.reuters.ReutersService.readDirectory(ReutersService.java:207)
at
org.apache.solr.solrjs.sgml.reuters.ReutersService.main(ReutersService.java:64)
Data loaded, now open ./website/index.html in your favorite browser!


On Wed, May 18, 2011 at 4:20 PM, Stefan Matheis 
matheis.ste...@googlemail.com wrote:

 Karan,

 following the Readme (https://github.com/tjake/Solandra#readme) it's:

 From the Solandra base directory:
 $ mkdir /tmp/cassandra-data
 $ ant
 $ cd solandra-app
 $ ./start-solandra.sh

 Regards
 Stefan

 On Wed, May 18, 2011 at 12:40 PM, karanveer singh
 karan.korn...@gmail.com wrote:
  I've recently switched from solr+cassandra to solandra.
 
  When I try to run solandra using java -jar start.jar in solandra-app, it
  gives me the following error:
 
 
  java.lang.ExceptionInInitializerError
  at lucandra.CassandraUtils.startupServer(CassandraUtils.java:249)
  at solandra.SolandraInitializer.initialize(SolandraInitializer.java:45)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
  at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
  at
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at
 
 org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
  at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
  at
 
 org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
  at
 org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
  at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
  at
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at
 
 org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
  at
 
 org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
  at
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at
 
 org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
  at
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at
 org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
  at org.mortbay.jetty.Server.doStart(Server.java:224)
  at
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:616)
  at org.mortbay.start.Main.invokeMain(Main.java:183)
  at org.mortbay.start.Main.start(Main.java:497)
  at org.mortbay.start.Main.main(Main.java:115)
  Caused by: java.lang.RuntimeException: Couldn't figure out log4j
  configuration.
  at
  org.apache.cassandra.service.AbstractCassandraDaemon.<clinit>(AbstractCassandraDaemon.java:75)
 
  How exactly do I configure the log4j configuration?
 
  Karan
 



RE: How to list/see all the indexed terms of a particular field in a document?

2011-05-18 Thread Gnanakumar
So this cannot be queried/listed using Apache Solr?

-Original Message-
From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com] 
Sent: Wednesday, May 18, 2011 3:36 PM
To: solr-user@lucene.apache.org; gna...@zoniac.com
Subject: Re: How to list/see all the indexed terms of a particular field in a 
document?

ant luke?

On Wed, May 18, 2011 at 11:47 AM, Gnanakumar gna...@zoniac.com wrote:

 Hi,

 I'm using Apache Solr v3.1.

 How do I list/see all the indexed terms of a particular field in a
 document (by passing the unique key ID of the document)?

 For example, I've the following field definition in schema.xml:

 <field name="mydocumentid" type="string" indexed="true" stored="true" required="true" />
 <field name="mytextcontent" type="text" indexed="true" stored="true" required="true" />

 In this case, I expect/want to list/see all the indexed terms of a
 particular document (mydocumentid:x) for the document field
 mytextcontent.

 Regards,
 Gnanam




-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with X.
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).



Re: Using solandra

2011-05-18 Thread Stefan Matheis
Karan,

this data-import script is made especially for importing the
demo-data. To index xml documents (like you'd do it normally w/ solr)
use for example
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/exampledocs/post.sh
- and don't forget to adjust the URL, according to your solandra
setup.
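
post.sh just curls each file to a hardcoded update URL, so (the index name
below is only an example) change the URL line inside the script and run it
against your files:

# inside post.sh:
URL=http://localhost:8983/solandra/yourindex/update

$ ./post.sh mydocs/*.xml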

Regards
Stefan

On Wed, May 18, 2011 at 1:25 PM, karanveer singh
karan.korn...@gmail.com wrote:
 Thanks Stefan! I got it started.

 Also, is there a way to import xml documents?
 When I run 2-import-data.sh with only xml documents in the data directory,
 it gives me the following :

 Loading data to solandra, note: this importer uses a slow xml parser
 Exception in thread "main" java.lang.RuntimeException: Directory doesn't
 contain sgml files!
 at
 org.apache.solr.solrjs.sgml.reuters.ReutersService.readDirectory(ReutersService.java:207)
 at
 org.apache.solr.solrjs.sgml.reuters.ReutersService.main(ReutersService.java:64)
 Data loaded, now open ./website/index.html in your favorite browser!


 On Wed, May 18, 2011 at 4:20 PM, Stefan Matheis 
 matheis.ste...@googlemail.com wrote:

 Karan,

 following the Readme (https://github.com/tjake/Solandra#readme) it's:

 From the Solandra base directory:
 $ mkdir /tmp/cassandra-data
 $ ant
 $ cd solandra-app
 $ ./start-solandra.sh

 Regards
 Stefan

 On Wed, May 18, 2011 at 12:40 PM, karanveer singh
 karan.korn...@gmail.com wrote:
  I've recently switched from solr+cassandra to solandra.
 
  When I try to run solandra using java -jar start.jar in solandra-app, it
  gives me the following error:
 
 
  java.lang.ExceptionInInitializerError
  at lucandra.CassandraUtils.startupServer(CassandraUtils.java:249)
  at solandra.SolandraInitializer.initialize(SolandraInitializer.java:45)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
  at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
  at
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at
 
 org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
  at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
  at
 
 org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
  at
 org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
  at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
  at
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at
 
 org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
  at
 
 org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
  at
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at
 
 org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
  at
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at
 org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
  at org.mortbay.jetty.Server.doStart(Server.java:224)
  at
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
  at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:616)
  at org.mortbay.start.Main.invokeMain(Main.java:183)
  at org.mortbay.start.Main.start(Main.java:497)
  at org.mortbay.start.Main.main(Main.java:115)
  Caused by: java.lang.RuntimeException: Couldn't figure out log4j
  configuration.
  at
  org.apache.cassandra.service.AbstractCassandraDaemon.<clinit>(AbstractCassandraDaemon.java:75)
 
  How exactly do I configure the log4j configuration?
 
  Karan
 




Re: How to list/see all the indexed terms of a particular field in a document?

2011-05-18 Thread Stefan Matheis
Gnanam,

have a look at http://wiki.apache.org/solr/LukeRequestHandler
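
For example, if the admin handlers from the example solrconfig.xml are
registered (x being your unique key value):

http://localhost:8983/solr/admin/luke?id=x&fl=mytextcontent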

Regards
Stefan

On Wed, May 18, 2011 at 1:30 PM, Gnanakumar gna...@zoniac.com wrote:
 So this cannot be queried/listed using Apache Solr?

 -Original Message-
 From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com]
 Sent: Wednesday, May 18, 2011 3:36 PM
 To: solr-user@lucene.apache.org; gna...@zoniac.com
 Subject: Re: How to list/see all the indexed terms of a particular field in a 
 document?

 ant luke?

 On Wed, May 18, 2011 at 11:47 AM, Gnanakumar gna...@zoniac.com wrote:

 Hi,

 I'm using Apache Solr v3.1.

 How do I list/see all the indexed terms of a particular field in a
 document (by passing the unique key ID of the document)?

 For example, I've the following field definition in schema.xml:

 <field name="mydocumentid" type="string" indexed="true" stored="true" required="true" />
 <field name="mytextcontent" type="text" indexed="true" stored="true" required="true" />

 In this case, I expect/want to list/see all the indexed terms of a
 particular document (mydocumentid:x) for the document field
 mytextcontent.

 Regards,
 Gnanam




 --
 Regards,
 K. Gabriele

 --- unchanged since 20/9/10 ---
 P.S. If the subject contains [LON] or the addressee acknowledges the
 receipt within 48 hours then I don't resend the email.
 subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
 < Now + 48h) ⇒ ¬resend(I, this).

 If an email is sent by a sender that is not a trusted contact or the email
 does not contain a valid code then the email is not received. A valid code
 starts with a hyphen and ends with X.
 ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
 L(-[a-z]+[0-9]X)).




how to work cache and improve performance phrase query included wildcard

2011-05-18 Thread Jason, Kim
Hi, all

I have two questions.
First,
I'm wondering how filterCache, queryResultCache, documentCache are applied.
After searching "query1 OR query2 OR query3 ...", I searched "query0 OR
query2 OR query3 ...".
Only query1 and query0 differ, but the second query time was not any faster.
When are the caches applied?

Second,
I have 5 or more wildcard phrase queries per query, such as "query1*
query2*"~2 OR "query3* query4*"~2 ...
In the worst case, there are more than 30 wildcard phrase queries in one
query, and QTime is more than 60 seconds.

Please give any idea to improve performance.

I have a 2.5 million document full-text index.
It is running as 10 shards on 1 Tomcat.

Thanks,
Jason

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-work-cache-and-improve-performance-phrase-query-included-wildcard-tp2956671p2956671.html
Sent from the Solr - User mailing list archive at Nabble.com.


I need to improve highlighting

2011-05-18 Thread bryan rasmussen
Hi,

If I do a search
http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true then in
the <lst name="highlighting"> subtree I get
<arr name="all_text">
<str>
Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige
</str>
</arr>
</lst>


What I need to do is either:

1. Return all of all_text, which should be possible by setting
hl.fragsize=0, but I still never get beyond the default fragment size for
the field (I can go below 100 but not above), or
2. Get a count of the number of highlighted instances (preferable), or have
each highlighted term returned in a separate str element, i.e.
<str>kongeriget</str><str>kongeriget</str>


thanks,
Bryan Rasmussen


Re: How to test Solr Integration - how to get EmbeddedSolrServer?

2011-05-18 Thread Erick Erickson
You've probably seen this page: http://wiki.apache.org/solr/HowToContribute,
but here it is for reference

Go ahead and open a JIRA at https://issues.apache.org/jira/browse/SOLR
(you need to create an account) and attach your changes as a patch. That
gets it into the system and folks can start commenting on what they
think the implications are. One of the committers needs to pick it up,
but you can prompt G...

Yonik's law of patches reads:

A half-baked patch in Jira, with no documentation, no tests
and no backwards compatibility is better than no patch at all.

So don't worry about a completely polished patch for the first cut, it's often
helpful for people to see the early stages to help steer the effort.

Best
Erick

On Wed, May 18, 2011 at 5:41 AM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:
 Thinking more about it, I can solve my immediate problem by just
 copy-pasting the classes I need into my own project packages (KISS, like
 here: https://github.com/Filirom1/solr-test-exemple).

 I'd however suggest refactoring the Solr code structure to be much more
 defaults-compliant, making it easier for external developers to understand
 and hopefully easier to maintain for committers (with fewer special-needs
 configurations). I've done some of those refactorings on my local copy of
 Solr and would be glad to contribute.

 For this particular problem the KISS solution would be to create yet one
 more module for tests, which depends on Solr Core and on the Test Framework.
 I believe the ease of build configuration outweighs the organizational
 burden of that extra module.



 On Tue, May 17, 2011 at 7:11 PM, Gabriele Kahlout
 gabri...@mysimpatico.com wrote:


 http://stackoverflow.com/questions/6034513/can-i-avoid-a-dependency-cycle-with-one-edge-being-a-test-dependency


 On Tue, May 17, 2011 at 6:49 PM, Gabriele Kahlout 
 gabri...@mysimpatico.com wrote:




 On Tue, May 17, 2011 at 3:52 PM, Gabriele Kahlout 
 gabri...@mysimpatico.com wrote:



 On Tue, May 17, 2011 at 3:44 PM, Steven A Rowe sar...@syr.edu wrote:

 Hi Gabriele,

 On 5/17/2011 at 9:34 AM, Gabriele Kahlout wrote:
  Solr Core should declare a test dependency on Solr Test Framework.

 I agree:

 - Solr Core should have a test-scope dependency on Solr Test Framework.
 - Solr Test Framework should have a compile-scope dependency on Solr
 Core.

 But Maven views this as a circular dependency.


  I've seen, but adding it with <scope>test</scope> works. The logic:
  the src is compiled first and then re-used (I'm assuming Maven does
  something smart about not including the full jar).


 Not quite. I've tried a demo and the reactor complains. I'll try to see if
 maven could become 'smarter', or if the 2-build phase solution will work.

 The projects in the reactor contain a cyclic reference: Edge between
 'Vertex{label='com.mysimpatico:TestFramework:1.0-SNAPSHOT'}' and
 'Vertex{label='org.apache:DummyCore:1.0-SNAPSHOT'}' introduces to cycle in
 the graph org.apache:DummyCore:1.0-SNAPSHOT -->
 com.mysimpatico:TestFramework:1.0-SNAPSHOT -->
 org.apache:DummyCore:1.0-SNAPSHOT -> [Help 1]







 The workaround: Solr Core includes the source of Solr Test Framework as
 part of its test source code.  It's not pretty, but it works.

 I'd be happy to entertain other (functional) approaches.


  In the dp4j.com pom.xml I build in 2 phases to compile with the same
  annotations in the project itself (but I don't think we need that here)



 Steve




 --
 Regards,
 K. Gabriele

 --- unchanged since 20/9/10 ---
 P.S. If the subject contains [LON] or the addressee acknowledges the
 receipt within 48 hours then I don't resend the email.
 subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
  time(x) < Now + 48h) ⇒ ¬resend(I, this).

 If an email is sent by a sender that is not a trusted contact or the
 email does not contain a valid code then the email is not received. A valid
 code starts with a hyphen and ends with X.
 ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
 L(-[a-z]+[0-9]X)).




 --
 Regards,
 K. Gabriele

 --- unchanged since 20/9/10 ---
 P.S. If the subject contains [LON] or the addressee acknowledges the
 receipt within 48 hours then I don't resend the email.
 subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
  time(x) < Now + 48h) ⇒ ¬resend(I, this).

 If an email is sent by a sender that is not a trusted contact or the email
 does not contain a valid code then the email is not received. A valid code
 starts with a hyphen and ends with X.
 ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
 L(-[a-z]+[0-9]X)).




 --
 Regards,
 K. Gabriele

 --- unchanged since 20/9/10 ---
 P.S. If the subject contains [LON] or the addressee acknowledges the
 receipt within 48 hours then I don't resend the email.
 subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
  time(x) < Now + 48h) ⇒ ¬resend(I, this).

 If an email is sent by a sender that is not a trusted contact or the 

Re: sorting on date field in facet query

2011-05-18 Thread Erick Erickson
Can you provide an example of what you are trying to do? Are you
referring to ordering the result set or the facet information?

Best
Erick

On Wed, May 18, 2011 at 7:21 AM, Dmitry Kan dmitry@gmail.com wrote:
 Hello list,

 Is it possible to sort on date field in a facet query in SOLR 3.1?

 --
 Regards,

 Dmitry Kan



Re: I need to improve highlighting

2011-05-18 Thread Stefan Matheis
Bryan, on Q2 - what about using xpath like 'str/em' ?

Regards
Stefan

On Wed, May 18, 2011 at 2:25 PM, bryan rasmussen
rasmussen.br...@gmail.com wrote:
 Hi,

 If I do a search
 http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true then in
 the <lst name="highlighting"> subtree I get
 <arr name="all_text">
 <str>
 Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige
 </str>
 </arr>
 </lst>


 What I need to do is either:

  1. Return all of all_text, which should be possible by setting
 hl.fragsize=0, but I still never get beyond the default fragment size for
 the field (I can go below 100 but not above), or
 2. Get a count of the number of highlighted instances (preferable), or have
 each highlighted term returned in a separate str element, i.e.
 <str>kongeriget</str><str>kongeriget</str>


 thanks,
 Bryan Rasmussen



Re: how to work cache and improve performance phrase query included wildcard

2011-05-18 Thread Erick Erickson
See below:

On Wed, May 18, 2011 at 8:15 AM, Jason, Kim hialo...@gmail.com wrote:
 Hi, all

 I have two questions.
 First,
 I'm wondering how filterCache, queryResultCache, documentCache are applied.
 After searching "query1 OR query2 OR query3 ...", I searched "query0 OR
 query2 OR query3 ...".
 Only query1 and query0 differ, but the second query time was not any faster.
 When are the caches applied?


Caches don't really count here. You're not using filter queries so
filterCache isn't
germane. documentCache is only for holding the document read off disk,
it probably
isn't doing much in your example that would impact differences in search
time unless you're returning massive numbers of documents.

queryResultCache isn't getting re-used. Think of this as a list of document IDs
keyed by the *entire* query. By making any change to the query you're not
going to use the cache. To understand this, consider that the clauses aren't
really separate. Any additional clause could easily change the scoring of
a document that matched both queries, so re-using the cache on a per-clause
basis wouldn't produce correct results.
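
To illustrate (queries abbreviated):

q=query1 OR query2 OR query3   first request: cache miss, result list stored under this exact string
q=query1 OR query2 OR query3   identical repeat: queryResultCache hit
q=query0 OR query2 OR query3   different key: miss, the full search runs again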

In other words, caches aren't going to help you here.

 Second,
 I have 5 or more wildcard phrase queries per query, such as "query1*
 query2*"~2 OR "query3* query4*"~2 ...
 In the worst case, there are more than 30 wildcard phrase queries in one
 query, and QTime is more than 60 seconds.


Can we see the results of attaching debugQuery=on to the URL?
Your pseudo-code may well be hiding the issue. We don't know what
query parser you're using. Wildcards aren't usually analyzed for phrase
queries, for instance, so on the face of it there's not much that can be said...

Additionally, the field type and field definitions from your schema.xml
would be helpful for the fields you're searching on.

Best
Erick

 Please give any idea to improve performance.

 I have a 2.5 million document full-text index.
 It is running as 10 shards on 1 Tomcat.

 Thanks,
 Jason

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/how-to-work-cache-and-improve-performance-phrase-query-included-wildcard-tp2956671p2956671.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: I need to improve highlighting

2011-05-18 Thread bryan rasmussen
 Bryan, on Q2 - what about using xpath like 'str/em' ?

How do I do that? The highlighting result, at least in the Solr
installation I have (3.something), returns the <em> as escaped markup.
Is there an xpath parameter or configuration I can set for
highlighting, or a way to change the <em> elements into actual
elements (hl.formatter maybe?)
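
Worst case, I suppose I can count the markers client-side once the XML is
parsed and the escaping is undone; a sketch, assuming the default <em>
pre-delimiter:

// count occurrences of the highlight open tag in a decoded snippet string
static int countHighlights(String snippet) {
    int count = 0, idx = 0;
    while ((idx = snippet.indexOf("<em>", idx)) != -1) {
        count++;
        idx += 4; // step past this tag
    }
    return count;
}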

Thanks,
Bryan Rasmussen


 On Wed, May 18, 2011 at 2:25 PM, bryan rasmussen
 rasmussen.br...@gmail.com wrote:
 Hi,

 If I do a search
 http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true then in
 the <lst name="highlighting"> subtree I get
 <arr name="all_text">
 <str>
 Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige
 </str>
 </arr>
 </lst>


 What I need to do is either:

  1. Return all of all_text, which should be possible by setting
 hl.fragsize=0, but I still never get beyond the default fragment size for
 the field (I can go below 100 but not above), or
 2. Get a count of the number of highlighted instances (preferable), or have
 each highlighted term returned in a separate str element, i.e.
 <str>kongeriget</str><str>kongeriget</str>


 thanks,
 Bryan Rasmussen




Re: I need to improve highlighting

2011-05-18 Thread Erick Erickson
Just checking, but have you tried setting
hl.fragsize=<very large number> as suggested here:

http://wiki.apache.org/solr/HighlightingParameters#hl.fragsize ?
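
For example, reusing the query from your first mail:

http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true&hl.fl=all_text&hl.fragsize=100000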

If that's not the problem, please show us the results of
attaching debugQuery=on to the request, that may shed
some light on the problem.

Best
Erick

On Wed, May 18, 2011 at 8:25 AM, bryan rasmussen
rasmussen.br...@gmail.com wrote:
 Hi,

 If I do a search
 http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true then in
 the <lst name="highlighting"> subtree I get
 <arr name="all_text">
 <str>
 Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige
 </str>
 </arr>
 </lst>


 What I need to do is either:

  1. Return all of all_text, which should be possible by setting
 hl.fragsize=0, but I still never get beyond the default fragment size for
 the field (I can go below 100 but not above), or
 2. Get a count of the number of highlighted instances (preferable), or have
 each highlighted term returned in a separate str element, i.e.
 <str>kongeriget</str><str>kongeriget</str>


 thanks,
 Bryan Rasmussen



Re: I need to improve highlighting

2011-05-18 Thread bryan rasmussen
Yeah, but you just got me to check again. What I thought was Solr ignoring
my hl.fragsize setting and always using the default turned out to be a
shorter field ranked higher: when I set it to 1000 and saw the same output
as with 100, it was just the off chance that there were only 100 characters
to see in the first 10 results. Funny.

thanks,
Bryan Rasmussen

On Wed, May 18, 2011 at 2:59 PM, Erick Erickson erickerick...@gmail.com wrote:
 Just checking, but have you tried setting
 hl.fragsize=<very large number> as suggested here:

 http://wiki.apache.org/solr/HighlightingParameters#hl.fragsize ?

 If that's not the problem, please show us the results of
 attaching debugQuery=on to the request, that may shed
 some light on the problem.

 Best
 Erick

 On Wed, May 18, 2011 at 8:25 AM, bryan rasmussen
 rasmussen.br...@gmail.com wrote:
 Hi,

 If I do a search
 http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true then in
 the <lst name="highlighting"> subtree I get
 <arr name="all_text">
 <str>
 Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige
 </str>
 </arr>
 </lst>


 What I need to do is either:

  1. Return all of all_text, which should be possible by setting
 hl.fragsize=0, but I still never get beyond the default fragment size for
 the field (I can go below 100 but not above), or
 2. Get a count of the number of highlighted instances (preferable), or have
 each highlighted term returned in a separate str element, i.e.
 <str>kongeriget</str><str>kongeriget</str>


 thanks,
 Bryan Rasmussen




Re: Anyone having these Replication issues as well?

2011-05-18 Thread kenf_nc
Thanks, Markus, for your patience in getting the response in, as well as
for the comments.

This is my Dev environment, I'm actually going to be setting up a new
master-slave configuration in a different environment today. I'll see if
it's environment-specific or not. One thing I didn't mention, since I wasn't
sure it was germane, is that these servers are in Amazon EC2. Also, the master
is currently on a 32-bit OS while the slaves are on 64-bit OSes; that's just
the order in which the servers are getting upgraded in dev.

The master has AutoCommit turned on at 30 second intervals. Even if nothing
is getting indexed, could an AutoCommit occurring during a replication
request cause a failed replication?

Ken

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Anyone-having-these-Replication-issues-as-well-tp2954365p2957127.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: I need an available solr lucene consultant

2011-05-18 Thread bill
I am interested in hearing more about this opportunity. Feel free to contact
me at b...@csrinstitute.net. 

Thanks
Bill 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/I-need-an-available-solr-lucene-consultant-tp2954023p2957137.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Exact match

2011-05-18 Thread Jan Høydahl
There's a JIRA issue assigned to this feature: 
https://issues.apache.org/jira/browse/SOLR-1980
However, it's not yet implemented. Anyone?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 17 May 2011, at 15:51, Alex Grilo wrote:

 Hi,
 
 Can I make a query that returns only exact match or do I have to change the
 fields to achieve that?
 
 Thanks in advance
 
 Alex Grilo



Does every Solr request-response require a running server?

2011-05-18 Thread Gabriele Kahlout
Hello,

I'm wondering whether the Solr test framework always ends up running an
embedded/Jetty server (on the assumption that a web server is the only way to
interact with Solr, i.e. no web server -- no Solr), or whether in the tests
they interact without one, calling the underlying methods directly.

The latter seems to be the case from trying to understand SolrTestCaseJ4. That
would be more white-box than otherwise.

-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with X.
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: UIMA analysisEngine path

2011-05-18 Thread Tommaso Teofili
2011/5/17 chamara chama...@gmail.com

 Hi
  My Solr version is 3.1.0. I actually figured out what my problem was. I
 used the guide
 https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/uima/README.txt
 and it seems that I had placed the code snippet inside another XML element,
 not directly under <config>.
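
 In other words, the block has to sit directly under <config> in
 solrconfig.xml, roughly like this (the descriptor path is just a
 placeholder):

 <config>
   ...
   <uimaConfig>
     ...
     <analysisEngine>/org/apache/uima/desc/YourDescriptor.xml</analysisEngine>
     ...
   </uimaConfig>
 </config>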


One more thing is that you are using Solr 3.1.0 but reading README from
trunk (4.0-SNAPSHOT), you should use this one instead:
https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1/solr/contrib/uima/README.txt


 Will UIMA work with Solr version 1.4.1 as well?


the UpdateRequestProcessorChain API has changed from 1.4.1 to 3.1.0, so,
although it should be easy to back-port, it's not compatible with Solr 1.4.1
out of the box.
Tommaso



 Thanks again



 On Tue, May 17, 2011 at 12:13 PM, Tommaso Teofili [via Lucene] 
 ml-node+2952043-2093755785-399...@n3.nabble.com wrote:

  Hi again Chamara,
 
  2011/5/17 chamara [hidden email]
 http://user/SendEmail.jtp?type=nodenode=2952043i=0
 
 
   Thanks Tommaso, yes this occurred after copying the .jar files to the
 lib
 
   folder. When i do not copy them from contrib/uima/lib and have the
   solrconfig.xml to point to those libs i get the following error. I am a
  bit
   confused why a class path was chosen to get the analysis engine
  descriptor
   .
  
 
  I think it'd be nice if you could tell which version of Solr you're
 using,
  how you configured the Solr-UIMA module in solrconfig.xml.
 
 
 
    The error is prompted when the /update request handler is called; it looks
    like it is related to the classpath (/org/apache/uima/desc/).
    SEVERE: Error in xpath:java.lang.RuntimeException: solrconfig.xml missing
    /config/uimaConfig/analysisEngine
  
 
  This seems to be related to a missing /config/uimaConfig/analysisEngine
  element inside solrconfig.xml.
  Regards,
  Tommaso
 
 
  
   On Mon, May 16, 2011 at 6:19 PM, Tommaso Teofili [via Lucene] 
   [hidden email] http://user/SendEmail.jtp?type=nodenode=2952043i=1
  wrote:
  
 The error you pasted doesn't seem to be related to a (class)path issue but
 more likely to be related to a Solr instance at 1.4.1/3.1.0 and a Solr-UIMA
 module at 3.1.0/4.0-SNAPSHOT (trunk); it seems that the error arises from
 the UpdateRequestProcessorFactory API having changed.
Hope this helps,
Tommaso
   
   
Il giorno 16/mag/2011, alle ore 18.54, chamara ha scritto:
   
 Hi Tommaso,
 Thanks for the quick reply. I had copied the lib files and followed the
 instructions on http://wiki.apache.org/solr/SolrUIMA#Installation.
 However I get this error. The AnalysisEngine has the default classpath,
 which is /org/apache/uima/desc/.

 SEVERE: org.apache.solr.common.SolrException: Error Instantiating
 UpdateRequestProcessorFactory,
 org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory is not a
 org.apache.solr.update.processor.UpdateRequestProcessorFactory
   


 Regards,
 Chamara


 On Mon, May 16, 2011 at 9:17 AM, Tommaso Teofili [via Lucene] 
 [hidden email] 
  http://user/SendEmail.jtp?type=nodenode=2948866i=0
 
wrote:

 Hello,

  if you want to take the descriptor from a jar, provided that you
  configured the jar inside a <lib> element in solrconfig, then you just
  need to write the correct classpath in the <analysisEngine> element.
  For example, if your descriptor resides in the com/something/desc/ path
  inside the jar, then you should set the <analysisEngine> element to
  /com/something/desc/descriptorname.xml
  If you instead need to get the descriptor from the filesystem, try the
  patch in SOLR-2501 [1].
 Hope this helps,
 Tommaso

 [1] :  https://issues.apache.org/jira/browse/SOLR-2501

 2011/5/13 chamara [hidden email]
http://user/SendEmail.jtp?type=nodenode=2946920i=0
   


 Hi,
  Does this code at line 57 need to be changed to the location where the
  jar (library) files reside?
  URL url = this.getClass().getResource("location of the jar files");
  I did change it, but no luck so far. Let me know what I am doing wrong.

 --
 View this message in context:


   
  
 
 http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2935541.html
 Sent from the Solr - User mailing list archive at Nabble.com.



 --
 --- Chamara 


  --
  View this message in context:
  http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2948760.html
  Sent from the Solr - User mailing list archive at Nabble.com.

Re: lucene parser, negative OR operands

2011-05-18 Thread Jonathan Rochkind

On 5/17/2011 8:00 PM, Yonik Seeley wrote:

This doesn't have to do with Solr's support of pure-negative top-level
queries, but does have to do with
a long standing confusion of how the lucene queryparser works with
some of the operators (i.e. not really boolean logic).

In a Lucene BooleanQuery, clauses are mandatory, optional, or prohibited.
-foo OR -bar actually parses to a boolean query with two prohibited
clauses... essentially the
same as -foo AND -bar.  You can see this by adding debugQuery=true to
the request.
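
For example, with a hypothetical default field named text:

q=-foo OR -bar&debugQuery=true
  <str name="parsedquery">-text:foo -text:bar</str>   (both clauses prohibited)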


Thanks Yonik. I recall hearing about this before, but was vague on the 
details, thanks for supplying some and refreshing my memory.


So I guess there is no such thing as an optional prohibited clause,
which is what makes -one OR -two the same thing as -one AND -two.
Actually, yeah, an optional prohibited clause doesn't really even
make sense. Hmm.


If I want to understand more about how the lucene query parser does its
thing, can anyone suggest the source files I should be looking at?


If I really do want actual boolean logic behavior, what are my options?  
I guess one is trying to write my own query parser.


Hmm, for that particular query, what about using parens to force a 
sub-query?


(-one) OR (-two)

Ha, nope, that runs into a different problem (or is it the same
problem?), and always returns 0 hits.  It looks like the lucene query
parser can't handle pure-negative sub-queries like that separated by OR?
Not sure why, can anyone explain that one?


For that particular pattern, this crazy refactoring of the query does 
work and get the actual boolean logic result of (not 'one') OR (not 
'two'):


(*:* AND -one) OR (*:* AND -two)

Phew, crazy stuff. So that's a weird solution to getting actual boolean 
logic behavior for that pattern, but in general, I'm kind of wanting a 
parser that will give actual boolean logic behavior. Maybe someday I can 
find time to write it in Java (not the quickest thing for me, not 
familiar with the code at all).


Jonathan



Re: Does every Solr request-response require a running server?

2011-05-18 Thread Yonik Seeley
On Wed, May 18, 2011 at 10:50 AM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:
 Hello,

 I'm wondering if Solr Test framework at the end of the day always runs an
 embedded/jetty server (which is the only way to interact with solr, i.e. no
 web server -- no solr) or in the tests they interact without one, calling
 directly the under line methods?

 The latter seems to be the case trying to understand SolrTestCaseJ4. That
 would be more white-box than otherwise.

Solr does either, depending on the test.  Most tests start only an
embedded solr server w/ no web server, but others use an embedded
jetty server so one can talk HTTP to it.  JettySolrRunner is used for
the latter.
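
A minimal sketch of the embedded (no-HTTP) style, assuming your own config and
schema file names:

import org.apache.solr.SolrTestCaseJ4;
import org.junit.BeforeClass;
import org.junit.Test;

public class MyEmbeddedTest extends SolrTestCaseJ4 {
    @BeforeClass
    public static void beforeClass() throws Exception {
        initCore("solrconfig.xml", "schema.xml"); // boots a SolrCore, no servlet container
    }

    @Test
    public void testAddAndQuery() {
        assertU(adoc("id", "1"));   // index a document directly against the core
        assertU(commit());
        assertQ(req("id:1"), "//result[@numFound='1']"); // query without HTTP
    }
}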

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


Re: Does every Solr request-response require a running server?

2011-05-18 Thread Gabriele Kahlout
On Wed, May 18, 2011 at 5:09 PM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Wed, May 18, 2011 at 10:50 AM, Gabriele Kahlout
 gabri...@mysimpatico.com wrote:
  Hello,
 
  I'm wondering if Solr Test framework at the end of the day always runs an
  embedded/jetty server (which is the only way to interact with solr, i.e.
 no
  web server -- no solr) or in the tests they interact without one,
 calling
  directly the under line methods?
 
  The latter seems to be the case trying to understand SolrTestCaseJ4. That
  would be more white-box than otherwise.

 Solr does either, depending on the test.

 Most tests start only an
 embedded solr server w/ no web server,


What is confusing me is the solr server. Is it SolrCore? In what respects is
it a 'server'? In my understanding it's the core of the Solr web application
that backs the servlet interface, i.e. it sits under the servlets, not on
top of them.


 but others use an embedded
 jetty server so one can talk HTTP to it.  JettySolrRunner is used for
 the latter.

 -Yonik
 http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
 25-26, San Francisco




-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
 Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with X.
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: Set operations on multiple queries with different qf parameters

2011-05-18 Thread Jonathan Rochkind
Don't know of any other documentation. There might be some minimal page 
on the wiki somewhere, but I can never find it either; although I have 
some memory of seeing it once, it didn't have anything that the blog 
post didn't.


I think 'mm' _should_ work as a LocalParam in a nested query, I use it 
myself in code and it seems to work.


But not too surprised that 'fq' doesn't (although I haven't verified 
that myself). If indeed it doesn't, here would be a hacky way to get the 
same semantics, although it won't use the filter cache for the fq.


If this doesn't work:

defType=lucene&q=_query_:"{!edismax qf='p,q,r' fq='field1:xyz'}abc
def" AND _query_:"{!edismax mm=100% qf='q, r, s'}jlk"


Then this should, we can just put it in our top-level lucene query as an 
additional condition.


defType=lucene&q=(_query_:"{!edismax qf='p,q,r'}abc def" AND 
field1:xyz) AND _query_:"{!edismax mm=100% qf='q, r, s'}jlk"


Yeah, this starts to get painful, agreed, with unclear performance 
implications.





On 5/17/2011 10:44 PM, Nikhil Chhaochharia wrote:

Thanks, this looks good.  mm and fq don't seem to be working for a nested 
query, but I should be able to work around it.
I was unable to find much documentation on the Wiki, API docs or in the Solr 
book - please let me know if you are aware of any other documentation for this 
feature apart from the mentioned blog post.


Thanks,

Nikhil



- Original Message -
From: Jonathan Rochkindrochk...@jhu.edu
To: solr-user@lucene.apache.org; Nikhil Chhaochharianikhil...@yahoo.com
Cc:
Sent: Tuesday, 17 May 2011 8:52 PM
Subject: Re: Set operations on multiple queries with different qf parameters

One way to do it might be to use the Solr 'nested query' functionality.

http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/

Not entirely sure this will work exactly as I've written it, but give
you some ideas of what nested query can do.  Note not fully URL-encoded
for clarity:

defType=lucene&q=_query_:"{!edismax qf='p,q,r' fq='field1:xyz'}abc
def" AND _query_:"{!edismax mm=100% qf='q, r, s'}jlk"



On 5/17/2011 2:55 AM, Nikhil Chhaochharia wrote:

Hi,

I am using Solr 3.1 with edismax.  My frontend allows the user to create 
arbitrarily complex queries by modifying q, fq, qf and mm (only 1 and 100% are 
allowed) parameters.  The queries can then be saved by the user.

The user should be able to perform set operations on the saved searches.  For 
example, the user may want to see all documents which are returned both by 
saved search 1 and saved search 2 (equivalent to intersection of the two).

If the saved searches contain q, fq and/or mm, then I can combine the saved 
searches to create a new query which will be equivalent to their intersection.  
However, I can't figure out how to handle qf?

For example,

Query 1 = q=abc def&fq=field1:xyz&mm=1&qf=p,q,r
Query 2 = q=jkl&mm=100%&qf=q,r,s

How do I get the list of common documents which are present in the result set 
of both queries?



Thanks,
Nikhil




Re: Does every Solr request-response require a running server?

2011-05-18 Thread Yonik Seeley
On Wed, May 18, 2011 at 11:14 AM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:


 On Wed, May 18, 2011 at 5:09 PM, Yonik Seeley yo...@lucidimagination.com
 wrote:

 On Wed, May 18, 2011 at 10:50 AM, Gabriele Kahlout
 gabri...@mysimpatico.com wrote:
  Hello,
 
  I'm wondering if Solr Test framework at the end of the day always runs
  an
  embedded/jetty server (which is the only way to interact with solr, i.e.
  no
  web server -- no solr) or in the tests they interact without one,
  calling
   the underlying methods directly?
 
  The latter seems to be the case trying to understand SolrTestCaseJ4.
  That
  would be more white-box than otherwise.

 Solr does either, depending on the test.

  Most tests start only an
 embedded solr server w/ no web server,

 What is confusing me is the solr server. Is it SolrCore? In what sense is
 it a 'server'? In my understanding it's the core of the Solr web application
 that sits behind the servlet interface, i.e. it's under the servlets, not on
 top of them.

Look at TestHarness - it instantiates a CoreContainer.
When running as a webapp in a Jetty server, a DispatchFilter is
registered that instantiates the CoreContainer.
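
For a concrete picture, an embedded (no web server) setup with SolrJ's
EmbeddedSolrServer looks roughly like the following -- a minimal sketch
against the 3.x API, with an illustrative solr home path:

import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class EmbeddedExample {
    public static void main(String[] args) throws Exception {
        // Point Solr at a home directory containing solr.xml/conf (illustrative path)
        System.setProperty("solr.solr.home", "/path/to/solr/home");
        // The CoreContainer plays the "server" role: it loads and manages SolrCores
        CoreContainer coreContainer = new CoreContainer.Initializer().initialize();
        // Talk to the default core directly, in-process -- no Jetty, no HTTP
        EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");
        // server.add(...), server.query(...), server.commit() all work as usual
        coreContainer.shutdown();
    }
}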

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco





 but others use an embedded
 jetty server so one can talk HTTP to it.  JettySolrRunner is used for
 the latter.

 -Yonik
 http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
 25-26, San Francisco



 --
 Regards,
 K. Gabriele

 --- unchanged since 20/9/10 ---
 P.S. If the subject contains [LON] or the addressee acknowledges the
 receipt within 48 hours then I don't resend the email.
 subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
  Now + 48h) ⇒ ¬resend(I, this).

 If an email is sent by a sender that is not a trusted contact or the email
 does not contain a valid code then the email is not received. A valid code
 starts with a hyphen and ends with X.
 ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
 L(-[a-z]+[0-9]X)).




Re: [POLL] How do you (like to) do logging with Solr

2011-05-18 Thread Shawn Heisey

On 5/17/2011 10:00 AM, Chris Hostetter wrote:

: If I understand what you've said above correctly, removing the binding in
: solr.war would make it inherit the binding in jetty/tomcat/whatever, is that
: right?  That sounds like an awesome plan to me.  The example jetty server can
: be configured instead of solr.war.  Once you've answered this, I can submit my
: vote.

no, removing the bindings in solr.war would result in solr not logging
*anything* unless you manually added a jar (defining the bindings you
want) to the jetty (or tomcat) system classloader.


What I'd want is the ability to download the Solr source code, not 
modify anything, create a .war, and drop it into an existing system that 
has my preferred logging already set up. From what you are saying, that 
would also require that the example ship a jar with the JDK bindings, 
and that everyone who sets up a more custom system create their own jar 
and put it somewhere it can be found.


What's involved in creating that jar?  Is it something that a novice 
could get done?  Is it something that could be prepackaged for the most 
common choices, or possibly already available on the Internet?


Thanks,
Shawn



JSON delete error with latest branch_3x

2011-05-18 Thread Paul Dlug
I updated to the latest branch_3x (r1124339) and I'm now getting the
error below when trying a delete by query or id. Adding documents with
the new format works as do the commit and optimize commands. Possible
regression due to SOLR-2496?

curl 'http://localhost:8988/solr/update/json?wt=json' -H
'Content-type:application/json' -d '{"delete":{"query":"*:*"}}'

Error 400 meaningless command:
delete:query=`*:*`,fromPending=false,fromCommitted=false

Problem accessing /solr/update/json. Reason:
  meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false


Re: JSON delete error with latest branch_3x

2011-05-18 Thread Yonik Seeley
On Wed, May 18, 2011 at 1:24 PM, Paul Dlug paul.d...@gmail.com wrote:
 I updated to the latest branch_3x (r1124339) and I'm now getting the
 error below when trying a delete by query or id. Adding documents with
 the new format works as do the commit and optimize commands. Possible
 regression due to SOLR-2496?

 curl 'http://localhost:8988/solr/update/json?wt=json' -H
 'Content-type:application/json' -d '{"delete":{"query":"*:*"}}'

 Error 400 meaningless command:
 delete:query=`*:*`,fromPending=false,fromCommitted=false

 Problem accessing /solr/update/json. Reason:
  meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false

Hmmm, looks like unit tests must be inadequate for the JSON format.
I'll look into it.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


Re: Anyone familiar with Solandra or Lucandra?

2011-05-18 Thread Jake Luciani
This will be possible once triggers are finished for Cassandra; then we can
hook into CF inserts and auto-index in Solandra.

On Tue, May 17, 2011 at 5:10 PM, kenf_nc ken.fos...@realestate.com wrote:

 Ah. I see. That reduces its usefulness to me some. The multi-master aspect
 is
 still a big draw of course. But I was hoping this also added an integrated
 persistence layer to Solr as well.

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Anyone-familiar-with-Solandra-or-Lucendra-tp2927357p2954320.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
http://twitter.com/tjake


Re: JSON delete error with latest branch_3x

2011-05-18 Thread Yonik Seeley
OK, I just fixed this on branch_3x.
Trunk is fine (it was an error in the 3x backport that wasn't caught
because the test doesn't go through the complete solr stack to the
update handler).

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


On Wed, May 18, 2011 at 1:29 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Wed, May 18, 2011 at 1:24 PM, Paul Dlug paul.d...@gmail.com wrote:
 I updated to the latest branch_3x (r1124339) and I'm now getting the
 error below when trying a delete by query or id. Adding documents with
 the new format works as do the commit and optimize commands. Possible
 regression due to SOLR-2496?

 curl 'http://localhost:8988/solr/update/json?wt=json' -H
 'Content-type:application/json' -d '{"delete":{"query":"*:*"}}'

 Error 400 meaningless command:
 delete:query=`*:*`,fromPending=false,fromCommitted=false

 Problem accessing /solr/update/json. Reason:
  meaningless command: 
 delete:query=`*:*`,fromPending=false,fromCommitted=false

 Hmmm, looks like unit tests must be inadequate for the JSON format.
 I'll look into it.

 -Yonik
 http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
 25-26, San Francisco



Re: JSON delete error with latest branch_3x

2011-05-18 Thread Paul Dlug
Thanks Yonik, all my app's test cases now pass again.


--Paul

On Wed, May 18, 2011 at 2:04 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 OK, I just fixed this on branch_3x.
 Trunk is fine (it was an error in the 3x backport that wasn't caught
 because the test doesn't go through the complete solr stack to the
 update handler).

 -Yonik
 http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
 25-26, San Francisco


 On Wed, May 18, 2011 at 1:29 PM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Wed, May 18, 2011 at 1:24 PM, Paul Dlug paul.d...@gmail.com wrote:
 I updated to the latest branch_3x (r1124339) and I'm now getting the
 error below when trying a delete by query or id. Adding documents with
 the new format works as do the commit and optimize commands. Possible
 regression due to SOLR-2496?

 curl 'http://localhost:8988/solr/update/json?wt=json' -H
 'Content-type:application/json' -d '{"delete":{"query":"*:*"}}'

 Error 400 meaningless command:
 delete:query=`*:*`,fromPending=false,fromCommitted=false

 Problem accessing /solr/update/json. Reason:
  meaningless command: 
 delete:query=`*:*`,fromPending=false,fromCommitted=false

 Hmmm, looks like unit tests must be inadequate for the JSON format.
 I'll look into it.

 -Yonik
 http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
 25-26, San Francisco




Re: Field collapsing on multiple fields and/or ranges?

2011-05-18 Thread arian487
bump

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958029.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field collapsing on multiple fields and/or ranges?

2011-05-18 Thread Michael McCandless
As far as I know this is not possible today with either Solr's 4.0
grouping impl or the new grouping module (soon to be grouping in Solr
3.x).

I'm not sure about the patch on SOLR-236 though.

But it's an interesting use case; it's a compound group key, right?
You want to group by a tuple (X, Y).  Can you open a Lucene issue for
this?  I'm not sure we can fix it today but I think the use case is
reasonable so we can at least discuss it on an issue...

Mike

http://blog.mikemccandless.com

On Wed, May 18, 2011 at 2:23 PM, arian487 akarb...@tagged.com wrote:
 bump

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958029.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Field collapsing on multiple fields and/or ranges?

2011-05-18 Thread arian487
Thanks for the reply!  How exactly do I open an issue?  

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958277.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Disable IDF scoring on certain fields

2011-05-18 Thread Brian Lamb
I believe I have applied the patch correctly. However, I cannot seem to
figure out where the similarity class I create should reside. Any tips on
that?

Thanks,

Brian Lamb
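
For reference, the override being discussed can be as small as the sketch
below (written against the Lucene 3.x API; the package/class name is made
up). The compiled class goes in a jar under the core's lib/ directory, or
in a directory pointed at by a <lib> directive in solrconfig.xml:

import org.apache.lucene.search.DefaultSimilarity;

public class NoIdfSimilarity extends DefaultSimilarity {
    // Constant IDF: term rarity no longer influences the score
    @Override
    public float idf(int docFreq, int numDocs) {
        return 1.0f;
    }
}

It is then wired up globally in schema.xml (per-field similarity is not
supported in 3.1, as noted below):

<similarity class="com.example.NoIdfSimilarity"/>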

On Tue, May 17, 2011 at 4:00 PM, Brian Lamb
brian.l...@journalexperts.comwrote:

 Thank you Robert for pointing this out. This is not being used for
 autocomplete. I already have another core set up for that :-)

 The idea is like I outlined above. I just want a multivalued field that
 treats every term in the field the same so that the only way documents
 separate themselves is by an unrelated boost and/or matching on multiple
 terms in that field.


 On Tue, May 17, 2011 at 3:55 PM, Markus Jelsma markus.jel...@openindex.io
  wrote:

 Well, if you're feeling experimental you can try trunk; as Robert points out,
 it has been fixed there. If not, i guess you're stuck with creating another core.

 Is this fieldType specifically used for auto-completion? If so, another core,
 preferably on another machine, is in my opinion the way to go. Auto-completion
 is tough in terms of performance.

 Thanks Robert for pointing to the Jira ticket.

 Cheers

  Hi Markus,
 
  I was just looking at overriding DefaultSimilarity so your email was
 well
  timed. The problem I have with it is as you mentioned, it does not seem
  possible to do it on a field by field basis. Has anyone had any luck
 with
  doing some of the similarity functions on a field by field basis? I have
  need to do more than one of them and from what I can find, it seems that
  only computeNorm accounts for the name of the field.
 
  Thanks,
 
  Brian Lamb
 
  On Tue, May 17, 2011 at 3:34 PM, Markus Jelsma
 
  markus.jel...@openindex.iowrote:
   Hi,
  
   Although you can configure per field TF (by omitTermFreqAndPositions)
 you
   can't
   do this for IDF. If you index is only used for this specific purpose
   (seems like an auto-complete index) then you can override
   DefaultSimilarity and return a static value for IDF. If you still want
   IDF for other fields then i
   think you have a problem because Solr doesn't yet support per-field
   similarity.
  
  
  
 http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/jav
   a/org/apache/lucene/search/DefaultSimilarity.java?view=markup
  
   Cheers,
  
Hi all,
   
I have a field defined in my schema.xml file as
   
 <fieldType name="edgengram" class="solr.TextField"
     positionIncrementGap="1000">
   <analyzer>
     <tokenizer class="solr.LowerCaseTokenizerFactory" />
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
         maxGramSize="25" side="front" />
   </analyzer>
 </fieldType>
 <field name="myfield" multiValued="true" type="edgengram"
     indexed="true" stored="true" required="false" omitNorms="true" />
   
 I would like to disable IDF scoring on this field. I am not interested
 in how rare the term is; I only care if the term is present or not.
The idea is that if a user does a search for myfield:dog OR
myfield:pony, that any document containing dog or pony would be
scored identically. In the case that both showed up, that record
 would
be moved to the top but all the records where they both showed up
would have the same score.
   
So long story short, how can I disable the idf score for this
particular field?
   
Thanks,
   
Brian Lamb





Re: Field collapsing on multiple fields and/or ranges?

2011-05-18 Thread Michael McCandless
Start here: https://issues.apache.org/jira/browse/LUCENE

Create an account (it's free), open an issue and set the component to
modules/grouping, fill in the fields, and submit it :)

Then maybe make a patch and attach it!  Genericizing the per-doc
grouping key is important; we have an issue open for this already:
https://issues.apache.org/jira/browse/LUCENE-3099

So in theory if we had LUCENE-3099 done, a sub-class could create a
compound group key.

Mike

http://blog.mikemccandless.com

On Wed, May 18, 2011 at 3:34 PM, arian487 akarb...@tagged.com wrote:
 Thanks for the reply!  How exactly do I open an issue?

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958277.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Field collapsing on multiple fields and/or ranges?

2011-05-18 Thread arian487
https://issues.apache.org/jira/browse/SOLR-2526

modules/grouping was not a valid component so I just put it in search. 
Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958408.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field collapsing on multiple fields and/or ranges?

2011-05-18 Thread Michael McCandless
Ahh, that's because you opened a Solr not a Lucene issue ;)

The modules (incl. new grouping module) are under Lucene.  That's
fine, we can leave it as a Solr issue.

Mike

http://blog.mikemccandless.com

On Wed, May 18, 2011 at 4:10 PM, arian487 akarb...@tagged.com wrote:
 https://issues.apache.org/jira/browse/SOLR-2526

 modules/grouping was not a valid component so I just put it in search.
 Thanks!

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958408.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Storing, indexing and searching XML documents in Solr

2011-05-18 Thread Judioo
Hi,
I'm new to solr so apologies if the solution is already documented.
I have installed and populated a solr index using the examples as a template
with a version of the data below.

I have XML in the form of

<entity>
  <resource>
    <guid>123898-2092099098982</guid>
    <media_format>Blu-Ray</media_format>
    <updated>2011-05-05T11:25:35+0500</updated>
  </resource>
  <price currency="usd">3.99</price>
  <discounts>
    <discount type="percentage" rate="30"
        start="2011-05-03T00:00:00" end="2011-05-10T00:00:00" />
    <discount type="decimal" amount="1.99" coupon="1" />
    ...
  </discounts>
  <aspect_ratio>16:9</aspect_ratio>
  <duration>1620</duration>
  <categories>
    <category id="drama" />
    <category id="horror" />
  </categories>
  <rating>
    <rate id="D1">contains some scenes which some viewers may find
        upsetting</rate>
  </rating>
  ...
  <media_type>Video</media_type>
</entity>


Can I populate solr directly with this document (like I believe marklogic
does)?
If yes:
Can I search on any attribute (i.e. find all records where
/entity/resource/media_format equals blu-ray)?

If no:
What is the best practice for importing the attributes above into solr (i.e.
patterns for subdividing / flattening the document)?
Does solr support attached documents, and if so is this advised (how does it
affect performance)?

Any help is greatly appreciated. Pointers to documentation that address my
issues are even more helpful.

Thanks again


OJ


Re: Replication Clarification Please

2011-05-18 Thread Ravi Solr
Alexander, sorry for the delay in replying. I wanted to test out a few
hunches that I had before getting back to you.
Hurray!!!  I was able to resolve the issue. The problem was with the
cache settings in solrconfig.xml. It was taking almost 15-20
minutes to warm up the caches on each commit; since we are commit-heavy
(every 5 minutes), the replication was always waiting for the new searcher
to finish warming and never got a chance to complete, so it was
perennially backed up. We reduced the cache and autowarm counts and
now the replication happily finishes within 20 seconds!! Thank you
again for all your support.
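
For anyone hitting the same wall, the change was along these lines in
solrconfig.xml -- a sketch only, with illustrative sizes rather than the
exact production numbers:

<filterCache class="solr.FastLRUCache"
    size="2048" initialSize="512" autowarmCount="128"/>
<queryResultCache class="solr.FastLRUCache"
    size="2048" initialSize="512" autowarmCount="64"/>

The key point is keeping autowarmCount small enough that warming completes
well within the commit interval.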

Thanks,

Ravi Kiran Bhaskar
The Washington Post
1150 15th St. NW
Washington, DC 20071

On Sun, May 15, 2011 at 3:12 AM, Alexander Kanarsky
alexan...@trulia.com wrote:
 Ravi,

 what is the replication configuration on both master and slave?
 Also could you list of files in the index folder on master and slave
 before and after the replication?

 -Alexander


 On Fri, 2011-05-13 at 18:34 -0400, Ravi Solr wrote:
 Sorry guys, spoke too soon I guess. The replication still remains very
 slow even after upgrading to 3.1 and turning compression off. Now
 I am totally clueless. I have tried everything that I know of to
 increase the speed of replication but failed. If anybody has faced the
 same issue, can you please tell me how you solved it?

 Ravi Kiran Bhaskar

 On Thu, May 12, 2011 at 6:42 PM, Ravi Solr ravis...@gmail.com wrote:
  Thank you Mr. Bell and Mr. Kanarsky, as per your advise we have moved
  from 1.4.1 to 3.1 and have made several changes to configuration. The
  configuration changes have worked nicely till now and the replication
  is finishing within the interval and not backing up. The changes we
  made are as follows
 
  1. Increased the mergeFactor from 10 to 15
  2. Increased ramBufferSizeMB to 1024
  3. Changed lockType to single (previously it was simple)
  4. Set maxCommitsToKeep to 1 in the deletionPolicy
  5. Set maxPendingDeletes to 0
  6. Changed caches from LRUCache to FastLRUCache as we had hit ratios
  well over 75% to increase warming speed
  7. Increased the poll interval to 6 minutes and re-indexed all content.
 
  Thanks,
 
  Ravi Kiran Bhaskar
 
  On Wed, May 11, 2011 at 6:00 PM, Alexander Kanarsky
  alexan...@trulia.com wrote:
  Ravi,
 
  if you have what looks like a full replication each time even if the
  master generation is greater than slave, try to watch for the index on
  both master and slave the same time to see what files are getting
  replicated. You probably may need to adjust your merge factor, as Bill
  mentioned.
 
  -Alexander
 
 
 
  On Tue, 2011-05-10 at 12:45 -0400, Ravi Solr wrote:
  Hello Mr. Kanarsky,
                  Thank you very much for the detailed explanation,
  probably the best explanation I found regarding replication. Just to
  be sure, I wanted to test solr 3.1 to see if it alleviates the
  problems...I dont think it helped. The master index version and
  generation are greater than the slave, still the slave replicates the
  entire index form master (see replication admin screen output below).
  Any idea why it would get the whole index everytime even in 3.1 or am
  I misinterpreting the output ? However I must admit that 3.1 finished
  the replication unlike 1.4.1 which would hang and be backed up for
  ever.
 
  Master        http://masterurl:post/solr-admin/searchcore/replication
        Latest Index Version:null, Generation: null
        Replicatable Index Version:1296217097572, Generation: 12726
 
  Poll Interval         00:03:00
 
  Local Index   Index Version: 1296217097569, Generation: 12725
 
        Location: /data/solr/core/search-data/index
        Size: 944.32 MB
        Times Replicated Since Startup: 148
        Previous Replication Done At: Tue May 10 12:32:42 EDT 2011
        Config Files Replicated At: null
        Config Files Replicated: null
        Times Config Files Replicated Since Startup: null
        Next Replication Cycle At: Tue May 10 12:35:41 EDT 2011
 
  Current Replication Status    Start Time: Tue May 10 12:32:41 EDT 2011
        Files Downloaded: 18 / 108
        Downloaded: 317.48 KB / 436.24 MB [0.0%]
        Downloading File: _ayu.nrm, Downloaded: 4 bytes / 4 bytes [100.0%]
        Time Elapsed: 17s, Estimated Time Remaining: 23902s, Speed: 18.67 
  KB/s
 
 
  Thanks,
  Ravi Kiran Bhaskar
 
  On Tue, May 10, 2011 at 4:10 AM, Alexander Kanarsky
  alexan...@trulia.com wrote:
   Ravi,
  
   as far as I remember, this is how the replication logic works (see
   SnapPuller class, fetchLatestIndex method):
  
   1. Does the Slave get the whole index every time during replication or
   just the delta since the last replication happened ?
  
  
    It looks at the index version AND the index generation. If both slave's
   version and generation are the same as on master, nothing gets
   replicated. if the master's generation is greater than on slave, the
   slave fetches the delta files only (even if the 

Re: Storing, indexing and searching XML documents in Solr

2011-05-18 Thread Yury Kats
On 5/18/2011 4:19 PM, Judioo wrote:

 Any help is greatly appreciated. Pointers to documentation that address my
 issues is even more helpful.

I think this would be a good start:
http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource


Re: Storing, indexing and searching XML documents in Solr

2011-05-18 Thread Judioo
The data is being imported directly from mysql. The document is however
indeed a good starting place.
Thanks

2011/5/18 Yury Kats yuryk...@yahoo.com

 On 5/18/2011 4:19 PM, Judioo wrote:

  Any help is greatly appreciated. Pointers to documentation that address
 my
  issues is even more helpful.

 I think this would be a good start:

 http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource



Re: Storing, indexing and searching XML documents in Solr

2011-05-18 Thread Judioo
Great document. I can see how to import the data directly from the database.
However it seems as though I need to write XPaths in the config to extract
the fields that I wish to transform into a Solr document.

So it seems that there is no way of storing the document structure in Solr
as is?


2011/5/18 Yury Kats yuryk...@yahoo.com

 On 5/18/2011 4:19 PM, Judioo wrote:

  Any help is greatly appreciated. Pointers to documentation that address
 my
  issues is even more helpful.

 I think this would be a good start:

 http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource



Re: [POLL] How do you (like to) do logging with Solr

2011-05-18 Thread Jan Høydahl
Hi,

If you've setup your Tomcat with log4j logging, and want to add Solr, within 
the same logging config, you need to:
 #1. Remove slf4j-jdk14-1.6.1.jar from solr.war (unpack, remove, repack)
 #2. Download slf4j-log4j12-1.6.1.jar (from slf4j.org) and place it in e.g. 
tomcat/shared/lib
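
For #1, the repack can be done with the jar tool, roughly like this (paths
are illustrative):

 mkdir war-tmp && cd war-tmp
 jar xf ../solr.war
 rm WEB-INF/lib/slf4j-jdk14-1.6.1.jar
 jar cf ../solr.war .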

If solr.war shipped without a pre-packaged binding, you could skip #1. The 
binding jar you deploy to appserver lib would also take effect for any other 
webapp using slf4j deployed to the same app-server.

An alternative to manually repackaging solr.war as in #1 is Hoss' suggestion in 
SOLR-2487 of a new ANT option to build Solr artifacts without the JUL binding.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 18. mai 2011, at 18.33, Shawn Heisey wrote:

 On 5/17/2011 10:00 AM, Chris Hostetter wrote:
 : If I understand what you've said above correctly, removing the binding in
 : solr.war would make it inherit the binding in jetty/tomcat/whatever, is 
 that
 : right?  That sounds like an awesome plan to me.  The example jetty server 
 can
 : be configured instead of solr.war.  Once you've answered this, I can 
 submit my
 : vote.
 
 no, removing the bindings in solr.war would result in solr not logging
 *anything* unless you manually added a jar (defining the bindings you
 want) to the jetty (or tomcat) system classloader.
 
 What I'd want is the ability to download the Solr source code, not modify 
 anything, create a .war, and drop it into an existing system that has my 
 preferred logging already set up. From what you are saying, that would also 
 require that the example ship a jar with the JDK bindings, and that everyone 
 who sets up a more custom system create their own jar and put it somewhere it 
 can be found.
 
 What's involved in creating that jar?  Is it something that a novice could 
 get done?  Is it something that could be prepackaged for the most common 
 choices, or possibly already available on the Internet?
 
 Thanks,
 Shawn
 



Using Boost fields for a sum total score.

2011-05-18 Thread ronveenstra
I have a sizable index with a main content field, and 5 defined boost
fields (boost_low, boost_med, boost_high, boost_max, and boost_neg).  The
idea and hope was to allow searches on the content field to be
influenced/boosted by the boosting fields if the search term was present.  

I had set up a dismax query with a 'qf' setting that boosted the content
field significantly, and the 5 boost fields with descending values (e.g.
content^5.0 boost_max^1.2 boost_high^1.0 etc...).

After some testing and reading, I'm of the understanding that this setup
will search the fields (content and boost fields), apply the boost to each,
and then choose the field with the highest score as the score for that result
(essentially taking the MAX() of the various fields' scores rather than the
SUM()).  If this is the case, is there an alternate setup, config item, or
means of combining these scores to return a SUM() score instead?

Any direction or help would be most appreciated.
Ron 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-Boost-fields-for-a-sum-total-score-tp2958968p2958968.html
Sent from the Solr - User mailing list archive at Nabble.com.


Two XPathEntityProcessor questions

2011-05-18 Thread Jeff Crump
Hi,

Can anyone tell me if the XPathEntityProcessor handles expressions like this:

xpath="/a/b[c='value']/d/e"

That is, return a node that has a predecessor with a given text value?

I would like to map various XPath expressions of that form to the same
document in the index (I have a unique key constraint).

Also, is it possible to assign a value to a unique key from an HTTP
parameter?  Something like this:

<field column="id">${dataimporter.request.id}</field>

I'm using a ContentStreamDataSource to fetch data from a POST.

Thanks,

Jeff
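
For the second question, one approach that is often suggested -- a sketch
that assumes DIH's TemplateTransformer and its request-parameter variables,
so verify against your DIH version -- is to template the key from the
request:

<entity name="doc" processor="XPathEntityProcessor"
        transformer="TemplateTransformer" ...>
  <field column="id" template="${dataimporter.request.id}"/>
  ...
</entity>

and then pass the value on the request, e.g.
/dataimport?command=full-import&id=doc42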


Re: [POLL] How do you (like to) do logging with Solr

2011-05-18 Thread Shawn Heisey
I usually build solr using 'ant test dist' to run tests and build the 
.war and other jars, in particular the dataimporthandler.  Having an 
alternate ant option to build without the binding would work for me.  
If/when I get around to changing logging mechanisms, I wouldn't be able 
to use the binary distribution, but with 3.1 I am already including 
selected patches from branch_3x and building it myself.


I can see that there is a lot of resistance to just removing the binding 
entirely.  I think that's a better option, but I know it's important to 
take care of the complete novices and their initial experience with the 
software.


[ ]  I always use the JDK logging as bundled in solr.war, that's perfect
[ ]  I sometimes use log4j or another framework and am happy with 
re-packaging solr.war
[X]  Give me solr.war WITHOUT an slf4j logger binding, so I can choose 
at deploy time
[X]  Let me choose whether to bundle a binding or not at build time, 
using an ANT option

[ ]  What's wrong with the solr/example Jetty? I never run Solr elsewhere!
[ ]  What? Solr can do logging? How cool!


On 5/18/2011 3:31 PM, Jan Høydahl wrote:
 Hi,

 If you've setup your Tomcat with log4j logging, and want to add Solr, 
within the same logging config, you need to:

  #1. Remove slf4j-jdk14-1.6.1.jar from solr.war (unpack, remove, repack)
  #2. Download slf4j-log4j12-1.6.1.jar (from slf4j.org) and place it 
in e.g. tomcat/shared/lib


 If solr.war shipped without a pre-packaged binding, you could skip 
#1. The binding jar you deploy to appserver lib would also take effect 
for any other webapp using slf4j deployed to the same app-server.


 An alternative to manually repackaging solr.war as in #1 is Hoss' 
suggestion in SOLR-2487 of a new ANT option to build Solr artifacts 
without the JUL binding.





Re: [POLL] How do you (like to) do logging with Solr

2011-05-18 Thread Chris Hostetter

: An alternative to manually repackage solr.war as in #1, is Hoss' 
: suggestion in SOLR-2487 of a new ANT option to build Solr artifacts 
: without the JUL binding.

More specifically, i'm advocating a new ANT property that would let you 
specify (by path) whatever SLF4J binding jar you want to include, or 
that you don't want any SLF4J binding jar included (by specifying a path 
to a jar that doesn't exist)

I want the default...
ant dist

I don't want a binding in solr.war...
ant -Dslf4j.jar.path=BOGUS_FILE_PATH dist

I want a specific binding in solr.war...
ant -Dslf4j.jar.path=/my/lib/slf4j-jcl-*.jar dist


-Hoss


Re: Storing, indexing and searching XML documents in Solr

2011-05-18 Thread Erick Erickson
You're right, you can't store an XML document directly in Solr.
You have to pull it apart and index it such that you can get whatever
information back you need.

How you flatten data depends entirely upon your needs. The high-level
idea is that you want to create fields such that text searches work. The
moment you start thinking about 'how can I express a relationship
in the query', back up and try to flatten the data so you can just *search*.

This is vague, I know. But so much depends on how you want to use
the data that specifics are hard to give.

You've gotta take off your DB hat and not worry about duplicating
data. De-normalize lots and lots and lots first...
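
To make that concrete for the sample document earlier in this thread, a
DataImportHandler mapping might flatten it along these lines -- a sketch
only (the url is illustrative, and XPathEntityProcessor only supports a
limited XPath subset):

<dataConfig>
  <dataSource type="URLDataSource"/>
  <document>
    <entity name="item" processor="XPathEntityProcessor"
            url="http://example.com/entities.xml" forEach="/entity">
      <field column="guid"         xpath="/entity/resource/guid"/>
      <field column="media_format" xpath="/entity/resource/media_format"/>
      <field column="price"        xpath="/entity/price"/>
      <field column="category"     xpath="/entity/categories/category/@id"/>
      <field column="media_type"   xpath="/entity/media_type"/>
    </entity>
  </document>
</dataConfig>

Each /entity becomes one flat Solr document, with category as a multiValued
field.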

Best
Erick

On Wed, May 18, 2011 at 5:27 PM, Judioo cont...@judioo.com wrote:
 Great document. I can see how to import the data direct from the database.
 However it seems as though I need to write xpath's in the config to extract
 the fields that I wish to transform into an solr document.

 So it seems that there is no way of storing the document structure in solr
 as is?


 2011/5/18 Yury Kats yuryk...@yahoo.com

 On 5/18/2011 4:19 PM, Judioo wrote:

  Any help is greatly appreciated. Pointers to documentation that address
 my
  issues is even more helpful.

 I think this would be a good start:

 http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource




Re: Using Boost fields for a sum total score.

2011-05-18 Thread Erick Erickson
You might look at edismax on 3.1 and trunk; it calculates scores a
bit differently.

You could always just form the query yourself in the app and not use
dismax I think.
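
One more thing worth knowing here: dismax's 'tie' parameter controls exactly
this trade-off. The score for a term is max + tie * (sum of the scores from
the other matching fields), so tie=0.0 is a pure max and tie=1.0 effectively
sums across fields. For example (boosts illustrative):

defType=dismax&qf=content^5.0 boost_max^1.2 boost_high^1.0&tie=1.0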

Best
Erick

On Wed, May 18, 2011 at 6:06 PM, ronveenstra ron-s...@agathongroup.com wrote:
 I have a sizable index with a main content field, and 5 defined boost
 fields (boost_low, boost_med, boost_high, boost_max, and boost_neg).  The
 idea and hope was to allow searches on the content field to be
 influenced/boosted by the boosting fields if the search term was present.

 I had set up a dismax query with a qf' setting that boosted the content
 field significantly, and the 5 boost fields with descending values. (e.g.
 content^5.0 boost_max^1.2 boost_high^1.0 etc...)

 After some testing and reading, I'm of the understanding that this setup
 will result search the fields (content and boost fields), and apply the
 boost to each, then choose the field with the highest score as the score for
 that result (essentially taking the MAX() score from the various fields, and
 not the SUM() of the fields' scores.)  If this is the case, is there an
 alternate setup, config item, or means of combining these scores to return a
 SUM() score instead?

 Any direction or help would be most appreciated.
 Ron


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Using-Boost-fields-for-a-sum-total-score-tp2958968p2958968.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Fuzzy search and solr 4.0

2011-05-18 Thread Guilherme Aiolfi
Hi,

I want to do a fuzzy search that compare a phrase to a field in solr. For
example:

"abc company ltda" will be compared to "abc comp", "abc corporation", "def
company ltda", "nothing to match here".

The thing is, it always has to return documents sorted by score.

I've found some good algorithms to do that, like StrikeAMatch[1] and
JaroWinkler.

Using JaroWinkler with strdist() I can do exactly that. But I would rather
use StrikeAMatch, which had a patch in the Lucene JIRA that was never
committed.
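
For reference, the strdist route can look like this -- a sketch that assumes
a hypothetical un-analyzed string field name_exact holding the full phrase,
and sort-by-function support (Solr 3.1+):

q=*:*&sort=strdist("abc company ltda", name_exact, jw) desc

Here jw selects Jaro-Winkler; edit and ngram are the other built-in distance
measures.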

So I contacted the author of that patch, and he told me that I should use
Solr 4.0, which now has some pretty good new fuzzy search enhancements that
make StrikeAMatch seem like a toy.

Anyone know how I can achieve that using Solr 4.0?

[1] http://www.catalysoft.com/articles/StrikeAMatch.html


RE: Solr Range Facets

2011-05-18 Thread Chris Hostetter

: Thanks for explaining the point system, please find below the complete

Sorry .. that part was meant to be a joke, I think i was really tired when 
i wrote that.  The key take away: details matter.


: <int name="2011-05-02T05:30:00Z">4</int>
: <int name="2011-05-03T05:30:00Z">63</int>
: <int name="2011-05-04T05:30:00Z">0</int>
: <int name="2011-05-05T05:30:00Z">0</int>
...
: Now if you notice that the response shows 4 records for the 2nd of May 2011
: which will fall in the IST timezone (+330MINUTES), but when I try to get the

right.

: results I see that there is only 1 result for the 5th why is this happening.

Why do you say that?

According to those facet results, there are 0 docs between 
2011-05-05T05:30:00Z and 2011-05-05T05:30:00Z+1DAY (which is what i 
assume you mean by the 5th ... ie: May 5th, in that timezone offset)

Not only that, but the query you posted isn't attempting to filter on the 
5th by any possible definition of the concept...

: <str name="fq">createdOnGMTDate:[2011-05-01T00:00:00Z+330MINUTES TO *]</str>

...that's saying you want all docs with a date on or after the 1st.

: If I don't apply the offset the results match with the facet count, is there
: something wrong in my query?

it looks like your query is just plain wrong.  if your goal was to 
drill down and show only documents from the 5th it should have been 
something like...

fq = createdOnGMTDate:[2011-05-05T00:00:00Z+330MINUTES TO 
2011-05-05T00:00:00Z+330MINUTES+1DAY]

...but note also that there is the question of edge inclusion and when 
you want to use [A TO B] vs [A TO B}.  The facet.range.include option is 
how you control whether the edges are used in the facet counts...

http://wiki.apache.org/solr/SimpleFacetParameters#facet.date.include


-Hoss


Re: Field collapsing on multiple fields and/or ranges?

2011-05-18 Thread arian487
Ah, my mistake.  Thanks a lot, this would be a really cool feature :)

For now I'm resorting to making more than one query and cross-referencing
the two separate queries.
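
A workaround worth noting (a common trick, not something from this thread):
index a derived field that concatenates the values you want to group on, and
collapse on that single field. On the indexing side it can be as simple as
this sketch (SolrJ; the field names are made up):

import org.apache.solr.common.SolrInputDocument;

// xValue / yValue are the hypothetical values of the two fields to group on
SolrInputDocument doc = new SolrInputDocument();
doc.addField("x", xValue);
doc.addField("y", yValue);
// single-valued string field acting as the compound group key
doc.addField("group_key", xValue + "|" + yValue);

Range-based grouping would similarly need the range bucket baked into the
key at index time.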

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2959439.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: lucene parser, negative OR operands

2011-05-18 Thread Chris Hostetter

: Thanks Yonik. I recall hearing about this before, but was vague on the
: details, thanks for supplying some and refreshing my memory.

matching in Lucene is additive ... queries must match *something*, a 
clause of a boolean query can be the negation of a query, but that only 
defines how documents should be removed from the set matched by the other 
queries in that boolean.

To put it another way: imagine modeling the list of documents matching a 
query as a bitset.  you can set bits to true, and you can set bits to 
false, but the bitset starts out with all bits as false, so if all 
you do is set bits to false, your bitset will *end* with all bits as false 

: If I want to understand more about how the lucene query parser does it's
: thing, can anyone suggest the source files I should be looking at?

the QueryParser.jj is the grammar for parsing, but the crux is to 
understand that the BooleanQuery class supports three types of clauses: 
PROHIBITED, MANDATORY, and OPTIONAL.  The QueryParser implements those as 
-, + and the default behavior when neither +/- is present. The 
QueryParser also jumps through some hoops to support AND, OR, NOT but not 
all permutations of those are viable

: If I really do want actual boolean logic behavior, what are my options?  I
: guess one is trying to write my own query parser.

boolean logic generally is defined in some form relative to the universe 
.. so a pure negative query like -red really means all things IN THE 
UNIVERSE that are not 'red' ... you can express that using *:* -red

What solr does (and how this thread started) is pointing out that for top 
level queries, (like q=-red or fq=-red) solr adds the *:* to the 
boolean query for you.

: Hmm, for that particular query, what about using parens to force a sub-query?
: 
: (-one) OR (-two)
: 
: Ha, nope, that runs into a different problem (or is it the same problem?), and
: always returns 0 hits.  It looks like the lucene query parser can't handle a
: pure-negative sub-query like that seperate by OR?  Not sure why, can anyone
: explain that one?

the query parser can handle it, and it produces a valid query object, but 
that query object doesn't match anything. -one matches nothing, -two 
matches nothing ... nothing union nothing is still nothing.

: For that particular pattern, this crazy refactoring of the query does work and
: get the actual boolean logic result of (not 'one') OR (not 'two'):
: 
: (*:* AND -one) OR (*:* AND -two)

correct -- that is you formally saying give me all docs IN THE UNIVERSE 
that are not 'one', and union that with all docs IN THE UNIVERSE that are 
not 'two'

: behavior for that pattern, but in general, I'm kind of wanting a parser that
: will give actual boolean logic behavior. Maybe someday I can find time to
: write it in Java (not the quickest thing for me, not familiar with the code at
: all).

You could implement a parser like that relatively easily -- just make sure 
you put a MatchAllDocsQuery in every BooleanQuery object that you 
construct, and only ever use the PROHIBITED and MANDATORY clause types 
(never OPTIONAL) ...  the thing is, a parser like that isn't as useful 
as you think it might be when dealing with search results.  OPTIONAL 
clauses are where most of the useful factors of scoring documents come 
into play.
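
In Lucene 3.x API terms, the *:* anchoring trick looks something like this
minimal sketch (the field and term are placeholders):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TermQuery;

// Equivalent of "*:* -one": anchor the boolean to the universe of all
// docs, then carve documents out of it with MUST_NOT.
BooleanQuery bq = new BooleanQuery();
bq.add(new MatchAllDocsQuery(), Occur.MUST);
bq.add(new TermQuery(new Term("field", "one")), Occur.MUST_NOT);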

-Hoss


Too slow indexing while using 2 different data sources

2011-05-18 Thread deniz
Is it normal to observe slow indexing while using a URL datasource and also a
DB? It was something around 30 seconds with only the DB source, but when I add
the URL datasource too, it takes 24-25 minutes to index exactly the same
amount of docs.


Is there any way of overcoming this, or do I just have to suffer?

-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Too-slow-indexing-while-using-2-different-data-sources-tp2959551p2959551.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [POLL] How do you (like to) do logging with Solr

2011-05-18 Thread kun xiong
 [ ]  I always use the JDK logging as bundled in solr.war, that's perfect
 [X ]  I sometimes use log4j or another framework and am happy with
re-packaging solr.war
 [ ]  Give me solr.war WITHOUT an slf4j logger binding, so I can
choose at deploy time
 [x ]  Let me choose whether to bundle a binding or not at build time,
using an ANT option
 [ ]  What's wrong with the solr/example Jetty? I never run Solr
elsewhere!
 [ ]  What? Solr can do logging? How cool!

2011/5/19 Chris Hostetter hossman_luc...@fucit.org


 : An alternative to manually repackage solr.war as in #1, is Hoss'
 : suggestion in SOLR-2487 of a new ANT option to build Solr artifacts
 : without the JUL binding.

 More specificly, i'm advocating a new ANT property that would let you
 specify (by path) whatever SLF4J binding jar you want to include, or
 that you don't want any SLF4J binding jar included (by specifying a path
 to a jar that doesn't exist)

 I want the default...
ant dist

 I don't want a binding in solr.war...
ant -Dslf4j.jar.path=BOGUS_FILE_PATH dist

 I want a specific binding in solr.war...
ant -Dslf4j.jar.path=/my/lib/slf4j-jcl-*.jar dist


 -Hoss



Re: Too slow indexing while using 2 different data sources

2011-05-18 Thread deniz
Some details? Well, i think it's clear, but still, here is the relevant part
of my solrconfig:


<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dbconfig.xml</str>
    <lst name="datasource">
      <str name="name">database</str>
      <str name="type">JdbcDataSource</str>
      <str name="driver">com.mysql.jdbc.Driver</str>
      <str name="url">jdbc:mysql://abcd/efgh</str>
      <str name="user">some</str>
      <str name="password">some</str>
    </lst>
    <lst name="datasource">
      <str name="name">url_data</str>
      <str name="type">URLDataSource</str>
      <str name="processor">XPathEntityProcessor</str>
    </lst>
  </lst>
</requestHandler>


and my dbconfig
/* Fields from DB */
/* Fields from DB */
/* Fields from DB */
/* Fields from DB */
...
...
..
<entity name="universal" dataSource="url_data"
    url="http://..com/fddgtr.php/${sa.somevalue}"
    processor="XPathEntityProcessor"
    forEach="/some/somefield">
  <field column="info" xpath="/some/somefield/info"/>
</entity>

-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Too-slow-indexing-while-using-2-different-data-sources-tp2959551p2959626.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Too slow indexing while using 2 different data sources

2011-05-18 Thread Gora Mohanty
On Thu, May 19, 2011 at 6:59 AM, deniz denizdurmu...@gmail.com wrote:
 Is it normal to observe slow speed while using an URL datasource and also a
 DB? it was something around 30 seconds with only DB source, but when I add
 URL datasource too, then it takes 24 - 25 mins to index exactly the same
 amount of docs
[...]

What is the time for indexing just the URL data source? Is
it possible that your URL data source is slow in serving
data?

Regards,
Gora


Re: filter cache and negative filter query

2011-05-18 Thread Chris Hostetter

: What I don't like is that it systematically uses the positive version. 
: Sometimes the negative version will give many less results (for example, 
: in some cases I filter by documents not having a given field, and there 
: are very few of them). I think it would be much better that solr 

the positive version of the filter is the only one that can be executed, 
so it's the one that gets cached today, but the principle you are 
describing is still sound -- in fact I'm pretty sure there is a note in 
the code about this exact idea as a possible performance enhancement:

if the cardinality of a filter is very large (regardless of whether the 
query was positive or negative) its negation relative to the set of all 
docs could be cached in its place to save space...

...but...

...the complication comes later when doing lookups -- for cache 
lookups to work with an arbitrary query, you would either need to change 
the cache structure from Query=>DocSet to a mapping of 
Query=>[DocSet,inversionBit] and store the same cache value under 
two keys -- both the positive and the negative; or you keep the 
current cache structure, store whichever Query=>DocSet pair has the 
smallest cardinality, but then every logical cache lookup requires a 
second actual cache lookup under the covers (for the negation of the 
query) if the first one doesn't match anything.

it would require some benchmarking and hard decisions about whether the 
(hypothetical) memory savings are worth the (hypothetical) CPU cost.

: query that in fact returns the negative results. As a simple example, 
: I believe that, for a boolean field, -field:true is exactly the same as 
: +field:false, but the former is a negative query and the latter is a 

that's not strictly true in all cases... 

 * if the field is multiValued=true, a doc may contain both false and 
   true in the field, in which case it would match +field:false but it 
   would not match -field:true

 * if the field is multiValued=false and required=false, a doc
   may not contain any value at all, in which case it would match -field:true but 
   it would not match +field:false


-Hoss


Re: K-Stemmer for Solr 3.1

2011-05-18 Thread Otis Gospodnetic
I see KStem being mentioned lately.  It's been 5+ years since I looked at the 
original KStem stuff, but I recall there being a license issue with the 
*original* KStem.  I think it was under some flavour of GPL and that was the 
reason why we didn't include it in Lucene/Solr back then.  I say this now 
because I saw people say KStem was released under a BSD license, which doesn't 
match what I saw 5+ years ago.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Smiley, David W. dsmi...@mitre.org
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Mon, May 16, 2011 5:33:00 PM
 Subject: Re: K-Stemmer for Solr 3.1
 
 Lucid's KStemmer is LGPL and the Solr committers have shown that they don't
 want LGPL libraries shipping with Solr. If you are intent on releasing your
 changes, I suggest attaching both the modified source and the compiled jar
 onto Solr's k-stemmer wiki page; and of course say that it's LGPL licensed.

 ~ David Smiley
 
 On May 16, 2011, at 2:24 AM, Bernd Fehling wrote:

  I don't know if it is allowed to modify Lucid code and add it to jira.
  If someone from Lucid would give me the permission and the Solr developers
  have nothing against it I won't mind adding the Lucid KStemmer to jira
  for Solr 3.x and 4.x.

  There are several Lucid KStemmer users, which I can see from the many
  requests which I got. Also the Lucid KStemmer is faster than the standard
  KStemmer.

  Bernd

  Am 16.05.2011 06:33, schrieb Bill Bell:
  Did you upload the code to Jira?

  On 5/13/11 12:28 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote:

  I backported a Lucid KStemmer version from solr 4.0 which I found
  somewhere.
  Just changed from
  import org.apache.lucene.analysis.util.CharArraySet;   // solr4.0
  to
  import org.apache.lucene.analysis.CharArraySet;  // solr3.1

  Bernd

  Am 12.05.2011 16:32, schrieb Mark:
  java.lang.AbstractMethodError:
  org.apache.lucene.analysis.TokenStream.incrementToken()Z

  Would you mind explaining your modifications? Thanks

  On 5/11/11 11:14 PM, Bernd Fehling wrote:

  Am 12.05.2011 02:05, schrieb Mark:
  It appears that the older version of the Lucid Works KStemmer is
  incompatible with Solr 3.1. Has anyone been able to get this to work?
  If not, what are you using as an alternative?

  Thanks

  Lucid KStemmer works nice with Solr3.1 after some minor mods to
  KStemFilter.java and KStemFilterFactory.java.
  What problems do you have?

  Bernd

  --
  *
  Bernd Fehling                  Universitätsbibliothek Bielefeld
  Dipl.-Inform. (FH)             Universitätsstr. 25
  Tel. +49 521 106-4060          Fax. +49 521 106-4052
  bernd.fehl...@uni-bielefeld.de 33615 Bielefeld

  BASE - Bielefeld Academic Search Engine - www.base-search.net
  *



Re: K-Stemmer for Solr 3.1

2011-05-18 Thread Otis Gospodnetic
Hm, maybe I was wrong.  I don't see any mention of *GPL on the KStem download page.
 
I only see http://ciir.cs.umass.edu/downloads/agreements/general.html.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Otis Gospodnetic otis_gospodne...@yahoo.com
 To: solr-user@lucene.apache.org
 Sent: Wed, May 18, 2011 11:35:32 PM
 Subject: Re: K-Stemmer for Solr 3.1
 
 I see KStem being mentioned lately.  It's been 5+ years since I looked at the
 original KStem stuff, but I recall there being a license issue with the
 *original* KStem.  I think it was under some flavour of GPL and that was the
 reason why we didn't include it in Lucene/Solr back then.  I say this now
 because I saw people say KStem was released under a BSD license, which doesn't
 match what I saw 5+ years ago.
 
 Otis
 
 Sematext  :: http://sematext.com/ ::  Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/
 
 
 
  
 



Re: indexing directed graph

2011-05-18 Thread Otis Gospodnetic
Maybe Gora was referring to Siren: http://search-lucene.com/?q=siren+-sami

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: dani.b.angelov dani.b.ange...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tue, May 17, 2011 2:44:55 AM
 Subject: Re: indexing directed graph
 
 Gora, thank you for your reply!
 
 Could you point me to a link regarding "There was a discussion earlier on this
 topic"?
 
 --
 View this  message in context: 
http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2951418.html

 Sent  from the Solr - User mailing list archive at Nabble.com.
 


Re: Solr Range Facets

2011-05-18 Thread Rohit Gupta
Hi Chris,

I made a mistake in explaining the second part of my question. 

If you look at the faceted result, you will notice there are 4 results for the 
2nd of May 2011, but when I query for the 2nd of May I should get only 1 
result, since after applying the offset all the remaining results should be 
shifted to the 3rd of May.

But I think i got the reason for this: I guess the offset is applied only to 
the edges and not to the actual results. I mean, when we facet with an offset 
of +330MINUTES, what solr actually does is just move the facet boundaries by 
+330MINUTES, not each and every document.

Regards,
Rohit




From: Chris Hostetter hossman_luc...@fucit.org
To: solr-user@lucene.apache.org
Sent: Thu, 19 May, 2011 6:16:53 AM
Subject: RE: Solr Range Facets


: Thanks for explaining the point system, please find below the complete

Sorry ... that part was meant to be a joke; I think I was really tired when
I wrote that. The key takeaway: details matter.


: <int name="2011-05-02T05:30:00Z">4</int>
: <int name="2011-05-03T05:30:00Z">63</int>
: <int name="2011-05-04T05:30:00Z">0</int>
: <int name="2011-05-05T05:30:00Z">0</int>
...
: Now if you notice that the response shows 4 records for the 2nd of May 2011
: which will fall in the IST timezone (+330MINUTES), but when I try to get the

right.

: results I see that there is only 1 result for the 5th, why is this happening.

Why do you say that?

According to those facet results, there are 0 docs between 
2011-05-05T05:30:00Z and 2011-05-05T05:30:00Z+1DAY (which is what I
assume you mean by "the 5th", i.e. May 5th in that timezone offset).

Not only that, but the query you posted isn't attempting to filter on the 
5th by any possible definition of the concept...

: <str name="fq">createdOnGMTDate:[2011-05-01T00:00:00Z+330MINUTES TO *]</str>

...that's saying you want all docs with a date on or after the 1st.

: If I don't apply the offset the results match with the facet count, is there
: something wrong in my query?

it looks like your query is just plain wrong. If your goal was to
drill down and show only documents from the 5th, it should have been
something like...

fq = createdOnGMTDate:[2011-05-05T00:00:00Z+330MINUTES TO 
2011-05-05T00:00:00Z+330MINUTES+1DAY]

...but note also that there is the question of edge inclusion, and when
you want to use [A TO B] vs [A TO B}. The facet.range.include option is
how you control whether the edges are used in the facet counts...

http://wiki.apache.org/solr/SimpleFacetParameters#facet.date.include
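
For reference, a complete drill-down-plus-facet request along those lines
might look like the following. This is only a sketch, reusing the field name
and offset from this thread; note that the + in date math has to be
URL-escaped as %2B when the parameters are sent in a raw URL:

  q=*:*
  fq=createdOnGMTDate:[2011-05-05T00:00:00Z+330MINUTES TO 2011-05-05T00:00:00Z+330MINUTES+1DAY]
  facet=true
  facet.date=createdOnGMTDate
  facet.date.start=2011-05-01T00:00:00Z+330MINUTES
  facet.date.end=2011-05-08T00:00:00Z+330MINUTES
  facet.date.gap=+1DAY
  facet.date.include=lower

With facet.date.include=lower each bucket counts its lower edge and excludes
its upper edge, so every document lands in exactly one day bucket; the
matching drill-down fq would then use [A TO B} instead of the fully inclusive
form shown above.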


-Hoss


Re: indexing directed graph

2011-05-18 Thread Gora Mohanty
On Thu, May 19, 2011 at 9:12 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Maybe Gora was referring to Siren: http://search-lucene.com/?q=siren+-sami
[...]

That does look interesting, but is not what I was referring to.

I seem to remember a discussion on this list some
3-4 months ago about someone wanting to make a
customised Lucene index, specifically for graphs.
I believe that he even wrote up a Wiki (?) page
on it.

Sorry, Dani, I have been busy, and so far my
Google-fu has been unable to turn up the thread,
or the Wiki page. Will let you know if I come
across it.

Regards,
Gora


Re: K-Stemmer for Solr 3.1

2011-05-18 Thread Bernd Fehling

Hi Otis,

In conclusion: if we check that the license agreement is included in all
source files and as a separate license file, then we are clear about
KStem itself.
What about the modifications from Lucid? Do you know if they publish under GPL?

Bernd
-
BASE - Bielefeld Academic Search Engine
http://www.base-search.net/


Am 19.05.2011 05:39, schrieb Otis Gospodnetic:

Hm, maybe I was wrong. I don't see any mention of *GPL on the KStem download page.
I only see http://ciir.cs.umass.edu/downloads/agreements/general.html.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 

From: Otis Gospodnetic otis_gospodne...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Wed, May 18, 2011 11:35:32 PM
Subject: Re: K-Stemmer for Solr 3.1

I see KStem being mentioned lately. It's been 5+ years since I looked at the
original KStem stuff, but I recall there being a license issue with the
*original* KStem. I think it was under some flavour of GPL, and that was the
reason why we didn't include it in Lucene/Solr back then. I say this now
because I saw people say KStem was released under a BSD license, which doesn't
match what I saw 5+ years ago.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
From: Smiley, David W. dsmi...@mitre.org
To: solr-user@lucene.apache.org
Sent: Mon, May 16, 2011 5:33:00 PM
Subject: Re: K-Stemmer for Solr 3.1

Lucid's KStemmer is LGPL and the Solr committers have shown that they don't
want LGPL libraries shipping with Solr. If you are intent on releasing your
changes, I suggest attaching both the modified source and the compiled jar
onto Solr's k-stemmer wiki page; and of course say that it's LGPL licensed.

~ David Smiley

On May 16, 2011, at 2:24 AM, Bernd Fehling wrote:

I don't know if it is allowed to modify Lucid code and add it to jira.
If someone from Lucid would give me the permission and the Solr developers
have nothing against it I won't mind adding the Lucid KStemmer to jira
for Solr 3.x and 4.x.

There are several Lucid KStemmer users which I can see from the many
requests which I got. Also the Lucid KStemmer is faster than the standard
KStemmer.

Bernd

Am 16.05.2011 06:33, schrieb Bill Bell:

Did you upload the code to Jira?

On 5/13/11 12:28 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote:

I backported a Lucid KStemmer version from solr 4.0 which I found
somewhere.
Just changed from
import org.apache.lucene.analysis.util.CharArraySet;  // solr4.0
to
import org.apache.lucene.analysis.CharArraySet;  // solr3.1

Bernd

Am 12.05.2011 16:32, schrieb Mark:

java.lang.AbstractMethodError:
org.apache.lucene.analysis.TokenStream.incrementToken()Z

Would you mind explaining your modifications? Thanks

On 5/11/11 11:14 PM, Bernd Fehling wrote:

Am 12.05.2011 02:05, schrieb Mark:

It appears that the older version of the Lucid Works KStemmer is
incompatible with Solr 3.1. Has anyone been able to get this to work?
If not, what are you using as an alternative?

Thanks

Lucid KStemmer works nice with Solr 3.1 after some minor mods to
KStemFilter.java and KStemFilterFactory.java.
What problems do you have?

Bernd

--
*
Bernd Fehling                   Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)              Universitätsstr. 25
Tel. +49 521 106-4060           Fax. +49 521 106-4052
bernd.fehl...@uni-bielefeld.de  33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*







--
*
Bernd Fehling                   Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)              Universitätsstr. 25
Tel. +49 521 106-4060           Fax. +49 521 106-4052
bernd.fehl...@uni-bielefeld.de  33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*
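
For anyone wiring a backported stemmer into a Solr 3.1 schema, the usual
filter-factory declaration applies. A minimal sketch, using stock Solr
tokenizer and filter factories; the package of the KStemFilterFactory is a
hypothetical placeholder for wherever the backported class actually lives:

  <fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- hypothetical package for the backported factory -->
      <filter class="com.example.analysis.KStemFilterFactory"/>
    </analyzer>
  </fieldType>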