Re: Boosting non synonyms result
I do the same but do not use the Dismax query, which is far too inflexible. In CurrikiSolr, I have my own QueryComponent which does all sorts of query expansion:
- it expands a simple term query to a query for the text in the stemmed variant and in the unstemmed variant with a higher boost
- it pre-parses to make sure that phrase queries remain phrase queries and thus become unstemmed queries
- it converts prefix queries to queries against the unstemmed field only
- it uses parameters (used in the advanced search) to add queries (e.g. only resources with a given topic)
- it applies some rights protections
- it would be the place to expand across multiple languages if indexing each language in a separate field, as I would do it
- it applies some application-specific quality boosting (higher-ranked resources rank higher)

I find that such a component is something of a best practice, because it lets the server apply business logic (independently of hackers in the client) and gives me Java to perform deep query processing instead of fragile JavaScript string processing. I guess I could find a way to extend DisMax intelligently, but I have not found it.

paul

On 18 May 2011, at 00:52, Jonathan Rochkind wrote:

I do it with two fields exactly as you say, but then use dismax to boost the non-synonymed field higher than the synonymed field. That is a lot easier than trying to use a function query, which I'm not sure how to do either.

On 5/17/2011 6:45 PM, Dmitriy Shvadskiy wrote:

Hello,
Is there a way to boost a result that is an exact match, as opposed to a synonym match, when using query-time synonyms? Given the query "John Smith" and the synonyms Jonathan,Jonathan,John,Jon,Nat,Nathan, I'd like results containing "John Smith" to be ranked higher than "Jonathan Smith". My thinking was to do it by defining two fields, one with query-time synonyms and one without, and sort by a function query on the non-synonym field. Is that even possible? I can't quite figure out the syntax for this. I'm using Solr 3.1.
Thanks, Dmitriy
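For reference, Jonathan's two-field dismax approach can be sketched with request parameters like these (the field names text_nosyn and text_syn are hypothetical; text_syn would include the query-time SynonymFilterFactory in its analyzer and text_nosyn would not):

```
q=John Smith
defType=dismax
qf=text_nosyn^4 text_syn
```

With both fields populated from the same source, an exact match scores in both fields while a synonym-only match scores just in text_syn, so exact matches rank higher.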
solr/home property setting
Hi all, I am wondering how I could set a path for the solr/home property. Our Solr home is inside solr.war, so I don't want an absolute path (we will deploy to different boxes). Currently I hard-code a relative path as the solr/home property in web.xml:

<!-- People who want to hardcode their Solr Home directly into the WAR file can set the JNDI property here... -->
<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>../webapps/solr/home</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

But this way I have to start Tomcat from under bin/; it seems the root path here is the start path. How can I set the solr/home property so that it does not depend on the Tomcat start directory?

Thanks
Kun
Updating a multi-valued field
I've been using ExternalFileField for external scoring so far, so that the external field gets updated rather than deleted and re-added. Now I have a field which is multivalued. I cannot use ExternalFileField because I need this field in the suggest component too. Is there anything other than ExternalFileField that will help me do this? Thanks!
Re: solr/home property setting
Why have you put your Solr home inside Tomcat's webapps directory? That is not the correct way. Put your Solr home somewhere outside the servlet container and set your solr/home path accordingly:

<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>/opt/solr/home</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

- Thanx: Grijesh www.gettinhahead.co.in
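If you control how Tomcat is started, an alternative to the JNDI entry is the solr.solr.home system property, set in the JVM options, for example from a setenv.sh-style startup script (the path below is an example):

```shell
# Append the Solr home system property to Tomcat's JVM options,
# e.g. from $CATALINA_HOME/bin/setenv.sh (path is an example).
JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/opt/solr/home"
```

Either mechanism avoids any dependency on the directory Tomcat happens to be started from.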
questions about request logging
Dear list, the poll about Solr logging directed my interest to my log files. Right out of the box the Jetty request logs have all the information needed for GET requests, but only the path of POST requests. Is it possible to have POST requests logged the same way GET requests are logged? If not, maybe with a different logger? The console is redirected to a file and both GET and POST requests are logged there, but they are mixed with all kinds of other log messages, so the request logs are not usable with webalizer or other log analyzers. Is it somehow possible to get a useful log file from the console output? Regards Bernd
Re: filter cache and negative filter query
Mmm... I had wondered whether Solr reused filters this way (not keeping both the positive and negative versions) and I'm glad to see it does indeed reuse them. What I don't like is that it systematically uses the positive version. Sometimes the negative version will give many fewer results (for example, in some cases I filter on documents not having a given field, and there are very few of them). I think it would be much better if Solr performed exactly the query requested and, only if more than 50% of documents match the query, stored the negated one instead. I think (without knowing much about how things are implemented) this shouldn't be a problem. Is there any place where one can post a suggestion for improvement? :)

Anyway, it would be very useful to know exactly how current versions work (I think the info in the message I'm answering is about version 1.1 and could have changed), because knowing it, one can sometimes manage to write, for example, a positive query that in fact returns the negative results. As a simple example, I believe that for a boolean field -field:true is exactly the same as +field:false, but the former is a negative query and the latter is a positive one. So knowing Solr's exact behaviour can help you write optimized filters when you know that one version will give many fewer hits than the other.

On 18/05/2011, at 00:26, Yonik Seeley wrote:

On Tue, May 17, 2011 at 6:17 PM, Markus Jelsma markus.jel...@openindex.io wrote:
I'm not sure. The filter cache uses your filter as a key, and a negation is a different key. You can check this easily in a controlled environment by issuing these queries and watching the filter cache statistics.

Gotta hate crossing emails ;-) Anyway, this goes back to Solr 1.1:

5. SOLR-80: Negative queries are now allowed everywhere. Negative queries are generated and cached as their positive counterpart, speeding generation and generally resulting in smaller sets to cache. Set intersections in SolrIndexSearcher are more efficient, starting with the smallest positive set, subtracting all negative sets, then intersecting with all other positive sets. (yonik)

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco

If I have a query with a filter query such as q=art&fq=history, and then run a second query q=art&fq=-history, will Solr realize that it can use the cached results of the previous filter query "history" (in the filter cache), or will it not realize this and have to actually run a second filter query against the index for "not history"?

Tom
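As a toy illustration of the SOLR-80 behavior quoted above (a sketch, not Solr's actual code): the cache stores only the positive form of a filter, and a negative filter is answered by complementing the cached set against all documents, so both forms share one cache entry:

```python
# Toy model of a filter cache that stores only the positive form
# of a filter and derives the negative form by set complement.
all_docs = set(range(10))   # pretend index with 10 docs
cache = {}                  # positive filter string -> matching doc set

def run_filter(name):
    """Pretend to execute a filter query; 'history' matches 3 docs."""
    return {1, 4, 7} if name == "history" else set()

def get_filter(fq):
    negative = fq.startswith("-")
    key = fq[1:] if negative else fq     # always cache the positive form
    if key not in cache:
        cache[key] = run_filter(key)
    docs = cache[key]
    return (all_docs - docs) if negative else docs

positive = get_filter("history")    # fills the cache
negative = get_filter("-history")   # reuses the same cache entry
print(sorted(positive))   # [1, 4, 7]
print(sorted(negative))   # [0, 2, 3, 5, 6, 8, 9]
print(len(cache))         # 1 -> one cached entry serves both forms
```

This is why, in the question above, fq=-history can be answered from the cached "history" entry.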
Re: How to test Solr Integartion - how to get EmbeddedSolrServer?
Thinking more about it, I can solve my immediate problem by just copy-pasting the classes I need into my own project packages (KISS, like here: https://github.com/Filirom1/solr-test-exemple ). I'd however suggest refactoring the Solr code structure to be much more defaults-compliant, making it easier for external developers to understand, and hopefully easier to maintain for committers (with fewer special-needs configurations). I've done some of those refactorings on my local copy of Solr and would be glad to contribute. For this particular problem the KISS solution would be to create yet one more module for tests, which depends on Solr Core and on the Test Framework. The organizational burden of that extra module is, I believe, outweighed by the ease of build configuration it brings.

On Tue, May 17, 2011 at 7:11 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:
http://stackoverflow.com/questions/6034513/can-i-avoid-a-dependency-cycle-with-one-edge-being-a-test-dependency

On Tue, May 17, 2011 at 6:49 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

On Tue, May 17, 2011 at 3:52 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:

On Tue, May 17, 2011 at 3:44 PM, Steven A Rowe sar...@syr.edu wrote:
Hi Gabriele,

On 5/17/2011 at 9:34 AM, Gabriele Kahlout wrote:
Solr Core should declare a test dependency on Solr Test Framework.

I agree:
- Solr Core should have a test-scope dependency on Solr Test Framework.
- Solr Test Framework should have a compile-scope dependency on Solr Core.
But Maven views this as a circular dependency.

I've seen that, but adding it with <scope>test</scope> works. The logic: the src is compiled first and then re-used (I'm assuming Maven does something smart about not including the full jar).

Not quite. I've tried a demo and the reactor complains. I'll try to see if Maven could become 'smarter', or if the 2-build-phase solution will work.

The projects in the reactor contain a cyclic reference: Edge between 'Vertex{label='com.mysimpatico:TestFramework:1.0-SNAPSHOT'}' and 'Vertex{label='org.apache:DummyCore:1.0-SNAPSHOT'}' introduces to cycle in the graph org.apache:DummyCore:1.0-SNAPSHOT -- com.mysimpatico:TestFramework:1.0-SNAPSHOT -- org.apache:DummyCore:1.0-SNAPSHOT - [Help 1]

The workaround: Solr Core includes the source of Solr Test Framework as part of its test source code. It's not pretty, but it works. I'd be happy to entertain other (functional) approaches.

In dp4j.com's pom.xml I build in 2 phases to compile with the same annotations in the project itself (but I don't think we need that here).

Steve

--
Regards,
K. Gabriele
--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
How to list/see all the indexed terms of a particular field in a document?
Hi, I'm using Apache Solr v3.1. How do I list/see all the indexed terms of a particular field in a document (by passing the unique key ID of the document)? For example, I have the following field definitions in schema.xml:

<field name="mydocumentid" type="string" indexed="true" stored="true" required="true" />
<field name="mytextcontent" type="text" indexed="true" stored="true" required="true" />

In this case, I want to list/see all the indexed terms of a particular document (mydocumentid:x) for the document field mytextcontent.

Regards, Gnanam
Re: How to list/see all the indexed terms of a particular field in a document?
ant luke?

On Wed, May 18, 2011 at 11:47 AM, Gnanakumar gna...@zoniac.com wrote:
[quoted text snipped]
Using solandra
I've recently switched from solr+cassandra to Solandra. When I try to run Solandra using java -jar start.jar in solandra-app, it gives me the following error:

java.lang.ExceptionInInitializerError
 at lucandra.CassandraUtils.startupServer(CassandraUtils.java:249)
 at solandra.SolandraInitializer.initialize(SolandraInitializer.java:45)
 at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
 at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
 at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
 at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
 at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
 at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
 at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
 at org.mortbay.jetty.Server.doStart(Server.java:224)
 at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
 at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:616)
 at org.mortbay.start.Main.invokeMain(Main.java:183)
 at org.mortbay.start.Main.start(Main.java:497)
 at org.mortbay.start.Main.main(Main.java:115)
Caused by: java.lang.RuntimeException: Couldn't figure out log4j configuration.
 at org.apache.cassandra.service.AbstractCassandraDaemon.<clinit>(AbstractCassandraDaemon.java:75)

How exactly do I configure the log4j configuration?

Karan
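For what it's worth, Cassandra daemons of that era locate their log4j configuration via the log4j.configuration system property. A minimal sketch (the file name, location, and appender choice are assumptions; check whether your Solandra distribution already ships a log4j properties file):

```
# log4j.properties (minimal example)
log4j.rootLogger=INFO,stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p %d{HH:mm:ss,SSS} %m%n
```

started with something like: java -Dlog4j.configuration=file:///path/to/log4j.properties -jar start.jar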
Re: Using solandra
Karan, following the Readme (https://github.com/tjake/Solandra#readme) it's:

From the Solandra base directory:
$ mkdir /tmp/cassandra-data
$ ant
$ cd solandra-app
$ ./start-solandra.sh

Regards
Stefan

On Wed, May 18, 2011 at 12:40 PM, karanveer singh karan.korn...@gmail.com wrote:
I've recently switched from solr+cassandra to solandra. When I try to run solandra using java -jar start.jar in solandra-app, it gives me the following error:
[stack trace snipped]
Caused by: java.lang.RuntimeException: Couldn't figure out log4j configuration.
How exactly do I configure the log4j configuration? Karan
sorting on date field in facet query
Hello list, Is it possible to sort on a date field in a facet query in Solr 3.1? -- Regards, Dmitry Kan
Re: Using solandra
Thanks Stefan! I got it started. Also, is there a way to import XML documents? When I run 2-import-data.sh with only XML documents in the data directory, it gives me the following:

Loading data to solandra, note: this importer uses a slow xml parser
Exception in thread main java.lang.RuntimeException: Directory doesn't contain sgml files!
 at org.apache.solr.solrjs.sgml.reuters.ReutersService.readDirectory(ReutersService.java:207)
 at org.apache.solr.solrjs.sgml.reuters.ReutersService.main(ReutersService.java:64)
Data loaded, now open ./website/index.html in your favorite browser!

On Wed, May 18, 2011 at 4:20 PM, Stefan Matheis matheis.ste...@googlemail.com wrote:
Karan, following the Readme (https://github.com/tjake/Solandra#readme) it's:
From the Solandra base directory:
$ mkdir /tmp/cassandra-data
$ ant
$ cd solandra-app
$ ./start-solandra.sh
Regards Stefan

On Wed, May 18, 2011 at 12:40 PM, karanveer singh karan.korn...@gmail.com wrote:
I've recently switched from solr+cassandra to solandra.
When I try to run solandra using java -jar start.jar in solandra-app, it gives me the following error:
[stack trace snipped]
Caused by: java.lang.RuntimeException: Couldn't figure out log4j configuration.
How exactly do I configure the log4j configuration? Karan
RE: How to list/see all the indexed terms of a particular field in a document?
So this cannot be queried/listed using Apache Solr?

-----Original Message-----
From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com]
Sent: Wednesday, May 18, 2011 3:36 PM
To: solr-user@lucene.apache.org; gna...@zoniac.com
Subject: Re: How to list/see all the indexed terms of a particular field in a document?

ant luke?

On Wed, May 18, 2011 at 11:47 AM, Gnanakumar gna...@zoniac.com wrote:
[quoted text snipped]
Re: Using solandra
Karan, this data-import script is made especially for importing the demo data. To index XML documents (as you'd normally do with Solr), use for example http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/exampledocs/post.sh - and don't forget to adjust the URL according to your Solandra setup.

Regards
Stefan

On Wed, May 18, 2011 at 1:25 PM, karanveer singh karan.korn...@gmail.com wrote:
Thanks Stefan! I got it started. Also, is there a way to import xml documents? When I run 2-import-data.sh with only xml documents in the data directory, it gives me the following:
Loading data to solandra, note: this importer uses a slow xml parser
Exception in thread main java.lang.RuntimeException: Directory doesn't contain sgml files! at org.apache.solr.solrjs.sgml.reuters.ReutersService.readDirectory(ReutersService.java:207) at org.apache.solr.solrjs.sgml.reuters.ReutersService.main(ReutersService.java:64)
Data loaded, now open ./website/index.html in your favorite browser!

On Wed, May 18, 2011 at 4:20 PM, Stefan Matheis matheis.ste...@googlemail.com wrote:
[quoted text snipped]

On Wed, May 18, 2011 at 12:40 PM, karanveer singh karan.korn...@gmail.com wrote:
I've recently switched from solr+cassandra to solandra.
When I try to run solandra using java -jar start.jar in solandra-app, it gives me the following error:
[stack trace snipped]
Caused by: java.lang.RuntimeException: Couldn't figure out log4j configuration.
How exactly do I configure the log4j configuration? Karan
Re: How to list/see all the indexed terms of a particular field in a document?
Gnanam, have a look at http://wiki.apache.org/solr/LukeRequestHandler

Regards
Stefan

On Wed, May 18, 2011 at 1:30 PM, Gnanakumar gna...@zoniac.com wrote:
So this cannot be queried/listed using Apache Solr?
[quoted text snipped]
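For a concrete starting point, a Luke request scoped to one document looks something like this (the core URL, the uniqueKey value x, and the parameter values are illustrative; what Luke can report per field also depends on how the field is indexed and stored):

```
http://localhost:8983/solr/admin/luke?id=x&fl=mytextcontent&numTerms=50
```

The id parameter selects the document by its uniqueKey field, fl restricts the response to the field of interest, and numTerms bounds how many terms are reported.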
how to work cache and improve performance phrase query included wildcard
Hi all, I have two questions.

First, I'm wondering how the filterCache, queryResultCache, and documentCache are applied. After searching "query1 OR query2 OR query3 ...", I searched "query0 OR query2 OR query3 ...". Only query1 and query0 differ, but the query time was not faster. When are the caches applied?

Second, I have 5 or more phrase queries containing wildcards per query, such as "query1* query2*"~2 OR "query3* query4*"~2 ... In the worst case there are more than 30 wildcard phrase queries in one query, and QTime is more than 60 seconds. Please give me any ideas to improve performance. I have a full-text index of 2.5 million documents, running as 10 shards on 1 Tomcat.

Thanks, Jason
I need to improve highlighting
Hi, If I do a search http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true then in the <lst name="highlighting"> subtree I get:

<arr name="all_text">
  <str>Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige</str>
</arr>

What I need to do is either:
1. Return all of all_text, which should be possible by setting hl.fragsize=0, but I still never get beyond the default for the field (I can go below 100 but not above).
2. Get a count of the number of highlighted instances (preferable), or return each highlighted text in a separate str element - so <str>kongeriget</str><str>kongeriget</str>

thanks, Bryan Rasmussen
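The relevant highlighter parameters, for reference (behavior varies somewhat by version; in particular hl.maxAnalyzedChars caps how much of the field the highlighter looks at, which can make hl.fragsize appear to have no effect on long fields - the default cap in Solr of this era is 51200 characters):

```
q=kongeriget&hl=true&hl.fl=all_text
  &hl.fragsize=0                # 0 = return the whole field as one fragment
  &hl.snippets=10               # allow up to 10 snippets per field
  &hl.maxAnalyzedChars=1000000  # raise the analysis cap
```

Counting the <em> occurrences in the returned snippet would then give the number of highlighted instances client-side.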
Re: How to test Solr Integartion - how to get EmbeddedSolrServer?
You've probably seen this page: http://wiki.apache.org/solr/HowToContribute, but here it is for reference Go ahead and open a JIRA at https://issues.apache.org/jira/browse/SOLR (you need to create an account) and attach your changes as a patch. That gets it into the system and folks can start commenting on what they think the implications are. One of the committers needs to pick it up, but you can prompt G... Yonik's law of patches reads: A half-baked patch in Jira, with no documentation, no tests and no backwards compatibility is better than no patch at all. So don't worry about a completely polished patch for the first cut, it's often helpful for people to see the early stages to help steer the effort. Best Erick On Wed, May 18, 2011 at 5:41 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote: Thinking more about it, I can solve my immediate problem by just copy-pasting the classes I need into my own project packages (KISS like herehttps://github.com/Filirom1/solr-test-exemple ). I'd however suggest to refactor Solr code structure to be much more defaults-compliant making it easier for external developers to understand, and hopefully easier to maintain for committers (with fewer special-needs configurations). I've done some of those refactorings on my local copy of Solr and would be glad to contribute. For this particular problem the KISS solution would be to create yet one more module for Tests which depend on Solr Core and on the Test Framework. The org burden of that extra module, versus the ease of building configuration, I believe, outweights. 
On Tue, May 17, 2011 at 7:11 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: http://stackoverflow.com/questions/6034513/can-i-avoid-a-dependency-cycle-with-one-edge-being-a-test-dependency On Tue, May 17, 2011 at 6:49 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: On Tue, May 17, 2011 at 3:52 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: On Tue, May 17, 2011 at 3:44 PM, Steven A Rowe sar...@syr.edu wrote: Hi Gabriele, On 5/17/2011 at 9:34 AM, Gabriele Kahlout wrote: Solr Core should declare a test dependency on Solr Test Framework. I agree: - Solr Core should have a test-scope dependency on Solr Test Framework. - Solr Test Framework should have a compile-scope dependency on Solr Core. But Maven views this as a circular dependency. I've seen that, but adding it with <scope>test</scope> works. The logic: the src is compiled first and then re-used (I'm assuming Maven does something smart about not including the full jar). Not quite. I've tried a demo and the reactor complains. I'll try to see if Maven could become 'smarter', or if the 2-phase build solution will work. The projects in the reactor contain a cyclic reference: Edge between 'Vertex{label='com.mysimpatico:TestFramework:1.0-SNAPSHOT'}' and 'Vertex{label='org.apache:DummyCore:1.0-SNAPSHOT'}' introduces to cycle in the graph org.apache:DummyCore:1.0-SNAPSHOT -- com.mysimpatico:TestFramework:1.0-SNAPSHOT -- org.apache:DummyCore:1.0-SNAPSHOT - [Help 1] The workaround: Solr Core includes the source of Solr Test Framework as part of its test source code. It's not pretty, but it works. I'd be happy to entertain other (functional) approaches. In the dp4j.com pom.xml I build in 2 phases to compile with the same annotations in the project itself (but I don't think we need that here) Steve -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x.
(x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
Re: sorting on date field in facet query
Can you provide an example of what you are trying to do? Are you referring to ordering the result set or the facet information? Best Erick On Wed, May 18, 2011 at 7:21 AM, Dmitry Kan dmitry@gmail.com wrote: Hello list, Is it possible to sort on date field in a facet query in SOLR 3.1? -- Regards, Dmitry Kan
Re: I need to improve highlighting
Bryan, on Q2 - what about using xpath like 'str/em'? Regards Stefan On Wed, May 18, 2011 at 2:25 PM, bryan rasmussen rasmussen.br...@gmail.com wrote: Hi, If I do a search http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true then in the <lst name="highlighting"> subtree I get <arr name="all_text"> <str>Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige</str> </arr> </lst> What I need to do is either 1. Return all of all_text, which should be possible by setting hl.fragsize=0, but I still never go beyond the default for the field (I can go less than 100 but not more) 2. Get a count of the number of highlighted instances (preferable), or return each highlighted text in a separate str element - so <str>kongeriget</str><str>kongeriget</str> thanks, Bryan Rasmussen
Re: how to work cache and improve performance phrase query included wildcard
See below: On Wed, May 18, 2011 at 8:15 AM, Jason, Kim hialo...@gmail.com wrote: Hi, all I have two questions. First, I'm wondering how filterCache, queryResultCache, documentCache are applied. After searching query1 OR query2 OR query3 ..., I searched query0 OR query2 OR query3 ... . Just query1 and query0 are different. But query time was not fast. When are the caches applied? Caches don't really count here. You're not using filter queries so filterCache isn't germane. documentCache is only for holding the document read off disk; it probably isn't doing much in your example that would impact differences in search time unless you're returning massive numbers of documents. queryResultCache isn't getting re-used. Think of this as a list of document IDs keyed by the *entire* query. By making any changes to the query you're not going to use the cache. To understand this, consider that the clauses aren't really separate. Any additional clause could easily change the scoring of a document that matched both queries. So re-using the cache on a by-clause basis wouldn't produce correct results. In other words, caches aren't going to help you here. Second, I have 5 or more phrase queries including wildcards per query, such as "query1* query2*"~2 OR "query3* query4*"~2 ... In the worst case, phrase queries including wildcards in one query are more than 30. QTime is more than 60 seconds. Can we see the results of attaching debugQuery=on to the URL? Your pseudo-code may well be hiding the issue. We don't know what query parser you're using. Wildcards aren't usually analyzed for phrase queries, for instance, so on the face of it there's not much that can be said... Additionally, the field type and field definitions from your schema.xml would be helpful for the fields you're searching on. Best Erick Please give any idea to improve performance. I have a 2.5 million full text index. That is running 10 shards on 1 tomcat.
Thanks, Jason -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-work-cache-and-improve-performance-phrase-query-included-wildcard-tp2956671p2956671.html Sent from the Solr - User mailing list archive at Nabble.com.
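For later readers: assuming Jason's queries were meant as sloppy phrase queries containing wildcard terms (the quoting in the email is ambiguous), the practical first step Erick suggests is re-running one query with debugQuery=on and inspecting the parsed query. A minimal sketch of building such a request URL; the host, core, and terms are illustrative, not from a real deployment:

```python
from urllib.parse import quote_plus

# Assumed intent: sloppy phrase queries over wildcard terms, e.g.
# "query1* query2*"~2 -- whether the parser actually expands wildcards
# inside a phrase depends on the query parser in use, which is part of
# Erick's point.
q = '"query1* query2*"~2 OR "query3* query4*"~2'
url = "http://localhost:8983/solr/select?debugQuery=on&q=" + quote_plus(q)
print(url)
```

Comparing the parsedquery sections of two such requests makes it obvious whether the slow clauses are being rewritten into large multi-term queries.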
Re: I need to improve highlighting
Bryan, on Q2 - what about using xpath like 'str/em'? How do I do that? The highlighting result, at least in the Solr installation I have (3.something), returns the em as escaped markup. Is there an xpath parameter or configuration I can set for highlighting, or a way to change the em elements to be actual elements (hl.formatter maybe?) Thanks, Bryan Rasmussen On Wed, May 18, 2011 at 2:25 PM, bryan rasmussen rasmussen.br...@gmail.com wrote: Hi, If I do a search http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true then in the <lst name="highlighting"> subtree I get <arr name="all_text"> <str>Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige</str> </arr> </lst> What I need to do is either 1. Return all of all_text, which should be possible by setting hl.fragsize=0, but I still never go beyond the default for the field (I can go less than 100 but not more) 2. Get a count of the number of highlighted instances (preferable), or return each highlighted text in a separate str element - so <str>kongeriget</str><str>kongeriget</str> thanks, Bryan Rasmussen
Re: I need to improve highlighting
Just checking, but have you tried setting hl.fragsize=<very large number> as suggested here: http://wiki.apache.org/solr/HighlightingParameters#hl.fragsize ? If that's not the problem, please show us the results of attaching debugQuery=on to the request, that may shed some light on the problem. Best Erick On Wed, May 18, 2011 at 8:25 AM, bryan rasmussen rasmussen.br...@gmail.com wrote: Hi, If I do a search http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true then in the <lst name="highlighting"> subtree I get <arr name="all_text"> <str>Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige</str> </arr> </lst> What I need to do is either 1. Return all of all_text, which should be possible by setting hl.fragsize=0, but I still never go beyond the default for the field (I can go less than 100 but not more) 2. Get a count of the number of highlighted instances (preferable), or return each highlighted text in a separate str element - so <str>kongeriget</str><str>kongeriget</str> thanks, Bryan Rasmussen
Re: I need to improve highlighting
Yeah, but you just got me to check again. What I thought was Solr ignoring my hl.fragsize setting and always using the default turned out to be a smaller field being returned because it ranked higher; when I set it to 1000 and saw the same thing I saw with 100, it was just the off chance that there were only 100 characters to see in the first 10 results. Funny. thanks, Bryan Rasmussen On Wed, May 18, 2011 at 2:59 PM, Erick Erickson erickerick...@gmail.com wrote: Just checking, but have you tried setting hl.fragsize=<very large number> as suggested here: http://wiki.apache.org/solr/HighlightingParameters#hl.fragsize ? If that's not the problem, please show us the results of attaching debugQuery=on to the request, that may shed some light on the problem. Best Erick On Wed, May 18, 2011 at 8:25 AM, bryan rasmussen rasmussen.br...@gmail.com wrote: Hi, If I do a search http://localhost:8983/solr/tester/select/?q=kongeriget&hl=true then in the <lst name="highlighting"> subtree I get <arr name="all_text"> <str>Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige</str> </arr> </lst> What I need to do is either 1. Return all of all_text, which should be possible by setting hl.fragsize=0, but I still never go beyond the default for the field (I can go less than 100 but not more) 2. Get a count of the number of highlighted instances (preferable), or return each highlighted text in a separate str element - so <str>kongeriget</str><str>kongeriget</str> thanks, Bryan Rasmussen
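Closing the loop on Bryan's second question (counting highlighted instances): Solr 3.x doesn't return a per-document hit count directly, but a client can derive one by counting the <em> markers in the highlighting section. A sketch assuming a wt=json response shaped like the snippet in this thread; the document key "doc1" is illustrative:

```python
import json

# Hypothetical Solr response fragment (wt=json), shaped like the one in
# the thread; the document id is made up for illustration.
response = json.loads("""
{
  "highlighting": {
    "doc1": {
      "all_text": ["Aftale mellem <em>kongeriget</em> Danmark og <em>kongeriget</em> Sverige"]
    }
  }
}
""")

def count_highlights(resp, field, tag="<em>"):
    """Count highlighter hit markers per document for one field."""
    counts = {}
    for doc_id, fields in resp.get("highlighting", {}).items():
        snippets = fields.get(field, [])
        counts[doc_id] = sum(s.count(tag) for s in snippets)
    return counts

print(count_highlights(response, "all_text"))  # {'doc1': 2}
```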
Re: Anyone having these Replication issues as well?
Thanks Markus, for your patience with getting the response in, as well as the comments. This is my Dev environment; I'm actually going to be setting up a new master-slave configuration in a different environment today. I'll see if it's environment-specific or not. One thing I didn't mention, as I wasn't sure it was germane, is that these servers are in Amazon EC2. Also, the master is currently on a 32-bit OS and the slaves are on 64-bit OSes. Just the order in which the servers are getting upgraded in dev. The master has autoCommit turned on at 30-second intervals. Even if nothing is getting indexed, could an autoCommit occurring during a replication request cause a failed replication? Ken -- View this message in context: http://lucene.472066.n3.nabble.com/Anyone-having-these-Replication-issues-as-well-tp2954365p2957127.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: I need an available solr lucene consultant
I am interested in hearing more about this opportunity. Feel free to contact me at b...@csrinstitute.net. Thanks Bill -- View this message in context: http://lucene.472066.n3.nabble.com/I-need-an-available-solr-lucene-consultant-tp2954023p2957137.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Exact match
There's a JIRA issue assigned to this feature: https://issues.apache.org/jira/browse/SOLR-1980 However, it's not yet implemented. Anyone? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 17. mai 2011, at 15.51, Alex Grilo wrote: Hi, Can I make a query that returns only exact match or do I have to change the fields to achieve that? Thanks in advance Alex Grilo
Does every Solr request-response require a running server?
Hello, I'm wondering if the Solr Test framework at the end of the day always runs an embedded/jetty server (which is the only way to interact with Solr, i.e. no web server -- no Solr), or whether the tests interact without one, calling the underlying methods directly? The latter seems to be the case from trying to understand SolrTestCaseJ4. That would be more white-box than otherwise. -- Regards, K. Gabriele
Re: UIMA analysisEngine path
2011/5/17 chamara chama...@gmail.com Hi My Solr version is 3.1.0. I actually figured out what my problem was. I used the guide https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/uima/README.txt and it seems that I had placed the code snippet inside another xml element, not under <config>. One more thing: you are using Solr 3.1.0 but reading the README from trunk (4.0-SNAPSHOT); you should use this one instead: https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1/solr/contrib/uima/README.txt Will UIMA work with Solr version 1.4.1 as well? The UpdateRequestProcessorChain API has changed from 1.4.1 to 3.1.0 so, although it should be easy to back-port, it's not compatible with Solr 1.4.1 out of the box. Tommaso Thanks again On Tue, May 17, 2011 at 12:13 PM, Tommaso Teofili [via Lucene] ml-node+2952043-2093755785-399...@n3.nabble.com wrote: Hi again Chamara, 2011/5/17 chamara [hidden email] Thanks Tommaso, yes this occurred after copying the .jar files to the lib folder. When I do not copy them from contrib/uima/lib and have the solrconfig.xml point to those libs I get the following error. I am a bit confused why a classpath was chosen to get the analysis engine descriptor. I think it'd be nice if you could tell which version of Solr you're using, and how you configured the Solr-UIMA module in solrconfig.xml. The error is prompted when the /update request handler is called; this looks like it is related to the classpath (/org/apache/uima/desc/). SEVERE: Error in xpath: java.lang.RuntimeException: solrconfig.xml missing /config/uimaConfig/analysisEngine This seems to be related to a missing /config/uimaConfig/analysisEngine element inside solrconfig.xml.
Regards, Tommaso On Mon, May 16, 2011 at 6:19 PM, Tommaso Teofili [via Lucene] [hidden email] wrote: The error you pasted doesn't seem to be related to a (class)path issue but more likely to be related to a Solr instance at 1.4.1/3.1.0 and a Solr-UIMA module at 3.1.0/4.0-SNAPSHOT (trunk); it seems that the error arises from the changed UpdateRequestProcessorFactory API. Hope this helps, Tommaso On 16 May 2011, at 18.54, chamara wrote: Hi Tommaso, Thanks for the quick reply. I had copied the lib files and followed the instructions on http://wiki.apache.org/solr/SolrUIMA#Installation. However I get this error. The AnalysisEngine has the default classpath, which is /org/apache/uima/desc/. SEVERE: org.apache.solr.common.SolrException: Error Instantiating UpdateRequestProcessorFactory, org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory is not a org.apache.solr.update.processor.UpdateRequestProcessorFactory Regards, Chamara On Mon, May 16, 2011 at 9:17 AM, Tommaso Teofili [via Lucene] [hidden email] wrote: Hello, if you want to take the descriptor from a jar, provided that you configured the jar inside a <lib> element in solrconfig, then you just need to write the correct classpath in the <analysisEngine> element. For example, if your descriptor resides in the com/something/desc/ path inside the jar then you should set the <analysisEngine> element to /com/something/desc/descriptorname.xml If you instead need to get the descriptor from the filesystem, try the patch in SOLR-2501 [1]. Hope this helps, Tommaso [1] : https://issues.apache.org/jira/browse/SOLR-2501 2011/5/13 chamara [hidden email] Hi, Does this code at line 57 need to be changed to the location where the jar files (library files) reside? URL url = this.getClass().getResource(location of the jar files); I did change it but no luck so far.
Let me know what I am doing wrong? -- View this message in context: http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2935541.html Sent from the Solr - User mailing list archive at Nabble.com. --- Chamara -- View this message in context: http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2948760.html
Re: lucene parser, negative OR operands
On 5/17/2011 8:00 PM, Yonik Seeley wrote: This doesn't have to do with Solr's support of pure-negative top-level queries, but does have to do with a long-standing confusion about how the lucene queryparser works with some of the operators (i.e. not really boolean logic). In a Lucene BooleanQuery, clauses are mandatory, optional, or prohibited. -foo OR -bar actually parses to a boolean query with two prohibited clauses... essentially the same as -foo AND -bar. You can see this by adding debugQuery=true to the request. Thanks Yonik. I recall hearing about this before, but was vague on the details; thanks for supplying some and refreshing my memory. So I guess there is no such thing as an optional prohibited clause, which is what makes -one OR -two the same thing as -one AND -two. Actually, yeah, an optional prohibited clause doesn't really even make sense. Hmm. If I want to understand more about how the lucene query parser does its thing, can anyone suggest the source files I should be looking at? If I really do want actual boolean logic behavior, what are my options? I guess one is trying to write my own query parser. Hmm, for that particular query, what about using parens to force a sub-query? (-one) OR (-two) Ha, nope, that runs into a different problem (or is it the same problem?), and always returns 0 hits. It looks like the lucene query parser can't handle a pure-negative sub-query like that separated by OR? Not sure why, can anyone explain that one? For that particular pattern, this crazy refactoring of the query does work and gets the actual boolean logic result of (not 'one') OR (not 'two'): (*:* AND -one) OR (*:* AND -two) Phew, crazy stuff. So that's a weird solution to getting actual boolean logic behavior for that pattern, but in general, I'm kind of wanting a parser that will give actual boolean logic behavior. Maybe someday I can find time to write it in Java (not the quickest thing for me, not familiar with the code at all). Jonathan
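The rewrite Jonathan lands on - anchoring each pure-negative sub-query to the full index with *:* so it can participate in an OR - can be automated on the client side. A small sketch (plain string assembly, not a Solr API) that applies the transformation he describes:

```python
def fix_pure_negative(clause):
    """If a clause contains only prohibited (-) terms, anchor it to the
    full index with *:* so it behaves as a true boolean NOT inside an OR."""
    terms = clause.split()
    if terms and all(t.startswith("-") for t in terms):
        return "(*:* AND " + " ".join(terms) + ")"
    return "(" + clause + ")"

def boolean_or(*clauses):
    """OR together clauses, rewriting pure-negative ones first."""
    return " OR ".join(fix_pure_negative(c) for c in clauses)

print(boolean_or("-one", "-two"))
# (*:* AND -one) OR (*:* AND -two)
```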
Re: Does every Solr request-response require a running server?
On Wed, May 18, 2011 at 10:50 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote: Hello, I'm wondering if Solr Test framework at the end of the day always runs an embedded/jetty server (which is the only way to interact with solr, i.e. no web server -- no solr) or in the tests they interact without one, calling directly the under line methods? The latter seems to be the case trying to understand SolrTestCaseJ4. That would be more white-box than otherwise. Solr does either, depending on the test. Most tests start only an embedded solr server w/ no web server, but others use an embedded jetty server so one can talk HTTP to it. JettySolrRunner is used for the latter. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: Does every Solr request-response require a running server?
On Wed, May 18, 2011 at 5:09 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, May 18, 2011 at 10:50 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote: Hello, I'm wondering if Solr Test framework at the end of the day always runs an embedded/jetty server (which is the only way to interact with solr, i.e. no web server -- no solr) or in the tests they interact without one, calling directly the underlying methods? The latter seems to be the case trying to understand SolrTestCaseJ4. That would be more white-box than otherwise. Solr does either, depending on the test. Most tests start only an embedded solr server w/ no web server, What is confusing me is the solr server. Is it SolrCore? In what aspects is it a 'server'? In my understanding it's the core of the Solr Web application which makes up the servlets interface, i.e. it's under the servlets, not on top of them. but others use an embedded jetty server so one can talk HTTP to it. JettySolrRunner is used for the latter. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco -- Regards, K. Gabriele
Re: Set operations on multiple queries with different qf parameters
Don't know of any other documentation. There might be some minimal page on the wiki somewhere, but I can never find it either; although I have some memory of seeing it once, it didn't have anything that the blog post didn't. I think 'mm' _should_ work as a LocalParam in a nested query, I use it myself in code and it seems to work. But not too surprised that 'fq' doesn't (although I haven't verified that myself). If indeed it doesn't, here would be a hacky way to get the same semantics, although it won't use the filter cache for the fq. If this doesn't work: defType=lucene&q=_query_:"{!edismax qf='p,q,r' fq='field1:xyz'}abc def" AND _query_:"{!edismax mm=100% qf='q, r, s'}jlk" Then this should; we can just put it in our top-level lucene query as an additional condition. defType=lucene&q=(_query_:"{!edismax qf='p,q,r'}abc def" AND field1:xyz) AND _query_:"{!edismax mm=100% qf='q, r, s'}jlk" Yeah, this starts to get painful, agreed, with unclear performance implications. On 5/17/2011 10:44 PM, Nikhil Chhaochharia wrote: Thanks, this looks good. mm and fq don't seem to be working for a nested query, but I should be able to work around it. I was unable to find much documentation on the Wiki, API docs or in the Solr book - please let me know if you are aware of any other documentation for this feature apart from the mentioned blog post. Thanks, Nikhil - Original Message - From: Jonathan Rochkind rochk...@jhu.edu To: solr-user@lucene.apache.org; Nikhil Chhaochharia nikhil...@yahoo.com Cc: Sent: Tuesday, 17 May 2011 8:52 PM Subject: Re: Set operations on multiple queries with different qf parameters One way to do it might be to use the Solr 'nested query' functionality. http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/ Not entirely sure this will work exactly as I've written it, but it should give you some ideas of what a nested query can do.
Note not fully URL-encoded for clarity: defType=lucene&q=_query_:"{!edismax qf='p,q,r' fq='field1:xyz'}abc def" AND _query_:"{!edismax mm=100% qf='q, r, s'}jlk" On 5/17/2011 2:55 AM, Nikhil Chhaochharia wrote: Hi, I am using Solr 3.1 with edismax. My frontend allows the user to create arbitrarily complex queries by modifying q, fq, qf and mm (only 1 and 100% are allowed) parameters. The queries can then be saved by the user. The user should be able to perform set operations on the saved searches. For example, the user may want to see all documents which are returned both by saved search 1 and saved search 2 (equivalent to the intersection of the two). If the saved searches contain q, fq and/or mm, then I can combine the saved searches to create a new query which will be equivalent to their intersection. However, I can't figure out how to handle qf. For example, Query 1 = q=abc def&fq=field1:xyz&mm=1&qf=p,q,r Query 2 = q=jkl&mm=100%&qf=q,r,s How do I get the list of common documents which are present in the result set of both queries? Thanks, Nikhil
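Since the thread's examples are deliberately not URL-encoded, here is a sketch of actually encoding the combined nested query from a client. The quoting of the _query_ values follows the nested-query blog post linked above; the host, core, and field names are the thread's illustrative ones, not a real deployment:

```python
from urllib.parse import urlencode

# Combined query from the thread: edismax clause 1 intersected with an
# fq-like condition, ANDed with edismax clause 2 at mm=100%.
q = ('(_query_:"{!edismax qf=\'p,q,r\'}abc def" AND field1:xyz)'
     ' AND _query_:"{!edismax mm=100% qf=\'q,r,s\'}jlk"')
url = "http://localhost:8983/solr/select?" + urlencode(
    {"defType": "lucene", "q": q})
print(url)
```

urlencode takes care of escaping the braces, quotes, and the % in mm=100%, which is easy to get wrong by hand.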
Re: Does every Solr request-response require a running server?
On Wed, May 18, 2011 at 11:14 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote: On Wed, May 18, 2011 at 5:09 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, May 18, 2011 at 10:50 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote: Hello, I'm wondering if Solr Test framework at the end of the day always runs an embedded/jetty server (which is the only way to interact with solr, i.e. no web server -- no solr) or in the tests they interact without one, calling directly the underlying methods? The latter seems to be the case trying to understand SolrTestCaseJ4. That would be more white-box than otherwise. Solr does either, depending on the test. Most tests start only an embedded solr server w/ no web server, What is confusing me is the solr server. Is it SolrCore? In what aspects is it a 'server'? In my understanding it's the core of the Solr Web application which makes up the servlets interface, i.e. it's under the servlets, not on top of them. Look at TestHarness - it instantiates a CoreContainer. When running as a webapp in a Jetty server, a DispatchFilter is registered that instantiates the CoreContainer. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco but others use an embedded jetty server so one can talk HTTP to it. JettySolrRunner is used for the latter. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco -- Regards, K. Gabriele
Re: [POLL] How do you (like to) do logging with Solr
On 5/17/2011 10:00 AM, Chris Hostetter wrote: : If I understand what you've said above correctly, removing the binding in : solr.war would make it inherit the binding in jetty/tomcat/whatever, is that : right? That sounds like an awesome plan to me. The example jetty server can : be configured instead of solr.war. Once you've answered this, I can submit my : vote. no, removing the bindings in solr.war would result in solr not logging *anything* unless you manually added a jar (defining the bindings you want) to the jetty (or tomcat) system classloader. What I'd want to have is the ability to download Solr source code, not modify anything, create a .war, and drop it into an existing system that has my preferred logging already set up, which from what you are saying would also require that the example have a jar with the JDK bindings, and that everyone who sets up a more custom system create their own jar and put it somewhere it can be found. What's involved in creating that jar? Is it something that a novice could get done? Is it something that could be prepackaged for the most common choices, or possibly already available on the Internet? Thanks, Shawn
JSON delete error with latest branch_3x
I updated to the latest branch_3x (r1124339) and I'm now getting the error below when trying a delete by query or id. Adding documents with the new format works, as do the commit and optimize commands. Possible regression due to SOLR-2496? curl 'http://localhost:8988/solr/update/json?wt=json' -H 'Content-type:application/json' -d '{"delete":{"query":"*:*"}}' Error 400 meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false Problem accessing /solr/update/json. Reason: meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false
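While the bug stood, one easy thing to rule out was a malformed request body; generating the JSON delete commands programmatically guarantees well-formed input. A minimal sketch matching the format of the curl command in this thread (the document id is illustrative):

```python
import json

# Bodies for delete-by-query and delete-by-id against /solr/update/json,
# matching the format used in the curl command in this thread.
delete_by_query = json.dumps({"delete": {"query": "*:*"}})
delete_by_id = json.dumps({"delete": {"id": "doc1"}})  # "doc1" is illustrative
print(delete_by_query)
```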
Re: JSON delete error with latest branch_3x
On Wed, May 18, 2011 at 1:24 PM, Paul Dlug paul.d...@gmail.com wrote: I updated to the latest branch_3x (r1124339) and I'm now getting the error below when trying a delete by query or id. Adding documents with the new format works, as do the commit and optimize commands. Possible regression due to SOLR-2496? curl 'http://localhost:8988/solr/update/json?wt=json' -H 'Content-type:application/json' -d '{"delete":{"query":"*:*"}}' Error 400 meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false Problem accessing /solr/update/json. Reason: meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false Hmmm, looks like unit tests must be inadequate for the JSON format. I'll look into it. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: Anyone familiar with Solandra or Lucandra?
This will be possible once triggers are finished for cassandra, then we can hook into CF inserts and auto index in solandra. On Tue, May 17, 2011 at 5:10 PM, kenf_nc ken.fos...@realestate.com wrote: Ah. I see. That reduces its usefulness to me some. The multi-master aspect is still a big draw of course. But I was hoping this also added an integrated persistence layer to Solr as well. -- View this message in context: http://lucene.472066.n3.nabble.com/Anyone-familiar-with-Solandra-or-Lucendra-tp2927357p2954320.html Sent from the Solr - User mailing list archive at Nabble.com. -- http://twitter.com/tjake
Re: JSON delete error with latest branch_3x
OK, I just fixed this on branch_3x. Trunk is fine (it was an error in the 3x backport that wasn't caught because the test doesn't go through the complete solr stack to the update handler). -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco On Wed, May 18, 2011 at 1:29 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, May 18, 2011 at 1:24 PM, Paul Dlug paul.d...@gmail.com wrote: I updated to the latest branch_3x (r1124339) and I'm now getting the error below when trying a delete by query or id. Adding documents with the new format works, as do the commit and optimize commands. Possible regression due to SOLR-2496? curl 'http://localhost:8988/solr/update/json?wt=json' -H 'Content-type:application/json' -d '{"delete":{"query":"*:*"}}' Error 400 meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false Problem accessing /solr/update/json. Reason: meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false Hmmm, looks like unit tests must be inadequate for the JSON format. I'll look into it. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: JSON delete error with latest branch_3x
Thanks Yonik, all my app's test cases now pass again. --Paul On Wed, May 18, 2011 at 2:04 PM, Yonik Seeley yo...@lucidimagination.com wrote: OK, I just fixed this on branch_3x. Trunk is fine (it was an error in the 3x backport that wasn't caught because the test doesn't go through the complete solr stack to the update handler). -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco On Wed, May 18, 2011 at 1:29 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, May 18, 2011 at 1:24 PM, Paul Dlug paul.d...@gmail.com wrote: I updated to the latest branch_3x (r1124339) and I'm now getting the error below when trying a delete by query or id. Adding documents with the new format works, as do the commit and optimize commands. Possible regression due to SOLR-2496? curl 'http://localhost:8988/solr/update/json?wt=json' -H 'Content-type:application/json' -d '{"delete":{"query":"*:*"}}' Error 400 meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false Problem accessing /solr/update/json. Reason: meaningless command: delete:query=`*:*`,fromPending=false,fromCommitted=false Hmmm, looks like unit tests must be inadequate for the JSON format. I'll look into it. -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
Re: Field collapsing on multiple fields and/or ranges?
bump -- View this message in context: http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958029.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Field collapsing on multiple fields and/or ranges?
As far as I know this is not possible today with either Solr's 4.0 grouping impl or the new grouping module (soon to be grouping in Solr 3.x). I'm not sure about the patch on SOLR-236 though. But it's an interesting use case; it's a compound group key, right? You want to group by a tuple (X, Y). Can you open a Lucene issue for this? I'm not sure we can fix it today but I think the use case is reasonable so we can at least discuss it on an issue... Mike http://blog.mikemccandless.com On Wed, May 18, 2011 at 2:23 PM, arian487 akarb...@tagged.com wrote: bump -- View this message in context: http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958029.html Sent from the Solr - User mailing list archive at Nabble.com.
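Until grouping supports a compound key natively, the tuple grouping described above can be emulated client-side; a toy Python sketch (the field names are invented) of grouping hits by an (X, Y) pair:

```python
from collections import defaultdict

# Client-side sketch of grouping by a compound (tuple) key; 'docs' and
# its fields are made-up stand-ins for search hits.
docs = [
    {"id": 1, "city": "NYC", "price_band": "low"},
    {"id": 2, "city": "NYC", "price_band": "low"},
    {"id": 3, "city": "NYC", "price_band": "high"},
    {"id": 4, "city": "SF",  "price_band": "low"},
]

groups = defaultdict(list)
for doc in docs:
    # The tuple (city, price_band) acts as the compound group key.
    groups[(doc["city"], doc["price_band"])].append(doc["id"])

assert len(groups) == 3                    # three distinct (city, band) groups
assert groups[("NYC", "low")] == [1, 2]
```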
Re: Field collapsing on multiple fields and/or ranges?
Thanks for the reply! How exactly do I open an issue? -- View this message in context: http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958277.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Disable IDF scoring on certain fields
I believe I have applied the patch correctly. However, I cannot seem to figure out where the similarity class I create should reside. Any tips on that? Thanks, Brian Lamb On Tue, May 17, 2011 at 4:00 PM, Brian Lamb brian.l...@journalexperts.comwrote: Thank you Robert for pointing this out. This is not being used for autocomplete. I already have another core set up for that :-) The idea is like I outlined above. I just want a multivalued field that treats every term in the field the same so that the only way documents separate themselves is by an unrelated boost and/or matching on multiple terms in that field. On Tue, May 17, 2011 at 3:55 PM, Markus Jelsma markus.jel...@openindex.io wrote: Well, if you're experimental you can try trunk as Robert points out it has been fixed there. If not, I guess you're stuck with creating another core. Is this fieldType specifically used for auto-completion? If so, another core, preferably on another machine, is in my opinion the way to go. Auto-completion is tough in terms of performance. Thanks Robert for pointing to the Jira ticket. Cheers Hi Markus, I was just looking at overriding DefaultSimilarity so your email was well timed. The problem I have with it is, as you mentioned, it does not seem possible to do it on a field-by-field basis. Has anyone had any luck with doing some of the similarity functions on a field-by-field basis? I need to do more than one of them and from what I can find, it seems that only computeNorm accounts for the name of the field. Thanks, Brian Lamb On Tue, May 17, 2011 at 3:34 PM, Markus Jelsma markus.jel...@openindex.iowrote: Hi, Although you can configure per-field TF (by omitTermFreqAndPositions) you can't do this for IDF. If your index is only used for this specific purpose (seems like an auto-complete index) then you can override DefaultSimilarity and return a static value for IDF.
If you still want IDF for other fields then I think you have a problem because Solr doesn't yet support per-field similarity. http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/search/DefaultSimilarity.java?view=markup Cheers, Hi all, I have a field defined in my schema.xml file as

<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front" />
  </analyzer>
</fieldType>

<field name="myfield" multiValued="true" type="edgengram" indexed="true" stored="true" required="false" omitNorms="true" />

I would like to disable IDF scoring on this field. I am not interested in how rare the term is, I only care whether the term is present or not. The idea is that if a user does a search for myfield:dog OR myfield:pony, any document containing dog or pony would be scored identically. In the case that both showed up, that record would be moved to the top, but all the records where they both showed up would have the same score. So long story short, how can I disable the IDF score for this particular field? Thanks, Brian Lamb
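A toy numeric illustration (this is not Lucene's actual formula) of what overriding the Similarity's idf() to return a constant achieves: rare and common terms stop separating documents by rarity, which is exactly the behavior asked for above:

```python
import math

# Illustrative IDF-like weight (not Lucene's exact formula).
def idf(doc_freq, num_docs):
    return 1 + math.log(num_docs / (doc_freq + 1))

# What a custom Similarity returning a static IDF value achieves.
def flat_idf(doc_freq, num_docs):
    return 1.0

num_docs = 10_000
rare, common = 3, 5_000

# Normally the rare term gets a much larger weight...
assert idf(rare, num_docs) > idf(common, num_docs)
# ...but with a constant IDF, both terms contribute identically.
assert flat_idf(rare, num_docs) == flat_idf(common, num_docs)
```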
Re: Field collapsing on multiple fields and/or ranges?
Start here: https://issues.apache.org/jira/browse/LUCENE Create an account (it's free), open an issue and set the component to modules/grouping, fill in the fields, and submit it :) Then maybe make a patch and attach it! Genericizing the per-doc grouping key is important; we have an issue open for this already: https://issues.apache.org/jira/browse/LUCENE-3099 So in theory if we had LUCENE-3099 done, a sub-class could create a compound group key. Mike http://blog.mikemccandless.com On Wed, May 18, 2011 at 3:34 PM, arian487 akarb...@tagged.com wrote: Thanks for the reply! How exactly do I open an issue? -- View this message in context: http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958277.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Field collapsing on multiple fields and/or ranges?
https://issues.apache.org/jira/browse/SOLR-2526 modules/grouping was not a valid component so I just put it in search. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958408.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Field collapsing on multiple fields and/or ranges?
Ahh, that's because you opened a Solr not a Lucene issue ;) The modules (incl. new grouping module) are under Lucene. That's fine, we can leave it as a Solr issue. Mike http://blog.mikemccandless.com On Wed, May 18, 2011 at 4:10 PM, arian487 akarb...@tagged.com wrote: https://issues.apache.org/jira/browse/SOLR-2526 modules/grouping was not a valid component so I just put it in search. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2958408.html Sent from the Solr - User mailing list archive at Nabble.com.
Storing, indexing and searching XML documents in Solr
Hi, I'm new to solr so apologies if the solution is already documented. I have installed and populated a solr index using the examples as a template with a version of the data below. I have XML in the form of

<entity>
  <resource>
    <guid>123898-2092099098982</guid>
    <media_format>Blu-Ray</media_format>
    <updated>2011-05-05T11:25:35+0500</updated>
  </resource>
  <price currency="usd">3.99</price>
  <discounts>
    <discount type="percentage" rate="30" start="2011-05-03T00:00:00" end="2011-05-10T00:00:00" />
    <discount type="decimal" amount="1.99" coupon="1" />
    ...
  </discounts>
  <aspect_ratio>16:9</aspect_ratio>
  <duration>1620</duration>
  <categories>
    <category id="drama" />
    <category id="horror" />
  </categories>
  <rating>
    <rate id="D1">contains some scenes which some viewers may find upsetting</rate>
  </rating>
  ...
  <media_type>Video</media_type>
</entity>

Can I populate solr directly with this document (like I believe MarkLogic does)? If yes: can I search on any attribute (i.e. find all records where /entity/resource/media_format equals blu-ray)? If no: what is the best practice to import the attributes above into solr (i.e. patterns for subdividing / flattening the document)? Does solr support attached documents and if so is this advised (how does it affect performance)? Any help is greatly appreciated. Pointers to documentation that address my issues is even more helpful. Thanks again OJ
Re: Replication Clarification Please
Alexander, sorry for the delay in replying. I wanted to test out a few hunches that I had before getting back to you. Hurray!!! I was able to resolve the issue. The problem was with the cache settings in the solrconfig.xml. It was taking almost 15-20 minutes to warm up the caches on each commit; as we are commit-heavy (every 5 minutes) the replication was screaming for the new searcher to be warmed and it would never get a chance to finish, so it was perennially backed up. We reduced the cache and autowarm counts and now the replication is happily finishing within 20 seconds!! Thank you again for all your support. Thanks, Ravi Kiran Bhaskar The Washington Post 1150 15th St. NW Washington, DC 20071 On Sun, May 15, 2011 at 3:12 AM, Alexander Kanarsky alexan...@trulia.com wrote: Ravi, what is the replication configuration on both master and slave? Also could you list the files in the index folder on master and slave before and after the replication? -Alexander On Fri, 2011-05-13 at 18:34 -0400, Ravi Solr wrote: Sorry guys, spoke too soon I guess. The replication still remains very slow even after upgrading to 3.1 and setting the compression off. Now I am totally clueless. I have tried everything that I know of to increase the speed of replication but failed. If anybody faced the same issue, can you please tell me how you solved it. Ravi Kiran Bhaskar On Thu, May 12, 2011 at 6:42 PM, Ravi Solr ravis...@gmail.com wrote: Thank you Mr. Bell and Mr. Kanarsky, as per your advice we have moved from 1.4.1 to 3.1 and have made several changes to the configuration. The configuration changes have worked nicely till now and the replication is finishing within the interval and not backing up. The changes we made are as follows: 1. Increased the mergeFactor from 10 to 15 2. Increased ramBufferSizeMB to 1024 3. Changed lockType to single (previously it was simple) 4. Set maxCommitsToKeep to 1 in the deletionPolicy 5. Set maxPendingDeletes to 0 6. Changed caches from LRUCache to FastLRUCache, as we had hit ratios well over 75%, to increase warming speed 7. Increased the poll interval to 6 minutes and re-indexed all content. Thanks, Ravi Kiran Bhaskar On Wed, May 11, 2011 at 6:00 PM, Alexander Kanarsky alexan...@trulia.com wrote: Ravi, if you have what looks like a full replication each time even if the master generation is greater than the slave's, try to watch the index on both master and slave at the same time to see what files are getting replicated. You probably may need to adjust your merge factor, as Bill mentioned. -Alexander On Tue, 2011-05-10 at 12:45 -0400, Ravi Solr wrote: Hello Mr. Kanarsky, Thank you very much for the detailed explanation, probably the best explanation I found regarding replication. Just to be sure, I wanted to test solr 3.1 to see if it alleviates the problems...I don't think it helped. The master index version and generation are greater than the slave's, still the slave replicates the entire index from master (see replication admin screen output below). Any idea why it would get the whole index every time even in 3.1, or am I misinterpreting the output? However I must admit that 3.1 finished the replication, unlike 1.4.1 which would hang and be backed up for ever.
Master http://masterurl:post/solr-admin/searchcore/replication
Latest Index Version: null, Generation: null
Replicatable Index Version: 1296217097572, Generation: 12726
Poll Interval 00:03:00
Local Index
Index Version: 1296217097569, Generation: 12725
Location: /data/solr/core/search-data/index
Size: 944.32 MB
Times Replicated Since Startup: 148
Previous Replication Done At: Tue May 10 12:32:42 EDT 2011
Config Files Replicated At: null
Config Files Replicated: null
Times Config Files Replicated Since Startup: null
Next Replication Cycle At: Tue May 10 12:35:41 EDT 2011
Current Replication Status Start Time: Tue May 10 12:32:41 EDT 2011
Files Downloaded: 18 / 108
Downloaded: 317.48 KB / 436.24 MB [0.0%]
Downloading File: _ayu.nrm, Downloaded: 4 bytes / 4 bytes [100.0%]
Time Elapsed: 17s, Estimated Time Remaining: 23902s, Speed: 18.67 KB/s
Thanks, Ravi Kiran Bhaskar On Tue, May 10, 2011 at 4:10 AM, Alexander Kanarsky alexan...@trulia.com wrote: Ravi, as far as I remember, this is how the replication logic works (see SnapPuller class, fetchLatestIndex method): 1. Does the Slave get the whole index every time during replication or just the delta since the last replication happened? It looks at the index version AND the index generation. If both the slave's version and generation are the same as on the master, nothing gets replicated. If the master's generation is greater than on the slave, the slave fetches the delta files only (even if the
Re: Storing, indexing and searching XML documents in Solr
On 5/18/2011 4:19 PM, Judioo wrote: Any help is greatly appreciated. Pointers to documentation that address my issues is even more helpful. I think this would be a good start: http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource
Re: Storing, indexing and searching XML documents in Solr
The data is being imported directly from mysql. The document is however indeed a good starting place. Thanks 2011/5/18 Yury Kats yuryk...@yahoo.com On 5/18/2011 4:19 PM, Judioo wrote: Any help is greatly appreciated. Pointers to documentation that address my issues is even more helpful. I think this would be a good start: http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource
Re: Storing, indexing and searching XML documents in Solr
Great document. I can see how to import the data directly from the database. However it seems as though I need to write XPaths in the config to extract the fields that I wish to transform into a Solr document. So it seems that there is no way of storing the document structure in Solr as is? 2011/5/18 Yury Kats yuryk...@yahoo.com On 5/18/2011 4:19 PM, Judioo wrote: Any help is greatly appreciated. Pointers to documentation that address my issues is even more helpful. I think this would be a good start: http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource
Re: [POLL] How do you (like to) do logging with Solr
Hi, If you've set up your Tomcat with log4j logging, and want to add Solr within the same logging config, you need to: #1. Remove slf4j-jdk14-1.6.1.jar from solr.war (unpack, remove, repack) #2. Download slf4j-log4j12-1.6.1.jar (from slf4j.org) and place it in e.g. tomcat/shared/lib If solr.war shipped without a pre-packaged binding, you could skip #1. The binding jar you deploy to the appserver lib would also take effect for any other webapp using slf4j deployed to the same app-server. An alternative to manually repackaging solr.war as in #1 is Hoss' suggestion in SOLR-2487 of a new ANT option to build Solr artifacts without the JUL binding. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 18. mai 2011, at 18.33, Shawn Heisey wrote: On 5/17/2011 10:00 AM, Chris Hostetter wrote: : If I understand what you've said above correctly, removing the binding in : solr.war would make it inherit the binding in jetty/tomcat/whatever, is that : right? That sounds like an awesome plan to me. The example jetty server can : be configured instead of solr.war. Once you've answered this, I can submit my : vote. No, removing the bindings in solr.war would result in solr not logging *anything* unless you manually added a jar (defining the bindings you want) to the jetty (or tomcat) system classloader. What I'd want to have is the ability to download Solr source code, not modify anything, create a .war, and drop it into an existing system that has my preferred logging already set up, which from what you are saying would also require that the example have a jar with the JDK bindings, and that everyone who sets up a more custom system create their own jar and put it somewhere it can be found. What's involved in creating that jar? Is it something that a novice could get done? Is it something that could be prepackaged for the most common choices, or possibly already available on the Internet? Thanks, Shawn
Using Boost fields for a sum total score.
I have a sizable index with a main content field, and 5 defined boost fields (boost_low, boost_med, boost_high, boost_max, and boost_neg). The idea and hope was to allow searches on the content field to be influenced/boosted by the boosting fields if the search term was present. I had set up a dismax query with a 'qf' setting that boosted the content field significantly, and the 5 boost fields with descending values (e.g. content^5.0 boost_max^1.2 boost_high^1.0 etc...). After some testing and reading, I'm of the understanding that this setup will search the fields (content and boost fields), apply the boost to each, and then choose the field with the highest score as the score for that result (essentially taking the MAX() score from the various fields, and not the SUM() of the fields' scores). If this is the case, is there an alternate setup, config item, or means of combining these scores to return a SUM() score instead? Any direction or help would be most appreciated. Ron -- View this message in context: http://lucene.472066.n3.nabble.com/Using-Boost-fields-for-a-sum-total-score-tp2958968p2958968.html Sent from the Solr - User mailing list archive at Nabble.com.
Two XPathEntityProcessor questions
Hi, Can anyone tell me if the XPathEntityProcessor handles expressions like this: xpath=/a/b[c='value']/d/e That is, return a node that has a predecessor with a given text value? I would like to map various XPath expressions of that form to the same document in the index (I have a unique key constraint). Also, is it possible to assign a value to a unique key from an HTTP parameter? Something like this: <field column="id">${dataimporter.request.id}</field> I'm using a ContentStreamDataSource to fetch data from a POST. Thanks, Jeff
Re: [POLL] How do you (like to) do logging with Solr
I usually build solr using 'ant test dist' to run tests and build the .war and other jars, in particular the dataimporthandler. Having an alternate ant option to build without the binding would work for me. If/when I get around to changing logging mechanisms, I wouldn't be able to use the binary distribution, but with 3.1 I am already including selected patches from branch_3x and building it myself. I can see that there is a lot of resistance to just removing the binding entirely. I think that's a better option, but I know it's important to take care of the complete novices and their initial experience with the software. [ ] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [X] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [X] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! On 5/18/2011 3:31 PM, Jan Høydahl wrote: Hi, If you've setup your Tomcat with log4j logging, and want to add Solr, within the same logging config, you need to: #1. Remove slf4j-jdk14-1.6.1.jar from solr.war (unpack, remove, repack) #2. Download slf4j-log4j12-1.6.1.jar (from slf4j.org) and place it in e.g. tomcat/shared/lib If solr.war shipped without a pre-packaged binding, you could skip #1. The binding jar you deploy to appserver lib would also take effect for any other webapp using slf4j deployed to the same app-server. An alternative to manually repackage solr.war as in #1, is Hoss' suggestion in SOLR-2487 of a new ANT option to build Solr artifacts without the JUL binding.
Re: [POLL] How do you (like to) do logging with Solr
: An alternative to manually repackage solr.war as in #1, is Hoss' : suggestion in SOLR-2487 of a new ANT option to build Solr artifacts : without the JUL binding. More specifically, I'm advocating a new ANT property that would let you specify (by path) whatever SLF4J binding jar you want to include, or that you don't want any SLF4J binding jar included (by specifying a path to a jar that doesn't exist).

I want the default...
  ant dist

I don't want a binding in solr.war...
  ant -Dslf4j.jar.path=BOGUS_FILE_PATH dist

I want a specific binding in solr.war...
  ant -Dslf4j.jar.path=/my/lib/slf4j-jcl-*.jar dist

-Hoss
Re: Storing, indexing and searching XML documents in Solr
You're right, you can't store an XML document directly in Solr. You have to pull it apart and index it such that you can get whatever information back you need. How you flatten data depends entirely upon your needs. The high-level idea is that you want to create fields such that text searches work. The moment you start thinking about how can I express a relationship in the query, back up and try to flatten the data so you can just *search*. This is vague, I know. But so much depends on how you want to use the data that specifics are hard to give. You've gotta take off your DB hat and not worry about duplicating data. De-normalize lots and lots and lots first... Best Erick On Wed, May 18, 2011 at 5:27 PM, Judioo cont...@judioo.com wrote: Great document. I can see how to import the data direct from the database. However it seems as though I need to write xpath's in the config to extract the fields that I wish to transform into an solr document. So it seems that there is no way of storing the document structure in solr as is? 2011/5/18 Yury Kats yuryk...@yahoo.com On 5/18/2011 4:19 PM, Judioo wrote: Any help is greatly appreciated. Pointers to documentation that address my issues is even more helpful. I think this would be a good start: http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource
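As a concrete sketch of the flattening described above, here is one way (in Python, purely illustrative) to pull a trimmed version of the XML from earlier in the thread apart into flat, searchable fields, with repeating elements becoming a multiValued field:

```python
import xml.etree.ElementTree as ET

# A trimmed version of the nested record from earlier in the thread.
xml_doc = """
<entity>
  <resource>
    <guid>123898-2092099098982</guid>
    <media_format>Blu-Ray</media_format>
  </resource>
  <categories>
    <category id="drama"/>
    <category id="horror"/>
  </categories>
</entity>
"""

root = ET.fromstring(xml_doc)

# Flatten into one Solr-style document: nested paths become flat fields,
# repeated elements become a multiValued field.
solr_doc = {
    "guid": root.findtext("resource/guid"),
    "media_format": root.findtext("resource/media_format"),
    "category": [c.get("id") for c in root.findall("categories/category")],
}

assert solr_doc["media_format"] == "Blu-Ray"
assert solr_doc["category"] == ["drama", "horror"]
```

A simple text search on `media_format` now needs no path expression at all, which is the point of denormalizing up front.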
Re: Using Boost fields for a sum total score.
You might look at edismax on the 3.1 and trunk, it calculates scores a bit differently. You could always just form the query yourself in the app and not use dismax I think. Best Erick On Wed, May 18, 2011 at 6:06 PM, ronveenstra ron-s...@agathongroup.com wrote: I have a sizable index with a main content field, and 5 defined boost fields (boost_low, boost_med, boost_high, boost_max, and boost_neg). The idea and hope was to allow searches on the content field to be influenced/boosted by the boosting fields if the search term was present. I had set up a dismax query with a qf' setting that boosted the content field significantly, and the 5 boost fields with descending values. (e.g. content^5.0 boost_max^1.2 boost_high^1.0 etc...) After some testing and reading, I'm of the understanding that this setup will result search the fields (content and boost fields), and apply the boost to each, then choose the field with the highest score as the score for that result (essentially taking the MAX() score from the various fields, and not the SUM() of the fields' scores.) If this is the case, is there an alternate setup, config item, or means of combining these scores to return a SUM() score instead? Any direction or help would be most appreciated. Ron -- View this message in context: http://lucene.472066.n3.nabble.com/Using-Boost-fields-for-a-sum-total-score-tp2958968p2958968.html Sent from the Solr - User mailing list archive at Nabble.com.
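One relevant knob here is dismax's tie parameter, which blends the non-maximum field scores back into the combined score; a toy model (invented numbers, real scores come from Lucene) of how tie moves the combination from a pure max toward a sum:

```python
# DisMax-style combination of per-field scores: the best field wins, and
# the 'tie' parameter blends in the remaining fields. tie=0 is a pure
# max; tie=1 behaves like a sum. (Toy numbers, not real Lucene scores.)
def dismax_score(field_scores, tie=0.0):
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

scores = [4.0, 1.5, 0.5]   # e.g. content, boost_max, boost_high matches

assert dismax_score(scores, tie=0.0) == 4.0          # pure MAX()
assert dismax_score(scores, tie=1.0) == 6.0          # behaves like SUM()
assert dismax_score(scores, tie=0.5) == 5.0          # somewhere in between
```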
Fuzzy search and solr 4.0
Hi, I want to do a fuzzy search that compares a phrase to a field in solr. For example: abc company ltda will be compared to abc comp, abc corporation, def company ltda (nothing to match here). The thing is that it always has to return documents sorted by score. I've found some good algorithms to do that, like StrikeAMatch[1] and JaroWinkler. Using JaroWinkler with strdist() I can do exactly that. But I'd rather use StrikeAMatch, which had a patch in the Lucene JIRA that was never committed. So, I contacted the author of that patch and he told me that I should use solr 4.0, which now has some pretty good new fuzzy search enhancements that make StrikeAMatch seem like toys for kids. Anyone know how I can achieve that using solr 4.0? [1] http://www.catalysoft.com/articles/StrikeAMatch.html
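For reference, the StrikeAMatch measure from the linked article is simple to sketch: break each word into adjacent letter pairs and take the Dice coefficient of the two pair lists. An illustrative Python version (my reading of the article, not the JIRA patch):

```python
# Letter-pair (bigram) similarity as described in the StrikeAMatch article:
# score = 2 * |shared pairs| / (|pairs(a)| + |pairs(b)|), in [0, 1].
def letter_pairs(word):
    return [word[i:i + 2] for i in range(len(word) - 1)]

def word_pairs(phrase):
    pairs = []
    for word in phrase.upper().split():
        pairs.extend(letter_pairs(word))
    return pairs

def strike_a_match(a, b):
    pairs_a, pairs_b = word_pairs(a), word_pairs(b)
    union = len(pairs_a) + len(pairs_b)
    hits, remaining = 0, list(pairs_b)
    for p in pairs_a:
        if p in remaining:       # count each shared pair only once
            remaining.remove(p)
            hits += 1
    return 2.0 * hits / union if union else 0.0

assert strike_a_match("abc", "abc") == 1.0
assert strike_a_match("abc company ltda", "abc comp") == 0.625
```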
RE: Solr Range Facets
: Thanks for explaining the point system, please find below the complete

Sorry .. that part was meant to be a joke, I think I was really tired when I wrote that. The key takeaway: details matter.

: <int name="2011-05-02T05:30:00Z">4</int>
: <int name="2011-05-03T05:30:00Z">63</int>
: <int name="2011-05-04T05:30:00Z">0</int>
: <int name="2011-05-05T05:30:00Z">0</int>
...
: Now if you notice that the response show 4 records for the 2nd of May 2011
: which will fall in the IST timezone (+330MINUTES), but when I try to get the

right.

: results I see that there is only 1 result for the 5th why is this happening.

Why do you say that? According to those facet results, there are 0 docs between 2011-05-05T05:30:00Z and 2011-05-05T05:30:00Z+1DAY (which is what I assume you mean by the 5th ... ie: May 5th, in that timezone offset). Not only that, but the query you posted isn't attempting to filter on the 5th by any possible definition of the concept...

: <str name="fq">createdOnGMTDate:[2011-05-01T00:00:00Z+330MINUTES TO *]</str>

...that's saying you want all docs with a date on or after the 1st.

: If I don't apply the offset the results match with the facet count, is there
: something wrong in my query?

It looks like your query is just plain wrong. If your goal was to drill down and show only documents from the 5th, it should have been something like...

fq = createdOnGMTDate:[2011-05-05T00:00:00Z+330MINUTES TO 2011-05-05T00:00:00Z+330MINUTES+1DAY]

...but note also that there is the question of edge inclusion and when you want to use [A TO B] vs [A TO B}. The facet.range.include option is how you control whether the edges are used in the facet counts... http://wiki.apache.org/solr/SimpleFacetParameters#facet.date.include -Hoss
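The offset arithmetic in the corrected filter can be checked with plain date math; a small illustrative sketch mirroring the +330MINUTES bucket boundaries from the facet response above:

```python
from datetime import datetime, timedelta

# Mirror Solr's "2011-05-05T00:00:00Z+330MINUTES" date math: shift UTC
# midnight by the +05:30 offset, then take a one-day bucket.
offset = timedelta(minutes=330)
start = datetime(2011, 5, 5) + offset
end = start + timedelta(days=1)

assert start.isoformat() == "2011-05-05T05:30:00"
assert end.isoformat() == "2011-05-06T05:30:00"

# A doc stamped 03:00Z on May 5 falls outside this bucket (it belongs
# to the previous local day in that offset).
doc = datetime(2011, 5, 5, 3, 0)
assert not (start <= doc < end)
```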
Re: Field collapsing on multiple fields and/or ranges?
Ah, my mistake. Thanks a lot, this would be a really cool feature :) For now I'm resorting to making more than one query and cross-referencing the two separate queries. -- View this message in context: http://lucene.472066.n3.nabble.com/Field-collapsing-on-multiple-fields-and-or-ranges-tp2929793p2959439.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: lucene parser, negative OR operands
: Thanks Yonik. I recall hearing about this before, but was vague on the : details, thanks for supplying some and refreshing my memory. Matching in Lucene is additive ... queries must match *something*; a clause of a boolean query can be the negation of a query, but that only defines how documents should be removed from the set matched by the other queries in that boolean. To put it another way: imagine modeling the list of documents matching a query as a bitset. You can set bits to true, and you can set bits to false, but the bitset starts out with all bits as false, so if all you do is set bits to false, your bitset will *end* with all bits as false. : If I want to understand more about how the lucene query parser does its : thing, can anyone suggest the source files I should be looking at? The QueryParser.jj is the grammar for parsing, but the crux is to understand that the BooleanQuery class supports three types of clauses: PROHIBITED, MANDATORY, and OPTIONAL. The QueryParser implements those as -, + and the default behavior when neither +/- is present. The QueryParser also jumps through some hoops to support AND, OR, NOT but not all permutations of those are viable. : If I really do want actual boolean logic behavior, what are my options? I : guess one is trying to write my own query parser. Boolean logic generally is defined in some form relative to the universe .. so a pure negative query like -red really means all things IN THE UNIVERSE that are not 'red' ... you can express that using *:* -red What solr does (and how this thread started) is pointing out that for top level queries (like q=-red or fq=-red) solr adds the *:* to the boolean query for you. : Hmm, for that particular query, what about using parens to force a sub-query? : : (-one) OR (-two) : : Ha, nope, that runs into a different problem (or is it the same problem?), and : always returns 0 hits. It looks like the lucene query parser can't handle a : pure-negative sub-query like that separated by OR?
Not sure why, can anyone : explain that one? The query parser can handle it, and it produces a valid query object, but that query object doesn't match anything. -one matches nothing, -two matches nothing ... nothing union nothing is still nothing. : For that particular pattern, this crazy refactoring of the query does work and : get the actual boolean logic result of (not 'one') OR (not 'two'): : : (*:* AND -one) OR (*:* AND -two) Correct -- that is you formally saying: give me all docs IN THE UNIVERSE that are not 'one', and union that with all docs IN THE UNIVERSE that are not 'two'. : behavior for that pattern, but in general, I'm kind of wanting a parser that : will give actual boolean logic behavior. Maybe someday I can find time to : write it in Java (not the quickest thing for me, not familiar with the code at : all). You could implement a parser like that relatively easily -- just make sure you put a MatchAllDocsQuery in every BooleanQuery object that you construct, and only ever use the PROHIBITED and MANDATORY clause types (never OPTIONAL) ... the thing is, a parser like that isn't as useful as you think it might be when dealing with search results. OPTIONAL clauses are where most of the useful factors of scoring documents come into play. -Hoss
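The bitset explanation above can be replayed with ordinary sets; a toy sketch (made-up doc ids) showing why pure-negative clauses match nothing and why seeding each branch with *:* fixes it:

```python
# Bitset model of boolean matching: the result starts empty, so a purely
# negative clause can only remove documents that were never added.
universe = {0, 1, 2, 3, 4}   # *:* — all doc ids (made up)
one = {0, 1}                 # docs matching 'one'
two = {1, 2}                 # docs matching 'two'

# "-one": subtracting from the empty starting set yields nothing.
assert set() - one == set()

# "(-one) OR (-two)": the union of two empty sets is still empty.
assert (set() - one) | (set() - two) == set()

# "(*:* AND -one) OR (*:* AND -two)": seed each branch with the universe.
result = (universe - one) | (universe - two)
assert result == {0, 2, 3, 4}   # everything except docs in both sets
```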
Too slow indexing while using 2 different data sources
Is it normal to observe slow speed while using an URL datasource and also a DB? It was something around 30 seconds with only the DB source, but when I add the URL datasource too, it takes 24-25 mins to index exactly the same amount of docs. Is there any way of overcoming this, or do I have to suffer? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Too-slow-indexing-while-using-2-different-data-sources-tp2959551p2959551.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [POLL] How do you (like to) do logging with Solr
[ ] I always use the JDK logging as bundled in solr.war, that's perfect [X ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [x ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! 2011/5/19 Chris Hostetter hossman_luc...@fucit.org : An alternative to manually repackage solr.war as in #1, is Hoss' : suggestion in SOLR-2487 of a new ANT option to build Solr artifacts : without the JUL binding. More specificly, i'm advocating a new ANT property that would let you specify (by path) whatever SLF4J binding jar you want to include, or that you don't want any SLF4J binding jar included (by specifying a path to a jar that doesn't exist) I want the default... ant dist I don't want a binding in solr.war... ant -Dslf4j.jar.path=BOGUS_FILE_PATH dist I want a specific binding in solr.war... ant -Dslf4j.jar.path=/my/lib/slf4j-jcl-*.jar dist -Hoss
Re: Too slow indexing while using 2 different data sources
Some details? Well, I think it's clear, but still, here is the part of my solrconfig:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dbconfig.xml</str>
    <lst name="datasource">
      <str name="name">database</str>
      <str name="type">JdbcDataSource</str>
      <str name="driver">com.mysql.jdbc.Driver</str>
      <str name="url">jdbc:mysql://abcd/efgh</str>
      <str name="user">some</str>
      <str name="password">some</str>
    </lst>
    <lst name="datasource">
      <str name="name">url_data</str>
      <str name="type">URLDataSource</str>
      <str name="processor">XPathEntityProcessor</str>
    </lst>
  </lst>
</requestHandler>

and my dbconfig:

/* Fields from DB */ /* Fields from DB */ /* Fields from DB */ /* Fields from DB */ ... ... ..

<entity name="universal" dataSource="url_data"
        url="http://..com/fddgtr.php/${sa.somevalue}"
        processor="XPathEntityProcessor" forEach="/some/somefield">
  <field column="info" xpath="/some/somefield/info" />
</entity>

- Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Too-slow-indexing-while-using-2-different-data-sources-tp2959551p2959626.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Too slow indexing while using 2 different data sources
On Thu, May 19, 2011 at 6:59 AM, deniz denizdurmu...@gmail.com wrote: Is it normal to observe slow speed while using an URL datasource and also a DB? it was something around 30 seconds with only DB source, but when I add URL datasource too, then it takes 24 - 25 mins to index exactly the same amount of docs [...] What is the time for indexing just the URL data source? Is it possible that your URL data source is slow in serving data? Regards, Gora
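A quick way to answer Gora's question about whether the URL data source itself is slow is to time a single fetch outside of Solr. A minimal sketch in plain Java (not DIH code; the endpoint is passed on the command line, since the poster's real URL is not known):

```java
import java.io.InputStream;
import java.net.URL;

// Times one fetch; pass the URL-datasource endpoint as args[0] to see how
// long a single HTTP round trip takes, independent of Solr/DIH.
public class FetchTimer {
    // Runs 'fetch' once and returns elapsed wall-clock milliseconds.
    static long timeMillis(Runnable fetch) {
        long t0 = System.nanoTime();
        fetch.run();
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) {
        Runnable fetch;
        if (args.length > 0) {
            final String url = args[0];
            fetch = () -> {
                try (InputStream in = new URL(url).openStream()) {
                    in.readAllBytes(); // drain the response, as DIH would
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            };
        } else {
            fetch = () -> { }; // no URL given: just time a no-op
        }
        System.out.println("one fetch took " + timeMillis(fetch) + " ms");
    }
}
```

If a single fetch is already slow, the jump from ~30 seconds to ~25 minutes is explained by the per-row HTTP round trips rather than by Solr itself.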
Re: filter cache and negative filter query
: What I don't like is that it systematically uses the positive version.
: Sometimes the negative version will give many fewer results (for example,
: in some cases I filter by documents not having a given field, and there
: are very few of them). I think it would be much better that solr

the positive version of the filter is the only one that can be executed, so it's the one that gets cached today, but the principle you are describing is still sound -- in fact I'm pretty sure there is a note in the code about this exact idea as a possible performance enhancement: if the cardinality of a filter is very large (regardless of whether the query was positive or negative), its negation relative to the set of all docs could be cached in its place to save space...

...but...

...the complication comes later when doing lookups -- for cache lookups to work with an arbitrary query, you would either need to change the cache structure from Query=>DocSet to a mapping of Query=>[DocSet,inversionBit] and store the same cache value under two keys -- both the positive and the negative; or you keep the current cache structure, store whichever Query=>DocSet pair has the smallest cardinality, but then every logical cache lookup requires a second actual cache lookup under the covers (for the negation of the query) if the first one doesn't match anything.

it would require some benchmarking and hard decisions about whether the (hypothetical) memory savings are worth the (hypothetical) CPU cost.

: query that in fact returns the negative results. As a simple example,
: I believe that, for a boolean field, -field:true is exactly the same as
: +field:false, but the former is a negative query and the latter is a

that's not strictly true in all cases...

* if the field is multiValued=true, a doc may contain both false and true in the field, in which case it would match +field:false but it would not match -field:true

* if the field is multiValued=false and required=false, a doc may not contain any value at all, in which case it would match -field:true but it would not match +field:false

-Hoss
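The Query=>[DocSet,inversionBit] idea from the message above can be sketched in plain Java, with BitSet standing in for Lucene's DocSet; this is an illustration of the technique, not actual Solr code:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Illustration: cache whichever of a filter's doc set or its complement has
// the smaller cardinality, plus an inversion bit to undo it on lookup.
public class InvertibleFilterCache {
    private static final class Entry {
        final BitSet bits;      // stored set (possibly the complement)
        final boolean inverted; // true if 'bits' is the complement
        Entry(BitSet bits, boolean inverted) { this.bits = bits; this.inverted = inverted; }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final int maxDoc; // total number of docs in the index

    InvertibleFilterCache(int maxDoc) { this.maxDoc = maxDoc; }

    void put(String filterQuery, BitSet matches) {
        if (matches.cardinality() * 2 > maxDoc) {
            // the complement is smaller: store it and remember that we inverted
            BitSet complement = (BitSet) matches.clone();
            complement.flip(0, maxDoc);
            cache.put(filterQuery, new Entry(complement, true));
        } else {
            cache.put(filterQuery, new Entry((BitSet) matches.clone(), false));
        }
    }

    BitSet get(String filterQuery) {
        Entry e = cache.get(filterQuery);
        if (e == null) return null;
        BitSet result = (BitSet) e.bits.clone();
        if (e.inverted) result.flip(0, maxDoc); // undo the inversion
        return result;
    }
}
```

The trade-off shows up directly in put/get: memory saved by storing the smaller set is paid for with an extra flip on every lookup of an inverted entry. The second variant described above (keeping Query=>DocSet and probing the negated query on a miss) trades this per-entry bit for a possible second map lookup instead.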
Re: K-Stemmer for Solr 3.1
I see KStem being mentioned lately. It's been 5+ years since I looked at the original KStem stuff, but I recall there being a license issue with the *original* KStem. I think it was under some flavour of GPL, and that was the reason why we didn't include it in Lucene/Solr back then. I say this now because I saw people saying KStem was released under a BSD license, which doesn't match what I saw 5+ years ago.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message 
From: Smiley, David W. dsmi...@mitre.org
To: solr-user@lucene.apache.org
Sent: Mon, May 16, 2011 5:33:00 PM
Subject: Re: K-Stemmer for Solr 3.1

Lucid's KStemmer is LGPL, and the Solr committers have shown that they don't want LGPL libraries shipping with Solr. If you are intent on releasing your changes, I suggest attaching both the modified source and the compiled jar onto Solr's k-stemmer wiki page; and of course say that it's LGPL licensed.

~ David Smiley

On May 16, 2011, at 2:24 AM, Bernd Fehling wrote:

I don't know if it is allowed to modify Lucid code and add it to jira. If someone from Lucid would give me the permission and the Solr developers have nothing against it, I wouldn't mind adding the Lucid KStemmer to jira for Solr 3.x and 4.x. There are several Lucid KStemmer users, which I can see from the many requests I got. Also, the Lucid KStemmer is faster than the standard KStemmer.

Bernd

On 16.05.2011 06:33, Bill Bell wrote:

Did you upload the code to Jira?

On 5/13/11 12:28 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote:

I backported a Lucid KStemmer version from solr 4.0 which I found somewhere. Just changed from

import org.apache.lucene.analysis.util.CharArraySet;  // solr4.0

to

import org.apache.lucene.analysis.CharArraySet;  // solr3.1

Bernd

On 12.05.2011 16:32, Mark wrote:

java.lang.AbstractMethodError: org.apache.lucene.analysis.TokenStream.incrementToken()Z

Would you mind explaining your modifications? Thanks

On 5/11/11 11:14 PM, Bernd Fehling wrote:

On 12.05.2011 02:05, Mark wrote:

It appears that the older version of the Lucid Works KStemmer is incompatible with Solr 3.1. Has anyone been able to get this to work? If not, what are you using as an alternative? Thanks

Lucid KStemmer works nicely with Solr 3.1 after some minor mods to KStemFilter.java and KStemFilterFactory.java. What problems do you have?

Bernd

--
*
Bernd Fehling                    Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)               Universitätsstr. 25
Tel. +49 521 106-4060            Fax. +49 521 106-4052
bernd.fehl...@uni-bielefeld.de   33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*
Re: K-Stemmer for Solr 3.1
Hm, maybe I was wrong. I don't see any mention of *GPL on the KStem download page. I only see http://ciir.cs.umass.edu/downloads/agreements/general.html.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message 
From: Otis Gospodnetic otis_gospodne...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Wed, May 18, 2011 11:35:32 PM
Subject: Re: K-Stemmer for Solr 3.1

[...]
Re: indexing directed graph
Maybe Gora was referring to Siren: http://search-lucene.com/?q=siren+-sami Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: dani.b.angelov dani.b.ange...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, May 17, 2011 2:44:55 AM Subject: Re: indexing directed graph Gora, thank you for your reply! Could you point me a link regarding There was a discussion earlier on this topic -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2951418.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Range Facets
Hi Chris,

I made a mistake in explaining the second part of my question. If you look at the faceted result, you will notice there are 4 results for the 2nd of May 2011, but when I query for the 2nd of May I should get only 1 result, since after applying the offset all the remaining results should be shifted to the 3rd of May. But I think I got the reason for this: I guess the offset is applied only to the edges and not to the actual results. I mean, when we facet with an offset of +330MINUTES, what Solr actually does is just move the facet edges by +330MINUTES, but not each and every document.

Regards,
Rohit

From: Chris Hostetter hossman_luc...@fucit.org
To: solr-user@lucene.apache.org
Sent: Thu, 19 May, 2011 6:16:53 AM
Subject: RE: Solr Range Facets

: Thanks for explaining the point system, please find below the complete

Sorry .. that part was meant to be a joke, I think I was really tired when I wrote that. The key take-away: details matter.

: <int name="2011-05-02T05:30:00Z">4</int>
: <int name="2011-05-03T05:30:00Z">63</int>
: <int name="2011-05-04T05:30:00Z">0</int>
: <int name="2011-05-05T05:30:00Z">0</int>
...
: Now if you notice that the response shows 4 records for the 2nd of May 2011
: which will fall in the IST timezone (+330MINUTES),

right.

: but when I try to get the results I see that there is only 1 result for the
: 5th; why is this happening?

Why do you say that? According to those facet results, there are 0 docs between 2011-05-05T05:30:00Z and 2011-05-05T05:30:00Z+1DAY (which is what I assume you mean by "the 5th" ... ie: May 5th, in that timezone offset).

Not only that, but the query you posted isn't attempting to filter on the 5th by any possible definition of the concept...

: <str name="fq">createdOnGMTDate:[2011-05-01T00:00:00Z+330MINUTES TO *]</str>

...that's saying you want all docs with a date on or after the 1st.

: If I don't apply the offset the results match with the facet count, is there
: something wrong in my query?

it looks like your query is just plain wrong. if your goal was to drill down and show only documents from the 5th, it should have been something like...

fq = createdOnGMTDate:[2011-05-05T00:00:00Z+330MINUTES TO 2011-05-05T00:00:00Z+330MINUTES+1DAY]

...but note also that there is the question of edge inclusion, and when you want to use [A TO B] vs [A TO B}. The facet.range.include option is how you control whether the edges are used in the facet counts...

http://wiki.apache.org/solr/SimpleFacetParameters#facet.date.include

-Hoss
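To see what those date-math edges actually resolve to, here is a small java.time sketch (plain Java, not Solr's own date-math parser; the field name is taken from the thread):

```java
import java.time.Duration;
import java.time.Instant;

// Sanity check of what the Solr date-math expression
// "2011-05-05T00:00:00Z+330MINUTES" resolves to, plus the +1DAY upper edge.
public class DateMathCheck {
    // Equivalent of Solr's "<instant>+<n>MINUTES" for a literal instant.
    static Instant solrPlusMinutes(String isoInstant, long minutes) {
        return Instant.parse(isoInstant).plus(Duration.ofMinutes(minutes));
    }

    public static void main(String[] args) {
        Instant lower = solrPlusMinutes("2011-05-05T00:00:00Z", 330);
        Instant upper = lower.plus(Duration.ofDays(1));
        // The suggested fq, with both edges fully resolved:
        System.out.println("fq = createdOnGMTDate:[" + lower + " TO " + upper + "]");
    }
}
```

This makes it easy to verify that the drill-down range lines up exactly with the facet bucket edges (2011-05-05T05:30:00Z in the results above) before worrying about [A TO B] vs [A TO B} inclusion.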
Re: indexing directed graph
On Thu, May 19, 2011 at 9:12 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Maybe Gora was referring to Siren: http://search-lucene.com/?q=siren+-sami [...] That does look interesting, but is not what I was referring to. I seem to remember a discussion on this list some 3-4 months ago about someone wanting to make a customised Lucene index, specifically for graphs. I believe that he even wrote up a Wiki (?) page on it. Sorry, Dani, I have been busy, and so far my Google-fu has been unable to turn up the thread, or the Wiki page. Will let you know if I come across it. Regards, Gora
Re: K-Stemmer for Solr 3.1
Hi Otis,

In conclusion: if we check that the license agreement is included in all source files and as a separate license file, then we are clear about KStem itself. What about the modifications from Lucid -- do you know if they publish under GPL?

Bernd

-
BASE - Bielefeld Academic Search Engine
http://www.base-search.net/

On 19.05.2011 05:39, Otis Gospodnetic wrote:

Hm, maybe I was wrong. I don't see any mention of *GPL on the KStem download page. I only see http://ciir.cs.umass.edu/downloads/agreements/general.html.

[...]

--
*
Bernd Fehling                    Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)               Universitätsstr. 25
Tel. +49 521 106-4060            Fax. +49 521 106-4052
bernd.fehl...@uni-bielefeld.de   33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*