Solr not returning all documents?

2010-03-29 Thread Adrian Pemsel
Hi,

As part of our application I have written a reindex task that runs through
all documents in a core one by one (using *:*, a start offset and a row
limit of 1) and adds them to a new core (potentially with a new schema).
However, while it works well for small sets, this approach somehow does not
seem to work for larger data sets. The reindex task counts its offset into
the old core; this count stops at about 118000 and no more documents are
returned. However, numDocs says there are around 582000 documents in the old
core.
Am I making a wrong assumption in believing I should get all documents like
this?

Thanks,

Adrian
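
For reference, the loop described above looks roughly like this in SolrJ (a
minimal sketch, assuming Solr 1.4 / SolrJ; the core URL, page size, and class
name are illustrative):

  import java.net.MalformedURLException;

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.SolrDocumentList;

  public class ReindexLoop {
    public static void main(String[] args)
        throws MalformedURLException, SolrServerException {
      // Hypothetical core URL -- adjust to your setup.
      CommonsHttpSolrServer oldCore =
          new CommonsHttpSolrServer("http://localhost:8983/solr/oldcore");

      SolrQuery query = new SolrQuery("*:*");
      query.setRows(100);              // rows=1 works too, but is very slow
      int start = 0;
      while (true) {
        query.setStart(start);
        SolrDocumentList page = oldCore.query(query).getResults();
        if (page.isEmpty()) {
          break;                       // no more documents
        }
        for (SolrDocument doc : page) {
          // map doc to a SolrInputDocument and add it to the new core here
        }
        start += page.size();
      }
    }
  }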


RE: Perfect Match

2010-03-29 Thread Nair, Manas
Awesome Ahmet.
Thanks for the reply. It seems to work now.
 
Thanks a ton.



From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Tue 3/23/2010 2:35 PM
To: solr-user@lucene.apache.org
Subject: RE: Perfect Match



 Thank you Ahmet. You were right.
 artist_s:Dora is bringing results.
 But I need artist_s:Dora the explorer to bring only those
 results which contain "Dora the explorer".
 
 I tried artist_s:"Dora the explorer" (phrase
 search)... that is working. But artist_s:Dora the explorer is
 not working. Is there any way to make artist_s:Dora the explorer
 return results that contain this phrase?

I learned this from Chris Hostetter's message [1]. You can use
q={!field f=artist_s}Dora the explorer
instead of q=artist_s:Dora the explorer.

[1]http://search-lucene.com/m/rrHVV1ZhO4j/this+is+what+the+%22field%22+QParserPlugin+was+invented+for


 




field QParserPlugin - Help needed

2010-03-29 Thread Nair, Manas
Hello Experts,

Could anyone please help me by directing me to some link where I can get more 
details on Solr's field QParserPlugin.

I would be really grateful.

Thankyou all,

Manas



Experiences with SOLR-1797 ?

2010-03-29 Thread Daniel Nowak
Hello, 

has anyone some experiences with this patch of SOLR-1797 
(http://issues.apache.org/jira/browse/SOLR-1797) ?

Best Regards


Daniel Nowak
Senior Developer

Rocket Internet GmbH  |  Saarbrücker Straße 20/21  |  10405 Berlin  | 
Deutschland

tel: +49 30 / 559 554 66  |  fax: +49 30 / 559 554 67  |  skype: daniel.s.nowak

mail: daniel.no...@rocket-internet.de

Geschäftsführer: Frank Biedka, Dr. Florian Heinemann, Uwe Horstmann, Felix 
Jahn, Arnt Jeschke, Dr. Philipp Kreibohm

Eingetragen beim Amtsgericht Berlin, HRB 109262 USt-ID DE256469659





Re: How to use Payloads with Solr?

2010-03-29 Thread Grant Ingersoll

On Mar 27, 2010, at 5:31 AM, MitchK wrote:

 
 Hello community, 
 
 since I have searched for a solution to get TermPositions in Solr, I became
 more aware of the payload-features. So I decided to learn more about
 payloads. 
 In the wiki, there is not much said about them, so I will ask here at the
 mailing-list. 
 
 It seems like Payloads are some extra information for tokens, which I can
 customize in any way.
 For example, I could write a payload filter that gives the highest
 scoring-factor to the first token and the lowest to the last one. I also
 could say "oh, this word is a substantive. Add this as
 payload-information: substantive."
 
 However: How do I use this information at query-time? How can I influence
 the scoring in Solr?
 I mean, I could write a payload-interpreter (am I right to do so with
 AveragePayloadFunction from Lucene 2.9.1?) for scoring.
 So, if I do so, I can change the scoring of all substantives without
 reindexing the payloads by setting their scoring-factor in the schema.xml
 (of course this will need some more extra modifications).

Unfortunately, there is no query time support for this, other than a custom 
query parser that is posted in JIRA by Erik Hatcher.

 
 Can anybody tell me more about how to use payloads with Solr? 
 
 For all the others, who want to learn some basic-information about payloads,
 I would suggest to read this article from Grant Ingersoll: 
 http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
 
 It is a really good tutorial and introduction to this topic. 
 
 Unfortunately, it seems like he has not written anything about how to
 integrate this in Solr (I haven't found anything more).

Yeah, this is unfortunate.  Would be nice to have both support for payloads
and spans in Solr.

-Grant
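
For reference, the Lucene-level query-time piece Mitch alludes to looks
roughly like this (a sketch against the Lucene 2.9 payloads API; the field and
term are illustrative, and decoding the payload bytes happens in a custom
Similarity's scorePayload method):

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.payloads.AveragePayloadFunction;
  import org.apache.lucene.search.payloads.PayloadTermQuery;

  // Scores the term like a SpanTermQuery, then multiplies in the average
  // of the payload scores returned by Similarity.scorePayload().
  PayloadTermQuery query = new PayloadTermQuery(
      new Term("title", "introduction"),   // illustrative field/term
      new AveragePayloadFunction());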

Getting solr response in HTML format : HTMLResponseWriter

2010-03-29 Thread Arnaud Garcia
Hello everybody

I'm using Nutch with Solr, and as you know the result of a Solr search is in
XML format.


Because I want an HTML format for the response (like the result of a Nutch
search), I tried to attach an XSLT stylesheet to the Solr response by
passing these two parameters: wt=xslt&tr=example.xsl

(example.xsl is a stylesheet included with Solr), but the resulting HTML
wasn't very good.

So I read on the net that we can write an extension of the
QueryResponseWriter class, like XMLResponseWriter (the default),
and I am trying to build that.

I proceeded like XMLResponseWriter to create an HTMLResponseWriter, and I
added this line to solrconfig.xml:

<queryResponseWriter name="html" class="org.apache.solr.request.HTMLResponseWriter"/>

I get an error like this:

org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.request.HTMLResponseWriter'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
    at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525)
    at org.apache.solr.core.SolrCore.initWriters(SolrCore.java:1408)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:547)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
    at org.mortbay.jetty.Server.doStart(Server.java:210)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.mortbay.start.Main.invokeMain(Main.java:183)
    at org.mortbay.start.Main.start(Main.java:497)
    at org.mortbay.start.Main.main(Main.java:115)
Caused by: java.lang.ClassNotFoundException: org.apache.solr.request.HTMLResponseWriter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357)
    ... 33 more

It appears that the class loader cannot find the class HTMLResponseWriter.

Does anyone know where additional information about the class
HTMLResponseWriter must be added to remove this error?


Thanks for all.
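
The usual cause of a ClassNotFoundException like the one above is that the
compiled class is not on Solr's classpath; the jar containing it normally
goes under <solr-home>/lib. For reference, a minimal skeleton of such a
writer might look like this (a sketch against the Solr 1.4 API; the package
name and HTML body are illustrative):

  package com.example.solr;   // hypothetical package; build into a jar under <solr-home>/lib

  import java.io.IOException;
  import java.io.Writer;

  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.request.QueryResponseWriter;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.request.SolrQueryResponse;

  public class HTMLResponseWriter implements QueryResponseWriter {

    public void init(NamedList args) {
      // no configuration needed for this sketch
    }

    public String getContentType(SolrQueryRequest request, SolrQueryResponse response) {
      return "text/html;charset=UTF-8";
    }

    public void write(Writer writer, SolrQueryRequest request, SolrQueryResponse response)
        throws IOException {
      // Walk response.getValues() and render it however you like.
      writer.write("<html><body><!-- render the response here --></body></html>");
    }
  }

It would then be registered in solrconfig.xml with the matching class name,
<queryResponseWriter name="html" class="com.example.solr.HTMLResponseWriter"/>,
and requested with wt=html.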



Delete id from a specific core

2010-03-29 Thread Lee Smith
Hey All

From the docs, deleting from an index is pretty simple:  java -Ddata=args
-Dcommit=no -jar post.jar "<delete><id>SP2514N</id></delete>"

How about from a specific core?   Say I wanted to delete id=12344  from core 1

Hope this makes sense and is easy to answer!

Regards

Lee

RE: One item, multiple fields, and range queries

2010-03-29 Thread David Smiley (@MITRE.org)

Sorry, I intended to design my post so that one wouldn't have to read the
thread for context but it seems I failed to do that.  Don't bother reading
the thread.  The use-case I'm pondering modifying Lucene/Solr to solve is
the one-to-many problem.  Imagine a document that contains multiple
addresses where each field of an address (like street, state, zipcode) goes
into a different multi-valued field.  The main difficulty is considering how
Lucene might be modified to have query results across different fields be
intersected by a matching term position offset (which is designed in these
fields to refer to a known value offset).

The link you gave is interesting, though the general case I'm
talking about doesn't have a hierarchy.  And I find the use of a single
multi-valued field unpalatable for a variety of reasons.

~ David Smiley

-
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
-- 
View this message in context: 
http://n3.nabble.com/One-item-multiple-fields-and-range-queries-tp475030p683361.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Delete id from a specific core

2010-03-29 Thread Erik Hatcher

Lee -

Use the url parameter.

~/dev/solr/example/exampledocs: java -jar post.jar -help
SimplePostTool: version 1.2
This is a simple command line tool for POSTing raw XML to a Solr
port.  XML data can be read from files specified as commandline
args; as raw commandline arg strings; or via STDIN.
Examples:
  java -Ddata=files -jar post.jar *.xml
  java -Ddata=args  -jar post.jar '<delete><id>42</id></delete>'
  java -Ddata=stdin -jar post.jar < hd.xml
Other options controlled by System Properties include the Solr
URL to POST to, and whether a commit should be executed.  These
are the defaults for all System Properties...
  -Ddata=files
  -Durl=http://localhost:8983/solr/update
  -Dcommit=yes

Core 1's update URL is likely something like http://localhost:8983/solr/1/update
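
So for this example, the full command to delete id=12344 from core 1 would
presumably be:

  java -Ddata=args -Durl=http://localhost:8983/solr/1/update -jar post.jar '<delete><id>12344</id></delete>'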

Erik


On Mar 29, 2010, at 9:08 AM, Lee Smith wrote:


Hey All

From the docs, deleting from an index is pretty simple:  java -Ddata=args
-Dcommit=no -jar post.jar "<delete><id>SP2514N</id></delete>"


How about from a specific core?   Say I wanted to delete id=12344   
from core 1


Hope this makes sense and is easy to answer!

Regards

Lee




Re: One item, multiple fields, and range queries

2010-03-29 Thread Lukas Kahwe Smith

On 29.03.2010, at 15:11, David Smiley (@MITRE.org) wrote:

 
 Sorry, I intended to design my post so that one wouldn't have to read the
 thread for context but it seems I failed to do that.  Don't bother reading
 the thread.  The use-case I'm pondering modifying Lucene/Solr to solve is
 the one-to-many problem.  Imagine a document that contains multiple
 addresses where each field of an address (like street, state, zipcode) go in
 different multi-valued fields.  The main difficulty is considering how
 Lucene might be modified to have query results across different fields be
 intersected by a matching term position offset (which is designed in these
 fields to refer to a known value offset).


i posted another use case the other day as well .. then again, i hope the 
spatial support in 1.5 will make this use case obsolete soon. basically we have 
an app where we have offers that can be available in multiple stores. now, in 
order to have a speedy, compact index, the idea was to simply store the geo 
location of the stores along with the offers in a multi-valued field. however, 
in order to filter on the x-y geo coordinates we would have to filter on the 
pairs. this is, i guess, similar to your above example with multiple 
addresses.

here is the link to my post:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201003.mbox/%3cfb3f49c8-31d9-48fc-b416-73a1bbd3f...@pooteeweet.org%3e

btw: i was mailed off-list asking if i have found an answer to the above
question, so it's not some crazy use case ..

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





More like this - setting a minimum number of terms used to build queries

2010-03-29 Thread Xavier Schepler

Hey,

Is there a way to make the more-like-this feature build its queries
from a minimum number of interesting terms?

It looks like this component fires queries with only one term in them.
I get a lot of results that aren't similar at all to the parsed
document fields.


My parameters:
mlt.fl=question&mlt.mintf=1&mlt.mindf=&mlt.minwl=4

The question field contains between 15 and 50 terms.

Xavier S.


Re: RejectedExecutionException when searching with DirectSolrConnection

2010-03-29 Thread Don Werve
A followup: I discovered something interesting.  If I don't run Jetty in the
same JVM as DirectSolrConnection, all is well.

Nrr.


solr-trunk in production?

2010-03-29 Thread Agethle, Matthias
Hi,

I need the SOLR-236 patch (field collapsing,
https://issues.apache.org/jira/browse/SOLR-236) in a production system which
is currently running Solr 1.4.
Can I switch to the trunk version (and apply the patch) without problems, or is
this not recommended?

Matthias



Re: ReplicationHandler reports incorrect replication failures

2010-03-29 Thread Shawn Smith
Thanks.  I created https://issues.apache.org/jira/browse/SOLR-1853

2010/3/27 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@gmail.com:
 please create a bug



Drill down a solr result set by facets

2010-03-29 Thread Dhanushka Samarakoon
Hi,

I'm trying to perform a search based on keywords and then reduce the result
set based on facets that user selects.
First query for a search would look like this.

http://localhost:8983/solr/select/?q=cancer+stem&version=2.2&wt=php&start=&rows=10&indent=on&qt=dismax&facet=on&facet.mincount=1&facet.field=fDepartmentName&facet.field=fInvestigatorName&facet.field=fSponsor&facet.date=DateAwarded&facet.date.start=2009-01-01T00:00:00Z&facet.date.end=2010-01-01T00:00:00Z&facet.date.gap=%2B1MONTH

In the above query (as per dismax on the solr config file) it searches
multiple fields such as GrantTitle, DepartmentName, InvestigatorName, etc...

Then if the user selects 'Chemistry' from the facet field 'fDepartmentName' and
'US Cancer/Diabetic Research Institute' from 'fSponsor', I need to reduce the
result set above to only records where fDepartmentName is 'Chemistry'
and fSponsor is 'US Cancer/Diabetic Research Institute'.
The following query is not working:
select/?q=cancer+stem+fDepartmentName:Chemistry+fSponsor:US Cancer/Diabetic
Research Institute&version=2.2

Fields starting with 'f' are defined in the schema.xml as copy fields.
   <field name="DepartmentName" type="text" indexed="true" stored="true"
multiValued="true"/>
   <field name="fDepartmentName" type="string" indexed="true" stored="false"
multiValued="true"/>
   <copyField source="DepartmentName" dest="fDepartmentName"/>

Any ideas on the correct syntax?

Thanks,
Dhanushka.


RE: One item, multiple fields, and range queries

2010-03-29 Thread Steven A Rowe
David,

The standard one-to-many solution is indexing each address (the many) as its 
own document, and then either copy the other fields from your current schema to 
these documents, or index using a heterogeneous field schema, grouping the 
different doc type instances with a unique key (the one) to form a composite 
doc.  (These solutions address your discomfort with a single address field.)

Also, while you say that you don't have a hierarchy, I think you do; what you 
have described could be expressed in XML as:

<doc>
  <field1>...</field1>
  ...
  <addresses>
    <address id="1">
      <street>...</street>
      <city>...</city>
      <state>...</state>
      <zip>...</zip>
    </address>
    <address id="2">
      <street>...</street>
      <city>...</city>
      <state>...</state>
      <zip>...</zip>
    </address>
    ...
  </addresses>
</doc>

I believe you could use the scheme I described on the other thread, using a 
single address field, if you encoded it like so:

  _ADDRESS_ _STREET_ 12 Main Street _CITY_ Metripilos _STATE_ MZ _ZIP_ 0
  _ADDRESS_ _STREET_ 512 23rd Avenue _CITY_ Carmtwon _STATE_ XB _ZIP_ 1
  ...

Then to find the docs associated with Carmtwon, XB:

<SpanNot>
  <Include>
    <SpanOr>
      <SpanNear slop="2147483647" inOrder="true">
        <SpanTerm>_CITY_</SpanTerm>
        <SpanTerm>Carmtwon</SpanTerm>
        <SpanTerm>_STATE_</SpanTerm>
        <SpanTerm>XB</SpanTerm>
      </SpanNear>
    </SpanOr>
  </Include>
  <Exclude>
    <SpanTerm>_ADDRESS_</SpanTerm>
  </Exclude>
</SpanNot>
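
In Lucene API terms that corresponds roughly to the following (a sketch
against the Lucene 2.9 spans API; the field name "address" and the
un-analyzed terms are assumptions about the indexing scheme):

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.spans.SpanNearQuery;
  import org.apache.lucene.search.spans.SpanNotQuery;
  import org.apache.lucene.search.spans.SpanQuery;
  import org.apache.lucene.search.spans.SpanTermQuery;

  // _CITY_ Carmtwon ... _STATE_ XB, in order, any distance apart...
  SpanQuery near = new SpanNearQuery(new SpanQuery[] {
      new SpanTermQuery(new Term("address", "_CITY_")),
      new SpanTermQuery(new Term("address", "Carmtwon")),
      new SpanTermQuery(new Term("address", "_STATE_")),
      new SpanTermQuery(new Term("address", "XB"))
  }, Integer.MAX_VALUE, true);

  // ...but excluding matches that cross an _ADDRESS_ boundary marker.
  SpanQuery query = new SpanNotQuery(near,
      new SpanTermQuery(new Term("address", "_ADDRESS_")));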

Steve

On 03/29/2010 at 9:11 AM, David Smiley (@MITRE.org) wrote:
 
 Sorry, I intended to design my post so that one wouldn't have to read
 the thread for context but it seems I failed to do that.  Don't bother
 reading the thread.  The use-case I'm pondering modifying Lucene/Solr to
 solve is the one-to-many problem.  Imagine a document that contains
 multiple addresses where each field of an address (like street, state,
 zipcode) go in different multi-valued fields.  The main difficulty is
 considering how Lucene might be modified to have query results across
 different fields be intersected by a matching term position offset
 (which is designed in these fields to refer to a known value offset).
 
 Following the link you gave is interesting though the general case I'm
 talking about doesn't have a hierarchy.  And I find the use of a single
 multi-valued field unpalatable for a variety of reasons.
 
 ~ David Smiley
 
 -
  Author: https://www.packtpub.com/solr-1-4-enterprise-search-
 server/book -- View this message in context:
 http://n3.nabble.com/One-item-multiple-
 fields-and-range-queries-tp475030p683361.html Sent from the Solr - User
 mailing list archive at Nabble.com.




Re: Drill down a solr result set by facets

2010-03-29 Thread Tommy Chheng

 Try adding quotes to your query:

DepartmentName:Chemistry+fSponsor:"US Cancer/Diabetic Research Institute"


 The parser will split on whitespace

Tommy Chheng
Programmer and UC Irvine Graduate Student
Twitter @tommychheng
http://tommy.chheng.com


On 3/29/10 8:49 AM, Dhanushka Samarakoon wrote:

Hi,

I'm trying to perform a search based on keywords and then reduce the result
set based on facets that user selects.
First query for a search would look like this.

http://localhost:8983/solr/select/?q=cancer+stem&version=2.2&wt=php&start=&rows=10&indent=on&qt=dismax&facet=on&facet.mincount=1&facet.field=fDepartmentName&facet.field=fInvestigatorName&facet.field=fSponsor&facet.date=DateAwarded&facet.date.start=2009-01-01T00:00:00Z&facet.date.end=2010-01-01T00:00:00Z&facet.date.gap=%2B1MONTH

In the above query (as per dismax on the solr config file) it searches
multiple fields such as GrantTitle, DepartmentName, InvestigatorName, etc...

Then if user select 'Chemistry' from the facet field 'fDepartmentName'  and
'US Cancer/Diabetic Research Institute' from 'fSponsor' I need to reduce the
result set above to only records from where fDepartmentName is 'Chemistry'
and 'fSponsor' is 'US Cancer/Diabetic Research Institute'
The following query is not working.
select/?q=cancer+stem+fDepartmentName:Chemistry+fSponsor:US Cancer/Diabetic
Research Institute&version=2.2

Fields starting with 'f' are defined in the schema.xml as copy fields.
<field name="DepartmentName" type="text" indexed="true" stored="true"
multiValued="true"/>
<field name="fDepartmentName" type="string" indexed="true" stored="false"
multiValued="true"/>
<copyField source="DepartmentName" dest="fDepartmentName"/>

Any ideas on the correct syntax?

Thanks,
Dhanushka.



Re: Drill down a solr result set by facets

2010-03-29 Thread Dhanushka Samarakoon
Thanks for the reply. I was just giving the above as an example.
Something as simple as following is also not working.
/select/?q=france+fDepartmentName:History&version=2.2

So it looks like the query parameter syntax I'm using is wrong.
This is the params array I'm getting from the result.
<lst name="params">
  <str name="rows">10</str>
  <str name="start">0</str>
  <str name="indent">on</str>
  <str name="q">kansas fDepartmentName:History</str>
  <str name="qt">dismax</str>
  <str name="version">2.2</str>
</lst>

On Mon, Mar 29, 2010 at 10:59 AM, Tommy Chheng tommy.chh...@gmail.com wrote:

  Try adding quotes to your query:

 DepartmentName:Chemistry+fSponsor:"US Cancer/Diabetic Research Institute"


  The parser will split on whitespace

 Tommy Chheng
 Programmer and UC Irvine Graduate Student
 Twitter @tommychheng
 http://tommy.chheng.com



 On 3/29/10 8:49 AM, Dhanushka Samarakoon wrote:

 Hi,

 I'm trying to perform a search based on keywords and then reduce the
 result
 set based on facets that user selects.
 First query for a search would look like this.


 http://localhost:8983/solr/select/?q=cancer+stem&version=2.2&wt=php&start=&rows=10&indent=on&qt=dismax&facet=on&facet.mincount=1&facet.field=fDepartmentName&facet.field=fInvestigatorName&facet.field=fSponsor&facet.date=DateAwarded&facet.date.start=2009-01-01T00:00:00Z&facet.date.end=2010-01-01T00:00:00Z&facet.date.gap=%2B1MONTH

 In the above query (as per dismax on the solr config file) it searches
 multiple fields such as GrantTitle, DepartmentName, InvestigatorName,
 etc...

 Then if user select 'Chemistry' from the facet field 'fDepartmentName'
  and
 'US Cancer/Diabetic Research Institute' from 'fSponsor' I need to reduce
 the
 result set above to only records from where fDepartmentName is 'Chemistry'
 and 'fSponsor' is 'US Cancer/Diabetic Research Institute'
 The following query is not working.
 select/?q=cancer+stem+fDepartmentName:Chemistry+fSponsor:US
 Cancer/Diabetic
 Research Institute&version=2.2

 Fields starting with 'f' are defined in the schema.xml as copy fields.
   <field name="DepartmentName" type="text" indexed="true" stored="true"
 multiValued="true"/>
   <field name="fDepartmentName" type="string" indexed="true"
 stored="false"
 multiValued="true"/>
   <copyField source="DepartmentName" dest="fDepartmentName"/>

 Any ideas on the correct syntax?

 Thanks,
 Dhanushka.




Re: Drill down a solr result set by facets

2010-03-29 Thread Indika Tantrigoda
Hi Dhanushka,

Have you tried the filter query (fq) parameter?
Check out this article; the Applying Constraints section should be helpful
to you:
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr

Solr Wiki link to filter query parameter
http://wiki.apache.org/solr/CommonQueryParameters#fq

I am at the moment implementing a similar system where the user needs to
drill down into the data.
What I am doing now: if the user selects Chemistry from the facet, I
request a query with the filter query applied to fDepartmentName, and when
the user selects US Cancer/Diabetic Research Institute from the fSponsor
facet, I apply filter queries to both fDepartmentName and fSponsor.
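
A sketch of that drill-down request, using the field values from this thread
(the fq values would be URL-encoded in practice):

  .../solr/select/?q=cancer+stem&qt=dismax&facet=on&facet.field=fDepartmentName&facet.field=fSponsor
     &fq=fDepartmentName:"Chemistry"
     &fq=fSponsor:"US Cancer/Diabetic Research Institute"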

Hope this helps.

Regards,
Indika

On 29 March 2010 21:19, Dhanushka Samarakoon dhan...@gmail.com wrote:

 Hi,

 I'm trying to perform a search based on keywords and then reduce the result
 set based on facets that user selects.
 First query for a search would look like this.


 http://localhost:8983/solr/select/?q=cancer+stem&version=2.2&wt=php&start=&rows=10&indent=on&qt=dismax&facet=on&facet.mincount=1&facet.field=fDepartmentName&facet.field=fInvestigatorName&facet.field=fSponsor&facet.date=DateAwarded&facet.date.start=2009-01-01T00:00:00Z&facet.date.end=2010-01-01T00:00:00Z&facet.date.gap=%2B1MONTH

 In the above query (as per dismax on the solr config file) it searches
 multiple fields such as GrantTitle, DepartmentName, InvestigatorName,
 etc...

 Then if user select 'Chemistry' from the facet field 'fDepartmentName'  and
 'US Cancer/Diabetic Research Institute' from 'fSponsor' I need to reduce
 the
 result set above to only records from where fDepartmentName is 'Chemistry'
 and 'fSponsor' is 'US Cancer/Diabetic Research Institute'
 The following query is not working.
 select/?q=cancer+stem+fDepartmentName:Chemistry+fSponsor:US Cancer/Diabetic
 Research Institute&version=2.2

 Fields starting with 'f' are defined in the schema.xml as copy fields.
   <field name="DepartmentName" type="text" indexed="true" stored="true"
 multiValued="true"/>
   <field name="fDepartmentName" type="string" indexed="true" stored="false"
 multiValued="true"/>
   <copyField source="DepartmentName" dest="fDepartmentName"/>

 Any ideas on the correct syntax?

 Thanks,
 Dhanushka.



Re: Drill down a solr result set by facets

2010-03-29 Thread Dhanushka Samarakoon
Thanks Indika, that looks good. I'll look at the article.
If anyone else has any good ideas please send them too.

On Mon, Mar 29, 2010 at 11:09 AM, Indika Tantrigoda indik...@gmail.com wrote:

 Hi Dhanushka,

 Have you tried to use the filter query parameter.
 Check out this article, the Applying Constraints section should be helpful
 to you.

 http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr

 Solr Wiki link to filter query parameter
 http://wiki.apache.org/solr/CommonQueryParameters#fq

 I am at the moment implementing a similar system where the user needs to
 drill down to the data.
 What I am doing now is say if the user selects Chemistry from the facet I
 request a query with
 the filter query applied to fDepartmentName and when the user selects US
 Cancer/Diabetic Research Institute
 from the fSponsor facet I will apply filter querying to both the
 fDepartmentName and fSponsor.

 Hope this helps.

 Regards,
 Indika


 On 29 March 2010 21:19, Dhanushka Samarakoon dhan...@gmail.com wrote:

 Hi,

 I'm trying to perform a search based on keywords and then reduce the
 result
 set based on facets that user selects.
 First query for a search would look like this.


 http://localhost:8983/solr/select/?q=cancer+stem&version=2.2&wt=php&start=&rows=10&indent=on&qt=dismax&facet=on&facet.mincount=1&facet.field=fDepartmentName&facet.field=fInvestigatorName&facet.field=fSponsor&facet.date=DateAwarded&facet.date.start=2009-01-01T00:00:00Z&facet.date.end=2010-01-01T00:00:00Z&facet.date.gap=%2B1MONTH

 In the above query (as per dismax on the solr config file) it searches
 multiple fields such as GrantTitle, DepartmentName, InvestigatorName,
 etc...

 Then if user select 'Chemistry' from the facet field 'fDepartmentName'
  and
 'US Cancer/Diabetic Research Institute' from 'fSponsor' I need to reduce
 the
 result set above to only records from where fDepartmentName is 'Chemistry'
 and 'fSponsor' is 'US Cancer/Diabetic Research Institute'
 The following query is not working.
 select/?q=cancer+stem+fDepartmentName:Chemistry+fSponsor:US
 Cancer/Diabetic
 Research Institute&version=2.2

 Fields starting with 'f' are defined in the schema.xml as copy fields.
   <field name="DepartmentName" type="text" indexed="true" stored="true"
 multiValued="true"/>
   <field name="fDepartmentName" type="string" indexed="true"
 stored="false"
 multiValued="true"/>
   <copyField source="DepartmentName" dest="fDepartmentName"/>

 Any ideas on the correct syntax?

 Thanks,
 Dhanushka.





Re: Filter query with special character using SolrJ client

2010-03-29 Thread Chris Hostetter

: Since the names of the string fields are not predefined I might have to
: find a method to do this automatically.

if the fields are strings, and you are only looking for exact matches 
(ie: you don't need any special query parser syntax) use the "field" 
QParser

:  SolrQuery.addFilterQuery("yourStringField:Cameras\\ & Photos")

solrQuery.addFilterQuery("{!field f=yourStringField}Cameras & Photos")


-Hoss



Absolutely empty resultset regardless of what I am searching for

2010-03-29 Thread MitchK

Hello guys,

my analysis.jsp shows me the right results. That means everything seems to
be parsed the right way and there are some matches.

However, when I try this live, there are never any matched documents. When I
look up whether there is anything in my index, I get the expected
result - everything is indexed. 

What am I doing wrong here?

An example looks like:
select/?indent=on&debugQuery=on&q=introduction&start=0&rows=10

The result looks like:
---
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">16</int>
  <lst name="params">
    <str name="debugQuery">on</str>
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">introduction</str>
    <str name="rows">10</str>
  </lst>
</lst>
<result name="response" numFound="0" start="0"/>
<lst name="debug">
  <str name="rawquerystring">introduction</str>
  <str name="querystring">introduction</str>
  <str name="parsedquery">title:introduction</str>
  <str name="parsedquery_toString">title:introduction</str>
  <lst name="explain"/>
  <str name="QParser">LuceneQParser</str>
  <lst name="timing">
    <double name="time">0.0</double>
    <lst name="prepare">
      <double name="time">0.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.FacetComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.StatsComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.DebugComponent">
        <double name="time">0.0</double>
      </lst>
    </lst>
    <lst name="process">
      <double name="time">0.0</double>
      <lst name="org.apache.solr.handler.component.QueryComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.FacetComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.HighlightComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.StatsComponent">
        <double name="time">0.0</double>
      </lst>
      <lst name="org.apache.solr.handler.component.DebugComponent">
        <double name="time">0.0</double>
      </lst>
    </lst>
  </lst>
</lst>
</response>


Thank you!
-- 
View this message in context: 
http://n3.nabble.com/Absolutely-empty-resultset-regardless-of-what-I-am-searching-for-tp683866p683866.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Filter query with special character using SolrJ client

2010-03-29 Thread Indika Tantrigoda
It works, thanks. Just implemented the code...:):):)

Could you explain what {!field f=yourStringField}Cameras & Photos does.

Regards,
Indika


On 29 March 2010 21:55, Chris Hostetter hossman_luc...@fucit.org wrote:


 : Since the names of the string fields are not predefined I might have to
 : find a method to do this automatically.

 if the fields are strings, and you are only looking for exact matches
 (ie: you don't need any special query parser syntax) use the "field"
 QParser

 :  SolrQuery.addFilterQuery("yourStringField:Cameras\\ & Photos")

 solrQuery.addFilterQuery("{!field f=yourStringField}Cameras & Photos")


 -Hoss




Re: Filter query with special character using SolrJ client

2010-03-29 Thread Chris Hostetter

: It works, thanks. Just implemented the code...:):):)
: 
: Could you explain what {!field f=yourStringField}Cameras & Photos does.

{!field} says that the string should be parsed using the FieldQParser.  
The FieldQParser takes an 'f' local param telling it what field you want 
to use, and the rest of the string is the exact value you want 
passed to the analyzer for that field 'f'  ... it's a query parser that 
supports no markup of any kind, and only produces basic 
PhraseQueries or TermQueries (there's also the "raw" QParser for when you 
absolutely know you only want TermQueries) ...

http://wiki.apache.org/solr/SolrQuerySyntax#Other_built-in_useful_query_parsers
http://lucene.apache.org/solr/api/org/apache/solr/search/FieldQParserPlugin.html


-Hoss



Re: jmap output help

2010-03-29 Thread Siddhant Goel
Gentle bounce

On Sun, Mar 28, 2010 at 11:31 AM, Siddhant Goel siddhantg...@gmail.com wrote:

 Hi everyone,

 The output of jmap -histo:live 27959 | head -30 is something like the
 following :

 num #instances #bytes  class name
 --
1:448441  180299464  [C
2:  5311  135734480  [I
3:  3623   68389720  [B
4:445669   17826760  java.lang.String
5:391739   15669560  org.apache.lucene.index.TermInfo
6:417442   13358144  org.apache.lucene.index.Term
7: 587675171496
  org.apache.lucene.index.FieldsReader$LazyField
8: 329025049760  constMethodKlass
9: 329023955920  methodKlass
   10:  28433512688  constantPoolKlass
   11:  23973128048  [Lorg.apache.lucene.index.Term;
   12:353053592  [J
   13: 33044288  [Lorg.apache.lucene.index.TermInfo;
   14: 556712707536  symbolKlass
   15: 272822701352  [Ljava.lang.Object;
   16:  28432212384  instanceKlassKlass
   17:  23432132224  constantPoolCacheKlass
   18: 264241056960  java.util.ArrayList
   19: 164231051072  java.util.LinkedHashMap$Entry
   20:  20391028944  methodDataKlass
   21: 14336 917504  org.apache.lucene.document.Field
   22: 29587 710088  java.lang.Integer
   23:  3171 583464  java.lang.Class
   24:   813 492880  [Ljava.util.HashMap$Entry;
   25:  8471 474376  org.apache.lucene.search.PhraseQuery
   26:  4184 402848  [[I
   27:  4277 380704  [S

 Is it ok to assume that the top 3 entries (character/integer/byte arrays)
 are referring to the entries inside the solr cache?

 Thanks,


 --
 - Siddhant




-- 
- Siddhant


Re: Filter query with special character using SolrJ client

2010-03-29 Thread Indika Tantrigoda
Thank you very much for the explanation.

Regards,
Indika

On 29 March 2010 22:28, Chris Hostetter hossman_luc...@fucit.org wrote:


 : It works, thanks. Just implemented the code...:):):)
 :
 : Could you explain what {!field f=yourStringField}Cameras & Photos does.

 {!field} says that the string should be parsed using the FieldQParser.
 The FieldQParser takes an 'f' local param telling it what field you want
 to use, and the rest of the string is the exact value you want
 passed to the analyzer for that field 'f'  ... it's a query parser that
 supports no markup of any kind, and only produces basic
 PhraseQueries or TermQueries (there's also the "raw" QParser for when you
 absolutely know you only want TermQueries) ...


 http://wiki.apache.org/solr/SolrQuerySyntax#Other_built-in_useful_query_parsers

 http://lucene.apache.org/solr/api/org/apache/solr/search/FieldQParserPlugin.html


 -Hoss




Re: jmap output help

2010-03-29 Thread Bill Au
Take a heap dump and use jhat to find out for sure.
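
For example, with the standard JDK tools (the pid is the one from the jmap
command above; the jhat heap size is illustrative):

  jmap -dump:live,format=b,file=solr.hprof 27959
  jhat -J-Xmx2g solr.hprof    # then browse http://localhost:7000/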

Bill

On Mon, Mar 29, 2010 at 1:03 PM, Siddhant Goel siddhantg...@gmail.com wrote:

 Gentle bounce

 On Sun, Mar 28, 2010 at 11:31 AM, Siddhant Goel siddhantg...@gmail.com
 wrote:

  Hi everyone,
 
  The output of jmap -histo:live 27959 | head -30 is something like the
  following :
 
  num #instances #bytes  class name
  --
 1:448441  180299464  [C
 2:  5311  135734480  [I
 3:  3623   68389720  [B
 4:445669   17826760  java.lang.String
 5:391739   15669560  org.apache.lucene.index.TermInfo
 6:417442   13358144  org.apache.lucene.index.Term
 7: 587675171496
   org.apache.lucene.index.FieldsReader$LazyField
 8: 329025049760  constMethodKlass
 9: 329023955920  methodKlass
10:  28433512688  constantPoolKlass
11:  23973128048  [Lorg.apache.lucene.index.Term;
12:353053592  [J
13: 33044288  [Lorg.apache.lucene.index.TermInfo;
14: 556712707536  symbolKlass
15: 272822701352  [Ljava.lang.Object;
16:  28432212384  instanceKlassKlass
17:  23432132224  constantPoolCacheKlass
18: 264241056960  java.util.ArrayList
19: 164231051072  java.util.LinkedHashMap$Entry
20:  20391028944  methodDataKlass
21: 14336 917504  org.apache.lucene.document.Field
22: 29587 710088  java.lang.Integer
23:  3171 583464  java.lang.Class
24:   813 492880  [Ljava.util.HashMap$Entry;
25:  8471 474376  org.apache.lucene.search.PhraseQuery
26:  4184 402848  [[I
27:  4277 380704  [S
 
  Is it ok to assume that the top 3 entries (character/integer/byte arrays)
  are referring to the entries inside the solr cache?
 
  Thanks,
 
 
  --
  - Siddhant
 



 --
 - Siddhant



Re: ReplicationHandler reports incorrect replication failures

2010-03-29 Thread Jason Rutherglen
Shawn,

I was working on something very similar... Let's perhaps also create a
Jira issue for this monitoring?

Thanks,

Jason

On Fri, Mar 26, 2010 at 6:59 AM, Shawn Smith ssmit...@gmail.com wrote:
 We're using Solr 1.4 Java replication, which seems to be working
 nicely.  While writing production monitors to check that replication
 is healthy, I think we've run into a bug in the status reporting of
 the ../solr/replication?command=details command.  (I know it's
 experimental...)

 Our monitor parses the replication?command=details XML and checks that
 replication lag is reasonable by diffing the indexVersion of the
 master and slave indices to make sure it's within a reasonable time
 range.

 Our monitor also compares the first elements of
 indexReplicatedAtList and replicationFailedAtList lists to see if
 the last replication attempt failed.  This is where we're having a
 problem with the monitor throwing false errors.  It looks like there's
 a bug that causes successful replications to be considered failures.
 The bug is triggered immediately after a slave restarts when the slave
 is already in sync with the master.  Each no-op replication attempt
 after restart is considered a failure until something on the master
 changes and replication has to actually do work.

 From the code, it looks like SnapPuller.successfulInstall starts out
 false on restart.  If the slave starts out in sync with the master,
 then each no-op replication poll leaves successfulInstall set to
 false which makes SnapPuller.logReplicationTimeAndConfFiles log the
 poll as a failure.  SnapPuller.successfulInstall stays false until the
 first time replication actually has to do something, at which point it
 gets set to true, and then everything is OK.

 Thanks,
 Shawn



RE: keyword query tokenizer

2010-03-29 Thread Jason Chaffee
Ahh, but that is exactly what I don't want the DisjunctionMaxQuery to
do.  I do not want the max scoring field per word.  Instead, I want it per
phrase, which may be a single word or multiple words.

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Friday, March 26, 2010 10:35 PM
To: solr-user@lucene.apache.org
Subject: Re: keyword query tokenizer

: 
: I am curious as to why the query parser does any tokenizing?  I would
think
: you would want control/configure this with your analyzers?
: 
: Does anyone know the answer to this. Is there a performance gain or
something?

it's not about performance, it's about the query parser syntax.

whitespace is markup as far as the query parser is concerned -- just 
like +, -, etc.; whitespace characters are instructions for the query 
parsers.  

Essentially: unquoted whitespace is the markup that tells the query parser 
to create an "OR" query out of the chunks of input on either side of the 
space (+ signifies MUST, - signifies PROHIBITED, but there is no markup to 
signify SHOULD)

Also: if the query parser didn't chunk on whitespace, queries like this...

aWord aField:anotherWord

...wouldn't work in the standard query parser.  

You may think "but i'm using dismax, why does dismax need to worry about 
that?" but the key to remember there is that if dismax didn't split on 
whitespace prior to analysis, it wouldn't be able to build the 
DisjunctionMaxQuerys that it uses to find the max scoring field per 
word (which is the whole point of the parser).



-Hoss



RE: keyword query tokenizer

2010-03-29 Thread Chris Hostetter

: Ahh, but that is exactly what I don't want the DisjunctionMaxQuery to
: do.  I do not want the max scoring field per word.  Instead, I want it per
: phrase which may be a single word or multiple words.

then you need to quote your entire q param (or escape all the white 
space and meta characters).
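
For example (a hypothetical dismax request, just to illustrate the two forms):

  q="canon power shot"       one quoted phrase, analyzed as a unit per qf field
  q=canon\ power\ shot       escaped whitespace, with the same effect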

: You may think "but i'm using dismax, why does dismax need to worry about 
: that?" but the key to remember there is that if dismax didn't split on 
: whitespace prior to analysis, it wouldn't be able to build the 
: DisjunctionMaxQuerys that it uses to find the max scoring field per 
: word (which is the whole point of the parser).


-Hoss



Re: Absolutely empty resultset regardless of what I am searching for

2010-03-29 Thread Chris Hostetter

: my analysis.jsp shows me the right results. That means, everything seems to
: be parsed the right way and there are some matches.

analysis.jsp can tell you that *if* a document is indexed with the current 
config, then what the tokens will look like -- but it doesn't know if 
there are any documents in your index, or if you changed the config after 
indexing.

what does /select?q=*:*  return?
how about /admin/luke?fl=title   ?

: select/?indent=ondebugQuery=onq=introductionstart=0rows=10
...
: str name=parsedquerytitle:introduction/str

... i assume title is in fact the field you expect introduction to 
match on?

what does your schema.xml look like?, etc...

http://wiki.apache.org/solr/UsingMailingLists


-Hoss



RE: keyword query tokenizer

2010-03-29 Thread Jason Chaffee
I didn't know the quotes would work.  I thought it had to be escaped, and
I wasn't too fond of that because you have to unescape it in the analysis
phase.  Using quotes doesn't seem so bad to me.

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Monday, March 29, 2010 11:16 AM
To: solr-user@lucene.apache.org
Subject: RE: keyword query tokenizer


: Ahh, but that is exactly what I don't want the DisjunctionMaxQuery to
: do.  I do not want the max scoring field per word.  Instead, I want it per
: phrase which may be a single word or multiple words.

then you need to quote your entire q param (or escape all the white 
space and meta characters).

: You may think "but i'm using dismax, why does dismax need to worry about
: that?" but the key to remember there is that if dismax didn't split on
: whitespace prior to analysis, it wouldn't be able to build the 
: DisjunctionMaxQuerys that it uses to find the max scoring field per 
: word (which is the whole point of the parser).


-Hoss



Getting /handlers from response and dynamically removing them

2010-03-29 Thread Jon Baer
This is just something that seems to come up now and then ...

* - I'd like to write a last-component which does something specific for a 
particular declared handler (/handler1, for example) and there is no way to 
determine which handler the request came from at the moment (or is there?)
* - It would be nice if there were some way to dynamically update 
(enable/disable) handlers on the fly, specifically update handlers; I'd imagine 
something working like the way logging is currently laid out in the admin.

Any thoughts on these 2?

- Jon

negative boost

2010-03-29 Thread Jason Chaffee
Is it possible to give a negative boost in dismax?  For instance,

field1^3 field2^0 field3^-0.1

Thanks,

Jason



Re: Getting /handlers from response and dynamically removing them

2010-03-29 Thread Erik Hatcher

You can get the qt parameter, at least, in your search component.

What's the use case for controlling handlers enabled flag on the fly?

Erik


On Mar 29, 2010, at 3:02 PM, Jon Baer wrote:


This is just something that seems to come up now and then ...

* - Id like to write a last-component which does something specific  
for a particular declared handler /handler1 for example and there is  
no way to determine which handler it came from @ the moment (or can  
it?)
* - It would be nice if there was someway to dynamically update  
(enable/disable) handlers on the fly, specifically update handlers,  
Id imagine something working like the way logging currently is laid  
out in the admin.


Any thoughts on these 2?

- Jon




field QParserPlugin - Help needed

2010-03-29 Thread Nair, Manas
 

Hello Experts,

Could anyone please help me by directing me to some link where I can get more 
details on Solr's field QParserPlugin.

I would be really grateful.

Thankyou all,

Manas





Re: field QParserPlugin - Help needed

2010-03-29 Thread Erik Hatcher

Manas,

The best you'll find is Solr's javadocs and source code itself.   
There's a bit on the wiki with the pointers: http://wiki.apache.org/solr/SolrPlugins#QParserPlugin


Erik


On Mar 29, 2010, at 3:25 PM, Nair, Manas wrote:




Hello Experts,

Could anyone please help me by directing me to some link where I can  
get more details on Solr's field QParserPlugin.


I would be really grateful.

Thankyou all,

Manas







RE: One item, multiple fields, and range queries

2010-03-29 Thread David Smiley (@MITRE.org)

I'm not going to index each address as its own document because the
one-side that I have currently has loads of text and there are many
addresses.  Furthermore, it doesn't really address the general case of my
problem statement.
I'm not sure what to make of "index using a heterogeneous field schema,
grouping the different doc type instances with a unique key (the one) to
form a composite doc".
I could use the scheme you mention with the SpanNear query, but it
conflates different fields into one indexed field, which will mess with the
scoring and make queries like range queries (if there are dates involved) next
to impossible.  This solution is really a hack workaround for a limitation
in Lucene/Solr.  I was hoping to start a conversation toward a truer
resolution to this problem rather than these workarounds, which aren't always
satisfactory.

~ David Smiley

-
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
-- 
View this message in context: 
http://n3.nabble.com/One-item-multiple-fields-and-range-queries-tp475030p684282.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: field QParserPlugin - Help needed

2010-03-29 Thread Ahmet Arslan



 Could anyone please help me by directing me to some link
 where I can get more details on Solr's field QParserPlugin.

Additionally Chris Hostetter's explanation:
http://search-lucene.com/m/ZKrXi2VX1st


  


Re: Absolutely empty resultset regardless of what I am searching for

2010-03-29 Thread MitchK

Hoss,

thank you for your response.

/select?q=*:*
This returns results as expected. 

I have found the mistake why introduction didn't match - a wrong copyField.
*rolleyes*
However, this seems to bring more problems to light: now the first few
rows from my database seem to be searchable, but the rest is not searchable.
The thing is, I have got two stored (as well as indexed) fields: ID and
title.
If I search for the ID of a document which I can't find via its title, it
produces a match. If I search for the title, it returns nothing.

Is there any way to see what exactly is indexed?
Luke seems to report wrong results... since it says that life is one of
the most frequent terms (398 times) in my index, but if I search for life
(sounds great, doesn't it?) it returns only ONE match. 

select/?q=titleProcessed:live&start=0&rows=10&indent=on

Here is my schema.xml.
Please notice that I have made a modification: titleProcessed means the
same as title from my first post. The mistake is NOT that title is now a
string type. 

<field name="title" type="string" indexed="true" stored="true"/>
<field name="synonymTitle" type="Synonym" indexed="true" stored="false"/>
<field name="titleProcessed" type="text" indexed="true" stored="false"/>



<copyField source="title" dest="titleProcessed"/>
<copyField source="title" dest="titleSynonym"/>
<copyField source="title" dest="titleProcessed"/>

<fieldType name="Synonym" class="solr.TextField"
positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="Synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="0"
                catenateNumbers="0"
                catenateAll="0"
                splitOnCaseChange="1"/>
    </analyzer>
</fieldType>

<fieldType name="text" class="solr.TextField"
positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

---
Could there be a problem because the fields are already tokenized?

Kind regards
- Mitch
-- 
View this message in context: 
http://n3.nabble.com/Absolutely-empty-resultset-regardless-of-what-I-am-searching-for-tp683866p684344.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: negative boost

2010-03-29 Thread Paul Libbrecht

Jason,

don't you want
  field1^3 • field2^1 • field3^0.9
?
As written in Lucene in Action (and probably elsewhere), it's all multiplied.
So a negative boost means a boost under 1.

paul

PS: take the log and you get this negative.



Le 29-mars-10 à 21:08, Jason Chaffee a écrit :


Is it possible to give a negative in boost in dismax?  For instance,



field1^3 field2^0 field3^-0.1



Thanks,



Jason





Re: Absolutely empty resultset regardless of what I am searching for

2010-03-29 Thread MitchK

EDIT:
The query shown was not the intended one... please excuse me, I have tested a
lot and I am a little bit confused :-).

The right query is, of course:

select/?q=titleProcessed:life&start=0&rows=10&indent=on 
-- 
View this message in context: 
http://n3.nabble.com/Absolutely-empty-resultset-regardless-of-what-I-am-searching-for-tp683866p684350.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: One item, multiple fields, and range queries

2010-03-29 Thread Steven A Rowe
Hi David,

On 03/29/2010 at 3:36 PM, David Smiley (@MITRE.org) wrote:
 I'm not sure what to make of or index using a heterogeneous field
 schema, grouping the different doc type instances with a unique key
 (the one) to form a composite doc

Lucene is schema-free - you can mix and match different document types in a 
single index.  You could emulate this in Solr by merging the two document types 
and leaving blank the parts that are inapplicable to a given instance.  E.g.:

Address-doc-type: 
Field: Unique-key
Field: Street
Field: City
...

Everything-else-doc-type:
Field: Unique-key
Field: Blob-o'-text
...

Doc1: Unique-key: 1; Blob-o'-text: blobbedy-blah-blob; ...
Doc2: Unique-key: 1; Street: 12 Main St; City: Somewheresville; ...
Doc3: Unique-key: 1; Street: 243 13th St; City: Bogdownton; ...


 I could use the scheme you mention provided with the spanNear query but
 it conflates different fields into one indexed field which will mess
 with the scoring and make queries like range queries if there are dates
 involved next to impossible.

I agree, dimensional reduction can be an issue, though I'm sure there are use 
cases where the attendant scoring distortion would be acceptable, e.g. 
non-scoring filters.  (Stuffing a variable number of addresses into a single 
document will also mess with the scoring unless you turn off norms, which is 
of course another form of scoring-messing.)

I've seen a couple of different mentions of private SpanRangeQuery 
implementations on the mailing lists, so range queries likely wouldn't be a 
problem for long, should it become a general issue.

 This solution is really a hack workaround to a limitation in
 Lucene/Solr.  I was hoping to start a conversation to a more
 truer resolution to this problem rather than these workarounds
 which aren't always satisfactory.

Limitation: Solr/Lucene is not a database.  

Solutions:
1. Hack workaround
2. Rewrite Solr/Lucene to be a database
3. ? (fill in more truer resolution here)

Good luck,
Steve



Re: Absolutely empty resultset regardless of what I am searching for

2010-03-29 Thread Erick Erickson
Perhaps a silly question, but did you recreate your index after you made
your schema changes? Or did you delete a bunch of documents in the meantime?
Or do you have a unique key defined in your schema that is replacing
documents? The fact that Luke is giving you unexpected results is a red flag
that your index isn't in the state you *think* it's in

Best
Erick




Re: Absolutely empty resultset regardless of what I am searching for

2010-03-29 Thread MitchK

I was using this page:
 solr/admin/dataimport.jsp?handler=/dataimport
To import my data from my database.
I have made a few restarts of my Solr-server and I have re-imported the data
a lot of times.
Furthermore, I have tried to delete everything with the help of the post.jar
from the tutorial.
I have recognized that it deletes only a few thousands of documents, instead
of emptying the whole index.
This was the last thing I've done. Now I am reindexing again.

I have got a unique id - called ID, it is the primary key of my
database-table.
Perharps I am missunderstanding your post, but what do you mean with a
unique key that is replacing documents? 

Thank you
- Mitch
-- 
View this message in context: 
http://n3.nabble.com/Absolutely-empty-resultset-regardless-of-what-I-am-searching-for-tp683866p684387.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Getting /handlers from response and dynamically removing them

2010-03-29 Thread Jon Baer
Thanks for the qt tip, I will try that.

I'm building a Solr installation as a small standalone and I'd like to disable 
everything but /select after an import has been completed.  In normal 
situations just the master would be set up to index and the slaves are read-only, but 
in this case I need to allow imports on a standalone w/ a small index and allow 
updates only when the handler is enabled.

Also, it's not possible currently to reload a handler w/o a restart, correct?

- Jon

On Mar 29, 2010, at 3:22 PM, Erik Hatcher wrote:

 You can get the qt parameter, at least, in your search component.
 
 What's the use case for controlling handlers enabled flag on the fly?
 
   Erik
 
 
 On Mar 29, 2010, at 3:02 PM, Jon Baer wrote:
 
 This is just something that seems to come up now and then ...
 
 * - I'd like to write a last-component which does something specific for a 
 particular declared handler (/handler1 for example) and there is no way to 
 determine which handler it came from @ the moment (or can it?)
 * - It would be nice if there were some way to dynamically update 
 (enable/disable) handlers on the fly, specifically update handlers; I'd 
 imagine something working like the way logging currently is laid out in the 
 admin.
 
 Any thoughts on these 2?
 
 - Jon
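
As a concrete illustration of the qt point above, a minimal component sketch against the 1.4-era API; the component class and the /handler1 check are made-up examples, and qt is the only handler hint a component can read:

    import java.io.IOException;
    import org.apache.solr.common.params.CommonParams;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;

    public class HandlerAwareComponent extends SearchComponent {
      @Override
      public void prepare(ResponseBuilder rb) throws IOException {}

      @Override
      public void process(ResponseBuilder rb) throws IOException {
        // qt is available on the request; the registered handler path
        // itself (e.g. /handler1) is not exposed to components
        String qt = rb.req.getParams().get(CommonParams.QT);
        if ("/handler1".equals(qt)) {
          // handler-specific behavior goes here
        }
      }

      @Override public String getDescription() { return "qt-aware demo"; }
      @Override public String getSource()      { return ""; }
      @Override public String getSourceId()    { return ""; }
      @Override public String getVersion()     { return "1.0"; }
    }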
 



Re: Getting /handlers from response and dynamically removing them

2010-03-29 Thread Chris Hostetter

: Also, its not possible currently to reload a handler w/o a restart correct?

There are methods that can be used to dynamically add/remove handlers from 
SolrCore -- but there are no built-in administrative commands to do so.
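
For example, from custom code that holds the request (a sketch only, not a supported admin command; MySearchHandler is a made-up class):

    // from code that has a SolrQueryRequest in hand, e.g. a component:
    SolrCore core = req.getCore();
    // registers (or replaces) the handler at /myhandler and returns the
    // previous one; registering null is supposed to remove a handler
    SolrRequestHandler previous =
        core.registerRequestHandler("/myhandler", new MySearchHandler());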


-Hoss



RE: One item, multiple fields, and range queries

2010-03-29 Thread David Smiley (@MITRE.org)

Steven,

The composite doc idea is an interesting avenue to a solution here that I 
didn't think of.  What's missing is code to do the group by and then do an 
intersection in order to get boolean AND behavior between the addresses and 
primary documents, and  then filter out the non-primary documents.  Perhaps 
Solr's popular field-collapsing patch would be a starting point.

I realize of course that Lucene/Solr isn't a database but there is plenty of 
gray area in-between.

Did you read my original message where I suggested perhaps a solution might lie 
in intersecting different queries based on common multi-value field offsets 
derived from matching term positions?  I have no idea how far off the current 
codebase is to exposing enough information to make such an approach possible.

~ David Smiley




-
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book


Re: Absolutely empty resultset regardless of what I am searching for

2010-03-29 Thread MitchK

Luke is responsing (now):
My top terms for synonyms have frequencies of up to 800,000,
while my processed title has a maximum frequency of 7... 
What the hell???

However, I can't search any of the top synonyms.
I am able to search within the first 55 documents of my index. 

What might be wrong, when analysis.jsp shows the right results, but the
real-index does not?


dataimporthandler multivalued dynamic fields

2010-03-29 Thread brad anderson
Greetings,

I'm trying to use dataimporthandler to load values from a db and trying to
put them into multivalued dynamic fields. It appears to work for the first
value, but does not add all the values to the field.

Here is the schema definition of the *_custom fields:
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<dynamicField name="*_custom" type="text_ws" indexed="true" stored="true"
              multiValued="true" termVectors="true"/>


Here is my data-config.xml file:
<entity name="usr"
        query="select * from user n">
  <field name="id" column="uid"/>
  <field name="givenname" column="givenname"/>
  <field name="lastname" column="lastname"/>
  <field name="nickname" column="nickname"/>
  <field name="zipcode" column="zipcode"/>
  <field name="city" column="city"/>
  <field name="country" column="country"/>
  <field name="site" column="site"/>
  <field name="state" column="state"/>
  <field name="role" column="role"/>
  <field name="companygroup" column="companygroup"/>
  <field name="personalinfo" column="personalinfo"/>
  <field name="email" column="email"/>

  <entity name="customattr"
          query="select m.dir_attr, m.attr_id from pulse_custom_attribute_metadata m">
    <entity name="customvalue"
            query="select value from pulse_custom_attribute_values v where v.user_id='${usr.uid}' and v.attr_id=${customattr.attr_id}">
      <field name="${customattr.dir_attr}_custom" column="value"/>
    </entity>
  </entity>
</entity>

Does anyone know why it's only importing one of the values from the db, as
opposed to all of them?

Thanks,
Brad


RE: negative boost

2010-03-29 Thread Jason Chaffee
Unfortunately, my results aren't quite what I want unless I use 0 on the second 
field.  Instead, if something matches in all the fields it is elevated to the 
top.  I only want the first field match elevated to the top and I want all 
first field matches to have the same weight.  Next, I want all field2 matches 
to have the same weight, and finally, I want all field3 matches to have the 
same weight.  But I want field1 matches to be at the top, then field 2, and 
finally field3.  I don't care if the term is all three fields or not.

Does this make sense?



RE: One item, multiple fields, and range queries

2010-03-29 Thread Steven A Rowe
Hi David,

On 03/29/2010 at 4:54 PM, David Smiley (@MITRE.org) wrote:
 Did you read my original message where I suggested perhaps a solution
 might lie in intersecting different queries based on common multi-value
 field offsets derived from matching term positions?  I have no idea how
 far off the current codebase is to exposing enough information to make
 such an approach possible.

AFAICT, your above-described solution addresses the one-to-many problem by 
representing multiple records within a single document via parallel arrays, one 
array per address-part field.  The parallel array alignment is effected via 
alignment of position increments.  What's missing from Solr/Lucene is the 
ability to constrain matches such that the position increment of all matching 
address-part fields is the same.
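
(Concretely: with a position increment gap of, say, 1000 on every address-part field, record i's tokens sit near position 1000*i in each field, and a multi-field match belongs to one address only if all the matching fields agree on floor(position/1000). That per-record equality test is exactly the constraint that cannot be expressed today.)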

I suspect that the Flexible Indexing branch would allow a slightly less 
involved index usage pattern: you could add a new term attribute that 
explicitly represents the record index.  That way you wouldn't have to fiddle 
around with increment gaps and guess about maximum record size.

You still need to perform the equivalent of an SQL table join across the 
matching address-part fields (in addition to any non-address constraints), 
using parallel array index equality as the join predicate.  I don't know how 
hard it would be to implement this, but you'd need to: add the ability to 
express this kind of constraint in the query language; make a new Similarity 
implementation that could handle it; and, if you go the route of adding a new 
record index term attribute, add a new postings codec that handles 
writing/reading it.

Steve



Re: Absolutely empty resultset regardless of what I am searching for

2010-03-29 Thread MitchK

I was using TermsComponent now to make sure, what is really indexed.

Well, one title-field has got only a few terms indexed (as I have mentioned
earlier: it is only saving up to 55 rows of the RDBMS), while the other
fields (which are based on the same filter, but with another
special-word.txt) index every term.
However, regardless which field I choose to search on, it makes no
difference: every line after the 55th is unsearchable. 

Any suggestions would be great!

If I can't solve the problem, I will try to export the whole data as csv and
try it again, although I don't think that this will help, because the stored
fields store the expected values...


how to create this highlighter behaviour

2010-03-29 Thread Joe Calderon
Hello *, I've been using the highlighter and have been pretty happy with
its results; however, there's an edge case I'm not sure how to fix.

For the query: amazing grace

the record matched and highlighted is
<em>amazing</em> rendition of <em>amazing grace</em>

Is there any way to only highlight amazing grace without using phrase
queries? Can I modify the highlighter components to only use terms
once and to favor contiguous sections?

I don't want to enforce phrase queries, as sometimes I do want terms
highlighted out of order, but I only want each matched term highlighted
once.


Does this make sense?


RE: negative boost

2010-03-29 Thread Chris Hostetter

: Unfortunately, my results aren't quite what I want unless I use 0 on the 
: second field.  Instead, if something matches in all the fields it is 
: elevated to the top.  I only want the first field match elevated to the 
: top and I want all first field matches to have the same weight.  Next, I 
: want all field2 matches to have the same weight, and finally, I want all 
: field3 matches to have the same weight.  But I want field1 matches to be 
: at the top, then field 2, and finally field3.  I don't care if the term 
: is all three fields or not.

try qf=field1^10000+field2^100+field3^1&tie=0

: Does this make sense?

it does, but it kind of defeats the point of dismax.  what i cited should 
help -- the key is to make the boosts vastly different scales, 
and eliminate the tiebreaker value.
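
(The mechanics: with tie=0, dismax scores a document by the maximum of its per-field scores rather than their sum; the formula is max + tie * (sum of the others). So several lower-tier matches can never add up to outrank a single higher-tier match, which is what keeps the tiers separate.)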




-Hoss



RE: negative boost

2010-03-29 Thread Jason Chaffee
I understand that it defeats the reason for dismax, at least the
original reason for dismax.  However,  if I can do it this way without
having to write my own handler because I need to search multiple fields
and combine the results, then it is still preferable and thus another
way to leverage dismax.

Thanks for the tip.  I will try it.

Jason




RE: negative boost

2010-03-29 Thread Jason Chaffee
I think the key was changing the tie to 0.  I had it at 0.1.  Getting
exactly what I want now.

Big thanks for the help. 




Re: Solrj doesn't tell if PDF was actually parsed by Tika

2010-03-29 Thread Lance Norskog
Thanks!

You can search for the document after you index it.
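
In SolrJ terms, that check could be a small helper along these lines (a sketch; "id" is whatever unique key you posted the file under):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;

    // Returns true if the document landed in the index. If the extract
    // request returned OK but this comes back false, Tika most likely
    // got no text out of the file (or the parse failed).
    static boolean verifyIndexed(SolrServer server, String id) throws Exception {
        server.commit();
        return server.query(new SolrQuery("id:" + id))
                     .getResults().getNumFound() > 0;
    }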

On Fri, Mar 26, 2010 at 1:55 AM, Abdelhamid  ABID aeh.a...@gmail.com wrote:
 Well done : https://issues.apache.org/jira/browse/SOLR-1847

 meanwhile, is there any workaround ?

 On 3/26/10, Lance Norskog goks...@gmail.com wrote:

 Please file a bug for this on the JIRA.

 https://issues.apache.org/jira/secure/Dashboard.jspa


 On Thu, Mar 25, 2010 at 7:21 AM, Abdelhamid  ABID aeh.a...@gmail.com
 wrote:
  Hi,
  When posting pdf files using solrj the only response we get from Solr is
  only server response status, but never know whether
  pdf was actually parsed or not, checking the log I found that some Tika
  wasn't able
  to succeed with some pdf files because of content nature (texts in images
  only) or are corrupted:
 
      25 mars 2010 14:54:07 org.apache.pdfbox.util.PDFStreamEngine
  processOperator
      INFO: unsupported/disabled operation: EI
 
      25 mars 2010 14:54:02 org.apache.pdfbox.filter.FlateFilter decode
      GRAVE: Stop reading corrupt stream
 
 
  The question is how can I catch these kinds of exceptions through Solrj ?
 
  --
  Elsadek
 




 --
 Lance Norskog
 goks...@gmail.com




 --
 Abdelhamid ABID
 Software Engineer- J2EE / WEB / ESB MULE




-- 
Lance Norskog
goks...@gmail.com


Re: SOLR-1316 How To Implement this autosuggest component ???

2010-03-29 Thread Lance Norskog
SOLR-1316 uses a much faster data structure (Ternary Search Tree), not
a Lucene index. Ngram-based tools like the spellchecker, or your
implementation, are inherently slower.

Netflix, for example, uses a dedicated TST server farm (their own
implementation of TST) to do auto-complete.
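
For the curious, the data structure itself is small enough to sketch. This is a toy ternary search tree for prefix completion, illustrative only (it is not the SOLR-1316 code):

    import java.util.ArrayList;
    import java.util.List;

    public class TernaryTree {
        private static class Node {
            char c;
            Node lo, eq, hi;
            boolean isWord;
        }

        private Node root;

        public void insert(String s) {
            if (s != null && s.length() > 0) root = insert(root, s, 0);
        }

        private Node insert(Node n, String s, int i) {
            char c = s.charAt(i);
            if (n == null) { n = new Node(); n.c = c; }
            if (c < n.c)                      n.lo = insert(n.lo, s, i);
            else if (c > n.c)                 n.hi = insert(n.hi, s, i);
            else if (i < s.length() - 1)      n.eq = insert(n.eq, s, i + 1);
            else                              n.isWord = true;
            return n;
        }

        // all indexed words starting with the given prefix
        public List<String> complete(String prefix) {
            List<String> out = new ArrayList<String>();
            if (prefix == null || prefix.length() == 0) return out;
            Node n = find(root, prefix, 0);
            if (n == null) return out;
            if (n.isWord) out.add(prefix);
            collect(n.eq, new StringBuilder(prefix), out);
            return out;
        }

        private Node find(Node n, String s, int i) {
            if (n == null) return null;
            char c = s.charAt(i);
            if (c < n.c) return find(n.lo, s, i);
            if (c > n.c) return find(n.hi, s, i);
            if (i == s.length() - 1) return n;
            return find(n.eq, s, i + 1);
        }

        // in-order walk: lo shares the prefix, eq extends it, hi shares it
        private void collect(Node n, StringBuilder prefix, List<String> out) {
            if (n == null) return;
            collect(n.lo, prefix, out);
            prefix.append(n.c);
            if (n.isWord) out.add(prefix.toString());
            collect(n.eq, prefix, out);
            prefix.deleteCharAt(prefix.length() - 1);
            collect(n.hi, prefix, out);
        }
    }

You load it once with every indexed term and call complete() per keystroke; a lookup touches a handful of nodes instead of running a query, which is where the speed comes from.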

On Fri, Mar 26, 2010 at 3:32 AM, stockii st...@shopgate.com wrote:

 hey thx.

 i think the component runs so far, but i don't see what it brings me.

 my first autocompletion-solution was with EdgeNGram ... and its exactly the
 same result ...

 can anyone please show me the advantages of the Issue-1316?!




-- 
Lance Norskog
goks...@gmail.com


Re: solr highlighting

2010-03-29 Thread Lance Norskog
No problem: wrapping and unwrapping escaped text can be very confusing.

On Fri, Mar 26, 2010 at 6:31 AM, Niraj Aswani n.asw...@dcs.shef.ac.uk wrote:
 Hi Lance,

 apologies.. please ignore my previous mail.  I'll have a look at the
 PatternReplaceFilter.

 Thanks,
 Niraj

 Niraj Aswani wrote:

 Hi Lance,

 Yes, that is one solution but wouldn't it stop people searching for
 something like <choice in the first place?  I mean, if I encode such
 characters at the index time, one would have to write a query like
 &lt;choice.  Am I right?

 Thanks,
 Niraj

 Lance Norskog wrote:

 To display html-markup in an html page, it has to be in entity-encoded
 form. So, encode the < and > as entities in your input application, and
 have it indexed and stored in this format. Then, the <b><u> are
 inserted as normal. This gives you the html text displayable in an
 html page, with all words highlightable. And add gt/lt etc. as
 stopwords.

 At this point you have the element names, attribute names and values,
 and text parts searchable and highlightable. If you only want the HTML
 syntax parts shown, the PatternReplaceFilter is your friend: with
 regex patterns you can pull out those values and ignore the text
 parts.

 The analysis.jsp page will make it much much easier to debug this.

 Good luck!

 On Thu, Mar 25, 2010 at 8:21 AM, Niraj Aswani n.asw...@dcs.shef.ac.uk
 wrote:


 Hi,

 I am using the following two parameters to highlight the hits.

 "&hl.simple.pre=" + URLEncoder.encode("<b><u>")
 "&hl.simple.post=" + URLEncoder.encode("</u></b>")

 This seems to work.  However, there is a bit of trouble when the text
 itself
 contains html markup.

 For example, I have indexed a document with the following text in it.
 ===
 something here...
 <choice minOccurs="1" maxOccurs="unbounded">xyz</choice>
 something here..
 ===

 When I search for the keyword choice, what it does is, it inserts <b><u>
 just before the word choice and </u></b> immediately after the word
 choice. It results in something like below:

 <<b><u>choice</u></b> minOccurs="1"
 maxOccurs="unbounded">xyz</<b><u>choice</u></b>>


 I would like it to be something like:

 &lt;<b><u>choice</u></b> minOccurs="1"
 maxOccurs="unbounded"&gt;xyz</<b><u>choice</u></b>&gt;

 Is there any way to do it such that the highlight content is encoded as
 HTML
 but the prefix and suffix are not?

 Thanks,
 Niraj



 When I issue a query, it returns all the correct












-- 
Lance Norskog
goks...@gmail.com


Re: Complex relational values

2010-03-29 Thread Lance Norskog
If 'item' is the unique document level, then this can be done with:
unique id: your own design
searchable text fields:
foo_x:
foo_y:
bar_x:
bar_y:

The query becomes:
foo_x:[100 TO *] AND foo_y:[500 TO *]

Note that to search the other fields with dismax, and foo* with the
standard query parser, you'll need to combine the two with the crazy
multi-parser syntax.
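
Something along these lines, for example (a sketch; the dismax fields and user terms here are placeholders):

    q=_query_:"{!dismax qf='name description'}user terms" AND foo_x:[100 TO *] AND foo_y:[500 TO *]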

On Fri, Mar 26, 2010 at 10:49 AM, Kumaravel Kandasami
kumaravel.kandas...@gmail.com wrote:
 I would represent each item element as a document, and each attribute as
 the fields of the document.

 if the field names are not known upfront, you could create 'dynamic fields'.




 Kumar    _/|\_
 www.saisk.com
 ku...@saisk.com
 making a profound difference with knowledge and creativity...


 On Fri, Mar 26, 2010 at 12:37 PM, Phil Messenger p...@miniweb.tv wrote:

 Hi,

 I need to store structured information in an index entry for use when
 filtering. As XML, this could be expressed as:

 <item>
        <some_fields_that_are_searched_using_dismax/>
        <data>
                <item type="foo" x="100" y="200"/>
                <item type="bar" x="300" y="1000"/>
        </data>
 </item>

 I want to be able to *filter* search results according to the data in the
 item tags - eg. show all index entries which match the expression
 type=foo && x > 100 && y > 500

 Having a multivalued field for type, x and y doesn't seem to work here as
 I need to maintain the relationship between a type/x/y.

 I'm not sure how to approach this problem. Is writing a custom field type
 the
 preferred approach?

 thanks,

 Phil.






-- 
Lance Norskog
goks...@gmail.com


Apache Lucene EuroCon Call For Participation: Prague, Czech Republic May 20 & 21, 2010

2010-03-29 Thread Grant Ingersoll
Apache Lucene EuroCon Call For Participation - Prague, Czech Republic May 20 &
21, 2010

All submissions must be received by Tuesday, April 13, 2010, 12 Midnight CET/6 
PM US EDT

The first European conference dedicated to Lucene and Solr is coming to Prague 
from May 18-21, 2010. Apache Lucene EuroCon is running on a not-for-profit 
basis, with net proceeds donated back to the Apache Software Foundation. The 
conference is sponsored by Lucid Imagination with additional support from 
community and other commercial co-sponsors.

Key Dates:
24 March 2010: Call For Participation Open
13 April 2010: Call For Participation Closes
16 April 2010: Speaker Acceptance/Rejection Notification
18-19 May 2010: Lucene and Solr Pre-conference Training Sessions
20-21 May 2010: Apache Lucene EuroCon

This conference creates a new opportunity for the Apache Lucene/Solr community 
and marketplace, providing  the chance to gather, learn and collaborate on the 
latest in Apache Lucene and Solr search technologies and what's happening in 
the community and ecosystem. There will be two days of Lucene and Solr training 
offered May 18 & 19, and followed by two days packed with leading-edge Lucene 
and Solr Open Source Search content and talks by search and open source thought 
leaders.

We are soliciting 45-minute presentations for the conference, 20-21 May 2010 in 
Prague. The conference and all presentations will be in English.

Topics of interest include: 
- Lucene and Solr in the Enterprise (case studies, implementation, return on 
investment, etc.)
- “How We Did It” & Development Case Studies
- Spatial/Geo search
- Lucene and Solr in the Cloud
- Scalability and Performance Tuning
- Large Scale Search
- Real Time Search
- Data Integration/Data Management
- Tika, Nutch and Mahout
- Lucene Connectors Framework
- Faceting and Categorization
- Relevance in Practice
- Lucene & Solr for Mobile Applications
- Multi-language Support
- Indexing and Analysis Techniques
- Advanced Topics in Lucene & Solr Development

Re: Including Tika-extracted docs in a document?

2010-03-29 Thread Lance Norskog
Look at the 'rootEntity' attribute in the DataImportHandler, both the
description and the examples:

http://wiki.apache.org/solr/DataImportHandler#Schema_for_the_data_config

It is active for all entities. It means that you can run several
operations in the outer entities, then have all of their fields come
together in an inner entity. You have to say 'rootEntity=false'
inwards until the last entity before your main document. (No, that is
not a clear explanation.)

This would let you create multi-valued fields, one value from each
input document. Otherwise, this is a hard one.
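
(A stab at a clearer version: with nested entities A > B > C, where C should produce the documents, mark A and B rootEntity="false"; DIH then emits one Solr document per C row, carrying along the fields gathered in A and B.)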

On Fri, Mar 26, 2010 at 10:37 PM, Don Werve d...@madwombat.com wrote:
 Is it possible to perform Tika extraction on multiple files that are indexed
 as part of a single document?




-- 
Lance Norskog
goks...@gmail.com


Re: Solr not returning all documents?

2010-03-29 Thread Lance Norskog
Yes, this should work. It will be very slow.

There is a special hack by which you can say sort=_docid_+asc (or
+desc). _docid_ is a magic field name that avoids sorting the results.
Pulling documents at row # 1 million should be only a little slower
than pulling documents at row #0.
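
For what it's worth, the whole export/import loop in SolrJ looks roughly like this. A sketch only, untested, against the 1.4-era client API; the core URLs are placeholders, and it copies stored fields only, so anything unstored cannot be carried over this way:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrInputDocument;

    public class Reindex {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer oldCore =
                new CommonsHttpSolrServer("http://localhost:8983/solr/old");
            CommonsHttpSolrServer newCore =
                new CommonsHttpSolrServer("http://localhost:8983/solr/new");
            int rows = 100;
            long numFound = Long.MAX_VALUE;
            for (int start = 0; start < numFound; start += rows) {
                SolrQuery q = new SolrQuery("*:*");
                q.setStart(start);
                q.setRows(rows);
                q.set("sort", "_docid_ asc"); // walk in index order, no real sort
                QueryResponse rsp = oldCore.query(q);
                numFound = rsp.getResults().getNumFound();
                for (SolrDocument d : rsp.getResults()) {
                    SolrInputDocument in = new SolrInputDocument();
                    for (String f : d.getFieldNames()) {
                        in.addField(f, d.getFieldValue(f)); // stored fields only
                    }
                    newCore.add(in);
                }
            }
            newCore.commit();
        }
    }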





-- 
Lance Norskog
goks...@gmail.com


Re: Experiences with SOLR-1797 ?

2010-03-29 Thread Lance Norskog
There was only one report of the problem.

I just read the patch and original source and it looks right; in
concurrent programming these are famous last words :)



-- 
Lance Norskog
goks...@gmail.com


Re: Apache Lucene EuroCon Call For Participation: Prague, Czech Republic May 20 & 21, 2010

2010-03-29 Thread mbohlig
Grant,

Were you going to send out the open-for-registration email as well?

-Mike






Re: SOLR-1316 How To Implement this autosuggest component ???

2010-03-29 Thread Andy
Reading through this thread and SOLR-1316, there seem to be a lot of different 
ways to implement auto-complete in Solr. I've seen mentions of:

EdgeNGrams
TermsComponent
Faceting
TST
Patricia Tries
RadixTree
DAWG

Which algorithm does SOLR-1316 implement? TST is one. There are others mentioned 
in the comments on SOLR-1316, such as Patricia Tries, RadixTree, DAWG. Are 
those implemented too?

Among all those methods is there a recommended one? What are the pros & cons?

Thanks.


Optimize after delta-import (DIH)

2010-03-29 Thread Blargy

According to the wiki: http://wiki.apache.org/solr/DataImportHandler#Commands
the delta-import command will accept the same clean, commit and optimize
parameters that the full-import command takes, but my index keeps saying
it's not optimized.

[java] INFO: [items] webapp=/solr path=/dataimport
params={optimize=true&clean=true&commit=true&command=delta-import} status=0
QTime=1 

Also, can someone explain to me exactly what the clean command does? The wiki
states: "Tells whether to clean up the index before the indexing is started",
but that's kind of vague. What does it actually do?

Thanks