Re: Solr replication and spellcheck data
This is not supported by the Java replication handler yet, but is planned for later: https://issues.apache.org/jira/browse/SOLR-866

On Wed, Jul 29, 2009 at 4:04 AM, Ian Sugar iansu...@gmail.com wrote: Hi, I would like to make use of the new replication mechanism [1] to set up a master-slaves configuration, but from quick reading and searching around I can't seem to find a way to replicate the spelling index in addition to the main search index. (We use the spellcheck component.) Is there a way to do it, or would we have to go the cron/script/rsync way [2]? Any pointers appreciated. I probably missed something! Ian [1] http://wiki.apache.org/solr/SolrReplication [2] http://wiki.apache.org/solr/CollectionDistribution -- Noble Paul | Principal Engineer | AOL | http://aol.com
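For reference, a minimal sketch of the master-side config from the SolrReplication wiki page; note that it covers the main index only, and confFiles ships configuration files, not the auxiliary spellcheck index (file names are illustrative):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <!-- configuration files to ship to slaves; not an index directory -->
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>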
Re: query in solr lucene
I tried using AND, but it even returned doc 3, which was not required. Hence my problem still persists... regards, Sushan

At 06:59 AM 7/29/2009, Avlesh Singh wrote: "No, phrase query would match docs 2 and 3. Sushan only wants doc 2 as I read it." Sorry, my bad. I did not read properly before replying. Cheers Avlesh

On Wed, Jul 29, 2009 at 3:23 AM, Erick Erickson erickerick...@gmail.com wrote: No, a phrase query would match docs 2 and 3. Sushan only wants doc 2 as I read it. You might have some joy with KeywordAnalyzer, which does not break the incoming stream up into tokens. You have to be careful, though, because it also won't fold case, so 'Hello' would not match 'hello'. Best Erick

On Tue, Jul 28, 2009 at 11:11 AM, Avlesh Singh avl...@gmail.com wrote: You should perform a PhraseQuery on the required field. Meaning, http://your-solr-host:port/your-core-path/select?q=fieldName:"Hello how are you sushan" would work for you. Cheers Avlesh

2009/7/28 Gérard Dupont ger.dup...@gmail.com: Hi Sushan, I'm not an expert on Solr, just a beginner, but it appears to me that you may have a default 'OR' combination of keywords, which would explain this behavior. Try modifying the configuration for an 'AND' combination. cheers

On Tue, Jul 28, 2009 at 16:49, Sushan Rungta s...@clickindia.com wrote: I am extremely sorry for responding late, as I was ill these past few days. My problem is explained below with an example. I have three documents with the following contents: 1. "Hello how are you" 2. "Hello how are you sushan" 3. "Hello how are you sushan. I am fine." When I search for the query "Hello how are you sushan", I should only get document 2 in my result. I hope this gives you all a better insight into my problem. regards, Sushan Rungta -- Gérard Dupont Information Processing Control and Cognition (IPCC) - EADS DS http://weblab-project.org Document Learning team - LITIS Laboratory
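A sketch of the kind of field type Erick alludes to, using KeywordTokenizerFactory (the Solr counterpart of Lucene's KeywordAnalyzer) so the whole field value stays one token, with a lowercase filter added to address the case-folding caveat (the type name is illustrative):

<fieldType name="exact_text" class="solr.TextField">
  <analyzer>
    <!-- the entire field value becomes a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- fold case so 'Hello' matches 'hello' -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>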
Re: highlighting performance
Hey Matt: I have been facing the same issue. I have a text field that I highlight along with other fields (maybe 10 other fields). But if I enable highlighting on this text field, which contains a large number of characters/words (more than 100,000 characters), highlighting performance suffers. Queries return in about 15-20 seconds with this field enabled in highlighting, compared to less than a second without it. I did try termVectors="true", but I did not see any performance gain either. Just wondering if you were able to solve your issue or tweak the performance in any other way. BTW, I use Solr 1.3. ~Ravi

goodieboy wrote: Thanks Otis. I added termVectors="true" for those fields, but there isn't a noticeable difference. So, just to be a little more clear, the dynamic fields I'm adding... there might be hundreds. Do you see this as a problem? Thanks, Matt

On Fri, May 15, 2009 at 7:48 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Matt, I believe indexing those fields that you will use for highlighting with term vectors enabled will make things faster (and your index a bit bigger). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----- From: Matt Mitchell goodie...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, May 15, 2009 5:08:23 PM Subject: highlighting performance Hi, I'm experimenting with highlighting and am noticing a big drop in performance with my setup. I have documents that use quite a few dynamic fields (20-30). The fields are multiValued stored/indexed text fields, each with a few paragraphs worth of text. My hl.fl param is set to *_t. What kinds of things can I tweak to make this faster? Is it because I'm highlighting so many different fields? Thanks, Matt

Quoted from: http://www.nabble.com/highlighting-performance-tp23567323p23713406.html
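For reference, enabling term vectors on a field in schema.xml looks like this (the field name is illustrative); positions and offsets are what highlighters can exploit to avoid re-analyzing the stored text:

<field name="body_t" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>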
Re: Is there a multi-shard optimize message?
On Wed, Jul 29, 2009 at 2:48 AM, Phillip Farber pfar...@umich.edu wrote: Normally to optimize an index you POST <optimize/> to /solr/update. Is there any way to POST an optimize message to one instance and have it propagate to all shards, sort of like a select? /solr-shard-1/select?q=dog...&shards=shard-1,shard-2

No, you'll need to send optimize to each host separately. -- Regards, Shalin Shekhar Mangar.
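A quick way to do that from a shell, assuming two shard hosts on the default port (hostnames are illustrative):

for host in solr-shard-1 solr-shard-2; do
  curl http://$host:8983/solr/update -H "Content-Type: text/xml" --data-binary '<optimize/>'
done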
refering/alias other Solr documents
Hi all: Is there anything in Solr that will allow documents to refer to each other? In other words, if a search for "abc" matches document 1, I should be able to return document 2 even though document 2 does not have any fields matching "abc". Here is the scenario with some more details. Solr version: 1.3. Scenario: 1) Solr document 1 with some field title=abc, and Solr document 2 with its own data. 2) User searches for "abc" and gets document 1, as it matches on the title field. Expected results: when the user searches for "abc" he should also get document 2 along with document 1. I understand one way of doing this is to make sure document 2 has all the contents of document 1. But this introduces the issue of keeping the two documents (and hence their Solr index entries) in sync with each other. I think I am looking for a mechanism like this: document 1 refers to document 2 and document 3; hence, whenever document 1 is part of the search results, document 2 and document 3 will also be returned as search results. I may be totally off on this expectation, but I am trying to solve a "contains" problem where, let's say, a book (represented as document 1 in Solr) contains chapters (represented by documents 2, 3, 4...) in Solr. I hope this is not too confusing ;) TIA ~Ravi Gidwani
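One client-side workaround, sketched with SolrJ 1.3 (the refers_to field and URL are illustrative assumptions, not an existing Solr feature): store the ids of related documents in each document, then issue a follow-up query for them.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class RelatedDocs {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    // first pass: the user's query
    QueryResponse rsp = server.query(new SolrQuery("title:abc"));
    for (SolrDocument doc : rsp.getResults()) {
      String id = (String) doc.getFieldValue("id");
      // second pass: fetch documents that declare this id in a refers_to field
      QueryResponse related = server.query(new SolrQuery("refers_to:" + id));
      System.out.println(id + " has " + related.getResults().getNumFound() + " related docs");
    }
  }
}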
Boosting ('bq') on multi-valued fields
Hey, I have a field defined as such:

<field name="site_id" type="string" indexed="true" stored="false" multiValued="true"/>

with the string type defined as:

<fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

When I try using some query-time boost parameters via bq on values of this field, it seems to behave strangely for documents that actually have multiple values: if I boost a particular value (site_id:5^1.1), it seems like all the cases where this field is populated with multiple values (i.e. a document with field value 5|6) do not get boosted at all. I verified this using debugQuery=true&explainOther=doc_id:document_with_multiple_values. Is this a known issue/bug? Any workarounds? (I'm using a nightly Solr build from a few months back.) Thanks, -Chak
Re: update some index documents after indexing process is done with DIH
On Tue, Jul 28, 2009 at 5:17 PM, Marc Sturlese marc.sturl...@gmail.com wrote: That really sounds like the best way to reach my goal. How could I invoke a listener from the newSearcher? Would it be something like:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">solr</str><str name="start">0</str><str name="rows">10</str></lst>
    <lst><str name="q">rocks</str><str name="start">0</str><str name="rows">10</str></lst>
    <lst><str name="q">static newSearcher warming query from solrconfig.xml</str></lst>
  </arr>
</listener>
<listener event="newSearcher" class="solr.MyCustomListener"/>

And MyCustomListener would be the class that opens the reader:

RefCounted<SolrIndexSearcher> searchHolder = null;
try {
  searchHolder = dataImporter.getCore().getSearcher();
  IndexReader reader = searchHolder.get().getReader();
  // Here I iterate over the reader doing document modifications
} catch (Exception ex) {
  LOG.info("error");
} finally {
  if (searchHolder != null) searchHolder.decref();
}

you may not be able to access the DIH API from a newSearcher event. But the API would give you the searcher directly as a method parameter.

Finally, to access documents and add fields to some of them, I have thought of using the SolrDocument classes. Can you please point me to where something similar is done in the Solr source (I mean creation of SolrDocuments and conversion of them to proper Lucene documents)? Does this way of reaching the goal make sense? Thanks in advance

Noble Paul നോബിള് नोब्ळ्-2 wrote: when a core is reloaded the event fired is firstSearcher. newSearcher is fired when a commit happens

On Tue, Jul 28, 2009 at 4:19 PM, Marc Sturlese marc.sturl...@gmail.com wrote: Ok, but if I handle it in a newSearcher listener it will be executed every time I reload a core, won't it? The thing is that I want to use an IndexReader to load into a HashMap some doc fields of the index and, depending on the values of some field docs, modify other docs. It's very memory consuming (I have tested it with a simple Lucene script). That's why I wanted to do it just after the indexing process. My ideal case would be to do it in the commit function of DirectUpdateHandler2.java, just before writer.optimize(cmd.maxOptimizeSegments) is executed. But I don't want to mess with that code... so I am trying to find the best way to do this as a plugin instead of a hack, if possible. Thanks in advance

Noble Paul നോബിള് नोब्ळ्-2 wrote: It is best handled as a 'newSearcher' listener in solrconfig.xml. onImportEnd is invoked before committing

On Tue, Jul 28, 2009 at 3:13 PM, Marc Sturlese marc.sturl...@gmail.com wrote: Hey there, I would like to be able to do something like: after the indexing process is done with DIH, I would like to open an IndexReader, iterate over all docs, modify some of them depending on others, and delete some others. I can easily do this coding directly against Lucene, but would like to know if there's a way to do it with Solr using the SolrDocument or SolrInputDocument classes. I have thought of using SolrJ or the DIH listener onImportEnd, but am not sure if I can get an IndexReader there. Any advice? Thanks in advance
-- Noble Paul | Principal Engineer | AOL | http://aol.com
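A minimal sketch of such a custom listener, assuming the SolrEventListener interface as it exists in Solr 1.3/1.4 (the class name is illustrative); as noted above, the searcher arrives as a method parameter, so the DIH API is not needed:

import org.apache.lucene.index.IndexReader;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.SolrEventListener;
import org.apache.solr.search.SolrIndexSearcher;

public class MyCustomListener implements SolrEventListener {
  public void init(NamedList args) {}

  public void newSearcher(SolrIndexSearcher newSearcher, SolrIndexSearcher currentSearcher) {
    // the new searcher is handed in directly; no RefCounted bookkeeping needed here
    IndexReader reader = newSearcher.getReader();
    // iterate over the reader and inspect documents here
  }

  public void postCommit() {}
}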
Re: FieldCollapsing: Two response elements returned?
I've applied the latest collapse-field-related patch (patch-3) and it doesn't work. Does anyone know how I can get only the collapsed response?

29-jul-2009 11:05:21 org.apache.solr.common.SolrException log
GRAVE: java.lang.ClassCastException: org.apache.solr.handler.component.CollapseComponent cannot be cast to org.apache.solr.request.SolrRequestHandler
at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:150)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:539)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:381)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:241)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:115)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3800)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4450)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:987)
at org.apache.catalina.startup.HostConfig.deployDirectories(HostConfig.java:909)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:495)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1206)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:314)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
at org.apache.catalina.core.StandardService.start(StandardService.java:516)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
at org.apache.catalina.startup.Catalina.start(Catalina.java:583)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)

2009/7/28 Marc Sturlese marc.sturl...@gmail.com: That's probably because you are using both the CollapseComponent and the QueryComponent. I think the two or three latest patches allow full replacement of the QueryComponent. You should just replace:

<searchComponent name="query" class="org.apache.solr.handler.component.QueryComponent"/>

with:

<searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent"/>

This will sort out your problem and make response times faster.

Jay Hill wrote: I'm doing some testing with field collapsing, and early results look good. One thing seems odd to me, however.
I would expect to get back one block of results, but I get two - the first one contains the collapsed results, the second one contains the full non-collapsed results:

<result name="response" numFound="11" start="0"> ... </result>
<result name="response" numFound="62" start="0"> ... </result>

This seems somewhat confusing. Is this intended or is this a bug? Thanks, -Jay

-- Lici
solr/home in web.xml relative to web server home
Hi all, the environment variable (env-entry) in web.xml to configure solr/home is relative to the web server's working directory. I find this unusual, as all the servlet paths are relative to the web application's directory (the webapp context, that is). So I specified solr/home relative to the web app dir as well, at first. I think it makes deployment in an unknown environment, or in different environments using a simple war, more complex than it needs to be. If a webapp-relative path inside the war file could be used, the configuration of Solr (and cores) could be included in the war file completely, with no outside dependency - except, of course, for the data directory, if that is to go someplace else. (In my case, I want to deliver the Solr web application including a custom entity processor, which is why I want to include the Solr war as part of my release cycle. It is easier to deliver that to the system administration than to provide them with partial packages they have to install into an already installed war, imho.) Am I the only one who has run into this? Thanks for any input! Chantal -- Chantal Ackermann
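For reference, the env-entry in question looks like this in Solr's web.xml (the path value is illustrative):

<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>/path/to/solr/home</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>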
Re: highlighting performance
Just an FYI, Lucene 2.9 has FastVectorHighlighter: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/all/org/apache/lucene/search/vectorhighlight/package-summary.html

Features:
* fast for large docs
* support N-gram fields
* support phrase-unit highlighting with slops
* need Java 1.5
* highlight fields need to be TermVector.WITH_POSITIONS_OFFSETS
* take into account query boost to score fragments
* support colored highlight tags
* pluggable FragListBuilder
* pluggable FragmentsBuilder

Unfortunately, Solr hasn't incorporated it yet: https://issues.apache.org/jira/browse/SOLR-1268

Koji

ravi.gidwani wrote: Hey Matt: I have been facing the same issue. I have a text field that I highlight along with other fields (maybe 10 other fields). But if I enable highlighting on this text field, which contains a large number of characters/words (more than 100,000 characters), highlighting performance suffers. Queries return in about 15-20 seconds with this field enabled in highlighting, compared to less than a second without it. I did try termVectors="true", but I did not see any performance gain either. Just wondering if you were able to solve your issue or tweak the performance in any other way. BTW, I use Solr 1.3. ~Ravi
Re: debugQuery=true issue
Hi, thanks for your response. I'm still developing, so the schema is still in flux, which I guess explains it. Oh, and regarding the NPE: I updated my checkout and recompiled and now it's gone, so I guess it was fixed somewhere between revisions 787997 and 798482. Regards, gwk

Robert Petersen wrote: I had something similar happen where an optimize fixed an odd sorting/scoring problem. As I understand it, the optimize will clear out index 'lint' from old schemas/documents and thus could affect result scores, since all the term vectors (or something similar) are refreshed, etc.
Re: HTTP Status 500 - java.lang.RuntimeException: Can't find resource 'solrconfig.xml'
As Solr said in the log, it couldn't find solrconfig.xml in the classpath, solr.solr.home, or the cwd. My guess is that the relative path you set for solr.solr.home was incorrect. Why don't you try:

solr.solr.home=/home/huenzhao/search/tomcat6/bin/solr

instead of:

solr.solr.home=home/huenzhao/search/tomcat6/bin/solr

Koji

huenzhao wrote: Hi all, I used Ubuntu 8.10 as the Solr server OS, and set solr.solr.home=home/huenzhao/search/tomcat6/bin/solr. When I run Tomcat (the same Tomcat and Solr running on Windows XP have no problem), I get this error:

HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in null - java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or 'home/huenzhao/search/tomcat6/bin/solr/conf/', cwd=/home/huenzhao/search/tomcat6/bin
at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:194)
at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:162)
at org.apache.solr.core.Config.<init>(Config.java:100)
at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:113)
at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:70)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:69)
at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3696)
at ……

Does anybody know what to do? enzhao...@gmail.com
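One common way to set an absolute home for Tomcat is via JAVA_OPTS, e.g. in catalina.sh or an init script (the path is the one from this thread):

export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/home/huenzhao/search/tomcat6/bin/solr"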
Re: solr/home in web.xml relative to web server home
On Wed, Jul 29, 2009 at 2:42 PM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: Hi all, the environment variable (env-entry) in web.xml to configure solr/home is relative to the web server's working directory. I find this unusual, as all the servlet paths are relative to the web application's directory (the webapp context, that is). So I specified solr/home relative to the web app dir as well, at first. I think it makes deployment in an unknown environment, or in different environments using a simple war, more complex than it needs to be. If a webapp-relative path inside the war file could be used, the configuration of Solr (and cores) could be included in the war file completely, with no outside dependency - except, of course, for the data directory, if that is to go someplace else. (In my case, I want to deliver the Solr web application including a custom entity processor, which is why I want to include the Solr war as part of my release cycle. It is easier to deliver that to the system administration than to provide them with partial packages they have to install into an already installed war, imho.)

You don't need to create a custom war for that. You can package the EntityProcessor into a separate jar and add it to the solr_home/lib directory. -- Regards, Shalin Shekhar Mangar.
Relevant results with DisMaxRequestHandler
Hello, I have noticed several strange behaviors with queries. I would like to share an example with you, so maybe you can explain to me what is going wrong. Using the following query:

http://localhost:8983/solr/others/select/?debugQuery=true&q=anna%20lewis&rows=20&start=0&fl=*&qt=dismax

I get back around 100 results. Here are the first two:

<doc>
  <str name="id">Person:151</str>
  <str name="name_s">Victoria Davisson</str>
</doc>
<doc>
  <str name="id">Person:37</str>
  <str name="name_s">Anna Lewis</str>
</doc>

And the related debug output:

57.998047 = (MATCH) sum of:
  0.048290744 = (MATCH) sum of:
    0.024546575 = (MATCH) max plus 0.01 times others of:
      0.024546575 = (MATCH) weight(text:anna^0.5 in 64288), product of:
        0.027395602 = queryWeight(text:anna^0.5), product of:
          0.5 = boost
          5.734427 = idf(docFreq=564, numDocs=30400)
          0.009554783 = queryNorm
        0.8960042 = (MATCH) fieldWeight(text:anna in 64288), product of:
          1.0 = tf(termFreq(text:anna)=1)
          5.734427 = idf(docFreq=564, numDocs=30400)
          0.15625 = fieldNorm(field=text, doc=64288)
    0.02374417 = (MATCH) max plus 0.01 times others of:
      0.02374417 = (MATCH) weight(text:lewi^0.5 in 64288), product of:
        0.026944114 = queryWeight(text:lewi^0.5), product of:
          0.5 = boost
          5.6399217 = idf(docFreq=620, numDocs=30400)
          0.009554783 = queryNorm
        0.88123775 = (MATCH) fieldWeight(text:lewi in 64288), product of:
          1.0 = tf(termFreq(text:lewi)=1)
          5.6399217 = idf(docFreq=620, numDocs=30400)
          0.15625 = fieldNorm(field=text, doc=64288)
  57.949757 = (MATCH) FunctionQuery(ord(name_s)), product of:
    1213.0 = ord(name_s)=1213
    5.0 = boost
    0.009554783 = queryNorm

5.006892 = (MATCH) sum of:
  0.038405567 = (MATCH) sum of:
    0.021955125 = (MATCH) max plus 0.01 times others of:
      0.021955125 = (MATCH) weight(text:anna^0.5 in 62632), product of:
        0.027395602 = queryWeight(text:anna^0.5), product of:
          0.5 = boost
          5.734427 = idf(docFreq=564, numDocs=30400)
          0.009554783 = queryNorm
        0.80141056 = (MATCH) fieldWeight(text:anna in 62632), product of:
          2.236068 = tf(termFreq(text:anna)=5)
          5.734427 = idf(docFreq=564, numDocs=30400)
          0.0625 = fieldNorm(field=text, doc=62632)
    0.016450444 = (MATCH) max plus 0.01 times others of:
      0.016450444 = (MATCH) weight(text:lewi^0.5 in 62632), product of:
        0.026944114 = queryWeight(text:lewi^0.5), product of:
          0.5 = boost
          5.6399217 = idf(docFreq=620, numDocs=30400)
          0.009554783 = queryNorm
        0.61053944 = (MATCH) fieldWeight(text:lewi in 62632), product of:
          1.7320508 = tf(termFreq(text:lewi)=3)
          5.6399217 = idf(docFreq=620, numDocs=30400)
          0.0625 = fieldNorm(field=text, doc=62632)
  4.968487 = (MATCH) FunctionQuery(ord(name_s)), product of:
    104.0 = ord(name_s)=104
    5.0 = boost
    0.009554783 = queryNorm

I'm using a simple boost function:

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">text^0.5 name_s^5.0</str>
    <str name="pf">name_s^5.0</str>
    <str name="bf">name_s^5.0</str>
  </lst>
</requestHandler>

Can anyone explain to me why the first result is on top (the query is 'anna lewis') with a huge weight, even though nothing in it is related (the weight seems to come from the name_s field)? A second, general question: is it possible to boost a field if the query matches the content of a field exactly? Thank you! Vincent
Re: facet.prefix question
Licinio Fernández Maurelo wrote: I'm trying to do some filtering on the count list retrieved by Solr when doing a faceting query. I'm wondering how I can use facet.prefix to get something like this:

Query: facet.field=foo&facet.prefix=A OR B

Response:

<lst name="facet_fields">
  <lst name="foo">
    <int name="A">12560</int>
    <int name="A*">5440</int>
    <int name="B**">2357</int>
    ...
  </lst>
</lst>

How can I achieve this behaviour? Best regards

You cannot set a query as the facet.prefix parameter. facet.prefix should be a prefix *string* of terms in the index, and you can set only one at a time. So I think you need to send two requests to get what you want:

...facet.field=foo&facet.prefix=A
...facet.field=foo&facet.prefix=B

Koji
Question about formatting the results returned from Solr
Hi all, not sure how good my title is, but here is a (hopefully) better explanation of what I mean. I am indexing a set of articles from a DB. Each article has an author. The author is saved in the DB as an author ID, which is a number. There is another table in the DB with more relevant information about the author. Basically it has columns like: id, firstname, lastname, email, userid. I set up the DIH so that it returns the userid, and it works fine:

<arr name="author">
  <str>jdoe</str>
  <str>msmith</str>
</arr>

Would it be possible to return all of the information about the author (first name, ...) as a subset of the results above? Here is what I mean:

<arr name="author">
  <arr name="jdoe">
    <str name="firstName">John</str>
    <str name="lastName">Doe</str>
    <str name="email">j...@doe.com</str>
  </arr>
  ...
</arr>

Something similar to that, at least... Not sure how descriptive I was, but any pointers would be highly appreciated. Cheers
Getting Tika to work in Solr 1.4 nightly
I am working with a Solr 1.4 nightly and am running it on a Windows machine. Solr is running from the example folder that was installed from the zip file. The only alteration I have made to this default installation is to add a simple Word document to the exampledocs folder. I am trying to get Tika to work in Solr. When I run tika-0.3.jar against a Word document, it outputs to the screen in XML format. I am not able to get Solr to run Tika and index the information in the sample Word document. I have looked at the following resources: the Solr mailing list archive (although I could have missed something here); the documentation and Getting Started pages on the Apache Tika website; and an article called Content Extraction with Tika at this website:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika

This article talks about using curl. Is curl necessary, or does Solr have something already configured to do the same as curl? I have modified the solrconfig.xml file to include the request handler for the ExtractingRequestHandler. I used the modification that was commented out in the solrconfig.xml file. Here it is for reference:

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="ext.map.Last-Modified">last_modified</str>
    <bool name="ext.ignore.und.fl">true</bool>
  </lst>
</requestHandler>

Is there some modification to this code that I need to make? Can someone please direct me to a source that can help me get this to work? Kevin Miller
Re: FieldCollapsing: Two response elements returned?
My last mail is wrong. Sorry.

On 29 July 2009 at 11:10, Licinio Fernández Maurelo licinio.fernan...@gmail.com wrote: I've applied the latest collapse-field-related patch (patch-3) and it doesn't work. Does anyone know how I can get only the collapsed response? ...

-- Lici
Re: Relevant results with DisMaxRequestHandler
On Jul 29, 2009, at 6:55 AM, Vincent Pérès wrote: Using the following query: http://localhost:8983/solr/others/select/?debugQuery=true&q=anna%20lewis&rows=20&start=0&fl=*&qt=dismax I get back around 100 results. ... Can anyone explain to me why the first result is on top (the query is 'anna lewis') with a huge weight, even though nothing in it is related (the weight seems to come from the name_s field)?

The ord function perhaps isn't doing what you want. It is returning the term position, and thus it appears "Anna Lewis" is the 104th name_s value in your index lexicographically. And of course "Victoria Davisson" is much further down, at the 1213th position. Maybe you want rord instead? But probably not...

A second general question... is it possible to boost a field if the query matches the content of a field exactly?
You can set dismax's qs (query slop) factor, which will boost documents where the user's terms are closer together (within the number of term positions specified). Erik
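For reference, a sketch of how the slop parameters sit in a dismax handler config (values are illustrative): ps applies to the automatic pf phrase boost, while qs applies to explicit phrase queries the user types.

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">text^0.5 name_s^5.0</str>
    <str name="pf">name_s^5.0</str>
    <!-- phrase slop for the pf boost -->
    <str name="ps">0</str>
    <!-- slop applied to phrase queries the user types -->
    <str name="qs">3</str>
  </lst>
</requestHandler>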
RE: Boosting ('bq') on multi-valued fields
Hey, I have a field defined as such: <field name="site_id" type="string" indexed="true" stored="false" multiValued="true"/> ... If I boost a particular value (site_id:5^1.1), it seems like all the cases where this field is populated with multiple values (i.e. a document with field value 5|6) do not get boosted at all. ... Is this a known issue/bug? Any workarounds?

There is no tokenization on 'string' fields, so a query for 5 does not match a doc with a value of 5|6 for this field. You could try using field type 'text' for this and see what you get. You may need to customize it to use the StandardAnalyzer or WordDelimiterFilterFactory to get the right behavior. Using the analysis tool in the Solr admin UI to experiment will probably be helpful. -Ken
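A sketch of a text field type along the lines Ken suggests, where WordDelimiterFilterFactory would split a raw value like 5|6 on the punctuation into the separate tokens 5 and 6 (the type name is illustrative):

<fieldType name="text_delim" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splits on non-alphanumeric characters such as '|' -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
  </analyzer>
</fieldType>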
Re: update some index documents after indexing process is done with DIH
From the newSearcher(..) method of a custom event listener that extends AbstractSolrEventListener, I can access the SolrIndexSearcher and all core properties, but I can't get a SolrIndexWriter. Do you know how I can get a SolrIndexWriter from there? That way I would be able to modify the documents (I need to modify them depending on the values of other documents, which is why I can't do it with a DIH delta-import). Thanks in advance
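Since Solr 1.3/1.4 has no partial document update, the usual route is to re-add the whole modified document through the update interface rather than obtaining the internal writer. A minimal SolrJ sketch (URL and field names are illustrative):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ReAddExample {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "123");               // same unique key overwrites the old document
    doc.addField("my_field", "new value");   // the modified field
    server.add(doc);
    server.commit();
  }
}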
RE: search suggest
To do a proper search-suggest feature, you have to index all the queries your system gets and search it with wildcards for matches on what the user has typed so far, for each user keystroke in the search box... usually with some timer logic to wait for a small hesitation in their typing.

-Original Message- From: Jack Bates [mailto:ms...@freezone.co.uk] Sent: Tuesday, July 28, 2009 10:54 AM To: solr-user@lucene.apache.org Subject: search suggest

How can I use Solr to make search suggestions? I'm thinking Google-style suggestions, which suggest more refined queries - vs. Freebase-style suggestions, which suggest top hits. I've been looking at the query params, http://wiki.apache.org/solr/StandardRequestHandler - and searching for "solr suggest" - but haven't figured out how to get search suggestions from Solr.
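One commonly used alternative to wildcard queries for this is facet.prefix against a field of indexed queries (the field name is illustrative). As the user types "ip", for instance:

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=query_text&facet.prefix=ip&facet.limit=10

Each returned facet value is a candidate suggestion, and its count can be used for ranking.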
Wildcard and boosting
Hey now! I do index-time boosting for my fields and just discovered that when searching with a trailing wildcard, the boosting is ignored. Will my boosting work with a wildcard if I do it at query time? And if so, is there a big performance difference? Is there some other method I can use to preserve my boosting? I do not need highlighting. Thanks, Jon Helgi
RE: refering/alias other Solr documents
Hi Ravi, this may help: http://wiki.apache.org/solr/HierarchicalFaceting

Steve

-Original Message- From: ravi.gidwani [mailto:ravi.gidw...@gmail.com] Sent: Wednesday, July 29, 2009 3:24 AM To: solr-user@lucene.apache.org Subject: refering/alias other Solr documents
Re: Getting Tika to work in Solr 1.4 nightly
Hi Kevin, the parameter names have changed in the latest Solr 1.4 builds... please see http://wiki.apache.org/solr/ExtractingRequestHandler

-Yonik
http://www.lucidimagination.com

On Wed, Jul 29, 2009 at 10:17 AM, Kevin Miller kevin.mil...@oktax.state.ok.us wrote: I am working with a Solr 1.4 nightly and am running it on a Windows machine. ... Can someone please direct me to a source that can help me get this to work?
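For reference, a minimal curl invocation against the extraction handler using the current parameter names documented on that wiki page (the file name and id are illustrative):

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@sample.doc"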
Multi select faceting
Hi, we're using Lucid Imagination's LucidWorks Solr 1.3, and we have a requirement to implement multiple-select faceting, where the facet values show up as checkboxes and, despite checked options, all of the options continue to persist with counts. The best example I found is the search on Lucid Imagination's site: http://www.lucidimagination.com/search/ It appears the Solr 1.4 release has support for doing this with filter tagging (http://wiki.apache.org/solr/SimpleFacetParameters#head-f277d409b221b407d9c5430f552bf40ee6185c4c), but I was wondering if there is another way to accomplish this in 1.3? Mike
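For reference, the Solr 1.4 syntax referenced above tags a filter and excludes it when computing that facet (the field and tag names are illustrative):

...&fq={!tag=siteTag}site_id:5&facet=true&facet.field={!ex=siteTag}site_id

With the filter excluded, site_id counts are computed as if the checkbox filter were not applied, which is what keeps all options visible with counts.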
query and analyzers
Hi, what analyzer, tokenizer, or filter factory would I need to use to get wildcard matching to match where: Value: XYZ123, Query: XYZ1*? I have been messing with solr.WordDelimiterFilterFactory splitOnNumerics and preserveOriginal in both the index analyzer and the query analyzer. I also noticed it is different when I use quotes in the query - a phrase search. Unfortunately, I'm missing something, as I can't get it to work. Tim
Re: query and analyzers
What analyzer, tokenizer, or filter factory would I need to use to get wildcard matching to match where: Value: XYZ123, Query: XYZ1*?

StandardAnalyzer, WhitespaceAnalyzer.

I have been messing with solr.WordDelimiterFilterFactory splitOnNumerics and preserveOriginal in both the analyzer and the query. I also noticed it is different when I use quotes in the query - a phrase search. Unfortunately, I'm missing something, as I can't get it to work.

But I think your problem is not the analyzer. I guess there is a lowercase filter in your analyzer, and wildcard queries are not analyzed. Try querying xyz1*
Re: query in solr lucene
You may index your data using a delimiter, like $my-field-content$. While searching, perform a phrase query with the leading and trailing $ appended to the query string. Cheers Avlesh

On Wed, Jul 29, 2009 at 12:04 PM, Sushan Rungta s...@clickindia.com wrote: I tried using AND, but it even provided me doc 3 which was not required. Hence my problem still persists... regards, Sushan
Re: search suggest
Autosuggest is something that would be very useful to build into Solr, as many search projects require it. I'd recommend indexing relevant terms/phrases into a ternary search tree, which is compact and performant. Using a wildcard query will likely not be as fast as a ternary tree, and I'm not sure how phrases would be handled? http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysi

It would be good to separate out the TernaryTree from analysis/compound into Lucene core, or into its own contrib. Also see http://issues.apache.org/jira/browse/LUCENE-625 which improves relevancy using click-through rates. I'll open an issue in Solr to get this one going.

On Wed, Jul 29, 2009 at 9:12 AM, Robert Petersen rober...@buy.com wrote: To do a proper search-suggest feature, you have to index all the queries your system gets and search it with wildcards for matches on what the user has typed so far... 
Re: search suggest
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/compound/hyphenation/TernaryTree.html

On Wed, Jul 29, 2009 at 12:08 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Autosuggest is something that would be very useful to build into Solr, as many search projects require it. ...
Visualizing Semantic Journal Space (large scale) using full-text
I thought the Lucene and Solr communities would find this interesting: my collaborators and I have used LuSql, Lucene, and Semantic Vectors to visualize a semantic journal space (kind of like 'Maps of Science') for a large (5.7 million articles) journal article collection, using only the full text (no metadata). For more info and a howto: http://zzzoot.blogspot.com/2009/07/project-torngat-building-large-scale.html Glen Newton
RE: query and analyzers
This was the definition I was last working with (I've been playing with setting the various parameters).

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="0" catenateWords="1" catenateNumbers="1"
            catenateAll="1" splitOnCaseChange="0" splitOnNumerics="0"
            preserveOriginal="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
            generateNumberParts="0" catenateWords="1" catenateNumbers="1"
            catenateAll="1" splitOnCaseChange="0" splitOnNumerics="0"
            preserveOriginal="1"/>
  </analyzer>
</fieldType>

-Original Message- From: AHMET ARSLAN [mailto:iori...@yahoo.com] Sent: Wednesday, July 29, 2009 11:55 AM To: solr-user@lucene.apache.org Subject: Re: query and analyzers

What analyzer, tokenizer, filter factory would I need to use to get wildcard matching to match where: Value: XYZ123 Query: XYZ1*

StandardAnalyzer, WhitespaceAnalyzer.

I have been messing with solr.WordDelimiterFilterFactory splitOnNumerics and preserveOriginal in both the analyzer and the query. I also noticed it is different when I use quotes in the query - phrase search. Unfortunately, I'm missing something, as I can't get it to work.

But I think your problem is not the analyzer. I guess there is a lowercase filter in your analyzer, and wildcard queries are not analyzed. Try querying xyz1*
RE: query and analyzers
In order to match (query) XYZ1* to (document) XYZ123 you do not need WordDelimiterFilterFactory. You need a tokenizer that recognizes XYZ123 as one token, and WhitespaceTokenizer is one of them. As I can see from the fieldType named text_ws, you want to use WhitespaceTokenizerFactory, and there is no LowercaseFilter in it, so there is no problem there. Just remove the WordDelimiterFilterFactory (from both the query and index analyzers) and it should work. Ahmet
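For reference, the stripped-down field type Ahmet describes would look something like this (a sketch in the same schema.xml style; with the filter gone, the index and query analyzers are identical, so a single analyzer element suffices):

<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>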
RE: query and analyzers
That did it, thanks! I thought that was how it should work, but I guess I somehow got out of sync at one point, which led me to dive deeper into it than I needed to. -Original Message- From: AHMET ARSLAN [mailto:iori...@yahoo.com] Sent: Wednesday, July 29, 2009 12:52 PM To: solr-user@lucene.apache.org Subject: RE: query and analyzers In order to match (query) XYZ1* to (document) XYZ123 you do not need WordDelimiterFilterFactory. ...
Re: search suggest
Also watch out that you have a good stopwords list, otherwise the suggestions won't be helpful for the user.

Jack Bates wrote: How can I use Solr to make search suggestions? I'm thinking Google-style suggestions, which suggest more refined queries - vs. Freebase-style suggestions, which suggest top hits. ...

-- manuel aldana ald...@gmx.de software-engineering blog: http://www.aldana-online.de
RE: search suggest
Simple-minded autosuggest can just not tokenize the phrases at all, so the wildcards simply complete whatever the user has typed so far, including spaces. Upon encountering a space, though, autosuggest should wait to make more suggestions until the user has typed at least a couple of letters of the next word. That is the way I did it last time, using a different search engine. It'd sure be kewl if this became a core feature of Solr! I like the idea of the tree approach, sounds much faster. The root is the fewest letters to start suggestions and the leaves are the full phrases?

-Original Message- From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] Sent: Wednesday, July 29, 2009 12:09 PM To: solr-user@lucene.apache.org Subject: Re: search suggest

Autosuggest is something that would be very useful to build into Solr, as many search projects require it. I'd recommend indexing relevant terms/phrases into a Ternary Search Tree, which is compact and performant. ...
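A short SolrJ sketch of the untokenized approach Robert describes (the field name "suggest", the server URL, and the sample input are made up for illustration): past user queries are indexed into a string field, and each keystroke runs a prefix query against it, with spaces escaped so the whole phrase stays one term.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class SimpleSuggest {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // The user has typed "ipod na" so far; escape the space so the
        // wildcard applies to the whole untokenized phrase.
        String typedSoFar = "ipod na";
        String escaped = typedSoFar.replace(" ", "\\ ");

        SolrQuery query = new SolrQuery("suggest:" + escaped + "*");
        query.setRows(10); // top 10 completions
        System.out.println(server.query(query).getResults());
    }
}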
Re: Indexing TIKA extracted text. Are there some issues?
Sure. The java command I use with TIKA to extract text from a URL is:

java -jar tika-0.3-standalone.jar -t $url

I have also attached the screenshots of the web page, the post documents produced in the two different ways (Perl & Tika) for that web page, and the screenshots of the search result for a string contained in that web page. The index in each case contains just this one URL. To keep everything else identical, I used the same instance for creating the index in each case. First I posted the Tika document, checked for the results, emptied the index, posted the Perl document, and checked the results.

Debug query for Tika:

<str name="parsedquery">+DisjunctionMaxQuery((urltext:高通公司展现了海量的优质多媒体内容能^2.0 | title:高通公司展现了海量的优质多媒体内容能^2.0 | content_china:高通 通公 公司 司展 展现 现了 了海 海量 量的 的优 优质 质多 多媒 媒体 体内 内容 容能)~0.01) ()</str>

Debug query for Perl:

<str name="parsedquery">+DisjunctionMaxQuery((urltext:高通公司展现了海量的优质多媒体内容能^2.0 | title:高通公司展现了海量的优质多媒体内容能^2.0 | content_china:高通 通公 公司 司展 展现 现了 了海 海量 量的 的优 优质 质多 多媒 媒体 体内 内容 容能)~0.01) ()</str>

The screenshots http://www.nabble.com/file/p24728917/Tika%2BIssue.docx Tika+Issue.docx
Perl extracted doc http://www.nabble.com/file/p24728917/china.perl.xml china.perl.xml
Tika extracted doc http://www.nabble.com/file/p24728917/china.tika.xml china.tika.xml

Grant Ingersoll-6 wrote: Hmm, looks very much like an encoding problem. Can you post a sample showing it, along with the commands you invoked? Thanks, Grant

On Jul 28, 2009, at 6:14 PM, ashokc wrote: I am finding that the search results based on indexing Tika-extracted text are very different from results based on indexing the text extracted via other means. This shows up, for example, with a Chinese web site that I am trying to index. I created the documents (for posting to SOLR) in two ways. The source text of the web pages is full of HTML entities like &#12345; with some English characters mixed in.

(a) Simple text extraction from the page source by a Perl script. The resulting content field looks like

<field name="content_china">Who We Are &#20844;&#21496;&#21382;&#21490; &#24744;&#30340;&#25104;&#21151;&#26696;&#20363; &#39046;&#23548;&#22242;&#38431; &#19994;&#21153;&#37096;&#38376; Innovation &#21019; etc...</field>

I posted these documents to a SOLR instance.

(b) Used Tika (command line). The resulting content field looks like

<field name="content_china">Who We Are Ã¥ ŒÂ¸à ¥ÂŽÂ†Ã¥Â² 您的æˆÂ功æ¡ ˆä¾‹ 领导团队 业务部门  Innovation à ¥Â etc...</field>

I posted these documents to a different instance.

When I search the first instance for a string (that I copied and pasted from the web site) I find a number of hits, including the page from which I copied the string. But when I do the same on the instance with Tika-extracted text, I get nothing. Has anyone seen this? I believe it may have to do with encoding. In both cases the posted documents were utf-8 compliant. Thanks for your insights. - ashok -- View this message in context: http://www.nabble.com/Indexing-TIKA-extracted-text.-Are-there-some-issues--tp24708854p24708854.html Sent from the Solr - User mailing list archive at Nabble.com.
-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search -- View this message in context: http://www.nabble.com/Indexing-TIKA-extracted-text.-Are-there-some-issues--tp24708854p24728917.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: search suggest
Here's a good article on Ternary Trees: http://www.ddj.com/windows/184410528

I looked at the one in Lucene; I don't understand why the find method only returns a char/int?

On Wed, Jul 29, 2009 at 2:33 PM, Robert Petersen rober...@buy.com wrote: Simple-minded autosuggest can just not tokenize the phrases at all, so the wildcards simply complete whatever the user has typed so far, including spaces. ...
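For anyone wanting to experiment, here is a minimal ternary search tree sketch along the lines of that article, including the prefix-completion traversal that a find method returning only an index would not give you. It assumes non-empty strings and is illustrative; it is not the Lucene TernaryTree class discussed above.

import java.util.ArrayList;
import java.util.List;

public class TernarySearchTree {
    private static class Node {
        char ch;
        Node lo, eq, hi;
        boolean wordEnd; // true if the path root..here spells a stored phrase
    }
    private Node root;

    public void insert(String s) { root = insert(root, s, 0); }

    private Node insert(Node n, String s, int i) {
        char c = s.charAt(i);
        if (n == null) { n = new Node(); n.ch = c; }
        if (c < n.ch)                     n.lo = insert(n.lo, s, i);
        else if (c > n.ch)                n.hi = insert(n.hi, s, i);
        else if (i < s.length() - 1)      n.eq = insert(n.eq, s, i + 1);
        else                              n.wordEnd = true;
        return n;
    }

    /** Collect stored phrases that start with the given (non-empty) prefix. */
    public List<String> suggest(String prefix) {
        List<String> out = new ArrayList<String>();
        Node n = find(root, prefix, 0);
        if (n == null) return out;
        if (n.wordEnd) out.add(prefix);
        collect(n.eq, new StringBuilder(prefix), out);
        return out;
    }

    private Node find(Node n, String s, int i) {
        if (n == null) return null;
        char c = s.charAt(i);
        if (c < n.ch) return find(n.lo, s, i);
        if (c > n.ch) return find(n.hi, s, i);
        if (i == s.length() - 1) return n;
        return find(n.eq, s, i + 1);
    }

    // In-order walk: lo, then this character (and its continuations), then hi,
    // so suggestions come out in sorted order.
    private void collect(Node n, StringBuilder prefix, List<String> out) {
        if (n == null) return;
        collect(n.lo, prefix, out);
        prefix.append(n.ch);
        if (n.wordEnd) out.add(prefix.toString());
        collect(n.eq, prefix, out);
        prefix.deleteCharAt(prefix.length() - 1);
        collect(n.hi, prefix, out);
    }

    public static void main(String[] args) {
        TernarySearchTree tree = new TernarySearchTree();
        tree.insert("ipod");
        tree.insert("ipod nano");
        tree.insert("ipad");
        System.out.println(tree.suggest("ipod")); // [ipod, ipod nano]
    }
}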
Re: Indexing TIKA extracted text. Are there some issues?
It appears there is an encoding problem. In the screenshot I can see the title is mangled, and if I open up the URL in IE or Firefox, both browsers think it is iso-8859-1. I think this is why (from the W3C validator):

Character Encoding mismatch! The character encoding specified in the HTTP header (iso-8859-1) is different from the value in the meta element (utf-8). I will use the value from the HTTP header (iso-8859-1) for this validation.

On Wed, Jul 29, 2009 at 6:02 PM, ashokc ash...@qualcomm.com wrote: Sure. The java command I use with TIKA to extract text from a URL is: java -jar tika-0.3-standalone.jar -t $url ...
-- Robert Muir rcm...@gmail.com
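Before fixing the server configuration, you can confirm the header/meta mismatch the validator reports with a few lines of Java (a quick diagnostic sketch; the URL is passed as an argument):

import java.net.URL;
import java.net.URLConnection;

public class CharsetCheck {
    public static void main(String[] args) throws Exception {
        // Print the content type the HTTP header advertises; this is the
        // charset Tika and the W3C validator will trust over the page's
        // meta element.
        URLConnection conn = new URL(args[0]).openConnection();
        System.out.println(conn.getContentType()); // e.g. text/html; charset=iso-8859-1
    }
}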
Re: Indexing TIKA extracted text. Are there some issues?
Could very well be... I will rectify it and try again. Thanks - ashok

Robert Muir wrote: It appears there is an encoding problem. In the screenshot I can see the title is mangled, and if I open up the URL in IE or Firefox, both browsers think it is iso-8859-1. ...

-- View this message in context: http://www.nabble.com/Indexing-TIKA-extracted-text.-Are-there-some-issues--tp24708854p24729595.html Sent from the Solr - User mailing list archive at Nabble.com.
deleteById always returning OK
Is it expected behaviour that deleteById will always return OK as a status, regardless of whether the id was matched? I have a unit test:

// set up the test data
engine.index(12345, s1, d1);
engine.index(54321, s2, d2);
engine.index(23453, s3, d3);
// ...

@Test
public void testRemove() throws Exception {
    assertEquals(engine.size(), 3);
    assertTrue(engine.remove(12345));
    assertEquals(engine.size(), 2);
    // XXX, it returns true
    assertFalse(engine.remove(23523352));

Engine is my wrapper around Solr. The remove method looks like this:

private static final int RESPONSE_STATUS_OK = 0;
private SolrServer server;

public boolean remove(final Integer titleInstanceId) throws IOException {
    try {
        server.deleteById(String.valueOf(titleInstanceId));
        final UpdateResponse updateResponse = server.commit(true, true);
        // XXX It's always OK
        return (updateResponse.getStatus() == RESPONSE_STATUS_OK);

Any ideas what's going wrong? Is there a different way to test for the id not having been there, other than an additional search? Thanks Reuben
Re: THIS WEEK: PNW Hadoop, HBase / Apache Cloud Stack Users' Meeting, Wed Jul 29th, Seattle
Don't forget this is tonight! Excited to see everyone there.

On Tue, Jul 28, 2009 at 11:25 AM, Bradford Stephens bradfordsteph...@gmail.com wrote: Hey everyone, SLIGHT change of plans. A few people have asked me to move to a place with air conditioning, since the temperature's in the 90s this week. So, here we go: Big Time Brewing Company, 4133 University Way NE, Seattle, WA 98105. Call me at 904-415-3009 if you have any questions.

On Mon, Jul 27, 2009 at 12:16 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: Hello again! Yes, I know some of us are still recovering from OSCON. It's time for another delicious meetup to chat about Hadoop, HBase, Solr, Lucene, and more! UW is quite a pain for us to access until August, so we're changing the venue to one pretty close: Piccolo's Pizza, 5301 Roosevelt Way NE (between 53rd St and 55th St), 6:45pm - 8:30 (or when we get bored)! As usual, people are more than welcome to give talks, whether they're long-format or lightning. I'd also really like to start thinking about hackathons; perhaps we could have one next month? I'll be talking about HBase .20 and the possibility of low-latency HBase analytics. I'd be very excited to hear what people are up to! Contact me if there's any questions: 904-415-3009. Cheers, Bradford

-- http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
Re: Wildcard and boosting
I just updated to the nightly build (I was using 1.2) and this does not seem to be an issue anymore.

2009/7/29 Jón Helgi Jónsson jonjons...@gmail.com: Hey now! I do index-time boosting for my fields and just discovered that when searching with a trailing wildcard the boosting is ignored. Will my boosting work with a wildcard if I do it at query time? And if so, is there a lot of performance difference? Is there some other method I can use to preserve my boosting? I do not need highlighting. Thanks, Jon Helgi
Re: deleteById always returning OK
Reuben Firmin wrote: Is it expected behaviour that deleteById will always return OK as a status, regardless of whether the id was matched?

It is expected behaviour, as Solr always returns 0 unless an error occurs while processing a request (query, update, ...). So you don't need to check the status; you'll get an exception if something goes wrong, otherwise the request succeeded. And you cannot know whether the id was matched. The only thing you can try is to send a query q=id:value&rows=0 and check numFound in the response before sending deleteById.

Koji
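A SolrJ sketch of Koji's probe-then-delete idea, following Reuben's naming (Engine, titleInstanceId); everything else is illustrative. Note that the probe and the delete are not atomic, so a concurrent writer could still change the index between the two calls.

import java.io.IOException;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;

public class Engine {
    private SolrServer server;

    // Probe first, then delete: the status alone can't tell you whether
    // the id existed, but numFound can.
    public boolean remove(final Integer titleInstanceId)
            throws IOException, SolrServerException {
        final String id = String.valueOf(titleInstanceId);
        final SolrQuery probe = new SolrQuery("id:" + id);
        probe.setRows(0); // only the count is needed, not the documents
        final boolean existed =
                server.query(probe).getResults().getNumFound() > 0;
        if (existed) {
            server.deleteById(id);
            server.commit(true, true);
        }
        return existed;
    }
}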
RE: Boosting ('bq') on multi-valued fields
Hey Ken, thanks for your reply. When I wrote '5|6' I meant that this is a multiValued field with two values, '5' and '6', rather than the literal string '5|6' (under any tokenizer). Does your reply still hold? That is, are multiValued fields dependent on the notion of tokenization to such a degree that I can't use the str type with them meaningfully? If so, it seems weird to me that I should be able to define a str multiValued field to begin with.. -Chak

Ensdorf Ken wrote: Hey, I have a field defined as such:

<field name="site_id" type="string" indexed="true" stored="false" multiValued="true"/>

with the string type defined as:

<fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

When I try using some query-time boost parameters via bq on values of this field, it seems to behave strangely for documents that actually have multiple values: if I boost a particular value (site_id:5^1.1), it seems like all the cases where this field is populated with multiple values (i.e. a document with field value 5|6) do not get boosted at all. I verified this using debugQuery and explainOther=doc_id:document_with_multiple_values. Is this a known issue/bug? Any workarounds? (I'm using a nightly Solr build from a few months back.)

There is no tokenization on 'string' fields, so a query for 5 does not match a doc with a value of 5|6 for this field. You could try using field type 'text' for this and see what you get. You may need to customize it to use the StandardAnalyzer or WordDelimiterFilterFactory to get the right behavior. Using the analysis tool in the Solr admin UI to experiment will probably be helpful. -Ken

-- View this message in context: http://www.nabble.com/Boosting-%28%27bq%27%29-on-multi-valued-fields-tp24713905p24730981.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is there a multi-shard optimize message?
: Normally to optimize an index you POST <optimize/> to /solr/update. Is
: there any way to POST an optimize message to one instance and have it
: propagate to all shards, sort of like the select?
:
: /solr-shard-1/select?q=dog...&shards=shard-1,shard2

No, you'll need to send optimize to each host separately.

And for the record: it would be relatively straightforward to implement something like this (just like distributed search) ... but it has very little value. Clients doing indexing operations have to send add/delete commands directly to the individual shards, so they have to send the commit/optimize commands directly to them as well. If/when someone writes a distributed indexing handler, making it support distributed optimize/commit will be fairly trivial.

-Hoss
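A small SolrJ sketch of the loop Hoss describes (the shard URLs are made up; list whichever hosts you index to):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class OptimizeAllShards {
    public static void main(String[] args) throws Exception {
        String[] shards = {
            "http://solr-shard-1:8983/solr",
            "http://solr-shard-2:8983/solr"
        };
        for (String url : shards) {
            SolrServer shard = new CommonsHttpSolrServer(url);
            shard.optimize(); // same as POSTing <optimize/> to that shard's /update
        }
    }
}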
Re: update some index documents after indexing process is done with DIH
If you make your EventListener implement SolrCoreAware, you can get hold of the core in inform(). Use that to get hold of the SolrIndexWriter.

On Wed, Jul 29, 2009 at 9:20 PM, Marc Sturlese marc.sturl...@gmail.com wrote: From the newSearcher(..) of a CustomEventListener which extends AbstractSolrEventListener I can access the SolrIndexSearcher and all core properties, but I can't get a SolrIndexWriter. Do you know how I can get a SolrIndexWriter from there? That way I would be able to modify the documents (I need to modify them depending on the values of other documents; that's why I can't do it with a DIH delta-import). Thanks in advance

Noble Paul നോബിള് नोब्ळ्-2 wrote: On Tue, Jul 28, 2009 at 5:17 PM, Marc Sturlese marc.sturl...@gmail.com wrote: That really sounds like the best way to reach my goal. How would I invoke a listener from the newSearcher? Would it be something like:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">solr</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
    <lst>
      <str name="q">rocks</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
    <lst><str name="q">static newSearcher warming query from solrconfig.xml</str></lst>
  </arr>
</listener>
<listener event="newSearcher" class="solr.MyCustomListener"/>

And MyCustomListener would be the class that opens the reader:

RefCounted<SolrIndexSearcher> searchHolder = null;
try {
    searchHolder = dataImporter.getCore().getSearcher();
    IndexReader reader = searchHolder.get().getReader();
    // Here I iterate over the reader doing document modifications
} catch (Exception ex) {
    LOG.info("error", ex);
} finally {
    if (searchHolder != null) searchHolder.decref();
}

You may not be able to access the DIH API from a newSearcher event. But the API would give you the searcher directly as a method parameter.

Finally, to access documents and add fields to some of them, I have thought of using the SolrDocument classes. Can you please point me to where something similar is done in the Solr source (I mean the creation of SolrDocuments and their conversion to proper Lucene documents)? Does this way of reaching the goal make sense? Thanks in advance

Noble Paul നോബിള് नोब्ळ्-2 wrote: when a core is reloaded the event fired is firstSearcher. newSearcher is fired when a commit happens

On Tue, Jul 28, 2009 at 4:19 PM, Marc Sturlese marc.sturl...@gmail.com wrote: Ok, but if I handle it in a newSearcher listener it will be executed every time I reload a core, won't it? The thing is that I want to use an IndexReader to load some doc fields of the index into a HashMap and, depending on the values of some field docs, modify other docs. It's very memory consuming (I have tested it with a simple Lucene script). That's why I wanted to do it just after the indexing process. My ideal case would be to do it in the commit function of DirectUpdateHandler2.java, just before writer.optimize(cmd.maxOptimizeSegments) is executed. But I don't want to mess with that code... so I'm trying to find the best way to do it as a plugin instead of a hack, if possible. Thanks in advance

Noble Paul നോബിള് नोब्ळ्-2 wrote: It is best handled as a 'newSearcher' listener in solrconfig.xml. onImportEnd is invoked before committing

On Tue, Jul 28, 2009 at 3:13 PM, Marc Sturlese marc.sturl...@gmail.com wrote: Hey there, I would like to be able to do something like: after the indexing process is done with DIH, I would like to open an IndexReader, iterate over all docs, modify some of them depending on others, and delete some others. I can easily do this coding directly against Lucene, but I would like to know if there's a way to do it with Solr using the SolrDocument or SolrInputDocument classes. I have thought of using a SolrJ or DIH listener (onImportEnd) but am not sure if I can get an IndexReader in there. Any advice? Thanks in advance

-- View this message in context: http://www.nabble.com/update-some-index-documents-after-indexing-process-is-done-with-DIH-tp24695947p24695947.html Sent from the Solr - User mailing list archive at Nabble.com.

-- - Noble Paul | Principal Engineer| AOL | http://aol.com
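To round the thread off, here is a hedged sketch of the SolrCoreAware listener Noble suggests. The class name and the body of newSearcher() are illustrative; SolrEventListener and SolrCoreAware are the real Solr interfaces, but whether the update handler exposes enough for your in-place modifications is something to verify against your Solr version.

import org.apache.lucene.index.IndexReader;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.SolrCore;
import org.apache.solr.core.SolrEventListener;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.update.UpdateHandler;
import org.apache.solr.util.plugin.SolrCoreAware;

public class MyCustomListener implements SolrEventListener, SolrCoreAware {

    private SolrCore core;

    public void init(NamedList args) {}

    // Called once at startup, after init() but before any events fire;
    // this is where the core (and, through it, the writer side) arrives.
    public void inform(SolrCore core) {
        this.core = core;
    }

    public void postCommit() {}

    public void newSearcher(SolrIndexSearcher newSearcher,
                            SolrIndexSearcher currentSearcher) {
        // Read side: iterate the freshly opened index.
        IndexReader reader = newSearcher.getReader();

        // Write side: the update handler wraps the SolrIndexWriter that
        // Marc was after (DirectUpdateHandler2 in a default config).
        UpdateHandler updateHandler = core.getUpdateHandler();

        // ... inspect documents via the reader, then issue add/delete
        // commands through the update handler as needed ...
    }
}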