wildcard search and hierarchical faceting

2010-01-23 Thread Andy
I'd like to provide a hierarchical faceting functionality.

An example would be location drill down such as USA -> New York -> New York 
City -> SoHo

The number of levels can be arbitrary. One way to handle this could be to use a 
special character as separator, store values such as "USA|New York|New York 
City|SoHo" and use wildcard search. So if "USA" has been selected, the fq would 
be USA*

I read somewhere that when using wildcard search, no stemming or tokenization 
is performed, so "USA" will not match "usa". Is there any way to work around 
that?

Or would you recommend a different way to handle hierarchical faceting?
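[A common alternative to the wildcard approach, sketched here as an editor's note: index every ancestor path as its own untokenized token in a multivalued string field, with a depth prefix, and lowercase at index time. Drill-down then becomes an exact fq term rather than a wildcard, sidestepping the no-analysis problem. Field and separator names below are illustrative.]

```python
def facet_paths(levels, sep="|"):
    """Expand a category chain into one token per ancestor path,
    prefixed with its depth so each level can be faceted separately."""
    paths = []
    for depth in range(1, len(levels) + 1):
        # lowercase at index time so "USA" and "usa" agree
        prefix = sep.join(l.lower() for l in levels[:depth])
        paths.append("%d%s%s" % (depth - 1, sep, prefix))
    return paths

tokens = facet_paths(["USA", "New York", "New York City", "SoHo"])
# e.g. tokens[0] == "0|usa", tokens[1] == "1|usa|new york"
```

At query time, selecting "USA" adds an exact filter on the `0|usa` token, and faceting the next level down uses `facet.prefix=1|usa|` on the same field; no wildcard or analysis is involved.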


  


Re: Index gets deleted after commit?

2010-01-23 Thread Amit Nithian
Are you using the DIH? If so, did you try setting clean=false in the URL
line? That prevents wiping out the index on load.
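[For reference, a sketch of a DIH full-import request that preserves existing documents by passing clean=false; the host, port, and handler path are the stock examples and may differ in your setup.]

```python
from urllib.parse import urlencode

# For full-import, clean defaults to true, which deletes the whole
# index before loading; clean=false keeps existing docs.
params = {"command": "full-import", "clean": "false", "commit": "true"}
url = "http://localhost:8983/solr/dataimport?" + urlencode(params)
print(url)
```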

On Jan 23, 2010 4:06 PM, "Bogdan Vatkov"  wrote:

After mass upload of docs in Solr I get some "REMOVING ALL DOCUMENTS FROM
INDEX" without any explanation. [quoted message and log trimmed; the full
text is in the original "Index gets deleted after commit?" post below]

Solr Cache Viewing/Browsing

2010-01-23 Thread Amit Nithian
Hi All,

I am using the SolrCache to store some external data in my search app (to be
used in a modified DisMaxHandler) and I was wondering if there is a way to
get at this data from the JSP pages? I then thought that it might be nice to
view more information about the respective caches like the current elements,
recently evicted etc to help debug performance issues. Has anyone worked on
this or have any ideas surrounding this?

Thanks
Amit


Re: understanding termVector output

2010-01-23 Thread Koji Sekiguchi

Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] wrote:

Hi,
I'm trying to see if I can use termVectors for a use case I have. Essentially 
what I want to know is: where in the indexed value does the query hit occur? I 
think either tv.positions or tv.offsets would provide that info, but I don't 
really grok the result. Below I've pasted the URL and part of the result:
http://localhost:8080/solr/select?q=idxPartition:CONNECTED_ASSETS%20AND%20srcSpan:CR1434&rows=1&indent=on&qt=tvrh&tv.offsets=true&fl=srcSpan



|CR1434-Occ1|abcCR1434 is a token for searching with 
WILDCI|testuser|System of 
Registries|2010-01-12T23:00:00.000Z|2010-01-12T23:00:00.000Z|testuser|System of Registries




f57488c1d041a1de5bd6a70b09428d119ed1de29



<lst name="offsets">
  <int name="start">104</int>
  <int name="end">106</int>
  <int name="start">107</int>
  <int name="end">109</int>
  <int name="start">129</int>
  <int name="end">131</int>
  <int name="start">132</int>
  <int name="end">134</int>
</lst>
"#1;00" is a token that was produced by your  from srcSpan 
field value
when you indexed the field. And it seems the token occurred four times 
in the field.
If "#1;00" is unexpected token, you should check your type="index"/>

definition for srcSpan field.

Koji

--
http://www.rondhuit.com/en/
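[Editor's note: as a sanity check, tv.offsets values are character positions into the stored field value, so the matched text can be recovered by slicing. A sketch; the field value and offsets below are illustrative.]

```python
def spans_from_offsets(text, offsets):
    """tv.offsets comes back as a flat start,end,start,end,... list;
    pair the values up and slice the stored field value with them."""
    pairs = zip(offsets[0::2], offsets[1::2])
    return [text[s:e] for s, e in pairs]

value = "abcCR1434 is a token for searching with WILDCI"
print(spans_from_offsets(value, [0, 3, 10, 12]))  # → ['abc', 'is']
```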



Index gets deleted after commit?

2010-01-23 Thread Bogdan Vatkov
After mass upload of docs in Solr I get some "REMOVING ALL DOCUMENTS FROM
INDEX" without any explanation.

I was running indexing w/ Solr for several weeks now and everything was ok -
I indexed 22K+ docs using the SimplePostTool
I was first launching a delete-all request:

<delete><query>*:*</query></delete>

then adding some 22K+ docs ...
finishing with a

<commit/>

But as you can see from the log, right after the last commit I get this
strange REMOVING ALL... I do not remember what I changed last, but now I have
this issue that after the mass upload of docs the index gets completely
deleted.

Why is this happening?


log after the last commit:

INFO: start
commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)
Jan 24, 2010 1:48:24 AM org.apache.solr.core.SolrDeletionPolicy onCommit
INFO: SolrDeletionPolicy.onCommit: commits:num=2
commit{dir=/store/dev/inst/apache-solr-1.4.0/example/solr/data/index,segFN=segments_fr,version=1260734716752,generation=567,filenames=[segments_fr]
commit{dir=/store/dev/inst/apache-solr-1.4.0/example/solr/data/index,segFN=segments_fs,version=1260734716753,generation=568,filenames=[_gv.nrm,
segments_fs, _gv.fdx, _gw.nrm, _gv.tii, _gv.prx, _gv.tvf, _gv.tis, _gv.tvd,
_gv.fdt, _gw.fnm, _gw.tis, _gw.frq, _gv.fnm, _gw.prx, _gv.tvx, _gw.tii,
_gv.frq]
Jan 24, 2010 1:48:24 AM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 1260734716753
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher 
INFO: Opening searc...@de26e52 main
Jan 24, 2010 1:48:24 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@de26e52 main from searc...@4e8deb8a main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@de26e52 main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@de26e52 main from searc...@4e8deb8a main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@de26e52 main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@de26e52 main from searc...@4e8deb8a main
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@de26e52 main
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@de26e52 main from searc...@4e8deb8a main
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@de26e52 main
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to searc...@de26e52 main
Jan 24, 2010 1:48:24 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
Jan 24, 2010 1:48:24 AM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher searc...@de26e52 main
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing searc...@4e8deb8a main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cu

Re: Find newly added documents

2010-01-23 Thread Simon Rosenthal
"newly added" is a bit vague.  Do you mean "since last Sunday" ? "between
the last  and the one before that" ? Also, do you need to
distinguish between updated and newly added documents ?

Perhaps you could be more specific about the use case.

-Simon

On Fri, Jan 22, 2010 at 4:25 AM, Erik Hatcher wrote:

> You can do a search, sort by the special _docid_ "field" (underscores
> mandatory) descending and the top documents listed will be the latest added.
>
> Like this, un-url-encoded:   q=*:*&sort=_docid_ desc
>
>Erik
>
>
>
> On Jan 22, 2010, at 3:39 AM, Sandeep Tagore wrote:
>
>
>> Thanks a lot Erik. Is there any other alternate way?
>> Thanks a lot for your response.
>>
>> Regards,
>> Sandeep
>>
>>
>> You'll be able to find them only after a commit.
>>
>> One way to do this is index a timestamp with every document, and find
>> the latest ones using that field.  There's an example of an automatic
>> timestamp field in the example schema.
>>
>>Erik
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Find-newly-added-documents-tp27254813p27270104.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
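[For reference, the automatic timestamp field Erik mentions looks roughly like this; a sketch resembling the field declaration in the Solr 1.4 example schema.xml.]

```xml
<field name="timestamp" type="date" indexed="true" stored="true"
       default="NOW" multiValued="false"/>
```

Documents added after a given moment can then be fetched with a range filter such as fq=timestamp:[2010-01-17T00:00:00Z TO *].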


Re: Dedupe of document results at query-time

2010-01-23 Thread Martijn v Groningen
This manner of detecting duplicates at query time really matches what field
collapsing does, so I suggest you look into that. As far as I know there
isn't any function query that does what you have described in your example.

Cheers,

Martijn

On 23 January 2010 12:31, Peter S  wrote:
>
> Is it possible and/or what is the best/accepted way to achieve
> deduplication of documents by field at query-time?
>
> [example docs and details trimmed; see the original "Dedupe of document
> results at query-time" post below]



-- 
Met vriendelijke groet,

Martijn van Groningen
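[Editor's note: until field collapsing (the SOLR-236 patch) is available in your build, one fallback is to over-fetch and dedupe client-side. A sketch, using the field names from Peter's example.]

```python
def dedupe_by_field(docs, field):
    """Keep only the first document seen for each value of `field`."""
    seen = set()
    out = []
    for doc in docs:
        key = doc.get(field)
        if key not in seen:
            seen.add(key)
            out.append(doc)
    return out

docs = [
    {"host": "Host1", "time": "1 Sept 09", "appname": "activePDF"},
    {"host": "Host1", "time": "2 Sept 09", "appname": "activePDF"},
    {"host": "Host1", "time": "3 Sept 09", "appname": "activePDF"},
]
print(dedupe_by_field(docs, "appname"))  # keeps only the 1 Sept doc
```

This is only practical when the number of matching rows is small enough to fetch; server-side collapsing is the right long-term answer.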


Re: Solr vs. Compass

2010-01-23 Thread Uri Boness

waw...

well, transactional or "transactional", whether it's a nice feature to have
or just a "selling point"... Bottom line: for some applications Compass can
be very appealing, for others Solr will be the choice. In the last several
years I've integrated both in different applications and gained from both. Do
your own math based on your needs, requirements and personal preferences. But
if someone asks a question, then it's always nice to get several opinions
from different experiences.


peace,
Uri

Fuad Efendi wrote:

Of course, I understand what "transaction" means; but have you guys thought
about what would happen if we transferred $123.45 from one banking account to
another and MySQL forgot to index the "decimal" column during the
transaction, or the DBA forgot to create an index? Absolutely nothing.

Why embed "indexing" as a transaction dependency? Extremely weird idea. But I
understand some selling points...


SOLR: it is faster than Lucene. Filtered queries run faster than traditional 
"AND" queries! And this is real selling point.



Thanks,

Fuad Efendi
+1 416-993-2060
http://www.linkedin.com/in/liferay

Tokenizer Inc.
http://www.tokenizer.ca/
Data Mining, Vertical Search


  

-Original Message-
From: Fuad Efendi [mailto:f...@efendi.ca]
Sent: January-22-10 11:23 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr vs. Compass

Yes, "transactional", I tried it: do we really need "transactional"? Even if
"commit" takes 20 minutes? It's their "selling point", nothing more. HBase is
not transactional, and it has a specific use case; each tool has a specific
use case... in some cases Compass is the best!

Also, note that Compass (Hibernate) ((RDBMS)) uses specific "business domain
model" terms with relationships; huge overhead to convert "relational" into
"object-oriented" (what for? Any advantages?)... Lucene does it
behind-the-scenes: you don't have to worry that the field value "USA" (3
characters) is repeated in a few million documents, and "Canada" (6
characters) in another few; nothing "relational", it's done automatically
without any Compass/Hibernate/Table(s)


Don't think "relational".

I wrote this 2 years ago:
http://www.theserverside.com/news/thread.tss?thread_id=50711#272351


Fuad Efendi
+1 416-993-2060
http://www.tokenizer.ca/




-Original Message-
From: Uri Boness [mailto:ubon...@gmail.com]
Sent: January-21-10 11:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr vs. Compass

In addition, the biggest appealing feature in Compass is that it's
transactional and therefore integrates well with your infrastructure
(Spring/EJB, Hibernate, JPA, etc...). This obviously is nice for some
systems (not very large scale ones) and the programming model is clean.

On the other hand, Solr scales much better and provides a load of
functionality that otherwise you'll have to custom build on top of
Compass/Lucene.

Lukáš Vlček wrote:

Hi,

I think that these products do not compete directly that much; each fits a
different business case. Can you tell us more about your specific situation?
What do you need to search and where is your data? (DB, Filesystem, Web ...?)

Solr provides some specific extensions which are not supported directly by
Lucene (faceted search, DisMax... etc), so if you need these then your bet on
Compass might not be perfect. On the other hand, if you need to index
persistent Java objects then Compass fits perfectly into this scenario (and
if you are using Spring and JPA then setting up search can be a matter of
several modifications to configuration and annotations).

Compass is more a Hibernate Search competitor (but Compass is not limited to
Hibernate only and is not even limited to DB content either).

Regards,
Lukas

On Thu, Jan 21, 2010 at 4:40 PM, Ken Lane (kenlane) wrote:

We are knee-deep in a Solr project to provide a web services layer between
our Oracle DBs and a web front end to be named later, to supplement our
numerous Business Intelligence dashboards. Someone from a peer group
questioned why we selected Solr rather than Compass to start development. The
real reason is that we had not heard of Compass until that comment. Now I
need to come up with a better answer.

Does anyone out there have experience in both approaches who might be able to
give a quick compare and contrast?

Thanks in advance,

Ken


Re: Solr under tomcat - UTF-8 issue

2010-01-23 Thread Sven Maurmann

Hi,

I did not read the original mail, but for the UTF-8 issue with Tomcat
you might consult the url http://wiki.apache.org/solr/SolrTomcat

The relevant piece of information is under "URI Charset Config":

*** quote ***
Edit Tomcat's conf/server.xml and add the following attribute to the correct
Connector element: URIEncoding="UTF-8".

<Server ...>
  <Service ...>
    <Connector ... URIEncoding="UTF-8">
      ...
    </Connector>
  </Service>
</Server>

*** end quote ***

Sven


--On Freitag, 22. Januar 2010 23:41 +0100 Frank Wesemann 
 wrote:



Glock, Thomas schrieb:


My flex client httpservice by default only sets the content-type request
header to  "application/x-www-form-urlencoded"  what it needed to do for
tomcat is set the content-type request header to content-type =
"application/x-www-form-urlencoded; charset=UTF-8";




As some browsers do not send this particular content-type correctly ( at
least Firefox and Safari skip the "charset=utf-8" part),
I added a servlet.Filter :

public class RequestCharset2utf8Filter implements javax.servlet.Filter {
    // ... init() and destroy() omitted ...
    public void doFilter(ServletRequest req, ServletResponse res,
            FilterChain chain) throws IOException, ServletException {
        // force UTF-8 decoding of the body before any parameter is read
        req.setCharacterEncoding("UTF-8");
        chain.doFilter(req, res);
    }
}

as the first filter to my webapp:
in web.xml:

<filter>
  <filter-name>CharsetEncodingFilter</filter-name>
  <filter-class>my.package.servlet.RequestCharset2utf8Filter</filter-class>
</filter>
<filter-mapping>
  <filter-name>CharsetEncodingFilter</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>


I run it on tomcat 6.0.18 .

And:
wonder is of course right, but life isn't all beer and skittles.

--
mit freundlichem Gruß,

Frank Wesemann
Fotofinder GmbH USt-IdNr. DE812854514
Software EntwicklungWeb: http://www.fotofinder.com/
Potsdamer Str. 96   Tel: +49 30 25 79 28 90
10785 BerlinFax: +49 30 25 79 28 999

Sitz: Berlin
Amtsgericht Berlin Charlottenburg (HRB 73099)
Geschäftsführer: Ali Paczensky


RE: SOLR indexing : Multiple content/document types

2010-01-23 Thread Adamsky, Robert

> I would like to know the best strategy/standards to follow for indexing
> multiple document types thru SOLR.

> In other words, let us say we have a file upload form thru which user woudl
> upload the files of different types (text, html, xml, word docs,excel

http://lucene.apache.org/tika/
http://wiki.apache.org/solr/ExtractingRequestHandler
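[For reference, the ExtractingRequestHandler (Solr Cell) runs Tika server-side, so one endpoint handles pdf, word, html, etc. and the client never needs PDFBox/POI itself. A sketch of the request; the document id and filename are illustrative, and the handler path is the stock one.]

```python
from urllib.parse import urlencode

# literal.* params attach fixed field values to the extracted document
params = {"literal.id": "doc1", "commit": "true"}
url = "http://localhost:8983/solr/update/extract?" + urlencode(params)

# equivalent command line, posting the file as multipart form data
curl = 'curl "%s" -F "myfile=@report.pdf"' % url
print(curl)
```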


Re: Improvising solr queries

2010-01-23 Thread dipti khullar
Thanks Erick

Correctly said!!
Initially we had different settings for queryResultCache, which used to serve
queries from the cache.

But we changed the settings some days back to see if there were any
issues/improvements. I believe we need to switch back to similar settings
after some analysis.

Also, removing <optimize/> showed good results in the local environment; I
think we will deploy the same on production.

Thanks guys for your help. Will keep posting further queries and findings on
the issue.

Dipti

On Fri, Jan 22, 2010 at 9:05 PM, Erick Erickson wrote:

> Take a look at the Wiki, here's a bit to start...
>
> http://lucene.apache.org/solr/features.html
>
> The short form is that when an index is first opened,
> there are various caches that are initialized. The
> first few queries that run against a new searcher
> are slowed down by filling up these caches. Warmup
> queries can be fired that'll pre-populate these caches
> in the background. You have to configure this, and
> only *after* the warmup queries have run does
> SOLR switch over to the newly-opened searchers.
>
> I suspect that what you're seeing is that the first few
> queries after you update your index are paying this
> penalty
>
> HTH
> Erick
>
> On Fri, Jan 22, 2010 at 12:30 AM, dipti khullar  >wrote:
>
> > Hi
> >
> > Eric, thanks for your reply.
> > I am not sure what exactly you mean by warmup queries. But if its related
> > to
> > the settings we are using in solrconfig.xml, following are the
> > configurations for query caching:
> >
> >  > autowarmCount="0"/>
> >
> > Also, as we are using snapinstall script on slaves, which eventually
> calls
> > commit script. I was just wondering that whether, we need to change the
> > simple commit command to
> >
> > 
> >
> > Otis, we executed a performance test on our local environments for Solr
> 1.4
> > but there were not considerable performance improvement. Hence, we have
> as
> > of now dropped the idea of upgrading to Solr 1.4.
> > Regarding optimization, we initially were not using optimize at all, but
> > then at peak hours load on slaves increased considerably. Hence, we
> > configured the optimize script to get the system running.
> > But we can try this on local environment and then analyze the results.
> >
> > Thanks
> > Dipti
> >
> >
> > On Fri, Jan 22, 2010 at 10:36 AM, Otis Gospodnetic <
> > otis_gospodne...@yahoo.com> wrote:
> >
> > > Dipti,
> > >
> > > If I'm reading that correctly, you are optimizing the index on the
> master
> > > before replicating it?
> > > There is no need to do that if you are constantly updating your index
> and
> > > replicating it every 10 minutes.
> > > Don't optimize, and you'll replicate smaller portion of an index, and
> > thus
> > > you won't bust the OS cache on the slave as much.
> > > The upgrade to Solr 1.4 and you'll see further benefits from faster
> > > searcher warmup times.
> > >
> > >  Otis
> > > --
> > > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> > >
> > >
> > >
> > > - Original Message 
> > > > From: dipti khullar 
> > > > To: solr-user@lucene.apache.org
> > > > Sent: Thu, January 21, 2010 11:48:20 AM
> > > > Subject: Re: Improvising solr queries
> > > >
> > > > Hi
> > > >
> > > > Sorry for getting back late on the thread, but we are focusing on
> > > > configuration of master and slave for improving performance issues.
> > > >
> > > > We have observed following trend on production slaves:
> > > > After every 10 minutes the response time increases considerably. In
> > > between
> > > > all the queries are served by cache.
> > > > It seems, after every 10th minute installation and then commit takes
> > time
> > > > and hence results in slow response time.
> > > >
> > > > Following are the logs taken for a complete cycle for master/slave
> sync
> > > up
> > > > process:
> > > >
> > > > 2010/01/21 14:28:02 started by solr
> > > > 2010/01/21 14:28:02 command:
> > > /opt/solr/solr_master/solr/solr/bin/snapshooter
> > > > 2010/01/21 14:28:02 taking snapshot
> > > > /opt/solr/solr_master/solr/data/snapshot.20100121142802
> > > > 2010/01/21 14:28:02 ended (elapsed time: 0 sec)
> > > > 2010/01/21 14:28:01 started by solr
> > > > 2010/01/21 14:28:01 command:
> > /opt/solr/solr_master/solr/solr/bin/optimize
> > > > 2010/01/21 14:28:02 ended (elapsed time: 1 sec)
> > > > 2010/01/21 14:30:02 started by solr
> > > > 2010/01/21 14:30:02 command:
> > > /opt/solr/solr_slave/solr/solr/bin/snappuller
> > > > 2010/01/21 14:30:06 pulling snapshot snapshot.20100121142802
> > > > 2010/01/21 14:30:14 ended (elapsed time: 12 sec)
> > > > 2010/01/21 14:30:14 started by solr
> > > > 2010/01/21 14:30:14 command:
> > > > /opt/solr/solr_slave/solr/solr/bin/snapinstaller
> > > > 2010/01/21 14:30:15 installing snapshot
> > > > /opt/solr/solr_slave/solr/data/snapshot.20100121142802
> > > > 2010/01/21 14:30:16 notifing Solr to open a new Searcher
> > > > 2010/01/
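[For reference, the warmup Erick describes is configured with a newSearcher listener in solrconfig.xml; a sketch resembling the stock example config, with an illustrative query.]

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">popular query here</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>
```

The new searcher is only registered after these queries have run, so the caches are populated before user traffic hits it.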

Dedupe of document results at query-time

2010-01-23 Thread Peter S

Hi,

 

I wonder if someone might be able to shed some insight into this problem:

 

Is it possible and/or what is the best/accepted way to achieve deduplication of 
documents by field at query-time?

 

For example:

Let's say an index contains:

 

Doc1



host:Host1

time:1 Sept 09

appname:activePDF

 

Doc2



host:Host1

time:2 Sept 09

appname:activePDF

 

Doc3



host:Host1

time:3 Sept 09

appname:activePDF

 

Can a query be constructed that would return only 1 of these Documents based on 
appname (doesn't really matter which one)?

 

i.e.:

   match on host:Host1

   ignore time

   dedupe on appname:activePDF

 

Is this possible? Would FunctionQuery be helpful here, maybe? Am I actually 
talking about field collapsing?

 

Many thanks,

Peter

 
  
_
Got a cool Hotmail story? Tell us now
http://clk.atdmt.com/UKM/go/195013117/direct/01/

SOLR indexing : Multiple content/document types

2010-01-23 Thread Kranti™ K K Parisa
Hi,

I would like to know the best strategy/standards to follow for indexing
multiple document types thru SOLR.

In other words, let us say we have a file upload form thru which users would
upload files of different types (text, html, xml, word docs, excel sheets,
pdf, jpg, gif, etc.).
Once we save the files into the hard disk at server side, we need to
initiate the SOLR indexing.

What would be the best strategy to achieve this and what are the libraries
to be used for different content/document types.

So far I have used PDFBox to read PDF files. Please suggest libraries for all
the possible content/document types.

Best Regards,
Kranti K K Parisa