RE: Replacing existing documents

2007-08-22 Thread Ard Schrijvers
Hello,

Recently someone mentioned that it would be possible to have a 'replace
existing document' feature rather than just dropping and adding documents
with the same unique id.

AFAIK, this is not possible. Lucene does have an update operation, but
internally it just does a delete followed by an add.

We have a few use cases in this area and I'm
researching whether it is effective to check for a document via Solr
queries, or whether it is worthwhile to add this to the Solr implementation.

What are the usecases?? I do not see what you mean.

Does anyone have an estimate for the difference between querying, say, 100
documents by unique ID over the network vs. fetching them directly from the
index?

That depends, of course, on the network. Fetching them from the index is
normally fast.
 
One use case is that we would like to use the index as our one database for
documents, and if we delete a document we want it to stay deleted. Thus we
would mark it deleted and check for its existence.

I suppose you mark it deleted by setting some flag (like a Lucene field
isDeleted set to true). I am not sure whether using the Lucene index as your
database is really smart... it might get corrupted. I would at least suggest
backing it up frequently.

Regards Ard

ps sry for my annoying .. because i am using a web mail client

Another use case is that we are re-adding the same document a few times a day, 
and the commit times
are ballooning.

 
Where would I implement this?
 
Thanks,
 
Lance





Major update to Solrsharp

2007-08-22 Thread Jeff Rodenburg
A big update was just posted to the Solrsharp project.  This update now
provides for first-class support for highlighting in the library.

The implementation is really robust and provides the following features:

   - Structured highlight parameter assignment based on the SolrField
   object
   - Full access for all highlight parameters, on both an aggregate and
   per-field basis
   - Incorporation of highlighted values into the base search result
   records

All of the supplied documentation has been updated as well as the example
application in using the highlighting classes.

Please report any issues through JIRA.  Be sure to associate any issues with
the C# client component.

cheers,
jeff r.


Re: Replacing existing documents

2007-08-22 Thread Erik Hatcher


On Aug 21, 2007, at 9:25 PM, Lance Norskog wrote:

Recently someone mentioned that it would be possible to have a 'replace
existing document' feature rather than just dropping and adding documents
with the same unique id.


There is such a patch: https://issues.apache.org/jira/browse/SOLR-139

I'm experimenting with it right now and it works well for my cases.

However, it is still under the covers a delete/add and

One use case is that we would like to use the index as our one database for
documents, and if we delete a document we want it to stay deleted. Thus we
would mark it deleted and check for its existence. Another use case is that
we are re-adding the same document a few times a day, and the commit times
are ballooning.


...you still have to commit for changes to be visible.
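
As a rough illustration of the add-then-commit flow discussed in this thread, the Python sketch below builds Solr XML update messages: one <add> per document and a single <commit/> at the end of the batch. The helper names and the "id" field used as the unique key are assumptions made for the example, and committing once per batch is only a common way to limit commit overhead, not something the posters prescribed.

```python
import xml.etree.ElementTree as ET

COMMIT_MESSAGE = "<commit/>"


def build_add_message(doc):
    """Return a Solr XML <add> message for one document.

    `doc` is a dict mapping field names to values. Re-sending a document
    with the same unique key replaces the old one (a delete plus an add).
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)  # ElementTree escapes <, > and & for us
    return ET.tostring(add, encoding="unicode")


def build_update_batch(docs):
    """Return the messages to post: every add first, then one commit."""
    messages = [build_add_message(d) for d in docs]
    messages.append(COMMIT_MESSAGE)  # changes become visible only after this
    return messages


if __name__ == "__main__":
    for message in build_update_batch([{"id": "doc-1", "title": "v2"}]):
        print(message)
```

Each message would then be posted to the Solr update handler; the transport is left out here because it depends on the deployment.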

Erik



Indexing HTML content... (Embed HTML into XML?)

2007-08-22 Thread Ravish Bhagdev
Hello,

Sorry for stupid question.  I'm trying to index html file as one of
the fields in Solr, I've setup appropriate analyzer in schema but I'm
not sure how to add html content to Solr.  Encapsulating HTML content
within field tag is obviously not valid.  How do I add html content?
Hope the query is clear

Thanks,
Ravi


Re: Indexing HTML content... (Embed HTML into XML?)

2007-08-22 Thread Jérôme Etévé
You need to encode your HTML content so it can be included as a normal
'string' value in your XML element.

As far as I remember, the only unsafe characters you have to encode as
entities are:
 - < as &lt;
 - > as &gt;
 - " as &quot;
 - & as &amp;

(google xml entities to be sure).

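I don't know what language you use, but in Perl, for instance, you can
use something like:
use HTML::Entities;
# Escape only the XML-unsafe characters; the default set would also emit
# HTML-only named entities, which are not valid in XML.
my $xmlString = encode_entities($rawHTML, '<>&"');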

Also you need to make sure your HTML is encoded in UTF-8, to comply
with Solr's need for UTF-8 encoded XML.
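
For readers not using Perl, the same idea can be sketched in Python with the standard library. The function names and the "body" field name are made up for this example; only xml.sax.saxutils.escape and quoteattr are real library calls.

```python
from xml.sax.saxutils import escape, quoteattr


def html_to_field_xml(field_name, raw_html):
    """Wrap raw HTML in a Solr <field> element, escaping XML-unsafe characters.

    escape() handles &, < and > by default; the extra mapping also turns
    double quotes into &quot;.
    """
    escaped = escape(raw_html, {'"': "&quot;"})
    return "<field name=%s>%s</field>" % (quoteattr(field_name), escaped)


def to_utf8(xml_text):
    """Encode the XML text as UTF-8 bytes before sending it to Solr."""
    return xml_text.encode("utf-8")


if __name__ == "__main__":
    xml = html_to_field_xml("body", '<p class="x">Tom & Jerry</p>')
    print(xml)
    print(to_utf8(xml))
```

The escaped markup comes back out of the index as the original HTML, so an HTML-stripping analyzer in the schema still sees the real tags.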

I hope it helps.

J.

On 8/22/07, Ravish Bhagdev [EMAIL PROTECTED] wrote:
 Hello,

 Sorry for stupid question.  I'm trying to index html file as one of
 the fields in Solr, I've setup appropriate analyzer in schema but I'm
 not sure how to add html content to Solr.  Encapsulating HTML content
 within field tag is obviously not valid.  How do I add html content?
 Hope the query is clear

 Thanks,
 Ravi



-- 
Jerome Eteve.
[EMAIL PROTECTED]
http://jerome.eteve.free.fr/


RE: Query optimisation - multiple filter caches?

2007-08-22 Thread Jonathan Woods
I understand - thanks, Yonik.

I notice that LuceneQueryOptimizer is still used in
SolrIndexSearcher.search(Query, Filter, Sort) - is the idea then that this
method is deprecated, or that the config parameter
query/boolTofilterOptimizer is no longer to be used?  As for the other
search() methods, they just delegate directly to
org.apache.lucene.search.IndexSearcher, so no use of caches there.

Jon

 -Original Message-
 From: Yonik Seeley [mailto:[EMAIL PROTECTED] 
 Sent: 16 August 2007 01:40
 To: solr-user@lucene.apache.org
 Subject: Re: Query optimisation - multiple filter caches?
 
 On 8/15/07, Jonathan Woods [EMAIL PROTECTED] wrote:
  I'm trying to understand how best to integrate directly with Solr 
  (Java-to-Java in the same JVM) to make the most of its query 
  optimisation - chiefly, its caching of queries which merely filter 
  rather than rank results.
 
  I notice that SolrIndexSearcher maintains a filter cache 
 and so does 
  LuceneQueryOptimiser.  Shouldn't they be contributing to/using the 
  same cache, or are they used for different things?
 
 LuceneQueryOptimiser is no longer used since one can directly 
 specify filters via fq parameters.
 
 -Yonik
 
 
 



Apache web server logs in solr

2007-08-22 Thread Andrew Nagy
Hello, I was thinking that Solr, with its built-in faceting, would make for a
great Apache log file storage system.  I was wondering if anyone knows of any
module or library for Apache to write log files directly to Solr or to a Lucene
index?

Thanks
Andrew


RE: SolJava --- which attachments are valid?

2007-08-22 Thread Teruhiko Kurosaka
Sorry for revisiting this three-week-old thread.
I downloaded the nightly build yesterday.
I noticed that some classes have API docs (.html) but no source code
(.java).
For example, there is a javadoc for
org.apache.solr.client.solrj.util.ClientUtils
but no ClientUtils.java:

bash-3.00$ find . -type f | grep Client
./docs/api-solrj/org/apache/solr/client/solrj/util/class-use/ClientUtils.html
./docs/api-solrj/org/apache/solr/client/solrj/util/ClientUtils.html

Is this a packaging problem, or is it intentional?

-kuro

 -Original Message-
 From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
 Sent: Friday, August 03, 2007 12:50 PM
 To: solr-user@lucene.apache.org
 Subject: Re: SolJava --- which attachments are valid?
 
 Teruhiko Kurosaka wrote:
  or you can get it from the nightly builds in:
  http://people.apache.org/builds/lucene/solr/nightly/
  
  For those of you who are interested...
  
  As far as I can tell by inspecting the source code in Trunk,
  solrj.jar from the nightly doesn't seem to work with Solr 1.2.
  For one thing, there is a new layer org.apache.solr.common
  and org.apache.util has become a sub component under
  the common. Things like SolrInputDocument do not exist
  in Solr 1.2 at all. 
  
 
 To run solrj, you need:
   apache-solr-1.3-dev-common.jar
   apache-solr-1.3-dev-solrj.jar
   and all the files in: solrj-lib
 
 You *should* be able to use the client against a server that 
 is running 
 1.2, but I don't make any promises there.
 
 ryan
 


Solr and terracotta

2007-08-22 Thread Jonathan Ariel
Recently I ran into this topic. I googled it a little and didn't find much
information.
It would be great to have solr working with RAMDirectory and Terracotta. We
could stop using crons for rsync, right?
Has anyone tried that out?


Solr scoring: relative or absolute?

2007-08-22 Thread Lance Norskog
Are the score values generated in Solr relative to the index or are they
against an absolute standard?
Is it possible to create a scoring algorithm with this property? Are there
parts of the score inputs that are absolute?
 
My use case is this: I would like to do a parallel search against two Solr
indexes, and combine the results. The two indexes are built with the same
data sources, we just can't handle one giant index. If the score values are
against a common 'scale', then scores from the two search indexes can be
compared. I could combine the result sets with a simple merge by score.
 
This is a difficult concept to explain. I hope I have succeeded.
 
Thanks,
 
Lance


Re: Solr scoring: relative or absolute?

2007-08-22 Thread Sean Timm




Scores from different indexes cannot be directly compared unless the indexes
have similar collection statistics. That is, the same terms occur with the
same frequency across all indexes and the average document lengths are about
the same (though the default similarity in Lucene may not care about average
document length--I'm not sure).
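
To make the point about collection statistics concrete, here is a small Python sketch. It assumes the classic Lucene-style inverse document frequency, idf = 1 + ln(numDocs / (docFreq + 1)); the document counts are invented for illustration and the rest of the scoring formula is left out.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Classic Lucene-style idf: rarer terms get a larger weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# The same term in two indexes of equal size but different term frequency.
idf_index_a = classic_idf(doc_freq=9, num_docs=1000)   # term is rare here
idf_index_b = classic_idf(doc_freq=99, num_docs=1000)  # term is common here

if __name__ == "__main__":
    print("index A idf:", round(idf_index_a, 3))
    print("index B idf:", round(idf_index_b, 3))
```

Because the term weight depends on each index's own statistics, an identical document matching an identical query can receive different scores in the two indexes, which is why a plain merge by score can mislead.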

SOLR-303 is an attempt to solve the
partitioning issue from the search side of things.

-Sean

Lance Norskog wrote:

  Are the score values generated in Solr relative to the index or are they
against an absolute standard?
Is it possible to create a scoring algorithm with this property? Are there
parts of the score inputs that are absolute?
 
My use case is this: I would like to do a parallel search against two Solr
indexes, and combine the results. The two indexes are built with the same
data sources, we just can't handle one giant index. If the score values are
against a common 'scale', then scores from the two search indexes can be
compared. I could combine the result sets with a simple merge by score.
 
This is a difficult concept to explain. I hope I have succeeded.
 
Thanks,
 
Lance

  





RE: Solr and terracotta

2007-08-22 Thread Jeryl Cook
I tried it, but it didn't work that well, so I ended up making my own little
faceted search engine directly using RAMDirectory and clustering it via
Terracotta... not as good as Solr (smile), but it worked.
I actually posted some questions a while back when trying to get it to work. So
Terracotta can hook the RAMDirectory; it may be good to submit this in JIRA
for Terracotta support!

Jeryl Cook 
 /^\ Pharaoh /^\ 


http://pharaohofkush.blogspot.com/ 



..Act your age, and not your shoe size..

-Prince(1986)

 Date: Wed, 22 Aug 2007 16:18:24 -0300
 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Subject: Solr and terracotta
 
 Recently I ran into this topic. I googled it a little and didn't find much
 information.
 It would be great to have solr working with RAMDirectory and Terracotta. We
 could stop using crons for rsync, right?
 Has anyone tried that out?


Re: Solr and terracotta

2007-08-22 Thread Jonathan Ariel
How come it didn't work? How did you add RAMDir support to solr?

On 8/22/07, Jeryl Cook [EMAIL PROTECTED] wrote:

 tried it, didn't work that well...so I ended up making my own little
 faceted Search engine directly using RAMDirectory and clustering it via
 Terracotta...not as good as SOLR(smile), but it worked.
 i actually posted some questions awhile back in trying to get it to work.
 so terracotta can hook the RAMDirectory, maybe be good to submit this in
 JIRA for terrocotta support!

 Jeryl Cook
 /^\ Pharaoh /^\


 http://pharaohofkush.blogspot.com/



 ..Act your age, and not your shoe size..

 -Prince(1986)

  Date: Wed, 22 Aug 2007 16:18:24 -0300
  From: [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org
  Subject: Solr and terracotta
 
  Recently I ran into this topic. I googled it a little and didn't find
 much
  information.
  It would be great to have solr working with RAMDirectory and Terracotta.
 We
  could stop using crons for rsync, right?
  Has anyone tried that out?



Running into problems with distributed index and search

2007-08-22 Thread Kasi Sankaralingam
Hi All,

 

This is the scenario: I have two Solr instances running on two different
partitions. I am treating one of the servers as strictly read-only, for
search (the search server), and the other instance (the index server) is for
indexing. The index data directory resides on an NFS partition. I am running
into the following problems:

1)  The index dir is /indexdata/data. When I index using the index server,
it understands the data dir mentioned in solrconfig.xml, writes the index
files to that location, and is able to read the files (I am able to run
queries using Solr Admin).

2)  The search server respects the NFS directory but does not read the
index files; Solr Admin returns no search results. I had to create a
symlink under $SOLRHOME pointing to the NFS partition to make it work.

3)  I had to bounce the Tomcat search Solr webapp instance for it to read
the index files. Is that mandatory? In a distributed environment, do we
always have to bounce the Solr webapp instances to reflect changes in the
index files?

 

Any help/suggestions would be greatly appreciated.

 

Thanks,

 

kasi



How to extract constrained fields from query

2007-08-22 Thread Martin Grotzke
Hello,

in my custom request handler, I want to determine which fields are
constrained by the user.

E.g. the query (q) might be "ipod AND brand:apple" and there might
be a filter query (fq) like "color:white" (or more).

What I want to know is that brand and color are constrained.

AFAICS I could use SolrPluginUtils.parseFilterQueries, test whether
the resulting queries are TermQueries, and read their fields.
Should I then also check which kinds of queries I get when parsing
the query (q), and look for all TermQueries in the parsed query?

Or is there a more elegant way of doing this?
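
The goal can be illustrated outside Solr with a rough Python sketch that pulls explicit field names out of the raw q and fq strings with a regular expression. This is a different technique from walking the parsed Query objects described above, and it is only an approximation: the function name is hypothetical, the default search field is ignored, and unquoted values that contain a colon (a URL, for instance) would be misread as fields.

```python
import re

# A field name followed by a colon, not preceded by a word character or backslash.
FIELD_PATTERN = re.compile(r'(?<![\w\\])([A-Za-z_][\w.]*):')
# Quoted phrases are blanked out so that colons inside them are ignored.
PHRASE_PATTERN = re.compile(r'"[^"]*"')


def constrained_fields(q, fqs=()):
    """Return the sorted field names that appear as explicit constraints."""
    fields = set()
    for query in (q, *fqs):
        without_phrases = PHRASE_PATTERN.sub('""', query)
        fields.update(FIELD_PATTERN.findall(without_phrases))
    return sorted(fields)


if __name__ == "__main__":
    print(constrained_fields("ipod AND brand:apple", ["color:white"]))
```

Inside a request handler, inspecting the parsed queries is likely more reliable, since the query parser has already resolved escaping and default fields.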

Thanx a lot,
cheers,
Martin






RE: Solr and terracotta

2007-08-22 Thread Orion Letizi

Jeryl,

I remember you asking about how to hook in the RAMDirectory a while back. 
It seemed like there was maybe some support within Solr that you needed.  I
assume you're suggesting adding an issue in the Solr  JIRA, right?

Is there something that the Terracotta team can do to help?

Cheers,
Orion


Jeryl Cook wrote:
 
 tried it, didn't work that well...so I ended up making my own little
 faceted Search engine directly using RAMDirectory and clustering it via
 Terracotta...not as good as SOLR(smile), but it worked.
 i actually posted some questions awhile back in trying to get it to work.
 so terracotta can hook the RAMDirectory, maybe be good to submit this in
 JIRA for terrocotta support!
 
 Jeryl Cook 
  /^\ Pharaoh /^\ 
 
 
 http://pharaohofkush.blogspot.com/ 
 
 
 
 ..Act your age, and not your shoe size..
 
 -Prince(1986)
 
 Date: Wed, 22 Aug 2007 16:18:24 -0300
 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Subject: Solr and terracotta
 
 Recently I ran into this topic. I googled it a little and didn't find
 much
 information.
 It would be great to have solr working with RAMDirectory and Terracotta.
 We
 could stop using crons for rsync, right?
 Has anyone tried that out?
 
 

-- 
View this message in context: 
http://www.nabble.com/Solr-and-terracotta-tf4313531.html#a12283537
Sent from the Solr - User mailing list archive at Nabble.com.



Web statistics for solr?

2007-08-22 Thread Matthew Runo

Hello!

I was wondering if anyone has written a script that displays any  
stats from SOLR.. queries per second, number of docs added.. this  
sort of thing.


Sort of a general dashboard for SOLR.

I'd rather not write it myself if I don't need to, and I didn't see  
anything conclusive in the archives for the email list.


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++




Re: defining fiels to be returned when using mlt

2007-08-22 Thread Pieter Berkel
Hi Stefan,

Currently there is no way to specify the list of fields to be returned by
the MoreLikeThis handler.  I've been looking to address this issue in
https://issues.apache.org/jira/browse/SOLR-295 (point 3) however in the
broader scheme of things, it seems logical to wait until
https://issues.apache.org/jira/browse/SOLR-281 is resolved before making
changes to MLT.

cheers,
Piete



On 22/08/07, Stefan Rinner [EMAIL PROTECTED] wrote:

 Hi

 Is there any way to define the number/type of fields of the documents
 returned in the moreLikeThis part of the response, when mlt is
 set to true?

 Currently I'm using morelikethis to show the number and sources of
 similar documents - therefore I'd need only the source field of
 these similar documents and not everything.

 - stefan



Re: Web statistics for solr?

2007-08-22 Thread Pieter Berkel
Matthew,

Maybe the SOLR Statistics page would suit your purpose?
(click on statistics from the main solr page or use the following url)
http://localhost:8983/solr/admin/stats.jsp

cheers,
Piete



On 23/08/07, Matthew Runo [EMAIL PROTECTED] wrote:

 Hello!

 I was wondering if anyone has written a script that displays any
 stats from SOLR.. queries per second, number of docs added.. this
 sort of thing.

 Sort of a general dashboard for SOLR.

 I'd rather not write it myself if I don't need to, and I didn't see
 anything conclusive in the archives for the email list.

 ++
   | Matthew Runo
   | Zappos Development
   | [EMAIL PROTECTED]
   | 702-943-7833
 ++





Re: almost realtime updates with replication

2007-08-22 Thread Walter Underwood
At Infoseek, we ran a separate search index with today's updates
and merged that in once each day. It requires a little bit of
federated search to prefer the new content over the big index,
but the daily index can be very nimble for update.
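
A minimal Python sketch of that idea follows, assuming each index returns a list of (doc_id, score) pairs already sorted by score. The function name, the tuple layout, and the choice to list daily hits ahead of main-index hits are assumptions for illustration, not a description of the Infoseek system.

```python
def federated_results(daily_hits, main_hits, limit=10):
    """Combine hits from a small daily index and a large main index.

    When a document appears in both, the copy from the daily index wins,
    because it is the most recent version.
    """
    merged = []
    seen = set()
    # Daily hits go first so that fresh content is preferred.
    for source, hits in (("daily", daily_hits), ("main", main_hits)):
        for doc_id, score in hits:
            if doc_id in seen:
                continue  # an earlier (fresher) copy was already taken
            seen.add(doc_id)
            merged.append((doc_id, score, source))
    return merged[:limit]


if __name__ == "__main__":
    daily = [("a", 2.0), ("b", 1.0)]
    main = [("b", 9.0), ("c", 3.0)]
    print(federated_results(daily, main))
```

The sketch deliberately avoids interleaving by raw score, since scores computed against two different indexes are not on a common scale.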

wunder

On 8/22/07 7:58 AM, mike topper [EMAIL PROTECTED] wrote:

 Hello,
 
 Currently in our application we are using the master/slave setup and
 have a batch update/commit about every 5 minutes.
 
 There are a couple queries that we would like to run almost realtime so
 I would like to have it so our client sends an update on every new
 document and then have solr configured to do an autocommit every 5-10
 seconds.
 
 reading the Wiki, it seems like this isn't possible because of the
 strain of snapshotting and pulling to the slaves at such a high rate.
 What I was thinking was for these few queries to just query the master
 and the rest can query the slave with the not realtime data, although
 I'm assuming this wouldn't work either because since a snapshot is
 created on every commit, we would still impact the performance too much?
 
 anyone have any suggestions?  If I set autowarmCount=0 would I be
 able to pull to the slave faster than every couple of minutes (say,
 every 10 seconds)?
 
 what if I take out the postcommit hook on the master and just have the
 snapshooter run on a cron every 5 minutes?
 
 -Mike
 
 



Re: Solr and terracotta

2007-08-22 Thread Jonathan Ariel
If I am not wrong, once you have the RAMDir feature, mounting Terracotta
should be transparent and fast, right?

On 8/22/07, Orion Letizi [EMAIL PROTECTED] wrote:


 Jeryl,

 I remember you asking about how to hook in the RAMDirectory a while back.
 It seemed like there was maybe some support within Solr that you
 needed.  I
 assume you're suggesting adding an issue in the Solr  JIRA, right?

 Is there something that the Terracotta team can do to help?

 Cheers,
 Orion


 Jeryl Cook wrote:
 
  tried it, didn't work that well...so I ended up making my own little
  faceted Search engine directly using RAMDirectory and clustering it via
  Terracotta...not as good as SOLR(smile), but it worked.
  i actually posted some questions awhile back in trying to get it to
 work.
  so terracotta can hook the RAMDirectory, maybe be good to submit this
 in
  JIRA for terrocotta support!
 
  Jeryl Cook
   /^\ Pharaoh /^\
 
 
  http://pharaohofkush.blogspot.com/
 
 
 
  ..Act your age, and not your shoe size..
 
  -Prince(1986)
 
  Date: Wed, 22 Aug 2007 16:18:24 -0300
  From: [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org
  Subject: Solr and terracotta
 
  Recently I ran into this topic. I googled it a little and didn't find
  much
  information.
  It would be great to have solr working with RAMDirectory and
 Terracotta.
  We
  could stop using crons for rsync, right?
  Has anyone tried that out?
 
 
