Re: replication of lucene-write.lock file

2009-05-15 Thread Noble Paul നോബിള്‍ नोब्ळ्
Replication relies on the Lucene API to learn which files are associated with an index version. If the API returns the lock file as well, it is replicated too. I guess we must ignore the .lock file if it is returned in the list of files. You can raise an issue and we can fix it. --Noble On Fri, May
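
The fix suggested above amounts to filtering lock files out of the file list before replicating. A minimal sketch in Python, not Solr's actual (Java) code: the function name `replicable_files` and the sample file names are hypothetical, and matching on a `.lock` suffix is an assumption based on the message.

```python
def replicable_files(index_files):
    """Return the index files to copy to a slave, skipping lock files.

    Hypothetical helper: the '.lock' suffix check is an assumption,
    not the rule Solr itself applies.
    """
    return [name for name in index_files if not name.endswith(".lock")]


# Sample file list (made-up names) as an index-version listing might return it.
files = ["segments_2", "_0.cfs", "lucene-write.lock"]
print(replicable_files(files))
```

Filtering by name keeps the sketch independent of how the file list is obtained.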

Sol Lederman's Review of Faceted Search

2009-05-15 Thread Andre Hagenbruch
Hi all, Sol Lederman has written a Review http://federatedsearchblog.com/2009/05/14/review-faceted-search/ of the almost finished manuscript of Daniel Tunkelang's Faceted Search which is set to be published in June. As the text also mentions Solr it

Re: Initialising of CommonsHttpSolrServer in Spring framework

2009-05-15 Thread Aleksander M. Stensby
Out of the box, the simplest way to configure CommonsHttpSolrServer through a Spring application context is to define the bean for the server and inject it into whatever class will use it, as Avlesh shared below. <bean id="httpSolrServer"

Re: Solr vs Sphinx

2009-05-15 Thread Michael McCandless
On Thu, May 14, 2009 at 8:36 PM, Mark Miller markrmil...@gmail.com wrote: Michael McCandless wrote: So why haven't we enabled this by default, already? Why isn't Lucene done already :) I hear you :) Mike

Integer field showing as boolean, breaking phps writer

2009-05-15 Thread Andrew McCombe
Hello, I have a field defined in schema.xml as an integer which should contain the values 0, 1, 2, 10 or 11, but my result documents show it as either 'true' or 'false'. The majority of the half million documents have this field as 0 or 1, but around 6,000 have it as 2, 10 or 11. The

Documents in facet results

2009-05-15 Thread Jeffrey Gelens
Dear community, I'm wondering if there is a clean solution to my rather interesting problem. The following facet query results in a list of all facets and the number of all documents matching the corresponding facet as seen below: Query: <str name="q">*:*</str> <str name="facet.limit">5</str> <str

Re: Solr vs Sphinx

2009-05-15 Thread Mark Miller
In the spirit of good defaults: I think we should change the Solr highlighter to highlight phrase queries by default, as well as prefix, range, and wildcard constant-score queries. It's awkward to have to tell people they have to turn those on. I'd certainly prefer to have to turn them off if I have

Re: Search Query Questions

2009-05-15 Thread Erik Hatcher
On May 14, 2009, at 8:46 PM, Chris Miller wrote: 1) How do I search for ALL items? For example, I provide a sort query parameter of updated and a rows query parameter of 10 to limit the query results. I still have to provide a search query, of course. What if I want to provide a list of
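
The reply is cut off here, so what follows is not necessarily what Erik answered. Solr's query syntax does have a match-all form, `*:*` (it also appears in the facet query elsewhere in this listing). A sketch of building such a request in Python, assuming a Solr instance at the hypothetical URL `http://localhost:8983/solr/select` and a sortable field named `updated` as in the question; the descending sort direction is a guess.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port and path to your installation.
BASE_URL = "http://localhost:8983/solr/select"


def match_all_url(sort_field, rows):
    """Build a URL that matches every document, sorted and limited."""
    params = {
        "q": "*:*",                    # match-all query
        "sort": f"{sort_field} desc",  # direction is an assumption
        "rows": rows,
    }
    # urlencode escapes each value, so '*', ':' and the space are safe in a URL.
    return BASE_URL + "?" + urlencode(params)


print(match_all_url("updated", 10))
```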

Re: How to deal with Mark invalid?

2009-05-15 Thread Nikolai Derzhak
No. This patch does not help in the case where the data is not HTML but is parsed by HTMLStripReader. It looks like we just need a fine-tuned try/catch in the code, to catch only the non-HTML data case. On Tue, May 12, 2009 at 6:05 PM, Yonik Seeley yo...@lucidimagination.com wrote: I just committed a minor match

Re: Solr vs Sphinx

2009-05-15 Thread Eric Pugh
Something that would be interesting is to share Solr configs for various types of indexing tasks: from a Solr configuration aimed at indexing web pages, to one handling large amounts of text, to one that indexes specific structured data. I could see those being posted on the wiki and helping

How to update only few fields in a document

2009-05-15 Thread Vincent Pérès
Hello, I found only one post about updating documents; maybe things have evolved since then. I need to update a field in a few thousand documents at once (or in multiple requests), but I would rather not have to add a new document in place of the current one (I mean it's how it works if I well

Solr Shard - Strange results

2009-05-15 Thread CB-PO
Hello, What we have done is create multiple Solr instances on the same server, where each instance is created with the DataImportHandler from a different DB. The information on each DB is similar, so the schemas for each instance are pretty much the same. Our goal is to use the shards

Simple search returns no documents

2009-05-15 Thread Jeffrey Gelens
Hello all, I've got a weird problem with a simple field search. The field facility_indexed has the following terms: - kooklessen (freq: 422) - workshop (freq: 422) These terms were tokenized from the string "Kooklessen en Workshops". So during insertion in Solr, the string was successfully

irrelevant search results

2009-05-15 Thread Radha C.
Hello List, I have the query below: art_id:queryText&start=0&rows=10&sort=score desc and this should not yield any results because art_id contains numbers. But when I execute this search, it returns more than 100 documents. The art_id field is a String in schema.xml. Can anyone tell me how

RE: irrelevant search results - encode issue.

2009-05-15 Thread Radha C.
Hi, I found why it is returning irrelevant documents. I am encoding my query string with UTF-8 and appending it to the URL as follows, so it fails. This is the query string = art_id:queryText&start=0&rows=10&sort=score desc encoded url :
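
The message is cut off before the encoded URL, so the exact cause is not visible. One common way this kind of encoding goes wrong is escaping the whole query string at once, which also escapes the `=` and `&` separators. A minimal Python sketch of the difference; the leading `q=` parameter name is an assumption, since the message shows the query without it.

```python
from urllib.parse import quote_plus, urlencode

# Query string as in the message, with an assumed leading "q=".
raw = "q=art_id:queryText&start=0&rows=10&sort=score desc"

# Escaping the whole string turns '=' into %3D and '&' into %26,
# so the server no longer sees separate q/start/rows/sort parameters.
whole = quote_plus(raw)

# Escaping each value separately keeps the separators intact.
params = {"q": "art_id:queryText", "start": 0, "rows": 10, "sort": "score desc"}
per_value = urlencode(params)

print(whole)
print(per_value)
```

Only the parameter values need escaping; the `&` and `=` between them must reach the server unescaped.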

Re: replication of lucene-write.lock file

2009-05-15 Thread Bryan Talbot
https://issues.apache.org/jira/browse/SOLR-1170 -Bryan On May 15, 2009, at May 15, 12:24 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote: the replication relies on lucene API to know what are the files associated with an index version. If it returns the lock file also it is replicated too. I

Re: Replication master+slave

2009-05-15 Thread Michael Ludwig
Bryan Talbot wrote: So how are people managing solrconfig.xml files which are largely the same other than differences for replication? I don't think it's a good thing to maintain two copies of the same file and I'd like to avoid that. Maybe enabling the XInclude feature in DocumentBuilders

Re: Solr vs Sphinx

2009-05-15 Thread Matthew Runo
I agree regarding posting different types of files - because right now if you're just starting out with Solr, taking the sample files from the distro and going from there is the /only path/ =\ Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833

Synchronisation problem with replication

2009-05-15 Thread Jérôme Etévé
Hi All, I've got a small problem with replication. Let's say I post a document on the master server, and the slaves run snappuller/installer via crontab every minute. Then for 30 seconds on average, my search servers are not all synchronized. Is there a way to improve

Re: Solr Shard - Strange results

2009-05-15 Thread Yonik Seeley
Certainly does seem strange. Do you have the same uniqueKeyField in both indexes? Any way you can provide some configuration and some data to reproduce this? -Yonik On Fri, May 15, 2009 at 10:40 AM, CB-PO charles.bush...@gmail.com wrote: Hello, What we have done is created multiple solr

Re: Synchronisation problem with replication

2009-05-15 Thread Otis Gospodnetic
You'd have to have extra hardware. You'd pull some number of servers out of service while they are being updated. Then you'd put them back in service and take the other half out, update, and put them back in. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch -
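
The two-halves procedure described above can be sketched as a small Python function. This is only an illustration of the ordering: `take_out`, `update` and `put_back` are hypothetical callbacks standing in for whatever your load balancer and snapshot scripts actually do.

```python
def rolling_update(servers, take_out, update, put_back):
    """Update search servers in two halves so one half always serves queries.

    take_out, update and put_back are hypothetical callbacks supplied by
    the caller; this function only fixes the order in which they run.
    """
    half = len(servers) // 2
    for group in (servers[:half], servers[half:]):
        for server in group:
            take_out(server)   # remove from the load balancer
        for server in group:
            update(server)     # install the new index snapshot
        for server in group:
            put_back(server)   # return to service


# Dry run that records the order of operations instead of touching servers.
log = []
rolling_update(
    ["s1", "s2", "s3", "s4"],
    take_out=lambda s: log.append(("out", s)),
    update=lambda s: log.append(("update", s)),
    put_back=lambda s: log.append(("in", s)),
)
print(log)
```

Within each half the servers switch to the new index together, but the two halves still serve different index versions for the duration of one update.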

query regarding Indexing xml files -db-data-config.xml

2009-05-15 Thread jayakeerthi s
Hi All, I am trying to index the fields from the XML files; here is the configuration that I am using. db-data-config.xml: <dataConfig> <dataSource type="FileDataSource" name="xmlindex"/> <document name="products"> <entity name="xmlfile" processor="FileListEntityProcessor"

Re: Solr Shard - Strange results

2009-05-15 Thread CB-PO
Yeah, the first thing I thought of was that perhaps there was something wrong with the uniqueKey and they were clashing between the indexes, however upon visual inspection of the data the field we are using as the unique key in each of the indexes is grossly different between the two databases,

Re: query regarding Indexing xml files -db-data-config.xml

2009-05-15 Thread Jay Hill
If that is your complete input file then it looks like you are missing the wrapping <add></add> element: <add> <doc> <field name="id">F8V7067-APL-KIT</field> <field name="name">Belkin Mobile Power Cord for iPod w/ Dock</field> <field name="manu">Belkin</field> <field name="cat">electronics</field> <field

Re: query regarding Indexing xml files -db-data-config.xml

2009-05-15 Thread jayakeerthi s
Many thanks for the reply. The complete input XML file is below; I failed to include this earlier. <add> <doc> <field name="id">F8V7067-APL-KIT</field> <field name="name">Belkin Mobile Power Cord for iPod w/ Dock</field> <field name="manu">Belkin</field> <field name="cat">electronics</field> <field

Re: Solr Shard - Strange results

2009-05-15 Thread Yonik Seeley
On Fri, May 15, 2009 at 4:11 PM, CB-PO charles.bush...@gmail.com wrote: Yeah, the first thing I thought of was that perhaps there was something wrong with the uniqueKey and they were clashing between the indexes, however upon visual inspection of the data the field we are using as the unique

highlighting performance

2009-05-15 Thread Matt Mitchell
Hi, I'm experimenting with highlighting and am noticing a big drop in performance with my setup. I have documents that use quite a few dynamic fields (20-30). The fields are multiValued stored/indexed text fields, each with a few paragraphs' worth of text. My hl.fl param is set to *_t. What kinds

grouping response docs together

2009-05-15 Thread Matt Mitchell
Is there a built-in mechanism for grouping similar documents together in the response? I'd like to make it look like there is only one document with multiple hits. Matt

Re: grouping response docs together

2009-05-15 Thread Rohit Gandhe
Collapse component may be of interest to you https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel On Fri, May 15, 2009 at 3:52 PM, Matt Mitchell goodie...@gmail.com wrote: Is there a built-in mechanism for grouping similar documents

Query syntax question

2009-05-15 Thread Vauthrin, Laurent
Hello, I'm having a problem with a query but I don't understand what is wrong with it. Can someone explain the following? Here are a few queries that work as expected (premium is a boolean field): premium:false - 3004; premium:true - 0; -premium:false - 0

Re: grouping response docs together

2009-05-15 Thread Otis Gospodnetic
Matt - you may also want to detect near duplicates at index time: http://wiki.apache.org/solr/Deduplication Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Matt Mitchell goodie...@gmail.com To: solr-user@lucene.apache.org Sent: Friday,

Re: highlighting performance

2009-05-15 Thread Otis Gospodnetic
Matt, I believe indexing those fields that you will use for highlighting with term vectors enabled will make things faster (and your index a bit bigger). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Matt Mitchell goodie...@gmail.com

Re: Simple search returns no documents

2009-05-15 Thread Otis Gospodnetic
Hi Jeffrey, And now try: ?q=facility_indexed:"kooklessen en workshops"~1 If that works, head over to the Solr Admin Analysis page, enter the field name, and that phrase for both index and query analyzer. And then look at term positions for your two main terms/tokens. Otis -- Sematext --

Re: How to update only few fields in a document

2009-05-15 Thread Otis Gospodnetic
Vincent, Unfortunately things haven't changed yet. If all your fields are stored, have a look at SOLR-139. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Vincent Pérès vincent.pe...@gmail.com To: solr-user@lucene.apache.org Sent:

Re: CommonsHttpSolrServer vs EmbeddedSolrServer

2009-05-15 Thread Otis Gospodnetic
Sachin, EmbeddedSolrServer implies an embedded, local, in-process access to Solr. CommonsHttpSolrServer lets you access a remote Solr instance via HTTP. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: sachin78 tendulkarsachi...@gmail.com

Re: Date field

2009-05-15 Thread Otis Gospodnetic
Jack, Which bug are you referring to? Last time I played with function queries with date fields things worked as expected. If there is/was a known bug, it must be in JIRA... Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Jack Godwin

Re: Solr memory requirements?

2009-05-15 Thread vivek sar
Some more info: profiling the heap dump shows org.apache.lucene.index.ReadOnlySegmentReader as the biggest object, taking up almost 80% of total memory (6G); see the attached screenshot for a smaller dump. There is some norms object - not sure where they are coming from as I've