schema configuration with different kind of score report

2009-08-17 Thread Sushan Rungta
Kindly guide me on how I should configure Solr/Lucene with the below kind of requirements. The query is "abc". The documents are: a) abc b) abcd c) xyz ab c mno d) ab. I require the score for each of the above-mentioned documents, with the above-mentioned query, to be displayed as: For document

Re: schema configuration with different kind of score report

2009-08-17 Thread Avlesh Singh
Why not stick to the Lucene score for each document rather than building your own? The easiest way of getting the relevance score for each document is to add the debugQuery=true parameter to your request handler. Cheers Avlesh On Mon, Aug 17, 2009 at 12:32 PM, Sushan Rungta <s...@clickindia.com> wrote:
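As a sketch, assuming the default example host and port, such a request might look like this (fl=*,score returns the score as a pseudo-field; debugQuery=true adds the full scoring explanation):

```
http://localhost:8983/solr/select?q=abc&fl=*,score&debugQuery=true
```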

Re: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS

2009-08-17 Thread Constantijn Visinescu
Near the bottom of my web.xml (just above </web-app>) I got: <env-entry> <env-entry-name>solr/home</env-entry-name> <env-entry-value>path/to/solr</env-entry-value> <env-entry-type>java.lang.String</env-entry-type> </env-entry> While you're at it you might want to make sure the

Re: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS

2009-08-17 Thread Aaron Aberg
Thanks for the help. I commented out that line in solrconfig.xml like you said. My web.xml file has this entry in it: <env-entry> <env-entry-name>solr/home</env-entry-name> <env-entry-value>/usr/share/tomcat5/solr</env-entry-value> <env-entry-type>java.lang.String</env-entry-type> </env-entry> And

Re: schema configuration with different kind of score report

2009-08-17 Thread Sushan Rungta
This does not solve my purpose, as my requirement is different. Kindly check document d, which I mentioned; the computation of the score for that kind of document will be different. Hence, some different sort of query will be applied, which I am unable to ascertain. Regards, Sushan

Re: schema configuration with different kind of score report

2009-08-17 Thread Avlesh Singh
I am definitely missing something here. Do you want to fetch a document if one of its fields contains "ab" given a search term "abc"? If you can design a field and query your index so that you can fetch such a document, Lucene (and hence Solr) will automagically give you the relevance score. Cheers

Re: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS

2009-08-17 Thread Constantijn Visinescu
Not sure what's going on, but I see Jetty stuff scrolling by; that can't be right :) Jetty and Tomcat are two separate web servers for serving Java applications, and mixing the two doesn't sound like a good idea. Jetty is included in the examples for .. well .. example purposes ... but it's not a

'Connection reset' in DataImportHandler Development Console

2009-08-17 Thread Andrew Clegg
Hi folks, I'm trying to use the Debug Now button in the development console to test the effects of some changes in my data import config (see attached). However, each time I click it, the right-hand frame fails to load -- it just gets replaced with the standard 'connection reset' message from

Re: 'Connection reset' in DataImportHandler Development Console

2009-08-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
apparently I do not see any command full-import, delta-import being fired. Is that true? On Mon, Aug 17, 2009 at 5:55 PM, Andrew Clegg <andrew.cl...@gmail.com> wrote: Hi folks, I'm trying to use the Debug Now button in the development console to test the effects of some changes in my data

Re: 'Connection reset' in DataImportHandler Development Console

2009-08-17 Thread Andrew Clegg
Noble Paul നോബിള്‍ नोब्ळ्-2 wrote: apparently I do not see any command full-import, delta-import being fired. Is that true? It seems that way -- they're not appearing in the logs. I've tried Debug Now with both full and delta selected from the dropdown, no difference either way. If I

RE: HTTP ERROR: 500 No default field name specified

2009-08-17 Thread Kevin Miller
I am no longer getting this error. I downloaded the latest nightly build this morning and the document I wanted worked without any problems. Kevin Miller Web Services -Original Message- From: Kevin Miller [mailto:kevin.mil...@oktax.state.ok.us] Sent: Thursday, August 13, 2009 3:35 PM To:

Re: Boosting relevance as terms get nearer to each other

2009-08-17 Thread Michael
Anybody have any suggestions or hints? I'd love to score my queries in a way that pays attention to how close together terms appear. Michael On Thu, Aug 13, 2009 at 12:01 PM, Michael solrco...@gmail.com wrote: Hello, I'd like to score documents higher that have the user's search terms nearer

Re: Boosting relevance as terms get nearer to each other

2009-08-17 Thread Mark Miller
Dismax QueryParser with pf and ps params? http://wiki.apache.org/solr/DisMaxRequestHandler -- - Mark http://www.lucidimagination.com Michael wrote: Anybody have any suggestions or hints? I'd love to score my queries in a way that pays attention to how close together terms appear. Michael
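A minimal sketch of that suggestion, configured as handler defaults in solrconfig.xml; the field names title and body are assumptions:

```
<!-- solrconfig.xml: dismax boosts docs whose query terms also match as a loose phrase -->
<requestHandler name="/dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^2 body</str>  <!-- fields the individual terms must match -->
    <str name="pf">title^2 body</str>  <!-- fields given a phrase-proximity boost -->
    <str name="ps">100</str>           <!-- phrase slop: max distance between terms -->
  </lst>
</requestHandler>
```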

Re: A Buzzword Problem!!!

2009-08-17 Thread Grant Ingersoll
Sounds like you just need a buzzword field (indexed, stored) that is analyzed containing each of the terms associated with that buzzword. Then, just do the search against that field and return that field. On Aug 15, 2009, at 11:03 PM, Ninad Raut wrote: I want searchable buzzword word and

DIH opening searchers for every doc.

2009-08-17 Thread Lucas F. A. Teixeira
Hello all, I'm trying the Data Import Handler for the first time to generate my index based on my db. Looking at the server's logs, I can see the index process is opening a new searcher for every doc. Is this what we should expect? Why? If not, how can I avoid it? I think if this wasn't being done,

Re: Boosting relevance as terms get nearer to each other

2009-08-17 Thread Mark Miller
PhraseQuery's do score higher if the terms are found closer together. does that imply that during the computation of the score for a b c~100, sloppyFreq() will be called? Yes. PhraseQuery uses PhraseWeight, which creates a SloppyPhraseScorer, which takes into account

Re: spellcheck component in 1.4 distributed

2009-08-17 Thread Ian Connor
Hi, Just a quick update to the list. Mike and I were able to apply it to 1.4 and it works. We have it loaded on a few production servers and there is an odd StringIndexOutOfBoundsException error but most of the time it seems to work just fine. On Fri, Aug 7, 2009 at 7:30 PM, mike anderson

Re: Boosting relevance as terms get nearer to each other

2009-08-17 Thread Michael
Thanks for the suggestion. Unfortunately, my implementation requires the Standard query parser -- I sanitize and expand user queries into deeply nested queries with custom boosts and other bells and whistles that make Dismax unappealing. I see from the docs that Similarity.sloppyFreq() is a

Re: DIH opening searchers for every doc.

2009-08-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
DIH does not open searchers for each doc. Do you have any autocommit enabled? On Mon, Aug 17, 2009 at 8:17 PM, Lucas F. A. Teixeira <lucas...@gmail.com> wrote: Hello all, I'm trying the Data Import Handler for the first time to generate my index based on my db. Looking at the server's logs, I can see
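For reference, the autocommit section Noble is asking about lives in solrconfig.xml; each commit (auto or explicit) reopens a searcher, so an aggressive setting can look like a searcher per doc. Values here are illustrative:

```
<!-- solrconfig.xml: commented out = autocommit disabled -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>   <!-- commit after this many pending docs -->
    <maxTime>60000</maxTime>   <!-- or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```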

RE: Which server parameters to tweak in Solr if I expect heavy writes and light reads?

2009-08-17 Thread Fuad Efendi
In my personal experience: ramBufferSizeMB=8192 helps to keep many things in RAM and to delay index merges almost indefinitely (I have a single 10G segment with almost 100 million docs after 24 hours). Heavy I/O was a problem before, and I solved it -Original Message- From: Archon810
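The knobs Fuad mentions live in solrconfig.xml; 8192 is his value from this thread, not a recommended default:

```
<!-- solrconfig.xml: Lucene flushes a segment only when the RAM buffer fills,
     so a very large buffer postpones segment creation (and hence merging) -->
<indexDefaults>
  <ramBufferSizeMB>8192</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
</indexDefaults>
```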

Re: Boosting relevance as terms get nearer to each other

2009-08-17 Thread Michael
Great, thank you Mark! Michael On Mon, Aug 17, 2009 at 10:48 AM, Mark Miller markrmil...@gmail.com wrote: PhraseQuery's do score higher if the terms are found closer together. does that imply that during the computation of the score for a b c~100, sloppyFreq() will be called? Yes.

Re: spellcheck component in 1.4 distributed

2009-08-17 Thread Mark Miller
Ian Connor wrote: Hi, Just a quick update to the list. Mike and I were able to apply it to 1.4 and it works. We have it loaded on a few production servers and there is an odd StringIndexOutOfBoundsException error but most of the time it seems to work just fine. Do you happen to have the

Questions about MLT

2009-08-17 Thread Avlesh Singh
I have an index of documents which contain these two fields: <field name="city_id" type="integer" stored="true" indexed="true" termVectors="true" termPositions="true" termOffsets="true"/> <field name="categories" type="string" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/> Using
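With term vectors enabled as above, a MoreLikeThis request against those fields might be sketched as follows (the document id and parameter values are illustrative):

```
http://localhost:8983/solr/select?q=city_id:42&mlt=true
    &mlt.fl=categories   (term-vector field(s) to mine for similar terms)
    &mlt.mintf=1         (min term frequency in the source document)
    &mlt.mindf=1         (min document frequency across the index)
```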

Re: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS

2009-08-17 Thread Aaron Aberg
Ok. Did that. Still got that error. Here is the log (it's not adding Jetty stuff anymore); I included the exception this time. It looks like it's blowing up on something related to XPath. Do you think it's having an issue with one of my XML files? Aug 17, 2009 2:37:35 AM

Re: DIH opening searchers for every doc.

2009-08-17 Thread Lucas F. A. Teixeira
No I don't, it's commented out. This is giving me 40 docs/sec indexing, which is a very poor rate. (I know this rate depends on a lot of things, including that my database is not on the same network, and other stuff, but I think I can get more than this.) Any clues on what is probably happening to

RE: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS

2009-08-17 Thread Fuad Efendi
Not sure SOLR can work in such an environment without asking Hosting Support to make a lot of specific changes... such as giving specific permissions to specific folders, setting ulimit -n, dealing with exact versions and vendors of Java, memory parameters, and even libraries which may overwrite

Re: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS

2009-08-17 Thread Aaron Aberg
Sorry Fuad, that isn't very helpful. As I mentioned, this is a dedicated server, so none of those things are an issue. I am using SSH right now to set up solr home etc., though. --Aaron On Mon, Aug 17, 2009 at 10:00 AM, Fuad Efendi <f...@efendi.ca> wrote: Not sure SOLR can work in such

RE: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS

2009-08-17 Thread Fuad Efendi
What is solr.xml for? INFO: looking for solr.xml: /usr/share/tomcat5/solr/solr.xml Aug 17, 2009 2:37:36 AM org.apache.solr.core.SolrResourceLoader <init> java.lang.NoClassDefFoundError: org.apache.solr.core.Config -- it can't find the configuration... XPath needs to load the XML to configure Config.

Re: Performance Tuning: segment_merge:index_update=5:1 (timing)

2009-08-17 Thread Jason Rutherglen
Fuad, I'd recommend indexing in Hadoop, then copying the new indexes to Solr slaves. This removes the need for Solr master servers. Of course you'd need a Hadoop cluster larger than the number of master servers you have now. The merge indexes command (which can be taxing on the servers because

RE: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS

2009-08-17 Thread Fuad Efendi
Aaron, do you have solr.war in your %TOMCAT%/webapps folder? Is your solr/home in a location other than /webapps? Try installing a sample Tomcat with SOLR on a local dev box and check that it's working... -Original Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: August-17-09 1:33 PM

RE: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS

2009-08-17 Thread Fuad Efendi
Looks like you are using SOLR multicore, with solr.xml... I never tried it... The rest looks fine, except the suspicious solr.xml. -Original Message- From: Fuad Efendi [mailto:f...@efendi.ca] Sent: August-17-09 1:33 PM To: solr-user@lucene.apache.org Subject: RE: Cannot get solr 1.3.0 to run

Re: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS

2009-08-17 Thread Aaron Aberg
On Mon, Aug 17, 2009 at 10:58 AM, Fuad Efendi <f...@efendi.ca> wrote: Looks like you are using SOLR multicore, with solr.xml... I never tried it... The rest looks fine, except the suspicious solr.xml What's suspicious about it? Is it in the wrong place? Is it not supposed to be there? Technically my

Re: delta-import using a full-import command is not working

2009-08-17 Thread djain101
Any help?

Re: delta-import using a full-import command is not working

2009-08-17 Thread Avlesh Singh
Are Solr and your database on different machines? If yes, are their dates synchronized? If you have access to your database server logs, looking at the queries that DIH generated might help. Cheers Avlesh On Mon, Aug 17, 2009 at 11:40 PM, djain101 <dharmveer_j...@yahoo.com> wrote: Any help? --

Re: delta-import using a full-import command is not working

2009-08-17 Thread djain101
Yes, the database and Solr are on different machines and their dates are not synchronized. Could that be the issue? Why would the date difference between the Solr and DB machines break the timestamp taken from the dataimport.properties file? Thanks, Dharmveer Avlesh Singh wrote: Solr and your database are
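For context, DIH records the start time of each import in dataimport.properties on the Solr machine and substitutes it into the delta queries, so clock skew between the Solr box and the DB can make changed rows invisible. A sketch of a delta entity in data-config.xml; the table and column names are assumptions:

```
<!-- data-config.xml: last_index_time comes from dataimport.properties (Solr's clock),
     but is compared against last_modified values written by the DB machine's clock -->
<entity name="item" query="SELECT * FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM item WHERE id='${dataimporter.delta.id}'"/>
```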

RE: Performance Tuning: segment_merge:index_update=5:1 (timing)

2009-08-17 Thread Fuad Efendi
Hi Jason, After moving to more RAM and CPUs and setting ramBufferSizeMB=8192, the problem disappeared; I had 100 million documents added in 24 hours almost without any index merge (mergeFactor=10). Lucene flushes the segment to disk when the RAM buffer is full; then the MergePolicy orchestrates... However,

Query not working as expected

2009-08-17 Thread Matt Schraeder
I'm attempting to write a query as follows: ($query^10) OR (NOT ($query)) which effectively would return everything, but if it matches the first query it will get a higher score and thus be sorted first in the result set. Unfortunately the results are not coming back as expected. ($query)

SolrJ question

2009-08-17 Thread Paul Tomblin
If I put an object into a SolrInputDocument and store it, how do I query for it back? For instance, I stored a java.net.URI in a field called url, and I want to query for all the documents that match a particular URI. The query syntax only seems to allow Strings, and if I just try

RE: SolrJ question

2009-08-17 Thread Harsch, Timothy J. (ARC-SC)[PEROT SYSTEMS]
Assuming you have written the SolrInputDocument to the server, you would next query. See ClientUtils.escapeQueryChars. Also you need to be cognizant of URLEncoding at times. -Original Message- From: ptomb...@gmail.com [mailto:ptomb...@gmail.com] On Behalf Of Paul Tomblin Sent: Monday,

RE: SolrJ question

2009-08-17 Thread Ensdorf Ken
You can escape the string with org.apache.lucene.queryParser.QueryParser.escape(String query) http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/queryParser/QueryParser.html#escape%28java.lang.String%29 -Original Message- From: ptomb...@gmail.com [mailto:ptomb...@gmail.com]
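For illustration, the escaping both replies point to amounts to back-slashing the query parser's special characters before a term is embedded in a query string. Below is a stdlib-only sketch of that idea; the character set and class name are assumptions, not the real Lucene implementation:

```java
// Illustrative sketch of what QueryParser.escape / ClientUtils.escapeQueryChars
// do; the exact special-character set varies by Lucene version, so treat this
// list as an assumption rather than a reference.
public class QueryEscape {
    // Characters the Lucene 2.x-era query parser treats specially.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    /** Prefix every special character with a backslash. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A URL-valued field query like the one in this thread; ':' must be
        // escaped or the parser reads it as a field separator.
        System.out.println("url:" + escape("http://xcski.com/pharma/"));
    }
}
```

In practice you would call the real QueryParser.escape or ClientUtils.escapeQueryChars rather than rolling your own.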

RE: Query not working as expected

2009-08-17 Thread Harsch, Timothy J. (ARC-SC)[PEROT SYSTEMS]
The rows parameter would prevent you from getting all docs back. It is set by default to 10 I believe. -Original Message- From: Matt Schraeder [mailto:mschrae...@btsb.com] Sent: Monday, August 17, 2009 2:04 PM To: solr-user@lucene.apache.org Subject: Query not working as expected I'm

Re: SolrJ question

2009-08-17 Thread Paul Tomblin
On Mon, Aug 17, 2009 at 5:28 PM, Harsch, Timothy J. (ARC-SC)[PEROT SYSTEMS]timothy.j.har...@nasa.gov wrote: Assuming you have written the SolrInputDocument to the server, you would next query. I'm sorry, I don't understand what you mean by you would next query. There appear to be some words

Re: SolrJ question

2009-08-17 Thread Paul Tomblin
On Mon, Aug 17, 2009 at 5:30 PM, Ensdorf Kenensd...@zoominfo.com wrote: You can escape the string with org.apache.lucene.queryParser.QueryParser.escape(String query) http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/queryParser/QueryParser.html#escape%28java.lang.String%29 Does this

RE: SolrJ question

2009-08-17 Thread Ensdorf Ken
Does this mean I should have converted my objects to string before writing them to the server? I believe SolrJ takes care of that for you by calling toString(), but you would need to convert explicitly when you query (and then escape).

RE: Query not working as expected

2009-08-17 Thread Matt Schraeder
That isn't the problem, as I am looking at numFound and not actual rows returned. In all searches the rows returned is less than the number found. timothy.j.har...@nasa.gov 8/17/2009 4:30:38 PM The rows parameter would prevent you from getting all docs back. It is set by default to 10 I

Re: SolrJ question

2009-08-17 Thread Paul Tomblin
On Mon, Aug 17, 2009 at 5:36 PM, Ensdorf Kenensd...@zoominfo.com wrote: Does this mean I should have converted my objects to string before writing them to the server? I believe SolrJ takes care of that for you by calling toString(), but you would need to convert explicitly when you query

Re: Query not working as expected

2009-08-17 Thread Mark Miller
Matt Schraeder wrote: I'm attempting to write a query as follows: ($query^10) OR (NOT ($query)) which effectively would return everything, but if it matches the first query it will get a higher score and thus be sorted first in the result set. Unfortunately the results are not coming back as

Maximum number of values in a multi-valued field.

2009-08-17 Thread Arv
All, We are considering some new changes to our Solr schema to better support some new functionality for our application. To that end, we want to add an additional field that is multi-valued but will contain a large number of values per document, potentially up to 2000 values on this field

Re: Maximum number of values in a multi-valued field.

2009-08-17 Thread Jason Rutherglen
Your term dictionary will grow somewhat, which means the term index could consume more memory. Because the term dictionary has grown there could be less performance in looking up terms but that is unlikely to affect your application. How many unique terms will there be? On Mon, Aug 17, 2009 at

Re: Maximum number of values in a multi-valued field.

2009-08-17 Thread Aravind Naidu
Hi, The possibility is that all items in this field could be unique. Let me clarify: the main Solr index is for a list of products. Some products belong to catalogues. So, the consideration is to add a multi-valued field to each product holding the ids of its catalogues, to

Re: delta-import using a full-import command is not working

2009-08-17 Thread djain101
After debugging the dataimporter code, I found that it is a bug in the dataimporter code itself. doFullImport() in the DataImporter class is not loading the last index time whereas doDeltaImport() is. The code snippet from doFullImport() is: if (requestParams.commit) setIndexStartTime(new Date());

Re: SolrJ question

2009-08-17 Thread Paul Tomblin
On Mon, Aug 17, 2009 at 5:47 PM, Paul Tomblin <ptomb...@xcski.com> wrote: Hmmm. It's not working right. I've added 5 documents, 3 with the URL set to "http://xcski.com/pharma/" and 2 with the URL set to "http://xcski.com/nano/". Doing other sorts of queries seems to be pulling back the right

Re: delta-import using a full-import command is not working

2009-08-17 Thread djain101
Looks like this issue has been fixed on Sept 20, 2008 against issue SOLR-768. Can someone please let me know which one is a stable jar after Sept 20, 2008. djain101 wrote: After debugging dataimporter code, i found that it is a bug in the dataimporter 1.3 code itself. doFullImport() in

SOLR uniqueKey - extremely strange behavior! Documents disappeared...

2009-08-17 Thread Funtick
After running an application which heavily uses MD5 HEX-representation as uniqueKey for SOLR v.1.4-dev-trunk: 1. After 30 hours: 101,000,000 documents added 2. Commit: numDocs = 783,714 maxDoc = 3,975,393 3. Upload new docs to SOLR during 1 hour(!!!), then commit, then optimize:

Re: JVM Heap utilization Memory leaks with Solr

2009-08-17 Thread Funtick
Can you please tell me how many non-tokenized single-valued fields your schema uses, and how many documents? Thanks, Fuad Rahul R wrote: My primary issue is not an Out of Memory error at run time. It is memory leaks: heap space not being released even after forcing a GC. So after

Re: SOLR uniqueKey - extremely strange behavior! Documents disappeared...

2009-08-17 Thread Mark Miller
I'd say you have a lot of documents that have the same id. When you add a doc with the same id, first the old one is deleted, then the new one is added (atomically though). The deleted docs are not removed from the index immediately though - the doc id is just marked as deleted. Over time

Re: SOLR uniqueKey - extremely strange behavior! Documents disappeared...

2009-08-17 Thread Funtick
But how to explain that within an hour (after commit) I had about 500,000 new documents, and within 30 hours (after commit) only 1,300,000? The same _random_enough_ documents... BTW, the SOLR Console was showing only a few hundred deletesById although I don't use any deleteById explicitly; only

Re: SOLR uniqueKey - extremely strange behavior! Documents disappeared...

2009-08-17 Thread Funtick
One more hour, and I have 0.5 million more (after commit/optimize). Something strange is happening with the SOLR buffer flush (if we have a single segment???)... an explicit commit prevents it... 30 hours, with index flush, commit: 783,714; + 1 hour, commit, optimize: 1,281,851; + 1 hour, commit, optimize:

Re: JVM Heap utilization Memory leaks with Solr

2009-08-17 Thread Funtick
BTW, you should really prefer JRockit, which really rocks!!! Mission Control has the necessary tooling, and JRockit produces a _nice_ exception stacktrace (explaining almost everything) even in case of OOM, which the SUN JVM still fails to produce. SolrServlet still catches Throwable: } catch

Re: SOLR uniqueKey - extremely strange behavior! Documents disappeared...

2009-08-17 Thread Funtick
UPDATE: After a few more minutes (after the previous commit): docsPending: about 7,000,000. After commit: numDocs: 2,297,231. Increase = 2,297,231 - 1,281,851 = 1,000,000 (average). So I have 7 docs with the same ID on average. Having 100,000,000 and then dropping below 1,000,000 is strange; it is a

Re: SOLR uniqueKey - extremely strange behavior! Documents disappeared...

2009-08-17 Thread Funtick
Sorry for the typo in the previous msg: Increase = 2,297,231 - 1,786,552 = 500,000 (average). RATE (non-unique-id:unique-id) = 7,000,000 : 500,000 = 14:1, but 125:1 (initial 30 hours) was very strange... Funtick wrote: UPDATE: After few more minutes (after previous commit): docsPending: about

Re: delta-import using a full-import command is not working

2009-08-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
you can take a nightly of the DIH jar alone. It is quite stable On Tue, Aug 18, 2009 at 8:21 AM, djain101 <dharmveer_j...@yahoo.com> wrote: Looks like this issue has been fixed on Sept 20, 2008 against issue SOLR-768. Can someone please let me know which one is a stable jar after Sept 20, 2008.

Re: DataImportHandler - very slow delta import

2009-08-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
delta imports are likely to be far slower than full imports because DIH makes one db call per changed row. If you can write the query in such a way that it gives only the changed rows, then write a separate entity (directly under document) and just run a full-import with that entity only. On
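Noble's suggestion can be sketched like this: a dedicated entity whose query itself filters on the change timestamp, run as a full-import restricted to that entity. The table, column, host, and entity names are assumptions:

```
<!-- data-config.xml: the WHERE clause does the delta work in one bulk query -->
<entity name="changed_items"
        query="SELECT * FROM item
               WHERE last_modified &gt; '${dataimporter.last_index_time}'"/>
```

It would then be triggered with something like: http://localhost:8983/solr/dataimport?command=full-import&entity=changed_items&clean=false (clean=false so the rest of the index is not wiped).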

Re: delta-import using a full-import command is not working

2009-08-17 Thread djain101
Can you please point me to the URL for downloading the latest DIH? Thanks for your help. Noble Paul നോബിള്‍ नोब्ळ्-2 wrote: you can take a nightly of the DIH jar alone. It is quite stable On Tue, Aug 18, 2009 at 8:21 AM, djain101 <dharmveer_j...@yahoo.com> wrote: Looks like this issue has been

Re: delta-import using a full-import command is not working

2009-08-17 Thread Noble Paul നോബിള്‍ नोब्ळ्
http://people.apache.org/builds/lucene/solr/nightly/ You can just replace the dataimporthandler jar in your current installation and it should be fine. On Tue, Aug 18, 2009 at 11:18 AM, djain101 <dharmveer_j...@yahoo.com> wrote: Can you please point me to the URL for downloading the latest DIH? Thanks