Hi,
I've looked over the public Solr perf docs and done some searching on
this mailing list. Still, I'd like to seek some advice based on my
specific situation:
- 2-3 million documents / 5GB index
- each document has 40+ indexed fields, and many multivalue fields
- only primary keys are "stored"
Hi,
I have to take a backup of the indexes on my Solr server. I
know we have to give the target path in the scripts.conf file, but I
want to know the following:
1. What userId should be given in the scripts.conf file?
2. How and where do I run the scripts in the bin folder under
Replication must work fine without applying any patch; everything is committed.
On Thu, Mar 26, 2009 at 6:42 PM, sunnyfr wrote:
>
> Hi,
>
> Since I turned this functionality on, my servers sometimes take a long
> time to respond to a select:
> sometimes QTime = 4 sec, other times 200 msec?
>
>
It runs the 'deletedPkQuery' and the result set is used to delete the docs.
What specifically is your doubt?
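For reference, a hedged sketch of how deletedPkQuery is typically wired up in DIH's data-config.xml (the table and column names here are made up for illustration):

```xml
<entity name="item" pk="id"
        query="SELECT id, name FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deletedPkQuery="SELECT id FROM item_deletions
                        WHERE deleted_at &gt; '${dataimporter.last_index_time}'"/>
```

During a delta-import, DIH runs deletedPkQuery and issues a delete for each primary key it returns.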
On Thu, Mar 26, 2009 at 10:25 PM, Rui Pereira wrote:
> I can't find that much information about the handling of deleted rows by DIH
> in delta-imports, can you show me some examples?
> Tha
This fix is already in trunk; you may not need to apply the patch.
On Fri, Mar 27, 2009 at 6:02 AM, sunnyfr wrote:
>
> Hi,
>
> It doesn't seem to work for me. I also changed the part below; is it OK?
>> - List copiedfiles = new ArrayList();
>> + Set filesToCopy = new HashSet();
>
> ht
Hi all,
I'm having an issue with the order of my results when attempting to sort by
a function in my query. Looking at the debug output of the query, the score
returned within the result section for any given document does not match
the score in the debug output. It turns out that if I optimiz
Tom,
Aha, so you are using a single server for index updates, deletes, and searches.
This is OK for small setups and in itself is not the source of this slowness.
The problem is likely caused by you swapping searchers after each index
update/delete, and probably without warming up the new se
Hi,
It doesn't seem to work for me. I also changed the part below; is it OK?
> -List copiedfiles = new ArrayList();
> +Set filesToCopy = new HashSet();
http://www.nabble.com/file/p22734005/ReplicationHandler.java
ReplicationHandler.java
Thanks a lot,
Noble Paul നോബിള് नोब्ळ्
I'm developing a site (currently single server) that uses localsolr to
perform geo searches on ~200,000 small records although I'm expecting this
to grow significantly once I go live.
So far, so good but I've noticed that after any updates / deletions to the
index the first query is then very slo
Hi,
1) There is no need for Lucene at all. That "indexer" is whatever object you
use to send your 10K docs to Solr. Presumably each Solr instance you end up
creating will have its own "indexer" object in your application.
2)
http://wiki.apache.org/solr/CoreAdmin#head-7ca1b98a9df8b8ca0dcfbfc
XML is getting eaten by my mail client (Yahoo mail) when I hit reply. Lame.
But your config:
dismax
explicit
is missing qf and other parameters. Which fields is your DisMax supposed
to query? It doesn't know, they are not in the config above.
Otis
--
Sematext
Thanks again Otis. Few more questions,
1) My app currently is a stand-alone java app (not part of Solr JVM)
that simply calls update webservice on Solr (running in a separate web
container) passing 10k documents at once. In your example you
mentioned getting list of Indexers and adding document t
Did the XML in that message come through okay? Gmail seems to be
eating it on my end.
Anyway, while the default config has those fields, it also fails with
the application config, which has:
dismax
explicit
Since this is essentially the same as standard, I assumed it would
Standard searches your default field (specified in schema.xml).
DisMax searches fields you specify in DisMax config.
Yours has:
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
But those are not your real fields. Change them to your real fields in qf, pf
and other parts of DisM
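For the archives, a minimal dismax handler section with qf and pf filled in (the field names below come from the example schema; substitute your own):

```xml
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <str name="qf">name^1.2 features^1.0 text^0.5</str>
    <str name="pf">name^2.0</str>
  </lst>
</requestHandler>
```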
I do not have a qf set; this is the query generated by the admin interface:
dismax:
select?indent=on&version=2.2&q=test&start=0&rows=10&fl=*%2Cscore&qt=dismax&wt=standard&explainOther=&hl.fl=
standard:
select?indent=on&version=2.2&q=test&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explain
Hi,
Here in DERI [1], we are working on an extension for Lucene / Solr to
handle RDF data and structured queries. The engine is currently in use
in the Sindice [2] search engine. We are planning to release our
extension, called SIREn (for Semantic Information Retrieval Engine), as
open source
I have a very similar setup and that's precisely what we do - except
with JSON.
1) Request comes into PHP
2) PHP runs the search against several different cores (in a multicore
setup) - ours are a little more than "slightly" different
3) PHP constructs a new object with the responseHeader a
Do you have qf set? Just last week I had a problem where no results were
coming back, and it turned out that my qf param was empty.
Matt
On Thu, Mar 26, 2009 at 2:30 PM, Ben Lavender wrote:
> Hello,
>
> I'm using the March 18th 1.4 nightly, and I can't get a dismax query
> to return results. T
Hi,
Here's some info that might be helpful:
- URL you are accessing
- Response you are getting if any
- Any errors mentioned in the logs
- Your dismax config section from solrconfig.xml
- Your fields from schema.xml used in DisMax config in solrconfig.xml
Otis
--
Sematext -- http://sematext.com
: Data "A", "B", and "C" are slightly different, thus they are indexed
: differently; obviously the client receives the search results for all data
: types in a consistent/common format. The client application shall be able to
: search among each or all data types ("A", "B", "C"). The order will b
Hello,
I'm using the March 18th 1.4 nightly, and I can't get a dismax query
to return results. The standard and partitioned query types return
data fine. I'm using jetty, and the problem occurs with the default
solrconfig.xml as well as the one I am using, which is the Drupal
module, beta 6. Th
Hey, all!
I'm planning a project where I want to write software that takes an RDF class
and uses that information to dynamically support indexing and faceted searching
of resources of that type. This would (as I imagine it) function with dynamic
fields in all required data types and multiplicit
: > What I want, however, is an accurate description of the error and not just
: > a standard Apache error code.
: > Is there a way to obtain an XML response file from solr ?
: If the update command executes successfully, then the response is XML. In
: case of error, the error page is generated b
I found what the problem was. I was using a MySQL view, and it seems it doesn't
take into consideration the index I had on the last_modified field of the
original table ><. MySQL calls were taking 1 sec each :|
I just switched back to a query with a join instead of a query against my view.
Now doing aroun
Hi,
1) Look for "multicore" on Solr Wiki
2) I meant to say you would not index it all in one index (that's what you
wanted to do, no?). So in your app you'd do something like
ts = doc.getTimestamp();
indexer = getIndexer(ts); // gives you different indexer based on the ts. You
keep track of
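A rough sketch of that routing idea, assuming one core per month keyed off the document timestamp (the class and method names are hypothetical, not any Solr API):

```java
import java.util.Calendar;
import java.util.TimeZone;

// Hypothetical helper: map a document timestamp to the name of the core
// (index) it should be sent to, e.g. one core per calendar month. The
// application would keep one "indexer" client per core name, each posting
// to http://host:port/solr/<coreName>/update.
public class IndexRouter {
    public static String coreNameFor(long timestampMillis) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        cal.setTimeInMillis(timestampMillis);
        // Calendar.MONTH is zero-based, hence the +1.
        return String.format("core-%04d-%02d",
                cal.get(Calendar.YEAR), cal.get(Calendar.MONTH) + 1);
    }
}
```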
Changed the config so that both WordDelimiterFilterFactory settings on both
index and query use:
org.apache.solr.analysis.WordDelimiterFilterFactory {generateNumberParts=1,
catenateWords=1, generateWordParts=1, catenateAll=0, catenateNumbers=1}
Restarted Solr, reindexed the records.
Unfortunat
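In schema.xml terms, the settings listed above correspond to something like this filter entry (shown once; it would appear under both the index and query analyzers):

```xml
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="0"/>
```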
Hi,
After installing that patch, all is running fine for me as well - problem no
longer occurring and replication running great! The issue
https://issues.apache.org/jira/browse/SOLR-978 has already been committed,
so it's also there in the 1.4 nightly builds.
Bye,
Jaco.
2009/3/26 sunnyfr
>
On Thu, Mar 26, 2009 at 10:58 AM, Otis Gospodnetic
wrote:
> Yes, UnInvertedField uses OpenBitSet.
Right, for those terms that match a large percent of the documents.
But filtering (fq params) also takes up space, so you don't want the
filterCache too large.
Look at the stats page in solr admin...
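The filterCache is sized in solrconfig.xml; a sketch with illustrative (not recommended) numbers:

```xml
<filterCache class="solr.LRUCache"
             size="512"
             initialSize="512"
             autowarmCount="128"/>
```

Each cached filter stored as a bitset costs roughly maxDoc/8 bytes, so the size setting trades memory for hit rate.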
Hi,
I have documents where text from two languages, e.g. (English & Korean) or
(English & German), is mixed up in a fairly intensive way. 20-30% of the
text is in English and the rest is in the other. Can somebody indicate how I
should set up the 'analyzers' and 'fields' in schema.xml? Should I hav
Thanks Otis for the response. I'm still not clear on a few things,
1) I thought Solr can work with only one index at a time. In order to
have multiple indexes you need multiple instances of Solr - isn't that
right? How can we make Solr to read/ write from and to multiple
indexes?
2) What does it me
I can't find that much information about the handling of deleted rows by DIH
in delta-imports, can you show me some examples?
Thanks in advance,
Rui Pereira
Nga,
It doesn't do that out of the box, but it could. I think you could achieve this with
either a custom XSLT that transforms the typical XML response into a new
format, or by writing a completely custom Response writer.
See:
http://wiki.apache.org/solr/QueryResponseWriter
http://wiki.apache.org
Synonym mappings are an easy way to handle specific cases like these...
C++ => cplusplus
C# => csharp
-Yonik
http://www.lucidimagination.com
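Sketched out, that would be a pair of lines in synonyms.txt plus a SynonymFilterFactory in the field's analyzer chain, e.g.:

```
# synonyms.txt
c++ => cplusplus
c# => csharp
```

```xml
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="false"/>
```

Note that the tokenizer ahead of this filter must leave "c++" intact (e.g. WhitespaceTokenizerFactory); StandardTokenizer strips the punctuation before the synonym filter ever sees it.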
On Thu, Mar 26, 2009 at 9:27 AM, Jana, Kumar Raja wrote:
> Hi Leonardo,
> 1. U can change the fieldtype to "string" in which case no tokenizers
> will act
Hello,
I'm having a hard time getting a multi-core Solr instance caches to show up on
Stats/Cache Admin page. This works fine with non-multicore Solr instances, of
course. This is with Solr 1.4-dev 753608 (from March 14th). Here are the
details:
- Solr home is at /data/solr_home
- Jetty i
You could probably create a type field in the index to indicate the
task type. And then use the task type plus the primary key from the
db to create the Id within the index. It would save you a lot of
maintenance, and has a bunch of benefits.
-John
On Mar 26, 2009, at 8:23 AM, "Radha C." wr
Hi All,
Can Solr export into comma delimited files?
thank you,
Nga
Harish,
Yes, UnInvertedField uses OpenBitSet.
As for the profiler, YourKit is a good one - http://www.yourkit.com/
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: smock
> To: solr-user@lucene.apache.org
> Sent: Thursday, March 26, 2009 9:58
Hi,
You should be able to access http://./solr2
There, you should see all your cores and clicking on them should take you to
/solr2/CoreNameHere/admin
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: mitulpatel
> To: solr-user@lucene.
Kurt,
Attributes for WordDelimiterFilterFactory have different values in the "index"
vs. "query" sections. Do things work if you make them identical? (you'll have
to reindex)
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Kurt Nordstro
Tim,
If localhost doesn't work for some reason, you can always use 127.0.0.1 . That
"localhost" is typically defined in /etc/hosts .
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Garafola Timothy
> To: solr-user@lucene.apache.org
> S
Was writing the email and reading responses really faster than
http://www.google.com/search?q=solr+case+insensitive ? :)
- Original Message
> From: con
> To: solr-user@lucene.apache.org
> Sent: Thursday, March 26, 2009 2:43:44 AM
> Subject: How to avoid case sensitive search?
>
>
>
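For the archives, the usual recipe is a LowerCaseFilterFactory on both the index and query analyzers of the field type, roughly:

```xml
<fieldType name="text_ci" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a single <analyzer> element and no type attribute, the same chain applies at both index and query time, so "Test" and "test" match.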
That should be OK. I did a quick scan of all the scripts that use
$solr_hostname. It defaults to localhost if it is not set.
Bill
On Wed, Mar 25, 2009 at 7:24 PM, Garafola Timothy wrote:
> I've a question. Is it safe to use 'localhost' as solr_hostname in
> scripts.conf?
>
> --
> -Tim
>
Hi Otis,
Thanks for the feedback - I'm pretty new to Java, could you or anyone else
give me some pointers on how to run Solr with a debugger/profiler? It would
be really appreciated.
More generally, is OpenBitSet a utility for UnInvertedFields? Is it
reasonable to expect that this has somethin
Just applied this patch :
http://www.nabble.com/Solr-Replication%3A-disk-space-consumed-on-slave-much-higher-than-on--master-td21579171.html#a21622876
It seems to work well now. Do I have to do anything else?
Do you recommend anything for my configuration?
Thanks a lot
--
View this message in
Hi Leonardo,
1. You can change the fieldtype to "string", in which case no tokenizers
will act on your data and the content will be stored as is.
2. If you are using Solr 1.4 (latest), then there is a provision to specify
protected words for WordDelimiterFilterFactory, which will take care of
your issue.
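The provision mentioned in point 2 looks roughly like this in schema.xml (protwords.txt lists one protected term per line, left untouched by the filter):

```xml
<filter class="solr.WordDelimiterFilterFactory"
        protected="protwords.txt"
        generateWordParts="1" catenateWords="1"/>
```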
Hello there!
Currently we're having a problem here, and we're looking for some
solutions. Right now we use the Standard Tokenizer to separate tokens
and we just found out that we cannot search for "c++" in our index
because it is not considered a word.
Since we need this search to work pro
sunnyfr wrote:
>
> Hi,
>
> Since I turned this functionality on, my servers sometimes take a
> long time to respond to a select:
> sometimes QTime = 4 sec, other times 200 msec?
>
> Do you know why? And when I look at my server graphs, the user CPU part is
> heavily used since I've applied this
Hi,
Since I turned this functionality on, my servers sometimes take a long
time to respond to a select:
sometimes QTime = 4 sec, other times 200 msec?
Do you know why? And when I look at my server graphs, the user CPU part is
heavily used since I've applied these two patches.
Thanks for your help.
I
Giovanni,
Many thanks for the reply.
We have a separate set of tables for each task, so we are going to
provide a different search for each task. The tables of one task are
unrelated to the tables of another task.
_
From: Giovanni De Stefano [mailto:giovanni.destef...@gmail.com]
Hello,
that might be a solution although it is a maintenance nightmare...
Are all those tables completely unrelated? Meaning does each table produce a
totally different document?
Either way, when you perform a search you must return a common document
(unless your client is able to distinguish betw
Thanks for your reply.
If I want to search my data spread over many tables, say more than 50
tables, do I have to set up that many cores?
_
From: Giovanni De Stefano [mailto:giovanni.destef...@gmail.com]
Sent: Thursday, March 26, 2009 5:04 PM
To: solr-user@lucene.apache.org; cra..
The SqlEntityProcessor does not ignore the error because SQL errors
are usually serious errors.
How come you have a wrong table name?
On Thu, Mar 26, 2009 at 4:15 PM, Rui Pereira wrote:
> Is there a way for DIH not to abort when an entity query is wrong (invalid
> table name or table field), this
I can't make out what the obvious mistake is
BTW why don't you use SolrJ?
http://wiki.apache.org/solr/SolrJ
--Noble
On Thu, Mar 26, 2009 at 3:57 PM, Rui Pereira wrote:
> Here is the code where I make the request:
> Document xmlDocument = this.constructDeleteXml();
>
> try {
>
Hello,
I believe you should use 2 different indexes, 2 different cores and write a
custom request handler or any other client that forwards the query to the
cores and merge the results.
Cheers,
Giovanni
On 3/26/09, Radha C. wrote:
>
> Hi,
>
> I am trying to index different tables with differen
How can I access your OAI interface (server)?
On Wed, Mar 25, 2009 at 9:01 PM, Ryan McKinley wrote:
> I implemented OAI-PMH for solr a few years back for the Massachusetts
> library system... it appears not to be running right now, but check...
> http://www.digitalcommonwealth.org/
>
> It wou
Hi,
I am trying to index different tables with different primary keys and
different fields.
Table A - primary field is a_id
Table B - primary field is b_id
How to specify two different primary keys for two different tables in
schema.xml?
Is it possible to create a data-config with differen
Is there a way for DIH not to abort when an entity query is wrong (invalid
table name or table field), this is, a way to continue with the next entity.
Thanks in advance,
Rui Pereira
If you are saying that the number of private repos for the user is limited
to say less than 10 or something like that, the query wouldn't be very
long...
Something like public:true OR repo_id:(1 OR 2 OR 3 OR 4), etc.
- Aleks
On Thu, 26 Mar 2009 09:33:14 +0100, Jesper Nøhr wrote:
Ah. Well
Here is the code where I make the request:
Document xmlDocument = this.constructDeleteXml();
try {
    URL url = new URL(this.solrPath + "/update");
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setDoOutput(true);
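For reference, the delete-by-id body that code is building up via DOM is tiny; a hedged sketch of producing it directly (assuming ids contain no XML-special characters, so escaping is skipped):

```java
// Build the XML body for a Solr delete-by-id update message by hand.
// Posting this to /update, followed by <commit/>, removes the document.
public class DeleteXml {
    public static String deleteById(String id) {
        return "<delete><id>" + id + "</id></delete>";
    }
}
```

As Noble suggests, SolrJ's deleteById/commit calls avoid hand-building this XML entirely.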
Hi,
I don't understand how my index folder can go from 11G to 45G.
Is it a problem with my segments?
For information, I'm using Solr 1.4 and I have 14M docs. The first full import
or optimize brings the size down to 11G.
I'm updating data (delta-import) every 30 min, with about 50,000 docs updated
each time.
Ah. Well that's what I thought, and that's where I get confused.
Realistically speaking, we have, say, 10.000 public repositories and
any given user may have 2 or 3 private repositories. This means that
when the user searches, he should search among all those 10.000 public
ones, but also his 2 or
Hm, my tuppence worth!
IMHO I do not think this should be built into solr. Doing it properly
leads to all kinds of nasty platform dependent issues... will we then
want to add notification features on success/failure? via email?
Ideally, all the scheduled activities on a system should be ce