Re: Realtime search and facets with very frequent commits

2010-04-05 Thread Janne Majaranta
Yeah, thanks for pointing this out. I'm not using any relevancy functions (yet). The data indexed for my app is basically log events. The most relevant events are the newest ones, so sorting by timestamp is enough. BTW, your book is great ;) -Janne 2010/3/31 Smiley, David W. > Janne, >

Re: How to add new entity to the solr index without having to re-index previously stored data.

2010-04-05 Thread MitchK
Maddy, you need to reindex the whole record if you change or add any kind of data that belongs to it. Please note that you need to subscribe to the solr-user mailing list, since not everyone is using Nabble to get mailing-list postings. Kind regards, - Mitch Maddy.Jsh wrote: > > I indexed

Re: including external files in config by corename

2010-04-05 Thread Mark Miller
On 04/05/2010 10:12 PM, Chris Hostetter wrote: : The best you have to work with at the moment is Xincludes: : : http://wiki.apache.org/solr/SolrConfigXml#XInclude : : and System Property Substitution: : : http://wiki.apache.org/solr/SolrConfigXml#System_property_substitution Except that XInclude

Re: Multicore and TermVectors

2010-04-05 Thread Lance Norskog
There is no query parameter. The query parser throws an NPE if there is no query parameter: http://issues.apache.org/jira/browse/SOLR-435 It does not look like term vectors are processed in distributed search anyway. On Mon, Apr 5, 2010 at 4:45 PM, Chris Hostetter wrote: > > : Subject: Multicor

Re: Need info on CachedSQLentity processor

2010-04-05 Thread bbarani
Mark, I have opened a JIRA issue - https://issues.apache.org/jira/browse/SOLR-1867 Thanks, Barani

Re: Some help for folks trying to get new Solr/Lucene up in Eclipse

2010-04-05 Thread Lance Norskog
I had a slight hiccup that I just ignored. Even when I used Java 1.6 JDK mode, Eclipse did not know this method. I had to comment out the three places that use this method. javax.xml.parsers.DocumentBuilderFactory.setXIncludeAware(true) Lance Norskog On Mon, Apr 5, 2010 at 1:49 PM, Mattmann, Chr

Re: including external files in config by corename

2010-04-05 Thread Chris Hostetter
: The best you have to work with at the moment is Xincludes: : : http://wiki.apache.org/solr/SolrConfigXml#XInclude : : and System Property Substitution: : : http://wiki.apache.org/solr/SolrConfigXml#System_property_substitution Except that XInclude is a feature of the XML parser, while proper

Re: Need info on CachedSQLentity processor

2010-04-05 Thread Mark Miller
On 04/05/2010 02:28 PM, bbarani wrote: Hi, I am using cachedSqlEntityprocessor in DIH to index the data. Please find below my dataconfig structure, ---> object --> object properties For each and every object I would be retrieving corresponding object properties (in my subqueries). I ge

Re: including external files in config by corename

2010-04-05 Thread Mark Miller
On 04/05/2010 01:53 PM, Shawn Heisey wrote: Is it possible to access the core name in a config file (such as solrconfig.xml) so I can include core-specific configlets into a common config file? I would like to pull in different configurations for things like shards and replication, but have al

Re: exact match coming as second record

2010-04-05 Thread Mark Fletcher
Hi Eric, Thanks many for your mail! Please find attached the debugQuery results. Thanks! Mark On Mon, Apr 5, 2010 at 7:38 PM, Erick Erickson wrote: > What do you get back when you specify &debugQuery=on? > > Best > Erick > > On Mon, Apr 5, 2010 at 7:31 PM, Mark Fletcher > wrote: > > > Hi, > >

Re: Solr caches and nearly static indexes

2010-04-05 Thread Yonik Seeley
On Mon, Apr 5, 2010 at 9:10 PM, Chris Hostetter wrote: > > > : > ... reusing the FieldCache seems like the only thing that would be > : > advantageous in that case > : > : And FieldCache entries are currently reused when there have only been > : deletions on a segment (since Solr 1.4). > > But

Re: Solr caches and nearly static indexes

2010-04-05 Thread Chris Hostetter
: > ... reusing the FieldCache seems like the only thing that would be : > advantageous in that case : : And FieldCache entries are currently reused when there have only been : deletions on a segment (since Solr 1.4). But that's kind of orthogonal to (what I think) Lance's point was: that

Re: Solr caches and nearly static indexes

2010-04-05 Thread Yonik Seeley
On Mon, Apr 5, 2010 at 9:04 PM, Chris Hostetter wrote: > ... reusing the FieldCache seems like the only thing that would be > advantageous in that case And FieldCache entries are currently reused when there have only been deletions on a segment (since Solr 1.4). -Yonik http://www.lucidimagin

Re: Solr caches and nearly static indexes

2010-04-05 Thread Chris Hostetter
: We had exactly this problem in a consumer app; we had a small but : continuously growing list of obscene documents in the index, and did : not want to display these. So, we had a filter query with all of the : obscene words, and used this with every query. that doesn't seem like it would really

Re: Solr caches and nearly static indexes

2010-04-05 Thread Chris Hostetter
: times. Is there any way to have the index keep its caches when the only thing : that happens is deletions, then invalidate them when it's time to actually add : data? It would have to be something I can dynamically change when switching : between deletions and the daily import. The problem is

Re: Multicore and TermVectors

2010-04-05 Thread Chris Hostetter
: Subject: Multicore and TermVectors It doesn't sound like Multicore is your issue ... it seems like what you mean is that you are using distributed search with TermVectors, and that is causing a problem. Can you please clarify exactly what you mean ... describe your exact setup (ie: how mana

Re: one particular doc in results should always come first for a particular query

2010-04-05 Thread Chris Hostetter
: If that's the case, you could copy the magic keyword to a different field : (say magic_keyword) and boost it right into orbit as an OR clause : (magic_keyword:bonkers ^1). This kind of assumes that a magic keyword : corresponds to one and only one document : : If this is way off base, p

Re: exact match coming as second record

2010-04-05 Thread Erick Erickson
What do you get back when you specify &debugQuery=on? Best Erick On Mon, Apr 5, 2010 at 7:31 PM, Mark Fletcher wrote: > Hi, > > I am using the dismax handler. > I have a field named *myfield* which has a value say XXX.YYY.ZZZ. I have > boosted myfield^20.0. > Even with such a high boost (in fact

Re: one particular doc in results should always come first for a particular query

2010-04-05 Thread Erick Erickson
Hmmm, how do you know which particular record corresponds to which keyword? Is this a list known at index time, as in "this record should come up first whenever 'bonkers' is the keyword"? If that's the case, you could copy the magic keyword to a different field (say magic_keyword) and boost it righ

exact match coming as second record

2010-04-05 Thread Mark Fletcher
Hi, I am using the dismax handler. I have a field named *myfield* which has a value say XXX.YYY.ZZZ. I have boosted myfield^20.0. Even with such a high boost (in fact among the qf fields specified this field has the max boost given), when I search for XXX.YYY.ZZZ I see my record as the second one

Re: no of cfs files are more that the mergeFactor

2010-04-05 Thread Lance Norskog
mergeFactor=5 means that if there are 42 documents, there will be index files of 3 sizes: 1 with 25 documents, 3 with 5 documents, and 1 with 2 documents. Imagine making change with coins of 1 document, 5 documents, 5^2 documents, 5^3 documents, etc. On Mon, Apr 5, 2010 at 10:59 AM, Chris Hostetter wrote
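The coin-change picture above can be sketched in a few lines of Python. This is only a toy model of the analogy (writing the document count in base mergeFactor), not Lucene's actual merge policy; the function name `segment_levels` is made up for illustration, and the model counts the leftover documents as single-document "coins" where the post describes them sitting in one small file.

```python
def segment_levels(num_docs, merge_factor):
    """Toy model: split num_docs into 'coins' of size merge_factor**k.

    Returns (segment_size, count) pairs, largest size first.
    This mirrors the coin-change analogy only; it is not Lucene code.
    """
    if num_docs < 0 or merge_factor < 2:
        raise ValueError("need num_docs >= 0 and merge_factor >= 2")
    levels = []
    size = 1
    remaining = num_docs
    while remaining > 0:
        # The remainder is how many 'coins' of the current size are left over.
        remaining, count = divmod(remaining, merge_factor)
        if count:
            levels.append((size, count))
        size *= merge_factor
    return levels[::-1]


for size, count in segment_levels(42, 5):
    print(f"{count} x {size} documents")
```

For 42 documents and a factor of 5 this gives one coin of 25, three of 5, and two of 1, which adds back up to 42.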

Re: including external files in config by corename

2010-04-05 Thread Lance Norskog
Making snippets is part of highlighting. http://www.lucidimagination.com/search/s:lucid/li:cdrg?q=snippet On Mon, Apr 5, 2010 at 10:53 AM, Shawn Heisey wrote: > Is it possible to access the core name in a config file (such as > solrconfig.xml) so I can include core-specific configlets into a com

Re: Unable to load MailEntityProcessor or org.apache.solr.handler.dataimport.MailEntityProcessor

2010-04-05 Thread Lance Norskog
The MailEntityProcessor is an "extra" and does not come normally with the DataImportHandler. The wiki page should mention this. In the Solr distribution it should be in the dist/ directory as dist/apache-solr-dataimporthandler-extras-1.4.jar. The class it wants is in this jar. (Do 'unzip -l jar'

one particular doc in results should always come first for a particular query

2010-04-05 Thread Mark Fletcher
Hi, Suppose I search for the word *international*. A particular record (say *recordX*) I am looking for is coming as the Nth result now. I have a requirement that when a user queries for *international* I need recordX to always be the first result. How can I achieve this? Note:- When user searc

Re: Minimum Should Match the other way round

2010-04-05 Thread MitchK
Sorry for doubleposting, but to avoid any misunderstanding: Accessing instantiated filters is not a really good idea, since a new Filter must be instantiated all the time. However, what I meant was: if I create a WordDelimiterFilter or a StopFilter and I have set a param for a file like stop

Re: Obtaining SOLR index size on disk

2010-04-05 Thread Lance Norskog
This information is not available via the API. If you would like this information added to the statistics request, please file a JIRA requesting it. Without knowing the size of the index files to be transferred, the client cannot monitor its own disk space. This would be useful for the cloud manag

Some help for folks trying to get new Solr/Lucene up in Eclipse

2010-04-05 Thread Mattmann, Chris A (388J)
Hey All, Just to save some folks some time in case you are trying to get new Lucene/Solr up and running in Eclipse. If you continue to get weird errors, e.g., in solr/src/test/TestConfig.java regarding org.w3c.dom.Node#getTextContent(), I found for me this error was caused by including the Tidy.jar

Re: Solr caches and nearly static indexes

2010-04-05 Thread Lance Norskog
In a word: "no". What you can do instead of deleting them is to add them to a growing list of "don't search for these documents". This could be listed in a filter query. We had exactly this problem in a consumer app; we had a small but continuously growing list of obscene documents in the index,

Re: Index db data

2010-04-05 Thread MitchK
It seems to work ;). However, trueman, you should subscribe to solr-user@lucene.apache.org, since not everybody looks up Nabble for mailing-list postings. - Mitch

Re: Minimum Should Match the other way round

2010-04-05 Thread MitchK
Thank you both for responding. Hoss, what you've pointed out was exactly what I am looking for. However, I would *always* prefer the second implementation, because of the fact that you have to compute the number of terms for all records only *one* time. :-) At the moment I would feel like w

Re: Experience with indexing billions of documents?

2010-04-05 Thread Lance Norskog
The 2B limitation is within one shard, due to using a signed 32-bit integer. There is no limit in that regard in sharding: Distributed Search uses the stored unique document id rather than the internal docid. On Fri, Apr 2, 2010 at 10:31 AM, Rich Cariens wrote: > A colleague of mine is using nati
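For a sense of scale, here is a small arithmetic sketch of the ceiling implied by a signed 32-bit docid and the minimum number of shards that follows from it. The helper name `min_shards` is hypothetical, and the figure is only the theoretical bound; practical per-shard sizes are usually chosen well below it for other reasons.

```python
# Largest value a signed 32-bit integer can hold.
MAX_SIGNED_32 = 2**31 - 1


def min_shards(total_docs, per_shard_limit=MAX_SIGNED_32):
    """Smallest shard count so that no shard exceeds per_shard_limit docs."""
    if total_docs < 0 or per_shard_limit < 1:
        raise ValueError("need total_docs >= 0 and per_shard_limit >= 1")
    # Ceiling division without floating point.
    return -(-total_docs // per_shard_limit)


print(MAX_SIGNED_32)                # 2147483647
print(min_shards(10_000_000_000))   # 5
```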

Re: Unable to load MailEntityProcessor or org.apache.solr.handler.dataimport.MailEntityProcessor

2010-04-05 Thread Andrew McCombe
Hi Can no-one help me with this? Andrew On 2 April 2010 22:24, Andrew McCombe wrote: > Hi > > I am experimenting with Solr to index my gmail and am experiencing an error: > > 'Unable to load MailEntityProcessor or > org.apache.solr.handler.dataimport.MailEntityProcessor' > > I downloaded a fres

Re: Read Time Out Exception while trying to upload a huge SOLR input xml

2010-04-05 Thread Lance Norskog
Solr also has a feature to stream from a local file rather than over the network. The parameter stream.file=/full/local/file/name.txt means 'read this file from the local disk instead of the POST upload'. Of course, you have to get the entire file onto the Solr indexer machine (or a common file
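As a rough illustration of the stream.file parameter described above, the sketch below only builds the request URL; it does not contact a server. The base URL `http://localhost:8983/solr/update` is an assumption for the example, as is the helper name `stream_file_url`, and the Solr instance would need to be configured to accept stream.file requests.

```python
from urllib.parse import urlencode


def stream_file_url(update_url, local_path):
    """Build an update URL that asks Solr to read local_path from its own disk.

    update_url is an assumed endpoint; local_path must exist on the Solr
    machine (or a shared file system), not on the client.
    """
    # safe="/" keeps the path readable instead of encoding each slash.
    query = urlencode({"stream.file": local_path}, safe="/")
    return f"{update_url}?{query}"


print(stream_file_url("http://localhost:8983/solr/update",
                      "/full/local/file/name.txt"))
```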

Re: Is this a bug of the RessourceLoader?

2010-04-05 Thread Robert Muir
On Mon, Apr 5, 2010 at 2:28 PM, Chris Hostetter wrote: > > Robert: BOMs are one of those things that strike me as being abhorrent and > inherently evil because they seem to cause nothing but problems -- > Yes. > > If text files that start with a BOM aren't properly being dealt with by > Solr ri

Re: Minimum Should Match the other way round

2010-04-05 Thread Chris Hostetter
: > However, I am searching for a solution that does something like: "this is my : > query" and the document has to consist of this query plus maximal - for : > example - two another terms? ... : Not quite following. It sounds like you are saying you want to favor : docs that are shorter,

Re: Is this a bug of the RessourceLoader?

2010-04-05 Thread Yonik Seeley
On Mon, Apr 5, 2010 at 2:28 PM, Chris Hostetter wrote: > If text files that start with a BOM aren't properly being dealt with by > Solr right now, should we consider that a bug? It's a Java bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058 But we should fix if it's practical to do

Re: selecting documents older than 4 hours

2010-04-05 Thread Chris Hostetter
: NOW/HOUR-5HOURS evaluates to 2010-03-31T21:00:00 which should not be the : case if the current time is Wed Mar 31 19:50:48 PDT 2010. Is SOLR converting : NOW to GMT time? 1) "NOW" means "Now" ... what moment in time is happening right at this moment is independent of what locale you are in an
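The numbers in the question are consistent with the date math being evaluated in UTC rather than in the local zone. A minimal sketch of that arithmetic, assuming PDT is UTC-7 and modelling NOW/HOUR-5HOURS as "round down to the hour, then subtract five hours" (the function name is made up for illustration):

```python
from datetime import datetime, timedelta, timezone


def now_hour_minus(now, hours):
    """Model NOW/HOUR-<hours>HOURS: floor to the hour in UTC, then subtract."""
    utc_now = now.astimezone(timezone.utc)
    floored = utc_now.replace(minute=0, second=0, microsecond=0)
    return floored - timedelta(hours=hours)


# Wed Mar 31 19:50:48 PDT 2010, with PDT taken as UTC-7.
PDT = timezone(timedelta(hours=-7), "PDT")
local_now = datetime(2010, 3, 31, 19, 50, 48, tzinfo=PDT)

result = now_hour_minus(local_now, 5)
print(result.strftime("%Y-%m-%dT%H:%M:%SZ"))
```

The local time 19:50:48 PDT is 02:50:48 UTC on April 1, so rounding down and subtracting five hours lands on 21:00:00 UTC on March 31, the value reported in the question.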

RE: Query time only Ranges

2010-04-05 Thread Chris Hostetter
: Actually I needed time up to seconds granularity, so did you mean I : should index the field after conversion into seconds it doesn't really matter what granularity you need -- the point is if you need to query for things based on time of day, independent of the actual date, then the best way
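One way to picture the "convert to seconds" idea from the question is to store the time of day as seconds since midnight in its own numeric field and run range queries against that. The sketch below is an assumption-laden illustration: the field name `time_of_day_s` and both helper names are hypothetical, and the query string is just ordinary range syntax assembled by hand.

```python
from datetime import time


def seconds_of_day(t):
    """Seconds since midnight for a time of day, ignoring the date entirely."""
    return t.hour * 3600 + t.minute * 60 + t.second


def time_of_day_range(field, start, end):
    """Build a range clause over a (hypothetical) seconds-since-midnight field."""
    return f"{field}:[{seconds_of_day(start)} TO {seconds_of_day(end)}]"


print(seconds_of_day(time(9, 0, 0)))
print(time_of_day_range("time_of_day_s", time(9, 0, 0), time(17, 30, 0)))
```

Because the value carries no date, the same range matches events from any day that fall inside the window.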

Re: Is this a bug of the RessourceLoader?

2010-04-05 Thread Chris Hostetter
: Some applications (such as Windows Notepad), insert a UTF-8 Byte Order Mark : (BOM) as the first character of the file. So, perhaps the first word in your : stopwords list contains a UTF-8 BOM and that's why you are seeing this : behavior. Robert: BOMs are one of those things that strike me as b
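The effect described above is easy to reproduce outside Solr. The sketch below, in Python rather than Java, shows how a UTF-8 BOM ends up glued to the first stopword when a file is decoded naively, and how decoding with `utf-8-sig` drops it. It illustrates the symptom only; it is not how Solr's resource loading works, and `load_stopwords` is a made-up helper.

```python
import codecs


def load_stopwords(raw_bytes):
    """Decode stopword bytes, silently dropping a leading UTF-8 BOM if present."""
    text = raw_bytes.decode("utf-8-sig")
    return [line.strip() for line in text.splitlines() if line.strip()]


# A tiny stopwords file as an editor like Notepad might save it.
raw = codecs.BOM_UTF8 + b"the\nand\n"

naive_first = raw.decode("utf-8").splitlines()[0]
print(repr(naive_first))        # the BOM is part of the first word
print(load_stopwords(raw))      # BOM removed
```

With the naive decode the first entry is not equal to "the", so a lookup for that stopword would miss.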

Need info on CachedSQLentity processor

2010-04-05 Thread bbarani
Hi, I am using cachedSqlEntityprocessor in DIH to index the data. Please find below my dataconfig structure, ---> object --> object properties For each and every object I would be retrieving corresponding object properties (in my subqueries). I get in to OOM very often and I think that's a

Re: exceptionhandling & error-reporting?

2010-04-05 Thread Chris Hostetter
: This client uses a simple user-agent that requires JSON-syntax while parsing : searchresults from solr, but when solr drops an exception, tomcat returns an : error-500 page to the client and it crashes. define "crashes" ? ... presumably you are talking about the client crashing because it ca

Re: Getting solr response in HTML format : HTMLResponseWriter

2010-04-05 Thread Chris Hostetter
: so I have tried to attach the xslt stylesheet to the response of SOLR with : passing this 2 variables wt=xslt&tr=example.xsl : : while example.xsl is an included stylesheet to SOLR , but the response in : HTML wasn't very perfect . can you elaborate on what you mean by "wasn't very perfect" ?

Re: no of cfs files are more that the mergeFactor

2010-04-05 Thread Mark Miller
I'm guessing the user is expecting there to be one cfs file for the index, and does not understand that it's actually per segment. On 04/05/2010 01:59 PM, Chris Hostetter wrote: This sounds completely normal from what I remember about mergeFactor. Segments are merged "by level" meaning that wit

Re: Related terms/combined terms

2010-04-05 Thread Blargy
Ahmet thanks, this sounds like what I was looking for. Would one recommend using the TermsComponent prefix search or the Faceted prefix search for this sort of functionality? I know for auto-suggest functionality the general consensus has been leaning towards the Faceted prefix search over the

Re: no of cfs files are more that the mergeFactor

2010-04-05 Thread Chris Hostetter
This sounds completely normal from what I remember about mergeFactor. Segments are merged "by level" meaning that with a mergeFactor of 5, once 5 "level 1" segments are formed they are merged into a single "level 2" segment. then 5 more "level 1" segments are allowed to form before the next m

Re: Related terms/combined terms

2010-04-05 Thread Blargy
Thanks for the response Mitch. I'm not too sure how well this will work for my needs but I'll certainly play around with it. I think something more along the lines of Ahmet's solution is what I was looking for.

including external files in config by corename

2010-04-05 Thread Shawn Heisey
Is it possible to access the core name in a config file (such as solrconfig.xml) so I can include core-specific configlets into a common config file? I would like to pull in different configurations for things like shards and replication, but have all the cores otherwise use an identical confi

Re: dismax multi search?

2010-04-05 Thread Chris Hostetter
: I want to be able to direct some search terms to specific fields : : I want to do something like this : : keyword1 should search against book titles / authors : : keyword2 should search against book contents / book info / user reviews your question is a little vague ... will keyword1 and key

Re: feature request for ivalid data formats

2010-04-05 Thread Chris Hostetter
: : I don't know whether this is the good place to ask it, or there is a special : tool for issue : requests. We use Jira for bug reports and feature requests, but it's always a good idea to start with a solr-user email before filing a new bug/request to help discuss the behavior you are seeing

Re: MoreLikeThis function queries

2010-04-05 Thread Blargy
Ok, it's now Monday and everyone should have had their nice morning cup of coffee :)

Re: Does Lucidimagination search uses Multi facet query filter or uses session?

2010-04-05 Thread Grant Ingersoll
We are using multiselect facets like what you have below (although I haven't verified your syntax). So no, we are not using sessions. See http://www.lucidimagination.com/search/?q=multiselect+faceting#/s:email for help. -Grant http://www.lucidimagination.com On Apr 1, 2010, at 12:35 PM, bbara

Re: Minimum Should Match the other way round

2010-04-05 Thread Grant Ingersoll
On Apr 3, 2010, at 10:18 AM, MitchK wrote: > > Hello, > > I want to tinkle a little bit with Solr, so I need a little feedback: > Is it possible to define a Minimum Should Match for the document itself? > > I mean, it is possible to say, that a query "this is my query" should only > match a do

Re: excluder filters and multivalued fields

2010-04-05 Thread Chris Hostetter
: name->john : year->2009;year->2010;year->2011 : : And I query for: : q=john&fq=-year:2010 : : Doc1 won't be in the matching results. Is there a way to make it appear : because even having 2010 the document has also years that don't match the : filter query? Not natively -- but you can index a
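To see why Doc1 drops out, it can help to model the negative filter on a multivalued field in plain Python: the filter removes any document in which at least one value matches, regardless of the other values. This is a toy model of the matching semantics only, not Solr code; the documents and the helper name `passes_negative_filter` are made up for the example.

```python
def passes_negative_filter(doc, field, excluded_value):
    """True if none of the document's values for field equals excluded_value."""
    return excluded_value not in doc.get(field, [])


doc1 = {"name": "john", "year": [2009, 2010, 2011]}
doc2 = {"name": "john", "year": [2009, 2011]}

# Equivalent in spirit to fq=-year:2010 applied to both documents.
kept = [d for d in (doc1, doc2) if passes_negative_filter(d, "year", 2010)]
print(kept)
```

Only the second document survives: having other years that do not match makes no difference once one value does.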

Re: add/update document as distinct operations? Is it possible?

2010-04-05 Thread Erick Erickson
I still don't see what the difference is. If there was a distinct add/update process, how would that absolve you from having to implement your own queueing? To have predictable index content, you still must order your operations. Best Erick On Mon, Apr 5, 2010 at 12:45 PM, Julian Davchev wrote:

Re: Related terms/combined terms

2010-04-05 Thread Ahmet Arslan
> Not sure of the exact vocabulary I am looking for so I'll > try to explain > myself. > > Given a search term is there anyway to return back a list > of related/grouped > keywords (based on the current state of the index) for that > term. > > For example say I have a sports catalog and I searc

Re: add/update document as distinct operations? Is it possible?

2010-04-05 Thread Israel Ekpo
Chris, I don't see anything in the headers suggesting that Julian's message was a hijack of another thread On Thu, Apr 1, 2010 at 2:17 PM, Chris Hostetter wrote: > > : Subject: add/update document as distinct operations? Is it possible? > : References: > : > > : In-Reply-To: > : > > > http://p

Re: add/update document as distinct operations? Is it possible?

2010-04-05 Thread Julian Davchev
Hi, I got the picture now. Not having distinct add/update actions forces me to implement a custom queueing mechanism. Thanks Cheers. Erick Erickson wrote: > One of the most requested features in Lucene/SOLR is to be able > to update only selected fields rather than the whole document. But > that's no

Re: cheking the size of the index using solrj API's

2010-04-05 Thread Ryan McKinley
On Fri, Apr 2, 2010 at 7:07 AM, Na_D wrote: > > hi, > > > I need to monitor the index for the following information: > > 1. Size of the index > 2 Last time the index was updated. > If by 'size of the index' you mean document count, then check the Luke Request Handler http://wiki.apache.org/solr/Lu

Re: Apache Lucene EuroCon Call For Participation: Prague, Czech Republic May 20 & 21, 2010

2010-04-05 Thread Grant Ingersoll
Just a reminder, just over one week left open on the CFP. Some great talks entered already. Keep it up! On Mar 24, 2010, at 8:03 PM, Grant Ingersoll wrote: > Apache Lucene EuroCon Call For Participation - Prague, Czech Republic May 20 > & 21, 2010 > > All submissions must be received by Tue

Re: cheking the size of the index using solrj API's

2010-04-05 Thread Peter Sturge
If you're using ReplicationHandler directly, you already have the xml from which to extract the 'indexSize' attribute. From a client, you can get the indexSize by issuing: http://hostname:8983/solr/core/replication?command=details This will give you an xml response. Use: http://hostname:8983/s
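A client-side sketch of the approach above, in Python for brevity: build the details URL, then pull the value named indexSize out of the XML. The sample response is hypothetical and heavily trimmed, since the exact layout of the real response is an assumption here; the parser therefore looks for any element whose name attribute is indexSize rather than relying on a fixed path. Both helper names are made up.

```python
import xml.etree.ElementTree as ET


def details_url(host, core, port=8983):
    """URL for the replication 'details' command, as given in the post."""
    return f"http://{host}:{port}/solr/{core}/replication?command=details"


def extract_index_size(xml_text):
    """Return the text of the first element with name="indexSize", or None."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.get("name") == "indexSize":
            return element.text
    return None


# Hypothetical, trimmed response used only to exercise the parser.
sample = (
    '<response><lst name="details">'
    '<str name="indexSize">1.2 GB</str>'
    "</lst></response>"
)

print(details_url("hostname", "core"))
print(extract_index_size(sample))
```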

Re: cheking the size of the index using solrj API's

2010-04-05 Thread Na_D
hi, I am using the piece of code given below ReplicationHandler handler2 = new ReplicationHandler(); System.out.println( handler2.getDescription()); NamedList statistics = handler2.getStatistics();

Re: Obtaining SOLR index size on disk

2010-04-05 Thread Na_D
hi, I am using the piece of code given below ReplicationHandler handler2 = new ReplicationHandler(); System.out.println( handler2.getDescription()); NamedList statistics = handler2.getStatistics();

Re: What does it mean when you see a plus sign in between two words inside synonyms.txt?

2010-04-05 Thread Koji Sekiguchi
paulosalamat wrote: Hi Koji, Thank you for the reply. I have another question. If WhitespaceTokenizer is used, is the term text "macbook+air" equal to "macbook air"? No. In the field, "macbook air" will be a phrase (not a term). You can define not only terms but phrases in synonyms.txt: ex
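The distinction Koji draws can be mimicked with plain whitespace splitting. In the sketch below Python's `str.split()` stands in for a whitespace tokenizer; that is an analogy for illustration, not the behaviour of Solr's WhitespaceTokenizer class itself, and the helper name is made up.

```python
def whitespace_tokens(text):
    """Split on runs of whitespace only; punctuation such as '+' is kept."""
    return text.split()


print(whitespace_tokens("macbook+air"))   # one token
print(whitespace_tokens("macbook air"))   # two tokens, i.e. a phrase
```

Under this model "macbook+air" stays a single term while "macbook air" becomes two, so the two strings are not equal as term text.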

Re: What does it mean when you see a plus sign in between two words inside synonyms.txt?

2010-04-05 Thread paulosalamat
Hi Koji, Thank you for the reply. I have another question. If WhitespaceTokenizer is used, is the term text "macbook+air" equal to "macbook air"? Thank you, Paulo On Mon, Apr 5, 2010 at 5:50 PM, Koji Sekiguchi [via Lucene] < ml-node+697386-2142071620-218...@n3.nabble.com > wrote: > paulosala

Re: What does it mean when you see a plus sign in between two words inside synonyms.txt?

2010-04-05 Thread Koji Sekiguchi
paulosalamat wrote: Hi I'm new to this group, I would like to ask a question: What does it mean when you see a plus sign in between two words inside synonyms.txt? e.g. macbookair => macbook+air Thanks, Paulo Welcome, Paulo! It depends on your tokenizer. You can specify a tokenizer via