Re: Is this a bug of the RessourceLoader?

2010-04-05 Thread Robert Muir
On Mon, Apr 5, 2010 at 2:28 PM, Chris Hostetter hossman_luc...@fucit.org wrote: Robert: BOMs are one of those things that strike me as being abhorrent and inherently evil because they seem to cause nothing but problems -- Yes. If text files that start with a BOM aren't properly being

Re: Read Time Out Exception while trying to upload a huge SOLR input xml

2010-04-05 Thread Lance Norskog
Solr also has a feature to stream from a local file rather than over the network. The parameter stream.file=/full/local/file/name.txt means 'read this file from the local disk instead of the POST upload'. Of course, you have to get the entire file onto the Solr indexer machine (or a common
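A minimal sketch of how a request using the stream.file parameter might be assembled. The host, port, /update path, and the commit parameter are assumptions for illustration and do not come from the message; as far as I know, remote streaming also has to be enabled in solrconfig.xml before Solr will honor stream.file.

```python
from urllib.parse import urlencode


def stream_file_url(base_url, local_path, commit=True):
    """Build an update URL asking Solr to read local_path from its own disk."""
    params = {"stream.file": local_path}
    if commit:
        # Assumed convenience: commit after the file has been indexed.
        params["commit"] = "true"
    return base_url + "?" + urlencode(params)


# Hypothetical Solr location; only the file path comes from the message.
url = stream_file_url("http://localhost:8983/solr/update",
                      "/full/local/file/name.txt")
print(url)
```

The path is resolved on the Solr machine, not on the client, which is why the file has to be copied there (or to shared storage) first.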

Re: Unable to load MailEntityProcessor or org.apache.solr.handler.dataimport.MailEntityProcessor

2010-04-05 Thread Andrew McCombe
Hi Can no-one help me with this? Andrew On 2 April 2010 22:24, Andrew McCombe eupe...@gmail.com wrote: Hi I am experimenting with Solr to index my gmail and am experiencing an error: 'Unable to load MailEntityProcessor or org.apache.solr.handler.dataimport.MailEntityProcessor' I

Re: Experience with indexing billions of documents?

2010-04-05 Thread Lance Norskog
The 2B limitation is within one shard, due to using a signed 32-bit integer. There is no such limit across shards: Distributed Search uses the stored unique document id rather than the internal docid. On Fri, Apr 2, 2010 at 10:31 AM, Rich Cariens richcari...@gmail.com wrote: A colleague
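To make the arithmetic behind the "2B" figure concrete, here is a small sketch. The shards_needed helper is hypothetical and only gives a theoretical lower bound from the docid range; real deployments would normally use far smaller shards for performance reasons.

```python
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


def shards_needed(total_docs, per_shard_limit=INT32_MAX):
    """Lower bound on the shard count imposed by the per-shard docid range."""
    return -(-total_docs // per_shard_limit)  # ceiling division


print(INT32_MAX)                       # 2147483647
print(shards_needed(10_000_000_000))   # 5
```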

Re: Index db data

2010-04-05 Thread MitchK
It seems to work ;). However, trueman, you should subscribe to solr-user@lucene.apache.org, since not everybody looks up Nabble for mailing-list postings. - Mitch -- View this message in context: http://n3.nabble.com/Index-db-data-tp693204p698691.html Sent from the Solr - User mailing list

Re: Solr caches and nearly static indexes

2010-04-05 Thread Lance Norskog
In a word: no. What you can do instead of deleting them is to add them to a growing list of "don't search for these" documents. This could be listed in a filter query. We had exactly this problem in a consumer app; we had a small but continuously growing list of obscene documents in the index, and

Some help for folks trying to get new Solr/Lucene up in Eclipse

2010-04-05 Thread Mattmann, Chris A (388J)
Hey All, Just to save some folks some time in case you are trying to get new Lucene/Solr up and running in Eclipse. If you continue to get weird errors, e.g., in solr/src/test/TestConfig.java regarding org.w3c.dom.Node#getTextContent(), I found for me this error was caused by including the

Re: Obtaining SOLR index size on disk

2010-04-05 Thread Lance Norskog
This information is not available via the API. If you would like this information added to the statistics request, please file a JIRA requesting it. Without knowing the size of the index files to be transferred, the client cannot monitor its own disk space. This would be useful for the cloud

Re: Minimum Should Match the other way round

2010-04-05 Thread MitchK
Sorry for double-posting, but to avoid any misunderstanding: Accessing instantiated filters is not a really good idea, since a new Filter must be instantiated all the time. However, what I meant was: if I create a WordDelimiterFilter or a StopFilter and I have set a param for a file like

one particular doc in results should always come first for a particular query

2010-04-05 Thread Mark Fletcher
Hi, Suppose I search for the word *international*. A particular record (say *recordX*) I am looking for is coming as the Nth result now. I have a requirement that when a user queries for *international* I need recordX to always be the first result. How can I achieve this? Note:- When user

Re: Unable to load MailEntityProcessor or org.apache.solr.handler.dataimport.MailEntityProcessor

2010-04-05 Thread Lance Norskog
The MailEntityProcessor is an extra and does not come normally with the DataImportHandler. The wiki page should mention this. In the Solr distribution it should be in the dist/ directory as dist/apache-solr-dataimporthandler-extras-1.4.jar. The class it wants is in this jar. (Do 'unzip -l jar'

Re: including external files in config by corename

2010-04-05 Thread Lance Norskog
Making snippets is part of highlighting. http://www.lucidimagination.com/search/s:lucid/li:cdrg?q=snippet On Mon, Apr 5, 2010 at 10:53 AM, Shawn Heisey s...@elyograg.org wrote: Is it possible to access the core name in a config file (such as solrconfig.xml) so I can include core-specific

Re: no of cfs files are more that the mergeFactor

2010-04-05 Thread Lance Norskog
mergeFactor=5 means that if there are 42 documents, there will be 5 index files: 1 with 25 documents, 3 with 5 documents, and 1 with 2 documents. Imagine making change with coins of 1 document, 5 documents, 5^2 documents, 5^3 documents, etc. On Mon, Apr 5, 2010 at 10:59 AM, Chris Hostetter
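The coin analogy above can be sketched as a base-mergeFactor decomposition. This is an idealized model only: the segment_sizes helper is hypothetical, and actual segment counts depend on the merge policy and on when buffered documents are flushed. In this strict model the two leftover documents show up as two single-document "coins", whereas the message counts them as one file.

```python
def segment_sizes(num_docs, merge_factor):
    """Decompose num_docs into powers of merge_factor, largest first.

    Returns a list of (count, size) pairs, like making change with coins
    of 1, merge_factor, merge_factor**2, ... documents.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    # Find the largest power of merge_factor that fits into num_docs.
    power = 1
    while power * merge_factor <= num_docs:
        power *= merge_factor
    sizes = []
    while power >= 1:
        count, num_docs = divmod(num_docs, power)
        if count:
            sizes.append((count, power))
        power //= merge_factor
    return sizes


print(segment_sizes(42, 5))  # [(1, 25), (3, 5), (2, 1)]
```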

exact match coming as second record

2010-04-05 Thread Mark Fletcher
Hi, I am using the dismax handler. I have a field named *myfield* which has a value say XXX.YYY.ZZZ. I have boosted myfield^20.0. Even with such a high boost (in fact among the qf fields specified this field has the max boost given), when I search for XXX.YYY.ZZZ I see my record as the second one

Re: one particular doc in results should always come first for a particular query

2010-04-05 Thread Erick Erickson
Hmmm, how do you know which particular record corresponds to which keyword? Is this a list known at index time, as in "this record should come up first whenever bonkers is the keyword"? If that's the case, you could copy the magic keyword to a different field (say magic_keyword) and boost it right

Re: exact match coming as second record

2010-04-05 Thread Erick Erickson
What do you get back when you specify debugQuery=on? Best Erick On Mon, Apr 5, 2010 at 7:31 PM, Mark Fletcher mark.fletcher2...@gmail.com wrote: Hi, I am using the dismax handler. I have a field named *myfield* which has a value say XXX.YYY.ZZZ. I have boosted myfield^20.0. Even with such

Re: one particular doc in results should always come first for a particular query

2010-04-05 Thread Chris Hostetter
: If that's the case, you could copy the magic keyword to a different field : (say magic_keyword) and boost it right into orbit as an OR clause : (magic_keyword:bonkers ^1). This kind of assumes that a magic keyword : corresponds to one and only one document : : If this is way off base,

Re: Multicore and TermVectors

2010-04-05 Thread Chris Hostetter
: Subject: Multicore and TermVectors It doesn't sound like Multicore is your issue ... it seems like what you mean is that you are using distributed search with TermVectors, and that is causing a problem. Can you please clarify exactly what you mean ... describe your exact setup (ie: how

Re: Solr caches and nearly static indexes

2010-04-05 Thread Chris Hostetter
: times. Is there any way to have the index keep its caches when the only thing : that happens is deletions, then invalidate them when it's time to actually add : data? It would have to be something I can dynamically change when switching : between deletions and the daily import. The problem

Re: Solr caches and nearly static indexes

2010-04-05 Thread Chris Hostetter
: We had exactly this problem in a consumer app; we had a small but : continuously growing list of obscene documents in the index, and did : not want to display these. So, we had a filter query with all of the : obscene words, and used this with every query. that doesn't seem like it would

Re: Solr caches and nearly static indexes

2010-04-05 Thread Yonik Seeley
On Mon, Apr 5, 2010 at 9:04 PM, Chris Hostetter hossman_luc...@fucit.org wrote: ... the reusing the FieldCache seems like the only thing that would be advantageous in that case And FieldCache entries are currently reused when there have only been deletions on a segment (since Solr 1.4). -Yonik

Re: Solr caches and nearly static indexes

2010-04-05 Thread Chris Hostetter
: ... the reusing the FieldCache seems like the only thing that would be : advantageous in that case : : And FieldCache entries are currently reused when there have only been : deletions on a segment (since Solr 1.4). But that's kind of orthogonal to (what i think) Lance's point was: that

Re: Solr caches and nearly static indexes

2010-04-05 Thread Yonik Seeley
On Mon, Apr 5, 2010 at 9:10 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : ... the reusing the FieldCache seems like the only thing that would be : advantageous in that case : : And FieldCache entries are currently reused when there have only been : deletions on a segment (since

Re: exact match coming as second record

2010-04-05 Thread Mark Fletcher
Hi Erick, Many thanks for your mail! Please find attached the debugQuery results. Thanks! Mark On Mon, Apr 5, 2010 at 7:38 PM, Erick Erickson erickerick...@gmail.com wrote: What do you get back when you specify debugQuery=on? Best Erick On Mon, Apr 5, 2010 at 7:31 PM, Mark Fletcher

Re: including external files in config by corename

2010-04-05 Thread Mark Miller
On 04/05/2010 01:53 PM, Shawn Heisey wrote: Is it possible to access the core name in a config file (such as solrconfig.xml) so I can include core-specific configlets into a common config file? I would like to pull in different configurations for things like shards and replication, but have

Re: Need info on CachedSQLentity processor

2010-04-05 Thread Mark Miller
On 04/05/2010 02:28 PM, bbarani wrote: Hi, I am using cachedSqlEntityprocessor in DIH to index the data. Please find below my dataconfig structure, entity x query=select * from x --- object entity y query=select * from y processor=cachedSqlEntityprocessor cachekey=y.id cachevalue=x.id --

Re: including external files in config by corename

2010-04-05 Thread Chris Hostetter
: The best you have to work with at the moment is Xincludes: : : http://wiki.apache.org/solr/SolrConfigXml#XInclude : : and System Property Substitution: : : http://wiki.apache.org/solr/SolrConfigXml#System_property_substitution Except that XInclude is a feature of the XML parser, while

Re: Some help for folks trying to get new Solr/Lucene up in Eclipse

2010-04-05 Thread Lance Norskog
I had a slight hiccup that I just ignored. Even when I used Java 1.6 JDK mode, Eclipse did not know this method. I had to comment out the three places that use this method. javax.xml.parsers.DocumentBuilderFactory.setXIncludeAware(true) Lance Norskog On Mon, Apr 5, 2010 at 1:49 PM, Mattmann,

Re: Need info on CachedSQLentity processor

2010-04-05 Thread bbarani
Mark, I have opened a JIRA issue - https://issues.apache.org/jira/browse/SOLR-1867 Thanks, Barani -- View this message in context: http://n3.nabble.com/Need-info-on-CachedSQLentity-processor-tp698418p699329.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Multicore and TermVectors

2010-04-05 Thread Lance Norskog
There is no query parameter. The query parser throws an NPE if there is no query parameter: http://issues.apache.org/jira/browse/SOLR-435 It does not look like term vectors are processed in distributed search anyway. On Mon, Apr 5, 2010 at 4:45 PM, Chris Hostetter hossman_luc...@fucit.org

Re: including external files in config by corename

2010-04-05 Thread Mark Miller
On 04/05/2010 10:12 PM, Chris Hostetter wrote: : The best you have to work with at the moment is Xincludes: : : http://wiki.apache.org/solr/SolrConfigXml#XInclude : : and System Property Substitution: : : http://wiki.apache.org/solr/SolrConfigXml#System_property_substitution Except that

What does it mean when you see a plus sign in between two words inside synonyms.txt?

2010-04-05 Thread paulosalamat
Hi I'm new to this group, I would like to ask a question: What does it mean when you see a plus sign in between two words inside synonyms.txt? e.g. macbookair = macbook+air Thanks, Paulo -- View this message in context:

Re: What does it mean when you see a plus sign in between two words inside synonyms.txt?

2010-04-05 Thread Koji Sekiguchi
paulosalamat wrote: Hi I'm new to this group, I would like to ask a question: What does it mean when you see a plus sign in between two words inside synonyms.txt? e.g. macbookair = macbook+air Thanks, Paulo Welcome, Paulo! It depends on your tokenizer. You can specify a tokenizer via

Re: What does it mean when you see a plus sign in between two words inside synonyms.txt?

2010-04-05 Thread paulosalamat
Hi Koji, Thank you for the reply. I have another question. If WhitespaceTokenizer is used, is the term text macbook+air equal to macbook air? Thank you, Paulo On Mon, Apr 5, 2010 at 5:50 PM, Koji Sekiguchi [via Lucene]

Re: What does it mean when you see a plus sign in between two words inside synonyms.txt?

2010-04-05 Thread Koji Sekiguchi
paulosalamat wrote: Hi Koji, Thank you for the reply. I have another question. If WhitespaceTokenizer is used, is the term text macbook+air equal to macbook air? No. In the field, macbook air will be a phrase (not a term). You can define not only terms but phrases in synonyms.txt: ex)

Re: Obtaining SOLR index size on disk

2010-04-05 Thread Na_D
hi, I am using the piece of code given below ReplicationHandler handler2 = new ReplicationHandler(); System.out.println( handler2.getDescription()); NamedList statistics =

Re: cheking the size of the index using solrj API's

2010-04-05 Thread Na_D
hi, I am using the piece of code given below ReplicationHandler handler2 = new ReplicationHandler(); System.out.println( handler2.getDescription()); NamedList statistics = handler2.getStatistics();

Re: cheking the size of the index using solrj API's

2010-04-05 Thread Peter Sturge
If you're using ReplicationHandler directly, you already have the xml from which to extract the 'indexSize' attribute. From a client, you can get the indexSize by issuing: http://hostname:8983/solr/core/replication?command=details This will give you an xml response. Use:
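As a hedged sketch of pulling 'indexSize' out of such an xml response on the client side: the sample below is hand-written and only assumes that the value arrives as a str element named indexSize; the real response layout and the other fields may differ, and this is not necessarily the approach the message goes on to recommend.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a replication "details" response (assumed shape).
SAMPLE_RESPONSE = """<response>
  <lst name="details">
    <str name="indexSize">1.23 MB</str>
    <str name="indexPath">/var/solr/data/index</str>
  </lst>
</response>"""


def extract_index_size(xml_text):
    """Return the text of the first <str name="indexSize"> element, or None."""
    root = ET.fromstring(xml_text)
    for elem in root.iter("str"):
        if elem.get("name") == "indexSize":
            return elem.text
    return None


print(extract_index_size(SAMPLE_RESPONSE))  # 1.23 MB
```

In a real client the xml_text would come from an HTTP GET of the details URL quoted in the message rather than from a constant.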

Re: Apache Lucene EuroCon Call For Participation: Prague, Czech Republic May 20 & 21, 2010

2010-04-05 Thread Grant Ingersoll
Just a reminder, just over one week left open on the CFP. Some great talks entered already. Keep it up! On Mar 24, 2010, at 8:03 PM, Grant Ingersoll wrote: Apache Lucene EuroCon Call For Participation - Prague, Czech Republic May 20 & 21, 2010 All submissions must be received by

Re: cheking the size of the index using solrj API's

2010-04-05 Thread Ryan McKinley
On Fri, Apr 2, 2010 at 7:07 AM, Na_D nabam...@zaloni.com wrote: hi, I need to monitor the index for the following information: 1. Size of the index 2. Last time the index was updated. If by 'size of the index' you mean document count, then check the Luke Request Handler

Re: add/update document as distinct operations? Is it possible?

2010-04-05 Thread Julian Davchev
Hi, I got the picture now. Not having distinct add/update actions forces me to implement a custom queueing mechanism. Thanks Cheers. Erick Erickson wrote: One of the most requested features in Lucene/SOLR is to be able to update only selected fields rather than the whole document. But that's not

Re: add/update document as distinct operations? Is it possible?

2010-04-05 Thread Israel Ekpo
Chris, I don't see anything in the headers suggesting that Julian's message was a hijack of another thread. On Thu, Apr 1, 2010 at 2:17 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Subject: add/update document as distinct operations? Is it possible? : References: :

Re: Related terms/combined terms

2010-04-05 Thread Ahmet Arslan
Not sure of the exact vocabulary I am looking for so I'll try to explain myself. Given a search term, is there any way to return a list of related/grouped keywords (based on the current state of the index) for that term? For example say I have a sports catalog and I search for

Re: add/update document as distinct operations? Is it possible?

2010-04-05 Thread Erick Erickson
I still don't see what the difference is. If there was a distinct add/update process, how would that absolve you from having to implement your own queueing? To have predictable index content, you still must order your operations. Best Erick On Mon, Apr 5, 2010 at 12:45 PM, Julian Davchev

Re: Minimum Should Match the other way round

2010-04-05 Thread Grant Ingersoll
On Apr 3, 2010, at 10:18 AM, MitchK wrote: Hello, I want to tinker a little bit with Solr, so I need a little feedback: Is it possible to define a Minimum Should Match for the document itself? I mean, it is possible to say, that a query this is my query should only match a document,

Re: Does Lucidimagination search uses Multi facet query filter or uses session?

2010-04-05 Thread Grant Ingersoll
We are using multiselect facets like what you have below (although I haven't verified your syntax). So no, we are not using sessions. See http://www.lucidimagination.com/search/?q=multiselect+faceting#/s:email for help. -Grant http://www.lucidimagination.com On Apr 1, 2010, at 12:35 PM,

Re: feature request for ivalid data formats

2010-04-05 Thread Chris Hostetter
: : I don't know whether this is the good place to ask it, or there is a special : tool for issue : requests. We use Jira for bug reports and feature requests, but it's always a good idea to start with a solr-user email before filing a new bug/request to help discuss the behavior you are

Re: dismax multi search?

2010-04-05 Thread Chris Hostetter
: I want to be able to direct some search terms to specific fields : : I want to do something like this : : keyword1 should search against book titles / authors : : keyword2 should search against book contents / book info / user reviews your question is a little vague ... will keyword1 and

including external files in config by corename

2010-04-05 Thread Shawn Heisey
Is it possible to access the core name in a config file (such as solrconfig.xml) so I can include core-specific configlets into a common config file? I would like to pull in different configurations for things like shards and replication, but have all the cores otherwise use an identical

Re: Related terms/combined terms

2010-04-05 Thread Blargy
Thanks for the response Mitch. I'm not too sure how well this will work for my needs but I'll certainly play around with it. I think something more along the lines of Ahmet's solution is what I was looking for. -- View this message in context:

Re: no of cfs files are more that the mergeFactor

2010-04-05 Thread Chris Hostetter
This sounds completely normal from what i remember about mergeFactor. Segments are merged by level, meaning that with a mergeFactor of 5, once 5 level 1 segments are formed they are merged into a single level 2 segment. Then 5 more level 1 segments are allowed to form before the next merge

Re: no of cfs files are more that the mergeFactor

2010-04-05 Thread Mark Miller
I'm guessing the user is expecting there to be one cfs file for the index, and does not understand that it's actually per segment. On 04/05/2010 01:59 PM, Chris Hostetter wrote: This sounds completely normal from what i remember about mergeFactor. Segments are merged by level, meaning that with

Re: Getting solr response in HTML format : HTMLResponseWriter

2010-04-05 Thread Chris Hostetter
: so I have tried to attach the xslt stylesheet to the response of SOLR with : passing these 2 variables wt=xslt&tr=example.xsl : : while example.xsl is an included stylesheet to SOLR, but the response in : HTML wasn't very perfect. can you elaborate on what you mean by "wasn't very perfect"?

Re: exceptionhandling error-reporting?

2010-04-05 Thread Chris Hostetter
: This client uses a simple user-agent that requires JSON-syntax while parsing : searchresults from solr, but when solr drops an exception, tomcat returns an : error-500 page to the client and it crashes. define "crashes"? ... presumably you are talking about the client crashing because it

Re: Is this a bug of the RessourceLoader?

2010-04-05 Thread Chris Hostetter
: Some applications (such as Windows Notepad), insert a UTF-8 Byte Order Mark : (BOM) as the first character of the file. So, perhaps the first word in your : stopwords list contains a UTF-8 BOM and that's why you are seeing this : behavior. Robert: BOMs are one of those things that strike me as
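To illustrate the BOM problem described above, here is a small sketch of stripping a leading UTF-8 BOM from a word list before it is used. It is a client-side workaround written for illustration, not Solr's own handling, and the sample stopwords are made up.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Simulate a stopwords file saved by an editor that prepends a BOM.
raw = codecs.BOM_UTF8 + "the\nand\n".encode("utf-8")

naive = raw.decode("utf-8").split()
clean = strip_utf8_bom(raw).decode("utf-8").split()

print(naive[0] == "the")  # False: the first word still carries U+FEFF
print(clean)              # ['the', 'and']
```

In Python, decoding with the "utf-8-sig" codec has the same effect as the helper; it accepts input with or without a BOM.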

Re: selecting documents older than 4 hours

2010-04-05 Thread Chris Hostetter
: NOW/HOUR-5HOURS evaluates to 2010-03-31T21:00:00 which should not be the : case if the current time is Wed Mar 31 19:50:48 PDT 2010. Is SOLR converting : NOW to GMT time? 1) NOW means Now ... what moment in time is happening right at this moment is independent of what locale you are in and
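The numbers in the quoted question can be reproduced with a short sketch, assuming that date math is evaluated in UTC: convert the local time to UTC, round down to the hour, then subtract five hours. The now_hour_minus helper is hypothetical and only mimics the single expression NOW/HOUR-5HOURS.

```python
from datetime import datetime, timedelta, timezone


def now_hour_minus(now, hours):
    """Mimic NOW/HOUR-<hours>HOURS: go to UTC, round down to the hour, subtract."""
    utc = now.astimezone(timezone.utc)
    rounded = utc.replace(minute=0, second=0, microsecond=0)
    return rounded - timedelta(hours=hours)


# Wed Mar 31 19:50:48 PDT 2010 (PDT is UTC-7).
pdt = timezone(timedelta(hours=-7), "PDT")
local_now = datetime(2010, 3, 31, 19, 50, 48, tzinfo=pdt)

print(now_hour_minus(local_now, 5).isoformat())  # 2010-03-31T21:00:00+00:00
```

19:50:48 PDT is 02:50:48 UTC on April 1, so rounding to 02:00:00 and subtracting five hours gives 21:00:00 UTC on March 31, the value reported in the question.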

Re: Is this a bug of the RessourceLoader?

2010-04-05 Thread Yonik Seeley
On Mon, Apr 5, 2010 at 2:28 PM, Chris Hostetter hossman_luc...@fucit.org wrote: If text files that start with a BOM aren't properly being dealt with by Solr right now, should we consider that a bug? It's a Java bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058 But we should fix