Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread Gabriele Kahlout
Sorry being unclear and thank you for answering. Consider the following documents A(k0,k1,k2), B(k1,k2,k3), and C(k0,k2,k3), where A,B,C are document identifiers and the ks in bracket with each are the terms each contains. So Solr inverted index should be something like: k0 -- A | C k1 -- A | B

Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread pravesh
k0 -- A | C k1 -- A | B k2 -- A | B | C k3 -- B | C Now let q=k1, how do I make sure C doesn't appear as a result since it doesn't contain any occurrence of k1? Do we bother to do that. Now that's what Lucene does :)
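The index described in this thread can be sketched in a few lines. This is a toy model, not Lucene's actual data structure; the names build_inverted_index and search are hypothetical and exist only for illustration. It shows why a plain lookup for k1 can never return C: C is simply not in the posting list for k1.

```python
from collections import defaultdict

# Documents from the thread: identifier -> terms it contains.
docs = {
    "A": ["k0", "k1", "k2"],
    "B": ["k1", "k2", "k3"],
    "C": ["k0", "k2", "k3"],
}


def build_inverted_index(documents):
    """Map each term to the sorted list of documents that contain it."""
    postings = defaultdict(set)
    for doc_id, terms in documents.items():
        for term in terms:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, term):
    """Return only the documents listed under the term (empty if unknown)."""
    return index.get(term, [])


index = build_inverted_index(docs)
print(search(index, "k1"))
```

With the three documents above, the lookup for k1 yields A and B only.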

Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread Gabriele Kahlout
On Tue, Jun 7, 2011 at 8:43 AM, pravesh suyalprav...@yahoo.com wrote: k0 -- A | C k1 -- A | B k2 -- A | B | C k3 -- B | C Now let q=k1, how do I make sure C doesn't appear as a result since it doesn't contain any occurrence of k1? Do we bother to do that. Now that's what Lucene does :)

Re: Master Slave help

2011-06-07 Thread Rohit Gupta
thanks Jayendra.. From: Jayendra Patil jayendra.patil@gmail.com To: solr-user@lucene.apache.org Sent: Tue, 7 June, 2011 6:55:58 AM Subject: Re: Master Slave help Do you mean the replication happens everytime you restart the server ? If so, you would need

Commit taking very long

2011-06-07 Thread Rohit Gupta
Hi, my commit seems to be taking too much time. As the DataImport status below shows, committing 1000 docs is taking longer than 24 minutes: </lst> <str name="status">busy</str> <str name="importResponse">A command is still running...</str> <lst name="statusMessages"> <str name="Time

getting numberformat exception while using tika

2011-06-07 Thread Naveen Gupta
Hi, we are using the ExtractingRequestHandler and are getting the following error when we submit a Microsoft docx file for indexing. I think this has something to do with the date field definition, but I am not very sure. What field type should we use? 2. We are trying to index jpg (when we search over

How many fields can SOLR handle?

2011-06-07 Thread roySolr
Hello, I have a Solr implementation with 1m products. Every product has some information: let's say a television has information about pixels and inches, and a computer has information about harddisk, cpu, gpu. When a user searches for computer I want to show the correct facets. An example: User

function queries scope

2011-06-07 Thread Marco Martinez
Hi, I need to use function query operations with the score of a given query, but only on the docset that I get from the query, and I don't know if this is possible. Example: q=shops in madrid returns 1 docs with a specific score for each doc but now I need to do some stuff like

Indexing Mediawiki

2011-06-07 Thread Tod
I have a need to index an internal instance of Mediawiki. I'd like to use DIH if I can since I have access to the database but the example provided on the Solr wiki uses a Mediawiki dump XML file. Does anyone have any experience using DIH in this manner? Am I barking up the wrong tree and

solr 3.1 java.lang.NoClassDEfFoundError org/carrot2/core/ControllerFactory

2011-06-07 Thread bryan rasmussen
As per the subject I am getting java.lang.NoClassDefFoundError org/carrot2/core/ControllerFactory when I try to run clustering. I am using Solr 3.1. I get the following error: java.lang.NoClassDefFoundError: org/carrot2/core/ControllerFactory at

Re: Documents update

2011-06-07 Thread Denis Kuzmenok
Created file, reloaded Solr - externalfilefield works fine. If I change the external files and do curl http://127.0.0.1:4900/solr/site/update -H "Content-Type: text/xml" --data-binary '<commit />' then no changes are made. If I start Solr without external files
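The commit request in the message above can also be built from a script. The sketch below only constructs the request, using the URL from the message; build_commit_request is a hypothetical helper name, and whether a commit reloads the external file in this setup is exactly what the thread is asking, so nothing here should be read as the answer.

```python
import urllib.request


def build_commit_request(update_url):
    """Build (but do not send) a POST carrying an XML commit command."""
    return urllib.request.Request(
        update_url,
        data=b"<commit />",
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


req = build_commit_request("http://127.0.0.1:4900/solr/site/update")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Keeping construction separate from sending makes it easy to log or inspect the exact request before it reaches the server.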

Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread lee carroll
Gabriele Lucene uses a combination of boolean and VSM for its IR. A straight forward query for a keyword will only match docs with that keyword. Now things quickly get subtle and complex the more sugar you add, more complicated queries across fields and more complex analysis chains but I think

clustering problems on 3.1

2011-06-07 Thread bryan rasmussen
I added the following to my configuration <lib dir="c:/projects/solrtest/dist/" regex="apache-solr-clustering-.*\.jar" /> <requestHandler name="clusty" class="solr.SearchHandler" default="true"> <lst name="defaults"> <str name="echoParams">explicit</str> <bool name="clustering">true</bool> <str

Re: Commit taking very long

2011-06-07 Thread Erick Erickson
Are you optimizing? That is unnecessary when committing, and is often the culprit. Best Erick On Tue, Jun 7, 2011 at 5:42 AM, Rohit Gupta ro...@in-rev.com wrote: Hi, My commit seems to be taking too much time, if you notice from the Dataimport status given below to commit 1000 docs its

Re: problem: zooKeeper Integration with solr

2011-06-07 Thread Mohammad Shariq
How is this method (http://localhost:8983/solr/select?shards=Machine:Port/Solr Path,Machine:Port/Solr Path&indent=true&q=query) better than ZooKeeper? Could you please point to any performance documentation? On 7 June 2011 08:18, bmdakshinamur...@gmail.com wrote: Instead of

RE: SpellCheckComponent performance

2011-06-07 Thread Demian Katz
As I may have mentioned before, VuFind is actually doing two Solr queries for every search -- a base query that gets basic spelling suggestions, and a supplemental spelling-only query that gets shingled spelling suggestions. If there's a way to get two different spelling responses in a single

Re: [ANNOUNCEMENT] PHP Solr Extension 1.0.1 Stable Has Been Released

2011-06-07 Thread roySolr
Hello, I have some problems with the installation of the new PECL package solr-1.0.1. I run this command: pecl uninstall solr-beta ( to uninstall old version, 0.9.11) pecl install solr The installing is running but then it gives the following error message:

Re: java.lang.AbstractMethodError at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)

2011-06-07 Thread idivad
Finally figured out the problem.

Solr Cloud Query Question

2011-06-07 Thread Jamie Johnson
I am currently experimenting with the Solr Cloud code on trunk and just had a quick question. Lets say my setup had 3 nodes a, b and c. Node a has 1000 results which meet a particular query, b has 2000 and c has 3000. When executing this query and asking for row 900 what specifically happens?

Re: Solr Cloud Query Question

2011-06-07 Thread Yonik Seeley
On Tue, Jun 7, 2011 at 9:35 AM, Jamie Johnson jej2...@gmail.com wrote: I am currently experimenting with the Solr Cloud code on trunk and just had a quick question.  Lets say my setup had 3 nodes a, b and c.  Node a has 1000 results which meet a particular query, b has 2000 and c has 3000.  
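The reply above is cut off in this listing, so the following is only a sketch of one common way a coordinating node can serve a page from several shards, not a statement of what the Solr Cloud code on trunk does. The assumption is that every shard is asked for its top start+rows hits, because the requested window could in principle come entirely from a single shard; the coordinator then merges the sorted lists and slices out the page. The function names are hypothetical.

```python
import heapq
from itertools import islice


def shard_top(results, n):
    """Top n (score, doc_id) pairs of one shard, best first."""
    return heapq.nlargest(n, results)


def merged_page(shards, start, rows):
    """Merge per-shard top lists and return rows hits beginning at start."""
    # Each shard contributes start + rows candidates (assumed strategy).
    per_shard = [shard_top(results, start + rows) for results in shards]
    # Inputs are sorted best first, so merge in descending order.
    merged = heapq.merge(*per_shard, reverse=True)
    return list(islice(merged, start, start + rows))


a = [(0.9, "a1"), (0.5, "a2")]
b = [(0.8, "b1"), (0.7, "b2")]
c = [(0.6, "c1")]
print(merged_page([a, b, c], start=2, rows=2))
```

In this toy model ties on score are broken by document identifier, which is one simple way to keep the order stable from page to page.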

Re: function queries scope

2011-06-07 Thread Yonik Seeley
One way is to use the boost qparser: http://search-lucene.com/jd/solr/org/apache/solr/search/BoostQParserPlugin.html q={!boost b=productValueField}shops in madrid Or you can use the edismax parser, which has a boost parameter that does the same thing: defType=edismax&q=shops in
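The two request forms mentioned above contain characters (braces, spaces, "!") that need URL encoding when sent from a client. A minimal sketch of building both query strings follows; productValueField and the query text come from the message, while the helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def boost_qparser_params(user_query, boost_function):
    """Parameters using the {!boost} local-params syntax."""
    return {"q": "{!boost b=%s}%s" % (boost_function, user_query)}


def edismax_params(user_query, boost_function):
    """Equivalent request using edismax and its boost parameter."""
    return {"defType": "edismax", "q": user_query, "boost": boost_function}


for params in (
    boost_qparser_params("shops in madrid", "productValueField"),
    edismax_params("shops in madrid", "productValueField"),
):
    print(urlencode(params))
```

Either string can be appended to the select URL of the Solr instance being queried.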

Re: function queries scope

2011-06-07 Thread Marco Martinez
Thanks, but it's not what I'm looking for, because the BoostQParserPlugin multiplies the score of the query by the function queries defined in its b param, and I can't use edismax because we have our own qparser. It seems that I have to code another qparser.

RE: SpellCheckComponent performance

2011-06-07 Thread Dyer, James
Demian, If you omit spellcheckIndexDir from the configuration, it will create an in-memory spelling dictionary. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Demian Katz [mailto:demian.k...@villanova.edu] Sent: Tuesday, June 07, 2011

Re: Nullpointer Exception in Solr 4.x in DebugComponent when using wildcard in facet value

2011-06-07 Thread Stefan Moises
Hi Yonik, thanks, it's working in trunk now again... I had to re-index though because of exceptions at startup, did the index format change again between trunk of beginning / mid may and current trunk? best regards, Stefan Am 03.06.2011 15:32, schrieb Yonik Seeley: This bug was introduced

Re: Debugging a Solr/Jetty Hung Process

2011-06-07 Thread Chris Cowan
OK... The fix I thought would fix it didn't fix it (which was to use the commitWithin feature). What I can gather from `ps` is that the thread has pages locked in memory. Currently I'm using native locking for Solr. Would switching to simple help alleviate this problem? Chris On Jun 4, 2011,

Re: Default query parser operator

2011-06-07 Thread Brian Lamb
I feel like this should be fairly easy to do but I just don't see anywhere in the documentation on how to do this. Perhaps I am using the wrong search parameters. On Mon, Jun 6, 2011 at 12:19 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, Is it possible to change the query parser

Solr Custom Installation

2011-06-07 Thread Federico Czerwinski
Hey there. I was wondering if Solr can be embedded into my Java Web App. As far as I know, Solr comes as a war or bundled with Jetty if you don't have a container. I've opened the war's web.xml and found out that it only has a couple of servlets, filters and that's it. So, would it be possible to

Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread Jonathan Rochkind
Um, normally that would never happen, because, well, like you say, the inverted index doesn't have docC for term K1, because doc C didn't include term K1. If you search on q=K1, then how/why would docC ever be in your result set? Are you seeing it in your result set? The question then would

Re: Default query parser operator

2011-06-07 Thread Jonathan Rochkind
Nope, not possible. I'm not even sure what it would mean semantically. If you had default operator OR ordinarily, but default operator AND just for field2, then what would happen if you entered: field1:foo field2:bar field1:baz field2:bom Where the heck would the ANDs and ORs go? The

Re: Solr Custom Installation

2011-06-07 Thread Tomás Fernández Löbbe
Hi Federico, you can take a look at this wiki page: http://wiki.apache.org/solr/EmbeddedSolr Solr also has some maven support, see the ant target generate-maven-artifacts, don't know if that's what you need. Regards, Tomás On Tue, Jun 7, 2011 at 12:17 PM,

Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread Gabriele Kahlout
You are right, Lucene will return based on my scoring function implementation (Similarity class, http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/Similarity.html ): score(q,d) =

Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread Jonathan Rochkind
Okay, if you're using a custom similarity, I'm not sure what's going on, I'm not familiar with that. But ordinarily, you are right, you would require k1 with +k1. What you say about the + being lost suggests something is going wrong. Either you are not sending your query to Solr properly

Data not always returned

2011-06-07 Thread Jerome Renard
Hi all, I have a problem with my index. Even though I always index the same data over and over again, whenever I try a couple of searches (they are always the same as they are issued by a unit test suite) I do not get the same results, sometimes I get 3 successes and 2 failures and sometimes it

Question about tokenizing, searching and retrieving results.

2011-06-07 Thread Luis Cappa Banda
Hello! My problem is as follows: I've got a field (indexed and stored both set to true) tokenized by whitespace and other patterns, with a gap of value 100. For example, if I index the following expression for the field that I mentioned: *Expression*: A B C D E- *Index*: tokenA

Re: Solr Cloud Query Question

2011-06-07 Thread Jamie Johnson
Thanks Yonik. I have a follow on now, how does Solr ensure consistent results across pages? So for example if we had my 3 theoretical solr instances again and a, b and c each returned 100 documents with the same score and the user only requested 100 documents, how are those 100 documents chosen

Re: Default query parser operator

2011-06-07 Thread Brian Lamb
Hi Jonathan, Thank you for your reply. Your point about my example is a good one. So let me try to restate using your example. Suppose I want to apply AND to any search terms within field1. Then field1:foo field2:bar field1:baz field2:bom would by written as

Re: Question about tokenizing, searching and retrieving results.

2011-06-07 Thread Tomás Fernández Löbbe
My first guess would be that you are using AND as the default operator? You can see the generated query by using the parameter debugQuery=true On Tue, Jun 7, 2011 at 1:34 PM, Luis Cappa Banda luisca...@gmail.com wrote: Hello! My problem is as follows: I've got a field (indexed and stored setted as

Re: Solr Cloud Query Question

2011-06-07 Thread Yonik Seeley
On Tue, Jun 7, 2011 at 1:01 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks Yonik.  I have a follow on now, how does Solr ensure consistent results across pages?  So for example if we had my 3 theoretical solr instances again and a, b and c each returned 100 documents with the same score and

Re: Default query parser operator

2011-06-07 Thread Jonathan Rochkind
There's no feature in Solr to do what you ask, no. I don't think. On 6/7/2011 1:30 PM, Brian Lamb wrote: Hi Jonathan, Thank you for your reply. Your point about my example is a good one. So let me try to restate using your example. Suppose I want to apply AND to any search terms within field1.

Re: Question about tokenizing, searching and retrieving results.

2011-06-07 Thread Yonik Seeley
On Tue, Jun 7, 2011 at 12:34 PM, Luis Cappa Banda luisca...@gmail.com wrote: *Expression*: A B C D E F G H I As written, this is equivalent to *Expression*: A default_field:B default_field:C default_field:D default_field:E default_field:F default_field:G default_field:H default_field:I Try

Solr Cloud and Range Facets

2011-06-07 Thread Jamie Johnson
I have a Solr Cloud setup with 2 servers. When executing a query against them of the form:

Compound word search not what I expected

2011-06-07 Thread kenf_nc
I have a field defined as: <field name="content" type="text" indexed="true" stored="false" termVectors="true" multiValued="true" /> where text is unmodified from the schema.xml example that came with Solr 1.4.1. I have documents with some compound words indexed, words like Sandstone. And in several cases

Re: Compound word search not what I expected

2011-06-07 Thread Markus Jelsma
catenateWords should be set to true. Same goes for the index analyzer. preserveOriginal would also work. I have a field defined as: <field name="content" type="text" indexed="true" stored="false" termVectors="true" multiValued="true" /> where text is unmodified from the schema.xml example that came

How to deal with many files using solr external file field

2011-06-07 Thread Bohnsack, Sven
Hi all, we're using Solr 1.4 and external file field ([1]) for sorting our search results. We have about 40.000 terms for which we use this sorting option. Currently we're running into massive OutOfMemory problems and we're not really sure what the matter is. It seems that the garbage collector

Available Solr Indexing strategies

2011-06-07 Thread zarni aung
Hi, I am very new to Solr and my client is trying to implement full text searching capabilities to their product by using Solr. They will also have master storage that would be the Authoritative data store which will also provide meta data searches. Can you please point me in the right

Re: Data not always returned

2011-06-07 Thread Erick Erickson
Well, this is odd. Several questions 1 what do your logs show? I'm wondering if somehow some data is getting rejected. I have no idea why that would be, but if you're seeing indexing exceptions that would explain it. 2 on the admin/stats page, are maxDocs and numDocs the same in the

Re: Compound word search not what I expected

2011-06-07 Thread Erick Erickson
WordDelimiterFilterFactory is doing this to you. It's not clear to me that you want this in place at all. Look at admin/analysis for that field to see how that filter breaks things up, it's often surprising to people. Best Erick On Tue, Jun 7, 2011 at 3:13 PM, kenf_nc ken.fos...@realestate.com

Re: Compound word search not what I expected

2011-06-07 Thread lee carroll
See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory From the wiki: Example of generateWordParts=1 and catenateWords=1: PowerShot -> 0:Power, 1:Shot 1:PowerShot (where 0,1,1 are token positions); A's+B'sC's -> 0:A, 1:B, 2:C, 2:ABC
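The PowerShot example quoted from the wiki can be imitated with a toy function. This is not the WordDelimiterFilterFactory implementation and it deliberately ignores possessives, punctuation and the filter's other options; it only models splitting on case changes (generateWordParts) and emitting the joined word at the position of the last part (catenateWords). The name word_delimiter is hypothetical.

```python
import re

# One word part: Capitalised run, lowercase run, uppercase run, or digits.
PART = re.compile(r"[A-Z][a-z]+|[a-z]+|[A-Z]+|\d+")


def word_delimiter(token):
    """Return (position, text) pairs for a toy generateWordParts + catenateWords."""
    parts = PART.findall(token)
    tokens = list(enumerate(parts))
    if len(parts) > 1:
        # The catenated form shares the position of the last part.
        tokens.append((len(parts) - 1, "".join(parts)))
    return tokens


print(word_delimiter("PowerShot"))
print(word_delimiter("Sandstone"))
```

A word with no internal case change, such as Sandstone, comes out of this toy model as a single token.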

Re: Compound word search not what I expected

2011-06-07 Thread kenf_nc
I tried setting catenateWords=1 on the Query analyzer and that didn't do anything. I think what I need is to set my Index Analyzer to have preserveOriginal=1 and then re-index everything. That will be a pain, so I'll do a small test to make sure first. I'm really surprised preserveOriginal=1 isn't

Re: Default query parser operator

2011-06-07 Thread lee carroll
Hi Brian could your front end app do this field query logic? (assuming you have an app in front of solr) On 7 June 2011 18:53, Jonathan Rochkind rochk...@jhu.edu wrote: There's no feature in Solr to do what you ask, no. I don't think. On 6/7/2011 1:30 PM, Brian Lamb wrote: Hi Jonathan,

Solr Coldfusion Search Issue

2011-06-07 Thread Alejandro Delgadillo
Hi, I'm having some trouble using Solr through ColdFusion. The problem right now is that when I search for a term in a custom field, the results sometimes have the value that I sent to the custom field and not to the field that contains the text. This is the cfsearch syntax that I'm using:

Re: Solr Coldfusion Search Issue

2011-06-07 Thread lee carroll
Can you see the query actually presented to solr in the logs ? maybe capture that and then run it with a debug true in the admin pages. sorry i cant help directly with your syntax On 7 June 2011 23:06, Alejandro Delgadillo adelgadi...@febg.org wrote: Hi, I'm having some troubles using Solr

Re: Compound word search not what I expected

2011-06-07 Thread Markus Jelsma
You must set catenateWords at index time as well. I tried setting catenateWords=1 on the Query analyzer and that didn't do anything. I think what I need is to set my Index Analyzer to have preserveOriginal=1 and then re-index everything. That will be a pain, so I'll do a small test to make sure

wildcard search

2011-06-07 Thread Thomas Fischer
Hello, I am testing solr 3.2 and have problems with wildcards. I am indexing values like IA 300; IC 330; IA 317; IA 318 in a field GOK, and can't find a way to search with wildcards. I want to use a wild card search to match something like IA 31? but cannot find a way to do so. GOK:IA\ 38*

Re: Solr Coldfusion Search Issue

2011-06-07 Thread Alejandro Delgadillo
Thanks Lee for the quick response. Let me explain it a little better. In the CFSEARCH tag you use the CRITERIA attribute; by default it sends the user's search query to Solr via POST, against the field where the text is stored. In this case, since I'm indexing PDF

Re: wildcard search

2011-06-07 Thread Erick Erickson
Yes there is, but you haven't provided enough information to make a suggestion. What is the fieldType definition? What is the field definition? Two resources that'll help you greatly are: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters and the admin/analysis page... Best Erick On

400 MB Fields

2011-06-07 Thread Otis Gospodnetic
Hello, What are the biggest document fields that you've ever indexed in Solr or that you've heard of? Ah, it must be Tom's Hathi trust. :) I'm asking because I just heard of a case of an index where some documents having a field that can be around 400 MB in size! I'm curious if anyone has

Re: 400 MB Fields

2011-06-07 Thread Erick Erickson
From older (2.4) Lucene days, I once indexed the 23 volume Encyclopedia of Michigan Civil War Volunteers in a single document/field, so it's probably within the realm of possibility at least G... Erick On Tue, Jun 7, 2011 at 6:59 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hello,

Re: 400 MB Fields

2011-06-07 Thread Fuad Efendi
I think the question is strange... May be you are wondering about possible OOM exceptions? I think we can pass to Lucene single document containing comma separated list of term, term, ... (few billion times)... Except stored and TermVectorComponent... I believe thousands companies already indexed

Re: 400 MB Fields

2011-06-07 Thread Otis Gospodnetic
Hi, I think the question is strange... May be you are wondering about possible OOM exceptions? No, that's an easier one. I was more wondering whether with 400 MB Fields (indexed, not stored) it becomes incredibly slow to: * analyze * commit / write to disk * search I think we can pass to

Re: 400 MB Fields

2011-06-07 Thread Fuad Efendi
Hi Otis, I am recalling pagination feature, it is still unresolved (with default scoring implementation): even with small documents, searching-retrieving documents 1 to 10 can take 0 milliseconds, but from 100,000 to 100,010 can take few minutes (I saw it with trunk version 6 months ago, and

Re: 400 MB Fields

2011-06-07 Thread Lance Norskog
The Salesforce book is 2800 pages of PDF, last I looked. What can you do with a field that big? Can you get all of the snippets? On Tue, Jun 7, 2011 at 5:33 PM, Fuad Efendi f...@efendi.ca wrote: Hi Otis, I am recalling pagination feature, it is still unresolved (with default scoring

RE: 400 MB Fields

2011-06-07 Thread Burton-West, Tom
Hi Otis, Our OCR fields average around 800 KB. My guess is that the largest docs we index (in a single OCR field) are somewhere between 2 and 10MB. We have had issues where the in-memory representation of the document (the in memory index structures being built)is several times the size of

tika integration exception and other related queries

2011-06-07 Thread Naveen Gupta
Hi Can somebody answer this ... 3. can somebody tell me an idea how to do indexing for a zip file ? 1. while sending docx, we are getting following error. java.lang.NumberFormatException: For input string: "2011-01-27T07:18:00Z" at
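The input string in the exception above is an ISO 8601 UTC timestamp, so the error text is consistent with a date value being handed to a numeric parser. Which field or parser is responsible is not visible in this excerpt, so that part is an assumption. The sketch below only demonstrates the mismatch itself: the string is not a number, but it parses cleanly as a timestamp. Both helper names are hypothetical.

```python
from datetime import datetime, timezone

value = "2011-01-27T07:18:00Z"


def parses_as_number(text):
    """True if the text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def parse_utc_timestamp(text):
    """Parse a timestamp of the form YYYY-MM-DDThh:mm:ssZ as UTC."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


print(parses_as_number(value))
print(parse_utc_timestamp(value).isoformat())
```

If the assumption holds, checking how the target field is typed in the schema would be a reasonable first step.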