Re: How to list all dynamic fields of a document using solrj?
Hi Juan

I tried with the following code first:

final SolrQuery allDocumentsQuery = new SolrQuery();
allDocumentsQuery.setQuery("id:" + myId);
allDocumentsQuery.setFields("*");
allDocumentsQuery.setRows(1);
QueryResponse response = solr.query(allDocumentsQuery, METHOD.POST);

With this, only non-dynamic fields are returned. Then I wrote the following helper method:

private Set<String> getDynamicFields() throws SolrServerException, IOException {
    final LukeRequest luke = new LukeRequest();
    luke.setShowSchema(false);
    final LukeResponse process = luke.process(solr);
    final Map<String, FieldInfo> fieldInfo = process.getFieldInfo();
    final Set<String> dynamicFields = new HashSet<String>();
    for (final String key : fieldInfo.keySet()) {
        if (key.endsWith("_string") || key.endsWith("_dateTime")) {
            dynamicFields.add(key);
        }
    }
    return dynamicFields;
}

where "_string" and "_dateTime" are the suffixes of my dynamic fields. This one really returns all stored fields of the document:

final Set<String> dynamicFields = getDynamicFields();
final SolrQuery allDocumentsQuery = new SolrQuery();
allDocumentsQuery.setQuery("uri:" + myId);
allDocumentsQuery.setFields("*");
for (final String df : dynamicFields) {
    allDocumentsQuery.addField(df);
}
allDocumentsQuery.setRows(1);
QueryResponse response = solr.query(allDocumentsQuery, METHOD.POST);

Is there a more elegant way to do this? We are using solrj 3.1.0 and solr 3.1.0.

Regards
Michael

-- Michael Szalay Senior Software Engineer basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22 http://www.basis06.ch - source of smart business

----- Original Message -----
From: Juan Grande juan.gra...@gmail.com
To: solr-user@lucene.apache.org
Sent: Monday, August 29, 2011 18:19:05
Subject: Re: How to list all dynamic fields of a document using solrj?

Hi Michael, It's supposed to work. Can we see a snippet of the code you're using to retrieve the fields? *Juan*

On Mon, Aug 29, 2011 at 8:33 AM, Michael Szalay michael.sza...@basis06.ch wrote: Hi all, how can I list all dynamic fields and their values of a document using solrj? The dynamic fields are never returned when I use setFields("*"). Thanks Michael -- Michael Szalay Senior Software Engineer basis06 AG, Birkenweg 61, CH-3013 Bern - Fon +41 31 311 32 22 http://www.basis06.ch - source of smart business
Solr 3.3. Grouping vs DeDuplication and Deduplication Use Case
Solr 3.3 has a feature called Grouping. Is it practically the same as deduplication? Here is my use case for duplicate removal - we have many documents with similar (up to 99%) content. Upon some search queries, almost all of them come up on the first page of results. Of all these documents, essentially one is the original and the others are duplicates. We are able to identify the original on the basis of a number of factors - who uploaded it, when, how many viral shares. It is also possible that the duplicates are uploaded earlier (and hence exist in the search index) while the original is uploaded later (and gets added to the index later). AFAIK, deduplication targets index time. Is there a means by which I can specify the original, which should be returned, and the duplicates, which should be kept from coming up? *Pranav Prakash* temet nosce Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com | Google http://www.google.com/profiles/pranny
Re: Does Solr flush to disk even before ramBufferSizeMB is hit?
Thanks Shawn. If Solr writes this info to disk as soon as possible (which is what I am seeing) then the ramBuffer setting seems to be misleading. Does anyone else have any thoughts on this? -Saroj On Mon, Aug 29, 2011 at 6:14 AM, Shawn Heisey s...@elyograg.org wrote: On 8/28/2011 11:18 PM, roz dev wrote: I notice that even though InfoStream does not mention that data is being flushed to disk, new segment files were created on the server. Size of these files kept growing even though there was enough heap available and the 856 MB RAM buffer was not even used. With the caveat that I am not an expert and someone may correct me, I'll offer this: it's been my experience that Solr will write the files that constitute stored fields as soon as they are available, because that information is always the same and nothing will change in those files based on the next chunk of data. Thanks, Shawn
Stream still in memory after tika exception? Possible memoryleak?
Hi all, Currently I'm testing Solr's indexing performance, but unfortunately I'm running into memory problems. It looks like Solr is not closing the file stream after an exception, but I'm not really sure. The current system I'm using has 150GB of memory and while I'm indexing the memory consumption is growing and growing (eventually more than 50GB). In the attached graph I indexed about 70k office documents (pdf, doc, xls etc.) and between 1 and 2 percent of them throw an exception. The commits are after 64MB, 60 seconds or after a job (there are 6 evenly divided jobs). After indexing, the memory consumption isn't dropping. Even after an optimize command it's still there. What am I doing wrong? I can't imagine I'm the only one with this problem. Thanks in advance! Kind regards, Marc
Re: Solr 3.3. Grouping vs DeDuplication and Deduplication Use Case
Deduplication uses Lucene's indexWriter.updateDocument with the signature term. I don't think it's possible, as a default feature, to choose which document to keep; the original would always have to be the last one indexed. /IndexWriter.updateDocument: Updates a document by first deleting the document(s) containing term and then adding the new document. The delete and then add are atomic as seen by a reader on the same index (flush may happen only after the add)./ With grouping you have all your documents indexed, so it gives you more flexibility. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-3-Grouping-vs-DeDuplication-and-Deduplication-Use-Case-tp3294711p3295023.html Sent from the Solr - User mailing list archive at Nabble.com.
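For illustration, a minimal sketch of what that updateDocument call boils down to (this is not the actual SignatureUpdateProcessor code, and the field name "signature" is an assumption): whichever copy carrying a given signature is indexed last silently replaces the earlier ones, which is why you cannot mark one copy as "the original".

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class DedupSketch {
    // Any previously indexed document with the same signature term is deleted
    // before the new document is added, atomically as seen by a reader.
    public static void addOrReplace(IndexWriter writer, Document doc, String signature)
            throws IOException {
        doc.add(new Field("signature", signature, Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.updateDocument(new Term("signature", signature), doc);
    }
}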
Re: index full text as possible
For phrase queries, you simply surround the text with double quotes, e.g. "this is a phrase"... Best Erick 2011/8/29 Rode González r...@libnova.es: Hi again. In that case, you should be able to use a tokeniser to split the input into phrases, though you will probably need to write a custom tokeniser, depending on what characters you want to break phrases at. Please see http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters I have read this page but I didn't see anything. I thought there was a filter already implemented. It is also entirely possible to index the full text, and just do a phrase search later. This is probably the easiest option, unless you have a huge volume of text, and the volume of phrases to be indexed can be significantly lower. How can I do that? Thanks. Rode.
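A minimal SolrJ sketch of the same thing (the field name "content" is only an assumption for illustration; in Java the inner quotes just need escaping):

import org.apache.solr.client.solrj.SolrQuery;

// Quoting the text turns it into a phrase query: the terms must appear
// adjacent and in this order, instead of being matched as independent words.
SolrQuery query = new SolrQuery();
query.setQuery("content:\"this is a phrase\"");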
Re: Solr custom plugins: is it possible to have them persistent?
Why doesn't the singleton approach we talked about a few days ago do this? It would create the object the first time you asked for it, and return the same one thereafter. Best Erick On Mon, Aug 29, 2011 at 11:04 AM, samuele.mattiuzzo samum...@gmail.com wrote: it's how I'm doing it now... but I'm not sure I'm placing the objects in the right place. A significant part of my code is here: http://pastie.org/2448984 (I've omitted the method implementations since they are pretty long). Inside the method setLocation, I create the connection to the MySQL database; inside the method setFieldPosition, I create the categorization object. Then I started thinking I was creating and deleting those objects locally every time Solr reads a document to index. So, where should I put them? Inside the tothegocustom class constructor, after the super call? I'm asking this because I'm not sure if my custom UpdateRequestProcessor is created once or for every document parsed (I'm still learning Solr, but I think I'm getting into it, bit by bit!) Thanks again! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3292928.html Sent from the Solr - User mailing list archive at Nabble.com.
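For reference, a bare-bones sketch of the singleton being discussed, using the hypothetical MysqlConnector class from this thread (the JDBC URL and credentials are placeholders): the connection is created once, on first use, and every later call from the update processor gets the same instance back.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public final class MysqlConnector {
    private static MysqlConnector instance;
    private final Connection connection;

    private MysqlConnector() throws SQLException {
        // Placeholder connection details; opened only once per JVM.
        connection = DriverManager.getConnection("jdbc:mysql://localhost/mydb", "user", "password");
    }

    public static synchronized MysqlConnector getInstance() throws SQLException {
        if (instance == null) {
            instance = new MysqlConnector();
        }
        return instance;
    }

    public Connection getConnection() {
        return connection;
    }
}

The update processor then calls MysqlConnector.getInstance() instead of opening a new connection for every document.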
Re: Solr Geodist
Why couldn't you just give an outrageous distance (10) or something? You have to have some kind of point you're asking for the distance *from*, don't you? Best Erick On Mon, Aug 29, 2011 at 5:09 PM, solrnovice manisha...@yahoo.com wrote: Eric, thanks for the update, I thought solr 4.0 should have the pseudo columns and I am using the right version. So did you ever work on a query that returns distance where no long/lat is used in the where clause? I mean not a radial search but a city search that still displays the distance. So my thought was to pass the long and lat to geodist along with the coordinates (long and lat) of every record, and let geodist compute the distance. Can you please let me know if this worked for you? thanks SN -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Geodist-tp3287005p3293779.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr warming when using master/slave replication
"Will traffic be served with a non-warmed index searcher at any point?" No. That's what auto-warming is all about. More correctly, it depends on how you configure things in your config file. There are entries like firstSearcher, newSearcher and various autowarm counts, all of which you set, and the various actions specified are carried out before the switch is made to the new searcher after replication. There's also useColdSearcher if you want to specifically NOT wait for warmup. As I said, it depends on how you configure things. In fact, this will lead to a temporary increase in memory use, since the old and new caches will both be in memory for a short time. Best Erick On Mon, Aug 29, 2011 at 5:54 PM, Mike Austin mike.aus...@juggle.com wrote: Correction: Will traffic be served with a non-warmed index searcher at any point? Thanks, Mike On Mon, Aug 29, 2011 at 4:52 PM, Mike Austin mike.aus...@juggle.com wrote: Distribution/Replication gives you a 'new' index on the slave. When Solr is told to use the new index, the old caches have to be discarded along with the old Index Searcher. That's when autowarming occurs. If the current Index Searcher is serving requests and when a new searcher is opened, the new one is 'warmed' while the current one is serving external requests. When the new one is ready, it is registered so it can serve any new requests while the original one first finishes the requests it is handling. So if warming is configured, the new index will warm before going live? How does that work with the copying to the new directory? Does it get warmed while in the temp directory before copied over? My question is basically, will traffic be served with a non indexed searcher at any point? Thanks, Mike On Mon, Aug 29, 2011 at 4:45 PM, Rob Casson rob.cas...@gmail.com wrote: it's always been my understanding that the caches are discarded, then rebuilt/warmed: http://wiki.apache.org/solr/SolrCaching#Caching_and_Distribution.2BAC8-Replication hth, rob On Mon, Aug 29, 2011 at 5:30 PM, Mike Austin mike.aus...@juggle.com wrote: How does warming work when a collection is being distributed to a slave? I understand that a temp directory is created and it is eventually copied to the live folder, but what happens to the cache that was built with the old index? Does the cache get rebuilt, can we warm it before it becomes live, or can we keep the old cache? Thanks, Mike
Re: Solr Faceting DIH
I'd really think carefully before disabling unique IDs. If you do, you'll have to manage the records yourself, so your next delta-import will add more records to your search results, even for documents that have merely been updated. You might do something like make the uniqueKey the concatenation of productid and attributeid or whatever makes sense. Best Erick On Mon, Aug 29, 2011 at 5:52 PM, Aaron Bains aaronba...@gmail.com wrote: Hello, I am trying to set up Solr faceting on products by using the DataImportHandler to import data from my database. I have set up my data-config.xml with the proper queries and schema.xml with the fields. After the import/index is complete I can only search one productid record in Solr. For example, of the three productid '10100039' records, I am only able to search for one. Should I somehow disable unique ids? What is the best way of doing this? Below is the schema I am trying to index:

+-----------+-------------+---------+------------+
| productid | attributeid | valueid | categoryid |
+-----------+-------------+---------+------------+
|  10100039 |      331100 |    1580 |          1 |
|  10100039 |      331694 |    1581 |          1 |
|  10100039 |    33113319 | 1537370 |          1 |
|  10100040 |      331100 |    1580 |          1 |
|  10100040 |      331694 | 1540230 |          1 |
|  10100040 |    33113319 | 1537370 |          1 |
+-----------+-------------+---------+------------+

Thanks!
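For illustration, a minimal sketch of the composite-key idea (field names follow the table above; the "_" separator is an arbitrary choice). It is shown with SolrJ purely to make the key visible; with DIH the same key could be produced directly in the SQL select or via a transformer.

import java.io.IOException;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

public class ProductAttributeIndexer {
    // Each productid/attributeid row becomes its own Solr document with a
    // unique id, so a re-import overwrites the same row instead of hiding it.
    public static void index(SolrServer server, String productid, String attributeid,
                             String valueid, String categoryid)
            throws SolrServerException, IOException {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", productid + "_" + attributeid);
        doc.addField("productid", productid);
        doc.addField("attributeid", attributeid);
        doc.addField("valueid", valueid);
        doc.addField("categoryid", categoryid);
        server.add(doc);
    }
}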
Re: Stream still in memory after tika exception? Possible memoryleak?
Hi all, Currently I'm testing Solr's indexing performance, but unfortunately I'm running into memory problems. It looks like Solr is not closing the file stream after an exception, but I'm not really sure. The current system I'm using has 150GB of memory and while I'm indexing the memory consumption is growing and growing (eventually more than 50GB). In the attached graph (http://postimage.org/image/acyv7kec/) I indexed about 70k office documents (pdf, doc, xls etc.) and between 1 and 2 percent of them throw an exception. The commits are after 64MB, 60 seconds or after a job (there are 6 evenly divided jobs). After indexing, the memory consumption isn't dropping. Even after an optimize command it's still there. What am I doing wrong? I can't imagine I'm the only one with this problem. Thanks in advance! Kind regards, Marc
Re: Shingle and Query Performance
Hi Eric, Fields are lazy loading, content stored in solr and machine 32 gig.. solr has 20 gig heap. There is no swapping. As you see we have many phrases in the same query . I couldnt find a way to drop qtime to subsecends. Suprisingly non shingled test better qtime ! On Mon, Aug 29, 2011 at 3:10 PM, Erick Erickson erickerick...@gmail.comwrote: Oh, one other thing: have you profiled your machine to see if you're swapping? How much memory are you giving your JVM? What is the underlying hardware setup? Best Erick On Mon, Aug 29, 2011 at 8:09 AM, Erick Erickson erickerick...@gmail.com wrote: 200K docs and 36G index? It sounds like you're storing your documents in the Solr index. In and of itself, that shouldn't hurt your query times, *unless* you have lazy field loading turned off, have you checked that lazy field loading is enabled? Best Erick On Sun, Aug 28, 2011 at 5:30 AM, Lord Khan Han khanuniver...@gmail.com wrote: Another insteresting thing is : all one word or more word queries including phrase queries such as barack obama slower in shingle configuration. What i am doing wrong ? without shingle barack obama Querytime 300ms with shingle 780 ms.. On Sat, Aug 27, 2011 at 7:58 PM, Lord Khan Han khanuniver...@gmail.com wrote: Hi, What is the difference between solr 3.3 and the trunk ? I will try 3.3 and let you know the results. Here the search handler: requestHandler name=search class=solr.SearchHandler default=true lst name=defaults str name=echoParamsexplicit/str int name=rows10/int !--str name=fqcategory:vv/str-- str name=fqmrank:[0 TO 100]/str str name=echoParamsexplicit/str int name=rows10/int str name=defTypeedismax/str !--str name=qftitle^0.05 url^1.2 content^1.7 m_title^10.0/str-- str name=qftitle^1.05 url^1.2 content^1.7 m_title^10.0/str !-- str name=bfrecip(ee_score,-0.85,1,0.2)/str -- str name=pfcontent^18.0 m_title^5.0/str int name=ps1/int int name=qs0/int str name=mm2lt;-25%/str str name=spellchecktrue/str !--str name=spellcheck.collatetrue/str -- str name=spellcheck.count5/str str name=spellcheck.dictionarysubobjective/str str name=spellcheck.onlyMorePopularfalse/str str name=hl.tag.prelt;bgt;/str str name=hl.tag.postlt;/bgt;/str str name=hl.useFastVectorHighlightertrue/str /lst On Sat, Aug 27, 2011 at 5:31 PM, Erik Hatcher erik.hatc...@gmail.com wrote: I'm not sure what the issue could be at this point. I see you've got qt=search - what's the definition of that request handler? What is the parsed query (from the debugQuery response)? Have you tried this with Solr 3.3 to see if there's any appreciable difference? Erik On Aug 27, 2011, at 09:34 , Lord Khan Han wrote: When grouping off the query time ie 3567 ms to 1912 ms . Grouping increasing the query time and make useless to cache. But same config faster without shingle still. We have and head to head test this wednesday tihs commercial search engine. So I am looking for all suggestions. On Sat, Aug 27, 2011 at 3:37 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Please confirm is this is caused by grouping. Turn grouping off, what's query time like? On Aug 27, 2011, at 07:27 , Lord Khan Han wrote: On the other hand We couldnt use the cache for below types queries. I think its caused from grouping. Anyway we need to be sub second without cache. On Sat, Aug 27, 2011 at 2:18 PM, Lord Khan Han khanuniver...@gmail.com wrote: Hi, Thanks for the reply. 
Here the solr log capture.: ** hl.fragsize=100spellcheck=truespellcheck.q=Xgroup.limit=5hl.simple.pre=bhl.fl=contentspellcheck.collate=truewt=javabinhl=truerows=20version=2fl=score,approved,domain,host,id,lang,mimetype,title,tstamp,url,categoryhl.snippets=3start=0q=%2B+-X+-X+-XX+-XX+-XX+-+-XX+-XXX+-X+-+-+-X+-X+-X+-+-+-X+-XX+-X+-XX+-XX+-+-X+-XX+-+-X+-X+-X+-X+-X+-X+-X+-X+-XX+-XX+-XX+-X+-X+X+X+XX++group.field=hosthl.simple.post=/bgroup=trueqt=searchfq=mrank:[0+TO+100]fq=word_count:[70+TO+*] ** is the words. All phrases x has two words inside. The timing from the DebugQuery: lst name=timing double name=time8654.0/double lst name=prepare double name=time16.0/double lst name=org.apache.solr.handler.component.QueryComponent double name=time16.0/double /lst lst name=org.apache.solr.handler.component.FacetComponent double name=time0.0/double /lst lst name=org.apache.solr.handler.component.MoreLikeThisComponent double name=time0.0/double /lst lst
Re: How to send an OpenBitSet object from Solr server?
Satish Talim wrote on 30/08/2011 05:42: [...] Is there a work-around wherein I can send an OpenBitSet object? JavaBinCodec (used by default by Solr) supports writing arrays. You can call getBits() on the OpenBitSet and throw them into the binary response. federico
Re: Solr custom plugins: is it possible to have them persistent?
my problem is i still don't understand where i have to put that singleton (or how i can load it into solr) i have my singleton class Connector for mysql, with all its methods defined. Now what? This is the point i'm missing :( -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3295320.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to send an OpenBitSet object from Solr server?
But how to throw them? As a stream of bits? Satish On Tue, Aug 30, 2011 at 5:39 PM, Federico Fissore feder...@fissore.org wrote: Satish Talim wrote on 30/08/2011 05:42: [...] Is there a work-around wherein I can send an OpenBitSet object? JavaBinCodec (used by default by solr) supports writing arrays. you can getBits() from openbitset and throw them into the binary response federico
Re: How to send an OpenBitSet object from Solr server?
Satish Talim wrote on 30/08/2011 14:22: But how to throw? As a stream of bits? getBits() returns a long[]; add a long[] part to your response: rb.rsp.add("long_array", obs.getBits()) federico
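A sketch of both halves of that exchange, kept self-contained against a NamedList (on the server side a custom component would call rb.rsp.add(...) directly); OpenBitSet(long[] bits, int numWords) is the constructor that rebuilds the bitset from the transferred words:

import org.apache.lucene.util.OpenBitSet;
import org.apache.solr.common.util.NamedList;

public final class BitSetTransfer {
    // Server side: put the raw long words (and their count) into the response;
    // JavaBinCodec serializes the long[] natively.
    public static void write(NamedList<Object> response, OpenBitSet obs) {
        response.add("bits", obs.getBits());
        response.add("numWords", obs.getNumWords());
    }

    // Client side: rebuild an equivalent OpenBitSet from the transferred words.
    public static OpenBitSet read(NamedList<Object> response) {
        long[] words = (long[]) response.get("bits");
        int numWords = (Integer) response.get("numWords");
        return new OpenBitSet(words, numWords);
    }
}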
strField
I have a string fieldtype defined like so:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

And I have a field defined as:

<field name="guid" type="string" indexed="true" stored="true" required="false" />

The fields are of this format: 92E8EF8FC9F362BBE0408CA5785A29D4 But in the index they are like this:

<str name="guid">[B@520ed128</str>

I thought it must be compression, but compression=true|false is no longer supported by StrField. I don't see any base64 encoding in this field. Can anyone shed light on this? Thanks
Re: Post Processing Solr Results
This might work in conjunction with what POST processing to help to pair down the results, but the logic for the actual access to the data is too complex to have entirely in solr. On Mon, Aug 29, 2011 at 2:02 PM, Erick Erickson erickerick...@gmail.com wrote: It's reasonable, but post-filtering is often difficult, you have too many documents to wade through. If you can see any way at all to just include a clause in the query, you'll save a world of effort... Is there any way you can include a value in some kind of permissions field? Let's say you have a document that is only to be visible for tier 1 customers. If your permissions field contained the tiers (e.g. tier0, tier1), then a simple AND permissions:tier1 would do the trick... I know this is a trivial example, but you see where this is headed. The documents can contain as many of these tokens in permissions as you want. As long as you can string together a clause like AND permissions:(A OR B OR C) and not have the clause get ridiculously long (as in thousands of values), that works best. Any such scheme depends upon being able to assign the documents some kind of code that doesn't change too often (because when it does you have to re-index) and figure out, at query time, what permissions a user has. Using FieldCache or low-level Lucene routines can answer the question Does doc X contain token Y in field Z reasonably easily. What it has a hard time doing is answering For document X, what are all the value in the inverted index in field Z. If this doesn't make sense, could you explain a bit more about your permissions model? Hope this helps Erick On Mon, Aug 29, 2011 at 11:46 AM, Jamie Johnson jej2...@gmail.com wrote: Thanks guys, perhaps I am just going about this the wrong way. So let me explain my problem and perhaps there is a more appropriate solution. What I need to do is basically hide certain results based on some passed in user parameter (say their service tier for instance). What I'd like to do is have some way to plugin my custom logic to basically remove certain documents from the result set using this information. Now that being said I technically don't need to remove the documents from the full result set, I really only need to remove them from current page (but still ensuring that a page is filled and sorted). At present I'm trying to see if there is a way for me to add this type of logic after the QueryComponent has executed, perhaps by going through the DocIdandSet at this point and then intersecting the DocIdSet with a DocIdSet which would filter out the stuff I don't want seen. Does this sound reasonable or like a fools errand? On Mon, Aug 29, 2011 at 10:51 AM, Erik Hatcher erik.hatc...@gmail.com wrote: I haven't followed the details, but what I'm guessing you want here is Lucene's FieldCache. Perhaps something along the lines of how faceting uses it (in SimpleFacets.java) - FieldCache.DocTermsIndex si = FieldCache.DEFAULT.getTermsIndex(searcher.getIndexReader(), fieldName); Erik On Aug 29, 2011, at 09:58 , Erick Erickson wrote: If you're asking whether there's a way to find, say, all the values for the auth field associated with a document... no. The nature of an inverted index makes this hard (think of finding all the definitions in a dictionary where the word earth was in the definition). 
Best Erick On Mon, Aug 29, 2011 at 9:21 AM, Jamie Johnson jej2...@gmail.com wrote: Thanks Erick, if I did not know the token up front that could be in the index is there not an efficient way to get the field for a specific document and do some custom processing on it? On Mon, Aug 29, 2011 at 8:34 AM, Erick Erickson erickerick...@gmail.com wrote: Start here I think: http://lucene.apache.org/java/3_0_2/api/core/index.html?org/apache/lucene/index/TermDocs.html Best Erick On Mon, Aug 29, 2011 at 8:24 AM, Jamie Johnson jej2...@gmail.com wrote: Thanks for the reply. The fields I want are indexed, but how would I go directly at the fields I wanted? In regards to indexing the auth tokens I've thought about this and am trying to get confirmation if that is reasonable given our constraints. On Mon, Aug 29, 2011 at 8:20 AM, Erick Erickson erickerick...@gmail.com wrote: Yeah, loading the document inside a Collector is a definite no-no. Have you tried going directly at the fields you want (assuming they're indexed)? That *should* be much faster, but whether it'll be fast enough is a good question. I'm thinking some of the Terms methods here. You *might* get some joy out of making sure lazy field loading is enabled (and make sure the fields you're accessing for your logic are indexed), but I'm not entirely sure about that bit. This kind of problem is sometimes handled by indexing auth tokens with the documents and including an OR clause on the query with the authorizations for a particular user, but that works best if
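A minimal sketch of the filter-query approach Erick describes above (the field name "permissions" and the tier tokens come from his example, not from an actual schema):

import org.apache.solr.client.solrj.SolrQuery;

// The user's entitlements are resolved at query time and attached as a filter
// query, so Solr drops the documents the user may not see before paging and
// sorting; no post-processing of the result page is needed.
SolrQuery query = new SolrQuery("the user's search terms");
query.addFilterQuery("permissions:(tier0 OR tier1)");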
Re: Does Solr flush to disk even before ramBufferSizeMB is hit?
On 8/30/2011 12:57 AM, roz dev wrote: Thanks Shawn. If Solr writes this info to Disk as soon as possible (which is what I am seeing) then ramBuffer setting seems to be misleading. Anyone else has any thoughts on this? The stored fields are only two of the eleven Lucene files in each segment. The buffer is not needed for them, because there is no transformation or data aggregation, they are written continuously as data is read. The other files have to utilize the buffer, and can only be written once all the data for that segment has been read, transformed, and aggregated. Thanks, Shawn
Reading results from FieldCollapsing
Hi All, I am trying to use the FieldCollapsing feature in Solr. On the Solr admin interface, I give ...group=true&group.field=fieldA and I can see grouped results. But, I am not able to figure out how to read those results in that order in Java. Something like: SolrDocumentList doclist = response.getResults(); gives me a set of results, over which I iterate and get something like doclist.get(1).getFieldValue("title") etc. After grouping, doing the same step throws an error (apparently because the returned XML formats are different too). How can I read groupValues and thereby the other field values of the documents inside each group? S. -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com
Context-Sensitive Spelling Suggestions Collations
Using the DirectSolrSpellChecker, I'm very interested in this. According to https://issues.apache.org/jira/browse/SOLR-2585 some changes need to be made to DirectSolrSpellChecker. Does anybody know how to get this working? -- View this message in context: http://lucene.472066.n3.nabble.com/Context-Sensitive-Spelling-Suggestions-Collations-tp3295570p3295570.html Sent from the Solr - User mailing list archive at Nabble.com.
escaping special characters does not seem to be escaping in query
Hi All: I have a few fields that are of the form: A:2B or G:U2 and so on. I would like to be able to search the field using a wildcard search like: A:2* or G:U*. I have tried modifying the field_type definitions to allow for such queries, but without any luck. Could someone provide me with a fieldtype that uses the canned tokenizers and filters which will allow me to do a search as described? Thanks much Ramdev
Re: Solr Geodist
hi Eric, thank you for the tip, i will try that option. Where can I find a document that shows details of the geodist arguments? When I google, I did not find one. So this is what my query is like. I want the distance to be returned. I don't know exactly what to pass to geodist, as I couldn't find a proper document. http://localhost:/solr/apex_dev/select/?q=city:Quincy&fl=city,state,coordinates,score,geodist(39.9435,-120.9226). So I want to pass in the long and lat of Quincy, and then I want all the records which are tagged with Quincy to be returned (as I am doing a q=city:Quincy search) and also the distance to be displayed. Can you please let me know what I should pass into geodist() in this scenario. thanks SN -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Geodist-tp3287005p3295606.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Faceting DIH
I had the same problem with a database here, and we discovered that every item had its own product page, its own url. So, we decided that our unique id had to be the url instead of using sql ids and id concatenations. sometimes it works. You can store all ids if u need them for something, but for uniqueids, urls go just fine. 2011/8/30 Erick Erickson erickerick...@gmail.com I'd really think carefully before disabling unique IDs. If you do, you'll have to manage the records yourself, so your next delta-import will add more records to your search result, even those that have been updated. You might do something like make the uniqueKey the concatenation of productid and attributeid or whatever makes sense. Best Erick On Mon, Aug 29, 2011 at 5:52 PM, Aaron Bains aaronba...@gmail.com wrote: Hello, I am trying to setup Solr Faceting on products by using the DataImportHandler to import data from my database. I have setup my data-config.xml with the proper queries and schema.xml with the fields. After the import/index is complete I can only search one productid record in Solr. For example of the three productid '10100039' records there are I am only able to search for one of those. Should I somehow disable unique ids? What is the best way of doing this? Below is the schema I am trying to index: +---+-+-++ | productid | attributeid | valueid | categoryid | +---+-+-++ | 10100039 | 331100 |1580 | 1 | | 10100039 | 331694 |1581 | 1 | | 10100039 |33113319 | 1537370 | 1 | | 10100040 | 331100 |1580 | 1 | | 10100040 | 331694 | 1540230 | 1 | | 10100040 |33113319 | 1537370 | 1 | +---+-+-++ Thanks! -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
Re: Stream still in memory after tika exception? Possible memoryleak?
What version of Solr are you using, and how are you indexing? DIH? SolrJ? I'm guessing you're using Tika, but how? Best Erick On Tue, Aug 30, 2011 at 4:55 AM, Marc Jacobs jacob...@gmail.com wrote: Hi all, Currently I'm testing Solr's indexing performance, but unfortunately I'm running into memory problems. It looks like Solr is not closing the filestream after an exception, but I'm not really sure. The current system I'm using has 150GB of memory and while I'm indexing the memoryconsumption is growing and growing (eventually more then 50GB). In the attached graph I indexed about 70k of office-documents (pdf,doc,xls etc) and between 1 and 2 percent throws an exception. The commits are after 64MB, 60 seconds or after a job (there are 6 evenly divided jobs). After indexing the memoryconsumption isn't dropping. Even after an optimize command it's still there. What am I doing wrong? I can't imagine I'm the only one with this problem. Thanks in advance! Kind regards, Marc
Re: Shingle and Query Performance
Can we see the output if you specify both debugQuery=ondebug=true the debug=true will show the time taken up with various components, which is sometimes surprising... Second, we never asked the most basic question, what are you measuring? Is this the QTime of the returned response? (which is the time actually spent searching) or the time until the response gets back to the client, which may involve lots besides searching... Best Erick On Tue, Aug 30, 2011 at 7:59 AM, Lord Khan Han khanuniver...@gmail.com wrote: Hi Eric, Fields are lazy loading, content stored in solr and machine 32 gig.. solr has 20 gig heap. There is no swapping. As you see we have many phrases in the same query . I couldnt find a way to drop qtime to subsecends. Suprisingly non shingled test better qtime ! On Mon, Aug 29, 2011 at 3:10 PM, Erick Erickson erickerick...@gmail.comwrote: Oh, one other thing: have you profiled your machine to see if you're swapping? How much memory are you giving your JVM? What is the underlying hardware setup? Best Erick On Mon, Aug 29, 2011 at 8:09 AM, Erick Erickson erickerick...@gmail.com wrote: 200K docs and 36G index? It sounds like you're storing your documents in the Solr index. In and of itself, that shouldn't hurt your query times, *unless* you have lazy field loading turned off, have you checked that lazy field loading is enabled? Best Erick On Sun, Aug 28, 2011 at 5:30 AM, Lord Khan Han khanuniver...@gmail.com wrote: Another insteresting thing is : all one word or more word queries including phrase queries such as barack obama slower in shingle configuration. What i am doing wrong ? without shingle barack obama Querytime 300ms with shingle 780 ms.. On Sat, Aug 27, 2011 at 7:58 PM, Lord Khan Han khanuniver...@gmail.com wrote: Hi, What is the difference between solr 3.3 and the trunk ? I will try 3.3 and let you know the results. Here the search handler: requestHandler name=search class=solr.SearchHandler default=true lst name=defaults str name=echoParamsexplicit/str int name=rows10/int !--str name=fqcategory:vv/str-- str name=fqmrank:[0 TO 100]/str str name=echoParamsexplicit/str int name=rows10/int str name=defTypeedismax/str !--str name=qftitle^0.05 url^1.2 content^1.7 m_title^10.0/str-- str name=qftitle^1.05 url^1.2 content^1.7 m_title^10.0/str !-- str name=bfrecip(ee_score,-0.85,1,0.2)/str -- str name=pfcontent^18.0 m_title^5.0/str int name=ps1/int int name=qs0/int str name=mm2lt;-25%/str str name=spellchecktrue/str !--str name=spellcheck.collatetrue/str -- str name=spellcheck.count5/str str name=spellcheck.dictionarysubobjective/str str name=spellcheck.onlyMorePopularfalse/str str name=hl.tag.prelt;bgt;/str str name=hl.tag.postlt;/bgt;/str str name=hl.useFastVectorHighlightertrue/str /lst On Sat, Aug 27, 2011 at 5:31 PM, Erik Hatcher erik.hatc...@gmail.com wrote: I'm not sure what the issue could be at this point. I see you've got qt=search - what's the definition of that request handler? What is the parsed query (from the debugQuery response)? Have you tried this with Solr 3.3 to see if there's any appreciable difference? Erik On Aug 27, 2011, at 09:34 , Lord Khan Han wrote: When grouping off the query time ie 3567 ms to 1912 ms . Grouping increasing the query time and make useless to cache. But same config faster without shingle still. We have and head to head test this wednesday tihs commercial search engine. So I am looking for all suggestions. On Sat, Aug 27, 2011 at 3:37 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Please confirm is this is caused by grouping. 
Turn grouping off, what's query time like? On Aug 27, 2011, at 07:27 , Lord Khan Han wrote: On the other hand We couldnt use the cache for below types queries. I think its caused from grouping. Anyway we need to be sub second without cache. On Sat, Aug 27, 2011 at 2:18 PM, Lord Khan Han khanuniver...@gmail.com wrote: Hi, Thanks for the reply. Here the solr log capture.: ** hl.fragsize=100spellcheck=truespellcheck.q=Xgroup.limit=5hl.simple.pre=bhl.fl=contentspellcheck.collate=truewt=javabinhl=truerows=20version=2fl=score,approved,domain,host,id,lang,mimetype,title,tstamp,url,categoryhl.snippets=3start=0q=%2B+-X+-X+-XX+-XX+-XX+-+-XX+-XXX+-X+-+-+-X+-X+-X+-+-+-X+-XX+-X+-XX+-XX+-+-X+-XX+-+-X+-X+-X+-X+-X+-X+-X+-X+-XX+-XX+-XX+-X+-X+X+X+XX++group.field=hosthl.simple.post=/bgroup=trueqt=searchfq=mrank:[0+TO+100]fq=word_count:[70+TO+*] ** is the words. All phrases x
Re: strField
My educated guess is that you're using Java for your indexer, and you're (or something below is) doing a toString on a Java object. You're sending over a Java object address, not the string itself. A simple change to your indexer should fix this. Erik On Aug 30, 2011, at 08:42 , Twomey, David wrote: I have a string fieldtype defined as so fieldType name=string class=solr.StrField sortMissingLast=true omitNorms=true/ And I have a field defined as field name=guid type=string indexed=true stored=true required=false / The fields are of this format 92E8EF8FC9F362BBE0408CA5785A29D4 But in the index they are like this: str name=guid[B@520ed128/str I thought it must be compression but compression=true|false is no longer supported by strField I don't see any base64 encoding in this field. Anyone shed light on this? Thanks
Re: Solr custom plugins: is it possible to have them persistent?
OK, maybe I'm getting there. You put it into a .jar file, and then in solrconfig.xml you create a lib... directive that points to where the jar file is. At that point, you can add your custom class to the UpdateRequestProcessor as per Tomas' e-mail. Best Erick On Tue, Aug 30, 2011 at 8:10 AM, samuele.mattiuzzo samum...@gmail.com wrote: my problem is i still don't understand where i have to put that singleton (or how i can load it into solr) i have my singleton class Connector for mysql, with all its methods defined. Now what? This is the point i'm missing :( -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3295320.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr UIMA exception
thanks Tommaso, there is some problem in my solrconfig.xml. now its fixed. thanks again. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-UIMA-exception-tp3285158p3295743.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Reading results from FieldCollapsing
Have you looked at the XML (or JSON) response format? You're right, it is different and you have to parse it differently; there are more levels. Try this query and you'll see the format (default data set). http://localhost:8983/solr/select?q=*:*&group=on&group.field=manu_exact Best Erick On Tue, Aug 30, 2011 at 9:25 AM, Sowmya V.B. vbsow...@gmail.com wrote: Hi All I am trying to use FieldCollapsing feature in Solr. On the Solr admin interface, I give ...group=truegroup.field=fieldA and I can see grouped results. But, I am not able to figure out how to read those results in that order on java. Something like: SolrDocumentList doclist = response.getResults(); gives me a set of results, on which I iterate, and get something like doclist.get(1).getFieldValue(title) etc. After grouping, doing the same step throws me error (apparently, because the returned xml formats are different too). How can I read groupValues and thereby other fieldvalues of the documents inside that group? S. -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com
Re: escaping special characters does not seem to be escaping in query
There is very little information to go on here, but at a guess WordDelimiterFilterFactory is your problem. have you looked at the admin/analysis page to try to figure out what your analysis chain is doing? Best Erick On Tue, Aug 30, 2011 at 9:46 AM, ramdev.wud...@thomsonreuters.com wrote: Hi All: I have a few fields that are of the form: A:2B or G:U2 and so on. I would like to be able to search the field using a wild character search like: A:2* or G:U*. I have tried out modifying the field_type definitions to allow for such queries but without any luck Could someone/anyone provided me with a fieldtype that uses the canned Tokenizers and filters which will allow me to do a search as described ? Thanks much Ramdev
Re: Solr Geodist
q=*:*&sfield=store&pt=45.15,-93.85&fl=name,store,geodist() Actually, you don't even have to specify the d=, I misunderstood. Best Erick On Tue, Aug 30, 2011 at 9:56 AM, solrnovice manisha...@yahoo.com wrote: hi Eric, thank you for the tip, i will try that option. Where can i find a document that shows details of geodist arguments, when i google, i did not find one. so this is what my query is like. I want the distance to be returned. i dont now exactly what all to pass to geodist, as i couldnt find a proper document. http://localhost:/solr/apex_dev/select/?q=city:Quincyfl=city,state,coordinates,score,geodist(39.9435,-120.9226). So i want to pass in the long and lat of Quincy and then i want all the records which are tagged with Quincy should be returned ( as i am doing a q=city:Qunicy search) and also distance to be displayed. Can you please let me know what i should pass into goedist(), in this scenario. thanks SN -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Geodist-tp3287005p3295606.html Sent from the Solr - User mailing list archive at Nabble.com.
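For reference, the same request expressed through SolrJ (a sketch only; "store" is the location field from the example above, whereas the poster's schema appears to call it "coordinates"):

import org.apache.solr.client.solrj.SolrQuery;

// geodist() with no arguments picks up sfield and pt from the request, and
// listing it in fl is what makes the computed distance appear per document.
SolrQuery query = new SolrQuery("*:*");
query.set("sfield", "store");
query.set("pt", "45.15,-93.85");
query.set("fl", "name,store,geodist()");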
Re: how to get update record from database using delta-query?
Have you tried the debug page? See: http://wiki.apache.org/solr/DataImportHandler#interactive Best Erick On Tue, Aug 30, 2011 at 12:44 AM, vighnesh svighnesh...@gmail.com wrote: hi all I am facing a problem in getting updated records from the database using a delta query in solr, please give me the solution. My delta query is:

<entity name="groups_copy" pk="id" dataSource="datasource-1"
        query="select id,name from groups_copy ;"
        deltaQuery="select id,name from groups_copy where date_created > '${dataimporter.last_index_time}'"
        deltaImportQuery="select id,name from groups_copy where id='${dataimporter.delta.id}' ;">
  <field column="id" name="id" />
  <field column="name" name="name" />
</entity>

Is there anything wrong in this code? Please let me know. Thanks in advance. Regards, Vighnesh. -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-get-update-record-from-database-using-delta-query-tp3294510p3294510.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: strField
Hmmm, I'm using DIH defined in data-config.xml I have an Oracle data source configured using JDBC connect string. On 8/30/11 10:41 AM, Erik Hatcher erik.hatc...@gmail.com wrote: My educated guess is that you're using Java for your indexer, and you're (or something below is) doing a toString on a Java object. You're sending over a Java object address, not the string itself. A simple change to your indexer should fix this. Erik On Aug 30, 2011, at 08:42 , Twomey, David wrote: I have a string fieldtype defined as so fieldType name=string class=solr.StrField sortMissingLast=true omitNorms=true/ And I have a field defined as field name=guid type=string indexed=true stored=true required=false / The fields are of this format 92E8EF8FC9F362BBE0408CA5785A29D4 But in the index they are like this: str name=guid[B@520ed128/str I thought it must be compression but compression=true|false is no longer supported by strField I don't see any base64 encoding in this field. Anyone shed light on this? Thanks
Relative performance of updating documents of different sizes
I was curious to know if anyone has any information about the relative performance of document updates (delete/add operations) on documents of different sizes. I have a use case in which I can either create large Solr documents first and subsequently add a small amount of information to them, or do the opposite (add the small doc first, then update with the big one.) My guess is that adding smaller ones first will be faster, since the time to delete a large document is presumably longer than the time to delete a small one. Thanks, Jeff
Re: Reading results from FieldCollapsing
Hi Erick Yes, I did see the XML format. But, I did not understand how to read the response using SolrJ. I found some information about Collapse Component on googling, which looks like a normal Solr XML results format. http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/ However, this class CollapseComponent does not seem to exist in Solr 3.3. (org.apache.solr.handler.component.CollapseComponent) was the component mentioned in that link, which is not there in Solr3.3 class files. Sowmya. On Tue, Aug 30, 2011 at 4:48 PM, Erick Erickson erickerick...@gmail.comwrote: Have you looked at the XML (or JSON) response format? You're right, it is different and you have to parse it differently, there are move levels. Try this query and you'll see the format (default data set). http://localhost:8983/solr/select?q=*:*group=ongroup.field=manu_exact Best Erick On Tue, Aug 30, 2011 at 9:25 AM, Sowmya V.B. vbsow...@gmail.com wrote: Hi All I am trying to use FieldCollapsing feature in Solr. On the Solr admin interface, I give ...group=truegroup.field=fieldA and I can see grouped results. But, I am not able to figure out how to read those results in that order on java. Something like: SolrDocumentList doclist = response.getResults(); gives me a set of results, on which I iterate, and get something like doclist.get(1).getFieldValue(title) etc. After grouping, doing the same step throws me error (apparently, because the returned xml formats are different too). How can I read groupValues and thereby other fieldvalues of the documents inside that group? S. -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com
Re: Relative performance of updating documents of different sizes
Document size should not have any impact on deleting documents as they are only marked for deletion. On Tuesday 30 August 2011 17:06:05 Jeff Leedy wrote: I was curious to know if anyone has any information about the relative performance of document updates (delete/add operations) on documents of different sizes. I have a use case in which I can either create large Solr documents first and subsequently add a small amount of information to them, or do the opposite (add the small doc first, then update with the big one.) My guess is that adding smaller ones first will be faster, since the time to delete a small document is presumably longer than the time to delete a small one. Thanks, Jeff -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Solr custom plugins: is it possible to have them persistent?
ok so my two singleton classes are MysqlConnector and JFJPConnector basically: 1 - jar them 2 - cp them to /custom/path/within/solr/ 3 - modify solrconfig.xml with <lib>/custom/path/within/solr/</lib> my two jars are then automatically loaded? nice! in my CustomUpdateProcessor class i can call MysqlConnector.start_query() and JFJPConnector.other_method(), and it will refer to an active instance of those 2 classes? Is this how it works, without any other trick around? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3295818.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Reading results from FieldCollapsing
Ahhh, see: https://issues.apache.org/jira/browse/SOLR-2637 Short form: It's in 3.4, not 3.3. So, your choices are: 1 parse the XML yourself 2 get a current 3x build (as in one of the nightlys) and use SolrJ there. Best Erick On Tue, Aug 30, 2011 at 11:09 AM, Sowmya V.B. vbsow...@gmail.com wrote: Hi Erick Yes, I did see the XML format. But, I did not understand how to read the response using SolrJ. I found some information about Collapse Component on googling, which looks like a normal Solr XML results format. http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/ However, this class CollapseComponent does not seem to exist in Solr 3.3. (org.apache.solr.handler.component.CollapseComponent) was the component mentioned in that link, which is not there in Solr3.3 class files. Sowmya. On Tue, Aug 30, 2011 at 4:48 PM, Erick Erickson erickerick...@gmail.comwrote: Have you looked at the XML (or JSON) response format? You're right, it is different and you have to parse it differently, there are move levels. Try this query and you'll see the format (default data set). http://localhost:8983/solr/select?q=*:*group=ongroup.field=manu_exact Best Erick On Tue, Aug 30, 2011 at 9:25 AM, Sowmya V.B. vbsow...@gmail.com wrote: Hi All I am trying to use FieldCollapsing feature in Solr. On the Solr admin interface, I give ...group=truegroup.field=fieldA and I can see grouped results. But, I am not able to figure out how to read those results in that order on java. Something like: SolrDocumentList doclist = response.getResults(); gives me a set of results, on which I iterate, and get something like doclist.get(1).getFieldValue(title) etc. After grouping, doing the same step throws me error (apparently, because the returned xml formats are different too). How can I read groupValues and thereby other fieldvalues of the documents inside that group? S. -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com
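A sketch of option 2, assuming a 3.4-era SolrJ build (class and method names are taken from the SOLR-2637 API and may differ slightly in a given nightly): grouped results are read through getGroupResponse() rather than getResults().

import org.apache.solr.client.solrj.response.Group;
import org.apache.solr.client.solrj.response.GroupCommand;
import org.apache.solr.client.solrj.response.GroupResponse;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class GroupedResultsReader {
    public static void print(QueryResponse response) {
        GroupResponse groupResponse = response.getGroupResponse();
        for (GroupCommand command : groupResponse.getValues()) {   // one entry per group.field
            for (Group group : command.getValues()) {              // one entry per group value
                System.out.println("group: " + group.getGroupValue());
                for (SolrDocument doc : group.getResult()) {
                    System.out.println("  title: " + doc.getFieldValue("title"));
                }
            }
        }
    }
}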
Re: Solr custom plugins: is it possible to have them persistent?
Right, you're on track. Note that the changes you make to solrconfig.xml require you to give the qualified class name (e.g. org.myproj.myclass), but it all just gets found man. Also, it's not even necessary to be at a custom path within Solr, although it does have to be *relative* to SOLR_HOME. I often point it directly to the output directory that my IDE puts artifacts in. Although the paths get weird, things like ../../../erick/project/out/blahblahblabh Best Erick On Tue, Aug 30, 2011 at 11:14 AM, samuele.mattiuzzo samum...@gmail.com wrote: ok so my two singleton classes are MysqlConnector and JFJPConnector basically: 1 - jar them 2 - cp them to /custom/path/within/solr/ 3 - modify solrconfig.xml with lib/custom/path/within/solr//lib my two jars are then automatically loaded? nice! in my CustomUpdateProcessor class i can call MysqlConnector.start_query() and JFJPConnector.other_method(), and it will refer to an active instance of those 2 classes? Is this how it works, without any other trick around? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3295818.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr custom plugins: is it possible to have them persistent?
i think it's better for me to keep it under some solr installation path, i don't want to lose files :) ok, i'm going to try this out :) i already got into the package issue (my.package.whatever) and this one i know how to handle! thanks for all the help, i'll post again to tell you It Works! (but i'm not sure about it!) -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3295842.html Sent from the Solr - User mailing list archive at Nabble.com.
Document Size for Indexing
Hi, I have a machine (win 2008R2) with 16GB RAM, I am having issue indexing 1/2GB files. How do we avoid creating a SOLRInputDocument or is there any way to directly use Lucene Index writer classes. What would be the best approach. We need some suggestions. Thanks, Tirthankar
Re: Solr Geodist
Eric, thank you for the quick update. So in the below query you sent to me, I can also add any conditions, right? I mean city:Boston and state:MA...etc. Can I also use dismax query syntax? The confusion from the beginning seems to be the version of solr I was trying versus the one you are trying. Looks like the latest trunk of solr has geodist and the one I am using does not return geodist. q=*:*&sfield=store&pt=45.15,-93.85&fl=name,store,geodist() thanks SN -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Geodist-tp3287005p3295868.html Sent from the Solr - User mailing list archive at Nabble.com.
add documents to the slave
Hi I've read that it's possible to add documents to the slave machine: http://wiki.apache.org/solr/SolrReplication#What_if_I_add_documents_to_the_slave_or_if_slave_index_gets_corrupted.3F Is there any way to not allow adding documents to the slave machine? For example, by touching the configuration files to only allow the /select handler. Thanks.
Re: strField
Ok. Figured it out. Thanks for the pointer. The field was of type RAW in Oracle so it was being converted to a java string by DIH with the behaviour below. I just changed the SQL query in DIH to add RAWTOHEX(guid) On 8/30/11 11:03 AM, Twomey, David david.two...@novartis.com wrote: Hmmm, I'm using DIH defined in data-config.xml I have an Oracle data source configured using JDBC connect string. On 8/30/11 10:41 AM, Erik Hatcher erik.hatc...@gmail.com wrote: My educated guess is that you're using Java for your indexer, and you're (or something below is) doing a toString on a Java object. You're sending over a Java object address, not the string itself. A simple change to your indexer should fix this. Erik On Aug 30, 2011, at 08:42 , Twomey, David wrote: I have a string fieldtype defined as so fieldType name=string class=solr.StrField sortMissingLast=true omitNorms=true/ And I have a field defined as field name=guid type=string indexed=true stored=true required=false / The fields are of this format 92E8EF8FC9F362BBE0408CA5785A29D4 But in the index they are like this: str name=guid[B@520ed128/str I thought it must be compression but compression=true|false is no longer supported by strField I don't see any base64 encoding in this field. Anyone shed light on this? Thanks
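An alternative that keeps the conversion on the Solr side instead of in SQL would be a custom DIH Transformer that hex-encodes the byte[] the JDBC driver returns for a RAW column. This is only a sketch (the column name "guid" follows this thread), not what was actually used to resolve it:

import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class RawToHexTransformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        // If the driver handed back raw bytes, replace them with their hex form
        // so the stored string matches the expected 92E8EF8F... format.
        Object value = row.get("guid");
        if (value instanceof byte[]) {
            StringBuilder hex = new StringBuilder();
            for (byte b : (byte[]) value) {
                hex.append(String.format("%02X", b));
            }
            row.put("guid", hex.toString());
        }
        return row;
    }
}

The transformer would then be referenced from the entity in data-config.xml via its transformer attribute.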
Re: How to send an OpenBitSet object from Solr server?
: We have a need to query and fetch millions of document ids from a Solr 3.3 : index and convert the same to a BitSet. To speed things up, we want to : convert these document ids into OpenBitSet on the server side, put them into : the response object and read the same on the client side. This smells like an XY Problem ... what do you intend to do with this BitSet on the client side? The lucene doc ids are meaningless outside of the server, and for any given doc, the id could change from one request to the next -- so how would having this data on the client be of any use to you? https://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
Re: dependency injection in solr
Tomás Fernández Löbbe wrote on 29/08/2011 20:32: You can use reflection to instantiate the correct object (specify the class name in a parameter in the solrconfig and then invoke the constructor via reflection). You'll have to manage the life-cycle of your object yourself. If I understand your requirement, you probably have created a SearchComponent that uses that retriever, right?

Sorry for the delay: I was experimenting. Raw reflection does not suffice: you can't specify a dependency required by your reflection-constructed object. I've ended up using Spring this way (I will cut some code for brevity).

I've enabled Spring the usual way, adding a ContextLoaderListener in web.xml and configuring the Spring XML or Java configuration files (I do Java configuration). I've declared a Spring bean named myComponentDeclaredInTheSpringConf, which is an extension of SearchComponent, with its collaborators. I've created SpringAwareSearchComponent, which is a delegate of SearchComponent:

public SpringAwareSearchComponent() {
    this.ctx = ContextLoader.getCurrentWebApplicationContext();
}
...
public void init(NamedList args) {
    super.init(args);
    inner = ctx.getBean(args.get("__beanname__").toString(), SearchComponent.class);
    inner.init(args);
}

public void prepare(ResponseBuilder rb) throws IOException {
    inner.prepare(rb);
}

public void process(ResponseBuilder rb) throws IOException {
    inner.process(rb);
}

In solrconfig.xml I've declared the search component as:

<searchComponent name="myComponent" class="SpringAwareSearchComponent">
  <str name="__beanname__">myComponentDeclaredInTheSpringConf</str>
  ...other bean specific parameters
</searchComponent>

and added myComponent to the list of search components. And it works like a charm. Maybe I can implement some other Solr class delegates and add hooks between Spring and Solr as needed. Any comment will be appreciated. best regards federico
Re: Shingle and Query Performance
Below the output of the debug. I am measuring pure solr qtime which show in the Qtime field in solr xml. arr name=parsed_filter_queries strmrank:[0 TO 100]/str /arr lst name=timing double name=time8584.0/double lst name=prepare double name=time12.0/double lst name=org.apache.solr.handler.component.QueryComponent double name=time12.0/double /lst lst name=org.apache.solr.handler.component.FacetComponent double name=time0.0/double /lst lst name=org.apache.solr.handler.component.MoreLikeThisComponent double name=time0.0/double /lst lst name=org.apache.solr.handler.component.HighlightComponent double name=time0.0/double /lst lst name=org.apache.solr.handler.component.StatsComponent double name=time0.0/double /lst lst name=org.apache.solr.handler.component.SpellCheckComponent double name=time0.0/double /lst lst name=org.apache.solr.handler.component.DebugComponent double name=time0.0/double /lst /lst lst name=process double name=time8572.0/double lst name=org.apache.solr.handler.component.QueryComponent double name=time4480.0/double /lst lst name=org.apache.solr.handler.component.FacetComponent double name=time0.0/double /lst lst name=org.apache.solr.handler.component.MoreLikeThisComponent double name=time0.0/double /lst lst name=org.apache.solr.handler.component.HighlightComponent double name=time41.0/double /lst lst name=org.apache.solr.handler.component.StatsComponent double name=time0.0/double /lst lst name=org.apache.solr.handler.component.SpellCheckComponent double name=time0.0/double /lst lst name=org.apache.solr.handler.component.DebugComponent double name=time4051.0/double /lst On Tue, Aug 30, 2011 at 5:38 PM, Erick Erickson erickerick...@gmail.comwrote: Can we see the output if you specify both debugQuery=ondebug=true the debug=true will show the time taken up with various components, which is sometimes surprising... Second, we never asked the most basic question, what are you measuring? Is this the QTime of the returned response? (which is the time actually spent searching) or the time until the response gets back to the client, which may involve lots besides searching... Best Erick On Tue, Aug 30, 2011 at 7:59 AM, Lord Khan Han khanuniver...@gmail.com wrote: Hi Eric, Fields are lazy loading, content stored in solr and machine 32 gig.. solr has 20 gig heap. There is no swapping. As you see we have many phrases in the same query . I couldnt find a way to drop qtime to subsecends. Suprisingly non shingled test better qtime ! On Mon, Aug 29, 2011 at 3:10 PM, Erick Erickson erickerick...@gmail.com wrote: Oh, one other thing: have you profiled your machine to see if you're swapping? How much memory are you giving your JVM? What is the underlying hardware setup? Best Erick On Mon, Aug 29, 2011 at 8:09 AM, Erick Erickson erickerick...@gmail.com wrote: 200K docs and 36G index? It sounds like you're storing your documents in the Solr index. In and of itself, that shouldn't hurt your query times, *unless* you have lazy field loading turned off, have you checked that lazy field loading is enabled? Best Erick On Sun, Aug 28, 2011 at 5:30 AM, Lord Khan Han khanuniver...@gmail.com wrote: Another insteresting thing is : all one word or more word queries including phrase queries such as barack obama slower in shingle configuration. What i am doing wrong ? without shingle barack obama Querytime 300ms with shingle 780 ms.. On Sat, Aug 27, 2011 at 7:58 PM, Lord Khan Han khanuniver...@gmail.com wrote: Hi, What is the difference between solr 3.3 and the trunk ? I will try 3.3 and let you know the results. 
Here the search handler: requestHandler name=search class=solr.SearchHandler default=true lst name=defaults str name=echoParamsexplicit/str int name=rows10/int !--str name=fqcategory:vv/str-- str name=fqmrank:[0 TO 100]/str str name=echoParamsexplicit/str int name=rows10/int str name=defTypeedismax/str !--str name=qftitle^0.05 url^1.2 content^1.7 m_title^10.0/str-- str name=qftitle^1.05 url^1.2 content^1.7 m_title^10.0/str !-- str name=bfrecip(ee_score,-0.85,1,0.2)/str -- str name=pfcontent^18.0 m_title^5.0/str int name=ps1/int int name=qs0/int str name=mm2lt;-25%/str str name=spellchecktrue/str !--str name=spellcheck.collatetrue/str -- str name=spellcheck.count5/str str name=spellcheck.dictionarysubobjective/str str name=spellcheck.onlyMorePopularfalse/str str name=hl.tag.prelt;bgt;/str str name=hl.tag.postlt;/bgt;/str str name=hl.useFastVectorHighlightertrue/str /lst On Sat, Aug 27, 2011 at 5:31 PM, Erik Hatcher erik.hatc...@gmail.com wrote: I'm not sure what the issue could be at this point. I see you've got qt=search - what's the definition of that request handler? What
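As an aside to the measurement question in this thread, a minimal SolrJ 3.x sketch of the distinction being drawn: QTime is the time Solr itself spent servicing the request, while the elapsed time seen by the client also includes network transfer and response unmarshalling. The URL and query string are placeholders, not taken from the thread.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class QTimeProbe {
    public static void main(String[] args) throws Exception {
        // Placeholder URL - point this at your own core
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("\"barack obama\"");
        q.set("debugQuery", "on");   // adds the parsed query / timing sections to the response
        q.set("debug", "true");      // extra debug param Erick suggests; harmless if the version ignores it

        QueryResponse rsp = solr.query(q);
        // QTime: time Solr spent searching, as reported in the XML response
        System.out.println("QTime (ms):   " + rsp.getQTime());
        // Elapsed: QTime plus network and client-side response handling
        System.out.println("Elapsed (ms): " + rsp.getElapsedTime());
    }
}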
Re: add documents to the slave
That's basically it. Remove all /update URLs from the slave config. On Tue, Aug 30, 2011 at 8:34 AM, Miguel Valencia miguel.valen...@juntadeandalucia.es wrote: Hi I've read that it's possible to add documents to the slave machine: http://wiki.apache.org/solr/SolrReplication#What_if_I_add_documents_to_the_slave_or_if_slave_index_gets_corrupted.3F Is there any way to not allow adding documents to the slave machine? For example, touching the configuration files to only allow the /select handler. Thanks.
Re: Document Size for Indexing
What issues exactly? Are you using 32-bit Java? That will restrict the JVM heap size to about 2GB max. -Simon On Tue, Aug 30, 2011 at 11:26 AM, Tirthankar Chatterjee tchatter...@commvault.com wrote: Hi, I have a machine (Win 2008 R2) with 16GB RAM, and I am having issues indexing 1/2GB files. How do we avoid creating a SolrInputDocument, or is there any way to directly use the Lucene IndexWriter classes? What would be the best approach? We need some suggestions. Thanks, Tirthankar
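A quick way to check both of Simon's questions from inside the JVM; note that sun.arch.data.model is specific to Sun/Oracle JVMs, so treat it as a hint rather than a guarantee:

public class JvmCheck {
    public static void main(String[] args) {
        // "32" or "64" on Sun/Oracle JVMs; may be null on other vendors' JVMs
        String dataModel = System.getProperty("sun.arch.data.model");
        // Upper bound the heap can ever grow to (effectively your -Xmx)
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);

        System.out.println("Data model: " + dataModel + "-bit");
        System.out.println("Max heap  : " + maxHeapMb + " MB");
    }
}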
Re: Solr Geodist
Eric, can you please let me know the solr build, that you are using. I went to this below site, but i want to use the same build, you are using, so i can make sure the queries work. http://wiki.apache.org/solr/FrontPage#solr_development thanks SN -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Geodist-tp3287005p3296210.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Stream still in memory after tika exception? Possible memoryleak?
: The current system I'm using has 150GB of memory and while I'm indexing the : memoryconsumption is growing and growing (eventually more then 50GB). : In the attached graph (http://postimage.org/image/acyv7kec/) I indexed about : 70k of office-documents (pdf,doc,xls etc) and between 1 and 2 percent throws Unless I'm misunderstanding something about your graph, only ~12GB of memory is used by applications on that machine. About 60GB is in use by the filesystem cache. The filesystem cache is not memory being used by Solr; it's memory that is free and not in use by an application, so your OS is (wisely) using it to cache files from disk that you've recently accessed in case you need them again. This is handy, and for max efficiency (when keeping your index on disk) it's useful to make sure you allocate resources so that you have enough extra memory on your server that the entire index can be kept in the filesystem cache -- but the OS will happily free up that space for other apps that need it if they ask for more memory. : After indexing the memoryconsumption isn't dropping. Even after an optimize : command it's still there. As for why your used memory grows to ~12GB and doesn't decrease even after an optimize: that's the way the Java memory model works. When you run the JVM you specify (either explicitly or implicitly via defaults) a min and max heap size for the JVM to allocate for itself. It starts out asking the OS for the min, and as it needs more it asks for more up to the max. But (most JVM implementations I know of) don't give back RAM to the OS if they don't need it anymore -- they keep it as free space in the heap for future object allocation. -Hoss
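For reference, a tiny sketch of how the heap numbers described above can be watched from inside a JVM (to watch Solr itself you would read the same MBean remotely via JMX/jconsole, since this only reports on the JVM it runs in): used is what live objects actually occupy, while committed is what the process has claimed from the OS and, as explained above, typically never hands back.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapWatch {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mb = 1024 * 1024;
        // used <= committed <= max; the gap between used and committed is free heap
        // the JVM keeps for future allocations rather than returning to the OS
        System.out.println("used      = " + heap.getUsed() / mb + " MB");
        System.out.println("committed = " + heap.getCommitted() / mb + " MB");
        System.out.println("max       = " + heap.getMax() / mb + " MB");
    }
}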
Re: Stream still in memory after tika exception? Possible memoryleak?
Hi Erick, I am using Solr 3.3.0, but with 1.4.1 the same problems. The connector is a homemade program in the C# programming language and is posting via http remote streaming (i.e. http://localhost:8080/solr/update/extract?stream.file=/path/to/file.docliteral.id=1 ) I'm using Tika to extract the content (comes with the Solr Cell). A possible problem is that the filestream needs to be closed, after extracting, by the client application, but it seems that there is going something wrong while getting a Tika-exception: the stream never leaves the memory. At least that is my assumption. What is the common way to extract content from officefiles (pdf, doc, rtf, xls etc) and index them? To write a content extractor / validator yourself? Or is it possible to do this with the Solr Cell without getting a huge memory consumption? Please let me know. Thanks in advance. Marc 2011/8/30 Erick Erickson erickerick...@gmail.com What version of Solr are you using, and how are you indexing? DIH? SolrJ? I'm guessing you're using Tika, but how? Best Erick On Tue, Aug 30, 2011 at 4:55 AM, Marc Jacobs jacob...@gmail.com wrote: Hi all, Currently I'm testing Solr's indexing performance, but unfortunately I'm running into memory problems. It looks like Solr is not closing the filestream after an exception, but I'm not really sure. The current system I'm using has 150GB of memory and while I'm indexing the memoryconsumption is growing and growing (eventually more then 50GB). In the attached graph I indexed about 70k of office-documents (pdf,doc,xls etc) and between 1 and 2 percent throws an exception. The commits are after 64MB, 60 seconds or after a job (there are 6 evenly divided jobs). After indexing the memoryconsumption isn't dropping. Even after an optimize command it's still there. What am I doing wrong? I can't imagine I'm the only one with this problem. Thanks in advance! Kind regards, Marc
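One common way to do this from Java is to let SolrJ drive /update/extract, which takes the hand-rolled stream handling out of the client. A rough sketch against Solr 3.x; the URL, file path, literal field and commit policy are placeholders, and whether this sidesteps the leak depends on where the stream is actually being held:

import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractOne {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8080/solr");

        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("/path/to/file.doc"));   // SolrJ streams the file when the request runs
        req.setParam("literal.id", "1");              // same literal.id as the remote-streaming URL
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

        solr.request(req);                            // throws on Tika/Solr errors
    }
}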
Re: strField
: Ok. Figured it out. Thanks for the pointer. The field was of type RAW : in Oracle so it was being converted to a java string by DIH with the : behaviour below. RAW is probably very similar to BLOB... https://wiki.apache.org/solr/DataImportHandlerFaq#Blob_values_in_my_table_are_added_to_the_Solr_document_as_object_strings_like_B.401f23c5 -Hoss
Re: Viewing the complete document from within the index
Thanks Everyone for the responses. Yes, the way Eric described would work for trivial debugging but when i actually need to debug something in production this would be a big hassle ;-) For now I am going to mark the field to be stored=true to get around this problem. We are migrating away from FAST and FAST has a feature where it can dump the entire documents content from the index to a txt file. Thanks again. On Mon, Aug 29, 2011 at 8:27 AM, Erick Erickson erickerick...@gmail.comwrote: You can use Luke to re-construct the doc from the indexed terms. It takes a while, because it's not a trivial problem, so I'd use a small index for verification first If you have Luke show you the doc, it'll return stored fields, but as I remember there's a button like reconstruct and edit that does what you want... You can use the TermsComponent to see what's in the inverted part of the index, but it doesn't tell you which document is associated with the terms, so might not help much. But it seems you could do this empirically by controlling the input to a small set of docs and then querying on terms you *know* you didn't have in the input but were in the synonyms Best Erick On Mon, Aug 29, 2011 at 3:55 AM, pravesh suyalprav...@yahoo.com wrote: Reconstructing the document might not be possible, since,only the stored fields are actually stored document-wise(un-inverted), where as the indexed-only fields are put as inverted way. In don't think SOLR/Lucene currently provides any way, so, one can re-construct document in the way you desire. (It's sort of reverse engineering not supported) Thanx Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/Viewing-the-complete-document-from-within-the-index-tp3288076p3292111.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Geodist
I think i found the link to the nightly build, i am going to try this flavor of solr and run the query and check what happens. The link i am using is https://builds.apache.org/job/Solr-trunk/lastSuccessfulBuild/artifact/artifacts/ thanks SN -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Geodist-tp3287005p3296316.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Search by range in multivalued fields
if you remove the single quotes from your query syntax it should work. in general using multivalued fields where you want to coordinate matches based on the position in the multivalued field (ie: a multivalued list of author first names and a multivalued lsit of author lastnames and you want any doc where an author is named john smith) isn't really possible -- but in your case you don't seem to really care about coordinating by position of the values in the multivalued field, because you have codes for hte companies as a prefix, so it doesn't matter where in the list it is. If i'm missudnerstaninding your question, you'll need to explain better what docs you wnat to match, and what docs you *don't* want to match. : I have a solr core with job records and one guy can work in different : companies in : a specific range of dateini to dateend. : : doc : arr name=companyinimultivaluefield : companyiniIBM10012005companyini : companyiniAPPLE10012005companyini : /arr : arr name=companyendmultivaluefield : companyendIBM10012005companyend : companyendAPPLE10012005companyend : /arr : /doc : : Is possible to make a range query on a multivalue field over text fields. : For instance something like that. : companyinimultivaluefield['IBM10012005' TO *] AND : companyendmultivaluefield['IBM10012005' TO *] -Hoss
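A minimal SolrJ restating of the corrected query (field names taken from the thread); the only point it illustrates is that the range endpoints are bare terms with the single quotes removed:

import org.apache.solr.client.solrj.SolrQuery;

public class CompanyRangeQuery {
    public static void main(String[] args) {
        // Range endpoints are plain terms - no single quotes around IBM10012005
        SolrQuery q = new SolrQuery(
            "companyinimultivaluefield:[IBM10012005 TO *] AND "
          + "companyendmultivaluefield:[IBM10012005 TO *]");
        q.setRows(10);
        System.out.println(q);
    }
}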
Re: Search the contents of given URL in Solr.
For indexing the webpages, you can use Nutch with Solr, which would do the scarping and indexing of the page. For finding similar documents/pages you can use http://wiki.apache.org/solr/MoreLikeThis, by querying the above document (by id or search terms) and it would return similar documents from the index for the result. Regards, Jayendra On Tue, Aug 30, 2011 at 8:23 AM, Sheetal rituzprad...@gmail.com wrote: Hi, Is it possible to give the URL address of a site and solr search server reads the contents of the given site and recommends similar projects to that. I did scrapped the web contents from the given URL address and now have the plain text format of the contents in URL. But when I pass that scrapped text as query into Solr. It doesn't work as query being too large(depends on the given contents of URL). I read it somewhere that its possible , Given the URL address and outputs you the relevant projects to it. But I don't remember whether its using Solr search or other search engine. Does anyone have any ideas or suggestions for this..Would highly appreciate your comments Thank you in advance.. - Sheetal -- View this message in context: http://lucene.472066.n3.nabble.com/Search-the-contents-of-given-URL-in-Solr-tp3294376p3294376.html Sent from the Solr - User mailing list archive at Nabble.com.
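As a rough illustration of the MoreLikeThis suggestion above, the component can be driven from SolrJ along these lines; the uniqueKey value, the content field and the parameter values are assumptions for the example, not from the thread:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class SimilarDocs {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Ask MoreLikeThis for documents similar to an already-indexed page
        SolrQuery q = new SolrQuery("id:doc1");   // hypothetical uniqueKey of the indexed page
        q.set("mlt", "true");
        q.set("mlt.fl", "content");               // field(s) similarity is computed on
        q.set("mlt.mintf", "1");
        q.set("mlt.mindf", "1");
        q.set("mlt.count", "10");

        QueryResponse rsp = solr.query(q);
        // In SolrJ 3.x the MLT results sit in the raw response under "moreLikeThis"
        NamedList<?> mlt = (NamedList<?>) rsp.getResponse().get("moreLikeThis");
        System.out.println(mlt);
    }
}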
Solr 3.3 dismax MM parameter not working properly
Anyone else strugglin' with dismax's MM parameter? We're having a problem here, seems that configs from 3 terms and more are being ignored by solr and it assumes previous configs. if I use str name=mm3lt;1/str or str name=mm3lt;100%/str i get the same results for a 3-term query. If i try str name=mm4lt;25%/str or str name=mm4lt;100%/str I also get same data for a 4-term query. I'm searching: windows service pack str name=mm1lt;100% 2lt;50% 3lt;100%/str - 13000 results str name=mm1lt;100% 2lt;50% 3lt;1/str - the same 13000 results str name=mm1lt;100% 2lt;50%/str - very same 13000 results str name=mm1lt;100% 2lt;100%/str - 93 results. seems that here i get the 33 clause working. str name=mm2lt;100%/str - same 93 results, just in case. str name=mm2lt;50%/str - very same 13000 results as it should str name=mm2lt;-50%/str - 1121 results (weird) then i tried to control 3-term queries. str name=mm2lt;-50% 3lt;100%/str 1121, the same as 2-50%, ignoring the 3 clause. str name=mm2lt;-50% 3lt;1/str the same 1121 results, ignoring again it. I'd like to accomplish something like this: str name=mm2lt;1 3lt;2 4lt;3 8lt;-50%/str translating: 1 or 2 - 1 term, 3 at least 2, 4 at least 3 and 5, 6, 7, 8 terms at least half rounded up (5-3, 6-3, 7-4, 8-4) seems that he's only using 1 and 2 clauses. thanks in advance alexei
Re: Search the contents of given URL in Solr.
Hi Jayendra, Thank you for the reply. I figured it out finally. I had to configure my web servelet container Jetty for this..Now it works:-) - Sheetal -- View this message in context: http://lucene.472066.n3.nabble.com/Search-the-contents-of-given-URL-in-Solr-tp3294376p3296487.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to get all the terms in a document as Luke does?
you might want to check - http://wiki.apache.org/solr/TermVectorComponent Should provide you with the term vectors with a lot of additional info. Regards, Jayendra On Tue, Aug 30, 2011 at 3:34 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote: Hello, This time I'm trying to duplicate Luke's functionality of knowing which terms occur in a search result/document (w/o parsing it again). Any Solrj API to do that? P.S. I've also posted the question on SOhttp://stackoverflow.com/q/7219111/300248 . On Wed, Jul 6, 2011 at 11:09 AM, Gabriele Kahlout gabri...@mysimpatico.comwrote: From you patch I see TermFreqVector which provides the information I want. I also found FieldInvertState.getLength() which seems to be exactly what I want. I'm after the word count (sum of tf for every term in the doc). I'm just not sure whether FieldInvertState.getLength() returns just the number of terms (not multiplied by the frequency of each term - word count) or not though. It seems as if it returns word count, but I've not tested it sufficienctly. On Wed, Jul 6, 2011 at 1:39 AM, Trey Grainger the.apache.t...@gmail.comwrote: Gabriele, I created a patch that does this about a year ago. See https://issues.apache.org/jira/browse/SOLR-1837. It was written for Solr 1.4 and is based upon the Document Reconstructor in Luke. The patch adds a link to the main solr admin page to a docinspector page which will reconstruct the document given a uniqueid (required). Keep in mind that you're only looking at what's in the index for non-stored fields, not the original text. If you have any issues using this on the most recent release, let me know and I'd be happy to create a new patch for solr 3.3. One of these days I'll remove the JSP dependency and this may eventually making it into trunk. Thanks, -Trey Grainger Search Technology Development Team Lead, Careerbuilder.com Site Architect, Celiaccess.com On Tue, Jul 5, 2011 at 3:59 PM, Gabriele Kahlout gabri...@mysimpatico.comwrote: Hello, With an inverted index the term is the key, and the documents are the values. Is it still however possible that given a document id I get the terms indexed for that document? -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)). -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)). -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. 
(x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
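A sketch of querying the TermVectorComponent from SolrJ; the /tvrh handler name, the document id and the termVectors-enabled content field are assumptions borrowed from the example solrconfig, not from this thread. Note that term vectors are built at index time, so documents indexed before the attribute is added will not have them until they are re-indexed.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class TermsOfDoc {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("id:42");   // the document to inspect (hypothetical id)
        q.setQueryType("tvrh");                 // request handler wired to TermVectorComponent
        q.set("tv", "true");
        q.set("tv.fl", "content");              // field must be indexed with termVectors="true"
        q.set("tv.tf", "true");                 // include term frequencies

        QueryResponse rsp = solr.query(q);
        // Raw structure: termVectors -> doc -> field -> term -> {tf, ...}
        NamedList<?> tv = (NamedList<?>) rsp.getResponse().get("termVectors");
        System.out.println(tv);
    }
}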
Re: Stream still in memory after tika exception? Possible memoryleak?
Hi Chris, Thanks for the response. Eventualy I want to install Solr on a machine with a maximum memory of 4GB. I tried to index the data on that machine before, but it resulted in index locks and memory errors. Is 4GB not enough to index 100,000 documents in a row? How much should it be? Is there a way to tune this? Regards, Marc 2011/8/30 Chris Hostetter hossman_luc...@fucit.org : The current system I'm using has 150GB of memory and while I'm indexing the : memoryconsumption is growing and growing (eventually more then 50GB). : In the attached graph (http://postimage.org/image/acyv7kec/) I indexed about : 70k of office-documents (pdf,doc,xls etc) and between 1 and 2 percent throws Unless i'm missunderstanding sometihng about your graph, only ~12GB of memory is used by applications on that machine. About 60GB is in use by the filesystem cache. The Filesystem cache is not memory being used by Solr, it's memory that is free and not in use by an application, so your OS is (wisely) using it to cache files from disk that you've recently accessed in case you need them again. This is handy, and for max efficients (when keeping your index on disk) it's useful to make sure you allocate resources so that you have enough extra memory on your server that the entire index can be kept in the filesystem cache -- but the OS will happily free up that space for other apps that need it if they ask for more memory. : After indexing the memoryconsumption isn't dropping. Even after an optimize : command it's still there. as for why your Used memory grows to ~12GB and doesn't decrease even after an optimize: that's the way the Java memory model works. whe nyou run the JVM you specificy (either explicitly or implicitly via defaults) a min max heap size for hte JVM to allocate for itself. it starts out asking the OS for the min, and as it needs more it asks for more up to the max. but (most JVM implementations i know of) don't give back ram to the OS if they don't need it anymore -- they keep it as free space in the heap for future object allocation. -Hoss
Re: Solr custom plugins: is it possible to have them persistent?
i thinki i have to drop the singleton class solution, since my boss wants to add 2 other different solr installation and i need to reuse the plugins i'm working on... so i'll have to use a connectionpool or i will create hangs when the 3 cores update their indexes at the same time :( -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3296627.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Viewing the complete document from within the index
I am trying to peek into the index to see if my index-time synonym expansions are working properly or not. For this I have successfully used the analysis page of the admin application that comes out of the box. Works really well for debugging schema changes. JRJ -Original Message- From: Paul Libbrecht [mailto:p...@hoplahup.net] Sent: Saturday, August 27, 2011 5:15 AM To: solr-user@lucene.apache.org Subject: Re: Viewing the complete document from within the index Karthik, I sure could be wrong but I never found this. My search tool implementations (3 thus far, one on solr, all on the web) have always proceeded with one tool for experts called something like indexed-view which basically remade the indexing process as a dry-run. This can also be done with analysis but I have not with solr yet. I personaly find it would be nice to have a post servlet within solr that would do exactly that: returned the array of indexed token-streams, provided I send it the document data. I think you would see what you are looking for below then./ paul Le 26 août 2011 à 23:40, karthik a écrit : Hi Everyone, I am trying to see whats the best way to view the entire document as its indexed within solr/lucene. I have tried to use Luke but it's still showing me the fields that i have configured to be returned back [ie., stored=true] unless I am not enabling some option in the tool. Is there a way to see whats actually stored in the index itself? I am trying to peek into the index to see if my index-time synonym expansions are working properly or not. The field for which I have enabled index-time synonym expansion is just used for searching so i have set stored=false. Thanks
RE: add documents to the slave
Another way that occurs to me is that if you have a security-constraint on the update URL(s) in your web.xml, you can map them to no groups / empty groups in the JEE container. JRJ -Original Message- From: simon [mailto:mtnes...@gmail.com] Sent: Tuesday, August 30, 2011 12:21 PM To: solr-user@lucene.apache.org Subject: Re: add documents to the slave That's basically it. Remove all /update URLs from the slave config. On Tue, Aug 30, 2011 at 8:34 AM, Miguel Valencia miguel.valen...@juntadeandalucia.es wrote: Hi I've read that it's possible to add documents to the slave machine: http://wiki.apache.org/solr/SolrReplication#What_if_I_add_documents_to_the_slave_or_if_slave_index_gets_corrupted.3F Is there any way to not allow adding documents to the slave machine? For example, touching the configuration files to only allow the /select handler. Thanks.
RE: missing field in schema browser on solr admin
Also... Did he restart either his web app server container or at least the Solr servlet inside the container? JRJ -Original Message- From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: Friday, August 26, 2011 5:29 AM To: solr-user@lucene.apache.org Subject: Re: missing field in schema browser on solr admin Is the field stored? Do you see it on documents when you do a q=*:* search? How is that field defined and populated? (exact config/code needed here) Erik On Aug 25, 2011, at 23:07 , deniz wrote: hi all... i have added a new field to index... but now when i check solr admin, i see some interesting stuff... i can see the field in schema and also db config file but there is nothing about the field in schema browser... in addition i cant make a search in that field... all of the config files seem correct but still no change... any ideas or anyone who has ever had a similar problem? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/missing-field-in-schema-browser-on-solr-admin-tp3285739p3285739.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 3.3 dismax MM parameter not working properly
Hmmm I believe I discovered the problem. When you have something like this: 2<50% 6<-60% you should read it from right to left and use the word MORE. MORE THAN SIX clauses: 60% are optional. MORE THAN TWO clauses (and that includes 3, 4, 5 AND 6): half is mandatory. If you want a special rule for 2 terms just add: 1<1 2<50% 6<-60% MORE THAN ONE clause (2) should match 1. NOW this makes sense! 2011/8/30 Alexei Martchenko ale...@superdownloads.com.br Anyone else strugglin' with dismax's MM parameter? We're having a problem here, seems that configs from 3 terms and more are being ignored by solr and it assumes previous configs. if I use str name=mm3lt;1/str or str name=mm3lt;100%/str i get the same results for a 3-term query. If i try str name=mm4lt;25%/str or str name=mm4lt;100%/str I also get same data for a 4-term query. I'm searching: windows service pack str name=mm1lt;100% 2lt;50% 3lt;100%/str - 13000 results str name=mm1lt;100% 2lt;50% 3lt;1/str - the same 13000 results str name=mm1lt;100% 2lt;50%/str - very same 13000 results str name=mm1lt;100% 2lt;100%/str - 93 results. seems that here i get the 33 clause working. str name=mm2lt;100%/str - same 93 results, just in case. str name=mm2lt;50%/str - very same 13000 results as it should str name=mm2lt;-50%/str - 1121 results (weird) then i tried to control 3-term queries. str name=mm2lt;-50% 3lt;100%/str 1121, the same as 2-50%, ignoring the 3 clause. str name=mm2lt;-50% 3lt;1/str the same 1121 results, ignoring again it. I'd like to accomplish something like this: str name=mm2lt;1 3lt;2 4lt;3 8lt;-50%/str translating: 1 or 2 - 1 term, 3 at least 2, 4 at least 3 and 5, 6, 7, 8 terms at least half rounded up (5-3, 6-3, 7-4, 8-4) seems that he's only using 1 and 2 clauses. thanks in advance alexei -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
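Restating that reading as a small sketch; SolrJ here only carries the mm string, and the interpretation in the comments is the rule described above (defType and the query string are just example values):

import org.apache.solr.client.solrj.SolrQuery;

public class MmExample {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("windows service pack");
        q.set("defType", "dismax");
        // Read each "N<X" pair as: for MORE THAN N optional clauses, require X.
        //   1<1    -> more than 1 clause (i.e. exactly 2): at least 1 must match
        //   2<50%  -> more than 2 clauses (3 up to 6):     at least half must match
        //   6<-60% -> more than 6 clauses (7 and up):      up to 60% may be missing
        // With a single clause, all clauses are required.
        q.set("mm", "1<1 2<50% 6<-60%");
        System.out.println(q);
    }
}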
Re: Stream still in memory after tika exception? Possible memoryleak?
See solrconfig.xml, particularly ramBufferSizeMB, also maxBufferedDocs. There's no reason you can't index as many documents as you want, unless your documents are absolutely huge (as in 100s of M, possibly G size). Are you actually getting out of memory problems? Erick On Tue, Aug 30, 2011 at 4:24 PM, Marc Jacobs jacob...@gmail.com wrote: Hi Chris, Thanks for the response. Eventualy I want to install Solr on a machine with a maximum memory of 4GB. I tried to index the data on that machine before, but it resulted in index locks and memory errors. Is 4GB not enough to index 100,000 documents in a row? How much should it be? Is there a way to tune this? Regards, Marc 2011/8/30 Chris Hostetter hossman_luc...@fucit.org : The current system I'm using has 150GB of memory and while I'm indexing the : memoryconsumption is growing and growing (eventually more then 50GB). : In the attached graph (http://postimage.org/image/acyv7kec/) I indexed about : 70k of office-documents (pdf,doc,xls etc) and between 1 and 2 percent throws Unless i'm missunderstanding sometihng about your graph, only ~12GB of memory is used by applications on that machine. About 60GB is in use by the filesystem cache. The Filesystem cache is not memory being used by Solr, it's memory that is free and not in use by an application, so your OS is (wisely) using it to cache files from disk that you've recently accessed in case you need them again. This is handy, and for max efficients (when keeping your index on disk) it's useful to make sure you allocate resources so that you have enough extra memory on your server that the entire index can be kept in the filesystem cache -- but the OS will happily free up that space for other apps that need it if they ask for more memory. : After indexing the memoryconsumption isn't dropping. Even after an optimize : command it's still there. as for why your Used memory grows to ~12GB and doesn't decrease even after an optimize: that's the way the Java memory model works. whe nyou run the JVM you specificy (either explicitly or implicitly via defaults) a min max heap size for hte JVM to allocate for itself. it starts out asking the OS for the min, and as it needs more it asks for more up to the max. but (most JVM implementations i know of) don't give back ram to the OS if they don't need it anymore -- they keep it as free space in the heap for future object allocation. -Hoss
Re: Solr custom plugins: is it possible to have them persistent?
Well, your singleton can be the connection pool manager.. Best Erick On Tue, Aug 30, 2011 at 4:45 PM, samuele.mattiuzzo samum...@gmail.com wrote: i thinki i have to drop the singleton class solution, since my boss wants to add 2 other different solr installation and i need to reuse the plugins i'm working on... so i'll have to use a connectionpool or i will create hangs when the 3 cores update their indexes at the same time :( -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-custom-plugins-is-it-possible-to-have-them-persistent-tp3292781p3296627.html Sent from the Solr - User mailing list archive at Nabble.com.
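A bare-bones sketch of that idea: one singleton owns the pool and every core's plugin borrows from it. The DataSource wiring is left open, since the thread doesn't say which pool library is in use.

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

public final class PoolHolder {
    private static volatile PoolHolder instance;
    private final DataSource pool;

    private PoolHolder(DataSource pool) {
        this.pool = pool;
    }

    // Called once at plugin init time with whatever pooled DataSource you use
    public static synchronized void init(DataSource pool) {
        if (instance == null) {
            instance = new PoolHolder(pool);
        }
    }

    public static PoolHolder getInstance() {
        if (instance == null) {
            throw new IllegalStateException("PoolHolder not initialised");
        }
        return instance;
    }

    // Each core's plugin borrows a connection and returns it via close()
    public Connection borrow() throws SQLException {
        return pool.getConnection();
    }
}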
Re: Solr 3.3 dismax MM parameter not working properly
Yep, that one takes a while to figure out, then I wind up re-figuring it out every time I have to change it G... Best Erick On Tue, Aug 30, 2011 at 6:36 PM, Alexei Martchenko ale...@superdownloads.com.br wrote: Hmmm I believe I discovered the problem. When you have something like this: 250% 6-60% you should read it from right to left and use the word MORE. MORE THAN SIX clauses, 60% are optional, MORE THAN TWO clauses (and that includes 3, 4 and 5 AND 6) half is mandatory. if you wanna a special rule for 2 terms just add: 11 250% 6-60% MORE THAN ONE clauses (2) should match 1. NOW this makes sense! 2011/8/30 Alexei Martchenko ale...@superdownloads.com.br Anyone else strugglin' with dismax's MM parameter? We're having a problem here, seems that configs from 3 terms and more are being ignored by solr and it assumes previous configs. if I use str name=mm3lt;1/str or str name=mm3lt;100%/str i get the same results for a 3-term query. If i try str name=mm4lt;25%/str or str name=mm4lt;100%/str I also get same data for a 4-term query. I'm searching: windows service pack str name=mm1lt;100% 2lt;50% 3lt;100%/str - 13000 results str name=mm1lt;100% 2lt;50% 3lt;1/str - the same 13000 results str name=mm1lt;100% 2lt;50%/str - very same 13000 results str name=mm1lt;100% 2lt;100%/str - 93 results. seems that here i get the 33 clause working. str name=mm2lt;100%/str - same 93 results, just in case. str name=mm2lt;50%/str - very same 13000 results as it should str name=mm2lt;-50%/str - 1121 results (weird) then i tried to control 3-term queries. str name=mm2lt;-50% 3lt;100%/str 1121, the same as 2-50%, ignoring the 3 clause. str name=mm2lt;-50% 3lt;1/str the same 1121 results, ignoring again it. I'd like to accomplish something like this: str name=mm2lt;1 3lt;2 4lt;3 8lt;-50%/str translating: 1 or 2 - 1 term, 3 at least 2, 4 at least 3 and 5, 6, 7, 8 terms at least half rounded up (5-3, 6-3, 7-4, 8-4) seems that he's only using 1 and 2 clauses. thanks in advance alexei -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
Re: Shingle and Query Performance
OK, I'll have to defer because this makes no sense. 4+ seconds in the debug component? Sorry I can't be more help here, but nothing really jumps out. Erick On Tue, Aug 30, 2011 at 12:45 PM, Lord Khan Han khanuniver...@gmail.com wrote: Below the output of the debug. I am measuring pure solr qtime which show in the Qtime field in solr xml. arr name=parsed_filter_queries strmrank:[0 TO 100]/str /arr lst name=timing double name=time8584.0/double lst name=prepare double name=time12.0/double lst name=org.apache.solr.handler.component.QueryComponent double name=time12.0/double /lst lst name=org.apache.solr.handler.component.FacetComponent double name=time0.0/double /lst lst name=org.apache.solr.handler.component.MoreLikeThisComponent double name=time0.0/double /lst lst name=org.apache.solr.handler.component.HighlightComponent double name=time0.0/double /lst lst name=org.apache.solr.handler.component.StatsComponent double name=time0.0/double /lst lst name=org.apache.solr.handler.component.SpellCheckComponent double name=time0.0/double /lst lst name=org.apache.solr.handler.component.DebugComponent double name=time0.0/double /lst /lst lst name=process double name=time8572.0/double lst name=org.apache.solr.handler.component.QueryComponent double name=time4480.0/double /lst lst name=org.apache.solr.handler.component.FacetComponent double name=time0.0/double /lst lst name=org.apache.solr.handler.component.MoreLikeThisComponent double name=time0.0/double /lst lst name=org.apache.solr.handler.component.HighlightComponent double name=time41.0/double /lst lst name=org.apache.solr.handler.component.StatsComponent double name=time0.0/double /lst lst name=org.apache.solr.handler.component.SpellCheckComponent double name=time0.0/double /lst lst name=org.apache.solr.handler.component.DebugComponent double name=time4051.0/double /lst On Tue, Aug 30, 2011 at 5:38 PM, Erick Erickson erickerick...@gmail.comwrote: Can we see the output if you specify both debugQuery=ondebug=true the debug=true will show the time taken up with various components, which is sometimes surprising... Second, we never asked the most basic question, what are you measuring? Is this the QTime of the returned response? (which is the time actually spent searching) or the time until the response gets back to the client, which may involve lots besides searching... Best Erick On Tue, Aug 30, 2011 at 7:59 AM, Lord Khan Han khanuniver...@gmail.com wrote: Hi Eric, Fields are lazy loading, content stored in solr and machine 32 gig.. solr has 20 gig heap. There is no swapping. As you see we have many phrases in the same query . I couldnt find a way to drop qtime to subsecends. Suprisingly non shingled test better qtime ! On Mon, Aug 29, 2011 at 3:10 PM, Erick Erickson erickerick...@gmail.com wrote: Oh, one other thing: have you profiled your machine to see if you're swapping? How much memory are you giving your JVM? What is the underlying hardware setup? Best Erick On Mon, Aug 29, 2011 at 8:09 AM, Erick Erickson erickerick...@gmail.com wrote: 200K docs and 36G index? It sounds like you're storing your documents in the Solr index. In and of itself, that shouldn't hurt your query times, *unless* you have lazy field loading turned off, have you checked that lazy field loading is enabled? Best Erick On Sun, Aug 28, 2011 at 5:30 AM, Lord Khan Han khanuniver...@gmail.com wrote: Another insteresting thing is : all one word or more word queries including phrase queries such as barack obama slower in shingle configuration. What i am doing wrong ? 
without shingle barack obama Querytime 300ms with shingle 780 ms.. On Sat, Aug 27, 2011 at 7:58 PM, Lord Khan Han khanuniver...@gmail.com wrote: Hi, What is the difference between solr 3.3 and the trunk ? I will try 3.3 and let you know the results. Here the search handler: requestHandler name=search class=solr.SearchHandler default=true lst name=defaults str name=echoParamsexplicit/str int name=rows10/int !--str name=fqcategory:vv/str-- str name=fqmrank:[0 TO 100]/str str name=echoParamsexplicit/str int name=rows10/int str name=defTypeedismax/str !--str name=qftitle^0.05 url^1.2 content^1.7 m_title^10.0/str-- str name=qftitle^1.05 url^1.2 content^1.7 m_title^10.0/str !-- str name=bfrecip(ee_score,-0.85,1,0.2)/str -- str name=pfcontent^18.0 m_title^5.0/str int name=ps1/int int name=qs0/int str name=mm2lt;-25%/str str name=spellchecktrue/str !--str name=spellcheck.collatetrue/str -- str name=spellcheck.count5/str str name=spellcheck.dictionarysubobjective/str str name=spellcheck.onlyMorePopularfalse/str str name=hl.tag.prelt;bgt;/str str name=hl.tag.postlt;/bgt;/str str
Re: Solr Geodist
That should be fine. I'm not actually sure what version of Trunk I have, I update it sporadically and build from scratch. But the last successful build artifacts will certainly have the pseudo-field return of function in it, so you should be fine. Best Erick On Tue, Aug 30, 2011 at 2:33 PM, solrnovice manisha...@yahoo.com wrote: I think i found the link to the nightly build, i am going to try this flavor of solr and run the query and check what happens. The link i am using is https://builds.apache.org/job/Solr-trunk/lastSuccessfulBuild/artifact/artifacts/ thanks SN -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Geodist-tp3287005p3296316.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Geodist
Hi Erik, today I got the distance working. Since the Solr version under LucidImagination is not returning geodist(), I downloaded Solr 4.0 from the nightly build. On Lucid we had the full schema defined, so I copied that schema to the example directory of Solr 4, removed all references to Lucid, and started the index. I wanted to try our schema under Solr 4. Then I had the data indexed (we have a rake task written in Ruby to index the contents) and ran the geodist queries, and they all run like a charm. I do get distance as a pseudo column. Is there any documentation that gives me all the arguments of geodist()? I couldn't find it online. Erick, thanks for your help in going through my examples. Now they all work on my Solr installation. thanks SN -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Geodist-tp3287005p3297088.html Sent from the Solr - User mailing list archive at Nabble.com.
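For reference, geodist() with no arguments reads the sfield and pt parameters (it can also be called as geodist(sfield,lat,lon)). A rough SolrJ sketch against a trunk/4.0 build, with the location field, point and distance made up for the example:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.ORDER;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GeoDistQuery {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("*:*");
        q.set("sfield", "store");             // hypothetical LatLonType location field
        q.set("pt", "45.15,-93.85");          // point the distances are measured from
        q.set("fq", "{!geofilt d=50}");       // optional: keep only docs within 50 km
        q.setFields("id", "dist:geodist()");  // pseudo-field alias - needs trunk/4.0
        q.addSortField("geodist()", ORDER.asc);

        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getResults());
    }
}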
Why can't I do a full-import with an entity name?
I am using Solr 1.3. I update the Solr index through a delta import every two hours, but the delta import is wasteful of database connections, so I want to use a full-import with an entity name instead of the delta import. My db-data-config.xml file: entity name=article pk=Article_ID query=select Article_ID,Article_Title,Article_Abstract from Article_Detail field name=Article_ID column=Article_ID / /entity entity name=delta_article pk=Article_ID rootEngity=false query=select Article_ID,Article_Title,Article_Abstract from Article_Detail where Article_IDgt;'${dataimporter.request.minID}' and Article_ID lt;='{dataimporter.request.maxID}' field name=Article_ID column=Article_ID / /entity Then I use http://192.168.1.98:8081/solr/db_article/dataimport?command=full-import&entity=delta_article&commit=true&clean=false&maxID=1000&minID=10 but Solr finishes nearly instantly and no records are imported, even though in fact there are many records that meet the condition of maxID and minID. The Tomcat log: INFO: [db_article] webapp=/solr path=/dataimport params={maxID=6737277&clean=false&commit=true&entity=delta_article&command=full-import&minID=6736841} status=0 QTime=0 2011-8-29 19:00:03 org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import 2011-8-29 19:00:03 org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties 2011-8-29 19:00:03 org.apache.solr.handler.dataimport.SolrWriter persistStartTime INFO: Wrote last indexed time to dataimport.properties 2011-8-29 19:00:03 org.apache.solr.handler.dataimport.DocBuilder commit INFO: Full Import completed successfully Can anybody help or offer some advice?
Re: Solr Geodist
Lucid also has an online forum for questions about the LucidWorksEnterprise product: http://www.lucidimagination.com/forum/lwe The Lucidi Imagination engineers all read the forum and endeavor to quickly answer questions like this. On Tue, Aug 30, 2011 at 6:09 PM, solrnovice manisha...@yahoo.com wrote: hi Erik, today i had the distance working. Since the solr version under LucidImagination is not returning geodist(), I downloaded Solr 4.0 from the nightly build. On lucid we had the full schema defined. So i copied that schema to the example directory of solr-4 and removed all references to Lucid and started the index. I wanted to try our schema under solr-4. Then i had the data indexed ( we have a rake written in ruby to index the contents) and ran the geodist queries and they all run like a charm. I do get distance as a pseudo column. Is there any documentation that gives me all the arguments of geodist(), i couldnt find it online. Erick, thanks for your help in going through my examples. NOw they all work on my solr installation. thanks SN -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Geodist-tp3287005p3297088.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
Re: Changing the DocCollector
So I looked at doing this, but I don't see a way to get the scores from the docs as well. Am I missing something in that regards? On Mon, Aug 29, 2011 at 8:53 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks Hoss. I am actually ok with that, I think something like 50,000 results from each shard as a max would be reasonable since my check takes about 1s for 50,000 records. I'll give this a whirl and see how it goes. On Mon, Aug 29, 2011 at 6:46 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Also I see that this is before sorting, is there a way to do something : similar after sorting? The reason is that I'm ok with the total : result not being completely accurate so long as the first say 10 pages : are accurate. The results could get more accurate as you page through : them though. Does that make sense? munging results after sorting is dangerous in the general case, but if you have a specific usecase where you're okay with only garunteeing accurate results up to result #X, then you might be able to get away with something like... * custom SearchComponent * configure to run after QueryComponent * in prepare, record the start rows params, and replace them with 0 (MAX_PAGE_NUM * rows) * in process, iterate over the the DocList and build up your own new DocSlice based on the docs that match your special criteria - then use the original start/rows to generate a subset and return that ...getting this to play nicely with stuff like faceting be possible with more work, and manipulation of the DocSet (assuming you're okay with the facet counts only being as accurate as much as the DocList is -- filtered up to row X). it could fail misserablly with distributed search since you hvae no idea how many results will pass your filter. (note: this is all off the top of my head ... no idea if it would actually work) -Hoss
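On the score part of the question: if scores were requested when the DocList was built (for example because the field list includes score), they can be read off the DocIterator. A very rough 3.x-era skeleton of the component side, under that assumption; the filtering logic itself is left as a stub.

import java.io.IOException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;

public class FilteringComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // record/override start and rows here, as in the earlier outline
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        DocList docs = rb.getResults().docList;
        boolean hasScores = docs.hasScores();   // true only if scores were requested

        DocIterator it = docs.iterator();
        while (it.hasNext()) {
            int docId = it.nextDoc();                         // internal Lucene doc id
            float score = hasScores ? it.score() : Float.NaN;
            // apply the custom check on docId/score, collect survivors,
            // then rebuild a DocSlice and set it back on rb.getResults().docList
        }
    }

    // SolrInfoMBean boilerplate (3.x)
    public String getDescription() { return "post-sort filtering sketch"; }
    public String getSource() { return ""; }
    public String getSourceId() { return ""; }
    public String getVersion() { return ""; }
}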
Re: How to send an OpenBitSet object from Solr server?
I was not referring to Lucene's doc ids but the doc numbers (unique key) Satish On Tue, Aug 30, 2011 at 9:28 PM, Chris Hostetter hossman_luc...@fucit.orgwrote: : We have a need to query and fetch millions of document ids from a Solr 3.3 : index and convert the same to a BitSet. To speed things up, we want to : convert these document ids into OpenBitSet on the server side, put them into : the response object and read the same on the client side. This smells like an XY Problem ... what do you intend to do with this BitSet on the client side? the lucene doc ids are meaningless outside of hte server, and for any given doc, the id could change from one request to the next -- so how would having this data on the clinet be of any use to you? https://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
SolrCloud
Hi all: I'm using SolrCloud for distributed search. Everything works well, but there is a small problem. Each node searches quickly, but when the data is gathered for the top-level request I see a request like q=solr&ids=id1,id2,id3,id4,id5,id6...,id10. Solr handles this request with a 'for' loop, and each id costs 300ms or so. If there are 100 per page, the cost won't be tolerable. Also, I'm studying SolrCloud hard because of the lack of documentation; I have only found http://wiki.apache.org/solr/SolrCloud. Are there any others? Hoping for your answer.
Re: Solr Geodist
hi Lance, thanks for the link, i went to their site, lucidimagination forum, when i searched on geodist, i see my own posts. Is this forum part of lucidimagination? Just curious. thanks SN -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Geodist-tp3287005p3297262.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Changing the DocCollector
Found score, so this works for regular queries but now I'm getting an exception when faceting. SEVERE: Exception during facet.field of type:java.lang.NullPointerException at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:451) at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:313) at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:357) at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:191) at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:81) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:231) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1290) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Any insight into what would cause that? On Tue, Aug 30, 2011 at 10:13 PM, Jamie Johnson jej2...@gmail.com wrote: So I looked at doing this, but I don't see a way to get the scores from the docs as well. Am I missing something in that regards? On Mon, Aug 29, 2011 at 8:53 PM, Jamie Johnson jej2...@gmail.com wrote: Thanks Hoss. I am actually ok with that, I think something like 50,000 results from each shard as a max would be reasonable since my check takes about 1s for 50,000 records. I'll give this a whirl and see how it goes. On Mon, Aug 29, 2011 at 6:46 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Also I see that this is before sorting, is there a way to do something : similar after sorting? The reason is that I'm ok with the total : result not being completely accurate so long as the first say 10 pages : are accurate. The results could get more accurate as you page through : them though. Does that make sense? munging results after sorting is dangerous in the general case, but if you have a specific usecase where you're okay with only garunteeing accurate results up to result #X, then you might be able to get away with something like... 
* custom SearchComponent * configure to run after QueryComponent * in prepare, record the start rows params, and replace them with 0 (MAX_PAGE_NUM * rows) * in process, iterate over the the DocList and build up your own new DocSlice based on the docs that match your special criteria - then use the original start/rows to generate a subset and return that ...getting this to play nicely with stuff like faceting be possible with more work, and manipulation of the DocSet (assuming you're okay with the facet counts only being as accurate as much as the DocList is -- filtered up to row X). it could fail misserablly with distributed search since you hvae no idea how many results will pass your filter. (note: this is all off the top of my head ... no idea if it would actually work) -Hoss
Duplication of Output
Hello, What is the best way to remove duplicate values in the output? I am using the following query: /solr/select/?q=wrt54g2&version=2.2&start=0&rows=10&indent=on&fl=productid And I get the following results: doc int name=productid1011630553/int /doc doc int name=productid1011630553/int /doc doc int name=productid1011630553/int /doc doc int name=productid1011630553/int /doc doc int name=productid1011630553/int /doc doc int name=productid1011630553/int /doc doc int name=productid1011630553/int /doc doc int name=productid1013033708/int /doc doc int name=productid1013033708/int /doc doc int name=productid1013033708/int /doc But I don't want those results because there are duplicates. I am looking for results like below: doc int name=productid1011630553/int /doc doc int name=productid1013033708/int /doc I know there is deduplication and field collapsing, but I am not sure if they are applicable in this situation. Thanks for your help!
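Field collapsing (grouping) in 3.3 is the usual query-time answer here; a rough SolrJ sketch, with the grouping parameter values as assumptions (group.sort, not shown, controls which document represents each group):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CollapseDuplicates {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("wrt54g2");
        q.setFields("productid");
        q.setRows(10);
        q.set("group", "true");
        q.set("group.field", "productid");  // one group per distinct productid
        q.set("group.limit", "1");          // keep only the top document of each group

        QueryResponse rsp = solr.query(q);
        // With grouping on, results come back under "grouped" rather than "response"
        System.out.println(rsp.getResponse().get("grouped"));
    }
}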
Re: How to get all the terms in a document as Luke does?
The Term Vector Component (TVC) is a SearchComponent designed to return information about documents that is stored when setting the termVector attribute on a field: Will I have to re-index after adding that to the schema? On Tue, Aug 30, 2011 at 11:06 PM, Jayendra Patil jayendra.patil@gmail.com wrote: you might want to check - http://wiki.apache.org/solr/TermVectorComponent Should provide you with the term vectors with a lot of additional info. Regards, Jayendra On Tue, Aug 30, 2011 at 3:34 AM, Gabriele Kahlout gabri...@mysimpatico.com wrote: Hello, This time I'm trying to duplicate Luke's functionality of knowing which terms occur in a search result/document (w/o parsing it again). Any Solrj API to do that? P.S. I've also posted the question on SOhttp://stackoverflow.com/q/7219111/300248 . On Wed, Jul 6, 2011 at 11:09 AM, Gabriele Kahlout gabri...@mysimpatico.comwrote: From you patch I see TermFreqVector which provides the information I want. I also found FieldInvertState.getLength() which seems to be exactly what I want. I'm after the word count (sum of tf for every term in the doc). I'm just not sure whether FieldInvertState.getLength() returns just the number of terms (not multiplied by the frequency of each term - word count) or not though. It seems as if it returns word count, but I've not tested it sufficienctly. On Wed, Jul 6, 2011 at 1:39 AM, Trey Grainger the.apache.t...@gmail.comwrote: Gabriele, I created a patch that does this about a year ago. See https://issues.apache.org/jira/browse/SOLR-1837. It was written for Solr 1.4 and is based upon the Document Reconstructor in Luke. The patch adds a link to the main solr admin page to a docinspector page which will reconstruct the document given a uniqueid (required). Keep in mind that you're only looking at what's in the index for non-stored fields, not the original text. If you have any issues using this on the most recent release, let me know and I'd be happy to create a new patch for solr 3.3. One of these days I'll remove the JSP dependency and this may eventually making it into trunk. Thanks, -Trey Grainger Search Technology Development Team Lead, Careerbuilder.com Site Architect, Celiaccess.com On Tue, Jul 5, 2011 at 3:59 PM, Gabriele Kahlout gabri...@mysimpatico.comwrote: Hello, With an inverted index the term is the key, and the documents are the values. Is it still however possible that given a document id I get the terms indexed for that document? -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)). -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. 
y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)). -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)). -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y.
shareSchema=true - location of schema.xml?
I have 1000's of cores and, to reduce the cost of loading/unloading schema.xml, I have my solr.xml as mentioned here - http://wiki.apache.org/solr/CoreAdmin namely: solr cores adminPath=/admin/cores shareSchema=true ... /cores /solr However, I am not sure where to keep the common schema.xml file. In that case, do I still need a schema.xml in the conf folder of each and every core? My folder structure is:
multicore (contains solr.xml)
|_ core0
|  |_ conf
|     |_ schema.xml
|     |_ solrconfig.xml
|     |_ other files
|_ core1
|  |_ conf
|     |_ schema.xml
|     |_ solrconfig.xml
|     |_ other files
|_ exampledocs (contains 1000's of .csv files and post.jar)
Satish