cleanup after OutOfMemoryError
I have an application where I am calling DirectUpdateHandler2 directly with:

  update.addDoc(cmd);

This will sometimes hit:

  java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:248)
    at org.apache.lucene.store.DataOutput.writeString(DataOutput.java:234)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.writeField(CompressingStoredFieldsWriter.java:273)
    at org.apache.lucene.index.StoredFieldsProcessor.finishDocument(StoredFieldsProcessor.java:126)
    at org.apache.lucene.index.TwoStoredFieldsConsumers.finishDocument(TwoStoredFieldsConsumers.java:65)
    at org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:264)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:283)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:212)
    at voyager.index.zmq.IndexingRunner.apply(IndexingRunner.java:303)

and then a little while later:

  auto commit error...: java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
    at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:549)
    at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

Is there anything I can/should do to clean up after the OOME? At a minimum I do not want any new requests using the same IndexWriter. Should I use:

  catch(OutOfMemoryError ex) {
    update.getCommitTracker().cancelPendingCommit();
    update.newIndexWriter(false);
    ...

or perhaps 'true' for rollback?

Thanks
Ryan
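For reference, a minimal sketch of the recovery path proposed above, using only the DirectUpdateHandler2/CommitTracker calls already named; whether newIndexWriter should get true or false is exactly the open question:

  try {
    update.addDoc(cmd);
  } catch (OutOfMemoryError ex) {
    // Lucene marks the IndexWriter unusable after an OOME, so stop the
    // pending auto-commit and swap in a fresh writer before any new
    // request touches it; 'true' would also roll back uncommitted changes.
    update.getCommitTracker().cancelPendingCommit();
    update.newIndexWriter(true);
    throw ex; // let the indexing pipeline see the failure
  }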
NRT persistent flags?
I'm looking for a way to quickly flag/unflag documents. This could be one at a time or by query (even *:*).

I have hacked together something based on ExternalFileField that is essentially an FST holding all the ids (solr, not lucene). Like the FieldCache, it holds a WeakHashMap<AtomicReader, OpenBitSet> where the OpenBitSet is loaded by iterating the FST on the reader (just like ExternalFileField).

This seems to work OK, but there *must* be something better! Any ideas on the right approach for something like this? This feels like it should be related to DocValues or the FieldCache.

Thanks for any pointers!
ryan
edismax bq, ignore tf/idf?
Hi- I am trying to add a setting that will boost results based on existence in different buckets. Using edismax, I added the bq parameter:

  location:A^5 location:B^3

I want this to put everything in location A above everything in location B. This mostly works, BUT depending on the number of matches for each location, location:B can get a higher final score. Is there a way to ignore tf/idf when boosting on this location field?

location comes from a field type with: class="solr.StrField" omitNorms="true"

Thanks for any pointers!
ryan
Re: edismax bq, ignore tf/idf?
thanks!

On Fri, Oct 26, 2012 at 4:20 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: How about a boost function, bf or boost?
:
: bf=if(exists(query(location:A)),5,if(exists(query(location:B)),3,0))

Right ... assuming you only want to ignore tf/idf on these fields in this specific context, function queries are the way to go -- otherwise you could just use a per-field similarity to ignore tf/idf.

I would suggest however that instead of using the exists(query()) you consider the tf() function ...

  bf=if(tf(location,A),5,0)
  bf=if(tf(location,B),3,0)

s/bf/boost/g and s/0/1/g if you want multiplicative boosts.

-Hoss
Re: Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser
If you optimize the index, are the results the same? Maybe it is showing counts for deleted docs (i think it does... and this is expected)

ryan

On Sat, Aug 25, 2012 at 9:57 AM, Fuad Efendi f...@efendi.ca wrote:

This is a bug in Solr 4.0.0-Beta Schema Browser: Load Term Info shows 9682 News, but a direct query shows 3577.

  /solr/core0/select?q=channel:News&facet=true&facet.field=channel&rows=0

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
      <lst name="params">
        <str name="facet">true</str>
        <str name="q">channel:News</str>
        <str name="facet.field">channel</str>
        <str name="rows">0</str>
      </lst>
    </lst>
    <result name="response" numFound="3577" start="0"/>
    <lst name="facet_counts">
      <lst name="facet_queries"/>
      <lst name="facet_fields">
        <lst name="channel">
          <int name="News">3577</int>
          <int name="Blogs">0</int>
          <int name="Message Boards">0</int>
          <int name="Video">0</int>
        </lst>
      </lst>
      <lst name="facet_dates"/>
      <lst name="facet_ranges"/>
    </lst>
  </response>

-----Original Message-----
Sent: August-24-12 11:29 PM
To: solr-user@lucene.apache.org
Cc: sole-...@lucene.apache.org
Subject: RE: Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser

Any news? CC: Dev

-----Original Message-----
Subject: Solr-4.0.0-Beta Bug with Load Term Info in Schema Browser

Hi there, Load Term Info shows 3650 for a specific term MyTerm, and when I execute the query channel:MyTerm it shows 650 documents found… possibly a bug… it happens after I commit data too, nothing changes; and this field is a single-valued non-tokenized string.

-Fuad

--
Fuad Efendi
416-993-2060
http://www.tokenizer.ca
Re: ContentStreamUpdateRequest method addFile in 4.0 release.
for the ExtractingRequestHandler, you can put anything into the request contentType. try:

  addFile(file, "application/octet-stream");

but anything should work

ryan

On Thu, Jun 7, 2012 at 2:32 PM, Koorosh Vakhshoori kvakhsho...@gmail.com wrote:

In the latest 4.0 release, the addFile() method has a new argument 'contentType':

  addFile(File file, String contentType)

In the context of Solr Cell, how should the addFile() method be called? Specifically I refer to the Wiki example:

  ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
  up.addFile(new File("mailing_lists.pdf"));
  up.setParam("literal.id", "mailing_lists.pdf");
  up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
  result = server.request(up);
  assertNotNull("Couldn't upload mailing_lists.pdf", result);
  rsp = server.query(new SolrQuery("*:*"));
  Assert.assertEquals(1, rsp.getResults().getNumFound());

given at URL: http://wiki.apache.org/solr/ExtractingRequestHandler

Since Solr Cell is calling Tika under the hood, isn't the file content-type already identified by Tika? Looking at the code, it seems passing NULL would do the job, is that correct? Also for Solr Cell, is the ContentStreamUpdateRequest class the right one to use or is there a different class that is more appropriate here?

Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/ContentStreamUpdateRequest-method-addFile-in-4-0-release-tp3988344.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: - Solr 4.0 - How do I enable JSP support ? ...
In 4.0, solr no longer uses JSP, so it is not enabled in the example setup. You can enable JSP in your servlet container using whatever method it provides. For Jetty, using start.jar, you need to add it on the command line:

  java -jar start.jar -OPTIONS=jsp

ryan

On Mon, May 14, 2012 at 2:34 PM, Naga Vijayapuram nvija...@tibco.com wrote:

Hello, How do I enable JSP support in Solr 4.0 ?

Thanks
Naga
Re: - Solr 4.0 - How do I enable JSP support ? ...
just use the admin UI -- look at the 'cloud' tab

On Tue, May 15, 2012 at 12:53 PM, Naga Vijayapuram nvija...@tibco.com wrote:

Alright; thanks. Tried with -OPTIONS=jsp and am still seeing this on console …

  2012-05-15 12:47:08.837:INFO:solr:No JSP support. Check that JSP jars are in lib/jsp and that the JSP option has been specified to start.jar

I am trying to go after http://localhost:8983/solr/collection1/admin/zookeeper.jsp (or its equivalent in 4.0) after going through http://wiki.apache.org/solr/SolrCloud

May I know the right zookeeper url in 4.0 please?

Thanks
Naga

On 5/15/12 10:56 AM, Ryan McKinley ryan...@gmail.com wrote:

In 4.0, solr no longer uses JSP, so it is not enabled in the example setup. You can enable JSP in your servlet container using whatever method it provides. For Jetty, using start.jar, you need to add it on the command line:

  java -jar start.jar -OPTIONS=jsp

ryan

On Mon, May 14, 2012 at 2:34 PM, Naga Vijayapuram nvija...@tibco.com wrote:
Hello, How do I enable JSP support in Solr 4.0 ? Thanks Naga
Re: syntax for negative query OR something
thanks!

On Wed, May 2, 2012 at 4:43 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: How do I search for things that have no value or a specified value?

Things with no value...
  (*:* -fieldName:[* TO *])
Things with a specific value...
  fieldName:A
Things with no value or a specific value...
  (*:* -fieldName:[* TO *]) fieldName:A
...or if you aren't using OR as your default op...
  (*:* -fieldName:[* TO *]) OR fieldName:A

: I have a few variations of:
: -fname:[* TO *] OR fname:(A B C)

that is just syntactic sugar for...

  -fname:[* TO *] fname:(A B C)

which is an empty set. you need to be explicit that the "exclude docs with a value in this field" clause should be applied to the set of all documents

-Hoss
Re: Latest solr4 snapshot seems to be giving me a lot of unhappy logging about 'Log4j', should I be concerned?
check a release since r1332752. If things still look problematic, post a comment on: https://issues.apache.org/jira/browse/SOLR-3426 -- this should now have a less verbose message with an older SLF4J and with Log4j

On Tue, May 1, 2012 at 10:14 AM, Gopal Patwa gopalpa...@gmail.com wrote:

I have a similar issue using log4j for logging with a trunk build; the CoreContainer class prints a big stack trace on our jboss 4.2.2 startup. I am using slf4j 1.5.2

  10:07:45,918 WARN [CoreContainer] Unable to read SLF4J version
  java.lang.NoSuchMethodError: org.slf4j.impl.StaticLoggerBinder.getSingleton()Lorg/slf4j/impl/StaticLoggerBinder;
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:395)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:355)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:304)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:101)

On Tue, May 1, 2012 at 9:25 AM, Benson Margulies bimargul...@gmail.com wrote:

On Tue, May 1, 2012 at 12:16 PM, Mark Miller markrmil...@gmail.com wrote:

There is a recent JIRA issue about keeping the last n logs to display in the admin UI. That introduced a problem - and then the fix introduced a problem - and then the fix mitigated the problem but left that ugly logging as a by-product. Don't remember the issue # offhand. I think there was a dispute about what should be done with it.

On May 1, 2012, at 11:14 AM, Benson Margulies wrote:

CoreContainer.java, in the method 'load', finds itself calling loader.newInstance with an 'fname' of 'Log4j' if the slf4j backend is Log4j. Couldn't someone just fix the if statement to say, 'OK, if we're doing log4j, we have no log watcher' and skip all the loud failing on the way? e.g.:

  2012-05-01 10:40:32,367 org.apache.solr.core.CoreContainer - Unable to load LogWatcher
  org.apache.solr.common.SolrException: Error loading class 'Log4j'

What is it actually looking for? Have I misplaced something?

- Mark Miller
lucidimagination.com
Re: Ampersand issue
If your json value is &, the proper xml value is &amp;. What is the value you are setting on the stored field? is it & or &amp;?

On Mon, Apr 30, 2012 at 12:57 PM, William Bell billnb...@gmail.com wrote:

One idea was to wrap the field with CDATA. Or base64 encode it.

On Fri, Apr 27, 2012 at 7:50 PM, Bill Bell billnb...@gmail.com wrote:

We are indexing a simple XML field from SQL Server into Solr as a stored field. We have noticed that the & is output as &amp; when using wt=XML. When using wt=JSON we get the normal &. Is there a way to indicate that we don't want to encode the field, since it is already XML, when using wt=XML?

Bill Bell
Sent from mobile

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
Re: EmbeddedSolrServer and StreamingUpdateSolrServer
In general -- i would not suggest mixing EmbeddedSolrServer with a different style (unless the other instances are read only). If you have multiple instances writing to the same files on disk you are asking for problems.

Have you tried just using StreamingUpdateSolrServer for the daily update? I would suspect that it would be faster than EmbeddedSolrServer anyway.

ryan

On Wed, Apr 25, 2012 at 11:32 PM, pcrao purn...@gmail.com wrote:

Hi, Any more thoughts??

Thanks,
PC Rao.

--
View this message in context: http://lucene.472066.n3.nabble.com/EmbeddedSolrServer-and-StreamingUpdateSolrServer-tp3889073p3940383.html
Sent from the Solr - User mailing list archive at Nabble.com.
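A sketch of the StreamingUpdateSolrServer setup suggested above; the URL, queue size, and thread count are illustrative:

  import java.io.IOException;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class DailyUpdate {
    public static void main(String[] args) throws IOException, SolrServerException {
      // buffer up to 100 docs, drained by 4 background threads
      SolrServer server = new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 4);
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-1");
      server.add(doc);  // queued and sent asynchronously
      server.commit();  // flushes the queue, then commits
    }
  }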
Re: Boosting fields in SOLR using Solrj
I would suggest debugging with browser requests -- then switching to Solrj after you are at 1st base. In particular, try adding the debugQuery=true parameter to the request and see what solr thinks is happening.

The value that will work for the 'qt' parameter depends on what is configured in solrconfig.xml -- I suspect you want to point to a requestHandler that is configured to use the edismax query parser. This can be configured by default with:

  <lst name="defaults">
    <str name="defType">edismax</str>
  </lst>

ryan

On Wed, Apr 25, 2012 at 3:57 PM, Joe joe.pol...@gmail.com wrote:

Hi, I'm using the solrj API to query my SOLR 3.6 index. I have multiple text fields, which I would like to weight differently. From what I've read, I should be able to do this using the dismax or edismax query types. I've tried the following:

  SolrQuery query = new SolrQuery();
  query.setQuery("title:apples oranges content:apples oranges");
  query.setQueryType("edismax");
  query.set("qf", "title^10.0 content^1.0");
  QueryResponse rsp = m_Server.query(query);

But this doesn't work. I've tried the following variations to set the query type, but it doesn't seem to make a difference.

  query.setQueryType("dismax");
  query.set("qt", "dismax");
  query.set("type", "edismax");
  query.set("qt", "edismax");
  query.set("type", "dismax");

I'd like to retain the full Lucene query syntax, so I prefer ExtendedDisMax to DisMax. Boosting individual terms in the query (as shown below) does work, but is not a valid solution, since the queries are automatically generated and can get arbitrarily complex in syntax.

  query.setQuery("title:apples^10.0 oranges^10.0 content:apples oranges");

Any help would be much appreciated.

--
View this message in context: http://lucene.472066.n3.nabble.com/Boosting-fields-in-SOLR-using-Solrj-tp3939789p3939789.html
Sent from the Solr - User mailing list archive at Nabble.com.
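A concrete form of the browser debugging suggested above, assuming the stock /select handler (the field names come from the question):

  http://localhost:8983/solr/select?q=apples+oranges&defType=edismax&qf=title^10.0+content^1.0&debugQuery=true

The parsedquery section of the debug output shows whether edismax actually handled the request.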
Re: 'No JSP support' error in embedded Jetty for solrCloud as of apache-solr-4.0-2012-04-02_11-54-55
zookeeper.jsp was removed (along with all JSP stuff) in trunk. Take a look at the cloud tab in the UI, or check the /zookeeper servlet for the raw JSON output

ryan

On Mon, Apr 9, 2012 at 6:42 AM, Benson Margulies bimargul...@gmail.com wrote:

Starting the leader with:

  java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=rnicloud -DzkRun -DnumShards=3 -Djetty.port=9167 -jar start.jar

and browsing to http://localhost:9167/solr/rnicloud/admin/zookeeper.jsp

I get:

  HTTP ERROR 500
  Problem accessing /solr/rnicloud/admin/zookeeper.jsp. Reason: JSP support not configured
  Powered by Jetty://
Re: SolrCloud Zookeeper view does not work on latest snapshot
There have been a bunch of changes getting the zookeeper info and UI looking good. The info moved from being on the core to using a servlet at the root level. Note, it is not a request handler anymore, so the wt=XXX has no effect. It is always JSON

ryan

On Fri, Apr 6, 2012 at 7:01 AM, Jamie Johnson jej2...@gmail.com wrote:

I looked at our old system and indeed it used to make a call to /solr/zookeeper not /solr/corename/zookeeper. I am making a change locally so I can run with this, but is this a bug or did I muck something up with my configuration?

On Fri, Apr 6, 2012 at 9:33 AM, Jamie Johnson jej2...@gmail.com wrote:

I just downloaded the latest snapshot and fired it up to take a look around and I'm getting the following error when looking at the Cloud view.

  Loading of undefined failed with HTTP-Status 404

The request I see going out is as follows:

  http://localhost:8501/solr/slice1_shard1/zookeeper?wt=json

this doesn't work but this does:

  http://localhost:8501/solr/zookeeper?wt=json

Any thoughts why this would happen?
Re: solr geospatial / spatial4j
On Wed, Mar 7, 2012 at 7:25 AM, Matt Mitchell goodie...@gmail.com wrote:

> Hi, I'm researching options for handling a better geospatial solution. I'm currently using Solr 3.5 for a read-only database, and the point/radius searches work great. But I'd like to start doing point in polygon searches as well. I've skimmed through some of the geospatial jira issues, and read about spatial4j, which is very interesting. I see on the github page that this will soon be part of lucene, can anyone confirm this?

perhaps -- see the discussion on: https://issues.apache.org/jira/browse/LUCENE-3795

This will involve a few steps before it is actually integrated with the lucene project -- and then a few more to be usable from solr

> I attempted to build the spatial4j demo but no luck. It had problems finding lucene 4.0-SNAPSHOT, which I guess is because there are no 4.0-SNAPSHOT nightly builds? If anyone knows how I can get around this, please let me know!

ya they are published -- you just have to specify where you want to pull them from. If you use the 'updateLucene' profile, it will pull them from: https://repository.apache.org/content/groups/snapshots/

use:

  mvn clean install -P updateLucene

> Other than spatial4j, is there a way to do point in polygon searches with solr 3.5.0 right now? Is there some tricky indexing/querying strategy that would allow this?

I don't know of anything else -- and note that polygon stuff has a ways to go before it is generally ready for prime-time.

ryan
Re: Improving performance for SOLR geo queries?
Hi Matthias-

I'm trying to understand how you have your data indexed so we can give reasonable direction. What field type are you using for your locations? Is it using the solr spatial field types? What do you see when you look at the debug information from debugQuery=true?

From my experience, there is no single best practice for spatial queries -- it will depend on your data density and distribution. You may also want to look at: http://code.google.com/p/lucene-spatial-playground/ but note this is off lucene trunk -- the geohash queries are super fast though

ryan

2012/2/8 Matthias Käppler matth...@qype.com:

Hi Erick,

if we're not doing geo searches, we filter by location tags that we attach to places. This is simply a hierarchical regional id, which is simple to filter for, but much less flexible. We use that on Web a lot, but not on mobile, where we want to perform searches in arbitrary radii around arbitrary positions. For those location tag kind of queries, the average time spent in SOLR is 43msec (I'm looking at the New Relic snapshot of the last 12 hours).

I have disabled our optimization again just yesterday, so for the bbox queries we're now at an avg of 220ms (same time window). That's a 5 fold increase in response time, and in peak hours it's worse than that. I've also found a blog post from 3 years ago which outlines the inner workings of the SOLR spatial indexing and searching: http://www.searchworkings.org/blog/-/blogs/23842

From that it seems as if SOLR already performs a similar optimization we had in mind during the index step, so if I understand correctly, it doesn't even search over all records, only those that were mapped to the grid box identified during indexing.

What I would love to see is what the suggested way is to perform a geo query on SOLR, considering that they're so difficult to cache and expensive to run. Is the best approach to restrict the candidate set as much as possible using cheap filter queries, so that SOLR merely has to do the geo search against these subsets? How does the query planner work here? I see there's a cost attached to a filter query, but one can only set it when cache is set to false? Are cached geo queries executed last when there are cheaper filter queries to cut down on documents?

If you have a real world practical setup to share, one that performs well in a production environment that serves requests in the millions per day, that would be great.

I'd love to contribute documentation by the way, if you knew me you'd know I'm an avid open source contributor and actually run several open source projects myself. But tell me, how can I possibly contribute answers to questions I don't have an answer to? That's why I'm here, remember :) So please, these kinds of snippy replies are not helping anyone.

Thanks
-Matthias

On Tue, Feb 7, 2012 at 3:06 PM, Erick Erickson erickerick...@gmail.com wrote:

So the obvious question is what is your performance like without the distance filters? Without that knowledge, we have no clue whether the modifications you've made had any hope of speeding up your response times. As for the docs, any improvements you'd like to contribute would be happily received.

Best
Erick

2012/2/6 Matthias Käppler matth...@qype.com:

Hi,

we need to perform fast geo lookups on an index of ~13M places, and were running into performance problems here with SOLR. We haven't done a lot of query optimization / SOLR tuning up until now so there's probably a lot of things we're missing. I was wondering if you could give me some feedback on the way we do things, whether they make sense, and especially why a supposed optimization we implemented recently seems to have no effect, when we actually thought it would help a lot.

What we do is this: our API is built on a Rails stack and talks to SOLR via a Ruby wrapper. We have a few filters that almost always apply, which we put in filter queries. Filter cache hit rate is excellent, about 97%, and cache size caps at 10k filters (max size is 32k, but it never seems to reach that many, probably because we replicate / delta update every few minutes). Still, geo queries are slow, about 250-500msec on average. We send them with cache=false, so as to not flood the fq cache and cause undesirable evictions.

Now our idea was this: while the actual geo queries are poorly cacheable, we could clearly identify geographical regions which are more often queried than others (naturally, since we're a user driven service). Therefore, we dynamically partition Earth into a static grid of overlapping boxes, where the grid size (the distance of the nodes) depends on the maximum allowed search radius. That way, for every user query, we would always be able to identify a single bounding box that covers it. This larger bounding box (200km edge length) we would send to SOLR as a cached filter query, along with the actual user query
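To make the filter-query pattern in this thread concrete, a hedged example combining a cheap cached filter with a non-cached, cost-ordered geo filter (assumes a solr.LatLonType field named 'store'; all names and values are illustrative):

  q=*:*
    &fq=region_tag:4711
    &fq={!geofilt cache=false cost=100 sfield=store pt=48.13,11.57 d=10}

The cached region filter is applied first; cache=false keeps the geo filter out of the filterCache, and the higher cost pushes it after the cheaper filters.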
Best approach to Intersect results with big Set<String>?
I have an application where I need to return all results that are not in a Set<String> (the Set is managed from hazelcast... but that is not relevant)

As a first approach, i have a SearchComponent that injects a BooleanQuery:

  BooleanQuery bq = new BooleanQuery(true);
  for( String id : ids ) {
    bq.add(new BooleanClause(new TermQuery(new Term("id", id)), Occur.MUST_NOT));
  }

This works, but i'm concerned about how many terms we could end up with as the size grows.

Another possibility could be a Filter that iterates through the FieldCache and checks if each value is in the Set<String>.

Any thoughts/directions on things to look at?

thanks
ryan
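For the FieldCache idea at the end, a minimal sketch against the Lucene 3.x Filter API, assuming a single-valued string 'id' field (one possible shape, not a vetted implementation):

  import java.io.IOException;
  import java.util.Set;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.search.DocIdSet;
  import org.apache.lucene.search.FieldCache;
  import org.apache.lucene.search.Filter;
  import org.apache.lucene.util.OpenBitSet;

  // Accepts every doc whose id is NOT in the excluded set.
  public class ExcludeIdsFilter extends Filter {
    private final Set<String> excluded;

    public ExcludeIdsFilter(Set<String> excluded) {
      this.excluded = excluded;
    }

    @Override
    public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
      // one id per document, loaded (and cached) by the FieldCache
      String[] ids = FieldCache.DEFAULT.getStrings(reader, "id");
      OpenBitSet bits = new OpenBitSet(reader.maxDoc());
      for (int doc = 0; doc < ids.length; doc++) {
        if (ids[doc] != null && !excluded.contains(ids[doc])) {
          bits.set(doc);
        }
      }
      return bits; // OpenBitSet is itself a DocIdSet
    }
  }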
Re: Using FieldCache in SolrIndexSearcher - crazy idea?
Ah, thanks Hoss - I had meant to respond to the original email, but then I lost track of it. Via pseudo-fields, we actually already have the ability to retrieve values via FieldCache. fl=id:{!func}id But using CSF would probably be better here - no memory overhead for the FieldCache entry. Not sure if this is related, but we should also consider using the memory codec for id field https://issues.apache.org/jira/browse/LUCENE-3209
Re: Is solrj 3.3.0 ready for field collapsing?
patches are always welcome!

On Tue, Jul 5, 2011 at 3:04 PM, Yonik Seeley yo...@lucidimagination.com wrote:

On Mon, Jul 4, 2011 at 11:54 AM, Per Newgro per.new...@gmx.ch wrote:

I've tried to add the params for group=true and group.field=myfield by using the SolrQuery. But the result is null. Do I have to configure something? In the wiki section for field collapsing I couldn't find anything.

No specific (type-safe) support for grouping is in SolrJ currently. But you should still have access to the complete generic solr response via SolrJ regardless (i.e. use getResponse())

-Yonik
http://www.lucidimagination.com
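A sketch of the generic-response route Yonik describes, assuming a grouping-capable build; the field name is illustrative and the casts are unchecked:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.util.NamedList;

  SolrQuery query = new SolrQuery("*:*");
  query.set("group", true);
  query.set("group.field", "myfield");
  QueryResponse rsp = server.query(query);

  // no typed accessors yet -- walk the raw response instead
  NamedList<Object> grouped = (NamedList<Object>) rsp.getResponse().get("grouped");
  NamedList<Object> byField = (NamedList<Object>) grouped.get("myfield");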
Re: JOIN, query on the parent?
On Fri, Jul 1, 2011 at 9:06 AM, Yonik Seeley yo...@lucidimagination.com wrote:

On Thu, Jun 30, 2011 at 6:19 PM, Ryan McKinley ryan...@gmail.com wrote:

Hello- I'm looking for a way to find all the links from a set of results. Consider:

  <doc> id:1 type:X link:a link:b </doc>
  <doc> id:2 type:X link:a link:c </doc>
  <doc> id:3 type:Y link:a </doc>

Is there a way to search for all the links from stuff of type X -- in this case (a,b,c)

Do the links point to other documents somehow? Let's assume that there are documents with ids of a,b,c

  fq={!join from=link to=id}type:X

Basically, you start with the set of documents that match type:X, then follow from link to id to arrive at the new set of documents.

Yup -- that works. Thank you!

ryan
JOIN, query on the parent?
Hello-

I'm looking for a way to find all the links from a set of results. Consider:

  <doc> id:1 type:X link:a link:b </doc>
  <doc> id:2 type:X link:a link:c </doc>
  <doc> id:3 type:Y link:a </doc>

Is there a way to search for all the links from stuff of type X -- in this case (a,b,c)

If I'm understanding the {!join stuff, it lets you search on the children, but i don't really see how to limit the parent values. Am I missing something, or is this a further extension to the JoinQParser?

thanks
ryan
Re: Solr: Images, Docs and Binary data
You can store binary data using a binary field type -- then you need to send the data base64 encoded. I would strongly recommend against storing large binary files in solr -- unless you really don't care about performance -- the file system is a good option that springs to mind.

ryan

2011/4/6 Ezequiel Calderara ezech...@gmail.com:

Another question that maybe is easier to answer: how can i store binary data? Any example schema?

2011/4/6 Ezequiel Calderara ezech...@gmail.com

Hello everyone, i need to know if someone has used solr for indexing and storing images (up to 16MB) or binary docs. How does solr behave with this type of docs? How does it affect performance?

Thanks Everyone

--
__
Ezequiel.
Http://www.ironicnet.com
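A sketch of the base64 route for a small payload, assuming a solr.BinaryField-backed field named 'data' in the schema and commons-codec on the classpath (both are assumptions, not from the thread):

  import org.apache.commons.codec.binary.Base64;
  import org.apache.solr.common.SolrInputDocument;

  byte[] raw = new byte[]{ /* the (small!) binary payload */ };
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField("id", "img-1");
  doc.addField("data", Base64.encodeBase64String(raw)); // base64 text for the binary field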
Re: [WKT] Spatial Searching
> Does anyone know of a patch or even when this functionality might be included in Solr 4.0? I need to query for polygons ;-)

check: http://code.google.com/p/lucene-spatial-playground/

This is my sketch / soon-to-be-proposal for what I think lucene spatial should look like. It includes a WKTField that can do complex geometry queries: https://lucene-spatial-playground.googlecode.com/svn/trunk/spatial-lucene/src/main/java/org/apache/lucene/spatial/search/jts/

ryan
Re: please make JSONWriter public
You may have noticed the ResponseWriter code is pretty hairy! Things are package protected so that the API can change between minor releases without concern for back compatibility. In 4.0 (/trunk) I hope to rework the whole ResponseWriter framework so that it is more clean and hopefully stable enough that making parts public is helpful. For now, you can:
- copy the code
- put your class in the same package name
- make it public in your own distribution

ryan

On Mon, Feb 28, 2011 at 2:56 PM, Paul Libbrecht p...@hoplahup.net wrote:

Hello fellow SOLR experts,

may I ask to make top-level and public the class org.apache.solr.request.JSONWriter inside org.apache.solr.request.JSONResponseWriter? I am re-using it to output JSON search results to code that I wish not to change on the client, but the current visibility settings (JSONWriter is package protected) make it impossible for me without actually copying the code (which is possible thanks to the good open-source nature).

thanks in advance
paul
Re: Solr 4.0 trunk in production
Not crazy -- but be aware of a few *key* caveats.

1. Do good testing on a stable snapshot.
2. Don't get surprised if you have to rebuild the index from scratch to upgrade in the future. The official releases will upgrade smoothly -- but within dev builds, anything may happen.

On Sat, Feb 19, 2011 at 9:50 AM, Mark static.void@gmail.com wrote:
Would I be crazy even to consider putting this in production? Thanks
Re: boosting results by a query?
found something that works great!

in 3.1+ we can sort by a function query, so:

  sort=query({!lucene v='field:value'}) desc, score desc

will put everything that matches 'field:value' first, then order the rest by score

check: http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function

On Fri, Feb 11, 2011 at 4:31 PM, Ryan McKinley ryan...@gmail.com wrote:

I have an odd need, and want to make sure I am not reinventing a wheel... Similar to the QueryElevationComponent, I need to be able to move documents to the top of a list that match a given query. If there were no sort, then this could be implemented easily with BooleanQuery (i think), but with sort it gets more complicated. Seems like I need:

  sortSpec.setSort( new Sort( new SortField[] {
    new SortField( /* something that only sorts results in the boost query */ ),
    new SortField( /* the regular sort */ )
  }));

Is there an existing FieldComparator I should look at? Any other pointers/ideas?

Thanks
ryan
boosting results by a query?
I have an odd need, and want to make sure I am not reinventing a wheel...

Similar to the QueryElevationComponent, I need to be able to move documents to the top of a list that match a given query. If there were no sort, then this could be implemented easily with BooleanQuery (i think), but with sort it gets more complicated. Seems like I need:

  sortSpec.setSort( new Sort( new SortField[] {
    new SortField( /* something that only sorts results in the boost query */ ),
    new SortField( /* the regular sort */ )
  }));

Is there an existing FieldComparator I should look at? Any other pointers/ideas?

Thanks
ryan
edismax with windows path input?
I am using the edismax query parser -- it's awesome! works well for standard dismax type queries, and allows explicit fields when necessary.

I have hit a snag when people enter something that looks like a windows path:

  <lst name="params">
    <str name="q">F:\path\to\a\file</str>
  </lst>

this gets parsed as:

  <str name="rawquerystring">F:\path\to\a\file</str>
  <str name="querystring">F:\path\to\a\file</str>
  <str name="parsedquery">+()</str>

Putting it in quotes makes the not-quite right query:

  <str name="rawquerystring">F:\path\to\a\file</str>
  <str name="querystring">F:\path\to\a\file</str>
  <str name="parsedquery">+DisjunctionMaxQuery((path:f:pathtoafile^4.0 | name:f (pathtoafile fpathtoafile)^7.0)~0.01)</str>
  <str name="parsedquery_toString">+(path_path:f:pathtoafile^4.0 | name:f (pathtoafile fpathtoafile)^7.0)~0.01</str>

Telling people to escape the query:

  q=F\:\\path\\to\\a\\file

is unrealistic, but gives the proper parsed query:

  +DisjunctionMaxQuery((path_path:f:/path/to/a/file^4.0 | name:f path to a (file fpathtoafile)^7.0)~0.01)

Any ideas on how to support this? I could try looking for things like paths in the app, and then modify the query, or maybe look at extending edismax. Perhaps when F: does not match a given field, it could auto escape the rest of the word?

thanks
ryan
Re: edismax with windows path input?
ah -- that makes sense. Yonik... looks like you were assigned to it last week -- should I take a look, or do you already have something in the works?

On Thu, Feb 10, 2011 at 2:52 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: extending edismax. Perhaps when F: does not match a given field, it
: could auto escape the rest of the word?

that's actually what yonik initially said it was supposed to do, but when i tried to add a param to let you control which fields would be supported using the ":" syntax i discovered it didn't work but couldn't figure out why ... details are in the SOLR-1553 comments

-Hoss
Re: edismax with windows path input?
> foo_s:foo\-bar is a valid lucene query (with only a dash between the foo and the bar), and presumably it should be treated the same in edismax. Treating it as foo_s:foo\\-bar (a backslash and a dash between foo and bar) might cause more problems than it's worth?

I don't think we should escape anything that has a valid field name. If foo_s is a field, then foo_s:foo\-bar should be used as-is. If foo_s is not a field, I would want the whole thing escaped to:

  foo_s\:foo\\-bar

before getting passed to the rest of the dismax mojo. Does that make sense?

marking edismax as experimental for 3.1 makes sense!

ryan
Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?
Where do you get your Lucene/Solr downloads from?

[] ASF Mirrors (linked in our release announcements or via the Lucene website)
[X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[X] I/we build them from source via an SVN/Git checkout.
Re: Different behavior for q=goo.com vs q=@goo.com in queries?
also try debugQuery=true and see why each result matched

On Thu, Dec 30, 2010 at 4:10 PM, mrw mikerobertsw...@gmail.com wrote:

Basically, just what you've suggested. I did the field/query analysis piece with verbose output. Not entirely sure how to interpret the results, of course. Currently reading anything I can find on that.

Thanks

Erick Erickson wrote:

What steps have you taken to figure out whether the contents of your index are what you think? I suspect that the fields you're indexing aren't being analyzed/tokenized quite the way you expect, either at query time or index time (or maybe both!). Take a look at the admin/analysis page for the field you're indexing the data into. If that doesn't shed any light on the problem, please paste in the fieldType definition for the field in question; maybe another set of eyes can see the issue.

Best
Erick

--
View this message in context: http://lucene.472066.n3.nabble.com/Different-behavior-for-q-goo-com-vs-q-goo-com-in-queries-tp2168935p2169478.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: API for using Multi cores with SolrJ
On Mon, Oct 18, 2010 at 10:12 AM, Tharindu Mathew mcclou...@gmail.com wrote:

Thanks Peter. That helps a lot. It's weird that this is not documented anywhere. :(

Feel free to edit the wiki :)
Re: how can i use solrj binary format for indexing?
Do you already have the files as solr XML? If so, I don't think you need solrj. If you need to build SolrInputDocuments from your existing structure, solrj is a good choice. If you are indexing lots of stuff, check the StreamingUpdateSolrServer: http://lucene.apache.org/solr/api/solrj/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html

On Sun, Oct 17, 2010 at 11:01 PM, Jason, Kim hialo...@gmail.com wrote:

Hi all, I have a huge amount of xml files for indexing. I want to index using the solrj binary format to get a performance gain, because I heard that using xml files to index is quite slow. But I don't know how to index through the solrj binary format and can't find examples. Please give some help.

Thanks,

--
View this message in context: http://lucene.472066.n3.nabble.com/how-can-i-use-solrj-binary-format-for-indexing-tp1722612p1722612.html
Sent from the Solr - User mailing list archive at Nabble.com.
query pending commits?
I have an indexing pipeline that occasionally needs to check if a document is already in the index (even if not committed yet). Any suggestions on how to do this without calling <commit/> before each check? I have a list of document ids and need to know which ones are in the index (actually I need to know which ones are not in the index).

I figured I would write a custom RequestHandler that would check the main Reader and the UpdateHandler reader, but it now looks like 'update' is handled directly within IndexWriter.

Any ideas?

thanks
ryan
Re: is indexing single-threaded?
Multiple threads work well. If you are using solrj, check the StreamingUpdateSolrServer for an implementation that will keep X number of threads busy. Your mileage will vary, but in general I find a reasonable thread count is ~ (number of cores) + 1

On Wed, Sep 22, 2010 at 5:52 AM, Andy angelf...@yahoo.com wrote:

Does Solr index data in a single thread or can data be indexed concurrently in multiple threads?

Thanks
Andy
Re: How can I delete the entire contents of the index?
  <delete><query>*:*</query></delete>

will leave you a fresh index

On Thu, Sep 23, 2010 at 12:50 AM, xu cheng xcheng@gmail.com wrote:

  <delete><query>the query that fetches the data you wanna delete</query></delete>

I did like this to delete my data

best regards

2010/9/23 Igor Chudov ichu...@gmail.com

Let's say that I added a number of elements to Solr (I use Webservice::Solr as the interface to do so). Then I change my mind and want to delete them all. How can I delete all contents of the database, but leave the database itself, just empty?

Thanks
i
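The SolrJ equivalent, for anyone on the Java client ('server' is any configured SolrServer):

  server.deleteByQuery("*:*"); // delete everything, keep the index itself
  server.commit();             // make the deletes visible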
Re: No more trunk support for 2.9 indexes
> I suppose an index 'remaker' might be something like a DIH reader for a Solr index - streams everything out of the existing index, writing it into the new one?

This works fine if all fields are stored (and no copyField goes to a stored field), otherwise you would need/want to start with the original source.

ryan
Re: Logic behind Solr creating files in .../data/index path.
Check: http://lucene.apache.org/java/3_0_2/fileformats.html

On Tue, Sep 7, 2010 at 3:16 AM, rajini maski rajinima...@gmail.com wrote:

All, while we post data to Solr, the data gets stored in the //data/index path in multiple files with different file extensions. Not worrying about the extensions, I want to know how these files are created. Does anyone know on what logic these multiple index files are created in the data/index path? If we do an optimize, the number of files gets reduced; else, some N number of files are created. Based on what parameter are they created? And how do the sizes of the files vary there?

Hope I am clear about the doubt I have...
help refactoring from 3.x to 4.x
I have a function that works well in 3.x, but when I tried to re-implement it in 4.x it runs very very slow (~20ms vs 45s on an index w/ ~100K items).

Big picture, I am trying to calculate a bounding box for items that match the query. To calculate this, I have two fields bboxNS and bboxEW that get filled with the min and max values for that doc. To get the bounding box, I just need the first matching term in the index and the last matching term. In 3.x the code looked like this:

  public class FirstLastMatchingTerm {
    String first = null;
    String last = null;

    public static FirstLastMatchingTerm read(SolrIndexSearcher searcher, String field, DocSet docs) throws IOException {
      FirstLastMatchingTerm firstLast = new FirstLastMatchingTerm();
      if( docs.size() > 0 ) {
        IndexReader reader = searcher.getReader();
        TermEnum te = reader.terms(new Term(field, ""));
        do {
          Term t = te.term();
          if( null == t || !t.field().equals(field) ) {
            break;
          }
          if( searcher.numDocs(new TermQuery(t), docs) > 0 ) {
            firstLast.last = t.text();
            if( firstLast.first == null ) {
              firstLast.first = firstLast.last;
            }
          }
        } while( te.next() );
      }
      return firstLast;
    }
  }

In 4.x, I tried:

  public class FirstLastMatchingTerm {
    String first = null;
    String last = null;

    public static FirstLastMatchingTerm read(SolrIndexSearcher searcher, String field, DocSet docs) throws IOException {
      FirstLastMatchingTerm firstLast = new FirstLastMatchingTerm();
      if( docs.size() > 0 ) {
        IndexReader reader = searcher.getReader();
        Terms terms = MultiFields.getTerms(reader, field);
        TermsEnum te = terms.iterator();
        BytesRef term = te.next();
        while( term != null ) {
          if( searcher.numDocs(new TermQuery(new Term(field, term)), docs) > 0 ) {
            firstLast.last = term.utf8ToString();
            if( firstLast.first == null ) {
              firstLast.first = firstLast.last;
            }
          }
          term = te.next();
        }
      }
      return firstLast;
    }
  }

but the results are slow (and incorrect). I tried some variations of using ReaderUtil.Gather(), but the real hit seems to come from:

  if( searcher.numDocs(new TermQuery(new Term(field, term)), docs) > 0 )

Any ideas? I'm not tied to the approach or indexing strategy, so if anyone has other suggestions that would be great. Looking at it again, it seems crazy that you have to run a query for each term, but in 3.x

thanks
ryan
Re: Problem in setting the request writer in SolrJ (wiki page wrong?)
Note that 'setRequestWriter' is not part of the SolrServer API; it is on the CommonsHttpSolrServer: http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.html#setRequestWriter%28org.apache.solr.client.solrj.request.RequestWriter%29

If you are using EmbeddedSolrServer, the params are not serialized via a RequestWriter, so you don't have any options there.

ryan

On Mon, Aug 23, 2010 at 9:24 AM, Constantijn Visinescu baeli...@gmail.com wrote:

Hello, I'm using an embedded solrserver in my Java webapp, but as far as i can tell it's defaulting to sending updates in XML, which seems like a huge waste compared to sending it in Java binary format. According to this page: http://wiki.apache.org/solr/Solrj#Setting_the_RequestWriter I'm supposed to be able to set the requestwriter like so:

  server.setRequestWriter(new BinaryRequestWriter());

However this method doesn't seem to exist in the SolrServer class of SolrJ 1.4.1? How do i set it to process updates in the java binary format?

Thanks in advance,
Constantijn Visinescu

P.S. I'm creating my SolrServer instance like this:

  private SolrServer solrServer;
  CoreContainer container = new CoreContainer.Initializer().initialize();
  solrServer = new EmbeddedSolrServer(container, "");

this solrServer won't let me set a request writer.
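For the HTTP case the setter lives on the concrete class, so the sketch looks like this (the URL is illustrative):

  import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
  server.setRequestWriter(new BinaryRequestWriter()); // javabin updates instead of XML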
Sort by index order desc?
Any pointers on how to sort by reverse index order? http://search.lucidimagination.com/search/document/4a59ded3966271ca/sort_by_index_order_desc it seems like it should be easy to do with the function query stuff, but i'm not sure what to sort by (unless I add a new field for indexed time) Any pointers? Thanks Ryan
Re: Sort by index order desc?
Looks like you can sort by _docid_ to get things in index order or reverse index order. ?sort=_docid_ asc thank you solr! On Fri, Jul 23, 2010 at 2:23 PM, Ryan McKinley ryan...@gmail.com wrote: Any pointers on how to sort by reverse index order? http://search.lucidimagination.com/search/document/4a59ded3966271ca/sort_by_index_order_desc it seems like it should be easy to do with the function query stuff, but i'm not sure what to sort by (unless I add a new field for indexed time) Any pointers? Thanks Ryan
Re: REST calls
If there is a real desire/need to make things restful in the official sense, it is worth looking at using a REST framework as the controller rather than the current solution. perhaps:

http://www.restlet.org/
https://jersey.dev.java.net/

These would be cool since they encapsulate lots of the request plumbing work -- it would be better if we could leverage more widely used approaches than supporting our own. That said, what we have is functional and powerful -- if you are concerned about people editing the index (with GET/POST or whatever) there are plenty of ways to solve this.

ryan

On Wed, Jun 30, 2010 at 5:31 PM, Lance Norskog goks...@gmail.com wrote:

I've looked at the problem. It's fairly involved. It probably would take several iterations. (But not as many as field collapsing :)

On Wed, Jun 30, 2010 at 2:11 PM, Yonik Seeley yo...@lucidimagination.com wrote:

On Wed, Jun 30, 2010 at 4:55 PM, Lance Norskog goks...@gmail.com wrote:

Apparently this is not ReStFuL

It is IMVHO insane. Patches welcome...

-Yonik
http://www.lucidimagination.com

--
Lance Norskog
goks...@gmail.com
Re: Build query programmatically with lucene, but issue to solr?
Interesting -- I don't think there is anything that does this. Though it seems like something the XML Query syntax should be able to do, but we would still need to add the ability to send the xml style query to solr. On Fri, May 28, 2010 at 12:23 PM, Phillip Rhodes rhodebumpl...@gmail.com wrote: Hi. I am building up a query with quite a bit of logic such as parentheses, plus signs, etc... and it's a little tedious dealing with it all at a string level. I was wondering if anyone has any thoughts on constructing the query in lucene and using the string representation of the query to send to solr. Thanks, Phillip
Re: multicore Vs multiple solr webapps
The two approaches solve different needs. In 'multicore' you have a single webapp with multiple indexes. This means they are all running in the same JVM. This may be an advantage or a disadvantage depending on what you are doing.

ryan

On Thu, May 27, 2010 at 10:44 AM, Antonello Mangone antonello.mang...@gmail.com wrote:

Hi to all, I have a question for you ... Can someone explain to me the differences between a unique solr application multicore and multiple solr webapps ???

Thank you all in advance
Re: SolrJ/EmbeddedSolrServer
Check: http://wiki.apache.org/solr/CoreAdmin

Unless I'm missing something, I think you should be able to sort out what you need

On Fri, May 21, 2010 at 7:55 PM, Ken Krugler kkrugler_li...@transpac.com wrote:

I've got a situation where my data directory (a) needs to live elsewhere besides inside of Solr home, (b) moves to a different location when updating indexes, and (c) setting up a symlink from solr_home/data isn't a great option.

So what's the best approach to making this work with SolrJ? The low-level solution seems to be:
- create my own SolrCore instance, where I specify the data directory
- use that to update the CoreContainer
- create a new EmbeddedSolrServer

But recreating the EmbeddedSolrServer with each index update feels wrong, and I'd like to avoid mucking around with low-level SolrCore instantiation. Any other approaches?

Thanks,

-- Ken
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g
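If the CoreAdmin route fits, a CREATE call can point a core at an external data directory directly; a hedged example (names and paths are illustrative, and dataDir support on CREATE depends on your Solr version):

  http://localhost:8983/solr/admin/cores?action=CREATE&name=core0&instanceDir=core0&dataDir=/mnt/indexes/current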
Re: Special Circumstances for embedded Solr
> Any other commonly compelling reasons to use SolrJ?

The most compelling reason (I think) is that if you program against the Solrj API, you can switch between embedded/http/streaming implementations without changing anything. This is great for our app that is either run as a small local instance or in a big enterprise setting.

ryan
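A sketch of what that buys you, assuming a configured CoreContainer for the embedded case (the names here are illustrative):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  // choose the implementation once; everything else codes against SolrServer
  SolrServer server = useEmbedded
      ? new EmbeddedSolrServer(coreContainer, "core0")
      : new CommonsHttpSolrServer("http://localhost:8983/solr");

  server.add(doc);
  server.commit();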
Re: Moving from Lucene to Solr?
On Wed, May 19, 2010 at 6:38 AM, Peter Karich peat...@yahoo.de wrote:

> Hi all, while asking a question on stackoverflow [1] some other questions appeared:
>
> Is SolrJ a recommended way to access Solr or should I prefer the HTTP interface?

solrj vs HTTP interface? That will just be a matter of taste. If you are working in java, then solrj is likely a good option.

> How can I (j)unit-test Solr? (e.g. create+delete index via Java call)

If you want to mess with creating/removing indexes at runtime, see: http://wiki.apache.org/solr/CoreAdmin

> Is Lucene faster than Solr? ... do you have experiences, preferably with the same index?

solr is built on top of lucene, so in that regard it is the same speed. Depending on your app, the abstractions that solr makes may make it less efficient than working directly in lucene. Unless you have very specialized needs, I doubt this will make a big difference.
Re: cheking the size of the index using solrj API's
On Fri, Apr 2, 2010 at 7:07 AM, Na_D nabam...@zaloni.com wrote:

hi, I need to monitor the index for the following information:
1. Size of the index
2. Last time the index was updated.

If by 'size of the index' you mean document count, then check the Luke Request Handler: http://wiki.apache.org/solr/LukeRequestHandler

ryan
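A concrete request against that handler (assuming the default /admin/luke registration; numTerms=0 just skips the per-field term stats):

  http://localhost:8983/solr/admin/luke?numTerms=0

The index section of the response carries numDocs/maxDoc for point 1 and a lastModified timestamp for point 2.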
Re: [POLL] Users of abortOnConfigurationError ?
The 'abortOnConfigurationError' option was added a long time ago... at the time, there were many errors that would just be written to the logs but startup would continue normally. I felt (and still do) that if there is a configuration error everything should fail loudly. The option in solrconfig.xml was added as a back-compatible way to get both behaviors. I don't see any value in letting solr continue working even though something was configured wrong.

Does a lack of replies to this thread imply that everyone agrees? (Reading the email, and following directions, I should just ignore this email)

Ryan

On Thu, Mar 18, 2010 at 9:12 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

Due to some issues with the (lack of) functionality behind the abortOnConfigurationError option in solrconfig.xml, I'd like to take a quick poll of the solr-user community...

* If you have never heard of the abortOnConfigurationError option prior to this message, please ignore this email.

* If you have seen abortOnConfigurationError in solrconfig.xml, or in error messages when using Solr, but you have never modified the value of this option in your configs, or changed it at run time, please ignore this email.

* If you have ever set abortOnConfigurationError=false, either in your config files or at run time, please reply to these three questions...

1) What version of Solr are you using?
2) What advantages do you perceive that you have by setting abortOnConfigurationError=false?
3) What problems do you suspect you would encounter if this option was eliminated in future versions of Solr?

Thank you.

(For people who are interested, the impetuses for this Poll can be found in SOLR-1743, SOLR-1817, SOLR-1824, and SOLR-1832)

-Hoss
Re: Interesting OutOfMemoryError on a 170M index
On Jan 13, 2010, at 5:34 PM, Minutello, Nick wrote:

> Agreed, commit every second.

Do you need the index to be updated this often? Are you reading from it every second and need results that are that fresh? If not, i imagine increasing the auto-commit time to 1 min or even 10 secs would help some.

Re, calling commit from the client with auto-commit... if you are using auto-commit, you should not call commit from the client

ryan

> Assuming I understand what you're saying correctly: There shouldn't be any index readers - as at this point, just writing to the index. Did I understand correctly what you meant?
>
> -Nick

-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: 13 January 2010 22:28
To: solr-user@lucene.apache.org
Subject: Re: Interesting OutOfMemoryError on a 170M index

The time in autocommit is in milliseconds. You are committing every second while indexing. This then causes a build-up of successive index readers that absorb each commit, which is probably the out-of-memory.

On Wed, Jan 13, 2010 at 10:36 AM, Minutello, Nick nick.minute...@credit-suisse.com wrote:

Hi,

I have a bit of an interesting OutOfMemoryError that I'm trying to figure out. My client and Solr server are running in the same JVM (for deployment simplicity). FWIW, I'm using Jetty to host Solr. I'm using the supplied code for the http-based client interface. Solr 1.3.0.

My app is adding about 20,000 documents per minute to the index - one at a time (it is listening to an event stream and for every event, it adds a new document to the index). The size of the documents, however, is tiny - the total index growth is only about 170M (after about 1 hr and the OutOfMemoryError).

At this point, there is zero querying happening - just updates to the index (only adding documents, no updates or deletes). After about an hour or so, my JVM runs out of heap space - and if I look at the memory utilisation over time, it looks like a classic memory leak. It slowly ramps up until we end up with constant FULL GCs and eventual OOME. Max heap space is 512M.

In Solr, I'm using autocommit (to buffer the updates):

  <autoCommit>
    <maxDocs>1</maxDocs>
    <maxTime>1000</maxTime>
  </autoCommit>

(Aside: Now, I'm not sure if I am meant to call commit or not on the client SolrServer class if I am using autocommit - but as it turns out, I get OOME whether I do that or not)

Any suggestions/advice of quick things to check before I dust off the profiler?

Thanks in advance.

Cheers,
Nick

--
Lance Norskog
goks...@gmail.com
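One way to express the suggested longer auto-commit window (values illustrative; maxTime is in milliseconds):

  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime> <!-- 1 minute -->
  </autoCommit>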
Re: No Analyzer, tokenizer or stemmer works at Solr
On Jan 7, 2010, at 10:50 AM, MitchK wrote:

> Eric, you mean, everything is okay, but I do not see it? "Internally for searching the analysis takes place and writes to the index in an inverted fashion, but the stored stuff is left alone." if I use an analyzer, Solr stores its output two ways? One public output, which is similar to the original input, and one hidden or internal output, which is based on the analyzer's work? Did I understand that right?

yes. indexed fields and stored fields are different. Solr results show stored fields in the results (however facets are based on indexed fields)

Take a look at "Lucene in Action" for a better description of what is happening. The best tool to get your head around what is happening is probably luke (http://www.getopt.org/luke/)

> If yes, I have got another problem: I don't want to waste any diskspace.

You have control over what is stored and what is indexed -- how that is configured is up to you.

ryan
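For reference, that control lives per field in schema.xml; a hedged illustration (the field and type names here are generic, not from the thread):

  <!-- searchable AND returned in results -->
  <field name="title" type="text" indexed="true" stored="true"/>

  <!-- searchable only: analyzed into the index, never stored (saves disk) -->
  <field name="body" type="text" indexed="true" stored="false"/>

  <!-- returned only: stored verbatim, not searchable -->
  <field name="raw" type="string" indexed="false" stored="true"/>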
Re: No Analyzer, tokenizer or stemmer works at Solr
On Jan 7, 2010, at 12:11 PM, MitchK wrote:

> Thank you, Ryan. I will have a look at lucene's material and luke. I think I got it. :)
>
> Sometimes there will be a need to return, on the one hand, the original value and, on the other hand, the indexed version of the value. How can I fulfill such needs? Doing copyField to indexed-only fields?

see erik's response on the 'analysis request handler'

ryantxu wrote:

> On Jan 7, 2010, at 10:50 AM, MitchK wrote:
>
>> Eric, you mean, everything is okay, but I do not see it? "Internally for searching the analysis takes place and writes to the index in an inverted fashion, but the stored stuff is left alone." if I use an analyzer, Solr stores its output two ways? One public output, which is similar to the original input, and one hidden or internal output, which is based on the analyzer's work? Did I understand that right?
>
> yes. indexed fields and stored fields are different. Solr results show stored fields in the results (however facets are based on indexed fields)
>
> Take a look at "Lucene in Action" for a better description of what is happening. The best tool to get your head around what is happening is probably luke (http://www.getopt.org/luke/)
>
>> If yes, I have got another problem: I don't want to waste any diskspace.
>
> You have control over what is stored and what is indexed -- how that is configured is up to you.
>
> ryan

--
View this message in context: http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27063452.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolJ and query parameters
On Jan 7, 2010, at 1:05 PM, Jon Poulton wrote:

> I've also just noticed that QueryParsing is not in the SolrJ API. It's in one of the other Solr jar dependencies. I'm beginning to think that maybe the best approach is to write a query string generator which can generate strings of the form:
>
>   q={!lucene q.op=AND df=text}myfield:foo +bar -baz
>
> Then just set this on a SolrQuery instance and send it over the wire. It's not the kind of string you'd want an end user to have to type out.

Yes, if you need to manipulate the local params, that seems like a good approach. Solrj was written before the local params syntax was introduced. A patch that adds LocalParams support to solrj would be welcome :)

ryan
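A minimal sketch of that approach (the field names and operator come from the example above):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.response.QueryResponse;

  // build the local-params prefix by hand, then hand the whole string to SolrQuery
  SolrQuery query = new SolrQuery();
  query.setQuery("{!lucene q.op=AND df=text}myfield:foo +bar -baz");
  QueryResponse rsp = server.query(query);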
Re: Corrupted Index
what version of solr are you running?

On Jan 7, 2010, at 3:08 PM, Jake Brownell wrote:

Hi all,

Our application uses solrj to communicate with our solr servers. We started a fresh index yesterday after upping the maxFieldLength setting in solrconfig. Our task indexes content in batches and all appeared to be well until noonish today, when after 40k docs, I started seeing errors. I've placed three stack traces below: the first occurred once and was the initial error, the second occurred a few times before the third started occurring on each request. I'd really appreciate any insight into what could have caused this, a missing file and then a corrupt index. If you know we'll have to nuke the entire index and start over I'd like to know that too - oddly enough searches against the index appear to be working.

Thanks!
Jake

#1 January 7, 2010 12:10:06 PM CST Caught error; TaskWrapper block 1

January 7, 2010 12:10:07 PM CST
solr-home/core0/data/index/_fsk_1uj.del (No such file or directory)
request: /core0/update

org.benetech.exception.WrappedException
  org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(424)
  org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(243)
  org.apache.solr.client.solrj.request.AbstractUpdateRequest#process(105)
  org.apache.solr.client.solrj.SolrServer#commit(86)
  org.apache.solr.client.solrj.SolrServer#commit(75)
  org.bookshare.search.solr.SolrSearchServerWrapper#add(63)
  org.bookshare.search.solr.SolrSearchEngine#index(232)
  org.bookshare.service.task.SearchEngineIndexingTask#initialInstanceLoad(95)
  org.bookshare.service.task.SearchEngineIndexingTask#run(53)
  org.bookshare.service.scheduler.TaskWrapper#run(233)
  java.util.TimerThread#mainLoop(512)
  java.util.TimerThread#run(462)
Caused by: solr-home/core0/data/index/_fsk_1uj.del (No such file or directory)
request: /core0/update
org.apache.solr.common.SolrException
  org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(424)
  org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(243)
  org.apache.solr.client.solrj.request.AbstractUpdateRequest#process(105)
  org.apache.solr.client.solrj.SolrServer#commit(86)
  org.apache.solr.client.solrj.SolrServer#commit(75)
  org.bookshare.search.solr.SolrSearchServerWrapper#add(63)
  org.bookshare.search.solr.SolrSearchEngine#index(232)
  org.bookshare.service.task.SearchEngineIndexingTask#initialInstanceLoad(95)
  org.bookshare.service.task.SearchEngineIndexingTask#run(53)
  org.bookshare.service.scheduler.TaskWrapper#run(233)
  java.util.TimerThread#mainLoop(512)
  java.util.TimerThread#run(462)

#2 January 7, 2010 12:10:10 PM CST Caught error; TaskWrapper block 1

January 7, 2010 12:10:10 PM CST
org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _hug: fieldsReader shows 8 but segmentInfo shows 2
request: /core0/update

org.benetech.exception.WrappedException
  org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(424)
Re: No Analyzer, tokenizer or stemmer works at Solr
On Jan 6, 2010, at 3:48 PM, MitchK wrote: I have tested a lot and all the time I thought I set wrong options for my custom analyzer. Well, I have noticed that Solr isn't using ANY analyzer, filter or stemmer. It seems like it only stores the original input. The stored value is always the original input. The *indexed* values are transformed by analysis. If you really need to store the analyzed fields, that may be possible with an UpdateRequestProcessor. also see: https://issues.apache.org/jira/browse/SOLR-314 ryan
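For what it's worth, a minimal sketch of that UpdateRequestProcessor idea, assuming you want the *stored* value of a hypothetical text field replaced with a transformed copy before indexing (the lowercase transform is only an illustration, and you would still need a matching factory registered in solrconfig.xml):

  import java.io.IOException;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;

  public class StoreTransformedProcessor extends UpdateRequestProcessor {
    public StoreTransformedProcessor(UpdateRequestProcessor next) {
      super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
      SolrInputDocument doc = cmd.getSolrInputDocument();
      Object raw = doc.getFieldValue("text");
      if (raw != null) {
        // replace the stored value before the document reaches the index
        doc.setField("text", raw.toString().toLowerCase());
      }
      super.processAdd(cmd); // continue down the processor chain
    }
  }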
Re: how to do a Parent/Child Mapping using entities
Ya, structured data gets a little funny. For starters, the order of multi-valued fields should be maintained, so if you have:

<doc>
  <field name="url">http://aaa</field>
  <field name="url_rank">5</field>
  <field name="url">http://bbb</field>
  <field name="url_rank">4</field>
</doc>

the response will return results in order, so you can map them with array indices (see the SolrJ sketch at the end of this thread). I have played some tricks with a JSON field analyzer that gives you some more control. For example, if you index:

<doc>
  <field name="url">{ "url":"http://host/", "rank":5 }</field>
</doc>

then I use an analyzer that indexes the terms: url:http://host/ rank:5

I just posted SOLR-1690, if you want to take a look at that approach.

ryan

On Dec 30, 2009, at 4:25 AM, magui wrote:

Thanks Sascha for your post, i find it interesting, but in my case i don't want to use an additional field; i want to be able with the same schema to do a simple query like q=res_url:some url, and a query like the other one. In other words, is there any solution to make two or more multivalued fields in the same document linked with each other? e.g. in this result:

<result name="response" numFound="1" start="0">
  <doc>
    <str name="id">1</str>
    <str name="keyword">Key1</str>
    <arr name="res_url">
      <str>url1</str> <str>url2</str> <str>url3</str> <str>url4</str>
    </arr>
    <arr name="res_rank">
      <str>1</str> <str>2</str> <str>3</str> <str>4</str>
    </arr>
  </doc>
</result>

i would like to make solr understand that for this document, value url1 of the res_url field is linked to value 1 of the res_rank field, and all of them are linked to the common field keyword. I think that i should use a custom field analyzer or something like that, but i don't know what to do. but thanks for all; and any supplied help will be lovable.

Sascha Szott wrote:

Hi, you could create an additional index field res_ranked_url that contains the concatenated value of an url and its corresponding rank, e.g., res_rank + " " + res_url. Then, q=res_ranked_url:"1 url1" retrieves all documents with url1 as the first url. A drawback of this workaround is that you have to use a phrase query, thus preventing wildcard searches for urls. -Sascha

Hello everybody, i would like to know how to create an index supporting a parent/child mapping and then query the child to get the results. In other words, imagine that we have a database containing 2 tables: Keyword[id(int), value(string)] and Result[id(int), res_url(text), res_text(tex), res_date(date), res_rank(int)]. For indexing, i used the DataImportHandler to import data and it works well, and my query response seems good (q=*:*) (imagine that we have only these two keywords and their results):

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="q">*:*</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="id">1</str>
      <str name="keyword">Key1</str>
      <arr name="res_url">
        <str>url1</str> <str>url2</str> <str>url3</str> <str>url4</str>
      </arr>
      <arr name="res_rank">
        <str>1</str> <str>2</str> <str>3</str> <str>4</str>
      </arr>
    </doc>
    <doc>
      <str name="id">2</str>
      <str name="keyword">Key2</str>
      <arr name="res_url">
        <str>url1</str> <str>url5</str> <str>url8</str> <str>url7</str>
      </arr>
      <arr name="res_rank">
        <str>1</str> <str>2</str> <str>3</str> <str>4</str>
      </arr>
    </doc>
  </result>
</response>

but the problem is when i type a query kind of this: q=res_url:url2 AND res_rank:1, and this to say that i want to search for the keywords in which the url (url2) is ranked at the first position, i have a result like this:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="q">res_url:url2 AND res_rank:1</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">1</str>
      <str name="keyword">Key1</str>
      <arr name="res_url">
        <str>url1</str> <str>url2</str> <str>url3</str> <str>url4</str>
      </arr>
      <arr name="res_rank">
        <str>1</str> <str>2</str> <str>3</str> <str>4</str>
      </arr>
    </doc>
  </result>
</response>

But this is not true, because the url present in the 1st position in the results of the keyword key1 is url1 and not url2. So what i want to say is: is there any solution to make the values of the multivalued fields linked? So in our case we can see that the previous result says that: - url1 is present in 1st position of key1 results - url2 is present in 2nd position of key1 results - url3 is present in 3rd position of key1 results - url4 is present in 4th position of key1 results and i would like solr to consider this when executing queries. Any help please; and thanks for all :) -- View this message in context: http://old.nabble.com/how-to-do-a-Parent-Child-Mapping-using-entities-tp26956426p26965478.html Sent from the Solr - User mailing list archive at Nabble.com.
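For reference, a rough SolrJ sketch of the index-based pairing described above (field names and query follow the thread's example; it assumes both multi-valued fields are always written with the same length and order):

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;

  public class UrlRankPairs {
    public static void print(SolrServer server) throws Exception {
      QueryResponse rsp = server.query(new SolrQuery("keyword:Key1"));
      for (SolrDocument doc : rsp.getResults()) {
        // multi-valued fields come back in index order, so pair them by position
        List<Object> urls = new ArrayList<Object>(doc.getFieldValues("res_url"));
        List<Object> ranks = new ArrayList<Object>(doc.getFieldValues("res_rank"));
        for (int i = 0; i < urls.size(); i++) {
          System.out.println("rank " + ranks.get(i) + " -> " + urls.get(i));
        }
      }
    }
  }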
Re: SOLR or Hibernate Search?
If you need to search via the Hibernate API, then use hibernate search. If you need a scaleable HTTP (REST) then solr may be the way to go. Also, i don't think hibernate has anything like the faceting / complex query stuff etc. On Dec 29, 2009, at 3:25 PM, Márcio Paulino wrote: Hey Everyone! I was make a comparison of both technologies (SOLR AND Hibernate Search) and i see many things are equals. Anyone could told me when i must use SOLR and when i must use Hibernate Search? Im my project i will have: 1. Queries for indexed fields (Strings) and for not indexed Fields (Integer, Float, Date). [In Hibernate Search on in SOLR, i must search on index and, with results of query, search on database (I can't search in both places ate same time).] I Will Have search like: Give me all Register Where Value 190 And Name Contains = 'JAVA' 2. My client need process a lot of email (20.000 per day) and i must indexed all fields (excluded sentDate ) included Attachments, and performance is requirement of my System 3. My Application is multiclient, and i need to separate the index by client. In this Scenario, whats the best solution? SOLR or HIbernateSearch I See SOLR is a dedicated server and has a good performance test. I don't see advantages to use hibernate-search in comparison with SOLR (Except the fact of integrate with my Mapped Object) Thanks for Help -- att, ** Márcio Paulino Campo Grande - MS MSN / Gtalk: mcopaul...@gmail.com ICQ: 155897898 **
Re: logger in embedded solr
check: http://wiki.apache.org/solr/SolrLogging

If you are using 1.4, you want to drop in the slf4j-log4j jar file; Solr should then read your log4j configs.

On Nov 19, 2009, at 2:15 PM, Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] wrote: Hi all, I have a J2EE application using embedded solr via solrj. It seems the logging that SOLR produces has a mind of its own and is not changeable via my log4j.properties. In fact I know this because I wired in a Log4J config listener in my web.xml and redirected all my logs to a custom location. That works, but now all my messages go to the custom location while all the embedded SOLR messages are still going into catalina.out. How can I get access to the logger of the embedded SOLR? Thanks, Tim Harsch Sr. Software Engineer Perot Systems
Re: Missing slf4j jar in solr 1.4.0 distribution?
Solr includes slf4j-jdk14-1.5.5.jar, if you want to use the nop (or log4j, or loopback) impl you will need to include that in your own project. Solr uses slf4j so that each user can decide their logging implementation, it includes the jdk version so that something works off-the-shelf, but if you want more control, then you can switch in whatever you want. ryan On Nov 18, 2009, at 1:22 AM, Per Halvor Tryggeseth wrote: Thanks. I see. It seems that slf4j-nop-1.5.5.jar is the only jar file missing in solrj-lib, so I suggest that it should be included in the next release. Per Halvor -Opprinnelig melding- Fra: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sendt: 17. november 2009 20:51 Til: 'solr-user@lucene.apache.org' Emne: Re: Missing slf4j jar in solr 1.4.0 distribution? : I downloaded solr 1.4.0 but discovered when using solrj 1.4 that a : required slf4j jar was missing in the distribution (i.e. : apache-solr-1.4.0/dist). I got a java.lang.NoClassDefFoundError: : org/slf4j/impl/StaticLoggerBinder when using solrj ... : Have I overlooked something or are not all necessary classes required : for using solrj in solr 1.4.0 included in the distribution? Regretably, Solr releases aren't particularly consistent about where third-party libraries can be found. If you use the the pre-built war, the 'main' dependencies are allready bunlded into it. If you want to roll your own, you need to look at the ./lib directory -- ./dist is only *suppose* to contain the artifacts built from solr source But that solrj-lib directory can be confusing)... hoss...@brunner:apache-solr-1.4.0$ ls ./lib/slf4j-* lib/slf4j-api-1.5.5.jar lib/slf4j-jdk14-1.5.5.jar -Hoss
Re: The status of Local/Geo/Spatial/Distance Solr
It looks like solr+spatial will get some attention in 1.5, check: https://issues.apache.org/jira/browse/SOLR-1561

Depending on your needs, that may be enough. More robust/scalable solutions will hopefully work their way into 1.5 (any help is always appreciated!)

On Nov 13, 2009, at 11:12 AM, Bertie Shen wrote:

Hey, I am interested in using LocalSolr to do Local/Geo/Spatial/Distance search. But the wiki of LocalSolr (http://wiki.apache.org/solr/LocalSolr) points to pretty old documentation. Is there a better document I can refer to for setting up LocalSolr, and some performance analysis? Just sync-ed the Solr codebase and found LocalSolr is still NOT in the contrib package. Do we have a plan to incorporate it? I downloaded a LocalSolr lib localsolr-1.5.jar from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and noticed that the namespace is com.pjaol.search. blah blah, while the LocalLucene package is in the Lucene codebase and the package name is org.apache.lucene.spatial blah blah. But localsolr-1.5.jar from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ does not work with the lucene-spatial-3.0-dev.jar I build from the Lucene codebase directly. After I restart tomcat, I could not load the solr admin page. The error is as follows. It looks like solr is still looking for the old named classes. Thanks.

HTTP Status 500 - Severe errors in solr configuration. Check your log files for more detailed information on what may be wrong. If you want solr to continue after configuration errors, change: <abortOnConfigurationError>false</abortOnConfigurationError> in null

java.lang.NoClassDefFoundError: com/pjaol/search/geo/utils/DistanceFilter
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492)
at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525)
at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:833)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:551)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:221)
at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:302)
at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:78)
at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3635)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4222)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:760)
at org.apache.catalina.core.ContainerBase.access$0(ContainerBase.java:744)
at org.apache.catalina.core.ContainerBase$PrivilegedAddChild.run(ContainerBase.java:144)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:738)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:544)
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:626)
at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:553)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:488)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1138)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:311)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:120)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1022)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:736)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1014)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
at org.apache.catalina.core.StandardService.start(StandardService.java:448)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:700)
at org.apache.catalina.startup.Catalina.start(Catalina.java:552)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:295)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect
Re: The status of Local/Geo/Spatial/Distance Solr
Also: https://issues.apache.org/jira/browse/SOLR-1302
Re: add XML/HTML documents using SolrJ, without bypassing HTML char filter
The HTMLStripCharFilter will strip the html for the *indexed* terms; it does not affect the *stored* field. If you don't want html in the stored field, can you just strip it out before passing to solr? (a sketch follows after this message)

On Nov 11, 2009, at 8:07 PM, aseem cheema wrote: Hey Guys, How do I add HTML/XML documents using SolrJ such that it does not bypass the HTML char filter? SolrJ escapes the HTML/XML value of a field, and that makes it bypass the HTML char filter. For example, <center>content</center> added to a field with HTMLStripCharFilter on the field using SolrJ is not stripped of center tags. But if I check in analysis.jsp, it does get stripped. When I look at the SolrJ XML feed, it looks like this:

<add><doc boost="1.0"><field name="id">http://haha.com</field><field name="text">&lt;center&gt;content&lt;/center&gt;</field></doc></add>

Any help is highly appreciated. Thanks. -- Aseem
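A crude client-side strip, as a sketch (the regex is naive, the field names are examples, and a real HTML parser would be safer for messy markup):

  import org.apache.solr.common.SolrInputDocument;

  public class HtmlStripExample {
    // naive tag stripper; fine for simple markup, not a full HTML parser
    static String stripTags(String html) {
      return html.replaceAll("<[^>]*>", "");
    }

    public static SolrInputDocument build(String id, String html) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", id);
      doc.addField("text", stripTags(html)); // "<center>content</center>" -> "content"
      return doc;
    }
  }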
Re: Problems downloading lucene 2.9.1
On Nov 2, 2009, at 8:29 AM, Grant Ingersoll wrote:

On Nov 2, 2009, at 12:12 AM, Licinio Fernández Maurelo wrote: Hi folks, as we are using a snapshot dependency on solr 1.4, today we are getting problems when maven tries to download lucene 2.9.1 (there isn't any 2.9.1 there). Which repository can I use to download it?

They won't be there until 2.9.1 is officially released. We are trying to speed up the Solr release by piggybacking on the Lucene release, but this little bit is the one downside. Until then, you can add a repo to: http://people.apache.org/~mikemccand/staging-area/rc3_lucene2.9.1/maven/
Re: Programmatically configuring SLF4J for Solr 1.4?
I'm sure it is possible to configure JDK logging (java.util.logging) programmatically... but I have never had much luck with it. It is very easy to configure log4j programmatically, and this works great with solr. To use log4j rather than JDK logging, simply add slf4j-log4j12-1.5.8.jar (from http://www.slf4j.org/download.html) to your classpath.

ryan

On Nov 1, 2009, at 11:05 PM, Don Werve wrote: So, I've spent a bit of the day banging my head against this, and can't get it sorted. I'm using a DirectSolrConnection embedded in a JRuby application, and everything works great, except I can't seem to get it to do anything except log to the console. I've tried pointing 'java.util.logging.config.file' to a properties file, as well as specifying a logfile as part of the constructor for DirectSolrConnection, but so far, nothing has really worked. What I'd like to do is programmatically direct the Solr logs to a logfile, so that I can have my app start up, parse its config, and throw the Solr logs where they need to go based on that. So, I don't suppose anybody has a code snippet (in Java) that sets up SLF4J for Solr logging (and that doesn't reference an external properties file)? Using the latest (1 Nov 2009) nightly build of Solr 1.4.0-dev
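In case it helps, a rough sketch of the programmatic log4j route (assumes the slf4j-log4j12 binding mentioned above is on the classpath; the pattern and log file path are placeholders):

  import java.io.IOException;
  import org.apache.log4j.FileAppender;
  import org.apache.log4j.Level;
  import org.apache.log4j.Logger;
  import org.apache.log4j.PatternLayout;

  public class SolrLogSetup {
    public static void init(String logFile) throws IOException {
      Logger root = Logger.getRootLogger();
      root.removeAllAppenders(); // stop everything going to the console
      root.addAppender(new FileAppender(
          new PatternLayout("%d %-5p [%c] %m%n"), logFile));
      // tune Solr's own chatter independently of the rest of the app
      Logger.getLogger("org.apache.solr").setLevel(Level.INFO);
    }
  }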
Re: (Solr 1.4 dev) Why solr.common.* packages are in solrj-*.jar ?
I wonder why the common classes are in the solrj JAR? Is the solrj JAR not just for the clients? the solr server uses solrj for distributed search. This makes solrj the general way to talk to solr (even from within solr)
releasing memory?
Hello- I have an application that can run in the background on a user Desktop -- it will go through phases of being used and not being used. I want to be able to free as many system resources as possible when not in use. Currently I have a timer that waits for 10 mins of inactivity and releases a bunch of memory (unrelated to lucene/solr). Any suggestion on the best way to do this in lucene/solr? perhaps reload a core? thanks for any pointers ryan
Re: Solrj possible deadlock
do you have anything custom going on? The fact that the lock is in java2d seems suspicious... On Sep 23, 2009, at 7:01 PM, pof wrote: I had the same problem again yesterday except the process halted after about 20mins this time. pof wrote: Hello, I was running a batch index the other day using the Solrj EmbeddedSolrServer when the process abruptly froze in it's tracks after running for about 4-5 hours and indexing ~400K documents. There were no document locks so it would seem likely that there was some kind of thread deadlock. I was hoping someone might be able to tell me some information about the following thread dump taken at the time: Full thread dump OpenJDK Client VM (1.6.0-b09 mixed mode): DestroyJavaVM prio=10 tid=0x9322a800 nid=0xcef waiting on condition [0x..0x0018a044] java.lang.Thread.State: RUNNABLE Java2D Disposer daemon prio=10 tid=0x0a28cc00 nid=0xf1c in Object.wait() [0x0311d000..0x0311def4] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x97a96840 (a java.lang.ref.ReferenceQueue $Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java: 133) - locked 0x97a96840 (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java: 149) at sun.java2d.Disposer.run(Disposer.java:143) at java.lang.Thread.run(Thread.java:636) pool-1-thread-1 prio=10 tid=0x93a26c00 nid=0xcf7 waiting on condition [0x08a6a000..0x08a6b074] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x967acfd0 (a java.util.concurrent.locks.AbstractQueuedSynchronizer $ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer $ConditionObject.await(AbstractQueuedSynchronizer.java:1978) at java .util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java: 386) at java .util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java: 1043) at java .util .concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java: 1103) at java.util.concurrent.ThreadPoolExecutor $Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) Low Memory Detector daemon prio=10 tid=0x93a00c00 nid=0xcf5 runnable [0x..0x] java.lang.Thread.State: RUNNABLE CompilerThread0 daemon prio=10 tid=0x09fe9800 nid=0xcf4 waiting on condition [0x..0x096a7af4] java.lang.Thread.State: RUNNABLE Signal Dispatcher daemon prio=10 tid=0x09fe8800 nid=0xcf3 waiting on condition [0x..0x] java.lang.Thread.State: RUNNABLE Finalizer daemon prio=10 tid=0x09fd7000 nid=0xcf2 in Object.wait() [0x005ca000..0x005caef4] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x966e6d40 (a java.lang.ref.ReferenceQueue $Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java: 133) - locked 0x966e6d40 (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java: 149) at java.lang.ref.Finalizer $FinalizerThread.run(Finalizer.java:177) Reference Handler daemon prio=10 tid=0x09fd2c00 nid=0xcf1 in Object.wait() [0x00579000..0x00579d74] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x966e6dc8 (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:502) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133) - locked 0x966e6dc8 (a java.lang.ref.Reference$Lock) VM Thread prio=10 tid=0x09fcf800 nid=0xcf0 runnable VM Periodic Task Thread 
prio=10 tid=0x93a02400 nid=0xcf6 waiting on condition JNI global references: 1072 Heap def new generation total 36288K, used 23695K [0x93f1, 0x9667, 0x9667) eden space 32256K, 73% used [0x93f1, 0x95633f60, 0x95e9) from space 4032K, 0% used [0x95e9, 0x95e9, 0x9628) to space 4032K, 0% used [0x9628, 0x9628, 0x9667) tenured generation total 483968K, used 72129K [0x9667, 0xb3f1, 0xb3f1) the space 483968K, 14% used [0x9667, 0x9ace04b8, 0x9ace0600, 0xb3f1) compacting perm gen total 23040K, used 22983K [0xb3f1, 0xb559, 0xb7f1) the space 23040K, 99% used [0xb3f1, 0xb5581ff8, 0xb5582000, 0xb559) No shared spaces configured. Cheers. Brett. -- View this message in context: http://www.nabble.com/Solrj-possible-deadlock-tp25530146p25531321.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr SVN build problem
Should be fixed in trunk. Try updating and see if it works for you See: https://issues.apache.org/jira/browse/SOLR-1424 On Sep 9, 2009, at 8:12 PM, Allahbaksh Asadullah wrote: Hi , I am building Solr from source. During building it from source I am getting following error. generate-maven-artifacts: [mkdir] Created dir: c:\Downloads\solr_trunk\build\maven [mkdir] Created dir: c:\Downloads\solr_trunk\dist\maven [copy] Copying 1 file to c:\Downloads\solr_trunk\build\maven\c:\Downloads\s olr_trunk\src\maven BUILD FAILED c:\Downloads\solr_trunk\build.xml:741: The following error occurred while execut ing this line: c:\Downloads\solr_trunk\common-build.xml:261: Failed to copy c:\Downloads\solr_t runk\src\maven\solr-parent-pom.xml.template to c:\Downloads\solr_trunk\build\mav en\c:\Downloads\solr_trunk\src\maven\solr-parent-pom.xml.template due to java.io .FileNotFoundException c:\Downloads\solr_trunk\build\maven\c:\Downloads\solr_tru nk\src\maven\solr-parent-pom.xml.template (The filename, directory name, or volu me label syntax is incorrect) Regards, Allahbaksh
Re: If field A is empty take field B. Functionality available?
can you just add a new field that has the real or avg price? Just populate that field at index time... make it indexed but not stored (a sketch follows after this message). If you want the real or average price to be treated the same in faceting, you are really going to want them in the same field.

On Aug 28, 2009, at 1:16 PM, Britske wrote: I have 2 fields: realprice, avgprice. I'd like to be able to take the contents of avgprice if realprice is not available. Due to design, the average price cannot be encoded in the 'realprice' field. Since I need to be able to filter, sort and facet on these fields, it would be really nice to be able to do that just on something like a virtual field called 'price' or something. That field should contain the conditional logic to know which actual field to take the contents from. I was looking at using functionqueries, but to my knowledge these can't be used to filter and facet on. Would creating a custom field work for this, or does a field know nothing about its sibling fields? What would the performance impact be like, since this is really important in this instance. Any better ways? Subclassing StandardRequestHandler and hacking it all together seems rather ugly to me, but if it's needed... Thanks, Geert-Jan -- View this message in context: http://www.nabble.com/If-field-A-is-empty-take-field-B.-Functionality-available--tp25193668p25193668.html Sent from the Solr - User mailing list archive at Nabble.com.
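A sketch of the index-time population with SolrJ (field names follow this thread; the combined 'price' field would be declared in schema.xml as indexed but not stored):

  import org.apache.solr.common.SolrInputDocument;

  public class PriceFieldExample {
    public static SolrInputDocument build(String id, Double realPrice, Double avgPrice) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", id);
      if (realPrice != null) doc.addField("realprice", realPrice);
      if (avgPrice != null) doc.addField("avgprice", avgPrice);
      // single field to filter/sort/facet on: real price when present, else the average
      doc.addField("price", realPrice != null ? realPrice : avgPrice);
      return doc;
    }
  }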
Re: Why isn't this working?
On Aug 27, 2009, at 10:35 PM, Paul Tomblin wrote: Yesterday or the day before, I asked specifically if I would need to restart the Solr server if somebody else loaded data into the Solr index using the EmbeddedServer, and I was told confidently that no, the Solr server would see the new data as soon as it was committed. So today I fired up the Solr server (and after making apache-tomcat-6.0.20/solr/data a symlink to where the Solr data really lives and restarting the web server), and did some queries. Then I ran a program that loaded a bunch of data and committed it. Then I did the queries again. And the new data is NOT showing. Using Luke, I can see 10022 documents in the index, but the Solr statistics page (http://localhost:8080/solrChunk/admin/stats.jsp) is still showing 8677, which is how many there were before I reloaded the data. So am I doing something wrong, or was the assurance I got yesterday that this is possible wrong?

I did not follow the advice from yesterday... but... the 'commit' word can be a bit misleading; it could also be called 'reload'. Say you have an embedded solr server and an http solr server pointed at the same location.

1. make sure one of them is read only! otherwise you can make a mess.
2. calling commit on the embedded solr instance will not have any effect on the http instance UNTIL you call commit (reload) on the http instance (see the sketch after this message).

ryan
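A minimal sketch of point 2, assuming the HTTP instance from this thread lives at the URL below: after the embedded writer commits, an empty commit against the HTTP instance makes it reopen its searcher.

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class ReloadHttpInstance {
    public static void main(String[] args) throws Exception {
      // the embedded side has already added docs and called commit()
      SolrServer http = new CommonsHttpSolrServer("http://localhost:8080/solrChunk");
      http.commit(); // no pending docs here; this just makes the new segments visible
    }
  }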
Re: ${solr.abortOnConfigurationError:false} - does it defaults to false
On Aug 26, 2009, at 3:33 PM, djain101 wrote: I have one quick question... If in solrconfig.xml it says:

<abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>

does it mean abortOnConfigurationError defaults to false if it is not set as a system property?

correct
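In other words, the ${name:default} syntax falls back to the default unless the JVM sets that system property, e.g. (a typical invocation with the example Jetty setup; adjust for your container):

  java -Dsolr.abortOnConfigurationError=true -jar start.jar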
Re: Solr-773 (GEO Module) question
On Aug 19, 2009, at 6:45 AM, johan.sjob...@findwise.se wrote: Hi, we're glancing at the GEO search module known from the jira issue 773 (http://issues.apache.org/jira/browse/SOLR-773). It seems to us that the issue is still open and not yet included in the nightly builds. correct Is there a release plan for the nightly builds, and is this module considered core or contrib? activity on the nightly builds is winding down as we gear up for the 1.4 release. After 1.4 is out, I expect progress on the geo stuff. It will be in contrib (not core) and will likely be marked experimental for a while. That is, stuff will be added without the expectation that the interfaces will be set in stone. best ryan
Re: Posting data in JSON
check: https://issues.apache.org/jira/browse/SOLR-945 this will not likely make it into 1.4 On Jul 30, 2009, at 1:41 PM, Jérôme Etévé wrote: Hi, Nope, I'm not using solrj (my client code is in Perl), and I'm with solr 1.3. J. 2009/7/30 Shalin Shekhar Mangar shalinman...@gmail.com: On Thu, Jul 30, 2009 at 8:31 PM, Jérôme Etévé jerome.et...@gmail.com wrote: Hi All, I'm wondering if it's possible to post documents to solr in JSON format. JSON is much faster than XML to get the queries results, so I think it'd be great to be able to post data in JSON to speed up the indexing and lower the network load. If you are using Java,Solrj on 1.4 (trunk), you can use the binary format which is extremely compact and efficient. Note that with Solr/Solrj 1.3, binary became the default response format for Solrj clients. -- Regards, Shalin Shekhar Mangar. -- Jerome Eteve. Chat with me live at http://www.eteve.net jer...@eteve.net
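For reference, a rough sketch of switching a SolrJ 1.4 client's updates to the compact javabin format mentioned above (the URL is a placeholder):

  import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class BinaryUpdateExample {
    public static void main(String[] args) throws Exception {
      CommonsHttpSolrServer server =
          new CommonsHttpSolrServer("http://localhost:8983/solr");
      server.setRequestWriter(new BinaryRequestWriter()); // send adds as javabin, not XML
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "1");
      server.add(doc);
      server.commit();
    }
  }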
Re: LocalSolr - order of fields on xml response
ya... 'expected', but perhaps not ideal. As is, LocalSolr munges the document on its way out the door to add the distance. When LocalSolr makes it into the source, it will likely use a method like: https://issues.apache.org/jira/browse/SOLR-705 to augment each document with the calculated distance. This will at least have consistent behavior. On Jul 22, 2009, at 10:47 AM, Daniel Cassiano wrote: Hi folks, When I do some query with LocalSolr to get the geo_distance, the order of xml fields is different of a standard query. It's a simple query, like this: http://myhost.com:8088/solr/core/select?qt=geox=-46.01y=-23.01radius=15sort=geo_distanceascq=*:* Is this an expected behavior of LocalSolr? Thanks! -- Daniel Cassiano _ http://www.apontador.com.br/ http://www.maplink.com.br/
Re: Solr JMX and Cacti
On Jul 20, 2009, at 8:47 AM, Edward Capriolo wrote: Hey all, We have several deployments of Solr across our enterprise. Our largest one is several GB, and when enough documents are added an OOM exception occurs. To debug this problem I have enabled JMX. My goal is to write some cacti templates similar to the ones I have done for hadoop: http://www.jointhegrid.com/hadoop/. The only cacti template for solr I have found is old, broken, and uses curl and PHP to try and read the values off the web interface. I have a few general questions/comments and also would like to know how others are dealing with this. 1) SNMP has counters/gauges. With JMX it is hard to know what a variable is without watching it for a while. Some fields are obvious (total_x) (cumulative_x); it is worthwhile to add some notes in the MBEAN info to say works like counter / works like gauge. This way a network engineer like me does not have to go code surfing to figure out how to graph them. Has anyone written up a list of what the attributes are, their types, and what they mean? 2) The values that are not counter style I am assuming are sampled; what is the sampling rate and is it adjustable? Any tips are helpful. Thank you,

Check: http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/handler/RequestHandlerBase.java

For cacti, you should probably ignore the two 'rate' based calculations as they are just derivatives:

lst.add("avgTimePerRequest", (float) totalTime / (float) this.numRequests);
lst.add("avgRequestsPerSecond", (float) numRequests*1000 / (float)(System.currentTimeMillis()-handlerStart));
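For polling from cacti scripts, the standard javax.management client APIs can read these attributes remotely; a rough sketch (the JMX port, ObjectName, and attribute name here are assumptions; list the actual beans in jconsole first):

  import javax.management.MBeanServerConnection;
  import javax.management.ObjectName;
  import javax.management.remote.JMXConnector;
  import javax.management.remote.JMXConnectorFactory;
  import javax.management.remote.JMXServiceURL;

  public class SolrJmxPoller {
    public static void main(String[] args) throws Exception {
      JMXServiceURL url = new JMXServiceURL(
          "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
      JMXConnector c = JMXConnectorFactory.connect(url);
      try {
        MBeanServerConnection mbs = c.getMBeanServerConnection();
        // name/attribute are illustrative; find the real ones in jconsole
        ObjectName handler = new ObjectName(
            "solr:type=standard,id=org.apache.solr.handler.StandardRequestHandler");
        System.out.println("requests=" + mbs.getAttribute(handler, "requests"));
      } finally {
        c.close();
      }
    }
  }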
Re: SolrJ embedded server : error while adding document
not sure what you mean... yes, i guess... you send a bunch of requests with add( doc/collection ) and they are not visible until you send commit() On Jul 20, 2009, at 9:07 AM, Gérard Dupont wrote: my mistake, pb with the buffer I added. But it raises a question : does solr (using embedded server) has its own buffer mechanism in indexing or not ? I guess not but I might be wrong. 2009/7/20 Gérard Dupont ger.dup...@gmail.com Hi SolR guys, I'm starting to play with SolR after few years with classic Lucene. I'm trying to index a single document using the embedded server, but I got a strange error which looks like XML parsing problem (see trace hereafter). To add details, this is a simple Junit which create single document then pass it to the server in a ArraylistSolrInputDocument. The document only have 2 fields id and text as it is described in the configuration. ul 20, 2009 5:50:50 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: missing content stream at org .apache .solr .handler .XmlUpdateRequestHandler .handleRequestBody(XmlUpdateRequestHandler.java:114) at org .apache .solr .handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java: 131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) at org .apache .solr .client .solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java: 147) at org .apache .solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java: 217) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48) at org .weblab_project .services.solr.SolrComponent.flushIndexBuffer(SolrComponent.java:132) at org .weblab_project .services .solr.SolrComponentTest.testAddOneDocument(SolrComponentTest.java:66) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun .reflect .NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun .reflect .DelegatingMethodAccessorImpl .invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:154) at junit.framework.TestCase.runBare(TestCase.java:127) at junit.framework.TestResult$1.protect(TestResult.java:106) at junit.framework.TestResult.runProtected(TestResult.java:124) at junit.framework.TestResult.run(TestResult.java:109) at junit.framework.TestCase.run(TestCase.java:118) at junit.framework.TestSuite.runTest(TestSuite.java:208) at junit.framework.TestSuite.run(TestSuite.java:203) at org .eclipse .jdt .internal .junit .runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130) at org .eclipse .jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38) at org .eclipse .jdt .internal .junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460) at org .eclipse .jdt .internal .junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673) at org .eclipse .jdt .internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java: 386) at org .eclipse .jdt .internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java: 196) Jul 20, 2009 5:50:50 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=null path=/update params={} status=500 QTime=6 Cannot flush the index buffer : Server error while adding documents -- Gérard Dupont Information Processing Control and Cognition (IPCC) - EADS DS http://weblab-project.org Document Learning team - LITIS Laboratory -- Gérard Dupont Information Processing Control and Cognition (IPCC) - EADS DS http://weblab-project.org Document Learning team - LITIS Laboratory
Luke / get doc count for each term
Hi- I'm trying to use the LukeRequestHandler with an index of ~9 million docs. I know that counting the top / distinct terms for each field is expensive and can take a LONG time to return. Is there a faster way to check the number of documents for each field? Currently this gets the doc count for each field:

if( sfield != null && sfield.indexed() ) {
  Query q = qp.parse( fieldName + ":[* TO *]" );
  int docCount = searcher.numDocs( q, matchAllDocs );
  ...

Looking at it again, that could be replaced with:

if( sfield != null && sfield.indexed() ) {
  Query q = qp.parse( fieldName + ":[* TO *]" );
  int docCount = searcher.getDocSet( q ).size();
  ...

Is there any faster option than running a query for each field? thanks ryan
Re: Luke / get doc count for each term
On Jun 16, 2009, at 5:21 PM, Grant Ingersoll wrote: On Jun 16, 2009, at 1:57 PM, Ryan McKinley wrote: Is there a faster way to check the number of documents for each field? Currently this gets the doc count for each term: In the past, I've created a field that contains the names of the Fields present on the document. Then, simply facet on the new Field. I think that gets you what you want and the mechanism is all built in to Solr and is quite speedy. makes sense -- i like this idea. ryan
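A sketch of that approach from SolrJ, assuming each document was indexed with a multi-valued field (here called 'fields') listing the names of its populated fields:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.response.FacetField;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class FieldCounts {
    public static void print(SolrServer server) throws Exception {
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(0);            // only the facet counts are interesting
      q.setFacet(true);
      q.addFacetField("fields");
      QueryResponse rsp = server.query(q);
      for (FacetField.Count c : rsp.getFacetField("fields").getValues()) {
        System.out.println(c.getName() + ": " + c.getCount());
      }
    }
  }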
filter on millions of IDs from external query
I am working with an index of ~10 million documents. The index does not change often. I need to perform some external search criteria that will return some number of results -- this search could take up to 5 mins and return anywhere from 0-10M docs. I would like to use the output of this long-running query as a filter in solr. Any suggestions on how to wire this all together? My initial idea (I have not implemented anything yet -- just want to check with you all before starting down the wrong path) is to: * assume the index will always be optimized, in this case every id maps to a lucene int id. * Store the results of the expensive query as a bitset. * use the stored bitset in the lucene query. I'm sure I can get this to work, but it seems kinda ugly (and brittle). Any better thoughts on how to do this? If we had some sort of external tagging interface, each document could just get tagged with what query it matches. thanks ryan
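A sketch of the bitset idea against the Lucene 2.9-era Filter API; it leans on the same assumption as the first bullet above, i.e. an optimized single-segment index so the stored doc ids line up with the reader's:

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.search.DocIdSet;
  import org.apache.lucene.search.Filter;
  import org.apache.lucene.util.OpenBitSet;

  public class PrecomputedFilter extends Filter {
    private final OpenBitSet bits; // set by the long-running external query

    public PrecomputedFilter(OpenBitSet bits) {
      this.bits = bits;
    }

    @Override
    public DocIdSet getDocIdSet(IndexReader reader) {
      // only valid while the index stays optimized (one segment, stable doc ids)
      return bits;
    }
  }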
Re: When searching for !...@#$%^*() all documents are matched incorrectly
two key things to try (for anyone ever wondering why a query matches documents) 1. add debugQuery=true and look at the explain text below -- anything that contributed to the score is listed there 2. check /admin/analysis.jsp -- this will let you see how analyzers break text up into tokens. Not sure off hand, but I'm guessing the WordDelimiterFilterFactory has something to do with it... On Sat, May 30, 2009 at 5:59 PM, Sam Michaels mas...@yahoo.com wrote: Hi, I'm running Solr 1.3/Java 1.6. When I run a query like - (activity_type:NAME) AND title:(\...@#$%\^\*\(\)) all the documents are returned even though there is not a single match. There is no title that matches the string (which has been escaped). My document structure is as follows doc str name=activity_typeNAME/str str name=titleBathing/str /doc The title field is of type text_title which is described below. fieldType name=text_title class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ !-- in this example, we will only use synonyms at query time filter class=solr.SynonymFilterFactory synonyms=index_synonyms.txt ignoreCase=true expand=false/ -- filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=1 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=1 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer /fieldType When I run the query against Luke, no results are returned. Any suggestions are appreciated. -- View this message in context: http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23797731.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: getting all rows from SOLRJ client using setRows method
careful what you ask for... what if you have a million docs? will you get an OOM? Maybe a better solution is to run a loop where you grab a bunch of docs and then increase the start value. but you can always use: query.setRows( Integer.MAX_VALUE ) ryan On May 21, 2009, at 8:37 PM, darniz wrote: Hello is there a way you can get all the results back from SOLR when querying solrJ client my gut feeling was that this might work query.setRows(-1) The way is to change the configuration xml file, but that like hard coding the configuration, and there also i have to set some valid number, i cant say return all rows. Is there a way to done through query. Thanks rashid -- View this message in context: http://www.nabble.com/getting-all-rows-from-SOLRJ-client-using-setRows-method-tp23662668p23662668.html Sent from the Solr - User mailing list archive at Nabble.com.
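A rough version of that paging loop (the batch size and query are placeholders):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.SolrDocumentList;

  public class FetchAll {
    public static void process(SolrServer server) throws Exception {
      final int batch = 500;
      int start = 0;
      while (true) {
        SolrQuery q = new SolrQuery("*:*");
        q.setStart(start);
        q.setRows(batch);
        SolrDocumentList page = server.query(q).getResults();
        for (SolrDocument doc : page) {
          // handle each doc here
        }
        start += page.size();
        if (page.size() == 0 || start >= page.getNumFound()) {
          break; // walked past the last result
        }
      }
    }
  }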
Re: How to retrieve all available Cores in a static way ?
I cringe to suggest this, but you can use the deprecated call: SolrCore.getSolrCore().getCoreContainer()

On May 19, 2009, at 11:21 AM, Giovanni De Stefano wrote: Hello all, I have a quick question but I cannot find a quick answer :-) I have a Java client running on the same JVM where Solr is running. The Solr I have is a multicore. How can I retrieve from the Java client the different cores available? I tried with: ...

CoreContainer container = new CoreContainer();
Collection<SolrCore> cores = container.getCores();

... but I get nothing useful... :-( Is there any static method that lets me get this collection? Thanks a lot! Giovanni
Re: multicore for 20k users?
since there is so little overlap, I would look at a core for each user... However, to manage 20K cores, you will not want to use the off the shelf core management implementation to maintain these cores. Consider overriding SolrDispatchFilter to initialize a CoreContainer that you manage. On May 17, 2009, at 10:11 PM, Chris Cornell wrote: On Sun, May 17, 2009 at 8:38 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Chris, Yes, disk space is cheap, and with so little overlap you won't gain much by putting everything in a single index. Plus, when each user has a separate index, it's easy to to split users and distribute over multiple machines if you ever need to do that, it's easy and fast to completely reindex one user's data without affecting other users, etc. Several years ago I built Simpy at http://www.simpy.com/ that way (but pre-Solr, so it uses Lucene directly) and never regretted it. There are way more than 20K users there with many searches per second and with constant indexing. Each user has an index for bookmarks and an index for notes. Each group has its own index, shared by all group members. The main bookmark search is another index. People search is yet another index. And so on. Single server. Thankyou very much for your insight and experience, sounds like we shouldn't be thinking about prematurely optimizing this. Has someone actually used multicore this way, though? With thousands of them? Independently of advice in that regard, I guess our next step is to explore and create some dummy scenarios/tests to try and stress multicore (search latency is not as much of a factor as memory usage is). I'll report back on any conclusion we come to. Thanks! Chris
Re: multicore for 20k users?
how much overlap is there with the 20k user documents? if you create a separate index for each of them will you be indexing 90% of the documents 20K times? How many total documents could an individual user typically see? How many total distinct documents are you talking about? Is the indexing strategy the same for all users? (the same analysis etc) Is it actually possible to limit visibility by role rather then user? I would start with trying to put everything in one index -- if that is not possible, then look at a multi-core option. On May 17, 2009, at 5:53 PM, Chris Cornell wrote: Trying to create a search solution for about 20k users at a company. Each person's documents are private and different (some overlap... it would be nice to not have to store/index copies). Is multicore something that would work or should we auto-insert a facet into each query generated by the person? Thanks for any advice, I am very new to solr. Any tiny push in the right direction would be appreciated. Thanks, Chris
Re: CommonsHttpSolrServer vs EmbeddedSolrServer
right -- which one you pick will depend more on your runtime environment then anything else. If you need to hit a server (on a different machine) CommonsHttpSolrServer is your only option. If you are running an embedded application -- where your custom code lives in the same JVM as solr -- you can use EmbeddedSolrServer. The nice thing is that since they are the same interface, you can change later. The performance comments on the wiki can be a bit misleading -- yes, in some cases embedded could be faster, but that may depend on how you are sending things -- are you sending 1000s of single document requests really fast? If so, try sending a bunch of documents together in one request. Also consider using the StreamingHttpSolrServer (https://issues.apache.org/jira/browse/SOLR-906 ) -- it has a few quirks, but can be much faster. In any case, as long as you program against the SolrServer interface, then you could swap the implementation as needed. ryan On May 14, 2009, at 3:35 PM, Eric Pugh wrote: CommonsHttpSolrServer is how you access Solr from a Java client via HTTP. You can connect to a Solr running anywhere EmbeddedSolrServer starts up Solr internally, and connects directly, all in a single JVM... Embedded may be faster, the jury is out, but you have to have your Solr server and your Solr client on the same box... Unless you really need it, I would start with CommonsHttpSolrServer, it's easier to configure and get going with and more flexible. Eric On May 14, 2009, at 1:30 PM, sachin78 wrote: What is the difference between EmbeddedSolrServer and CommonsHttpSolrServer. Which is the preferred server to use? In some blog i read that EmbeddedSolrServer is 50% faster than CommonsHttpSolrServer,then why do we need to use CommonsHttpSolrServer. Can anyone please guide me the right path/way.So that i pick the right implementation. Thanks in advance. --Sachin -- View this message in context: http://www.nabble.com/CommonsHttpSolrServer-vs-EmbeddedSolrServer-tp23545281p23545281.html Sent from the Solr - User mailing list archive at Nabble.com. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
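The swap can look roughly like this; coding against the SolrServer interface keeps the rest of the client code identical (the URL and core name are examples):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.core.CoreContainer;

  public class ServerFactory {
    public static SolrServer create(boolean remote, CoreContainer cores) throws Exception {
      if (remote) {
        return new CommonsHttpSolrServer("http://localhost:8983/solr"); // over HTTP
      }
      return new EmbeddedSolrServer(cores, "core0"); // same JVM, no HTTP hop
    }
  }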
Re: Does solrj return result in XML format? If not then how to make it do that.
use this constructor:

public CommonsHttpSolrServer(String solrServerUrl, HttpClient httpClient, ResponseParser parser) throws MalformedURLException {
  this(new URL(solrServerUrl), httpClient, parser, false);
}

and give it the XMLResponseParser (see the sketch after this message).

Is this just helpful for debugging with packet sniffing? The XML format will be a bit slower than the binary format.

ryan

On May 4, 2009, at 8:22 AM, Erik Hatcher wrote: Just out of curiosity, what's the use case for getting the result back in XML from SolrJ? Erik

On May 4, 2009, at 8:13 AM, ahmed baseet wrote: Can we get the results as received by Solrj in XML format? If yes, how to do that? I think there must be some way to make solrj return results in XML format. I need some pointers in this direction. As I know, solr returns the result in solrdocument format that we have to iterate to extract the fields. Thank you. --Ahmed.
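Putting that together, roughly (the URL is an example; HttpClient is the Commons HttpClient class SolrJ 1.x uses):

  import org.apache.commons.httpclient.HttpClient;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.impl.XMLResponseParser;

  public class XmlClientExample {
    public static CommonsHttpSolrServer create() throws Exception {
      // responses come back as wt=xml instead of the binary format
      return new CommonsHttpSolrServer(
          "http://localhost:8983/solr", new HttpClient(), new XMLResponseParser());
    }
  }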
Re: Does solrj return result in XML format? If not then how to make it do that.
The point of using solrj is that you don't have to do any parsing yourself -- you get access to the results in object form. If you need to do parsing, just grab the xml directly: http://host/solr/select?q=*:*wt=xml On May 4, 2009, at 9:36 AM, ahmed baseet wrote: As I know when we query solr from solr admin interface we get back the results in xml format, so thought there must be something similar for solrj as well, which I'll make to go thru an xml parser at the other end and display all the results in the browser. Otherwise I've to iterate the solrdocumentlist and create a list[may be] to put the results and return it back to the browser which will handle displaying that list/map etc. --Ahmed. On Mon, May 4, 2009 at 5:52 PM, Erik Hatcher e...@ehatchersolutions.com wrote: Just out of curiosity, what's the use case for getting the result back in XML from SolrJ? Erik On May 4, 2009, at 8:13 AM, ahmed baseet wrote: Can we get the results as received by Solrj in XML format? If yes how to do that. I think there must be some way to make solrj returns results in XML format. I need some pointers in this direction. As I know solrs returns the result in solrdocument format that we've to iterate to extract the fields. Thank you. --Ahmed.
Re: How to index the contents from SVN repository
I would suggest looking at Apache commons VFS and using the solrj API: http://commons.apache.org/vfs/ With SVN, you may be able to use the webdav provider. ryan On Apr 26, 2009, at 4:08 AM, Ashish P wrote: Is there any way to index contents of SVN rep in Solr ?? -- View this message in context: http://www.nabble.com/How-to-index-the-contents-from-SVN-repository-tp23240110p23240110.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Access HTTP headers from custom request handler
Right, you will have to build a new war with your own subclass of SolrDispatchFilter *rather* than using the packaged one.

On Apr 23, 2009, at 12:34 PM, Noble Paul നോബിള് नोब्ळ् wrote: nope. you must edit the web.xml and register the filter there

On Thu, Apr 23, 2009 at 3:45 PM, Giovanni De Stefano giovanni.destef...@gmail.com wrote: Hello Hoss, thank you for your reply. I have no problems subclassing the SolrDispatchFilter... but where shall I configure it? :-) I cannot find any doc/wiki explaining how to configure a custom dispatch filter. I believe it should be in solrconfig.xml:

<requestDispatcher>
  ...
</requestDispatcher>

Any idea? Is there a schema for solrconfig.xml? It would make my life easier... ;-) Thanks, Giovanni

On Wed, Apr 15, 2009 at 12:48 AM, Chris Hostetter hossman_luc...@fucit.org wrote:
: Solr cannot assume that the request would always come from http (think
: of EmbeddedSolrServer). So it assumes that there are only parameters
exactly.
: Your best bet is to modify SolrDispatchFilter and read the params and
: set them in the SolrRequest Object
SolrDispatchFilter is designed to be subclassed to make this easy by overriding the execute method...

protected void execute( HttpServletRequest req, SolrRequestHandler handler,
                        SolrQueryRequest sreq, SolrQueryResponse rsp) {
  sreq.getContext().put( "HttpServletRequest", req );
  super.execute( req, handler, sreq, rsp );
}

-Hoss
Re: CollapseFilter with the latest Solr in trunk
I have not looked at this in a while, but I think the biggest thing it is missing right now is a champion -- someone to get the patches (and bug fixes) to a state where they can easily be committed. Minor bug fixes are road blocks to getting things integrated. ryan On Apr 20, 2009, at 10:16 AM, Jeff Newburn wrote: What are the current issues holding this back? Seems to be working with some minor bug fixes. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: Otis Gospodnetic otis_gospodne...@yahoo.com Reply-To: solr-user@lucene.apache.org Date: Sun, 19 Apr 2009 20:30:22 -0700 (PDT) To: solr-user@lucene.apache.org Subject: Re: CollapseFilter with the latest Solr in trunk Once somebody really makes it work, I'm sure it will be released! Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Antonio Eggberg antonio_eggb...@yahoo.se To: solr-user@lucene.apache.org Sent: Sunday, April 19, 2009 9:21:20 PM Subject: Re: CollapseFilter with the latest Solr in trunk I wish it would be planned for 1.4 :)) --- On Sun 2009-04-19, Otis Gospodnetic wrote: From: Otis Gospodnetic Subject: Re: CollapseFilter with the latest Solr in trunk To: solr-user@lucene.apache.org Date: Sunday, April 19, 2009, 15.06 Thanks for sharing! It would be good if you (or Jeff from Zappos, or anyone making changes to this) could put up a new patch for this most-voted-JIRA-issue. Thanks, Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: climbingrose To: solr-user@lucene.apache.org Sent: Sunday, April 19, 2009 8:12:11 AM Subject: Re: CollapseFilter with the latest Solr in trunk Ok, here is how I fixed this problem:

  public DocListAndSet getDocListAndSet(Query query, List<Query> filterList, DocSet docSet,
                                        Sort lsort, int offset, int len, int flags) throws IOException {
    // DocListAndSet ret = new DocListAndSet();
    // getDocListC(ret, query, filterList, docSet, lsort, offset, len, flags |= GET_DOCSET);
    DocSet theFilt = getDocSet(filterList);
    if (docSet != null) theFilt = (theFilt != null) ? theFilt.intersection(docSet) : docSet;
    QueryCommand qc = new QueryCommand();
    qc.setQuery(query).setFilter(theFilt);
    qc.setSort(lsort).setOffset(offset).setLen(len).setFlags(flags |= GET_DOCSET);
    QueryResult result = new QueryResult();
    getDocListC(result, qc);
    return result.getDocListAndSet();
  }

There is also an off-by-one error in CollapseFilter; you can find the solution on Jira. Cheers, Cuong On Sat, Apr 18, 2009 at 4:41 AM, Jeff Newburn wrote: We are currently trying to do the same thing. With the patch unaltered we can use fq as long as collapsing is turned on. If we just send a normal document-level query with an fq parameter it blows up. Additionally, it does not appear that the collapse.facet option works at all. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: climbingrose Reply-To: Date: Fri, 17 Apr 2009 16:53:00 +1000 To: solr-user Subject: CollapseFilter with the latest Solr in trunk Hi all, Has anyone tried to use CollapseFilter with the latest version of Solr in trunk? It looks like Solr 1.4 doesn't allow calling setFilterList() and setFilter() on one instance of the QueryCommand. I modified the code in QueryCommand to allow this:

  public QueryCommand setFilterList(Query f) {
    // if( filter != null ) {
    //   throw new IllegalArgumentException( "Either filter or filterList may be set in the QueryCommand, but not both." );
    // }
    filterList = null;
    if (f != null) {
      filterList = new ArrayList<Query>(2);
      filterList.add(f);
    }
    return this;
  }

However, I still have a problem which prevents query filters from working when used in conjunction with CollapseFilter. In other words, query filters don't seem to have any effect on the result set when CollapseFilter is used. The other problem is related to OpenBitSet:

  java.lang.ArrayIndexOutOfBoundsException: 2183
  at org.apache.lucene.util.OpenBitSet.fastSet(OpenBitSet.java:242)
  at org.apache.solr.search.CollapseFilter.addDoc(CollapseFilter.java:202)
  at org.apache.solr.search.CollapseFilter.adjacentCollapse(CollapseFilter.java:161)
  at org.apache.solr.search.CollapseFilter.<init>(CollapseFilter.java:141)
  at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:217)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
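On the ArrayIndexOutOfBoundsException above: OpenBitSet.fastSet() skips bounds checks for speed, so setting a bit past the bitset's capacity runs off the end of the underlying long[]. A minimal sketch of the distinction, not the actual Jira patch (the wrapper class and method are purely illustrative):

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.util.OpenBitSet;

  public class CollapseBitsSketch {
    // size the bitset to maxDoc up front so fastSet() can never overrun it
    static OpenBitSet collect(IndexReader reader, int docId) {
      OpenBitSet bits = new OpenBitSet(reader.maxDoc());
      bits.fastSet(docId); // unchecked: safe only while docId < maxDoc
      // bits.set(docId);  // alternative: set() grows the bitset as needed
      return bits;
    }
  }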
Re: Multiple Solr-instance share same solr.home
as long as you make sure there are never two applications writing to the same index, you *should* be OK. But tread carefully... On Apr 19, 2009, at 3:28 PM, vivek sar wrote: Both Solr instances will be writing to separate indexes, but can they share the same solr.home? So, here is what I want:
1) solr.home = solr/multicore
2) There is a single solr.xml under the multicore directory
3) Each instance would use the same solr.xml, which will have entries for multiple cores
4) Each instance will write to a different core at a time - so one index will be written by only one writer at a time.
Not sure if this is a supported configuration. Thanks. -vivek On Sun, Apr 19, 2009 at 5:55 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Vivek - no, unless you want trouble - only 1 writer can write to a specific index at a time. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: vivek sar vivex...@gmail.com To: solr-user@lucene.apache.org Sent: Sunday, April 19, 2009 4:33:00 AM Subject: Multiple Solr-instance share same solr.home Hi, Is it possible to have two Solr instances share the same solr.home? I've two Solr instances running on the same box and I was wondering if I can configure them to have the same solr.home. I tried it, but it looks like the second instance overwrites the first one's values in solr.xml (I'm using multicore for both instances). This is just for convenience so I don't have to manage multiple Solr index directory locations - I can have all the indexes written into the same location and do the cleanup from one place. If this is not supported then it's not a big deal. Thanks, -vivek
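For reference, a minimal sketch of what such a shared multicore solr.xml could look like; the core names and paths are invented, and the per-core dataDir attribute should be verified against your Solr version. Setting persistent="false" keeps Solr from rewriting solr.xml at runtime, which may avoid the overwriting vivek describes:

  <solr persistent="false">
    <cores adminPath="/admin/cores">
      <!-- each instance must be the only writer for its own core's index -->
      <core name="core0" instanceDir="core0" dataDir="/indexes/core0/data"/>
      <core name="core1" instanceDir="core1" dataDir="/indexes/core1/data"/>
    </cores>
  </solr>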
Re: Advice on moving from 1.3 to 1.4-dev or trunk?
When you say Test ... Are you suggesting there is a test suite I should run, or do I just do my own testing? your own testing... If you use a 'nightly', the unit tests all pass. BUT if you are not running from a standard release, there may be things that are not totally fleshed out, or configurations that have not been tried yet. For a release build, a lot of effort is made to make sure all loose ends are tied up. ryan
Re: [solr-user] Upgrade from 1.2 to 1.3 gives 3x slowdown
The work being done is addressing the deletes, AIUI, but of course there are other things happening during shutdown, too. There are no deletes to do. It was a clean index to begin with and there were no duplicates. I have not followed this thread, so forgive me if this has already been suggested. If you know that there are no duplicates, have you tried indexing with allowDups=true? It will not change the fsync cost, but it may reduce some other checking time. ryan
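For reference, a sketch of a 1.3-era XML update message with the duplicate check disabled (field names are invented; check your version's update handler docs for the exact attribute set):

  <add allowDups="true">
    <doc>
      <field name="id">doc-1</field>
      <field name="title">example title</field>
    </doc>
  </add>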
Re: Search included in *all* fields
What about: fieldA:value1 AND fieldB:value2 This can also be written as: +fieldA:value1 +fieldB:value2 On Apr 13, 2009, at 9:53 PM, Johnny X wrote: I'll start a new thread to make things easier, because I've only really got one problem now. I've configured my Solr to search on all fields, so a query against a specific field (e.g. q=Date:October) will only search the 'Date' field, rather than all the others. The issue is when you build up multiple fields to search on: only one of those has to match for a result to be returned, rather than all of them. Is there a way to change this? Cheers!
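If every clause should be required by default, the query parser's default operator can also be flipped in schema.xml instead of prefixing each clause with +, assuming the stock <solrQueryParser> element:

  <!-- schema.xml: require all clauses unless explicitly OR'd -->
  <solrQueryParser defaultOperator="AND"/>

The q.op request parameter should do the same thing on a per-query basis.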
Re: QueryElevationComponent : hot update of elevate.xml
On Apr 10, 2009, at 7:48 AM, Nicolas Pastorino wrote: Hello! Browsing the mailing-list's archives did not help me find the answer, hence the question asked directly here. Some context first: integrating Solr with a CMS (eZ Publish), we chose to support Elevation. The idea is to be able to 'elevate' any object from the CMS. This can be achieved through eZ Publish's back office, with a dedicated Elevate administration GUI; the configuration is stored in the CMS temporarily, and then synchronized frequently and/or on demand onto Solr. This synchronisation is currently done as follows: 1. Generate the elevate.xml based on the stored configuration 2. Replace elevate.xml in Solr's dataDir 3. Commit. It appears that when elevate.xml is in Solr's dataDir, and solely in this case, committing triggers a reload of elevate.xml. This does not happen when elevate.xml is stored in Solr's conf dir. This method has one main issue though: eZ Publish needs to have access to the same filesystem as the one on which Solr's dataDir is stored. This is not always the case when the CMS is clustered, for instance -- show stopper :( Hence the following idea / RFC: how about extending the Query Elevation system with the possibility to push an updated elevate.xml file/XML through HTTP? This would update the file where it is actually located, and trigger a reload of the configuration. Not being very knowledgeable about Solr's API (yet!), I cannot figure out whether this would be possible, how it could be achieved (which type of plugin, for instance), or even whether it is valid? Perhaps look at implementing a custom RequestHandler: http://wiki.apache.org/solr/SolrRequestHandler maybe the client could POST the new elevate.xml, and the handler could save it to the right place and call commit... ryan
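A rough sketch of such a handler against Solr 1.4-era APIs; the class name and response message are invented, and package locations plus the CommitUpdateCommand constructor changed in later releases, so treat this as a starting point only:

  import java.io.File;
  import java.io.FileOutputStream;
  import java.io.InputStream;
  import java.io.OutputStream;
  import org.apache.solr.common.util.ContentStream;
  import org.apache.solr.handler.RequestHandlerBase;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.request.SolrQueryResponse;
  import org.apache.solr.update.CommitUpdateCommand;

  public class ElevateUploadHandler extends RequestHandlerBase {
    @Override
    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
      // take the POSTed body as the new elevate.xml
      ContentStream stream = req.getContentStreams().iterator().next();
      InputStream in = stream.getStream();
      File target = new File(req.getCore().getDataDir(), "elevate.xml");
      OutputStream out = new FileOutputStream(target);
      byte[] buf = new byte[4096];
      for (int n; (n = in.read(buf)) != -1; ) {
        out.write(buf, 0, n);
      }
      out.close();
      in.close();
      // commit so QueryElevationComponent reloads the file from dataDir
      req.getCore().getUpdateHandler().commit(new CommitUpdateCommand(false));
      rsp.add("status", "elevate.xml updated");
    }

    @Override public String getDescription() { return "Replaces elevate.xml and commits"; }
    @Override public String getSource() { return null; }
    @Override public String getSourceId() { return null; }
    @Override public String getVersion() { return null; }
  }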
Re: logging
If you use the off-the-shelf .war, it *should* be the same. (If not, we need to fix it.) If you are building your own .war, how SLF4J behaves depends on what binding implementation is on the runtime classpath. If you want to use log4j logging, put the slf4j-log4j12 jar on your classpath and you should be all set. On Apr 9, 2009, at 4:56 PM, Kevin Osborn wrote: We built our own webapp that used the Solr JARs. We used Apache Commons/log4j logging and just put log4j.properties in the Resin conf directory. The commons-logging and log4j jars were put in the Resin lib directory. Everything worked great and we got log files for our code only. So, I upgraded to Solr 1.4 and I no longer get my log file. I assume it has something to do with Solr 1.4 using SLF4J instead of JDK logging, but it seems like my code would be independent of that. Any ideas?
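Concretely, the runtime classpath would end up holding something like the following (version numbers are assumptions; the binding jar ships as slf4j-log4j12-<version>.jar in the SLF4J distribution):

  slf4j-api-1.5.5.jar       (the API Solr codes against)
  slf4j-log4j12-1.5.5.jar   (the binding that routes SLF4J calls to log4j)
  log4j-1.2.14.jar          (the implementation that reads log4j.properties)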
Re: [Newbie]How to influante Revelance in Solr ?
On Mar 29, 2009, at 8:42 AM, Shalin Shekhar Mangar wrote: On Sun, Mar 29, 2009 at 4:57 PM, aerox7 amyne.berr...@me.com wrote: I want to get results ordered by keyword matching (score) and popularity. When I tried something like this: q=hp&sort=popularity desc, score desc I get HP printer, HP laptop and HP jet, so it works! But when I try to search hp jet (q=hp jet&sort=popularity desc, score desc) I get the same results as the first query, which is totally wrong! How can I influence the score in my case? For example, give the keyword match a factor of 1 and popularity 1.5 (or 2). Do not sort by popularity first as it will dominate the score. Look at function queries for influencing the score based on the popularity. http://wiki.apache.org/solr/FunctionQuery also consider using the dismax parser with the 'bf' parameter. I think the example has that configured, also check: http://wiki.apache.org/solr/DisMaxRequestHandler#head-14b9ca618089829d139e6f3d6f52ff63e22a80d1 ryan
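For example, a dismax request along these lines, assuming the example solrconfig's dismax handler and the recip/rord functions from the FunctionQuery wiki page (field names and constants are invented):

  q=hp jet
  &qt=dismax
  &qf=name description
  &bf=recip(rord(popularity),1,1000,1000)^2

Here bf folds popularity into the score instead of hard-sorting on it, so the keyword match still separates 'hp jet' from the other HP products.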