Re: How are multivalued fields used?
Hello Gene,

On Monday, 13.10.2008, 23:32 +1300, ristretto.rb wrote:
> How does one use this field type? Forums, wiki, Lucene in Action, all coming up empty. If there's a doc somewhere please point me there. I use pysolr to index, but that's not a requirement. I'm not sure how one adds multiple values to a document.

You add multiple fields with the same name to a document, say keywords:

  <doc>
    <field name="keywords">solr</field>
    <field name="keywords">lucene</field>
    <field name="keywords">search</field>
  </doc>

If you have configured the field keywords in the index to be multivalued, then Solr will dump the values of each field in your document into the field keywords. When you request that Solr return the value in the field keywords, you get a comma-separated list of the keywords: solr,lucene,search

> And once added, if you want to remove one how do you specify?

You cannot remove a field from a document; documents are read only. You have to reindex the document with the same unique id and the new information.

> Based on http://wiki.apache.org/solr/FieldOptionsByUseCase, it says to use it to add multiple values, maintaining order. Is the order for indexing/searching or for storing/returning?

Not sure, but I would assume that the values are stored in the field in the order that the fields are specified in the document.

Brian
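For reference, a minimal schema.xml sketch of the field definition this assumes (the field name keywords and the text type are just taken from the example above; check them against your own schema):

  <field name="keywords" type="text" indexed="true" stored="true" multiValued="true"/>

Without multiValued="true", adding a second value under the same field name is rejected at index time.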
Re: Searching with Wildcards
Hello all,

Sorry I have taken so long to get back to Erik's reply. I used the technique of inserting a ? before the * to get a prototype working. However, if 1.3 does not support this anymore, then I really need to look into alternatives. What would be the scope of the work to implement Erik's suggestion? I would have to ask my boss, but I think we would then contribute the code back to Solr. This should probably be continued on solr-dev, right?

Brian

On Wednesday, 17.09.2008, 17:19 -0400, Mark Miller wrote:
Alas no, the query parser now uses an unhighlightable constant-score query. I'd personally like to make it work at the Lucene level, but I'm not sure how that's going to proceed. The tradeoff is that you won't have max boolean clause issues and wildcard searches should be faster. It is a bummer though. - Mark

dojolava wrote:
Hi, I have another question on the wildcard problem: In the previous Solr releases there was a workaround to highlight wildcard queries using the StandardRequestHandler by adding a ? in between: e.g. hou?* would highlight house. But this is not working anymore. Is there maybe another workaround? ;-) Regards, Mathis

On Tue, Sep 2, 2008 at 2:15 PM, Erik Hatcher [EMAIL PROTECTED] wrote:
Probably your best bet is to create a new QParser(Plugin) that uses Lucene's QueryParser directly. We probably should have that available anyway in the core, just so folks coming from Lucene Java have the same QueryParser. Erik

On Sep 2, 2008, at 7:11 AM, Brian Carmalt wrote:
Hello all, I need to get wildcard searches with highlighting up and running. I'd like to get it to work with a DismaxHandler, but I'll settle with starting with the StandardRequestHandler. I've been reading some of the past mails on wildcard searches and SOLR-195. It seems I need to change the default behavior for wildcards from PrefixFilter to a PrefixQuery. I know that I will have to deal with TooManyClauses exceptions, but I want to play around with it. I have read that this can only be done by modifying the code, but I can't seem to find the correct section. Can someone point me in the right direction? Thanks. - Brian
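To make Erik's suggestion concrete, here is a rough sketch of a QParser(Plugin) that hands the query string straight to Lucene's QueryParser. It assumes the Solr 1.3 QParser API; the class name and the "content" default field are made up for the example, not taken from anyone's setup:

  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.queryParser.ParseException;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.search.Query;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.search.QParser;
  import org.apache.solr.search.QParserPlugin;

  public class PlainLuceneQParserPlugin extends QParserPlugin {
    public void init(NamedList args) {}

    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
      // grab the schema's query analyzer once, outside the anonymous parser
      final Analyzer analyzer = req.getSchema().getQueryAnalyzer();
      return new QParser(qstr, localParams, params, req) {
        public Query parse() throws ParseException {
          // "content" is a placeholder default field name
          QueryParser lucene = new QueryParser("content", analyzer);
          return lucene.parse(getString());
        }
      };
    }
  }

Registered in solrconfig.xml with something like <queryParser name="plainlucene" class="PlainLuceneQParserPlugin"/>, it could then be selected per request with defType=plainlucene. Treat this as a starting point only; it has the TooManyClauses and performance caveats Mark mentions.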
Re: Not enough space
Search Google for "swap file linux", or use your distro name instead of "linux". There is tons of info out there.

On Thursday, 25.09.2008, 02:07 -0700, sunnyfr wrote:
Hi, I obviously have the same error, I just don't know how you add swap space? Thanks a lot,

Yonik Seeley wrote:
On 7/5/07, Xuesong Luo [EMAIL PROTECTED] wrote:
Thanks, Chris and Yonik. You are right. I remember the heap size was over 500m when I got the "Not enough space" error message. Is there a best practice to avoid this kind of problem?

add more swap space. -Yonik
Re: How to copy a solr index to another index with a different schema collapsing stored data?
It wouldn't be that bad to merge the index externally and then reindex the results, if it is as simple as your example. Search for id:[1 TO *] with an fq for the category, and increment the slice of the results you need to process until you have covered all of the docs in the category. Request the content field, extract the values from the XML responses and save them somewhere. When you have all the info, reindex it.

On Wednesday, 17.09.2008, 10:00 -0400, Erick Erickson wrote:
You *might* be able to reconstruct enough of the original documents from your indexes to create another without recrawling. I know Luke can reconstruct documents from an index, but for unstored data it's slow and may be lossy. But it may suit your needs given how long it takes to make your index in the first place. Best, Erick

On Tue, Sep 16, 2008 at 9:14 PM, Gene Campbell [EMAIL PROTECTED] wrote:
I was pretty sure you'd say that. But it means a lot that you take the time to confirm it. Thanks Otis. I don't want to give details, but we crawl for our data, and we don't save it in a DB or on disk. It goes from download to index. Was a good idea at the time, when we thought our designs were done evolving. :) cheers gene

On Wed, Sep 17, 2008 at 12:51 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote:
You can't copy+merge+flatten indices like that. Reindexing would be the easiest. Indexing taking weeks sounds suspicious. How much data are you reindexing and how big are your indices? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: ristretto.rb [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Tuesday, September 16, 2008 8:14:16 PM Subject: How to copy a solr index to another index with a different schema collapsing stored data?
Is it possible to copy stored index data from one index to another, but concatenating it as you go? Suppose 2 categories A and B, both with 20 docs, for a total of 40 docs in the index. The index has a stored field for the content from the docs. I want a new index with only two docs in it, one for A and one for B. And it would have a stored field that is the sum of all the stored data for the 20 docs of A and of B respectively. So a query on this index would give me a relevant list of categories. Perhaps there's a Solr query to get that data out, and then I can handle concatenating it, and then indexing it in the new index. I'm hoping I don't have to reindex all this data from scratch? It has taken weeks! thanks gene
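A minimal sketch of the slicing Brian describes, written with the SolrJ client that ships with Solr 1.3 instead of parsing the XML responses by hand. The field names category and content and the localhost URL are only illustrative:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.SolrDocumentList;

  public class CollapseCategory {
    public static void main(String[] args) throws Exception {
      SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
      StringBuilder merged = new StringBuilder();
      int rows = 100;
      long found = Long.MAX_VALUE;
      for (int start = 0; start < found; start += rows) {
        SolrQuery q = new SolrQuery("id:[1 TO *]");
        q.addFilterQuery("category:A");   // one pass per category
        q.setFields("id", "content");
        q.setStart(start);
        q.setRows(rows);
        SolrDocumentList page = solr.query(q).getResults();
        found = page.getNumFound();
        for (SolrDocument d : page) {
          merged.append(d.getFieldValue("content")).append('\n');
        }
      }
      // merged now holds the concatenated content for category A;
      // index it as the single new document for that category.
    }
  }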
Re: too many open files
On Monday, 14.07.2008, 09:50 -0400, Yonik Seeley wrote:
> Solr uses reference counting on IndexReaders to close them ASAP (since relying on gc can lead to running out of file descriptors).

How do you force them to close ASAP? I use File and FileOutputStream objects; I close the output streams and then call delete on the files. I still have problems with too many open files. After a while I get exceptions that I cannot open any new files. After this the threads stop working, and a day later the files are still open and marked for deletion. I have to kill the server to get it running again, or call System.gc() periodically. How do I force the VM to release the files? This happens under RedHat with a 2.4 kernel and under Debian Etch with a 2.6 kernel.

Thanks, Brian

> -Yonik
> On Mon, Jul 14, 2008 at 9:15 AM, Brian Carmalt [EMAIL PROTECTED] wrote:
> Hello, I have a similar problem, not with Solr, but in Java. From what I have found, it is a usage and OS problem: it comes from using too many files and the time it takes the OS to reclaim the fds. I found the recommendation that System.gc() should be called periodically. It works for me. May not be the most elegant, but it works. Brian.
> On Monday, 14.07.2008, 11:14 +0200, Alexey Shakov wrote:
> now we have set the limit to ~1 files but this is not the solution - the amount of open files increases permanently. Sooner or later, this limit will be exhausted.
> Fuad Efendi wrote:
> Have you tried [ulimit -n 65536]? I don't think it relates to files marked for deletion... == http://www.linkedin.com/in/liferay
> Sooner or later, the system crashes with the message "Too many open files"
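As an aside on the close-then-delete pattern Brian describes: in plain Java the descriptor is only released once close() actually runs, so the usual shape is to close in a finally block before deleting. A generic sketch, nothing Solr-specific, with made-up file names:

  import java.io.File;
  import java.io.FileOutputStream;
  import java.io.IOException;

  public class WriteThenDelete {
    public static void main(String[] args) throws IOException {
      File f = new File("temp.dat");
      FileOutputStream out = new FileOutputStream(f);
      try {
        out.write("some data".getBytes());
      } finally {
        out.close();               // release the descriptor even if write() throws
      }
      if (!f.delete()) {
        // the fd may still be held elsewhere (e.g. an open stream in another thread)
        System.err.println("could not delete " + f);
      }
    }
  }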
Re: too many open files
Hello,

I have a similar problem, not with Solr, but in Java. From what I have found, it is a usage and OS problem: it comes from using too many files and the time it takes the OS to reclaim the fds. I found the recommendation that System.gc() should be called periodically. It works for me. May not be the most elegant, but it works.

Brian.

On Monday, 14.07.2008, 11:14 +0200, Alexey Shakov wrote:
now we have set the limit to ~1 files but this is not the solution - the amount of open files increases permanently. Sooner or later, this limit will be exhausted.

Fuad Efendi wrote:
Have you tried [ulimit -n 65536]? I don't think it relates to files marked for deletion... == http://www.linkedin.com/in/liferay

Sooner or later, the system crashes with the message "Too many open files"
Re: How to debug ?
Hello Beto,

There is a plugin for Jetty: http://webtide.com/eclipse. Add this as an update site and let Eclipse install the plugin for you. You can then start the Jetty server from Eclipse and debug it.

Brian.

On Wednesday, 25.06.2008, 12:48 +1000, Norberto Meijome wrote:
On Tue, 24 Jun 2008 19:17:58 -0700 Ryan McKinley [EMAIL PROTECTED] wrote:
also, check the LukeRequestHandler if there is a document you think *should* match, you can see what tokens it has actually indexed...

right, I will look into that a bit more. I am actually using the lukeall.jar (0.8.1, linked against lucene 2.4) to look into what got indexed, but I am a bit wary of how what I select in the 'analyzer' drop-down option in Luke actually affects what I see. B

_ {Beto|Norberto|Numard} Meijome
Web2.0 is outsourced R&D from Web1.0 companies. The Reverend
I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Problem with searching using the DisMaxHandler
Hello all,

I have defined a DisMax handler. It should search in the following fields: content1, content2 and id (doc uid). I would like to be able to specify a query like the following: (search terms) AND (id1 OR id2 ... idn). My intent is to retrieve only the docs in which hits for the search terms occur and that have one of the specified ids. Unfortunately, I get no document matches. Can anyone shed some light on what I am doing wrong?

Thanks, Brian
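One thing that may explain this (a guess, not a confirmed fix): the dismax handler treats q as plain user keywords and does not parse boolean operators, so the id restriction is usually better expressed as a filter query. Assuming the id field is simply called id, the request would look roughly like:

  q=search terms&qt=dismax&fq=id:(id1 OR id2 OR idn)

The fq part is parsed with the standard Lucene syntax, so the OR list works there, and the filter is cached separately from the main query.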
Re: AW: My First Solr
Do you see if the document update is successful? When you start Solr with java -jar start.jar for the example, Solr will list the document ids of the docs that you are adding and tell you how long the update took. A simple but brute-force method to find out if a document has been committed is to stop the server and then restart it. You can also use the solr/admin/stats.jsp page to see if the docs are there. After looking at your query in the results you posted, I would bet that you are not specifying a search field. Try searching for anwendung:KIS or id:[1 TO *] to see all the docs in your index.

Brian

On Friday, 13.06.2008, 07:40 +0200, Thomas Lauer wrote:
i have tested:
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported
SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file import_sample.xml
SimplePostTool: COMMITting Solr index changes..
Re: My First Solr
http://wiki.apache.org/solr/DisMaxRequestHandler

In solrconfig.xml there are example configurations for the DisMax. Sorry I told you the wrong name, not enough coffee this morning.

Brian.

On Friday, 13.06.2008, 09:40 +0200, Thomas Lauer wrote:
Re: AW: My First Solr
No, you do not have to reindex. You do have to restart the server. The bf has fields listed that are not in your document: popularity, price. Delete the bf entry; you do not need it unless you want to use boost functions.

Brian

On Friday, 13.06.2008, 10:36 +0200, Thomas Lauer wrote:
ok, my dismax requestHandler:

  <requestHandler name="dismax" class="solr.DisMaxRequestHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <float name="tie">0.01</float>
      <str name="qf">beschreibung^0.5 ordner^1.0 register^1.2 Benutzer^1.5 guid^10.0 mandant^1.1</str>
      <str name="pf">beschreibung^0.2 ordner^1.1 register^1.5 manu^1.4 manu_exact^1.9</str>
      <str name="bf">ord(poplarity)^0.5 recip(rord(price),1,1000,1000)^0.3</str>
      <str name="fl">guid,beschreibung,mandant,Benutzer</str>
      <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
      <int name="ps">100</int>
      <str name="q.alt">*:*</str>
    </lst>
  </requestHandler>

Must I reindex? I search with this URL:

  http://localhost:8983/solr/select?indent=on&version=2.2&q=bonow&start=0&rows=10&fl=*%2Cscore&qt=dismax&wt=standard&explainOther=&hl.fl=

The response is:

  HTTP Status 400 - undefined field text
  type: Status report
  message: undefined field text
  description: The request sent by the client was syntactically incorrect (undefined field text).

Regards, Thomas

-----Original Message-----
From: Brian Carmalt [mailto:[EMAIL PROTECTED]
Sent: Friday, 13 June 2008 09:50
To: solr-user@lucene.apache.org
Subject: Re: My First Solr

http://wiki.apache.org/solr/DisMaxRequestHandler
In solrconfig.xml there are example configurations for the DisMax. Sorry I told you the wrong name, not enough coffee this morning. Brian.

On Friday, 13.06.2008, 09:40 +0200, Thomas Lauer wrote:
Searching across many fields
Hello All,

We are thinking about a totally dynamic indexing schema, where the only field known to be in the index is the ID field. This means that in order to search the index, the field names we want to search in must be specified: q=title:solr+content:solr+summary:solr and so on. This works well when the number of fields is small, but what are the performance ramifications when the number of fields is more than 1000? Is this a serious performance killer? If yes, what would we need to counteract it: more RAM or faster CPUs? Or both? Is it better to copy all fields to a content field and then always search there? This works, but then it is hard to boost specific field values, and that is what we want to do. Any advice or experience in this area is appreciated.

Thanks, Brian
Re: exception while feeding converted text from pdf
Hello Cam,

Are you writing your XML by hand, as in no XML writer? That can cause problems. In your exception it says "latitude 59"; the '&' in front of it should have been converted to '&amp;' (I think). If you can use Java 6, there is an XMLStreamWriter in javax.xml.stream that does automatic special-character escaping. This can simplify writing simple XML. Unfortunately the stream writer does not filter out invalid XML characters, so I will point you to a helpful website: http://cse-mjmcl.cse.bris.ac.uk/blog/2007/02/14/1171465494443.html

Hope this helps. Brian

On Wednesday, 14.05.2008, 19:23 +0300, Cam Bazz wrote:
Hello, I made a simple Java program to convert my PDFs to text, and then to an XML file. I am getting a strange exception. I think the converted files have some errors. Should I encode the txt string that I extract from the PDFs in a special way? Best, -C.B.

SEVERE: org.xmlpull.v1.XmlPullParserException: entity reference names can not start with character ' ' (position: START_TAG seen ...ay\n latitude 59 ... @80:64)
at org.xmlpull.mxp1.MXParser.parseEntityRef(MXParser.java:2212)
at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1275)
at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
at org.xmlpull.mxp1.MXParser.nextText(MXParser.java:1058)
at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:332)
at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:162)
at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
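A small sketch of the javax.xml.stream approach Brian mentions: writeCharacters() escapes characters like & and < for you, so an unescaped ampersand in the PDF text cannot break the update XML. The field name content and the sample text are only illustrative:

  import java.io.StringWriter;
  import javax.xml.stream.XMLOutputFactory;
  import javax.xml.stream.XMLStreamWriter;

  public class SolrAddDocWriter {
    public static void main(String[] args) throws Exception {
      StringWriter sw = new StringWriter();
      XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
      w.writeStartDocument("UTF-8", "1.0");
      w.writeStartElement("add");
      w.writeStartElement("doc");
      w.writeStartElement("field");
      w.writeAttribute("name", "content");
      w.writeCharacters("latitude > 59 & longitude < 10");  // escaped automatically
      w.writeEndElement();  // field
      w.writeEndElement();  // doc
      w.writeEndElement();  // add
      w.writeEndDocument();
      w.close();
      System.out.println(sw);  // this string is what you would POST to /solr/update
    }
  }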
Re: How to effectively search inside fields that should be indexed without changing them.
Hello Otis,

The example I provided was a simplified one. The real use case is that we will have to dynamically adapt to field values, and we have no idea what form they will have. So unfortunately, a custom tokenizer will not work. I changed the n-gram values to min=max=2 and I can match sub-terms inside the fields that are analyzed with the NGramTokenizer, but I haven't had the time to test it completely. Can you quickly outline why n-grams are not a good solution for my problem?

Thanks, Brian

Otis Gospodnetic wrote:
Brian, This is not really a job for n-grams. It sounds like you'll want to write a custom Tokenizer that has knowledge about this particular pattern, knows how to split input like the one in your example, and produce multiple tokens out of it. For the natural language part you can probably get away with one of the existing tokenizers/analyzers/factories. For the first part you'll likely want to extract (\w+)0+ -- 1 or more letters followed by 1 or more zeros -- as one token, and then 0+(\d+) -- 1 or more zeros followed by 1 or more digits. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: Brian Carmalt [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Tuesday, December 11, 2007 9:17:32 AM Subject: How to effectively search inside fields that should be indexed without changing them.
Hello all, The titles of our docs have the form ABC0001231-This is an important doc.pdf. I would like to be able to search for 'important', or '1231', or 'ABC000*', or 'This is an important doc' in the title field. I looked at the NGramTokenizer and tried to use it. In the index it doesn't seem to work; I cannot get any hits. The analysis tool on the admin pages shows me that the ngram tokenizing works by highlighting the matches between the indexed value and a query. I have set the min and max ngram size to 2 and 6, with side equal to left. Can anyone recommend a procedure that will allow me to search as stated above? I would also like to find out more about how to use the NGramTokenizer, but have found little in the form of documentation. Anyone know about any good sources? Thanks, Brian
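A tiny plain-Java sketch of the splitting Otis describes; the pattern below is an approximation of his (\w+)0+ / 0+(\d+) idea, applied to the title prefix from the example:

  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  public class TitlePrefixSplitter {
    // letters, then the zero padding, then the significant digits
    private static final Pattern P = Pattern.compile("([A-Za-z]+)0*([1-9]\\d*)");

    public static void main(String[] args) {
      Matcher m = P.matcher("ABC0001231");
      if (m.matches()) {
        System.out.println(m.group(1));  // ABC   -> first token
        System.out.println(m.group(2));  // 1231  -> second token
      }
    }
  }

A custom Tokenizer would emit both groups (and perhaps the raw prefix) as separate tokens at the same position, so that 'ABC000*' and '1231' both match.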
How to effectively search inside fields that should be indexed without changing them.
Hello all,

The titles of our docs have the form ABC0001231-This is an important doc.pdf. I would like to be able to search for 'important', or '1231', or 'ABC000*', or 'This is an important doc' in the title field. I looked at the NGramTokenizer and tried to use it. In the index it doesn't seem to work; I cannot get any hits. The analysis tool on the admin pages shows me that the ngram tokenizing works by highlighting the matches between the indexed value and a query. I have set the min and max ngram size to 2 and 6, with side equal to left. Can anyone recommend a procedure that will allow me to search as stated above? I would also like to find out more about how to use the NGramTokenizer, but have found little in the form of documentation. Anyone know about any good sources?

Thanks, Brian
Re: out of heap space, every day
Hello,

I am also fighting with heap exhaustion, though during the indexing step. I was able to minimize, but not fix, the problem by setting the thread stack size to 64k with -Xss64k. The minimum size is OS-specific, but the VM will tell you if you set the size too small. You can try it, it may help.

Brian

Brian Whitman wrote:
This may be more of a general Java question than a Solr one, but I'm a bit confused. We have a largish Solr index, about 8M documents; the data dir is about 70G. We're getting about 500K new docs a week, as well as about 1 query/second. Recently (when we crossed about the 6M threshold) Resin has been stopping with the following:

/usr/local/resin/log/stdout.log:[12:08:21.749] [28304] HTTP/1.1 500 Java heap space
/usr/local/resin/log/stdout.log:[12:08:21.749] java.lang.OutOfMemoryError: Java heap space

Only a restart of Resin will get it going again, and then it'll crash again within 24 hours. It's a 4GB machine and we run it with args=-J-mx2500m -J-ms2000m. We can't really raise this any higher on the machine. Are there 'native' memory requirements for Solr as a function of index size? Does a 70GB index require some minimum amount of wired RAM? Or is there some misconfiguration with Resin or Solr or my system? I don't really know Java well, but it seems strange that the VM can't page RAM out to disk or really do something else besides stopping the server.
Re: Weird memory error.
Can you recommend one? I am not familiar with how to profile under Java.

Yonik Seeley wrote:
Can you try a profiler to see where the memory is being used? -Yonik

On Nov 20, 2007 11:16 AM, Brian Carmalt [EMAIL PROTECTED] wrote:
Hello all, I started looking into the scalability of Solr, and have started getting weird results. I am getting the following error:

Exception in thread btpool0-3 java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:574)
at org.mortbay.thread.BoundedThreadPool.newThread(BoundedThreadPool.java:377)
at org.mortbay.thread.BoundedThreadPool.dispatch(BoundedThreadPool.java:94)
at org.mortbay.jetty.bio.SocketConnector$Connection.dispatch(SocketConnector.java:187)
at org.mortbay.jetty.bio.SocketConnector.accept(SocketConnector.java:101)
at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:516)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

This only occurs when I send docs to the server in batches of around 10 as separate processes. If I send them serially, the heap grows up to 1200M with no errors. When I observe the VM during its operation, it doesn't seem to run out of memory. The VM starts with 1024M and can allocate up to 1800M. I start getting the error listed above when the memory usage is right around 1 G. I have been using the Jconsole program on Windows to observe the Jetty server by using the com.sun.management.jmxremote* functions on the server side. The number of threads is always around 30, and Jetty can create up to 250, so I don't think that's the problem. I can't really imagine that the monitoring process is using the other 800M of the allowable heap memory, but it could be. But the problem occurs without monitoring, even when the VM heap is set to 1500M. Does anyone have an idea as to why this error is occurring? Thanks, Brian
Weird memory error.
Hello all,

I started looking into the scalability of Solr, and have started getting weird results. I am getting the following error:

Exception in thread btpool0-3 java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:574)
at org.mortbay.thread.BoundedThreadPool.newThread(BoundedThreadPool.java:377)
at org.mortbay.thread.BoundedThreadPool.dispatch(BoundedThreadPool.java:94)
at org.mortbay.jetty.bio.SocketConnector$Connection.dispatch(SocketConnector.java:187)
at org.mortbay.jetty.bio.SocketConnector.accept(SocketConnector.java:101)
at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:516)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

This only occurs when I send docs to the server in batches of around 10 as separate processes. If I send them serially, the heap grows up to 1200M with no errors. When I observe the VM during its operation, it doesn't seem to run out of memory. The VM starts with 1024M and can allocate up to 1800M. I start getting the error listed above when the memory usage is right around 1 G. I have been using the Jconsole program on Windows to observe the Jetty server by using the com.sun.management.jmxremote* functions on the server side. The number of threads is always around 30, and Jetty can create up to 250, so I don't think that's the problem. I can't really imagine that the monitoring process is using the other 800M of the allowable heap memory, but it could be. But the problem occurs without monitoring, even when the VM heap is set to 1500M. Does anyone have an idea as to why this error is occurring?

Thanks, Brian
Re: [jira] Commented: (SOLR-380) There's no way to convert search results into page-level hits of a structured document.
There is more to consider here. Lucene now supports payloads, additional metadata on terms that can be leveraged with custom queries. I've not yet tinkered with them myself, but my understanding is that they would be useful (and in fact designed in part) for representing structured documents. It would behoove us to investigate how payloads might be leveraged for your needs here, such that a single field could represent an entire document, with payloads representing the hierarchical structure. This will require specialized Analyzer and Query subclasses be created to take advantage of payloads. The Lucene community itself is just now starting to exploit this new feature, so there isn't a lot out there on it yet, but I think it holds great promise for these purposes. Erik

Hello Erik,

Could you elaborate on how payloads could be used to represent a structured doc?

Thanks, Brian
Searching dynamic fields
Hello all,

Is there a way to search dynamicFields without having to specify the name of the field in a query? Example: I have indexed a doc with the field name myDoc_text_en, and I have a dynamic field *_text_en which maps to a type of text_en. How can I search this field without knowing its specific name? Can I search according to field type? I have looked at the DisMaxRequestHandler, which might work, but it doesn't accept wildcard field names or field types. I'm using 1.3.

Thanks in advance. Brian
Re: Querying for an id with a colon in it
Robert Young wrote:
Hi, If my unique identifier is called guid and one of the ids in it is, for example, article:123, how can I query for that article id? I have tried a number of ways but I always either get no results or an error. It seems to be to do with having the colon in the id value. eg.
?q=guid:article:123 - error
?q=guid:article:123 - error
?q=guid:article%3A123 - error
Any ideas? Cheers Rob

Try it with a backslash before the colon ('\:'). That's what the Lucene Query Parser Syntax page says. It doesn't cause an error, but I don't know if it will provide the results you want.

Brian
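For example (the exact URL-encoding needed depends on your client, so treat these as sketches only):

  q=guid:article\:123
  q=guid:"article:123"

Both the backslash escape and quoting the whole value as a phrase keep the query parser from treating the second colon as a field separator.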
Re: Searching dynamic fields
Hello Erik,

A field copy implies a doubling of the data in the index, right? Or should I not store or index the dynamic field and instead copy it to another field, and then let that one be indexed and stored? Another possibility would be to search all fields, but that doesn't seem to be possible. Or am I missing something?

Thanks, Brian.

Erik Hatcher wrote:
Brian - you can copyField all *_en fields to a common contents_en field, for example, and then search contents_en:(for whatever). You cannot currently search by field type, though that is an interesting possible feature. I would like to see Solr support wildcarded field names in request parameters, but we're not there yet. Erik

On Oct 15, 2007, at 9:32 AM, Brian Carmalt wrote:
Hello all, Is there a way to search dynamicFields without having to specify the name of the field in a query? Example: I have indexed a doc with the field name myDoc_text_en, and I have a dynamic field *_text_en which maps to a type of text_en. How can I search this field without knowing its specific name? Can I search according to field type? I have looked at the DisMaxRequestHandler, which might work, but it doesn't accept wildcard field names or field types. I'm using 1.3. Thanks in advance. Brian
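A schema.xml sketch of what Erik describes; the field and type names are only illustrative. Note that the copy target does not have to be stored, so the stored data is not doubled; only the indexed terms are:

  <dynamicField name="*_text_en" type="text_en" indexed="true" stored="true"/>
  <field name="contents_en" type="text_en" indexed="true" stored="false" multiValued="true"/>
  <copyField source="*_text_en" dest="contents_en"/>

Queries can then go against contents_en:(...) without knowing the concrete dynamic field names, while the original *_text_en fields remain available for display and for per-field boosting.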
Re: Indexing very large files.
Lance Norskog wrote:
Now I'm curious: what is the use case for documents this large? Thanks, Lance Norskog

It is an edge case, but it could become relevant for us. I was told to explore the possibilities, and that's what I'm doing. :) Since I haven't heard any suggestions as to how to do this with a stock Solr install, other than increasing VM memory, I'll assume it will have to be done with a custom solution. Thanks for the answers and the interest.

Brian
Re: Indexing very large files.
Yonik Seeley wrote:
On 9/5/07, Brian Carmalt [EMAIL PROTECTED] wrote:
I've been trying to index a 300MB file to Solr 1.2. I keep getting out-of-memory heap errors.

300MB of what... a single 300MB document? Or does that file represent multiple documents in XML or CSV format? -Yonik

Hello Yonik,

Thank you for your fast reply. It is one large document. If it were made up of smaller docs, I would split it up and index them separately. Can Solr be made to handle such large docs?

Thanks, Brian
Re: Indexing very large files.
Hello again,

I run Solr on Tomcat under Windows and use the Tomcat monitor to start the service. I have set the minimum heap size to 512MB and the maximum to 1024MB. The system has 2 gigs of RAM. The error that I get after sending approximately 300 MB is:

java.lang.OutOfMemoryError: Java heap space
at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2947)
at org.xmlpull.mxp1.MXParser.more(MXParser.java:3026)
at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1384)
at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
at org.xmlpull.mxp1.MXParser.nextText(MXParser.java:1058)
at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:332)
at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:162)
at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:230)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:261)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:581)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)

After sleeping on the problem I see that it does not stem directly from Solr, but from the module org.xmlpull.mxp1.MXParser. Hmmm. I'm open to suggestions and ideas. First, is this doable? If yes, will I have to modify the code to save the file to disk and then read it back in order to index it in chunks? Or can I get it working on a stock Solr install?

Thanks, Brian

Norberto Meijome wrote:
On Wed, 05 Sep 2007 17:18:09 +0200 Brian Carmalt [EMAIL PROTECTED] wrote:
I've been trying to index a 300MB file to Solr 1.2. I keep getting out-of-memory heap errors. Even on an empty index with one gig of VM memory it still won't work.

Hi Brian,
VM != heap memory. VM = OS memory; heap memory = memory made available by the Java VM to the Java process. Heap memory errors are hardly ever an issue of the app itself (other than, of course, with bad programming... but that doesn't seem to be the issue here so far).

[EMAIL PROTECTED] [Thu Sep 6 14:59:21 2007] /usr/home/betom $ java -X
[...]
-Xms<size>  set initial Java heap size
-Xmx<size>  set maximum Java heap size
-Xss<size>  set java thread stack size
[...]

For example, start Solr as: java -Xms64m -Xmx512m -jar start.jar
YMMV with respect to the actual values you use. Good luck, B

_ {Beto|Norberto|Numard} Meijome
Windows caters to everyone as though they are idiots. UNIX makes no such assumption.
It assumes you know what you are doing, and presents the challenge of figuring it out for yourself if you don't. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Indexing very large files.
Moin Thorsten,

I am using Solr 1.2.0. I'll try the svn version out and see if that helps.

Thanks, Brian

Which version of Solr do you use?
http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/handler/XmlUpdateRequestHandler.java?view=markup
The trunk version of the XmlUpdateRequestHandler is now based on StAX. You may want to try whether that works better. Please try and report back. salu2