Re: Provide value to uniqueID
You can specify the file name as the id by adding a TemplateTransformer on the entity x and specifying ${f.file} as the template value in the id field. For example:

  <dataSource type="FileDataSource" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="F:\Work\Lucene\Solr\Solr Arabic Book"
            fileName=".txt" recursive="true" rootEntity="false">
      <entity name="x" processor="LineEntityProcessor"
              url="${f.fileAbsolutePath}" transformer="TemplateTransformer">
        <field column="rawLine" name="category_name" />
        <field name="id" template="${f.file}" />
      </entity>
    </entity>
  </document>

On Mon, Jun 9, 2014 at 11:23 AM, ienjreny ismaeel.enjr...@gmail.com wrote:

Hello, I am using the following code to read text files:

  <dataSource type="FileDataSource" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="F:\Work\Lucene\Solr\Solr Arabic Book"
            fileName=".txt" recursive="true" rootEntity="false">
      <entity name="x" processor="LineEntityProcessor"
              url="${f.fileAbsolutePath}">
        <field column="rawLine" name="category_name" />
        <field column="???" name="id" />
      </entity>
    </entity>
  </document>

It is working perfectly except for the id value. How can I use the file name (or any other value) as the value for the uniqueID field?

--
Regards,
Shalin Shekhar Mangar.
Re: Documents Added Not Available After Commit (Both Soft and Hard)
I think this may be the same bug as LUCENE-5289, which was fixed in 4.5.1. Can you upgrade to 4.5.1 and see if that solves the problem?

On Fri, Jun 6, 2014 at 7:17 PM, Justin Sweeney justin.sweene...@gmail.com wrote:

Hi,

An application I am working on indexes documents to a Solr index. This Solr index is set up as a single node, without any replication. This index is running Solr 4.5.0.

We have noticed an issue lately that is causing some problems for our application. We add/update a number of documents in the Solr index, and we have the index set up to autoCommit (hard) once every 30 minutes. In the Solr logs, I am able to see the add command to Solr and I can also see Solr start the hard commit. When this hard commit occurs, we see the following message:

INFO - 2014-06-04 20:13:55.135; org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.

This only happens sometimes, but Solr will go hours (we have seen 6-12 hours of this behavior) before it does a hard commit where it finds changes. After the hard commit where the changes are found, we are then able to search for and find the documents that were added hours ago, but up until that point the documents are not searchable. We tried enabling autoSoftCommit every 5 minutes in the hope that this would help, but we are seeing the same behavior.

Here is a sampling of the logs showing this occurring (I've trimmed it down to just show what is happening):

INFO - 2014-06-05 20:00:41.300; org.apache.solr.update.processor.LogUpdateProcessor; [zoomCollection] webapp=/solr path=/update params={wt=javabin&version=2} {add=[359453225]} 0 0
INFO - 2014-06-05 20:00:41.376; org.apache.solr.update.processor.LogUpdateProcessor; [zoomCollection] webapp=/solr path=/update params={wt=javabin&version=2} {add=[347170717]} 0 1
INFO - 2014-06-05 20:00:51.527; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
INFO - 2014-06-05 20:00:51.533; org.apache.solr.search.SolrIndexSearcher; Opening Searcher@257c43d main
INFO - 2014-06-05 20:00:51.533; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
INFO - 2014-06-05 20:00:51.545; org.apache.solr.core.QuerySenderListener; QuerySenderListener sending requests to Searcher@257c43d main{StandardDirectoryReader(segments_acl:1367002775953 _2f28(4.5):C13583563/4081507 _2gl6(4.5):C2754573/193533 _2g21(4.5):C1046256/296354 _2ge2(4.5):C835858/206139 _2gqd(4.5):C383500/31051 _2gmu(4.5):C125197/32491 _2grl(4.5):C46906/1255 _2gpj(4.5):C66480/16562 _2gra(4.5):C364/22 _2gr1(4.5):C36064/2556 _2gqg(4.5):C42504/21515 _2gqm(4.5):C26821/12659 _2gqu(4.5):C24172/10240 _2gqy(4.5):C697/215 _2gr2(4.5):C878/352 _2gr7(4.5):C28135/11775 _2gr9(4.5):C3276/1341 _2grb(4.5):C5/1 _2grc(4.5):C3247/1219 _2grd(4.5):C6/1 _2grf(4.5):C5/2 _2grg(4.5):C23659/10967 _2grh(4.5):C1 _2grj(4.5):C1 _2grk(4.5):C5160/1482 _2grm(4.5):C1210/351 _2grn(4.5):C3957/1372 _2gro(4.5):C7734/2207 _2grp(4.5):C220/36)}
INFO - 2014-06-05 20:00:51.546; org.apache.solr.core.SolrCore; [zoomCollection] webapp=null path=null params={event=newSearcher&q=d_name:ibm&distrib=false} hits=38 status=0 QTime=0
INFO - 2014-06-05 20:00:51.546; org.apache.solr.core.QuerySenderListener; QuerySenderListener done.
INFO - 2014-06-05 20:00:51.547; org.apache.solr.core.SolrCore; [zoomCollection] Registered new searcher Searcher@257c43d main{StandardDirectoryReader(segments_acl:1367002775953 _2f28(4.5):C13583563/4081507 _2gl6(4.5):C2754573/193533 _2g21(4.5):C1046256/296354 _2ge2(4.5):C835858/206139 _2gqd(4.5):C383500/31051 _2gmu(4.5):C125197/32491 _2grl(4.5):C46906/1255 _2gpj(4.5):C66480/16562 _2gra(4.5):C364/22 _2gr1(4.5):C36064/2556 _2gqg(4.5):C42504/21515 _2gqm(4.5):C26821/12659 _2gqu(4.5):C24172/10240 _2gqy(4.5):C697/215 _2gr2(4.5):C878/352 _2gr7(4.5):C28135/11775 _2gr9(4.5):C3276/1341 _2grb(4.5):C5/1 _2grc(4.5):C3247/1219 _2grd(4.5):C6/1 _2grf(4.5):C5/2 _2grg(4.5):C23659/10967 _2grh(4.5):C1 _2grj(4.5):C1 _2grk(4.5):C5160/1482 _2grm(4.5):C1210/351 _2grn(4.5):C3957/1372 _2gro(4.5):C7734/2207 _2grp(4.5):C220/36)}
INFO - 2014-06-05 20:01:10.557; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
INFO - 2014-06-05 20:01:10.559; org.apache.solr.core.SolrCore; [zoomCollection] webapp=/solr path=/select params={fl=d_ticker,d_location,d_id,d_source_count,d_xml_domain,d_cik,d_keyword_count,d_xml_name,d_xml_contact,d_main_domain,d_location_code&start=0&q=d_domain:(www.northwestcollege.edu)&wt=javabin&version=2&rows=99} hits=4 status=0 QTime=40
INFO - 2014-06-05 20:01:10.563; org.apache.solr.search.SolrIndexSearcher; Opening Searcher@69f90ad1 main
INFO -
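For reference, the commit settings being discussed live in the update handler section of solrconfig.xml. A sketch of a 30-minute hard commit plus a 5-minute soft commit (values illustrative, not necessarily the poster's exact config):

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- hard commit: flush to stable storage every 30 minutes -->
    <autoCommit>
      <maxTime>1800000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <!-- soft commit: open a new searcher (documents become visible) every 5 minutes -->
    <autoSoftCommit>
      <maxTime>300000</maxTime>
    </autoSoftCommit>
  </updateHandler>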
Re: Provide value to uniqueID
Thanks, it is working fine, but I had to change the following line

  <field name="id" template="${f.file}" />

to

  <field column="id" template="${f.file}" />

On Mon, Jun 9, 2014 at 9:29 AM, Shalin Shekhar Mangar [via Lucene] ml-node+s472066n4140715...@n3.nabble.com wrote:

You can specify the file name as the id by adding a TemplateTransformer on the entity x and specifying ${f.file} as the template value in the id field. For example:

  <dataSource type="FileDataSource" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="F:\Work\Lucene\Solr\Solr Arabic Book"
            fileName=".txt" recursive="true" rootEntity="false">
      <entity name="x" processor="LineEntityProcessor"
              url="${f.fileAbsolutePath}" transformer="TemplateTransformer">
        <field column="rawLine" name="category_name" />
        <field name="id" template="${f.file}" />
      </entity>
    </entity>
  </document>

On Mon, Jun 9, 2014 at 11:23 AM, ienjreny [hidden email] wrote:

Hello, I am using the following code to read text files:

  <dataSource type="FileDataSource" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="F:\Work\Lucene\Solr\Solr Arabic Book"
            fileName=".txt" recursive="true" rootEntity="false">
      <entity name="x" processor="LineEntityProcessor"
              url="${f.fileAbsolutePath}">
        <field column="rawLine" name="category_name" />
        <field column="???" name="id" />
      </entity>
    </entity>
  </document>

It is working perfectly except for the id value. How can I use the file name (or any value) as the value for the uniqueID field?

--
Regards,
Shalin Shekhar Mangar.
Solr spellcheck - onlyMorePopular threshold?
Hello all, I was wondering: what does the onlyMorePopular option for spellchecking use as its threshold? Will it always pick the suggestion that returns the most queries, or does it base its result on some threshold that can be configured? Thanks! Ali.
Re: slow performance on simple filter
I'm really at a dead point. My index is 5.6GB with about 8 million documents. The field I'm using for the filter is as simple as it gets:

  <field name="class_name" type="string" indexed="true" stored="true" multiValued="false"/>

Can it be that other fields affect my search if I only do a filter query?

  solr/puls-objects-prod/select?q=*%3A*&fq=class_name:License

My results:

  <int name="QTime">831</int>
  <lst name="params">
    <str name="q">*:*</str>
    <str name="fq">class_name:License</str>
  </lst>
  </lst>
  <result name="response" numFound="8655108" start="0">
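For anyone tuning this: a repeated fq clause is served from the filterCache, so the second and later executions of the same filter should be nearly free if the cache is enabled in solrconfig.xml. A typical block (sizes illustrative):

  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="128"/>

The first execution still has to build a bitset over the ~8.6 million matching documents, which is likely where the 831 ms goes.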
writing logs of a specific solr posting to a file
Hi,

I am using SimplePostTool to post xml files to Solr, like:

  java -Durl=http://localhost:8080/solr/collection1/update -jar /var/lib/tomcat6/solr/collection1/dump/xmlinput/post.jar /var/lib/tomcat6/solr/collection1/dump/xmlinput/solr.xml

When there are certain errors, the response from the above command just shows the 404 error or 500 server error but doesn't provide the complete log details like in http://localhost:8080/solr/#/~logging or in catalina.out. I want to catch the exact log details that are thrown in the logs when the above command is executed and write them to a file. I am wondering if there are additional params that need to be passed on the command line or whether I have to work on the configuration.
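For reference: SimplePostTool only ever sees the HTTP status line, so the detail has to be captured on the server side. Assuming the log4j binding that Solr 4.3+ ships with, a log4j.properties on the server's classpath along these lines (paths illustrative) would route Solr's log output to its own file:

  log4j.rootLogger=INFO, file
  log4j.appender.file=org.apache.log4j.RollingFileAppender
  log4j.appender.file.File=/var/log/solr/solr.log
  log4j.appender.file.MaxFileSize=10MB
  log4j.appender.file.MaxBackupIndex=9
  log4j.appender.file.layout=org.apache.log4j.PatternLayout
  log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{1} - %m%n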
How Can I modify the DocList and DocSet in solr
I am using Solr 4.6 with sharding (distributed search). I have a situation where I'd like to modify the Solr search result (DocList and DocSet) inside Solr's QueryComponent, right after the following call in its process() method:

  searcher.search(result, cmd);

Can I modify the DocList and DocSet after the search inside QueryComponent and add them to the QueryResult? Also, can I make the DocList unsorted?
Re: SOLR Performance Benchmarking
To be of any help we'd need to know: what do your documents look like, what do your queries look like, and what are the specifications of your server? How much heap is dedicated to Solr, and how much free memory is available for the OS file cache?

You have to figure out the bottleneck. Is it CPU or RAM or disk? Maybe it's excessive garbage collection? Turn on GC logging and look at GC activity.

On Sun, Jun 8, 2014 at 11:39 PM, rashi gandhi gandhirash...@gmail.com wrote:

Hi, I am using SolrMeter for performance benchmarking. I am able to successfully test my Solr setup up to 1000 queries per minute while searching. But when I exceed this limit, say 1500 search queries per minute, I face "Server Refused Connection" in Solr. Currently, I have only one Solr server running on a 64-bit 4 GB RAM machine for testing. Please give me some pointers to optimize Solr so that it can handle a large number of requests (especially more than 1000 requests per minute). Is there any change that I can make in solrconfig.xml, or some other change, to support this? Thanks in advance

--
Regards,
Shalin Shekhar Mangar.
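To turn on the GC logging mentioned above, HotSpot flags along these lines can be added to the JVM running Solr (the log path is illustrative):

  java -Xloggc:/var/log/solr/gc.log \
       -XX:+PrintGCDetails \
       -XX:+PrintGCDateStamps \
       -XX:+PrintGCApplicationStoppedTime \
       -jar start.jar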
Large disjunction query practices
I'm wondering what the best practice for large disjunct queries in Solr is. A user wants to submit a query for several hundred thousand terms, like: (term1 OR term2 OR ... term500,000) I know it might be better to break this up into multiple queries that can be merged on the user's end, but I'm wondering if there's guidance for a good limit of OR'ed terms per query. 100 terms? 200? 500? Any idea what kinds of data set or memory limitations might govern this threshold? Thanks, Joe -- I know what it is to be in need, and I know what it is to have plenty. I have learned the secret of being content in any and every situation, whether well fed or hungry, whether living in plenty or in want. I can do all this through him who gives me strength.*-Philippians 4:12-13*
Re: Large disjunction query practices
Are they expecting relevancy ranking, or merely a bulk read of those documents? Please detail what the user is trying to accomplish with such a monster list of IDs.

Generally, queries of more than a few dozen terms are a bad idea, if for no other reason than that if you need to debug them or examine the results by hand, it will be a nightmare. OTOH, some people really love drama and just can't get enough of it.

The general guidance is to keep requests and responses relatively small. Keep network traffic down. Keep compute intensity down. Keep memory requirements down. Small is better.

-- Jack Krupansky

-----Original Message----- From: Joe Gresock Sent: Monday, June 9, 2014 8:50 AM To: solr-user@lucene.apache.org Subject: Large disjunction query practices

I'm wondering what the best practice for large disjunct queries in Solr is. A user wants to submit a query for several hundred thousand terms, like: (term1 OR term2 OR ... term500,000) I know it might be better to break this up into multiple queries that can be merged on the user's end, but I'm wondering if there's guidance for a good limit of OR'ed terms per query. 100 terms? 200? 500? Any idea what kinds of data set or memory limitations might govern this threshold? Thanks, Joe -- I know what it is to be in need, and I know what it is to have plenty. I have learned the secret of being content in any and every situation, whether well fed or hungry, whether living in plenty or in want. I can do all this through him who gives me strength.*-Philippians 4:12-13*
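One practical note if a large term list must be sent at all: an HTTP GET will hit URL-length limits long before anything else, so the query is usually POSTed as form data, e.g. with curl (collection name and terms are placeholders):

  curl http://localhost:8983/solr/collection1/select \
       --data-urlencode 'q=id:(term1 OR term2 OR term3)' \
       --data-urlencode 'rows=100' \
       --data-urlencode 'wt=json'

Also remember that Solr caps boolean queries at maxBooleanClauses (1024 by default in solrconfig.xml), which would have to be raised for anything approaching the numbers above.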
Re: How Can I modify the DocList and DocSet in solr
Can you make a custom Component? They are pluggable. Regards, Alex On 09/06/2014 6:24 pm, Vishnu Mishra vdil...@gmail.com wrote: I am using solr 4.6 and I am using solr Sharding (Distributed Search). I have situation where I like to modify the solr search result (DocList and DocSet) inside solr QueryComponent right after the following method is called from process() method. searcher.search(result, cmd); Can I modify the DocList and DocSet after the search inside QueryComponent and add it to the QueryResult. Also can I make the DocList unsorted. -- View this message in context: http://lucene.472066.n3.nabble.com/How-Can-I-modify-the-DocList-and-DocSet-in-solr-tp4140754.html Sent from the Solr - User mailing list archive at Nabble.com.
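A rough sketch of that idea for Solr 4.x, i.e. subclassing QueryComponent and rewriting the DocList after the search has run (the class name and the reordering step are invented for illustration, and this is untested):

  import java.io.IOException;

  import org.apache.solr.handler.component.QueryComponent;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.search.DocIterator;
  import org.apache.solr.search.DocList;
  import org.apache.solr.search.DocSlice;

  public class ReorderingQueryComponent extends QueryComponent {
    @Override
    public void process(ResponseBuilder rb) throws IOException {
      super.process(rb); // this is where searcher.search(result, cmd) happens
      DocList found = rb.getResults().docList;
      int[] docs = new int[found.size()];
      DocIterator it = found.iterator();
      int i = 0;
      while (it.hasNext()) {
        docs[i++] = it.nextDoc(); // collect internal Lucene doc ids
      }
      // ... reorder docs[] here however you like ...
      // scores are dropped (null), so the returned list is no longer sort-ordered
      rb.getResults().docList =
          new DocSlice(0, docs.length, docs, null, found.matches(), 0.0f);
    }
  }

It would then replace the stock component via <searchComponent name="query" class="com.example.ReorderingQueryComponent"/> in solrconfig.xml. In a sharded setup, note that per-shard results still go through the distributed merge in QueryComponent, which is the harder part to influence.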
Re: Customizing Solr; Where to draw the line?
I'd certainly go for the 2nd option. Depending on what you need, you may not need to modify Solr itself but can extend it using different plugins; you'll need to write different components depending on your specific requirements. I definitely recommend the talks by Trey Grainger, from CareerBuilder. I remember seeing in some of the talks that they have A/B testing built into Solr, and a lot of other "crazy" things, so it would be a good starting point, and it will give you a look at what you could accomplish by extending Solr. Of course you'll need to update your source between big releases of Solr, and perhaps between some minor ones, but this way you don't need to worry about the latency of, or maintaining, a new search layer between the client and Solr.

I hope it helps,

On Jun 8, 2014, at 10:38 PM, Phanindra R phani...@gmail.com wrote:

Hi, we have decided to migrate from Lucene 3.x to the latest Solr. A lot of architectural discussions are going on. There are two possible approaches. Please note that our customer-facing app (or any client) and Search are hosted on different machines.

*1) Have a clean architecture*
- Solr takes care of customized search only.
- We certainly have to override some filtering, scoring, etc.
- There will be an intermediary search-app that
  - receives queries
  - does a/b testing assignments, and other non-search stuff
  - does query expansion / rewriting (to avoid every Solr shard doing that)
  - transforms the query into Solr syntax and uses Solr's HTTP API to consume it
  - returns the response to the customer-facing app or whatever the client is

The problem with this approach is the additional layer and the latency between the search-app and Solr. The client of search has to make an API call, across the network, to the intermediary search-app, which in turn makes another HTTP API call to Solr.

*2) Customize Solr to the full extent*
- Do all the crazy stuff within Solr.
- We can literally create a new url and register a handler class to process that. With some limitations, we should be able to do almost anything.

The benefit of this approach is that it obviates the additional layer and the latency. However, I see a lot of long-term problems, like it being hard to upgrade Solr's version, and dev flexibility (usage of Spring, Hib, etc.). How about a distributed search? Where do the above approaches stand?

I understand that this is a subjective question. It'd be helpful if you could share your thoughts and experiences. Thanks.
Re: Solr Scale Toolkit Access Denied Error
Thanks, Tim. Worked like a charm. Appreciate your timely assistance.

On Sat, Jun 7, 2014 at 9:13 PM, Timothy Potter thelabd...@gmail.com wrote:

Hi Mark, sorry for the trouble! I've now made the ami-1e6b9d76 AMI public; total oversight on my part :-(. Please try again. Thanks Hoss for trying to help out on this one. Cheers, Tim

On Fri, Jun 6, 2014 at 6:46 PM, Mark Gershman montan...@gmail.com wrote:

Thanks, Hoss. I did substitute the previous AMI ID from the mid-May release of the toolkit and the build process does proceed further; however, it appears that the AMI changed enough that it is not compatible with the new toolkit release. In doing a little more research, I'm inclined to believe that the permissions on the AMI may be the source of the problem and will post to the issue tracker per your suggestion. Mark Gershman

On Fri, Jun 6, 2014 at 7:41 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: My guess is that the customized toolkit AMI (ami-1e6b9d76) at AWS is not
: accessible by my AWS credentials. Is this an AMI permissioning issue or is
: it a problem with my particular account or how it is configured at AWS? I
: did not experience this specific problem when working with the previous
: iteration of the Solr Scale Toolkit back toward the latter part of May. It
: appears that the AMI was updated from ami-96779efe to ami-1e6b9d76 with the
: newest version of the toolkit.

I'm not much of an AWS expert, but I seem to recall that if you don't have your AWS security group set up properly this type of error can happen? Is it possible that when you were trying out solr-scale-tk before you had this setup, but now you don't?

https://github.com/LucidWorks/solr-scale-tk

"You'll need to setup a security group named solr-scale-tk (or update the fabfile.py to change the name). At a minimum you should allow TCP traffic to ports: 8983, 8984-8989, SSH, and 2181 (ZooKeeper). However, it is your responsibility to review the security configuration of your cluster and lock it down appropriately. You'll also need to create a keypair (using the Amazon console) named solr-scale-tk (you can rename the key used by the framework, see: AWS_KEY_NAME). After downloading the keypair file (solr-scale-tk.pem), save it to ~/.ssh/ and change permissions: chmod 600 ~/.ssh/solr-scale-tk.pem"

...if I'm wrong, and there really is a problem with the security on the AMI, the best place to report that would be in the project's issue tracker:

https://github.com/LucidWorks/solr-scale-tk/issues

-Hoss
http://www.lucidworks.com/
RE: Solr spellcheck - onlyMorePopular threshold?
I believe it will return the terms that are most similar to the queried terms but have a greater term frequency than the queried terms. It doesn't actually care what the term frequencies are, only that they are greater than the frequencies of the terms you queried on.

I do not know your use case, but you may want to consider using spellcheck.alternativeTermCount instead of onlyMorePopular. See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount and https://issues.apache.org/jira/browse/SOLR-2585?focusedCommentId=13096153&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13096153 for why.

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message----- From: Alistair [mailto:ali...@gmail.com] Sent: Monday, June 09, 2014 3:06 AM To: solr-user@lucene.apache.org Subject: Solr spellcheck - onlyMorePopular threshold?

Hello all, I was wondering: what does the onlyMorePopular option for spellchecking use as its threshold? Will it always pick the suggestion that returns the most queries, or does it base its result on some threshold that can be configured? Thanks! Ali.
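In request form, the alternative James describes would look something like this (parameter values are illustrative):

  /select?q=name:delll&spellcheck=true&spellcheck.alternativeTermCount=5&spellcheck.maxResultsForSuggest=5&spellcheck.collate=true

Unlike onlyMorePopular, alternativeTermCount also returns suggestions for terms that do exist in the index, with spellcheck.maxResultsForSuggest controlling when "did you mean" suggestions kick in.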
Collection communication internally
Hi All,

I was curious to know whether communication between multiple collections can be achieved, and if yes, by what means. The use case: having multiple collections, I need to query the first collection and use the unique ids from the first collection to query the second one (a foreign-key relation). Now, if the number of terms to be passed to the second collection is relatively small then it's fine; otherwise a problem arises, as adding them to the query is time consuming in terms of building the query, querying Solr, and waiting for the result to come back. The queries would look something like:

  http://localhost:7070/solr/mycollection/select?q=*:*&fl=id&sort=id_S desc
  http://localhost:7070/solr/recollection/select?q=ID:(1 OR 2 OR ... OR 10)&fl=*

So, for the above form of query, where the query terms can expand enormously, I was looking for some solution where the collections can internally resolve the query and fetch the resultant output.

Thanks!
Re: Any way to view lucene files
No, Anyways thanks Alex, but where is the luke jar? With Regards Aman Tandon On Mon, Jun 9, 2014 at 6:54 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Have you looked at: https://github.com/DmitryKey/luke Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Mon, Jun 9, 2014 at 8:12 AM, Aman Tandon amantandon...@gmail.com wrote: I guess this is not available now. I am trying to download from the google, please take a look https://code.google.com/p/luke/downloads/list If you have any link please share With Regards Aman Tandon On Sat, Jun 7, 2014 at 10:32 PM, Summer Shire shiresum...@gmail.com wrote: Did u try luke 47 On Jun 6, 2014, at 11:59 PM, Aman Tandon amantandon...@gmail.com wrote: I also tried with solr 4.2 and with luke version Luke 4.0.0-ALPHA but got this error: java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene42' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.The current classpath supports the following names: [Lucene40, Lucene3x, SimpleText, Appending] With Regards Aman Tandon On Sat, Jun 7, 2014 at 12:22 PM, Aman Tandon amantandon...@gmail.com wrote: My solr version is 4.8.1 and luke is 3.5 With Regards Aman Tandon On Sat, Jun 7, 2014 at 12:21 PM, Chris Collins ch...@geekychris.com wrote: What version of Solr / Lucene are you using? You have to match the Luke version to the same version of Lucene. C On Jun 6, 2014, at 11:42 PM, Aman Tandon amantandon...@gmail.com wrote: Yes tried, but it not working at all every time i choose my index directory it shows me EOF past With Regards Aman Tandon On Sat, Jun 7, 2014 at 12:01 PM, Chris Collins ch...@geekychris.com wrote: Have you tried: https://code.google.com/p/luke/ Best Chris On Jun 6, 2014, at 11:24 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, Is there any way so that i can view what information and which is there in my _e.fnm, etc files. may be with the help of any application or any viewer tool. With Regards Aman Tandon
Re: ANN: Solr Next
On Tue, Jan 7, 2014 at 1:53 PM, Yonik Seeley ysee...@gmail.com wrote: [...] Next major feature: Native Code Optimizations. In addition to moving more large data structures off-heap(like UnInvertedField?), I am planning to implement native code optimizations for certain hotspots. Native code faceting would be an obvious first choice since it can often be a CPU bottleneck. It's in! Abbreviated report: 2x performance increase over stock solr faceting (which is already fast!) http://heliosearch.org/native-code-faceting/ -Yonik http://heliosearch.org -- making solr shine Project resources: https://github.com/Heliosearch/heliosearch https://groups.google.com/forum/#!forum/heliosearch https://groups.google.com/forum/#!forum/heliosearch-dev Freenode IRC: #heliosearch #heliosearch-dev -Yonik
Re: Any way to view lucene files
Just click the 'Releases' link: https://github.com/DmitryKey/luke/releases François On Jun 9, 2014, at 10:43 AM, Aman Tandon amantandon...@gmail.com wrote: No, Anyways thanks Alex, but where is the luke jar? With Regards Aman Tandon On Mon, Jun 9, 2014 at 6:54 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Have you looked at: https://github.com/DmitryKey/luke Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Mon, Jun 9, 2014 at 8:12 AM, Aman Tandon amantandon...@gmail.com wrote: I guess this is not available now. I am trying to download from the google, please take a look https://code.google.com/p/luke/downloads/list If you have any link please share With Regards Aman Tandon On Sat, Jun 7, 2014 at 10:32 PM, Summer Shire shiresum...@gmail.com wrote: Did u try luke 47 On Jun 6, 2014, at 11:59 PM, Aman Tandon amantandon...@gmail.com wrote: I also tried with solr 4.2 and with luke version Luke 4.0.0-ALPHA but got this error: java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene42' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.The current classpath supports the following names: [Lucene40, Lucene3x, SimpleText, Appending] With Regards Aman Tandon On Sat, Jun 7, 2014 at 12:22 PM, Aman Tandon amantandon...@gmail.com wrote: My solr version is 4.8.1 and luke is 3.5 With Regards Aman Tandon On Sat, Jun 7, 2014 at 12:21 PM, Chris Collins ch...@geekychris.com wrote: What version of Solr / Lucene are you using? You have to match the Luke version to the same version of Lucene. C On Jun 6, 2014, at 11:42 PM, Aman Tandon amantandon...@gmail.com wrote: Yes tried, but it not working at all every time i choose my index directory it shows me EOF past With Regards Aman Tandon On Sat, Jun 7, 2014 at 12:01 PM, Chris Collins ch...@geekychris.com wrote: Have you tried: https://code.google.com/p/luke/ Best Chris On Jun 6, 2014, at 11:24 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, Is there any way so that i can view what information and which is there in my _e.fnm, etc files. may be with the help of any application or any viewer tool. With Regards Aman Tandon
Re: Setup a Solr Cloud on a set of powerful machines
Well, you've omitted information about the most precious resource for Solr, memory. That said, this question is impossible to answer in the abstract, see: http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Best, Erick On Sun, Jun 8, 2014 at 3:17 PM, shushuai zhu ss...@yahoo.com.invalid wrote: Hi, I would like to get some advice to setup a Solr Cloud on a set of powerful machines. The average size of the documents handled by the Solr Cloud is about 0.5 KB, and the number of documents stored in Solr Cloud could reach billions. When indexing, the incoming document rate could be as high as 20k/second; and the major query operations performed on the Cloud are searching, faceting, and some other aggregations. There will NOT be many concurrent queries (replication factor of 2 may be good enough), but some queries could cover big range of documents. As an example, I have 8 powerful machines (nodes), and each machine (node) has: 16 CPU cores 256GB RAM 48TB physical disk space The Solr Cloud may be setup in following different ways (assuming replication factor is 2): 1) 8 shards on 8 Solr servers, total 16 cores (including replicas) Each machine (node) holds one Solr server (JVM), and each Solr server has one shard. 2) 32 shards on 8 Solr servers, total 64 cores (including replicas) Each machine (node) holds one Solr server (JVM), and each Solr server has 4 shards. 3) 32 shards on 16 Solr servers, total 64 cores (including replicas) Each machine (node) holds 2 Solr servers (JVMs), and each Solr server has 2 shards. 4) 64 shards on 16 Solr servers, total 128 cores (including replicas) Each machine (node) holds 2 Solr servers (JVMs), and each Solr server has 4 shards. 5) 128 shards on 32 Solr servers, total 256 cores (including replicas) Each machine (node) holds 4 Solr servers (JVMs), and each Solr server has 4 shards. Could someone advice which layout is better? Or you have some other better layout? The basic idea is to divide a powerful machine to have more Solr Servers and/or more shards. I would like to get some advice about the trade-offs and general guidelines about the division. It would be very helpful if you can advice an example setup for this use case. Thanks a lot. Shushuai
Re: Any way to view lucene files
Yeah just got it thanks Fracois :) With Regards Aman Tandon On Mon, Jun 9, 2014 at 8:20 PM, François Schiettecatte fschietteca...@gmail.com wrote: Just click the 'Releases' link: https://github.com/DmitryKey/luke/releases François On Jun 9, 2014, at 10:43 AM, Aman Tandon amantandon...@gmail.com wrote: No, Anyways thanks Alex, but where is the luke jar? With Regards Aman Tandon On Mon, Jun 9, 2014 at 6:54 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Have you looked at: https://github.com/DmitryKey/luke Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Mon, Jun 9, 2014 at 8:12 AM, Aman Tandon amantandon...@gmail.com wrote: I guess this is not available now. I am trying to download from the google, please take a look https://code.google.com/p/luke/downloads/list If you have any link please share With Regards Aman Tandon On Sat, Jun 7, 2014 at 10:32 PM, Summer Shire shiresum...@gmail.com wrote: Did u try luke 47 On Jun 6, 2014, at 11:59 PM, Aman Tandon amantandon...@gmail.com wrote: I also tried with solr 4.2 and with luke version Luke 4.0.0-ALPHA but got this error: java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene42' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.The current classpath supports the following names: [Lucene40, Lucene3x, SimpleText, Appending] With Regards Aman Tandon On Sat, Jun 7, 2014 at 12:22 PM, Aman Tandon amantandon...@gmail.com wrote: My solr version is 4.8.1 and luke is 3.5 With Regards Aman Tandon On Sat, Jun 7, 2014 at 12:21 PM, Chris Collins ch...@geekychris.com wrote: What version of Solr / Lucene are you using? You have to match the Luke version to the same version of Lucene. C On Jun 6, 2014, at 11:42 PM, Aman Tandon amantandon...@gmail.com wrote: Yes tried, but it not working at all every time i choose my index directory it shows me EOF past With Regards Aman Tandon On Sat, Jun 7, 2014 at 12:01 PM, Chris Collins ch...@geekychris.com wrote: Have you tried: https://code.google.com/p/luke/ Best Chris On Jun 6, 2014, at 11:24 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, Is there any way so that i can view what information and which is there in my _e.fnm, etc files. may be with the help of any application or any viewer tool. With Regards Aman Tandon
Re: Deeply nested structure
Thanks
Re: Deeply nested structure
thanks
Re: Collection communication internally
My first answer is: don't do it that way :). Solr works best with flattened (de-normalized) data. If at all possible, you _really_ would be better off combining the two collections and flattening the data, even though there would be more data. Whenever I see a question like this, I wonder if you're trying to use Solr like a DB, in this case with collections substituting for tables, and this is almost always a mistake.

If you really must do this, consider cross-core joins if at all possible, but I don't think this is supported yet for distributed setups.

Best,
Erick

On Mon, Jun 9, 2014 at 7:32 AM, Vineet Mishra clearmido...@gmail.com wrote:

Hi All, I was curious to know whether communication between multiple collections can be achieved, and if yes, by what means. The use case: having multiple collections, I need to query the first collection and use the unique ids from the first collection to query the second one (a foreign-key relation). Now, if the number of terms to be passed to the second collection is relatively small then it's fine; otherwise a problem arises, as adding them to the query is time consuming in terms of building the query, querying Solr, and waiting for the result to come back. The queries would look something like:

  http://localhost:7070/solr/mycollection/select?q=*:*&fl=id&sort=id_S desc
  http://localhost:7070/solr/recollection/select?q=ID:(1 OR 2 OR ... OR 10)&fl=*

So, for the above form of query, where the query terms can expand enormously, I was looking for some solution where the collections can internally resolve the query and fetch the resultant output.

Thanks!
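For completeness, the cross-core join Erick mentions takes this shape (core and field names here are invented):

  http://localhost:7070/solr/mycollection/select?q={!join from=id to=parent_id fromIndex=recollection}name:foo

The fromIndex core must live on the same Solr instance as the core being queried, which is why this approach does not work across a distributed SolrCloud setup.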
How to use group and facet?
Dear Solr experts,

I have 2 problems and need your help.

1) I have to group a list with group.limit=1&group.main=true&group.sort=Date desc (many groups, and each group has 1 element, the newest). Then, from the list of groups (each with 1 element), I want to filter out the items (in groups) that do not match a condition. Could you tell me the way to do this with 1 query statement?

2) How could I facet and show all records of each facet with 1 statement?

Thank you.
Hai
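For (1), a sketch of the usual approach: put the match condition in a filter query alongside the grouping parameters (field names here are invented):

  /select?q=*:*&fq=status:active&group=true&group.field=category&group.limit=1&group.main=true&group.sort=Date desc

Note that the fq removes non-matching documents before grouping; if the intent is to first pick the newest document per group and only then filter those winners, that cannot be expressed in a single stock query. For (2), adding facet=true&facet.field=... to an ordinary query returns facet counts and documents in the same response, but a facet bucket itself never carries its records; retrieving all records per bucket takes one filtered query per bucket (or grouping on the facet field).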
Re: Setup a Solr Cloud on a set of powerful machines
On 6/8/2014 4:17 PM, shushuai zhu wrote: I would like to get some advice to setup a Solr Cloud on a set of powerful machines. The average size of the documents handled by the Solr Cloud is about 0.5 KB, and the number of documents stored in Solr Cloud could reach billions. When indexing, the incoming document rate could be as high as 20k/second; and the major query operations performed on the Cloud are searching, faceting, and some other aggregations. There will NOT be many concurrent queries (replication factor of 2 may be good enough), but some queries could cover big range of documents. As an example, I have 8 powerful machines (nodes), and each machine (node) has: 16 CPU cores 256GB RAM 48TB physical disk space The Solr Cloud may be setup in following different ways (assuming replication factor is 2): 1) 8 shards on 8 Solr servers, total 16 cores (including replicas) Each machine (node) holds one Solr server (JVM), and each Solr server has one shard. 2) 32 shards on 8 Solr servers, total 64 cores (including replicas) Each machine (node) holds one Solr server (JVM), and each Solr server has 4 shards. 3) 32 shards on 16 Solr servers, total 64 cores (including replicas) Each machine (node) holds 2 Solr servers (JVMs), and each Solr server has 2 shards. 4) 64 shards on 16 Solr servers, total 128 cores (including replicas) Each machine (node) holds 2 Solr servers (JVMs), and each Solr server has 4 shards. 5) 128 shards on 32 Solr servers, total 256 cores (including replicas) Each machine (node) holds 4 Solr servers (JVMs), and each Solr server has 4 shards. Erick's note is very important. From the information given, we can't even guess about the size of your index. Even if we had that information, there are too many variables to give you any real recommendations. Also mentioned by Erick: RAM is the single greatest factor affecting Solr performance. If you have enough OS disk cache to fit your index entirely in RAM, performance is likely to be excellent. With 256GB of RAM on eight servers, you're going to have about 2TB of RAM, some of which will be used for Solr itself. If both copies of your index take up 2TB or less in disk space, you're probably going to be OK there. You'd probably be OK up to about 3TB of total index. The 48TB of disk space is probably serious overkill. I would assume this is twelve 4TB drives. It would be better for performance (without losing redundancy) to use RAID10 with a stripe size of at least 1MB for the storage instead of any other RAID level. It eats up half your raw space for redundancy, but the performance is *excellent*. The fact that your query volume will be low does give me the ability to tell you one thing: With 16 CPU cores per machine and a low query volume, you'll be able to handle a lot more Solr cores per machine. The extra CPU cores can spend their time reading from Solr cores and speeding up each individual query without worrying about being crushed under hundreds of queries per second. For a perfect match of CPU cores to Solr cores, you'd do option number 4, so each machine would get 16 Solr cores ... but I think option number 3 might be better, so you have more CPUs than indexes per machine. This gives you a safe capacity of about 32 billion documents, with a maximum total capacity of well over 64 billion documents. Thanks, Shawn
Re: SOLR Performance Benchmarking
On 6/8/2014 12:09 PM, rashi gandhi wrote: I am using SolrMeter for performance benchmarking. I am able to successfully test my solr setup up to 1000 queries per min while searching. But when I am exceeding this limit say 1500 search queries per min, facing Server Refused Connection in SOLR. Currently, I have only one solr server running on 64-bit 4 GB ram machine for testing. Please provide me some pointers , to optimize SOLR so that it can handle large number of request. (Specially more than 1000 request per min). Is there any change that I can do in solrconfig.xml or some other change to support this? This sounds like your servlet container is configured to limit the number of threads that can be started. I would bet that you are using a packaged Tomcat or Jetty install rather than the Jetty included in the Solr example, and that it has maxThreads set to the default value of 200. Solr tends to start a lot of threads internally simply for normal operation. If your servlet container is set to limit the number of total threads to 200 (with a default queue of 100 connections beyond the 200 threads) and you reach the limit because your connection rate is high, then new connections will be refused. The Solr example has its servlet container configured to allow ten thousand threads, so it almost never has this problem. You'll need to find the documentation for your servlet container and look there for information on how to increase maxThreads. For Tomcat 7, the Introduction in the HTTP Connector part of the documentation mentions the problem: http://tomcat.apache.org/tomcat-7.0-doc/config/http.html#Introduction You may also need to increase the amount of RAM in the server (or change the configuration to reduce heap requirements) to avoid performance problems that cause each individual query to be slow: http://wiki.apache.org/solr/SolrPerformanceProblems Thanks, Shawn
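For Tomcat, the setting lives on the Connector element in conf/server.xml; a sketch raising it well past the default (the value here is illustrative, matching the Solr example's Jetty configuration):

  <Connector port="8080" protocol="HTTP/1.1"
             connectionTimeout="20000"
             maxThreads="10000"
             acceptCount="100" />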
RE: COMMERCIAL: RE: SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response
Hi Chris,

Created ticket https://issues.apache.org/jira/browse/SOLR-6154. I included in the ticket the data.xml and a PDF with instructions on how to replicate. Sending different updates to different ports was just how the confluence tutorial laid out the steps; it does not affect the result of the test.

As soon as I have more information I will post it to the ticket. Appreciate the interest; let me know about any suggestion or feedback.

Thank you
Ronald Matamoros

-----Original Message----- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: 06 June 2014 22:00 To: solr-user@lucene.apache.org Subject: COMMERCIAL: RE: SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response

Ronald: I'm having a little trouble understanding the steps to reproduce that you are describing -- in particular Step 1 f ii, because I'm not really sure I understand what exactly you are putting in mem2.xml.

Also: since you don't appear to be using implicit routing, I'm not clear on why you are explicitly sending different updates to different ports in Step 1 f i -- does that affect the results of your test?

If you can reliably reproduce using modified data from the example, could you please open a Jira outlining these steps and attach the modified data to index directly to that issue? (FWIW: if it doesn't matter what port you use to send which documents, then you should be able to create a single unified data.xml file containing all the docs to index in a single command.)

: Date: Thu, 29 May 2014 18:06:38 +0000
: From: Ronald Matamoros rmatamo...@searchtechnologies.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org solr-user@lucene.apache.org
: Subject: RE: SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response
:
: Hi all,
:
: At the moment I am reviewing the code to determine if this is a legitimate bug that needs to be filed as a JIRA ticket.
: Any insight or recommendation is appreciated.
:
: Including the replication steps as text:
:
: -
: Solr versions where the issue was replicated:
: * 4.5.1 (Linux)
: * 4.8.1 (Windows + Cygwin)
:
: Replicating
:
: 1. Created a two-shard environment - no replication
:    https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud
:
:    a. Download the Solr distribution from http://lucene.apache.org/solr/downloads.html
:    b. Unzipped solr-4.8.1.zip to a temporary location: SOLR_DIST_HOME
:    c. Ran once so the SolrCloud jars get unpacked: java -jar start.jar
:    d. Create nodes
:       i.   cd SOLR_DIST_HOME
:       ii.  Via Windows Explorer copied example to node1
:       iii. Via Windows Explorer copied example to node2
:
:    e. Start nodes
:       i. Start node 1
:
:          cd node1
:          java -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -jar start.jar
:
:       ii. Start node 2
:
:          cd node2
:          java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
:
:    f. Fed sample documents
:       i. Out of the box:
:
:          curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" -d @mem.xml
:          curl http://localhost:7574/solr/update?commit=true -H "Content-Type: text/xml" -d @monitor2.xml
:
:       ii. Created a copy of mem.xml as mem2.xml; modified identifiers, names, prices and fed:
:
:          curl http://localhost:8983/solr/update?commit=true -H "Content-Type: text/xml" -d @mem2.xml
:
:          <add>
:            <doc>
:              <field name="id">COMPANY1</field>
:              <field name="name">COMPANY1 Device</field>
:              <field name="manu">COMPANY1 Device Mfg</field>
:              ...
:              <field name="price">190</field>
:              ...
:            </doc>
:            <doc>
:              <field name="id">COMPANY2</field>
:              <field name="name">COMPANY2 flatscreen</field>
:              <field name="manu">COMPANY2 Device Mfg.</field>
:              ...
:              <field name="price">200.00</field>
:              ...
:            </doc>
:            <doc>
:              <field name="id">COMPANY3</field>
:              <field name="name">COMPANY3 Laptop</field>
:              <field name="manu">COMPANY3 Device Mfg.</field>
:              ...
:              <field name="price">800.00</field>
:              ...
:            </doc>
:          </add>
:
: 2. Query **without** f.price.facet.mincount=1, counts and buckets are OK:
:
accessing individual elements of a multivalued field
hi,

  prod: p, cat: catA, catB, catC
  prod: q, cat: catB, catC, catD

My schema consists of documents with uid 'prod'; they can belong to multiple categories, called 'cat', which are represented as a multivalued field. For a particular kind of query I need to access individual elements separately, as in: return prod where (cat_1 == catA) or (cat_2 == catB). Is there a way by which I can do that?

Thanks in advance
solr4 optimization
Hi, We have a SolrCloud cluster (5 shards and 2 replicas) on 10 boxes. On some of the boxes we have about 5 million deleted docs, and we have never run an optimize since the beginning. Does the number of deleted docs have anything to do with query performance? Should we consider optimizing at all if we're not worried about disk space? Thanks!
SolrCloud collection create / delete failure
Hey guys, I'm trying to re-create collection foo in SolrCloud (a collection that failed to create once due to a badly formatted schema). I try the following:

  createCollection foo - could not create a new core solr/foo_shard1_replica1 as another core is already defined there
  deleteCollection foo - could not find collection foo
  unload core foo_shard1_replica1 and delete data dir - no such core exists 'foo_shard1_replica1'

The directory 'foo_shard1_replica1' exists in my /solr directory. How can I recover from this state without manually deleting the directory and/or wiping out my ZK?

Thanks, John
Re: solr4 optimization
Hi, I don't remember the last time I ran optimize. Sure, yes, things will work faster if you optimize an index and reduce the number of segments, but if you are regularly writing to that index and performance is OK, leave it to Lucene segment merges to purge deletes. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ On Mon, Jun 9, 2014 at 4:15 PM, Joshi, Shital shital.jo...@gs.com wrote: Hi, We have a SolrCloud cluster (5 shards and 2 replicas) on 10 boxes. On some of the boxes we have about 5 million deleted docs, and we have never run an optimize since the beginning. Does the number of deleted docs have anything to do with query performance? Should we consider optimizing at all if we're not worried about disk space? Thanks!
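If the deleted-document ratio ever becomes a concern, a middle ground between doing nothing and a full optimize is an occasional commit with expungeDeletes, which rewrites only segments carrying deletions, e.g.:

  curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' -d '<commit expungeDeletes="true"/>'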
Re: Setup a Solr Cloud on a set of powerful machines
"the incoming document rate could be as high as 20k/second..." That sounds like a lot of CPU-hungry indexing work. Given the 128 CPU cores available, from an indexing-speed perspective: would you recommend creating a similar number of Solr cores, or does Solr do just as well with a small number of Solr cores, each backed by several CPU cores, since indexing is multi-threaded? On Mon, Jun 9, 2014 at 7:19 PM, Shawn Heisey s...@elyograg.org wrote: On 6/8/2014 4:17 PM, shushuai zhu wrote: I would like to get some advice to setup a Solr Cloud on a set of powerful machines. The average size of the documents handled by the Solr Cloud is about 0.5 KB, and the number of documents stored in Solr Cloud could reach billions. When indexing, the incoming document rate could be as high as 20k/second; and the major query operations performed on the Cloud are searching, faceting, and some other aggregations. There will NOT be many concurrent queries (replication factor of 2 may be good enough), but some queries could cover big range of documents. As an example, I have 8 powerful machines (nodes), and each machine (node) has: 16 CPU cores 256GB RAM 48TB physical disk space The Solr Cloud may be setup in following different ways (assuming replication factor is 2): 1) 8 shards on 8 Solr servers, total 16 cores (including replicas) Each machine (node) holds one Solr server (JVM), and each Solr server has one shard. 2) 32 shards on 8 Solr servers, total 64 cores (including replicas) Each machine (node) holds one Solr server (JVM), and each Solr server has 4 shards. 3) 32 shards on 16 Solr servers, total 64 cores (including replicas) Each machine (node) holds 2 Solr servers (JVMs), and each Solr server has 2 shards. 4) 64 shards on 16 Solr servers, total 128 cores (including replicas) Each machine (node) holds 2 Solr servers (JVMs), and each Solr server has 4 shards. 5) 128 shards on 32 Solr servers, total 256 cores (including replicas) Each machine (node) holds 4 Solr servers (JVMs), and each Solr server has 4 shards. Erick's note is very important. From the information given, we can't even guess about the size of your index. Even if we had that information, there are too many variables to give you any real recommendations. Also mentioned by Erick: RAM is the single greatest factor affecting Solr performance. If you have enough OS disk cache to fit your index entirely in RAM, performance is likely to be excellent. With 256GB of RAM on eight servers, you're going to have about 2TB of RAM, some of which will be used for Solr itself. If both copies of your index take up 2TB or less in disk space, you're probably going to be OK there. You'd probably be OK up to about 3TB of total index. The 48TB of disk space is probably serious overkill. I would assume this is twelve 4TB drives. It would be better for performance (without losing redundancy) to use RAID10 with a stripe size of at least 1MB for the storage instead of any other RAID level. It eats up half your raw space for redundancy, but the performance is *excellent*. The fact that your query volume will be low does give me the ability to tell you one thing: With 16 CPU cores per machine and a low query volume, you'll be able to handle a lot more Solr cores per machine. The extra CPU cores can spend their time reading from Solr cores and speeding up each individual query without worrying about being crushed under hundreds of queries per second.
For a perfect match of CPU cores to Solr cores, you'd do option number 4, so each machine would get 16 Solr cores ... but I think option number 3 might be better, so you have more CPUs than indexes per machine. This gives you a safe capacity of about 32 billion documents, with a maximum total capacity of well over 64 billion documents. Thanks, Shawn
Re: writing logs of a specific solr posting to a file
Check out the patch on the issue below. We hit the same issue and posted a patch, none of the committers have picked it up yet, but would be good to get some feedback on it and get this into the next dot release. If it works for you, please vote it up. https://issues.apache.org/jira/browse/SOLR-5940 Thanks, -- *Sameer Maggon* Founder | Measured Search http://measuredsearch.com On Mon, Jun 9, 2014 at 3:48 AM, pshahukhal pshahuk...@gmail.com wrote: Hi I am using SimplepostTool to post the xml files to SOLR llke : java -Durl=http://localhost:8080/solr/collection1/update -jar /var/lib/tomcat6/solr/collection1/dump/xmlinput/post.jar /var/lib/tomcat6/solr/collection1/dump/xmlinput/solr.xml When there are certain errors ,the response from above command just shows the 404 error or 500 server error but doesnt provide the complete log details like in http://localhost:8080/solr/#/~logging or in catalina.out I want to catch the exact log details that are thrown in the logs when the above command is executed and write to a file .I am wondering if there are additional params that need to be passed in the command line or I have to work in the configurations . -- View this message in context: http://lucene.472066.n3.nabble.com/writing-logs-of-a-speicific-solr-posting-to-a-file-tp4140730.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: accessing individual elements of a multivalued field
Not currently. You could have separate explicit fields for the categories such as cat_1, cat_2, etc. The data would need to be replicated (possibly using a copyField), but redundancy to facilitate access is a reasonable approach. -- Jack Krupansky -Original Message- From: kritarth.anand Sent: Monday, June 9, 2014 2:48 PM To: solr-user@lucene.apache.org Subject: accessing individual elements of a multivalued field hi, prod: p cat : catA,catB,catC prod :q cat : catB, catC,catD My schema consists of documents with uid : 'prod's and then they belong can to multiple categories called 'cat' and which are represented as a multivalued field. For a particular kind of query I need to access individual elements separately as in return prod where (cat_1 == catA) or (cat_2==catB). is there a way by which i can do that? thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/accessing-individual-elements-of-a-multivalued-field-tp4140862.html Sent from the Solr - User mailing list archive at Nabble.com.
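A sketch of what that could look like in schema.xml (note that a stock copyField copies every value of a multivalued source rather than a single position, so in practice the split into cat_1, cat_2, ... usually has to happen in the indexing client or in an update processor):

  <field name="cat"   type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="cat_1" type="string" indexed="true" stored="true"/>
  <field name="cat_2" type="string" indexed="true" stored="true"/>

With the data laid out this way, the original question becomes an ordinary query: q=cat_1:catA OR cat_2:catB.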
Re: accessing individual elements of a multivalued field
Thanks for the response Jack
How to simplify my query for appropriate scoring
hi all,

I need help simplifying my query. The doc structure is as follows:

  id A, cat: p, q, r
  id B, cat: m, n, o
  id C, cat: l, b, o

Given this structure, my job is to find documents whose cat ids belong to a list, e.g. the list [p, n, o], ranked in that order. Right now this is achieved with an OR of multiple queries:

  Query 1: q = cat:p                           -> doc A
  Query 2: q = cat:n AND !(cat:p)              -> doc B
  Query 3: q = cat:o AND !(cat:p) AND !(cat:n) -> doc C

  final query = query1^3 OR query2^2 OR query3^1

This is to ensure the ranking is A, B, C. The query is pretty complicated and gets very long too, so I would like to form a shorter version of it if possible. There are just two constraints:

a. The highest preference is given to a doc with cat:p, even if some other doc matches all the other terms, so A should rank higher than B (even though B matches both n and o).

b. Also, if two docs both match the first cat, p, they should have equal scores irrespective of the rest of the values of cat. For example, consider an additional document D:

  id D, cat: p, n, o

Now D and A both match the first cat, p, and therefore the fact that D also matches n and o should not matter; both A and D should have the same score.

Please let me know if there is a simple way of doing it.
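If upgrading ever becomes an option: Solr 5.1 and later have a constant-score operator (^=) that pins a clause's score regardless of how many terms inside it match, which is exactly constraint (b). The exclusions are still needed so that a p-document never accumulates extra score from n or o, but the three separate queries collapse into one (a sketch, untested):

  q=(cat:p)^=3 OR (cat:n -cat:p)^=2 OR (cat:o -cat:p -cat:n)^=1

With this, A and D both score exactly 3, B scores 2, and C scores 1.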
Re: Integrate solr with openNLP
Hi Aman, yeah, we are also thinking the same: using UIMA is better. And thanks to everyone, you guys really showed us the way (UIMA). We'll work on it.

Thanks, Vivek

On Fri, Jun 6, 2014 at 5:54 PM, Aman Tandon amantandon...@gmail.com wrote:

Hi Vivek, as everybody on the mailing list mentioned, you should go for UIMA: the OpenNLP issues are not being tracked properly, which could leave your development stuck in the near future if any issue comes up, so it's better to start investigating UIMA. With Regards, Aman Tandon

On Fri, Jun 6, 2014 at 11:00 AM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Can anyone please reply..?

Thanks, Vivek

---------- Forwarded message ---------- From: Vivekanand Ittigi vi...@biginfolabs.com Date: Wed, Jun 4, 2014 at 4:38 PM Subject: Re: Integrate solr with openNLP To: Tommaso Teofili tommaso.teof...@gmail.com Cc: solr-user@lucene.apache.org, Ahmet Arslan iori...@yahoo.com

Hi Tommaso, yes, you are right, the 4.4 version will work. I'm able to compile now. I'm trying to apply the named-entity recognition (person name) token filter, but I'm not seeing any change. My schema.xml looks like this:

  <field name="text" type="text_opennlp_pos_ner" indexed="true" stored="true" multiValued="true"/>

  <fieldType name="text_opennlp_pos_ner" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.OpenNLPTokenizerFactory" tokenizerModel="opennlp/en-token.bin" />
      <filter class="solr.OpenNLPFilterFactory" nerTaggerModels="opennlp/en-ner-person.bin" />
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

Please guide?

Thanks, Vivek

On Wed, Jun 4, 2014 at 1:27 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote:

Hi all, Ahmet was suggesting to eventually use the UIMA integration because OpenNLP already has an integration with Apache UIMA, so you would just have to use that [1]. And that's one of the main reasons the UIMA integration was done: it's a framework that you can easily hook into in order to plug in your NLP algorithm.

If you want to just use OpenNLP, then it's up to you to either write your own UpdateRequestProcessor plugin [2] to add metadata extracted by OpenNLP to your documents, or write a dedicated analyzer / tokenizer / token filter.

For the OpenNLP integration (LUCENE-2899), the patch is not up to date with the latest APIs in trunk; however, you should be able to apply it to (if I recall correctly) the 4.4 version or so, and adapting it to the latest API shouldn't be too hard.

Regards, Tommaso

[1]: http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#org.apche.opennlp.uima
[2]: http://wiki.apache.org/solr/UpdateRequestProcessor

2014-06-03 15:34 GMT+02:00 Ahmet Arslan iori...@yahoo.com.invalid:

Can you extract names, locations etc. using OpenNLP in a plain/straight Java program? If yes, here are two separate options:

1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an example, integrate your NER code into it, and write your own indexing code. You have full power here; no Solr plugins are involved.

2) Use 'Implementing a conditional copyField' given here: http://wiki.apache.org/solr/UpdateRequestProcessor as an example and integrate your NER code into it.

Please note that these are separate ways to enrich your incoming documents; choose either (1) or (2).

On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Okay, but I didn't understand what you said. Can you please elaborate?

Thanks, Vivek

On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Vivekanand, I have never used UIMA+Solr before.
Personally I think it takes more time to learn how to configure/use the UIMA stuff. If you are familiar with Java, write a class that extends UpdateRequestProcessor(Factory). Use OpenNLP for NER, and add the new fields (organisation, city, person name, etc.) to your document. This phase is usually called 'enrichment'. Does that make sense?

On Tuesday, June 3, 2014 2:57 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

Hi Ahmet, I followed what you said: https://cwiki.apache.org/confluence/display/solr/UIMA+Integration. But how can I achieve my goal? I mean extracting only the name of the organization or person from the content field. I guess I'm almost there but something is missing? Please guide me.

Thanks, Vivek

On Tue, Jun 3, 2014 at 2:50 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote:

The entire goal can't be stated, but one of those tasks can be like