Re: solr java.lang.NullPointerException on select queries
For the first install, I copied over all files in the example directory into, let's call it, install1. I did the same for install2. The two installs run on different ports, use different jar files, and are not really related to each other in any way as far as I can see. In particular, they are not multicore. They have the same access control setup via jetty. I did a diff on config files and confirmed that only port numbers are different. Both had been running fine in parallel, importing from a common database, for several weeks. The documents indexed by install1, the problematic one currently, are a vastly bigger (~2.5B) superset of those indexed by install2 (~250M). At this point, select queries on install1 incur the NullPointerException irrespective of whether install2 is running or not. The log file looks like it is indexing normally as always, though, and the index is also growing at the usual rate each day. Just select queries fail. :( -- View this message in context: http://lucene.472066.n3.nabble.com/solr-java-lang-NullPointerException-on-select-queries-tp3989974p3990476.html Sent from the Solr - User mailing list archive at Nabble.com.
Authentication Issue in Shards Query
Hi, I have a Solr server with 5 cores. I have modified the web.xml of solr.war to enable basic authentication for all the web resources, and I have written my own login module to perform the login check. Now when I query a single core, it asks for the username and password, and with proper credentials the query works fine. But when I use a shards-type query I get a 401 error: the credentials provided to the query are not passed on to the shard sub-queries. Is there a way to overcome this issue via some configuration? Replication is also blocked because of authentication. Please provide me a workaround for this issue. Regards, Senthil Kumar M R
Re: Indexation Speed?
Ok, thanks for this information. On 20/06/2012 05:44, Lance Norskog wrote: M. Della Bitta is right - we're not talking about post.jar, but starting Solr: java -Xmx300m -jar start.jar On Tue, Jun 19, 2012 at 10:05 AM, Erick Erickson erickerick...@gmail.com wrote: Well, it _used_ to be defaulted in the code, but looking at 3.6 it seems like it defaults to Integer.MAX_VALUE, so you're fine. And it's all deprecated in 4.x, will be gone. Best, Erick On Tue, Jun 19, 2012 at 7:07 AM, Bruno Mannina bmann...@free.fr wrote: Actually -Xmx512m and no effect. Concerning maxFieldLength, no problem, it's commented out. On 19/06/2012 13:02, Erick Erickson wrote: Then try -Xmx600M, next try -Xmx900M, etc. The idea is to bump things on separate runs. But be a little cautious here. Look in your solrconfig.xml file; you'll see a commented-out line <maxFieldLength>10000</maxFieldLength>. The default behavior for Solr/Lucene is to index the first 10,000 tokens (not characters; think of tokens as words for now) in each document and throw the rest on the floor. At the sizes you're talking about, that's probably not a problem, but do be aware of it. Best, Erick On Tue, Jun 19, 2012 at 5:44 AM, Bruno Mannina bmann...@free.fr wrote: Like that? java -Xmx300m -jar post.jar myfile.xml On 19/06/2012 11:11, Lance Norskog wrote: Ah! Java memory size is a java command line option: http://javahowto.blogspot.com/2006/06/6-common-errors-in-setting-java-heap.html You could try increasing the memory size in stages up to maybe 300m. On Tue, Jun 19, 2012 at 2:04 AM, Bruno Mannina bmann...@free.fr wrote: On 19/06/2012 10:51, Lance Norskog wrote: 675 doc/s is respectable for that server. You might move the memory allocated to Java up and down - there is a balance between the amount of memory in Java vs. the OS disk buffer. How can I do that? Is there an option on my command line or in a config file? Sorry for this newbie question :( And, of course, use the latest trunk.
Solr 3.6 On Tue, Jun 19, 2012 at 12:10 AM, Bruno Mannina bmann...@free.fr wrote: Correction: file size is 40 Mo !!! On 19/06/2012 09:09, Bruno Mannina wrote: Dear All, I would like to know if the indexation speed is right. I have a 40Go file with around 27,000 docs inside, and I index around 20 fields. My (old) test server is a DualCore 3.06GHz Intel Xeon with only 1Go of RAM. The file takes 40 seconds with the command line: java -jar post.jar myfile.xml Could I increase this speed or reduce this time? Thanks a lot. PS: Newbie user
Solr with Tomcat on VPS
I am running Solr in a shared Tomcat v5.5.28 (I have access to all instances) on a Linux VPS server. When I set it all up, Tomcat starts properly and I can see that it has accessed my Solr config directory properly. I can access the JSP pages if I reference them directly (http://mysite.com/solr/admin/index.jsp for example) but URLs like: 1. http://mysite.com/solr/admin/ 2. http://mysite.com/solr/admin/dataimport.jsp?clean=false&commit=true&command=full-import 3. http://mysite.com/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on all return 404 errors like URL /solr/select/ was not found on this server. I have tried all I can think of and am wondering if anyone else has some thoughts. This all works great on my development PC, where I run the same version of Tomcat. Thanks, Mike
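For reference, a common way to deploy Solr under a shared Tomcat is a per-instance context fragment pointing at the war and the Solr home. This is a sketch from the standard Solr-on-Tomcat setup, not Mike's actual config; the paths are placeholders:

```
<Context docBase="/path/to/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/path/to/solr/home" override="true"/>
</Context>
```

If only the JSPs resolve while /solr/select and /solr/admin/ 404, it is worth checking that the webapp's filters and servlet mappings from Solr's own web.xml are actually in effect for that context.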
Solr Autosuggest
Hi, I have a question regarding Solr autosuggest (if this is not the correct place to post, please suggest one). I have implemented Solr autosuggest with the Suggester component. I have read in a blog: Currently implemented Lookups keep their data in memory, so unlike spellchecker data, this data is discarded on core reload and not available until you invoke the build command, either explicitly or implicitly during a commit. I have a Master-Slave setup. If I add new documents to the Master and commit, the suggester will be rebuilt (as I have set buildOnCommit=true). But when replication is done, the Slave will reload the core. At that point, will autosuggestion of the newly added docs be affected? Thanks, Shri
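For context, a Suggester configured to rebuild on commit looks roughly like this in solrconfig.xml (a sketch; the component, lookup, and field names here are illustrative, not Shri's actual config):

```
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">suggest_text</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

If the blog's description holds, a slave that reloads its core after replication will serve no suggestions until a build is triggered on the slave itself, e.g. via spellcheck.build=true or a commit there.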
Re: parameters to decide solr memory consumption
This is really difficult to answer because there are so many variables: the number of unique terms, whether you store fields or not (which is really unrelated to memory consumption during searching), etc., etc. So even building the index and just looking at the index directory won't tell you much about memory consumption. And memory use has been dramatically improved in the 4.x code line, so anything we can say is probably wrong anyway. Not to mention that your particular use of caches (filterCache, queryResultCache, etc.) will change things at runtime. I'm afraid you'll just have to try it and see. Yes, LIA is accurate... Best, Erick On Tue, Jun 19, 2012 at 8:28 AM, Sachin Aggarwal different.sac...@gmail.com wrote: hello, need help regarding how solr stores the indexes. I was reading an article that says solr stores the indexes in the same format as explained in appendix B of Lucene in Action. Is that true? And what parameters do I need to focus on while estimating the memory used by my use case? I have a table like (userid, username, usertime, userlocation, userphn, timestamp, address). What I believe in my case is that the cardinality of some fields like gender, location and userphnmodel will be very low; will that influence memory use? Any links to read further will be appreciated. -- Thanks Regards Sachin Aggarwal 7760502772
Re: solr java.lang.NullPointerException on select queries
Internal Lucene document IDs are signed 32-bit numbers, so having 2.5B docs seems to be just _asking_ for trouble. Which could explain the fact that this just came out of thin air: if you kept adding docs to the problem instance, you wouldn't have changed configs etc., just added more docs. I really think it's time to shard. Best, Erick
Re: 3 Way Solr Join . . ?
I have a similar situation in my application. I have five different entities. The relationships among the entities are as follows: Protocol -- (zero or more) Study -- (zero or more) Patient; Protocol -- (zero or more) Drug; Patient -- (zero or more) Study; Form -- (zero or many) Study. Moreover, all these entities can exist independently (as per the requirements of my application), so I cannot create a single document that includes all these entities via denormalization. I need to find the Drug Name (from the Drug entity), Protocol Name (from the Protocol entity), Study Name (from the Study entity), Patient Name (from the Patient entity) and Form Name (from the Form entity) based on a Drug Batch Number (from the Drug entity) that I pass in. Using Join in Solr, I can get either the child or the parent, not both. What is the best way to index this data in Solr? Do I need to create separate indices for each entity, or a single one for all?
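For reference, the Solr join being discussed (available on trunk/4.x) has this shape, with field names made up for illustration; it maps documents matching the inner query onto documents on the other side of the relation, which is why only one side of the join comes back in the results:

```
q={!join from=drug_id to=id}batch_number:B12345
```

Getting fields from all five entities in one response would require either denormalizing into a combined document or issuing follow-up queries per entity.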
Re: solr java.lang.NullPointerException on select queries
Erick, thanks for pointing that out. I was going to say in my original post that it is almost as if some limit on max documents got violated all of a sudden, but the rest of the symptoms didn't seem to quite match. Now that I think about it, the problem probably happened at 2B (corresponding exactly to the size of the signed int space): my ID space in the database has roughly 85% holes, and the problem probably happened when the ID hit around 2.4B. It is still odd that indexing appears to proceed normally and that the select queries know which IDs are used; the error happens only for queries with non-empty results, e.g., searching for an ID that doesn't exist gives a valid response with 0 results. Is this because solr uses 'long' or more for indexing (given that the schema supports long) but not in the querying modules? I hadn't used solr sharding because I really needed rolling partitions, where I keep a small index of recent documents and throw the rest into a slow archive index. So maintaining the smaller instance2 (usually 50M) and replicating it if needed was my homebrewed sharding approach. But I guess it is time to shard the archive after all. AV
Re: Schema / Config Error?
As I understand it, James is not upgrading, but trying to start a freshly downloaded 3.6.0. James, can you provide some more details, especially which app server you are using and how you started Solr? Can you copy/paste the error msg from your log files? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 6. juni 2012, at 13:33, Jack Krupansky wrote: Read CHANGES.txt carefully, especially the section entitled Upgrading from Solr 3.5. For example: * As of Solr 3.6, the indexDefaults and mainIndex sections of solrconfig.xml are deprecated and replaced with a new indexConfig section. Read more in SOLR-1052 below. If you simply copied your schema/config directly, unchanged, then this could be the problem. You may need to compare your schema/config line-by-line to the new 3.6 schema/config for any differences. -- Jack Krupansky -----Original Message----- From: Erick Erickson Sent: Wednesday, June 06, 2012 6:57 AM To: solr-user@lucene.apache.org Subject: Re: Schema / Config Error? That implies one of two things: 1) you changed solr.xml; I'd go back to the original and re-edit anything you've changed. 2) you somehow got a corrupted download; try blowing your installation away and getting a new copy. Because it works perfectly for me. Best, Erick On Wed, Jun 6, 2012 at 4:14 AM, Spadez james_will...@hotmail.com wrote: Hi, I installed a fresh copy of Solr 3.6.0 on my server but I get the following page when I try to access Solr: http://176.58.103.78:8080/solr/ It says errors to do with my solr.xml. This is my solr.xml: I really can't figure out how I am meant to fix this, so if anyone is able to give some input I would really appreciate it. James
Re: solr java.lang.NullPointerException on select queries
Let's make sure we're talking about the same thing. Solr happily indexes and stores long (64-bit) values, no problem. What it doesn't do is assign _internal_ document IDs as longs; those are ints. On admin/statistics, look at maxDocs and numDocs. maxDocs + 1 will be the next _internal_ lucene doc ID assigned, so if that's wonky or over 2B, this is where the rub happens. BTW, the difference between numDocs and maxDocs is the number of documents deleted from your index. If your number of current documents is much smaller than 2B, you can get maxDocs to equal numDocs if you optimize, and get yourself some more headroom. Whether your index will be OK I'm not prepared to guarantee, though... But if I'm reading your notes correctly, the 85% holes applies to a value in your document, and has nothing to do with the internal lucene ID issue. Internally, though, the int limit isn't robustly enforced, so I'm not surprised that it pops out (if, indeed, this is your problem) in odd places. Best, Erick
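Erick's point about signed 32-bit IDs can be checked with a few lines of Java: a document count of roughly 2.5B, as in this thread, wraps negative when forced into an int, which is consistent with the negative numDocs/maxDocs values reported elsewhere in the thread. A minimal sketch:

```java
public class DocIdOverflow {
    public static void main(String[] args) {
        // roughly the document count of the problem install
        long docCount = 2_500_000_000L;
        // Lucene 3.x internal doc IDs are signed 32-bit ints;
        // anything past Integer.MAX_VALUE (2,147,483,647) wraps around
        int wrapped = (int) docCount;
        System.out.println(wrapped); // prints -1794967296
    }
}
```

The wrapped value is the count minus 2^32, which is why the admin statistics page can show large negative numbers while indexing itself appears to continue.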
Re: Indexation Speed?
Little question please: I have directories with around 30 files of 40 Mo, with around 17,000 docs in each file. Is it better to index: - file by file, with java -jar post.jar 1.xml, java -jar post.jar 2.xml, etc., or - all at the same time, with java -jar post.jar *.xml All files are verified, so my question is just concerning speed. Thx for your comments, Bruno
Re: solr java.lang.NullPointerException on select queries
Yes, wonky indeed: numDocs: -2006905329, maxDoc: -1993357870. And yes, I meant that the holes are in the database auto-increment ID space, nothing to do with lucene IDs. I will set up sharding. But is there any way to retrieve most of the current index? Currently, all select queries, even on ranges in the hundreds of millions, return the NullPointerException. It would suck to lose all of this. :(
Malay Language Detection
Hi, We are using http://code.google.com/p/language-detection/ along with Solr for language detection, but its jar doesn't have support for Malay detection. So I created the Malay profile used by the jar; this works in my local test environment, but I don't know how to get it to work with Solr. Has anyone else worked on this earlier? Regards, Rohit
How to import this Json-line by DIH?
-- View this message in context: http://lucene.472066.n3.nabble.com/How-to-import-this-Json-line-by-DIH-tp3990544.html Sent from the Solr - User mailing list archive at Nabble.com.
solrj and replication
hi, i was just wondering if i need to do something special to get replication working with an embedded slave? my setup is like so: - my clustered application uses embedded solr(j) (for performance); the cores are configured as slaves that should connect to a master which runs in a jetty. - the embedded cores don't expose any of the solr servlets. note: the slave config, if started in jetty, does proper replication, while when embedded it doesn't. using solr 3.5 thx tom
Re: Indexation Speed?
I doubt you'll find any significant difference in indexing speed. But the post.jar file is really intended as a demo program to quickly get the examples working; it was never intended to be a production-ready program. I'd think about using something like SolrJ etc. to index the docs. And I'm assuming your documents are in the approved Solr format, something like <add><doc><field name="myfield">value for field</field> ... </doc> <doc> ... </doc></add>. Solr will not index arbitrary XML. If you're trying to do that, you'll need to transform your arbitrary XML into the above format; consider SolrJ or something like that in this case. Best, Erick
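Spelled out, the update format Erick is describing looks like this; the field names are illustrative, and the fields themselves must be declared in the target schema:

```
<add>
  <doc>
    <field name="id">doc-1</field>
    <field name="title">First document</field>
  </doc>
  <doc>
    <field name="id">doc-2</field>
    <field name="title">Second document</field>
  </doc>
</add>
```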
Re: solr java.lang.NullPointerException on select queries
That indeed sucks. But I don't personally know of a good way to split apart an existing index into shards. I'm afraid you're going to be stuck with re-indexing. Wish I had a better solution. Erick
Re: solr java.lang.NullPointerException on select queries
Thanks. Do you know if the tons of index files with names like '_zxt.tis' in the index/data/ directory have the lucene IDs embedded in the binaries? The files look good to me and are partly readable even though binary. I am wondering if I could just set up a new solr instance, move these index files there, and hope to use them (or most of them) as is, without shards. If so, I will just set up a separate sharded index for the documents indexed henceforth, but won't bother splitting the huge existing index.
Re: solr java.lang.NullPointerException on select queries
Don't even try to do that. First of all, you have to have a reliable way to index the same docs to the same shards. The docs are all mixed up in the segment files, which would lead to chaos: Solr/Lucene will report the same doc multiple times if it's in different shards, so if you ever updated a document, you wouldn't know which shard to send it to. Second, the segments are all parts of a single index, and Solr (well, actually Lucene) expects them to be consistent. Putting some on one shard and some on another would probably not allow Solr to start (but I confess I've never tried that). So I really wouldn't even try to go there. Best, Erick
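To make Erick's first point concrete, here is a hedged sketch of the kind of client-side routing a manually sharded setup needs. Nothing like this is built into Solr 3.x, so the helper below is purely illustrative: hashing the unique key means a re-indexed or updated document deterministically lands on the same shard every time.

```java
public class ShardRouter {
    // Route a document to a shard by hashing its unique key.
    // floorMod keeps the result in [0, numShards) even when hashCode() is negative.
    static int shardFor(String uniqueKey, int numShards) {
        return Math.floorMod(uniqueKey.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int first = shardFor("doc-12345", 4);
        int second = shardFor("doc-12345", 4);
        // The same key always routes to the same shard:
        System.out.println(first == second); // prints true
    }
}
```

With a scheme like this in place from the start, updates and deletes can always be sent to the right shard, which is the precondition Erick describes for splitting an index.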
write.lock
I'm running Solr 3.4. The past 2 months I've been getting a lot of write.lock errors. I switched to the simple lockType (and made it clear the lock on restart), but my index is still locking up a few times a week. I can't seem to determine what is causing the locks -- does anyone out there have any ideas or experience as to what is causing them, and what config changes I can make to prevent them? Any help would be very appreciated! -- Chris
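For reference, the settings Chris describes map to solrconfig.xml roughly like this in the 3.x line (a sketch, not his actual config):

```
<mainIndex>
  <!-- 'simple' uses a plain write.lock file; 'native' uses OS-level locking -->
  <lockType>simple</lockType>
  <!-- remove a stale lock left by an unclean shutdown when the core starts -->
  <unlockOnStartup>true</unlockOnStartup>
</mainIndex>
```

A recurring stale write.lock often points at two writers on the same index directory (e.g. two cores or two webapps sharing one data dir) or at the JVM being killed mid-write, rather than at the lockType itself.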
Help with Solr File Based spell check
Hi, We are trying to implement file-based spell check in our application using Solr 1.4. This is the configuration we have written:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="sourceLocation">/usr/home/lilly/sixfeetup/projects/alm-buildout/etc/solr/spelling.txt</str>
    <str name="spellcheckIndexDir">./filespellchecker</str>
    <str name="accuracy">0.7</str>
  </lst>
  <str name="queryAnalyzerFieldType">text</str>
</searchComponent>
We are facing an issue and need your help on it. When the user searches for the word medicine, which is correctly spelled and present in the dictionary, we still get the suggestion medicines from the dictionary. We only want a suggestion if the word is incorrectly spelled or is not in the dictionary. Can you please provide some suggestions? Regards, Sanjay Dua
Re: LanguageDetection inside of ExtractingRequestHandler
Hi, In my opinion, instead of hardcoding such functionality into multiple request handlers, we should go in the opposite direction - modularization, factoring out Tika extraction into its own UpdateProcessor (https://issues.apache.org/jira/browse/SOLR-1763). Then the ExtractingRequestHandler would eventually go away, and you could use extraction and language detection with any request handler you choose, including XML and DIH... -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 19. juni 2012, at 17:10, Martin Ruckli wrote: Hi all, I just wanted to check if there is a demand for this feature. I had to implement this functionality for one of our customers and would like to contribute it. Here is the use case: We are using the ExtractingRequestHandler with the extractOnly=true flag set. With a request to this handler we get the content of a posted document like we want to. We would also like to detect the language and return it as a metadata field in the response from solr. As there is already support for language detection based on Tika integrated into Solr, the only thing I did was add a new param to enable or disable this feature and then do the language detection nearly the same way as it is done in the TikaLanguageIdentifierUpdateProcessor. I think this would be a nice addition, especially in extractOnly mode. What are your thoughts on this? Cheers Martin
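For anyone following along, the existing Tika-based language identification that both posters refer to is wired in as an update processor chain, roughly like this (field names are illustrative):

```
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
    <!-- fields sampled for detection -->
    <str name="langid.fl">title,body</str>
    <!-- field that receives the detected language code -->
    <str name="langid.langField">language_s</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Because it runs at update time, this chain only sees documents being indexed, which is exactly why Jan's modularization point matters for the extractOnly use case.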
Exception using distributed field-collapsing
I am doing a search on three shards with identical schemas (I double-checked!), using the group feature, and Solr/Lucene 3.5. Solr is giving me back the exception listed at the bottom of this email. Other information: My schema uses the following field types: StrField, DateField, TrieDateField, TextField, SortableInt, SortableLong, BoolField. My query looks like this (I've messed with it to anonymize but, I hope, kept the essentials):
http://[solr core2]/select/?start=0&rows=25&q={!qsol}machines&sort=[sort field]&fl=[list of fields]&shards=[solr core1]%2c[solr core2]%2c[solr core3]&group=true&group.field=[group field]
java.lang.ClassCastException: java.util.Date cannot be cast to java.lang.String
at org.apache.lucene.search.FieldComparator$StringOrdValComparator.compareValues(FieldComparator.java:844)
at org.apache.lucene.search.grouping.SearchGroup$GroupComparator.compare(SearchGroup.java:180)
at org.apache.lucene.search.grouping.SearchGroup$GroupComparator.compare(SearchGroup.java:154)
at java.util.TreeMap.put(TreeMap.java:547)
at java.util.TreeSet.add(TreeSet.java:255)
at org.apache.lucene.search.grouping.SearchGroup$GroupMerger.updateNextGroup(SearchGroup.java:222)
at org.apache.lucene.search.grouping.SearchGroup$GroupMerger.merge(SearchGroup.java:285)
at org.apache.lucene.search.grouping.SearchGroup.merge(SearchGroup.java:340)
at org.apache.solr.search.grouping.distributed.responseprocessor.SearchGroupShardResponseProcessor.process(SearchGroupShardResponseProcessor.java:77)
at org.apache.solr.handler.component.QueryComponent.handleGroupedResponses(QueryComponent.java:565)
at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:548)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:289)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:679)
Any thoughts or advice? Thanks, -- Bryan
Re: Exception using distributed field-collapsing
Hi Bryan, What is the field type of the group field? You can only group by a field that is of type string, as described in the wiki: http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters When you group by another field type, an HTTP 400 should be returned instead of this error. At least that's what I'd expect. Martijn

On 20 June 2012 20:37, Bryan Loofbourrow bloofbour...@knowledgemosaic.com wrote:
> I am doing a search on three shards with identical schemas (I double-checked!), using the group feature, and Solr/Lucene 3.5. Solr is giving me back the exception listed at the bottom of this email. [...]

-- Kind regards, Martijn van Groningen
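Martijn's constraint suggests a common workaround: copy whatever you want to collapse on into a plain string field and group on that. A minimal schema.xml sketch (the field names "collapse_key" and "my_group_source" are hypothetical, not taken from Bryan's schema):

```xml
<!-- Hypothetical field names for illustration only -->
<field name="collapse_key" type="string" indexed="true" stored="false"/>
<copyField source="my_group_source" dest="collapse_key"/>
```

The query would then use group.field=collapse_key, so the distributed merge compares plain string values on every shard.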
RE: Exception using distributed field-collapsing
Martijn wrote:
> What is the field type of the group field? You can only group by a field that is of type string, as described in the wiki: http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters When you group by another field type, an HTTP 400 should be returned instead of this error. At least that's what I'd expect.

Martijn, The group-by field is a string. I have been unable to figure out how a date comes into the picture at all, and have basically been wondering if there is some problem in the grouping code that misaligns the field values from different results in the group, so that it is not comparing like with like. Not a strong theory, just the only thing I can think of. -- Bryan
Re: Indexation Speed?
Hi Erick,

> I doubt you'll find any significant difference in indexing speed. But the post.jar file is really intended as a demo program to quickly get the examples working. It was never intended to be a production-ready program. I'd think about using something like SolrJ etc. to index the docs.

Ah?! I don't know SolrJ yet :( Do I need to know how to program in Java? I transformed all my XML source files to the XML structure below and I'm using post.jar. I thought post.jar was a standard tool to index docs.

> And I'm assuming your documents are in the approved Solr format, something like:
>
>   <add>
>     <doc>
>       <field name="myfield">value for field</field>
>       ...
>     </doc>
>     <doc>
>       ...
>     </doc>
>   </add>

Yes, all my XML docs have this format.

> Solr will not index arbitrary XML. If you're trying to do this, you'll need to transform your arbitrary XML into the above format; consider SolrJ or something like that in this case.

If all my XML docs are in the XML structure above, is it necessary to use SolrJ?
RE: How to import this Json-line by DIH?
Hi jueljust, Nabble removed the entire content of your email before sending it to the mailing list. Maybe use a different service that doesn't throw away your message? Steve From: jueljust [juelj...@gmail.com] Sent: Wednesday, June 20, 2012 10:56 AM To: solr-user@lucene.apache.org Subject: How to import this Json-line by DIH? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-import-this-Json-line-by-DIH-tp3990544.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexation Speed?
I think it's a bit of an "it depends" whether post.jar is the right choice for production. It -is- SolrJ inside, after all, Erick :) and it's pretty much the same as using curl. Just be sure you control commits as needed. Erik

On Jun 20, 2012, at 15:18, Bruno Mannina bmann...@free.fr wrote: [...]
Re: solr java.lang.NullPointerException on select queries
Erick, thanks for the advice, but let me make sure you haven't misunderstood what I was asking. I am not trying to split the huge existing index in install1 into shards. I am also not trying to make the huge install1 index as one shard of a sharded solr setup. I plan to use a sharded setup only for future docs. I do want to avoid trying to re-index the docs in install1 and think of them as a slow tape archive index server if I ever need to go and query the past documents. So I was wondering if I could somehow use the existing segment files to run an isolated (unsharded) solr server that lets me query roughly the first 2B docs before the wraparound problem happened. If the negative internal doc IDs have pervasively corrupted the segment files, this would not be possible, but I am not able to imagine an underlying lucene design that would cause such a problem. Is my only option to re-index the past 2B docs if I want to be able to query them at this point or is there any way to use the existing segment files? -- View this message in context: http://lucene.472066.n3.nabble.com/solr-java-lang-NullPointerException-on-select-queries-tp3989974p3990615.html Sent from the Solr - User mailing list archive at Nabble.com.
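For reference, the "wraparound" in this thread is ordinary 32-bit integer overflow: Lucene 3.x addresses documents with Java ints, so a single index cannot address more than Integer.MAX_VALUE (~2.1 billion) documents, and a counter pushed past that limit goes negative. A minimal sketch of the arithmetic:

```java
public class DocIdWraparound {
    public static void main(String[] args) {
        // Lucene 3.x internal doc IDs are Java ints, so the hard ceiling
        // per index is Integer.MAX_VALUE documents.
        int maxDocs = Integer.MAX_VALUE; // 2147483647, roughly 2.1 billion
        System.out.println("max addressable docs: " + maxDocs);

        // Indexing ~2.5B documents pushes the counter past that limit;
        // narrowing back to int silently wraps around to a negative value.
        long indexed = 2_500_000_000L;
        int wrapped = (int) indexed;
        System.out.println("2.5B as an int doc ID: " + wrapped); // negative
    }
}
```

This is why documents indexed past the ~2.1B mark are suspect, while whether the first ~2B doc IDs in the existing segment files are still intact is a separate question the thread leaves open.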
Re: Apache Lucene Eurocon 2012
Hello Mikhail- Your mail did not come through. Hope things are well, Lance Norskog Lucid Imagination On Wed, Jun 20, 2012 at 11:16 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: up -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com -- Lance Norskog goks...@gmail.com
Re: Editing solr update handler sub class
Can anybody tell me where the Lucene jar files containing org.apache.lucene.index and org.apache.lucene.search are located? Thanks, Shameema

On Wed, Jun 20, 2012 at 4:44 PM, Shameema Umer shem...@gmail.com wrote:
> Hi, I decompiled DirectUpdateHandler2.class to a .java file and edited it to suit my requirement of not overwriting duplicates (I needed the first fetched tstamp). But when I tried to compile it back to a .class file, it showed 91 errors. Am I wrong anywhere? I am new to Java applications but fluent in web languages. Please help. Thanks, Shameema
Re: Editing solr update handler sub class
Hi, the jar files are located in the dist folder. Check your dist folder, or check your solrconfig.xml file, where you will find the jar location paths.

On Thu, Jun 21, 2012 at 9:47 AM, Shameema Umer shem...@gmail.com wrote:
> Can anybody tell me where the Lucene jar files containing org.apache.lucene.index and org.apache.lucene.search are located? [...]
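As an illustration of the solrconfig.xml pointer above: the example distribution declares where extra jars are loaded from with <lib> directives like the following (the dir/regex values are stock example defaults; verify them against your own layout):

```xml
<!-- Example <lib> directives as they appear in the stock example solrconfig.xml -->
<lib dir="../../dist/" regex="apache-solr-cell-\d.*\.jar" />
<lib dir="../../contrib/extraction/lib" regex=".*\.jar" />
```

The core Lucene jars (lucene-core-*.jar and friends) are on the webapp's own classpath inside solr.war, which is why a patched class must be compiled against those same jars.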
Re: parameters to decide solr memory consumption
Thanks for the help. I tried an exercise: I am storing a schema of (uuid, key, userlocation). uuid and key are unique, and userlocation has a cardinality of 150. uuid and key are stored and indexed, while userlocation is indexed but not stored. Still, the index directory size is 51 MB for just 200,000 records. Don't you think that is suboptimal? What happens if I go to billions of records? -- Thanks & Regards, Sachin Aggarwal 7760502772
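A quick back-of-envelope check of those numbers (a sketch only; it treats index growth as linear, which a real Lucene index with merging and compression only roughly follows):

```java
public class IndexSizeEstimate {
    public static void main(String[] args) {
        // Observed in the thread: 51 MB of index for 200,000 documents
        double bytes = 51.0 * 1024 * 1024;
        double docs = 200_000;
        double bytesPerDoc = bytes / docs; // roughly 267 bytes per document
        System.out.printf("bytes per doc: %.1f%n", bytesPerDoc);

        // Naive linear extrapolation to one billion documents
        double oneBillion = 1_000_000_000;
        double estimatedGB = bytesPerDoc * oneBillion / (1024.0 * 1024 * 1024);
        System.out.printf("naive estimate for 1B docs: %.0f GB%n", estimatedGB);
    }
}
```

A few hundred bytes per document for two stored unique fields plus three indexed ones is not unusual, so the question is less "is 51 MB too big?" and more whether the stored uuid and key fields are actually needed at query time.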