can solr admin tab statistics be customized... how can this be achieved?
Hi, I want to compute my own stats in addition to Solr's default stats. How can I extend the statistics Solr reports? Also, Solr computes stats cumulatively; is there any way to get per-interval stats instead? Thanks... waiting for good replies. -- View this message in context: http://lucene.472066.n3.nabble.com/can-solr-admin-tab-statistics-be-customized-how-can-this-be-achived-tp3996128.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: help: I always get NULL with row.get(columnName)
anyone knows? On Thu, Jul 19, 2012 at 5:48 PM, Roy Liu wrote:
> Hi,
>
> When I use a Transformer to handle files, I always get NULL from
> row.get(columnName). Anyone knows?
>
> The following file is data-config.xml (the field definitions were lost
> in the archive):
>
>   <dataConfig>
>     <dataSource name="ds"
>       driver="oracle.jdbc.driver.OracleDriver"
>       url="jdbc:oracle:thin:@10.1.1.1:1521:sid"
>       user="username"
>       password="pwd" />
>     <document>
>       <entity name="report"
>         query="select a.objid as ID from DOCGENERAL a where a.objid=14154965">
>         <entity query="select docid as ID, name as filename, storepath as filepath
>                        from attachment where docid=${report.ID}"
>           transformer="com.bs.solr.BSFileTransformer">
>           <!-- field definitions elided in the archived message -->
>         </entity>
>       </entity>
>     </document>
>   </dataConfig>
>
>   public class BSFileTransformer extends Transformer {
>     private static Log LOGGER = LogFactory.getLog(BSFileTransformer.class);
>
>     @Override
>     public Object transformRow(Map<String, Object> row, Context context) {
>       // row.get("filename") is always null, but row.get("id") is OK.
>       System.out.println("==filename:" + row.get("filename"));
>
>       List<Map<String, String>> fields = context.getAllEntityFields();
>
>       String id = null; // entity ID
>       String fileName = "NONAME";
>       for (Map<String, String> field : fields) {
>         String name = field.get("name");
>         System.out.println("name:" + name);
>         if ("bs_attachment_id".equals(name)) {
>           String columnName = field.get("column");
>           id = String.valueOf(row.get(columnName));
>         }
>         if ("bs_attachment_name".equals(name)) {
>           String columnName = field.get("column");
>           fileName = (String) row.get(columnName);
>         }
>         String isFile = field.get("isfile");
>         if ("true".equals(isFile)) {
>           String columnName = field.get("column");
>           String filePath = (String) row.get(columnName);
>           try {
>             System.out.println("fileName:" + fileName + ",filePath: " + filePath);
>             if (filePath != null) {
>               File file = new File(filePath);
>               InputStream inputStream = new FileInputStream(file);
>               Tika tika = new Tika();
>               String text = tika.parseToString(inputStream);
>               row.put(columnName, text);
>             }
>             LOGGER.info("Processed File OK! Entity: " + fileName + ", ID: " + id);
>           } catch (IOException ioe) {
>             LOGGER.error(ioe.getMessage());
>             row.put(columnName, "");
>           } catch (TikaException e) {
>             LOGGER.error("Parse File Error:" + id + ", Error:" + e.getMessage());
>             row.put(columnName, "");
>           }
>         }
>       }
>       return row;
>     }
>   }
Re: How to setup SimpleFSDirectoryFactory
Thanks. Are you saying that if we run low on memory, the MMapDirectory will stop using it? Will the least-used memory be reclaimed by the OS automatically? I see some paging. Wouldn't paging slow down the querying? My index is 10GB and every 8 hours we get most of it in shared memory. The memory is 99 percent used, and that does not leave any room for other apps. Other implications? Sent from my mobile device 720-256-8076 On Jul 19, 2012, at 9:49 AM, "Uwe Schindler" wrote: > Read this, then you will see that MMapDirectory will use 0% of your Java heap > space or free system RAM: > > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html > > Uwe > > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > >> -----Original Message----- >> From: William Bell [mailto:billnb...@gmail.com] >> Sent: Tuesday, July 17, 2012 6:05 AM >> Subject: How to setup SimpleFSDirectoryFactory >> >> We all know that MMapDirectory is fastest. However, we cannot always use it, >> since you might run out of memory on large indexes, right? >> >> Here is how I got SimpleFSDirectoryFactory to work. Just set >> -Dsolr.directoryFactory=solr.SimpleFSDirectoryFactory. >> >> In your solrconfig.xml: >> >> <directoryFactory name="DirectoryFactory" >>   class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/> >> >> You can check it with http://localhost:8983/solr/admin/stats.jsp >> >> Notice that the default for 64-bit Windows is MMapDirectory, otherwise >> NIOFSDirectory (except on Windows). It would be nicer if we just set it all up >> with a helper in solrconfig.xml... >> >> if (Constants.WINDOWS) { >>   if (MMapDirectory.UNMAP_SUPPORTED && Constants.JRE_IS_64BIT) >>     return new MMapDirectory(path, lockFactory); >>   else >>     return new SimpleFSDirectory(path, lockFactory); >> } else { >>   return new NIOFSDirectory(path, lockFactory); >> } >> >> -- >> Bill Bell >> billnb...@gmail.com >> cell 720-256-8076
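The behavior Uwe's blog post describes — the OS faulting mapped file pages into the page cache on first access and evicting them under memory pressure, entirely outside the Java heap — can be sketched with a tiny standalone mmap demo (Python here, not Solr code; the file name and contents are arbitrary):

```python
import mmap
import os
import tempfile

# Write a small file, then map it read-only. The mapping itself only
# reserves virtual address space; the OS loads physical pages lazily
# when they are first touched, and may evict them when memory is tight.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello solr" * 1000)
    path = f.name

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    data = mm[:10]  # touching the region faults the first page in
    mm.close()

os.remove(path)
print(data)
```

Re-reading a resident page is memory-speed, but paging an evicted page back in is a disk read — which is why heavy eviction (other apps competing for RAM) can slow queries even though the index memory is "free" from the JVM's point of view.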
custom sorter
Hi, I have a requirement to place a document at a pre-determined position for special filter query values; for instance, when the filter query is fq=(field1:"xyz"), place document abc as the first result (the rest of the result set will be ordered by sort=field2). I guess I have to plug in my own Java code as a custom sorter. I'd appreciate it if someone could shed light on this (how to add a custom sorter, etc.). TIA.
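A custom sorter may not be needed: Solr ships a component aimed at exactly this kind of pinning, the QueryElevationComponent. Note it keys on the main query text (the q parameter), not on fq values, so it only approximates the requirement above; the query text and document id below are placeholders, not values from this thread. A minimal elevate.xml sketch:

```xml
<!-- elevate.xml: pins doc "abc" to the top of results when the
     user's query text is "xyz". Both values are illustrative. -->
<elevate>
  <query text="xyz">
    <doc id="abc" />
  </query>
</elevate>
```

The component itself is enabled as a searchComponent in solrconfig.xml and attached to the request handler; elevated documents are returned first, with the remainder ranked normally.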
Re: queryResultCache not checked with fieldCollapsing
: When I run dismax queries I see there are no lookups in the : queryResultCache. If I remove the field collapsing - lookups happen. I : can't find any mention of this anywhere or think of reason why this should I'm not very familiar with the grouping code, but I think the crux of what you are seeing is that when you use grouping, the queryResultCache isn't used because it can't be. The queryResultCache is a mapping from (query, sort, filters) to a DocList, but with grouping you don't have a simple DocList any more, so there is nothing that can go in (or come out of) the cache. There are probably opportunities for other things to be cached when grouping is used (using new SolrCaches) but I'm not sure what/how. : disable caching. I've tried playing with the group.cache.percent parameter group.cache.percent is ... something different. I don't remember how exactly it works (mvg: ping?), but it definitely doesn't affect any usage of the queryResultCache. If your main concern is caching entire requests (ie: query options, facets, filters, sort, grouping, etc...) then I would suggest you consider putting an HTTP cache in front of Solr. -Hoss
Redirecting SolrQueryRequests to another core with Handler
What is the best way to redirect a SolrQueryRequest to another core from within a handler (custom SearchHandler)? I've tried to find the SolrCore of the core I want to redirect to and called the execute() method with the same params but it looks like the SolrQueryRequest object already has the old core name embedded into it! I want to do this without making a new request and going through the servlet etc... * Note that I had to have an empty core with a special name just to do this redirection process in the first place, if there is a better way to proceed with this please let me know too :) Many thanks for any help you can give, Nicholas (incunix)
Re: How to Increase the number of connexion on Solr/Tomcat6?
Hi Bruno, It's usually the maxThreads attribute on the <Connector> element in $CATALINA_HOME/conf/server.xml. But I kind of doubt you're running out of threads... maybe you could post some more details about the system you're running Solr on. Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com On Thu, Jul 19, 2012 at 6:47 PM, Bruno Mannina wrote: > Dear Solr User, > > I don't know if it's here that my question must be posted, but I'm sure some > users have already had my problem. > > Actually, I do 1556 requests with 4 Http components with my program. If I do > these requests without a delay (500ms) > before sending each request, around 10% of the requests come back with an empty > answer. If I add a delay before each request, I get no empty answers. > > An empty answer has HTTP 200 OK, headers OK, but Body = '' > > Where can I increase the limit on simultaneous Tomcat/Solr requests, or > how can I solve my problem? > > Thanks a lot for your Help, > Bruno
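For concreteness, this is roughly where maxThreads lives in a Tomcat 6 server.xml; the numbers below are illustrative placeholders, not recommendations for this workload:

```xml
<!-- $CATALINA_HOME/conf/server.xml: the HTTP Connector.
     maxThreads caps concurrent request-processing threads;
     acceptCount queues connections beyond that limit. -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="400"
           acceptCount="100"
           connectionTimeout="20000"
           redirectPort="8443" />
```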
How to Increase the number of connexion on Solr/Tomcat6?
Dear Solr User, I don't know if it's here that my question must be posted, but I'm sure some users have already had my problem. Actually, I do 1556 requests with 4 Http components with my program. If I do these requests without a delay (500ms) before sending each request, around 10% of the requests come back with an empty answer. If I add a delay before each request, I get no empty answers. An empty answer has HTTP 200 OK, headers OK, but Body = '' Where can I increase the limit on simultaneous Tomcat/Solr requests, or how can I solve my problem? Thanks a lot for your Help, Bruno
Re: Solr 4 Alpha SolrJ Indexing Issue
Thanks Mark! On Thu, Jul 19, 2012 at 4:07 PM, Mark Miller wrote: > https://issues.apache.org/jira/browse/SOLR-3649 > > On Thu, Jul 19, 2012 at 3:34 PM, Briggs Thompson < > w.briggs.thomp...@gmail.com> wrote: > > > This is unrelated for the most part, but the javabin update request > handler > > does not seem to be working properly when calling solrj > > method*HttpSolrServer.deleteById(List ids) > > *. A single Id gets deleted from the index as opposed to the full list. > It > > appears properly in the logs - shows delete of all Ids sent, although all > > but one remain in the index. > > > > I confirmed that the default update request handler deletes the list > > properly, so this appears to be a problem with > > the BinaryUpdateRequestHandler. > > > > Not an issue for me, just spreading the word. > > > > Thanks, > > Briggs > > > > On Thu, Jul 19, 2012 at 9:00 AM, Mark Miller > > wrote: > > > > > we really need to resolve that issue soon... > > > > > > On Jul 19, 2012, at 12:08 AM, Briggs Thompson wrote: > > > > > > > Yury, > > > > > > > > Thank you so much! That was it. Man, I spent a good long while > trouble > > > > shooting this. Probably would have spent quite a bit more time. I > > > > appreciate your help!! > > > > > > > > -Briggs > > > > > > > > On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats > wrote: > > > > > > > >> On 7/18/2012 7:11 PM, Briggs Thompson wrote: > > > >>> I have realized this is not specific to SolrJ but to my instance of > > > >> Solr. Using curl to delete by query is not working either. > > > >> > > > >> Can be this: https://issues.apache.org/jira/browse/SOLR-3432 > > > >> > > > > > > - Mark Miller > > > lucidimagination.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > - Mark > > http://www.lucidimagination.com >
Re: Is it possible to alias a facet field?
: > facet.field=testfield&facet.field=%7B!key=mylabel%7Dtestfield&f.mylabel.limit=1 : > : > but the limit on the alias didn't seem to work. Is this expected? : : Per-field params don't currently look under the alias. I believe : there's a JIRA open for this. https://issues.apache.org/jira/browse/SOLR-1351 There's a fairly old patch that covers some of the basics, but there are some tricky edge cases that need to be accounted for and we need good distributed tests to make sure things work properly. -Hoss
RE: solr 4.0 cloud 303 error
I did a search via both the admin UI and /search. What I searched for was *:* as that was the default in the search box in the admin UI (so I expected something that was not a 303 error). Will post the URL and server logs tomorrow when I am back in the office, but I think the admin URL was not anything odd. Server logs were full of chatter between the nodes in the cloud setup. From: Chris Hostetter [hossman_luc...@fucit.org] Sent: 19 July 2012 23:03 To: solr-user Subject: Re: solr 4.0 cloud 303 error : > try to do a search - throws 303 error Can you be specific about how exactly you did the search? Was this from the admin UI? what URL was in your browser location bar? what values did you put in the form? what buttons did you click? what URL was in your browser location bar when the error happened? Can you post the logs from each of the servers from around the time of this error (a few lines of context before it happened as well)? : >> org.apache.solr.common.SolrException: Server at : >> http://linux-vckp:8983/solr/collection1 returned non ok status:303, that smells like something jetty *might* be returning automatically because the client asked for... http://linux-vckp:8983/solr/collection1 ...instead of... http://linux-vckp:8983/solr/collection1/ ... (ie: no trailing slash) ... but I'm not sure why HttpShardHandler would be asking for either of those URLs w/o specifying a handler. -Hoss This email is intended for the addressee(s) named above. It may contain confidential or privileged information and should not be read, copied or otherwise used by any person for whom it was not intended. If you have received this mail in error please contact the sender by return email and delete the email from your system. The Royal National Theatre Upper Ground, London, SE1 9PX www.nationaltheatre.org.uk Telephone numbers: BOX OFFICE +44 (0) 20 7452 3000, SWITCHBOARD +44 (0) 20 7452 Registered in England as a company limited by guarantee, number 749504. 
Registered Charity number 224223 Recipients are advised to apply their own virus checks to this message on delivery.
Re: Count is inconsistent between facet and stats
: So from StatsComponent the count for 'electronics' cat is 3, while : FacetComponent reports 14 'electronics'. Is this a bug? : : Following is the field definition for 'cat'. : FYI... https://issues.apache.org/jira/browse/SOLR-3642 (The underlying problem is that the stats.facet feature doesn't work for multivalued fields, and the check that was supposed to return an error in this case was only checking the field type, not the field) -Hoss
Re: solr 4.0 cloud 303 error
: > try to do a search - throws 303 error Can you be specific about how exactly you did the search? Was this from the admin UI? what URL was in your browser location bar? what values did you put in the form? what buttons did you click? what URL was in your browser location bar when the error happened? Can you post the logs from each of the servers from around the time of this error (a few lines of context before it happened as well)? : >> org.apache.solr.common.SolrException: Server at : >> http://linux-vckp:8983/solr/collection1 returned non ok status:303, that smells like something jetty *might* be returning automatically because the client asked for... http://linux-vckp:8983/solr/collection1 ...instead of... http://linux-vckp:8983/solr/collection1/ ... (ie: no trailing slash) ... but I'm not sure why HttpShardHandler would be asking for either of those URLs w/o specifying a handler. -Hoss
Re: solr 4.0 cloud 303 error
Okay - I'll do the same in a bit and report back. On Jul 19, 2012, at 5:23 PM, John-Paul Drawneek wrote: > This is just out of the box. > > All I did was download solr 4 Alpha from the site. > unpack > follow instructions from wiki. > > admin console worked - great > > try to do a search - throws 303 error > > Downloaded nightly build, same issue. > > Also got errors from the other shard, with errors connecting due to the master > throwing 303 errors. > > From: Mark Miller [markrmil...@gmail.com] > Sent: 19 July 2012 22:11 > To: solr-user@lucene.apache.org > Subject: Re: solr 4.0 cloud 303 error > > That's really odd - never seen or heard anything like it. A 303 is what a > server will respond with if you should GET a different URI... > > This won't happen out of the box that I've ever seen... can you tell us > about any customizations you have made? > > On Thu, Jul 19, 2012 at 1:08 PM, John-Paul Drawneek < > jpdrawn...@nationaltheatre.org.uk> wrote: > >> Hi. >> >> playing with the new solrcloud: >> http://wiki.apache.org/solr/SolrCloud >> >> tried alpha + nightly build 19/07/2012 >> >> admin panel works, but select queries fail with: >> >> org.apache.solr.common.SolrException: Server at >> http://linux-vckp:8983/solr/collection1 returned non ok status:303, >> message:See Other at >> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:376) >> at >> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182) >> at >> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:165) >> at >> org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:132) >> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at >> java.util.concurrent.FutureTask.run(FutureTask.java:166) at >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at >> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at >> 
java.util.concurrent.FutureTask.run(FutureTask.java:166) at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) >> at java.lang.Thread.run(Thread.java:722) >> >> on all combinations of solrcloud from the example in the wiki. >> >> Is this a known issue? >> >> Had a quick look at the tracker/wiki/google but did not find any reference >> to this. >> >> > > > -- > - Mark > > http://www.lucidimagination.com 
> > > - Mark Miller lucidimagination.com
RE: solr 4.0 cloud 303 error
This is just out of the box. All I did was download solr 4 Alpha from the site. unpack follow instructions from wiki. admin console worked - great try to do a search - throws 303 error Downloaded nightly build, same issue. Also got errors from the other shard, with errors connecting due to the master throwing 303 errors. From: Mark Miller [markrmil...@gmail.com] Sent: 19 July 2012 22:11 To: solr-user@lucene.apache.org Subject: Re: solr 4.0 cloud 303 error That's really odd - never seen or heard anything like it. A 303 is what a server will respond with if you should GET a different URI... This won't happen out of the box that I've ever seen... can you tell us about any customizations you have made? On Thu, Jul 19, 2012 at 1:08 PM, John-Paul Drawneek < jpdrawn...@nationaltheatre.org.uk> wrote: > Hi. > > playing with the new solrcloud: > http://wiki.apache.org/solr/SolrCloud > > tried alpha + nightly build 19/07/2012 > > admin panel works, but select queries fail with: > > org.apache.solr.common.SolrException: Server at > http://linux-vckp:8983/solr/collection1 returned non ok status:303, > message:See Other at > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:376) > at > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182) > at > org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:165) > at > org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:132) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at > java.util.concurrent.FutureTask.run(FutureTask.java:166) at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at > java.util.concurrent.FutureTask.run(FutureTask.java:166) at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > 
at java.lang.Thread.run(Thread.java:722) > > on all combinations of solrcloud from the example in the wiki. > > Is this a known issue? > > Had a quick look at the tracker/wiki/google but did not find any reference > to this. > > > -- - Mark http://www.lucidimagination.com
Re: solr 4.0 cloud 303 error
That's really odd - never seen or heard anything like it. A 303 is what a server will respond with if you should GET a different URI... This won't happen out of the box that I've ever seen... can you tell us about any customizations you have made? On Thu, Jul 19, 2012 at 1:08 PM, John-Paul Drawneek < jpdrawn...@nationaltheatre.org.uk> wrote: > Hi. > > playing with the new solrcloud: > http://wiki.apache.org/solr/SolrCloud > > tried alpha + nightly build 19/07/2012 > > admin panel works, but select queries fail with: > > org.apache.solr.common.SolrException: Server at > http://linux-vckp:8983/solr/collection1 returned non ok status:303, > message:See Other at > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:376) > at > org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182) > at > org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:165) > at > org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:132) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at > java.util.concurrent.FutureTask.run(FutureTask.java:166) at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at > java.util.concurrent.FutureTask.run(FutureTask.java:166) at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:722) > > on all combinations of solrcloud from the example in the wiki. > > Is this a known issue? > > Had a quick look at the tracker/wiki/google but did not find any reference > to this. > > -- - Mark http://www.lucidimagination.com
Re: Solr 4 Alpha SolrJ Indexing Issue
https://issues.apache.org/jira/browse/SOLR-3649 On Thu, Jul 19, 2012 at 3:34 PM, Briggs Thompson < w.briggs.thomp...@gmail.com> wrote: > This is unrelated for the most part, but the javabin update request handler > does not seem to be working properly when calling solrj > method*HttpSolrServer.deleteById(List ids) > *. A single Id gets deleted from the index as opposed to the full list. It > appears properly in the logs - shows delete of all Ids sent, although all > but one remain in the index. > > I confirmed that the default update request handler deletes the list > properly, so this appears to be a problem with > the BinaryUpdateRequestHandler. > > Not an issue for me, just spreading the word. > > Thanks, > Briggs > > On Thu, Jul 19, 2012 at 9:00 AM, Mark Miller > wrote: > > > we really need to resolve that issue soon... > > > > On Jul 19, 2012, at 12:08 AM, Briggs Thompson wrote: > > > > > Yury, > > > > > > Thank you so much! That was it. Man, I spent a good long while trouble > > > shooting this. Probably would have spent quite a bit more time. I > > > appreciate your help!! > > > > > > -Briggs > > > > > > On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats wrote: > > > > > >> On 7/18/2012 7:11 PM, Briggs Thompson wrote: > > >>> I have realized this is not specific to SolrJ but to my instance of > > >> Solr. Using curl to delete by query is not working either. > > >> > > >> Can be this: https://issues.apache.org/jira/browse/SOLR-3432 > > >> > > > > - Mark Miller > > lucidimagination.com > > > > > > > > > > > > > > > > > > > > > > > > > -- - Mark http://www.lucidimagination.com
Re: Reg issue with indexing data from one of the sqlserver DB
Your password has an & in it. Since this is an XML file, you need to turn it into an XML entity, so your password should be entered as: 8ty&amp;2ty=6 Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com On Thu, Jul 19, 2012 at 3:54 PM, lakshmi bhargavi wrote: > Hi Team , > > Greetings! > > We are trying to index data from one of the sqlserver DB (2008) but we are > getting the following error on start up > > Jul 19, 2012 2:41:11 PM org.apache.solr.common.SolrException log > SEVERE: org.apache.solr.common.SolrException > at org.apache.solr.core.SolrCore.<init>(SolrCore.java:600) > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:483) > at org.apache.solr.core.CoreContainer.load(CoreContainer.java:335) > at org.apache.solr.core.CoreContainer.load(CoreContainer.java:219) > at > org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:161) > at > org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:96) > at > org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277) > at > org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258) > at > org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382) > at > org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:103) > at > org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4638) > at > org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5294) > at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150) > at > org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:895) > at > org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:871) > at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:615) > at > org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:649) > at > 
org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1585) > at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) > at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) > at java.util.concurrent.FutureTask.run(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown > Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > Caused by: org.apache.solr.common.SolrException: FATAL: Could not create > importer. DataImporter config invalid > at > org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:124) > at > org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:527) > at org.apache.solr.core.SolrCore.(SolrCore.java:594) > ... 23 more > Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: > Exception occurred while initializing context > at > org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:216) > at > org.apache.solr.handler.dataimport.DataImporter.(DataImporter.java:108) > at > org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:117) > ... 25 more > Caused by: org.xml.sax.SAXParseException: The entity name must immediately > follow the '&' in the entity reference. 
> at > com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown > Source) > at > com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown > Source) > at > com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown > Source) > at > com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown > Source) > at > com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(Unknown > Source) > at > com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanAttribute(Unknown > Source) > at > com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown > Source) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown > Source) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown > Source) > at > com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown > Source) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown > Source) > at > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown > Source) > at > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown > Source) > at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown > Source) > at com.sun.org.apache.xerces.internal
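The fix above generalizes: any XML-reserved character in a DIH data-config attribute must be entity-escaped. A quick standalone way to produce the escaped form (Python sketch; the password is the example from this thread):

```python
from xml.sax.saxutils import escape

password = "8ty&2ty=6"
# escape() turns &, <, and > into XML entities, which is what the
# dataSource password attribute in data-config.xml needs
print(escape(password))  # 8ty&amp;2ty=6
```

For values that also contain quotes, xml.sax.saxutils.quoteattr produces a fully quoted attribute value in one call.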
Re: Solr 4 Alpha SolrJ Indexing Issue
This is unrelated for the most part, but the javabin update request handler does not seem to be working properly when calling the solrj method HttpSolrServer.deleteById(List<String> ids). A single Id gets deleted from the index as opposed to the full list. It appears properly in the logs - shows delete of all Ids sent, although all but one remain in the index. I confirmed that the default update request handler deletes the list properly, so this appears to be a problem with the BinaryUpdateRequestHandler. Not an issue for me, just spreading the word. Thanks, Briggs On Thu, Jul 19, 2012 at 9:00 AM, Mark Miller wrote: > we really need to resolve that issue soon... > > On Jul 19, 2012, at 12:08 AM, Briggs Thompson wrote: > > > Yury, > > > > Thank you so much! That was it. Man, I spent a good long while trouble > > shooting this. Probably would have spent quite a bit more time. I > > appreciate your help!! > > > > -Briggs > > > > On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats wrote: > > > >> On 7/18/2012 7:11 PM, Briggs Thompson wrote: > >>> I have realized this is not specific to SolrJ but to my instance of > >> Solr. Using curl to delete by query is not working either. > >> > >> Can be this: https://issues.apache.org/jira/browse/SOLR-3432 > >> > > - Mark Miller > lucidimagination.com
Re: Frustrating differences in fieldNorm between two different versions of solr indexing the same document
Robert, So this is lossy: basically you can think of there being only 256 > possible values. So when you increased the number of terms only > slightly by changing your analysis, this happened to bump you over the > edge rounding you up to the next value. > > more information: > http://lucene.apache.org/core/3_6_0/scoring.html > > http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/Similarity.html Thanks - this was extremely helpful! I had read both sources before but didn't grasp the magnitude of lossy-ness until your pointer and mention of edge-case. Just to help out anybody else who might run in to this, I hacked together a little harness to demonstrate: --- fieldLength: 160, computeNorm: 0.07905694, floatToByte315: 109, byte315ToFloat: 0.078125 fieldLength: 161, computeNorm: 0.07881104, floatToByte315: 109, byte315ToFloat: 0.078125 fieldLength: 162, computeNorm: 0.07856742, floatToByte315: 109, byte315ToFloat: 0.078125 fieldLength: 163, computeNorm: 0.07832605, floatToByte315: 109, byte315ToFloat: 0.078125 fieldLength: 164, computeNorm: 0.07808688, floatToByte315: 108, byte315ToFloat: 0.0625 fieldLength: 165, computeNorm: 0.077849895, floatToByte315: 108, byte315ToFloat: 0.0625 fieldLength: 166, computeNorm: 0.07761505, floatToByte315: 108, byte315ToFloat: 0.0625 --- So my takeaway is that these scores that vary significantly are caused by: 1) a field with lengths right on this boundary between the two analyzer chains 2) the fact that we might be searching for matches from 50+ values to a field with 150+ values, and so the overall score is repeatedly impacted by the otherwise typically insignificant change in fieldNorm value Thanks again, Aaron
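For anyone wanting to reproduce this without a Java harness, the quantization Robert describes is Lucene's SmallFloat.floatToByte315 encoding (3 mantissa bits, 5 exponent bits, exponent zero point 15). Below is a Python transliteration of that encoding, a sketch based on the Lucene 3.6 SmallFloat source; the loop variable names are mine, and lengthNorm is assumed to be DefaultSimilarity's 1/sqrt(numTerms) with a boost of 1:

```python
import math
import struct

def float_to_byte315(f):
    """Lossy encode a float into a single byte: 3 mantissa bits,
    5 exponent bits, zero-exponent point 15. Only 256 values survive."""
    bits = struct.unpack('>i', struct.pack('>f', f))[0]
    smallfloat = bits >> (24 - 3)
    if smallfloat <= ((63 - 15) << 3):
        return 0 if bits <= 0 else 1   # underflow: round up to smallest positive
    if smallfloat >= ((63 - 15) << 3) + 0x100:
        return 255                     # overflow: largest representable byte
    return smallfloat - ((63 - 15) << 3)

def byte315_to_float(b):
    """Decode the single-byte norm back to the coarse float it represents."""
    if b == 0:
        return 0.0
    bits = (b & 0xFF) << (24 - 3)
    bits += (63 - 15) << 24
    return struct.unpack('>f', struct.pack('>i', bits))[0]

# Reproduce the boundary from the harness output above:
# field length 160 -> byte 109 (decodes to 0.078125)
# field length 164 -> byte 108 (decodes to 0.0625)
for field_length in (160, 164):
    norm = 1.0 / math.sqrt(field_length)  # DefaultSimilarity lengthNorm, boost = 1
    b = float_to_byte315(norm)
    print(field_length, b, byte315_to_float(b))
```

Running this shows exactly the cliff in the table: lengths 160-163 land on byte 109, while 164 drops to byte 108, so two analysis chains that differ by a couple of tokens can straddle the boundary.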
RE: Solr Commit not working after delete
Hi Brendan, I am not sure I get what's being suggested. Our delete worked fine, but now no new data is going into the system. Could you please throw some more light on this? Regards, Rohit -Original Message- From: Brendan Grainger [mailto:brendan.grain...@gmail.com] Sent: 19 July 2012 17:33 To: solr-user@lucene.apache.org Subject: Re: Solr Commit not working after delete You might be running into the same issue someone else had the other day: https://issues.apache.org/jira/browse/SOLR-3432 On Jul 19, 2012, at 1:23 PM, Rohit wrote: > We delete some data from solr, post which solr is not accepting any > commit's. What could be wrong? > > > > We don't see any error in logs or anywhere else. > > > > Regards, > > Rohit > > >
Re: LUCENE-2899 patch, OpenNLPTokenizer compile error
> > and get the following errors: > --- > > [javac] warning: [options] bootstrap class path not set in conjunction > with -source 1.6 > [javac] > /home/swu/newproject/lucene_4x/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNLPTokenizer.java:170: > error: method reset in class TokenStream cannot be applied to given types; > [javac] super.reset(input); > [javac] ^ > [javac] required: no arguments > [javac] found: Reader > [javac] reason: actual and formal argument lists differ in length > [javac] reset was renamed to setReader
LUCENE-2899 patch, OpenNLPTokenizer compile error
I am following the instructions at http://wiki.apache.org/solr/OpenNLP to test the OpenNLP/Solr integration: 1. pull the 4.0 branch from trunk 2. apply the LUCENE-2899 patch (there are several LUCENE-2899 patch files; I took the 385KB one from 02/Jul/12 08:05. I should only apply that one, correct?) 3. ant compile and get the following errors: --- [javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6 [javac] /home/swu/newproject/lucene_4x/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNLPTokenizer.java:170: error: method reset in class TokenStream cannot be applied to given types; [javac] super.reset(input); [javac] ^ [javac] required: no arguments [javac] found: Reader [javac] reason: actual and formal argument lists differ in length [javac] /home/swu/newproject/lucene_4x/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNLPTokenizer.java:168: error: method does not override or implement a method from a supertype [javac] @Override [javac] ^ [javac] 2 errors [javac] 2 warnings BUILD FAILED --- I am running Java 1.7. Does the patch only work with Java 1.6, or am I doing something wrong? Thanks Sam
Re: Solr Commit not working after delete
You might be running into the same issue someone else had the other day: https://issues.apache.org/jira/browse/SOLR-3432 On Jul 19, 2012, at 1:23 PM, Rohit wrote: > We delete some data from solr, post which solr is not accepting any > commit's. What could be wrong? > > > > We don't see any error in logs or anywhere else. > > > > Regards, > > Rohit > > >
Solr Commit not working after delete
We deleted some data from Solr, after which Solr is not accepting any commits. What could be wrong? We don't see any errors in the logs or anywhere else. Regards, Rohit
Re: Problem with Solr logging under Jetty
Hello, I have a similar problem, is there anything new about this issue? My problem is that INFO logs go to stderr and not stdout, do you have an explanation? For the log level I use the file "logging.properties" with only one line in it setting the level: .level = INFO and I have a configuration file called "jetty-logging.xml", passed at startup of Jetty through start.ini, to redirect stdout to a file and stderr to another file. The config looks like this: http://www.eclipse.org/jetty/configure.dtd /jetty_err.log true 90 GMT /jetty_out.log true 90 GMT Redirecting stdout to Redirecting stderr to Thanks for your help, Remy On 23 November 2011 16:55, Shawn Heisey wrote: > I am having a problem with jdk logging with Solr, using the jetty included > with Solr. > > In jetty.xml, I have the following defined: > > java.util.logging.config.file > etc/logging.properties > Contents of etc/logging.properties: > == > # Logging level > .level=WARNING > > # Write to a file > handlers = java.util.logging.FileHandler > > # Write log messages in human readable format: > java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter > java.util.logging.ConsoleHander.formatter = java.util.logging.SimpleFormatter > > # Log to the log subdirectory, with log files named solr_log-n.log > java.util.logging.FileHandler.pattern = ./log/solr_log-%g.log > java.util.logging.FileHandler.append = true > java.util.logging.FileHandler.count = 10 > java.util.logging.FileHandler.limit = 10485760 > == > > This actually all seems to work perfectly at first. I changed the logging > level to INFO in the solr admin, and it still seemed to work. Then at some > point it stopped logging to solr_log-0.log and started logging to stderr. > My init script for Solr sends that to a file, but there's no log rotation > on that file and it is overwritten whenever Solr is restarted. 
> > With the same config, OS version, java version, and everything else I can > think of, my test server is still working, but all of my production servers > aren't. It does seem to be related to changing the log level to INFO in > the gui, but making that change doesn't make it fail right away. > > What information can I provide to help troubleshoot this? > > Thanks, > Shawn > >
Re: Importing data to Solr
First, turn off all your soft commit stuff, that won't help in your situation. If you do leave autocommit on, make it a really high number (let's say 1,000,000 to start). You won't have to make 300M calls, you can batch, say, 1,000 docs into each request. DIH supports a bunch of different data sources, take a look at: http://wiki.apache.org/solr/DataImportHandler, the EntityProcessor, DataSource and the like. There is also the CSV update processor, see: http://wiki.apache.org/solr/UpdateCSV. It might be better to, say, break up your massive file into N CSV files and import those. Best Erick On Thu, Jul 19, 2012 at 12:04 PM, Jonatan Fournier wrote: > Hello, > > I was wondering if there's other ways to import data in Solr than > posting xml/json/csv to the server URL (e.g. locally building the > index). Is the DataImporter only for database? > > My data is in an enormous text file that is parsed in python, I get > clean json/xml out of it if I want, but the thing is that it drills > down to about 300 millions "documents", so I don't want to execute 300 > millions http post in a for loop, even with relaxed soft commits etc > it will take weeks, months to populate the index. > > I need to do that only once on an offline server and never add data > back to the index (e.g. becomes a read-only instance). > > Any temporary index configuration I could have to populate the server > with optimal add speed, then turn back the settings optimized for a > read only instance? > > Thanks! > > -- > jonatan
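To illustrate the batching Erick suggests, here is a hedged Python sketch (the update URL, batch size, and parser function are assumptions for illustration, not anything from this thread). It groups the parsed documents into fixed-size batches and sends each batch as one JSON add request, committing once at the end instead of per document:

```python
import itertools
import json
import urllib.request

# Assumption: adjust host/core to your setup
SOLR_UPDATE_URL = "http://localhost:8983/solr/update/json"

def batches(docs, size=1000):
    """Yield lists of up to `size` docs from any iterable (works for
    generators, so the 300M docs never need to fit in memory at once)."""
    it = iter(docs)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch

def post_batch(batch, params=""):
    """Send one batch of documents as a single JSON add request."""
    req = urllib.request.Request(
        SOLR_UPDATE_URL + params,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).read()

# Example driver (parse_enormous_file is a placeholder for your parser):
# for batch in batches(parse_enormous_file()):
#     post_batch(batch)
# then issue a single commit: .../update?commit=true

print([len(b) for b in batches(range(2500), 1000)])  # -> [1000, 1000, 500]
```

This turns 300M HTTP posts into 300K, and with autocommit effectively off the commit cost is paid once.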
Re: Importing data to Solr
Hi Jonatan, Ideally you'd use a Solr API client that allowed batched updates, so you'd be sending documents 100 at a time, say. Alternatively, if you're good with Java, you could build an index by using the EmbeddedSolrServer class in the same process as the code you use to parse the documents. But if your Solr API client is using batches and multiple connections, I'm not sure if the tradeoff is worth it. Also, there are some various efforts out there to build indexes in Hadoop, but I don't believe any of them are 100% production ready (would like to be proven wrong.) Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com On Thu, Jul 19, 2012 at 12:04 PM, Jonatan Fournier wrote: > Hello, > > I was wondering if there's other ways to import data in Solr than > posting xml/json/csv to the server URL (e.g. locally building the > index). Is the DataImporter only for database? > > My data is in an enormous text file that is parsed in python, I get > clean json/xml out of it if I want, but the thing is that it drills > down to about 300 millions "documents", so I don't want to execute 300 > millions http post in a for loop, even with relaxed soft commits etc > it will take weeks, months to populate the index. > > I need to do that only once on an offline server and never add data > back to the index (e.g. becomes a read-only instance). > > Any temporary index configuration I could have to populate the server > with optimal add speed, then turn back the settings optimized for a > read only instance? > > Thanks! > > -- > jonatan
Importing data to Solr
Hello, I was wondering if there's other ways to import data in Solr than posting xml/json/csv to the server URL (e.g. locally building the index). Is the DataImporter only for database? My data is in an enormous text file that is parsed in python, I get clean json/xml out of it if I want, but the thing is that it drills down to about 300 millions "documents", so I don't want to execute 300 millions http post in a for loop, even with relaxed soft commits etc it will take weeks, months to populate the index. I need to do that only once on an offline server and never add data back to the index (e.g. becomes a read-only instance). Any temporary index configuration I could have to populate the server with optimal add speed, then turn back the settings optimized for a read only instance? Thanks! -- jonatan
Re: Frustrating differences in fieldNorm between two different versions of solr indexing the same document
On Thu, Jul 19, 2012 at 11:11 AM, Aaron Daubman wrote: > Apologies if I didn't clearly state my goal/concern: I am not looking for > the exact same scoring - I am looking to explain scoring differences. > Deprecated components will eventually go away, time moves on, etc... > etc... I would like to be able to run current code, and should be able to - > the part that is sticking is being able to *explain* the difference in > results. > OK: i totally missed that, sorry! to explain why you see such a large difference: The difference is that these length normalizations are computed at index time and fit inside a *single byte* by default. This is to keep ram usage low for many documents and many fields with norms (since its #fieldsWithNorms * #documents in bytes in ram). So this is lossy: basically you can think of there being only 256 possible values. So when you increased the number of terms only slightly by changing your analysis, this happened to bump you over the edge rounding you up to the next value. more information: http://lucene.apache.org/core/3_6_0/scoring.html http://lucene.apache.org/core/3_6_0/api/core/org/apache/lucene/search/Similarity.html by the way: if you don't like this: 1. if you can still live with a single byte, maybe plug in your own Similarity class into 3.6, overriding decodeNormValue/encodeNormValue. For example, you could use a different SmallFloat configuration that has less range but more precision for your use case (if your docs are all short or whatever) 2. otherwise, if you feel you need more than a single byte, check out 4.0-ALPHA: you arent limited to a single byte there. -- lucidimagination.com
RE: How to setup SimpleFSDirectoryFactory
Read this, then you will see that MMapDirectory will use 0% of your Java heap space or free system RAM: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: William Bell [mailto:billnb...@gmail.com] > Sent: Tuesday, July 17, 2012 6:05 AM > Subject: How to setup SimpleFSDirectoryFactory > > We all know that MMapDirectory is fastest. However we cannot always use it > since you might run out of memory on large indexes, right? > > Here is how I got SimpleFSDirectoryFactory to work. Just set > -Dsolr.directoryFactory=solr.SimpleFSDirectoryFactory. > > Your solrconfig.xml: > > <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/> > > You can check it with http://localhost:8983/solr/admin/stats.jsp > > Notice that the default for Windows 64bit is MMapDirectory, else NIOFSDirectory except for Windows. It would be nicer if we just set it all up > with a helper in solrconfig.xml... > > if (Constants.WINDOWS) { > if (MMapDirectory.UNMAP_SUPPORTED && Constants.JRE_IS_64BIT) > return new MMapDirectory(path, lockFactory); > else > return new SimpleFSDirectory(path, lockFactory); > } else { > return new NIOFSDirectory(path, lockFactory); > } > } > > > > -- > Bill Bell > billnb...@gmail.com > cell 720-256-8076
Re: Frustrating differences in fieldNorm between two different versions of solr indexing the same document
Robert, > I have a solr 1.4.1 instance and a solr 3.6.0 instance, both configured as > > identically as possible (given deprecations) and indexing the same > document. > > Why did you do this? If you want the exact same scoring, use the exact > same analysis. > This means specifying luceneMatchVersion = 2.9, and the exact same > analysis components (even if deprecated). > > > I have taken the field values for the example below and run them > > through /admin/analysis.jsp on each solr instance. Even for the > problematic > > docs/fields, the results are almost identical. For the example below, the > > t_tag values for the problematic doc: > > 1.4.1: 162 values > > 3.6.0: 164 values > > > > This is why: you changed your analysis. > Apologies if I didn't clearly state my goal/concern: I am not looking for the exact same scoring - I am looking to explain scoring differences. Deprecated components will eventually go away, time moves on, etc... etc... I would like to be able to run current code, and should be able to - the part that is sticking is being able to *explain* the difference in results. As you can see from my email, after running the different analysis on the input, the output does not demonstrate (in any way that I can see) why the fieldNorm values would be so different. Even with the different analysis, the results are almost identical - which *should* result in an almost identical fieldNorm??? Again, the desire is not to be the same, it is to understand the difference. Thanks, Aaron
Re: Solr faceting -- sort order
Maybe I'm not understanding the problem, but I accomplish this by having two fields: one for sorting (an analyzed type that lowercases and strips accents), and then a string-type field for faceting. Use a copyField directive to get the same data in both, and then sort on the sort field and facet on the string field. The MappingCharFilterFactory removes accents for sorting, so you don't have to worry about accented characters sorting out of order. Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com On Thu, Jul 19, 2012 at 4:37 AM, Toke Eskildsen wrote: > On Wed, 2012-07-18 at 20:30 +0200, Christopher Gross wrote: >> When I do a query, the results that come through retain their original >> case for this field, like: >> doc 1 >> keyword: Blah Blah Blah >> doc 2 >> keyword: Yadda Yadda Yadda >> >> But when I pull back facets, i get: >> >> blah blah blah (1) >> yadda yadda yadda (1) > > Yes. The results from your query are the stored values, while the > results from your facets are the indexed ones. That's the way faceting > works with Solr. > > Technically there is nothing wrong with writing a faceting system that > uses the stored values. We did this some years back, but abandoned the > idea. As far as I remember, it was a lot slower to initialize the > internal structures this way. One could also do faceting fully at search > time, by iterating all the documents and requesting the stored value for > each of them directly from the index, but that would be very slow. > >> I was attempting to fix a sorting problem -- keyword "" would show >> up after keyword "Zulu" due to the "index" sorting, so I thought that >> I could lowercase it all to have it be in the same order. But now it >> is all in lower case, and I'd like it to retain the original style. > > Currently the lowercase trick is the only solution for plain Solr and > even that only works as long as your field holds only a-z letters. So no > foreign names or such. 
> > Looking forward, one solution would be to specify a custom codec for the > facet field, where the comparator used for sorting is backed by a > Collator that sorts the terms directly, instead of using CollatorKeys. > It would be a bit slower for index updates, but should do what you > require. Unfortunately I am not aware of anyone who has created such a > codex or even how easy it is to get it to work with Solr (4.0 alpha). > > We have experimented with a faceting approach that allows for custom > ordering, but it sorts upon index open and thus has a fairly long start > up time. Besides, it it not in a proper state for production: > https://issues.apache.org/jira/browse/SOLR-2412 > > - Toke Eskildsen, State and University Library, Denmark >
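A schema.xml sketch of the two-field setup Michael describes (field and type names here are illustrative assumptions; the mapping file is the one shipped with the Solr example config):

```xml
<!-- Sort field: lowercased, accent-stripped, single token -->
<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Facet on the raw string, sort on the normalized copy -->
<field name="keyword" type="string" indexed="true" stored="true"/>
<field name="keyword_sort" type="alphaOnlySort" indexed="true" stored="false"/>
<copyField source="keyword" dest="keyword_sort"/>
```

Query with sort=keyword_sort asc and facet.field=keyword, so facets keep the original case while sorting ignores case and accents.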
Re: Solr grouping / facet query
Thanks for the reply. To clarify, the idea is to search for authors with certain specialties (e.g. political, horror, etc.) and, if they have any published titles relevant to the user's query, display those titles next to the author's name. At first, I thought it would be great to have all the author's data (name, location, bio, titles with descriptions, etc.) in one document, with each title and description being a multivalued field; however, I have no idea how the "relevant titles" based on the user's query as described above could be quickly picked from within the document and displayed. The only solution I see is to have a doc per title and include the name, location, bio, etc. in each one. As for the authors with no published titles, simply add their bio data to a document with no title or description, and when I do the "grouping" check whether the title is blank, then display "no titles found". This could work, though I'm concerned whether having all that duplicate bio data will affect the relevancy of the results or the speed/performance of Solr? Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-grouping-facet-query-tp3995787p3995974.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Importing index - Real Time or Queued?
On Thu, 2012-07-19 at 16:00 +0200, Spadez wrote: > This seems to suggest you have to reindex Solr in its entirety and cant add a > single document at a time, is this right? > > http://stackoverflow.com/questions/11247625/apache-solr-adding-editing-deleting-records-frequently No. What it says is that you can't change _part_ of a document. What you need to do is send the full document each time it changes. By having a uniqueKey, Solr will do the bookkeeping for you and delete the old document before adding the new one. As for the performance part of the stackoverflow discussion, note that they are talking about 10 million documents and 10,000 updates. That's quite far from what you've got. - Toke Eskildsen
Re: Importing index - Real Time or Queued?
You can definitely do a single document at a time, but unless you're using NRT, your changes won't be visible until you do a commit. Doing a commit involves closing Searchers and reopening them, which is semi expensive... depending on how you're doing caching, you wouldn't want to do it too frequently. However, your index is so small, you should easily be able to get away with doing it every minute or so, depending on traffic. Michael Della Bitta Appinions, Inc. -- Where Influence Isn’t a Game. http://www.appinions.com On Thu, Jul 19, 2012 at 10:00 AM, Spadez wrote: > This seems to suggest you have to reindex Solr in its entirety and cant add a > single document at a time, is this right? > > http://stackoverflow.com/questions/11247625/apache-solr-adding-editing-deleting-records-frequently > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Importing-index-Real-Time-or-Queued-tp3995936p3995964.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: Importing index - Real Time or Queued?
This seems to suggest you have to reindex Solr in its entirety and cant add a single document at a time, is this right? http://stackoverflow.com/questions/11247625/apache-solr-adding-editing-deleting-records-frequently -- View this message in context: http://lucene.472066.n3.nabble.com/Importing-index-Real-Time-or-Queued-tp3995936p3995964.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4 Alpha SolrJ Indexing Issue
we really need to resolve that issue soon... On Jul 19, 2012, at 12:08 AM, Briggs Thompson wrote: > Yury, > > Thank you so much! That was it. Man, I spent a good long while trouble > shooting this. Probably would have spent quite a bit more time. I > appreciate your help!! > > -Briggs > > On Wed, Jul 18, 2012 at 9:35 PM, Yury Kats wrote: > >> On 7/18/2012 7:11 PM, Briggs Thompson wrote: >>> I have realized this is not specific to SolrJ but to my instance of >> Solr. Using curl to delete by query is not working either. >> >> Can be this: https://issues.apache.org/jira/browse/SOLR-3432 >> - Mark Miller lucidimagination.com
Re: Solr grouping / facet query
I'm not sure your point <3> makes sense. If you're searching by author, how do you define "the four most relevant titles"? Relevant to what? If you are searching text of the publications, then displaying authors with no publications seems unhelpful. If you're searching the bios, how do you define "relevant titles"? Or are relevant titles based on some other criteria than you're searching on? But don't get stuck on worrying about duplicate data, denormalization of data is a common practice in Solr/Lucene. But I'm at something of a loss until you clarify what "relevant titles" means when searching for authors. Best Erick On Wed, Jul 18, 2012 at 2:36 PM, s215903406 wrote: > Could anyone suggest the options available to handle the following situation: > > 1. Say we have 1,000 authors > > 2. 65% of these authors have 10-100 titles they authored; the others have > not authored any titles but provide only their biography and writing > capability. > > 3. We want to search for authors, group the results by author, and show the > 4 most relevant titles authored for each (if any) next to the author name. > > Since not all authors have titles authored, I can't group titles by author. > Also, adding their bio to each title places a lot of duplicate data in the > index. > > So the search results would look like this; > > Author A > title0, title6, title8, title3 > > Author G > no titles found > > Author E > title4, title9, title2 > > Any suggestions would be appreciated! > > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-grouping-facet-query-tp3995787.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR 4 ALPHA /terms /browse
Can you file two JIRA issues for these? bq. but does return reasonable results when distrib is turned off like so It should default to distrib=false - I don't think /terms is distrib aware/compatible. bq. /browse returns this stack trace to the browser HTTP ERROR 500 We may be able to fix this. On Jul 18, 2012, at 8:42 PM, Nick Koton wrote: > When I setup a 2 shard cluster using the example and run it through its > paces, I find two features that do not work as I expect. Any suggestions on > adjusting my configuration or expectations would be appreciated. > > /terms does not return any terms when issued as follows: > http://hostname:8983/solr/terms?terms.fl=name&terms=true&terms.limit=-1&isShard=true&terms.sort=index&terms.prefix=s > but does return reasonable results when distrib is turned off like so > http://hostname:8983/solr/terms?terms.fl=name&terms=true&distrib=false&terms.limit=-1&isShard=true&terms.sort=index&terms.prefix=s > > /browse returns this stack trace to the browser > HTTP ERROR 500 > > Problem accessing /solr/browse. Reason: > > {msg=ZkSolrResourceLoader does not support getConfigDir() - likely, what > you are trying to do is not supported in ZooKeeper > mode,trace=org.apache.solr.common.cloud.ZooKeeperException: > ZkSolrResourceLoader does not support getConfigDir() - likely, what you are > trying to do is not supported in ZooKeeper mode > at org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:99) > at org.apache.solr.response.VelocityResponseWriter.getEngine(VelocityResponseWriter.java:117) > at org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:40) > at org.apache.solr.core.SolrCore$LazyQueryResponseWriterWrapper.write(SolrCore.java:1990) > at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:398) > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276) > at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337) > at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484) > at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119) > at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524) > at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233) > at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065) > at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413) > at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192) > at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999) > at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) > at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250) > at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149) > at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111) > at org.eclipse.jetty.server.Server.handle(Server.java:351) > at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454) > at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47) > at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890) > at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944) > at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634) > at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230) > at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66) > at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254) > at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599) > at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534) > at java.lang.Thread.run(Thread.java:662) > ,code=500} > > Best regards, > Nick Koton > > > - Mark Miller lucidimagination.com
Re: Frustrating differences in fieldNorm between two different versions of solr indexing the same document
On Thu, Jul 19, 2012 at 12:10 AM, Aaron Daubman wrote: > Greetings, > > I've been digging in to this for two days now and have come up short - > hopefully there is some simple answer I am just not seeing: > > I have a solr 1.4.1 instance and a solr 3.6.0 instance, both configured as > identically as possible (given deprecations) and indexing the same document. Why did you do this? If you want the exact same scoring, use the exact same analysis. This means specifying luceneMatchVersion = 2.9, and the exact same analysis components (even if deprecated). > I have taken the field values for the example below and run them > through /admin/analysis.jsp on each solr instance. Even for the problematic > docs/fields, the results are almost identical. For the example below, the > t_tag values for the problematic doc: > 1.4.1: 162 values > 3.6.0: 164 values > This is why: you changed your analysis. -- lucidimagination.com
Re: Indexing data in csv format
Check your csv file for extraneous data? The other thing to do is look at your logs to see if more informative information is there. There's really very little info to go on here; you might review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Tue, Jul 17, 2012 at 10:05 AM, gopes wrote: > > Hi , > > I am trying to index data in csv format. But while indexing I get this > following message - > > > HTTP ERROR 404 > > Problem accessing /solr/update/csv. Reason: > NOT_FOUND/Powered by Jetty:/// > > solrconfig.xml has the following entries for CSVRequestHandler > startup="lazy"> > > ; > true > publish_date > " > > > > Thanks, > Sarala > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Indexing-data-in-csv-format-tp3995549.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: Result docs missing only when shards parameter present in query?
A multiValued uniqueKey really doesn't make any sense. But your log file should have something in it like this: SEVERE: uniqueKey should not be multivalued although it _is_ a bit hard to see on startup unless you've suppressed the INFO level output. See: https://issues.apache.org/jira/browse/SOLR-1570 Best Erick On Tue, Jul 17, 2012 at 9:24 AM, Bill Havanki wrote: > I had the same problem as the original poster did two years ago (!), but > with Solr 3.4.0: > >> I cannot get hits back and do not get a correct total number of records > when using shard searching. > > When performing a sharded query, I would get empty / missing results - no > documents at all. Querying each shard individually worked, but anything > with the "shards" parameter yielded no result documents. > > I was able to get results back by updating my schema to include > multiValued="false" for the unique key field. > > The problem I was seeing was that, when Solr was formulating the queries to > go get records from each shard, it was including square brackets around the > ids it was asking for, e.g.: > > ...q=123&ids=[ID1],[ID2],[ID3]&... > > I delved into the Solr code and saw that this query string was being formed > (in QueryComponent.createRetrieveDocs()) by simply calling toString() on > the unique key field value for each document it wanted to get. My guess is > that the value objects somehow were ArrayLists (or something like that) and > not Strings, so those annoying square brackets showed up via toString(). By > emphasizing in the schema that the field was single-valued, those lists > would hopefully stop appearing, and I think they did. At least the brackets > went away. > > Here's the relevant QueryComponent code (again, 3.4.0 - it's the same in > 3.6.0, didn't check 4): > > ArrayList ids = new ArrayList(shardDocs.size()); > for (ShardDoc shardDoc : shardDocs) { > // TODO: depending on the type, we may need more tha a simple toString()? 
> ids.add(shardDoc.id.toString()); > } > sreq.params.add(ShardParams.IDS, StrUtils.join(ids, ',')); > > The comment in there seems to fit my theory. :) > > Bill
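The bracket symptom Bill describes is exactly what List.toString() produces. A minimal standalone sketch (not Solr code, just plain Java) showing why a multi-valued id value would turn into ids=[ID1],[ID2] while a single-valued one would not:

```java
import java.util.ArrayList;
import java.util.List;

public class BracketDemo {
    public static void main(String[] args) {
        // With a multiValued field, the stored id value comes back as a list...
        Object multiValuedId = new ArrayList<String>(List.of("ID1"));
        // ...and List.toString() wraps its elements in square brackets.
        System.out.println(multiValuedId.toString()); // prints [ID1]

        // A plain String id, as with multiValued="false", has no brackets.
        Object singleValuedId = "ID1";
        System.out.println(singleValuedId.toString()); // prints ID1
    }
}
```

This matches the theory: toString() on the id object is harmless for a String but corrupts the ids parameter when the value is a list.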
DIH is doubling field entries
While porting from 3.6.1 to 4.x I noticed that the content of some fields in my index is doubled. I didn't have this with 3.6.1. This can also be seen with Luke. I could trace it down to DIH so far. Has anyone seen this? I'm using XPathEntityProcessor with RegexTransformer. Will look into this more closely tomorrow and try to create an example. Bernd
Re: Importing index - Real Time or Queued?
On Thu, 2012-07-19 at 13:49 +0200, Spadez wrote: > It does seem really poor design to reimport 10,000 documents, when only one > needs to be added. I dont like that, can you not insert a specific entry > into Solr rather than reimporting everything? Isn't that what you outlined in your option #1? What you're looking for is probably uniqueKey: https://wiki.apache.org/solr/UniqueKey - Toke Eskildsen
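For reference, what uniqueKey buys here, sketched as a hypothetical schema.xml fragment (field name illustrative): when a uniqueKey is declared, re-adding a document with an existing key replaces the old document instead of creating a duplicate, so posting a single changed record is safe.

```xml
<field name="id" type="string" indexed="true" stored="true"
       required="true" multiValued="false"/>

<uniqueKey>id</uniqueKey>
```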
Does defType override other settings for the default request handler
Hi, We have used *dismax* in our Solr config with /defaultOperator="OR"/ and some *mm* settings. Recently, we have started passing *defType=edismax* in the query params. With this change, we have observed a significant drop in the result count. We suspect that Solr is now applying a default operator of "AND" and hence reducing the result count. Please confirm whether our suspicion is correct, or are we missing something? -- View this message in context: http://lucene.472066.n3.nabble.com/Does-defType-overrides-other-settings-for-default-request-handler-tp3995946.html Sent from the Solr - User mailing list archive at Nabble.com.
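For context, a hypothetical solrconfig.xml handler of the kind being described (names and values illustrative). Parameters passed in the request, such as defType=edismax, override the handler's defaults on a per-parameter basis, so defaults like mm and q.op that are not overridden should still apply; the Analysis and debugQuery output are the place to confirm what operator is actually in effect:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- "2<75%": all clauses required up to 2 terms, 75% beyond that -->
    <str name="mm">2&lt;75%</str>
    <str name="q.op">OR</str>
  </lst>
</requestHandler>
```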
Re: join this mailing list
On 19 July 2012 10:15, 晋鹏 (Tomsdinary) wrote: > Hi > I want to join this mailing list. Please see the very first entry under http://lucene.apache.org/solr/discussion.html Regards, Gora
join this mailing list
Hi, I want to join this mailing list.
Re: Importing index - Real Time or Queued?
Thank you for the reply. OK, well that brings up another question. I don't like premature optimisation, but I also don't like inefficiency, so let's see if I can strike a balance. It does seem like really poor design to reimport 10,000 documents when only one needs to be added. I don't like that; can you not insert a specific entry into Solr rather than reimporting everything?
NGram Indexing Basic Question
I have set some of my fields to be NGram indexed, and have set an analyzer at both query and index level. Most of it works fine except for cases where I simply interchange a couple of characters. For example: "springfield" retrieves correct matches, "springfi" retrieves correct matches, "ingfield" retrieves correct matches. However, when I query "springfiedl" it returns 0 results. I debugged and found that at query/index level I have all the correct n-grams stored. So ideally it should match on "springfie" (which is there both in the query n-grams and the index n-grams) and return the correct results. As I was busy I did not get time to look at the code for NGram. What actually happens when I use NGram at query level? Does it split the string into n-grams and then send each of them to the Solr server? Thanks Sahi for your help yesterday. Appreciate that.
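For reference, a hypothetical fieldType of the kind being described (names and gram sizes illustrative). A common variation is to apply the NGram filter only at index time, so the query side matches whole terms against the indexed grams; whether a transposed query like "springfiedl" matches depends on how the query-side grams are combined (all required vs. any), which is easy to inspect with the Analysis page in the admin UI and debugQuery=true:

```xml
<fieldType name="text_ngram" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="9"/>
  </analyzer>
  <analyzer type="query">
    <!-- no NGram filter here: query terms match indexed grams directly -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```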
Re: Importing index - Real Time or Queued?
On Thu, 2012-07-19 at 12:54 +0200, Spadez wrote: > I want to import any new SQL results onto the server as quickly as possible > so they are searchable but I dont want to overload the server. These are my > new options: > > 1. Devise a script to run when a new SQL item is posted, to immediatly > import only the new SQL record to Solr Unless you have really complex documents, which does not sound likely for an auction site, 20,000 entries and 100 changes is a tiny index in the Lucene/Solr world. It sounds like you're optimizing prematurely: Go with option 1 and expect updates to take a few seconds without the server straining. - Toke Eskildsen
maxScore returned with distributed search
Hi, Why is maxScore always returned with distributed search? It used to return only if score was part of fl. Bug? Feature? Thanks Markus
Importing index - Real Time or Queued?
Hi, Let's say I am running an auction site. There are 20,000 entries. 100 entries come from an on-site SQL database; the rest come from a text file generated from scraped content. I want to import any new SQL results onto the server as quickly as possible so they are searchable, but I don't want to overload the server. These are my options:
1. Devise a script that runs when a new SQL item is posted, to immediately import only the new SQL record into Solr.
2. Run a cron script on the hour to import the whole SQL database.
3. Run a cron script on the hour to import everything, including the SQL entries and the large text file with all the scraped results.
I would really like to hear your feedback, because I can't get my head around which one is the most efficient or practical solution. James
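Option 1 amounts to posting a single update message to Solr's /update handler rather than running a full import. A sketch of such a message (field names hypothetical), followed by a commit; with a uniqueKey defined, re-posting the same id replaces the existing document instead of duplicating it:

```xml
<add>
  <doc>
    <field name="id">sql-12345</field>
    <field name="title">Vintage clock</field>
    <field name="price">42.50</field>
  </doc>
</add>
```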
Re: How To apply transformation in DIH for multivalued numeric field?
I have seen that issue several times; in my case it was always with an id field, a MySQL DB, and Linux. The same config on Windows did not show the issue. I never got to the bottom of it; as it was an id, it kept working anyway because the values were unique.
Re: Solr faceting -- sort order
On Wed, 2012-07-18 at 20:30 +0200, Christopher Gross wrote: > When I do a query, the results that come through retain their original > case for this field, like: > doc 1 > keyword: Blah Blah Blah > doc 2 > keyword: Yadda Yadda Yadda > > But when I pull back facets, i get: > > blah blah blah (1) > yadda yadda yadda (1) Yes. The results from your query are the stored values, while the results from your facets are the indexed ones. That's the way faceting works with Solr. Technically there is nothing wrong with writing a faceting system that uses the stored values. We did this some years back, but abandoned the idea. As far as I remember, it was a lot slower to initialize the internal structures this way. One could also do faceting fully at search time, by iterating all the documents and requesting the stored value for each of them directly from the index, but that would be very slow. > I was attempting to fix a sorting problem -- keyword "" would show > up after keyword "Zulu" due to the "index" sorting, so I thought that > I could lowercase it all to have it be in the same order. But now it > is all in lower case, and I'd like it to retain the original style. Currently the lowercase trick is the only solution for plain Solr, and even that only works as long as your field holds only a-z letters. So no foreign names or such. Looking forward, one solution would be to specify a custom codec for the facet field, where the comparator used for sorting is backed by a Collator that sorts the terms directly, instead of using CollatorKeys. It would be a bit slower for index updates, but should do what you require. Unfortunately I am not aware of anyone who has created such a codec, or even how easy it is to get it to work with Solr (4.0 alpha). We have experimented with a faceting approach that allows for custom ordering, but it sorts upon index open and thus has a fairly long start up time.
Besides, it is not in a proper state for production: https://issues.apache.org/jira/browse/SOLR-2412 - Toke Eskildsen, State and University Library, Denmark
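The "lowercase trick" discussed above is usually implemented by faceting on a lowercased copy of the field while the original, stored field keeps its case for display. A hypothetical schema sketch (field and type names illustrative); note the facet labels themselves are still lowercased, so mapping a facet value back to its original-case form has to happen on the client side:

```xml
<field name="keyword" type="string" indexed="true" stored="true"/>
<field name="keyword_facet" type="lowercase_keyword" indexed="true" stored="false"/>
<copyField source="keyword" dest="keyword_facet"/>

<fieldType name="lowercase_keyword" class="solr.TextField">
  <analyzer>
    <!-- keep the whole value as one token, just lowercased -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```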