Re: Grouping ngroups count
Hi Francois, The issue you describe looks like a similar issue we have fixed before with the matches count. Open an issue and we can look into it. Martijn

On 1 May 2012 20:14, Francois Perron francois.per...@wantedanalytics.com wrote:

Thanks for your response Cody. First, I used distributed grouping on 2 shards and I'm sure that all documents of each group are in the same shard. I took a look at the JIRA issue and it seems really similar. There is the same problem with group.ngroups: the count is calculated in the second pass, so we only get results from the useful shards, and that's why, when I increase the rows limit, I get the right count (all my shards must be used). Unless it's a feature (I hope not), I will create a new JIRA issue for this. Thanks

On 2012-05-01, at 12:32 PM, Young, Cody wrote:

Hello, When you say 2 slices, do you mean 2 shards? As in, you're doing a distributed query? If you're doing a distributed query, then for group.ngroups to work you need to ensure that all documents for a group exist on a single shard. However, what you're describing sounds an awful lot like this JIRA issue that I entered a while ago for distributed grouping. I found that the hit count was coming only from the shards that ended up having results in the documents that were returned. I didn't test group.ngroups at the time. https://issues.apache.org/jira/browse/SOLR-3316 If this is a similar issue then you should make a new Jira issue. Cody

-Original Message- From: Francois Perron [mailto:francois.per...@wantedanalytics.com] Sent: Tuesday, May 01, 2012 6:47 AM To: solr-user@lucene.apache.org Subject: Grouping ngroups count

Hello all, I tried to use grouping with 2 slices with an index of 35K documents. When I ask for the top 10 rows, grouped by field A, it gives me about 16K groups. But if I ask for the top 20K rows, the ngroups property is now at 30K. Do you know why and, of course, how to fix it? Thanks.

-- Met vriendelijke groet, Martijn van Groningen
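Cody's requirement above — that every document of a group must live on one shard for group.ngroups to be correct — is usually met by routing documents by the grouping key at index time. A minimal sketch of that idea (hypothetical helper, not Solr code; Solr 3.x leaves shard assignment to the indexing client):

```java
// Sketch: route documents to shards by the grouping field so that all
// members of a group land on the same shard. Hypothetical helper only.
public class GroupRouter {
    // Deterministic shard assignment derived from the group key.
    public static int shardFor(String groupKey, int numShards) {
        return Math.floorMod(groupKey.hashCode(), numShards);
    }

    public static void main(String[] args) {
        // Two docs with the same group key always map to the same shard,
        // so a distributed group.ngroups count cannot double-count a group.
        int s1 = shardFor("fieldA:valueX", 2);
        int s2 = shardFor("fieldA:valueX", 2);
        System.out.println(s1 == s2); // same group -> same shard
    }
}
```

The same function must be used by every indexing client, otherwise group members drift across shards and the second-pass count breaks again.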
Parent-Child relationship
Hi, I just wanted to get some information about whether the parent-child relationship between documents that Lucene has been talking about has been implemented in Solr or not. I know the join patch is available; would that be the only solution? And another question: as and when this becomes possible (if it's not done already), would such functionality (whether join or defining such relations at index time) be available across different cores? -- View this message in context: http://lucene.472066.n3.nabble.com/Parent-Child-relationship-tp3958259.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr broke a pipe
It seems like the slave instance starts to pull the index from the master and then dies, which causes the broken pipe at the master node.

On Thu, May 3, 2012 at 3:31 AM, Robert Petersen rober...@buy.com wrote:

Anyone have any clues about this exception? It happened during the course of normal indexing. This is new to me (we're running Solr 3.6 on Tomcat 6/Red Hat RHEL) and we've been running smoothly for some time now until this showed up:

Red Hat Enterprise Linux Server release 5.3 (Tikanga)
Apache Tomcat Version 6.0.20
java.runtime.version = 1.6.0_25-b06
java.vm.name = Java HotSpot(TM) 64-Bit Server VM

May 2, 2012 4:07:48 PM org.apache.solr.handler.ReplicationHandler$FileStream write
WARNING: Exception while writing response for params: indexversion=1276893500358&file=_1uca.frq&command=filecontent&checksum=true&wt=filestream
ClientAbortException: java.net.SocketException: Broken pipe
    at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:358)
    at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:354)
    at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:381)
    at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:370)
    at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89)
    at org.apache.solr.common.util.FastOutputStream.write(FastOutputStream.java:87)
    at org.apache.solr.handler.ReplicationHandler$FileStream.write(ReplicationHandler.java:1076)
    at org.apache.solr.handler.ReplicationHandler$3.write(ReplicationHandler.java:936)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:345)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(Unknown Source)
    at java.net.SocketOutputStream.write(Unknown Source)
    at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:740)
    at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:434)
    at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:349)
    at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:764)
    at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:126)
    at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:573)
    at org.apache.coyote.Response.doWrite(Response.java:560)
    at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:353)
    ... 21 more

-- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Parent-Child relationship
Hello, Here is my favorite ones: http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html https://issues.apache.org/jira/browse/SOLR-3076 On Thu, May 3, 2012 at 10:17 AM, tamanjit.bin...@yahoo.co.in tamanjit.bin...@yahoo.co.in wrote: Hi, I just wanted to get some information about whether Parent-Child relationship between documents which Lucene has been talking about has been implemented in Solr or not? I know join patch is available, would that be the only solution? And another question, as and when this will be possible (if its not done already), would such a functionality (whether join or defining such relations at index time) would be available across different cores? -- View this message in context: http://lucene.472066.n3.nabble.com/Parent-Child-relationship-tp3958259.html Sent from the Solr - User mailing list archive at Nabble.com. -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
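The block-join approach in Mike McCandless's post relies on an indexing convention rather than a query-time join: the child documents are added first and the parent last, as one contiguous block, so the parent's position marks where the block ends. A sketch of just that layout rule, with a made-up data model (not the Lucene/Solr API itself):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the block-join indexing convention from the linked article:
// children first, parent last, as one contiguous block of documents.
// The string "documents" here are placeholders for real Lucene docs.
public class BlockLayout {
    public static List<String> block(String parent, List<String> children) {
        List<String> docs = new ArrayList<>(children); // children go first
        docs.add(parent);                              // parent closes the block
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(block("job:java-dev", List.of("resume:alice", "resume:bob")));
    }
}
```

At query time the join query can then walk from matching children to the enclosing parent because the block boundaries are implicit in document order.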
Re: Dynamic core creation works in 3.5.0 fails in 3.6.0: At least one core definition required at run-time for Solr 3.6.0?
On Wed, May 2, 2012 at 9:35 PM, Emes, Matthew (US - Irvine) me...@aaxiscommerce.com wrote:

Hi: I have been working on an integration project involving Solr 3.5.0 that dynamically registers cores as needed at run-time, but does not contain any cores by default. The current solr.xml configuration file is:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false" sharedLib="lib">
  <cores adminPath="/admin/cores"/>
</solr>

This configuration does not include any cores, as those are created dynamically by each application that is using the Solr server. This is working fine with Solr 3.5.0; the server starts, running web applications can register a new core using SolrJ CoreAdminRequest, and everything works correctly. However, I tried to update to Solr 3.6.0 and this configuration fails with a SolrException due to the following code in CoreContainer.java (lines 171-173):

if (cores.cores.isEmpty()) {
  throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "No cores were created, please check the logs for errors");
}

This is a change from Solr 3.5.0, which has no such check. I have searched but cannot find any ticket or notice that this is a planned change in 3.6.0, but before I file a ticket I am asking the community in case this is an issue that has been discussed and this is a planned direction for Solr.

I believe that this particular change was part of https://issues.apache.org/jira/browse/SOLR-1730. The ability to start Solr with no cores seems like a reasonable feature, so I would classify this as a bug. Not sure what others think about it. -- Sami Siren
1MB file to Zookeeper
Hi, We've increased ZooKeeper's znode size limit to accommodate some larger dictionaries and other files. It isn't the best idea to increase the maximum znode size. Any plans for splitting up larger files and storing them with multi? Does anyone have another suggestion? Thanks, Markus
SOLR 3.5 Index Optimization not producing single .cfs file
Hi, I've migrated the search servers to the latest stable release (Solr 3.5) from Solr 1.4.1. We've fully recreated the index for this. After indexing completes, when I optimize the index, it is not merging the index into a single .cfs file as was done with the 1.4.1 version. We've set <useCompoundFile>true</useCompoundFile>. Is it something related to the new MergePolicy used from Solr 3.x onwards (I suppose it is TieredMergePolicy with the 3.x versions)? If yes, should I change it to LogByteSizeMergePolicy? Does this change require a complete rebuild, or will it apply incrementally? Regards Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-3-5-Index-Optimization-not-producing-single-cfs-file-tp3958619.html Sent from the Solr - User mailing list archive at Nabble.com.
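One likely explanation, offered as an assumption to verify rather than a confirmed diagnosis: TieredMergePolicy (the 3.x default) has a noCFSRatio setting, default 0.1, that skips the compound-file format for large segments even when useCompoundFile is true, so a fully optimized (single-segment) index usually ends up non-compound. A sketch of the solrconfig.xml change (element names and placement should be checked against your Solr version):

```xml
<!-- Sketch for a Solr 3.x solrconfig.xml index section. Raising
     noCFSRatio to 1.0 asks TieredMergePolicy to use the compound
     format for segments of any size, approximating the 1.4 behaviour.
     Treat this as a hypothesis to test, not a confirmed fix. -->
<useCompoundFile>true</useCompoundFile>
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <double name="noCFSRatio">1.0</double>
</mergePolicy>
```

If this is the cause, no full rebuild is needed; the next optimize rewrites the single segment in whichever format the policy then chooses.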
RE: Solr Merge during off peak times
Great, thanks Otis and Erick for your responses. I will take a look at SPM. Thanks Prabhu

-Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: 03 May 2012 00:02 To: solr-user@lucene.apache.org Subject: Re: Solr Merge during off peak times

Hello Prabhu, Look at SPM for Solr (URL in sig below). It includes Index Statistics graphs, and from these graphs you can tell:

* how many docs are in your index
* how many docs are deleted
* size of index on disk
* number of index segments
* number of index files
* maybe something else I'm forgetting now

So from size, # of segments, and index files you will be able to tell when merges happened and the before/after size, segment, and index file counts. Otis Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

From: Prakashganesh, Prabhu prabhu.prakashgan...@dowjones.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org; Otis Gospodnetic otis_gospodne...@yahoo.com Sent: Wednesday, May 2, 2012 7:22 AM Subject: RE: Solr Merge during off peak times

Ok, thanks Otis. Another question on merging: what is the best way to monitor merging? Is there something in the log file that I can look for? It seems like I have to monitor the system resources - read/write IOPS etc. - and work out when a merge happened. It would be great if I could do it by looking at log files or in the admin UI. Do you know if this can be done, or if there is some tool for this? Thanks Prabhu

-Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: 01 May 2012 15:12 To: solr-user@lucene.apache.org Subject: Re: Solr Merge during off peak times

Hi Prabhu, I don't think such a merge policy exists, but it would be nice to have this option, and I imagine it wouldn't be hard to write if you really just base the merge or no-merge decision on the time of day (and maybe day of the week).
Note that this should go into Lucene, not Solr, so if you decide to contribute your work, please see http://wiki.apache.org/lucene-java/HowToContribute Otis Performance Monitoring for Solr - http://sematext.com/spm From: Prakashganesh, Prabhu prabhu.prakashgan...@dowjones.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tuesday, May 1, 2012 8:45 AM Subject: Solr Merge during off peak times Hi, I would like to know if there is a way to configure index merge policy in solr so that the merging happens during off peak hours. Can you please let me know if such a merge policy configuration exists? Thanks Prabhu
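For anyone who does attempt Otis's suggestion, the core of an "off peak" merge policy is just a time-window check consulted before returning merge candidates; a real implementation would wrap an existing Lucene MergePolicy and only delegate while the window is open. A sketch of the window test alone (class and method names are made up):

```java
import java.time.LocalTime;

// Sketch of the time-of-day gate an off-peak merge policy could use.
// Hypothetical helper only; it does not touch any Lucene API.
public class OffPeakWindow {
    // True when `now` falls inside [start, end); the window may wrap midnight.
    public static boolean mergesAllowed(LocalTime now, LocalTime start, LocalTime end) {
        if (start.isBefore(end)) {
            return !now.isBefore(start) && now.isBefore(end);
        }
        // Window wraps midnight, e.g. 22:00 -> 06:00.
        return !now.isBefore(start) || now.isBefore(end);
    }

    public static void main(String[] args) {
        LocalTime start = LocalTime.of(22, 0), end = LocalTime.of(6, 0);
        System.out.println(mergesAllowed(LocalTime.of(23, 30), start, end)); // true
        System.out.println(mergesAllowed(LocalTime.of(12, 0), start, end));  // false
    }
}
```

Deferring merges this way trades off-peak I/O for more segments (and slower searches) during the day, so the window should be generous enough to catch up.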
Re: Solr 3.5 - Elevate.xml causing issues when placed under /data directory
Yes, numFound is 0. I tried your way of defining the data dir; it didn't work either. Right now I have placed my elevate.xml in the conf dir and wrote a script to reload the Solr webapp. We will run the script once a day.

- Original Message - From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] Sent: Wednesday, May 02, 2012 09:19 PM To: solr-user@lucene.apache.org solr-user@lucene.apache.org Subject: Re: Solr 3.5 - Elevate.xml causing issues when placed under /data directory

(12/05/03 1:39), Noordeen, Roxy wrote: Hello, I just started using elevation for Solr. I am on Solr 3.5, running with Drupal 7, Linux.

1. I updated my solrconfig.xml from

<dataDir>${solr.data.dir:./solr/data}</dataDir>

to

<dataDir>/usr/local/tomcat2/data/solr/dev_d7/data</dataDir>

2. I placed my elevate.xml in my Solr data directory. Based on forum answers, I thought placing elevate.xml under the data directory would pick up my latest changes. I restarted Tomcat.

3. When I placed my elevate.xml under the conf directory, elevation was working with the URL: http://mysolr.www.com:8181/solr/elevate?q=games&wt=xml&sort=score+desc&fl=id,bundle_name

But when I moved it to the data directory, I am not seeing any results. NOTE: I can see catalina.out printing that Solr reads the file from the data directory. I tried to give invalid entries; I noticed Solr errors parsing elevate.xml from the data directory. I even tried to send some documents to index, thinking a commit might help to read the elevate config file. But nothing helped. I don't understand why the below URL does not work anymore. There are no errors in the log files. http://mysolr.www.com:8181/solr/elevate?q=games&wt=xml&sort=score+desc&fl=id,bundle_name

Any help on this topic is appreciated.

Hi Noordeen, What do you mean by "I am not seeing any results"? Is it no docs in response (numFound=0)? And have you tried the original ${solr.data.dir:./solr/data} for the dataDir? Isn't that working for you either? koji -- Query Log Visualizer for Apache Solr http://soleami.com/
Re: 1MB file to Zookeeper
On May 3, 2012, at 5:15 AM, Markus Jelsma wrote: Hi, We've increased Zookeepers znode size limit to accomodate for some larger dictionaries and other files. It isn't the best idea to increase the maximum znode size. Any plans for splitting up larger files and storing them with multi? Does anyone have another suggestion? Thanks, Markus Patches welcome :) You can compress, you can break up the files, or you can raise the limit - that's about the options I know of. You might start by creating a JIRA issue. - Mark Miller lucidimagination.com
Re: Null Pointer Exception in SOLR
Hmmm, can we have some more details here? What version of Solr? What exactly did you do in the UI? What was the state of your index (i.e. adding documents from some other process? etc.). Best Erick

On Wed, May 2, 2012 at 8:17 AM, mechravi25 mechrav...@yahoo.co.in wrote:

Hi, When I tried to remove a document from the UI (which will in turn hit Solr), the whole application got stuck. When we took the log files of the UI, we could see that this set of requests did not reach Solr itself. In the Solr log file, we were able to find the following exception occurring at the same time.

SEVERE: org.apache.solr.common.SolrException: null__javalangNullPointerException_
null__javalangNullPointerException_
request: http://solr/coreX/select
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request
    at org.apache.solr.handler.component.HttpCommComponent$1.call
    at org.apache.solr.handler.component.HttpCommComponent$1.call
    at java.util.concurrent.FutureTask$Sync.innerRun
    at java.util.concurrent.FutureTask.run
    at java.util.concurrent.Executors$RunnableAdapter.call
    at java.util.concurrent.FutureTask$Sync.innerRun
    at java.util.concurrent.FutureTask.run
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
    at java.util.concurrent.ThreadPoolExecutor$Worker.run
    at java.lang.Thread.run

This situation persisted for another few hours. No one was able to perform any operation with the application, and if anyone tried to perform any action, it resulted in the above exception during that period. But this situation resolved by itself after a few hours and things started working as normal. Can you tell me if this situation was due to a deadlock condition, or was it due to CPU utilization going beyond 100%? If it was due to a deadlock, then why did we not get any such messages in the log files? Or is it due to some other problem? Am I missing anything? Can you guide me on this?

-- View this message in context: http://lucene.472066.n3.nabble.com/Null-Pointer-Exception-in-SOLR-tp3954952.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLRJ: Is there a way to obtain a quick count of total results for a query
That's the standard way; it's actually pretty efficient. Why is this a concern? Just the verbosity of getResults()? Best Erick

On Wed, May 2, 2012 at 11:58 AM, vybe3142 vybe3...@gmail.com wrote:

I can achieve this by building a query with start and rows = 0, and using queryResponse.getResults().getNumFound(). Are there any more efficient approaches to this? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/SOLRJ-Is-there-a-way-to-obtain-a-quick-count-of-total-results-for-a-query-tp3955322.html Sent from the Solr - User mailing list archive at Nabble.com.
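For reference, the "count only" request Erick endorses is just the normal query with rows=0; Solr still computes numFound but ships no documents, so the response stays tiny. A trivial illustration of the request shape (plain string building, not the SolrJ API, with URL encoding omitted for brevity):

```java
// Sketch: a count-only Solr request is the regular select with rows=0.
// Illustrative string builder only; real code should URL-encode q and
// would normally use SolrQuery.setRows(0) via SolrJ instead.
public class CountOnly {
    public static String countUrl(String base, String q) {
        return base + "/select?q=" + q + "&rows=0&wt=json";
    }

    public static void main(String[] args) {
        System.out.println(countUrl("http://localhost:8983/solr", "*:*"));
    }
}
```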
3.6 browse gui
Hi, I've started using Solr 3.6 and would like to use the /browse requestHandler as I normally do. But it just gives me a lazy-loading error when trying to reach /solr/browse. This would normally work in Solr 3.4. So my question is: what setup is needed for the Velocity response writer to work? / AA
Re: need some help with a multicore config of solr3.6.0+tomcat7. mine reports: Severe errors in solr configuration.
Guessing from the message, java.lang.RuntimeException: [solrconfig.xml] indexDefaults/mergePolicy: missing mandatory attribute 'class', somewhere in your Solr configs you have something like:

<mergePolicy>
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicy>

rather than

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicy>

although I suppose this could be a misleading error message if, say, your classpath is confused or some such... Best Erick

On Wed, May 2, 2012 at 5:38 PM, vybe3142 vybe3...@gmail.com wrote: I chronicled exactly what I had to configure to slay this dragon at http://vinaybalamuru.wordpress.com/2012/04/12/solr4-tomcat-multicor/ Hope that helps -- View this message in context: http://lucene.472066.n3.nabble.com/need-some-help-with-a-multicore-config-of-solr3-6-0-tomcat7-mine-reports-Severe-errors-in-solr-confi-tp3957196p3957389.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: 1MB file to Zookeeper
Hi. Compression is a good suggestion. All large dictionaries are compressed well below 1MB with GZIP. Where should this be implemented? SolrZkClient or ZkController? Which good compressor is already in Solr's lib? And what's the difference between SolrZkClient setData and create? Should it autocompress files larger than N bytes? And how should we detect if data is compressed when reading from ZooKeeper? On Thursday 03 May 2012 14:04:31 Mark Miller wrote: On May 3, 2012, at 5:15 AM, Markus Jelsma wrote: Hi, We've increased Zookeepers znode size limit to accomodate for some larger dictionaries and other files. It isn't the best idea to increase the maximum znode size. Any plans for splitting up larger files and storing them with multi? Does anyone have another suggestion? Thanks, Markus Patches welcome :) You can compress, you can break up the files, or you can raise the limit - that's about the options I know of. You might start by creating a JIRA issue. - Mark Miller lucidimagination.com -- Markus Jelsma - CTO - Openindex
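On the detection question: gzip output starts with the fixed magic bytes 0x1f 0x8b, so a reader can tell compressed znode data from plain data without storing any extra metadata. A sketch under that assumption (helper names are made up, not the SolrZkClient API):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

// Sketch: gzip what goes into a znode; on read, sniff the two gzip
// magic bytes to decide whether to decompress. Hypothetical helpers,
// not Solr code.
public class ZkGzip {
    public static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Gzip streams begin with the fixed bytes 0x1f 0x8b.
    public static boolean isGzip(byte[] data) {
        return data.length > 2
            && (data[0] & 0xff) == 0x1f && (data[1] & 0xff) == 0x8b;
    }

    public static void main(String[] args) {
        byte[] dict = "a".repeat(100000).getBytes();
        byte[] packed = gzip(dict);
        System.out.println(packed.length < dict.length); // repetitive text shrinks a lot
        System.out.println(isGzip(packed));
    }
}
```

Whether compression should be automatic above some size threshold is a policy question for the JIRA issue; the sniffing approach at least makes reads backward compatible with uncompressed znodes.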
Re: should slave replication be turned off / on during master clean and re-index?
thanks for all of the advice / help. i appreciate it ;) -- View this message in context: http://lucene.472066.n3.nabble.com/should-slave-replication-be-turned-off-on-during-master-clean-and-re-index-tp3945531p3959088.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: correct XPATH syntax
Hi David, what do you want to do with the 'commonField' option ? Is it possible to have the part of the schema for the author field please ? Is the author field stored ? Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3959097.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: question about dates
Thank you for the tips :) Gary

Le 02/05/2012 21:26, Chris Hostetter a écrit :

: String dateString = "20101230";
: SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
: Date date = sdf.parse(dateString);
: doc.addField("date", date);
:
: In the index, the date 20101230 is saved as 2010-12-29T23:00:00Z (because
: of GMT).

"because of GMT" is misleading and vague ... what you get in your index is a value of 2010-12-29T23:00:00Z because that is the canonical string representation of the date object you have passed to doc.addField -- the date object you have passed in represents that time, because you constructed a SimpleDateFormat object w/o specifying which TimeZone that SDF object should assume is in use when it parses its string input. So when you give it the input 20101230 it treats that as Dec 30, 2010, 00:00:00.000 in whatever the local timezone of your client is. If you want it to treat that input string as a date expression in GMT, then you need to configure the parser to use GMT (SimpleDateFormat.setTimeZone).

: I tried the following code :
:
: String dateString = "20101230";
: SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
: Date date = sdf.parse(dateString);
: SimpleDateFormat gmtSdf = new
: SimpleDateFormat("yyyy-MM-dd'T'HH\\:mm\\:ss'Z'");
: String gmtString = gmtSdf.format(date);
:
: The problem is that gmtString is equals to 2010-12-30T00\:00\:00Z. There is

again, that is not a GMT string ... in this case, both of the SDF objects you are using have not been configured with an explicit TimeZone, so they use whatever the platform default is where this code is run -- so the variable you are calling gmtString is actually a string representation of a Date object formatted in your local TimeZone.

Bottom line...

* when parsing a string into a Date, you really need to know (and be explicit to the parser) about what timezone is represented in that string (unless the format of the string includes the TimeZone)
* when building a query string to pass to Solr, the DateFormat you use to format a Date object must format it using GMT -- there is a DateUtil class included in SolrJ to make this easier.

If you really don't care at all about TimeZones, then just use GMT everywhere .. but if you actually care about what time of day something happened, and want to be able to query for events with hour/min/sec/etc. granularity, then you need to be precise about the TimeZone in every Formatter you use. -Hoss
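Hoss's two rules can be condensed into one small helper: configure both the parser and the formatter with an explicit GMT TimeZone, so the string round-trips to Solr's canonical form instead of shifting by the local offset. A minimal sketch (the method name is made up; SolrJ's DateUtil can replace the output half):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Sketch of the explicit-timezone advice above: parse and format with
// GMT set on both SimpleDateFormat instances.
public class GmtDates {
    public static String toSolrDate(String yyyymmdd) {
        try {
            SimpleDateFormat in = new SimpleDateFormat("yyyyMMdd");
            in.setTimeZone(TimeZone.getTimeZone("GMT")); // interpret input as GMT
            Date d = in.parse(yyyymmdd);

            SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
            out.setTimeZone(TimeZone.getTimeZone("GMT")); // emit canonical GMT form
            return out.format(d);
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate("20101230")); // 2010-12-30T00:00:00Z
    }
}
```

With GMT on both sides, "20101230" stays Dec 30 regardless of where the client runs, which was the original poster's problem.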
Re: Null Pointer Exception in SOLR
Hi, I'm using the following configuration for Solr:

Solr Specification Version: 1.4.0.2010.01.13.08.09.44
Solr Implementation Version: 1.5-dev exported - yonik - 2010-01-13 08:09:44
Lucene Specification Version: 2.9.1-dev
Lucene Implementation Version: 2.9.1-dev 888785 - 2009-12-09 18:03:31
Current Time: Thu May 03 05:38:12 MST 2012
Server Start Time: Wed May 02 03:45:58 MST 2012

I was trying to delete a part of a document from the UI. In this situation, the index file got updated only after 2 hours, when the application unfroze. Here, I also observed that I was mostly getting the following highlighting error. Is there a problem with using this highlighting feature?

INFO: [corex] webapp=/solr path=/select params={facet=true&f.CFacet.facet.limit=160&hl.fl=*&wt=javabin&hl=false&rows=2&version=1&f.rFacet.facet.limit=160&fl=uxid,sch&start=0&f.tFacet.facet.limit=160&q=uxid:Plan for Today&facet.field=CFacet&facet.field=tFacet&facet.field=rFacet&isShard=true&fs=true} hits=1 status=0 QTime=6
SEVERE: java.lang.NullPointerException
    at org.apache.solr.highlight.SolrHighlighter.getHighlightFields
    at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting
    at org.apache.solr.handler.component.HighlightComponent.process
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody
    at org.apache.solr.handler.RequestHandlerBase.handleRequest
    at org.apache.solr.core.SolrCore.execute
    at org.apache.solr.servlet.SolrDispatchFilter.execute
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter
    at org.mortbay.jetty.servlet.ServletHandler.handle
    at org.mortbay.jetty.security.SecurityHandler.handle
    at org.mortbay.jetty.servlet.SessionHandler.handle
    at org.mortbay.jetty.handler.ContextHandler.handle
    at org.mortbay.jetty.webapp.WebAppContext.handle
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle
    at org.mortbay.jetty.handler.HandlerCollection.handle
    at org.mortbay.jetty.handler.HandlerWrapper.handle
    at org.mortbay.jetty.Server.handle
    at org.mortbay.jetty.HttpConnection.handleRequest
    at org.mortbay.jetty.HttpConnection$RequestHandler.content
    at org.mortbay.jetty.HttpParser.parseNext
    at org.mortbay.jetty.HttpParser.parseAvailable
    at org.mortbay.jetty.HttpConnection.handle
    at org.mortbay.jetty.bio.SocketConnector$Connection.run
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run

-- View this message in context: http://lucene.472066.n3.nabble.com/Null-Pointer-Exception-in-SOLR-tp3954952p3959151.html Sent from the Solr - User mailing list archive at Nabble.com.
solr snapshots - old school and replication - new school ?
hello all, environment: CentOS and Solr 3.5. I want to make sure I understand the difference between snapshots and Solr replication. Snapshots are old school and have been deprecated in favor of Solr replication, new school. Do I have this correct? BTW: I have replication working (now) between my master and two slaves - I just want to make sure I am not missing a larger picture ;) I have been reading the Smiley/Pugh book (pg 349) as well as material on the wiki at: http://wiki.apache.org/solr/SolrCollectionDistributionScripts http://wiki.apache.org/solr/SolrReplication thank you, -- View this message in context: http://lucene.472066.n3.nabble.com/solr-snapshots-old-school-and-replication-new-school-tp3959152.html Sent from the Solr - User mailing list archive at Nabble.com.
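For reference, the new-school setup from the SolrReplication wiki page is configured entirely through solr.ReplicationHandler in solrconfig.xml, replacing the old rsync snapshot scripts. A sketch of both sides (host name, file list, and poll interval are placeholders):

```xml
<!-- Master solrconfig.xml: serve the index after each commit and
     also replicate selected config files. Values are illustrative. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- Slave solrconfig.xml: poll the master for new index versions. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```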
Re: correct XPATH syntax
Is what I want even possible with XPathEntityProcessor? It sort of works now - I didn't realize the flatten attribute is an attribute of field instead of entity. BUT it's still not what I would like. The XML looks like below and is nested within /MedlineCitationSet/MedlineCitation/Article/

<AuthorList CompleteYN="Y">
  <Author ValidYN="Y">
    <LastName>Starremans</LastName>
    <ForeName>Patrick G J F</ForeName>
    <Initials>PG</Initials>
  </Author>
  <Author ValidYN="Y">
    <LastName>van der Kemp</LastName>
    <ForeName>Annemiete W C M</ForeName>
    <Initials>AW</Initials>
  </Author>
  <Author ValidYN="Y">
    <LastName>Knoers</LastName>
    <ForeName>Nine V A M</ForeName>
    <Initials>NV</Initials>
  </Author>
  <Author ValidYN="Y">
    <LastName>van den Heuvel</LastName>
    <ForeName>Lambertus P W J</ForeName>
    <Initials>LP</Initials>
  </Author>
</AuthorList>

What I would like to see in the index author field is

<author>Starremans PG, Van der Kemp AW, etc.</author>

note: last name plus initials, no forename. When I set the XPath like this

<field column="author" xpath="/MedlineCitationSet/MedlineCitation/Article/AuthorList/Author" flatten="true"/>

I get this in the index

<arr name="author">
  <str>Starremans Patrick G J F PG</str>
  <str>Van der Kemp Annemiete W C M AW</str>
  .
  .
</arr>

note: the forename field is included. My author field in the schema.xml is

<field name="author" type="textgen" indexed="true" stored="true" multiValued="true" required="false"/>

So is this even possible with XPathEntityProcessor? Thanks David

On 5/3/12 8:40 AM, lboutros boutr...@gmail.com wrote: Hi David, what do you want to do with the 'commonField' option ? Is it possible to have the part of the schema for the author field please ? Is the author field stored ? Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3959097.html Sent from the Solr - User mailing list archive at Nabble.com.
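What David wants (LastName plus Initials, with ForeName dropped) is a transformation rather than an extraction, so XPathEntityProcessor alone is unlikely to produce it; a DIH transformer (for example a ScriptTransformer) that rebuilds each author value is the usual route. The plain-Java form of that rebuild, as an illustration only (class and method names are made up):

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the author formatting step: "LastName Initials" per author,
// authors joined with ", ". Hypothetical helper, not DIH API.
public class AuthorFormat {
    public static String format(String lastName, String initials) {
        return lastName + " " + initials;
    }

    // Each entry is {lastName, initials}; ForeName is deliberately dropped.
    public static String joinAuthors(List<String[]> authors) {
        return authors.stream()
            .map(a -> format(a[0], a[1]))
            .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        System.out.println(joinAuthors(List.of(
            new String[]{"Starremans", "PG"},
            new String[]{"van der Kemp", "AW"}))); // Starremans PG, van der Kemp AW
    }
}
```

To feed this, the DIH entity would need to extract LastName and Initials as separate columns (rather than one flattened Author column) so the transformer has both parts to work with.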
RE: synonyms
Jack, I am also using synonyms at query side, but so far I have only made single words work; multi-word synonyms are not working for me. I didn't want to use synonyms during indexing, to avoid re-indexing. Is there a way for Solr to support multi-word synonyms? Ex: John Cena, John, Cena or Triple H, DX, tripleh, hhh. Thanks Roxy

-Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Wednesday, May 02, 2012 8:53 PM To: solr-user@lucene.apache.org Subject: Re: synonyms

There are lots of different strategies for dealing with synonyms, depending on what exactly is most important and what exactly you are willing to tolerate. In your latest example, you seem to be using string fields, which is somewhat different from the text synonyms we talk about in Solr. You can certainly have multiple string fields, or even a multi-valued string field, to store variations on selected categories of terms. That works well when you have a well-defined number of categories. So, you can have a user query go against a combination of normal text fields and these category string fields. If that is sufficient for your application, great. -- Jack Krupansky

-Original Message- From: Carlos Andres Garcia Sent: Wednesday, May 02, 2012 6:57 PM To: solr-user@lucene.apache.org Subject: RE: synonyms

Thanks for your answers. Now I have another question: if I develop the filter to replace the current synonym filter, I understand that this process would run at indexing time, because at query time there are a lot of known problems. If so, how should I create my index? For example: I have two synonyms, Camp Nou and Cataluña, for barcelona in the database.

Option 1) At indexing time, create 2 records like this:

<doc>
  <field>barcelona</field>
  <field>Camp Nou</field>
  ...
</doc>

and

<doc>
  <field>barcelona</field>
  <field>Cataluña</field>
  ...
</doc>

Option 2) or create only one record like this:

<doc>
  <field>barcelona</field>
  <field>Camp Nou,Cataluña</field>
  ...
</doc>

With option 1, I can search by Camp Nou and by Cataluña, but when I search by barcelona, Solr returns 2 records, and that is an error because barcelona is only one. With option 2, I have to search with wildcards, for example *Camp Nou* or *Cataluña*, and Solr returns one record; likewise, searching by barcelona returns one record, which is good. But I want to know if this is the better option, or whether Solr has other features that can resolve this in a better way.
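One common workaround for Roxy's multi-word case is a synonyms.txt mapping that collapses each phrase to a single token and is applied at index time; query-time multi-word synonyms misbehave because the query parser splits on whitespace before the synonym filter runs. A hedged sketch of such a file (the right-hand tokens are invented placeholders):

```text
# Sketch of index-time multi-word mappings for SynonymFilterFactory.
# Each left-hand variant is rewritten to one synthetic token, so all
# variants match each other without phrase-boundary problems.
John Cena, John Cena => johncena
Triple H, DX, tripleh, hhh => tripleh
```

The cost is exactly what Roxy wanted to avoid: changing index-time analysis requires re-indexing, which is why query-side-only multi-word synonyms remain a known pain point.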
RE: Solr 3.5 - Elevate.xml causing issues when placed under /data directory
Koji, Using the way i have specified datadir, i was able to see solr reading my file. It didnt have any issue reading, but it was not serving the results using /elevate. I looked at the ElevationComponent java code, i didn't see any issue with the code either. I need elevation to work with synonyms; right now i am not sure how to implement synonyms with elevation. The code seems to maintain elevation data in a map, and it does key matches. Is there a way i can configure solrconfig.xml to use synonyms after elevate results are returned? Thanks Roxy Noordeen -Original Message- From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] Sent: Wednesday, May 02, 2012 9:19 PM To: solr-user@lucene.apache.org Subject: Re: Solr 3.5 - Elevate.xml causing issues when placed under /data directory (12/05/03 1:39), Noordeen, Roxy wrote: Hello, I just started using elevation for solr. I am on solr 3.5, running with Drupal 7, Linux. 1. I updated my solrconfig.xml from dataDir${solr.data.dir:./solr/data}/dataDir To dataDir/usr/local/tomcat2/data/solr/dev_d7/data/dataDir 2. I placed my elevate.xml in my solr's data directory. Based on forum answers, I thought placing elevate.xml under data directory would pick my latest change. I restarted tomcat. 3. When i placed my elevate.xml under conf directory, elevation was working with url: http://mysolr.www.com:8181/solr/elevate?q=gameswt=xmlsort=score+descfl=id,bundle_namehttp://p6solr1.cube6.wwe.com:8181/solr/elevate?q=gameswt=xmlfl=id,bundle_name But when i moved to data directory, I am not seeing any results. NOTE: I can see the catalina.out, printing solr reading the file from data directory. I tried to give invalid entries; I noticed solr errors parsing elevate.xml from data directory. I even tried to send some documents to index, thought commit might help to read the elevate config file. But nothing helped. I don't understand why below url does not work anymore. There are no errors in the log files. 
http://mysolr.www.com:8181/solr/elevate?q=games&wt=xml&sort=score+desc&fl=id,bundle_name Any help on this topic is appreciated. Hi Noordeen, What do you mean by "I am not seeing any results"? Is it no docs in the response (numFound=0)? And have you tried the original ${solr.data.dir:./solr/data} for the dataDir? Isn't that working for you either? koji -- Query Log Visualizer for Apache Solr http://soleami.com/
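For anyone following along, elevate.xml has a small fixed format; a minimal sketch for the query in this thread (the doc ids are made up and must match uniqueKey values in your index):

```xml
<elevate>
  <query text="games">
    <!-- hypothetical ids; each must match a uniqueKey value in the index -->
    <doc id="node/123" />
    <doc id="node/456" />
  </query>
</elevate>
```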
Re: Parent-Child relationship
Solr join has been implemented for quite some time, see: https://issues.apache.org/jira/browse/SOLR-2272 but only on trunk. 3076 is a refinement as I understand it. FWIW Erick On Thu, May 3, 2012 at 3:01 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello, Here are my favorite ones: http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html https://issues.apache.org/jira/browse/SOLR-3076 On Thu, May 3, 2012 at 10:17 AM, tamanjit.bin...@yahoo.co.in tamanjit.bin...@yahoo.co.in wrote: Hi, I just wanted to get some information about whether the Parent-Child relationship between documents which Lucene has been talking about has been implemented in Solr or not? I know the join patch is available; would that be the only solution? And another question: as and when this becomes possible (if it's not done already), would such functionality (whether join or defining such relations at index time) be available across different cores? -- View this message in context: http://lucene.472066.n3.nabble.com/Parent-Child-relationship-tp3958259.html Sent from the Solr - User mailing list archive at Nabble.com. -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Implementing multiterm chain for ICUCollationKeyFilterFactory
Hello, I read and tried a lot, but somehow I don't fully understand it and it doesn't work. I'm working on Solr 4.0 (latest trunk) and use ICUCollationKeyFilterFactory for my main field type. Now, wildcard queries don't work, even though ICUCollationKeyFilterFactory seems to be MultiTermAware (http://lucene.apache.org/solr/api/org/apache/solr/analysis/class-use/MultiTermAwareComponent.html). Can someone help out? My analysis chains are as follows: I tried different options for my multiterm chain, none seem to work. Thanks a lot Oliver -- View this message in context: http://lucene.472066.n3.nabble.com/Implementing-multiterm-chain-for-ICUCollationKeyFilterFactory-tp3959241.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Merge during off peak times
Ahhh, you're right. Shows what happens when I work from memory. Thanks. Erick On Wed, May 2, 2012 at 4:26 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: BTW, in 4.0, there's DocumentWriterPerThread that merges in the background It flushes without pausing, but does not perform merges. Maybe you're thinking of ConcurrentMergeScheduler? On Wed, May 2, 2012 at 7:26 AM, Erick Erickson erickerick...@gmail.com wrote: Optimizing is much less important query-speed-wise than historically; essentially it's not recommended much any more. A significant effect of optimize _used_ to be purging obsolete data (i.e. that from deleted docs) from the index, but that is now done on merge. There's no harm in optimizing in off-peak hours, and combined with an appropriate merge policy that may make indexing a little better (I'm thinking of not doing as many massive merges here). BTW, in 4.0, there's DocumentWriterPerThread that merges in the background and pretty much removes even this as a motivation for optimizing. All that said, optimizing isn't _bad_, it's just often unnecessary. Best Erick On Wed, May 2, 2012 at 9:29 AM, Prakashganesh, Prabhu prabhu.prakashgan...@dowjones.com wrote: Actually we are not thinking of a M/S setup. We are planning to have x number of shards on N servers, with each shard handling both indexing and searching. The expected query volume is not that high, so we don't think we would need to replicate to slaves. We think each shard will be able to handle its share of the indexing and searching. If we need to scale query capacity in future, yeah, we'd probably need to do it by replicating each shard to slaves. I agree the autoCommit settings would be good to set up appropriately. Another question I had is the pros/cons of optimising the index. We will be purging old content every week and I am thinking about whether to run an index optimise at the weekend after purging old data.
Because we are going to be continuously indexing data, which would be a mix of adds, updates, and deletes, I am not sure if the benefit of optimising would last long enough to be worth doing. Maybe setting a low mergeFactor would be good enough. Optimising makes sense if the index is more static, perhaps? Thoughts? Thanks Prabhu -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 02 May 2012 13:15 To: solr-user@lucene.apache.org Subject: Re: Solr Merge during off peak times But again, with a master/slave setup merging should be relatively benign. And at 200M docs, having a M/S setup is probably indicated. Here's a good writeup of merge policy: http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/ If you're indexing and searching on a single machine, merging is much less important than how often you commit. In a M/S situation, then your polling interval on the slave is important. I'd look at commit frequency long before I worried about merging; that's usually where people shoot themselves in the foot - by committing too often. Overall, your mergeFactor is probably less important than other parts of how you perform indexing/searching, but it does have some effect for sure... Best Erick On Wed, May 2, 2012 at 7:54 AM, Prakashganesh, Prabhu prabhu.prakashgan...@dowjones.com wrote: We have a fairly large scale system - about 200 million docs and fairly high indexing activity - about 300k docs per day with peak ingestion rates of about 20 docs per sec. I want to work out what a good mergeFactor setting would be by testing with different mergeFactor settings. I think the default of 10 might be high, I want to try with 5 and compare. Unless I know when a merge starts and finishes, it would be quite difficult to work out the impact of changing mergeFactor. I want to be able to measure how long merges take, run queries during the merge activity and see what the response times are etc..
Thanks Prabhu -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 02 May 2012 12:40 To: solr-user@lucene.apache.org Subject: Re: Solr Merge during off peak times Why do you care? Merging is generally a background process, or are you doing heavy indexing? In a master/slave setup, it's usually not really relevant except that (with 3.x), massive merges may temporarily stop indexing. Is that the problem? Look at the merge policies; there are configurations that make this less painful. In trunk, DocumentWriterPerThread makes merges happen in the background, which helps the long-pause-while-indexing problem. Best Erick On Wed, May 2, 2012 at 7:22 AM, Prakashganesh, Prabhu prabhu.prakashgan...@dowjones.com wrote: Ok, thanks Otis Another question on merging What is the best way to monitor merging? Is there something in the log file that I can look for? It seems like I have to monitor the system resources - read/write IOPS etc.. and work out when a
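On the monitoring question: in Solr 3.x the merge settings live in solrconfig.xml, and Lucene's infoStream can be enabled there to log segment-merge activity to a file, which is one way to see when merges start and finish. A sketch with illustrative values:

```xml
<indexDefaults>
  <!-- fewer segments per merge level; compare 5 against the default 10 -->
  <mergeFactor>5</mergeFactor>
  <ramBufferSizeMB>64</ramBufferSizeMB>
</indexDefaults>
<mainIndex>
  <!-- Lucene's infoStream logs low-level segment and merge details -->
  <infoStream file="INFOSTREAM.txt">true</infoStream>
</mainIndex>
```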
RE: Implementing multiterm chain for ICUCollationKeyFilterFactory
Hi Oliver, Nabble.com stripped out your analysis chain XML before sending your message to the mailing list. My suggestion: stop using Nabble. (I've described this problem to their support people a couple of times, and they apparently just don't care, since it still persists, years later.) Steve -Original Message- From: OliverS [mailto:oliver.schi...@unibas.ch] Sent: Thursday, May 03, 2012 9:36 AM To: solr-user@lucene.apache.org Subject: Implementing multiterm chain for ICUCollationKeyFilterFactory Hello I read and tried a lot, but somehow I don't fully understand and it doesn't work. I'm working on solr 4.0 (latest trunk) and use ICUCollationKeyFilterFactory for my main field type. Now, wildcard queries don't work, even though ICUCollationKeyFilterFactory seems to be http://lucene.apache.org/solr/api/org/apache/solr/analysis/class-use/MultiTermAwareComponent.html MultiTermAware . Can someone help out? My analysis chains are as follows: I tried different options for my multiterm-chain, none seem to work. Thanks a lot Oliver -- View this message in context: http://lucene.472066.n3.nabble.com/Implementing-multiterm-chain-for-ICUCollationKeyFilterFactory-tp3959241.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Implementing multiterm chain for ICUCollationKeyFilterFactory
On Thu, May 3, 2012 at 9:35 AM, OliverS oliver.schi...@unibas.ch wrote: Hello, I read and tried a lot, but somehow I don't fully understand it and it doesn't work. I'm working on Solr 4.0 (latest trunk) and use ICUCollationKeyFilterFactory for my main field type. Now, wildcard queries don't work, even though ICUCollationKeyFilterFactory seems to be MultiTermAware (http://lucene.apache.org/solr/api/org/apache/solr/analysis/class-use/MultiTermAwareComponent.html). This filter implements that interface solely to support range queries in collation order (in addition to sort), so that it has all the Lucene functionality. Wildcards and even prefix queries simply won't work, because these are binary keys intended just for this purpose. If you want to do text-ish queries like this, you need to use a text field. -- lucidimagination.com
Searching by location – What do I send to Solr?
Hi, I'm finding it a bit hard to get my head around this. Say I am putting items on a map. This is how I am thinking it would work: A user submits an item and specifies the location as London On submission, I run a process to convert London to a Long/Lat which is stored in the database However, when someone then goes onto my site and searches for the postcode SE1 2EL, which is in London, how does Solr know the long / lat of this? Do I have to query Solr by using Long / Lat, in which case, when a user submits a request for SE1 2EL I have to convert this to Long / Lat first and then send this query to Solr? I hope I have explained myself well, I would really like to know how others have solved this. James -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-by-location-What-do-I-send-to-Solr-tp3959296.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: synonyms
If query-side multi-term synonyms are important to your application, your best bet may be to implement a preprocessor that expands them to an OR sequence of phrases before submitting the query to Solr. That would also give you an opportunity to boost a preferred synonym. For example, a user query of abc John Cena xyz would be preprocessed and sent to Solr as abc ("John Cena" OR "Cena John") xyz You could also consider using phrase slop to handle simple name reversal: abc "John Cena"~1 xyz That would also allow a middle initial or name, for example. But, you need to consider whether you really want that. The simple phrases give you explicit control. If the synonym is in a phrase, you might need to consider re-generating the entire phrase: abc "def John Cena uvw" xyz to abc ("def John Cena uvw" OR "def Cena John uvw") xyz As a side note, the query parser in LucidWorks Enterprise (and LucidWorks Cloud) does support multi-term synonyms at query time for normal text fields, but it does so by bypassing the processing of the Solr synonym filter and simply using the synonym file to preprocess the query terms before completing the term analysis. But, that won't do you any good if you are not using the Lucid products. -- Jack Krupansky -Original Message- From: Noordeen, Roxy Sent: Thursday, May 03, 2012 9:08 AM To: solr-user@lucene.apache.org Subject: RE: synonyms Jack, I am also using synonyms on the query side, but so far only single words work for me; multi-word synonyms are not working. I didn't want to use synonyms during indexing, to avoid re-indexing. Is there a way for Solr to support multi-word synonyms? Ex: John Cena, John, Cena Or Triple H, DX, tripleh, hhh.
Thanks Roxy -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Wednesday, May 02, 2012 8:53 PM To: solr-user@lucene.apache.org Subject: Re: synonyms There are lots of different strategies for dealing with synonyms, depending on what exactly is most important and what exactly you are willing to tolerate. In your latest example, you seem to be using string fields, which is somewhat different from the text synonyms we talk about in Solr. You can certainly have multiple string fields, or even a multi-valued string field to store variations on selected categories of terms. That works well when you have a well-defined number of categories. So, you can have a user query go against a combination of normal text fields and these category string fields. If that is sufficient for your application, great. -- Jack Krupansky -Original Message- From: Carlos Andres Garcia Sent: Wednesday, May 02, 2012 6:57 PM To: solr-user@lucene.apache.org Subject: RE: synonyms Thanks for your answers. Now I have another question: if I develop a filter to replace the current synonym filter, I understand that this process would run at index time, because at query time there are a lot of known problems. If so, how should I create my index? For example, I have two synonyms for barcelona in the database: Camp Nou and Cataluña. Option 1) At index time I would create 2 records like this: <doc> <field>barcelona</field> <field>Camp Nou</field> ... </doc> and <doc> <field>barcelona</field> <field>Cataluña</field> ... </doc> Option 2) Or I would create only one record like this: <doc> <field>barcelona</field> <field>Camp Nou,Cataluña</field> ... </doc> With option 1, I can search by Camp Nou and by Cataluña, but when I search for barcelona, Solr returns 2 records, and that is an error because barcelona is only one. With option 2, I have to search with wildcards, for example *Camp Nou* or *Cataluña*, and Solr would return one record; the same if I search for barcelona, Solr would return one record, which is good. But I want to know if that is the better option, or whether Solr has another feature that can solve this in a better way.
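Jack's preprocessor suggestion from earlier in the thread can be sketched in a few lines of client-side code. The synonym table and matching strategy here are illustrative assumptions, not Solr's own synonym machinery; the point is only to show the query rewrite happening before the request reaches Solr.

```python
# Sketch: expand multi-term synonyms client-side into an OR'd group of
# quoted phrases, then send the rewritten string to Solr as the query.
# The SYNONYMS table below is a made-up example for this thread.

SYNONYMS = {
    "john cena": ['"John Cena"', '"Cena John"'],
    "triple h": ['"Triple H"', 'DX', 'tripleh', 'hhh'],
}

def expand_synonyms(query: str) -> str:
    """Replace each known multi-term synonym with ("phrase A" OR "phrase B")."""
    result = query
    for phrase, variants in SYNONYMS.items():
        # case-insensitive lookup of the phrase in the raw query string
        idx = result.lower().find(phrase)
        if idx != -1:
            group = "(" + " OR ".join(variants) + ")"
            result = result[:idx] + group + result[idx + len(phrase):]
    return result

print(expand_synonyms("abc John Cena xyz"))
# -> abc ("John Cena" OR "Cena John") xyz
```

A real implementation would want tokenization-aware matching rather than substring search, but the shape of the rewrite is the same.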
Re: correct XPATH syntax
ok, not that easy :) I did not test it myself, but it seems that you could use XSL preprocessing with the 'xsl' option in your XPathEntityProcessor: http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1 You could transform the author part as you wish and then import the author field with your actual configuration. Ludovic. - Jouve, France. -- View this message in context: http://lucene.472066.n3.nabble.com/correct-XPATH-syntax-tp3951804p3959397.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: need some help with a multicore config of solr3.6.0+tomcat7. mine reports: Severe errors in solr configuration.
On Wed, May 2, 2012, at 02:16 PM, Robert Petersen wrote: I don't know if this will help but I usually add a dataDir element to each core's solrconfig.xml to point at a local data folder for the core like this: after a bit of digging, your suggestion PLUS a change to the 'lib dir' specifications in each core's solrconfig.xml

vi solrconfig.xml
...
<lib dir="./lib">
  <lib dir="./contrib/extraction/lib">
    <lib dir="./contrib/clustering/lib/">
      <lib dir="./contrib/velocity/lib">
      </lib>
    </lib>
  </lib>
</lib>
...
<dataDir>${solr.data.dir:/srv/www/solrbase/data}</dataDir>
...

did the trick. i've a multicore setup working now. thanks! tbh, i'm not at all sure why the *nested* lib ... stanza is used (i just lifted it from an example I found online ...), but it seems to work.
Re: 3.6 browse gui
Your issue may relate to the migration of the Velocity response writer back to contrib that occurred in Solr 3.5. You can read about it here: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201202.mbox/%3ccb50294a.2434%25dipti.srivast...@apollogrp.edu%3E And in the 3.5 release notes: http://wiki.apache.org/solr/Solr3.5 -- Jack Krupansky -Original Message- From: Aleksander Akerø Sent: Thursday, May 03, 2012 8:14 AM To: solr-user@lucene.apache.org Subject: 3.6 browse gui Hi I’ve started using solr 3.6 and would like to use the /browse requestHandler as i normally do. But it just gives me some lazyloading error when trying to reach /solr/browse. This would normally work in solr 3.4. So my question is what setup is needed for the velocity responseWriter to work? / AA
Re: Solr for routing a webapp
Why not pass the parameters using ?parameter1=value1&parameter2=value2 ? mvg, Jasper On Thu, Apr 26, 2012 at 9:03 PM, Paul Libbrecht p...@hoplahup.net wrote: Or write your own query component mapping /solr/* in the web.xml, exposing the request by a thread-local through a filter, and reading this to set the appropriate query parameters... Performance-wise, this seems quite reasonable I think. paul Le 26 avr. 2012 à 16:58, Paul Libbrecht a écrit : Have you tried using mod_rewrite for this? paul Le 26 avr. 2012 à 15:16, Björn Zapadlo a écrit : Hello, I'm thinking about using a Solr index for routing a webapp. I have pregenerated base urls in my index. E.g. /foo/bar1 /foo/bar2 /foo/bar3 /foo/bar4 /bar/foo1 /bar/foo2 /bar/foo3 I try to find a way to match /foo/bar3/parameter1/value1/parameter2/value2 without knowing that parameter and value are not part of the base url. In fact I need the best hit from the beginning. Is that possible, and are there any performance issues? I hope my problem is understandable! Thanks in advance and best regards, Bjoern
Re: get a total count
On 5/1/2012 8:57 AM, Rahul R wrote: Hello, A related question on this topic. How do I programmatically find the total number of documents across many shards? For EmbeddedSolrServer, I use the following command to get the total count: solrSearcher.getStatistics().get("numDocs") With distributed search, how do I get the count of all records in all shards? Apart from doing a *:* query, is there a way to get the total count? I am not able to use the same command above because I am not able to get a handle to the SolrIndexSearcher object with distributed search. The conf and data directories of my index reside directly under a folder called solr (no core) under the weblogic domain directly. I don't have a SolrCore object. With EmbeddedSolrServer, I used to get the SolrIndexSearcher object using the following call: solrSearcher = (SolrIndexSearcher)SolrCoreObject.getSearcher().get(); A *:* query with rows=0 is how I get a total document count. The program that does this most often is Perl using LWP, but I'm pretty sure I could do the same thing with the Commons server in SolrJ. I've never used the embedded server. I do not specify the shards parameter on my requests; I query a special core that has the shards parameter in solrconfig.xml. Thanks, Shawn
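Shawn's *:* with rows=0 approach works from any client, since the total lands in numFound of the response header rather than in the docs. A sketch of the parsing side, against a canned wt=json response (the HTTP request itself is elided; the numbers are made up):

```python
import json

def total_docs(response_body: str) -> int:
    """Extract the total hit count (numFound) from a Solr JSON response."""
    return json.loads(response_body)["response"]["numFound"]

# Shape of a wt=json response to q=*:*&rows=0 (values illustrative):
canned = '{"responseHeader":{"status":0},"response":{"numFound":35000,"start":0,"docs":[]}}'
print(total_docs(canned))  # 35000
```

With rows=0 no documents are fetched or scored, so the query is cheap even on a large distributed index.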
Re: Lucene FieldCache - Out of memory exception
Just for a baseline, how much memory is available in the JVM (using jconsole or something similar) before you do your first query, and then after your first query (that has these 50-70 facets), and then after a few different queries (different facets)? Just to see how close you are to the edge even before a volume of queries starts coming in. -- Jack Krupansky -Original Message- From: Rahul R Sent: Thursday, May 03, 2012 1:28 AM To: solr-user@lucene.apache.org Subject: Re: Lucene FieldCache - Out of memory exception Jack, Yes, the queries work fine till I hit the OOM. The fields that start with S_* are strings, F_* are floats, I_* are ints and so on. The dynamic field definitions from schema.xml:

<dynamicField name="S_*" type="string" indexed="true" stored="true" omitNorms="true"/>
<dynamicField name="I_*" type="sint" indexed="true" stored="true" omitNorms="true"/>
<dynamicField name="F_*" type="sfloat" indexed="true" stored="true" omitNorms="true"/>
<dynamicField name="D_*" type="date" indexed="true" stored="true" omitNorms="true"/>
<dynamicField name="B_*" type="boolean" indexed="true" stored="true" omitNorms="true"/>

*Each FieldCache will be an array with maxdoc entries (your total number of documents - 1.4 million) times the size of the field value or whatever a string reference is in your JVM* So if I understand correctly - every field (dynamic or normal) will have its own field cache. The size of the field cache for any field will be (maxDocs * sizeOfField)? If the field has only 100 unique values, will it occupy (100 * sizeOfField) or will it still be (maxDocs * sizeOfField)? *Roughly what is the typical or average length of one of your facet field values? And, on average, how many unique terms are there within a typical faceted field?* Each field length may vary from 10 - 30 characters. Average of 20 maybe. Number of unique terms within a faceted field will vary from 100 - 1000. Average of 300. How will the number of unique terms affect performance?
*3 GB sounds like it might not be enough for such heavy use of faceting. It is probably not the 50-70 number, but the 440 or accumulated number across many queries that pushes the memory usage up* I am using jdk1.5.0_14 - 32 bit. With a 32-bit JDK, I think there is a limitation that more RAM cannot be allocated. *When you hit OOM, what does the Solr admin stats display say for FieldCache?* I don't have Solr deployed as a separate web app. All Solr jar files are present in my webapp's WEB-INF\lib directory. I use EmbeddedSolrServer. So is there a way I can get this information that the admin would show? Thank you for your time. -Rahul On Wed, May 2, 2012 at 5:19 PM, Jack Krupansky j...@basetechnology.com wrote: The FieldCache gets populated the first time a given field is referenced as a facet and then will stay around forever. So, as additional queries get executed with different facet fields, the number of FieldCache entries will grow. If I understand what you have said, these faceted queries do work initially, but after a while they stop working with OOM, correct? The size of a single FieldCache depends on the field type. Since you are using dynamic fields, it depends on your dynamicField types - which you have not told us about. From your query I see that your fields start with S_ and F_ - presumably you have dynamic field types S_* and F_*? Are they strings, integers, floats, or what? Each FieldCache will be an array with maxdoc entries (your total number of documents - 1.4 million) times the size of the field value or whatever a string reference is in your JVM. String fields will take more space than numeric fields for the FieldCache, since a separate table is maintained for the unique terms in that field. Roughly what is the typical or average length of one of your facet field values? And, on average, how many unique terms are there within a typical faceted field?
If you can convert many of these faceted fields to simple integers the size should go down dramatically, but that depends on your application. 3 GB sounds like it might not be enough for such heavy use of faceting. It is probably not the 50-70 number, but the 440 or accumulated number across many queries that pushes the memory usage up. When you hit OOM, what does the Solr admin stats display say for FieldCache? -- Jack Krupansky -Original Message- From: Rahul R Sent: Wednesday, May 02, 2012 2:22 AM To: solr-user@lucene.apache.org Subject: Re: Lucene FieldCache - Out of memory exception Here is one sample query that I picked up from the log file : q=*%3A*&fq=Category%3A%223__107%22&fq=S_P1540477699%3A%22MICROCIRCUIT%2C+LINE+TRANSCEIVERS%22&rows=0&facet=true&facet.mincount=1&facet.limit=2&facet.field=S_C1503120369&facet.field=S_P1406389942&facet.field=S_P1430116878&facet.field=S_P1430116881&facet.field=S_P1406453552&facet.field=S_P1406451296&facet.field=S_P1406452465&facet.field=S_C2968809156&facet.field=S_
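Jack's sizing description lends itself to a back-of-envelope estimate: one entry per document (roughly an ord/reference) plus storage for the unique term values. The per-entry and per-string overhead constants below are rough assumptions, not measured JVM numbers, but the order of magnitude is instructive with Rahul's figures.

```python
# Rough FieldCache size for one faceted string field: an ord/reference
# array with maxdoc entries, plus the unique term values themselves.
# Overhead constants are guesses for a 32-bit JVM, not measurements.

def fieldcache_bytes(maxdoc, unique_terms, avg_term_chars,
                     ref_bytes=4, char_bytes=2, per_string_overhead=40):
    ord_array = maxdoc * ref_bytes
    terms = unique_terms * (avg_term_chars * char_bytes + per_string_overhead)
    return ord_array + terms

# Rahul's numbers: 1.4M docs, ~300 unique terms of ~20 chars per field.
per_field = fieldcache_bytes(1_400_000, 300, 20)
print(per_field)                 # about 5.6 MB per faceted field
print(per_field * 440 / 2**30)   # roughly 2.3 GiB across 440 accumulated caches
```

The dominant term is the maxdoc-sized array, which is why the unique-term count barely matters for memory here, and why 440 accumulated caches plausibly exhausts a 3 GB heap.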
Re: need some help with a multicore config of solr3.6.0+tomcat7. mine reports: Severe errors in solr configuration.
I've never seen lib directives nested, I doubt they're necessary and it's vaguely possible that this is not intentionally supported. I'd try un-nesting them personally. Best Erick On Thu, May 3, 2012 at 10:35 AM, loc...@mm.st wrote: On Wed, May 2, 2012, at 02:16 PM, Robert Petersen wrote: I don't know if this will help but I usually add a dataDir element to each cores solrconfig.xml to point at a local data folder for the core like this: after a bit of digging, your suggestion PLUS a change to the 'lib dir' specifications in each core's solrconfig.xml vi solrconfig.xml ... lib dir=./lib lib dir=./contrib/extraction/lib lib dir=./contrib/clustering/lib/ lib dir=./contrib/velocity/lib /lib /lib /lib /lib ... dataDir${solr.data.dir:/srv/www/solrbase/data}/dataDir ... did the trick. i've a multicore setup working now. thanks! tbh, i'm not at all sure why the *nested* lib ... stanza is used (i just lifted it from an example I found online ...), but it seems to work.
Re: should slave replication be turned off / on during master clean and re-index?
On 5/1/2012 6:55 AM, geeky2 wrote: you said, you don't use autocommit. if so - then why don't you use / like autocommit? It's not really that I don't like it, I just don't need it. I think that it actually caused me problems when I first started using Solr (1.4.0), but that's been long enough ago that I no longer remember. I use the live/build core method, so I do not need to be able to search the documents as they are being added. A commit at the end is good enough. It already creates multiple Lucene segments when ramBufferSizeMB fills up. I used to use the dataimporter for everything, with a Perl-based build system using cron and LWP. Now I have a multi-threaded SolrJ application that only use the importer for full rebuilds, which are very rare. Because I could not do replication between 1.4.1 and 3.x, I had to abandon replication in order to upgrade Solr. The new build program updates both of my index chains in parallel. Thanks, Shawn
Re: need some help with a multicore config of solr3.6.0+tomcat7. mine reports: Severe errors in solr configuration.
On Thu, May 3, 2012, at 11:10 AM, Erick Erickson wrote: I've never seen lib directives nested, I doubt they're necessary and it's vaguely possible that this is not intentionally supported. I'd try un-nesting them personally. changing to,

<lib dir="./lib"></lib>
<lib dir="./contrib/extraction/lib"></lib>
<lib dir="./contrib/clustering/lib/"></lib>
<lib dir="./contrib/velocity/lib"></lib>

still works, doesn't appear to change any behavior -- detrimentally or otherwise -- and makes more sense to me anyway. sounds like 'a keeper'. thanks.
Re: Implementing multiterm chain for ICUCollationKeyFilterFactory
Hi, Thanks for the information. Steve, the XML is visible in Nabble itself, but that's not a solution for people receiving the mails. Robert, I tried to implement the factory to deal with German umlauts and such, but am now back with an adapted <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>. This should work. Following the wiki on Unicode collation (http://wiki.apache.org/solr/UnicodeCollation) I had used ICUCollationKeyFilterFactory for search; obviously, it should only be used for special cases - I cannot think of any at the moment. Thanks again Oliver -- View this message in context: http://lucene.472066.n3.nabble.com/Implementing-multiterm-chain-for-ICUCollationKeyFilterFactory-tp3959241p3959634.html Sent from the Solr - User mailing list archive at Nabble.com.
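A sketch of a field type along those lines, with accent folding done by MappingCharFilterFactory; the type name and exact chain are illustrative. Since the char filter is applied during analysis (and MappingCharFilterFactory is MultiTermAware in recent Solr), wildcard and prefix queries get the same folding as indexed terms:

```xml
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- fold accents (ä -> a, é -> e, ...) before tokenizing -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```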
Re: Parent-Child relationship
Erick, Generally I agree, but could you please expand on your "is a refinement"? What does it mean? I suggested SOLR-3076 because index time had been mentioned. On Thu, May 3, 2012 at 5:35 PM, Erick Erickson erickerick...@gmail.com wrote: Solr join has been implemented for quite some time, see: https://issues.apache.org/jira/browse/SOLR-2272 but only on trunk. 3076 is a refinement as I understand it. FWIW Erick On Thu, May 3, 2012 at 3:01 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello, Here are my favorite ones: http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html https://issues.apache.org/jira/browse/SOLR-3076 On Thu, May 3, 2012 at 10:17 AM, tamanjit.bin...@yahoo.co.in tamanjit.bin...@yahoo.co.in wrote: Hi, I just wanted to get some information about whether the Parent-Child relationship between documents which Lucene has been talking about has been implemented in Solr or not? I know the join patch is available; would that be the only solution? And another question: as and when this becomes possible (if it's not done already), would such functionality (whether join or defining such relations at index time) be available across different cores? -- View this message in context: http://lucene.472066.n3.nabble.com/Parent-Child-relationship-tp3958259.html Sent from the Solr - User mailing list archive at Nabble.com. -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
DataImportHandler only importing 3 fields on all entities
I have a data-config.xml declaring some entities, and no matter what fields I declare in the entities, the only ones it will index are id, name, and description. So fields like firstname, email, url don't appear in the index. They also don't appear in the schema browser. Am I doing something wrong? What is so special about id, name, and description that they always appear?

<document>
  <entity name="Project" processor="SqlEntityProcessor" query="select * from project">
    <field column="projectid" name="id" />
    <field column="name" name="name" />
    <field column="description" name="description" />
  </entity>
  <entity name="Person" processor="SqlEntityProcessor" query="select * from person">
    <field column="firstname" name="firstName" />
    <field column="lastname" name="lastName" />
    <field column="email" name="email" />
    <field column="phonenumber" name="phoneNumber" />
    <field column="login" name="login" />
    <field column="displayname" name="displayName" />
    <field column="updateduser" name="updatedUser" />
  </entity>
  <entity name="Script" processor="SqlEntityProcessor" query="select * from script">
    <field column="scriptid" name="id" />
    <field column="name" name="name" />
    <field column="description" name="description" />
    <field column="url" name="url" />
  </entity>
</document>
Re: Searching by location – What do I send to Solr?
You're on the right track. Solr knows nothing about converting post codes to lat/lon; you have to do that outside the request and submit a standard distance query. Of course this is a bit interesting. I assume the post codes aren't perfectly circular (or rectangular for that matter), so you'll get some data you don't expect. Is this OK? Best Erick On Thu, May 3, 2012 at 10:01 AM, Spadez james_will...@hotmail.com wrote: Hi, I'm finding it a bit hard to get my head around this. Say I am putting items on a map. This is how I am thinking it would work: A user submits an item and specifies the location as London On submission, I run a process to convert London to a Long/Lat which is stored in the database However, when someone then goes onto my site and searches for the postcode SE1 2EL, which is in London, how does Solr know the long / lat of this? Do I have to query Solr by using Long / Lat, in which case, when a user submits a request for SE1 2EL I have to convert this to Long / Lat first and then send this query to Solr? I hope I have explained myself well, I would really like to know how others have solved this. James -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-by-location-What-do-I-send-to-Solr-tp3959296.html Sent from the Solr - User mailing list archive at Nabble.com.
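The two-step flow Erick describes can be sketched as follows: geocode the user's input outside Solr (post code to lat/lon, via whatever geocoding service you use), then issue a standard geofilt distance filter. The field name "store" and the coordinates are illustrative, and this assumes a Solr 3.x+ spatial (LatLonType) field:

```python
from urllib.parse import urlencode

def distance_query(lat, lon, km, field="store"):
    """Build a Solr spatial request URL for 'within km of (lat, lon)'."""
    params = {
        "q": "*:*",
        "fq": "{!geofilt}",    # distance filter around pt
        "sfield": field,       # the indexed location field
        "pt": f"{lat},{lon}",  # centre point, from your geocoder
        "d": str(km),          # radius in kilometres
    }
    return "/solr/select?" + urlencode(params)

# e.g. a geocoder resolved "SE1 2EL" to roughly 51.5, -0.08:
print(distance_query(51.5, -0.08, 10))
```

The geocoding step is the part Solr does not do for you; everything after it is an ordinary filtered query.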
Re: DataImportHandler only importing 3 fields on all entities
Those three field names are already in the Solr example schema. Either manually add your desired fields to the schema, change their names (column vs. sourceColName) to fields that do exist in your Solr schema, give them names that end with one of the dynamicField suffixes (such as *_s), or enable the * dynamicField rule in the Solr schema. -- Jack Krupansky -Original Message- From: Parmeley, Michael Sent: Thursday, May 03, 2012 1:39 PM To: solr-user@lucene.apache.org Subject: DataImportHandler only importing 3 fields on all entities I have a data-config.xml declaring some entities and no matter what fields I declare in the entities the only ones it will index are id, name, and description. So fields like firstname, email, url don't appear in the index. They also don't appear in the schema browser. Am I doing something wrong? What is so special about id, name, and description that they always appear?
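Two of Jack's fixes can be sketched like this in schema.xml; the field names mirror the data-config from the question, and the types are illustrative:

```xml
<!-- option 1: declare each DIH target field explicitly -->
<field name="firstName" type="string" indexed="true" stored="true" />
<field name="email" type="string" indexed="true" stored="true" />
<field name="url" type="string" indexed="true" stored="true" />

<!-- option 2: a catch-all dynamicField so undeclared columns are kept -->
<dynamicField name="*" type="string" indexed="true" stored="true" />
```

Fields the DIH emits that match nothing in the schema are silently dropped, which is why only id, name, and description (already in the example schema) were showing up.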
solr: how to change display name of a facet?
Hello, Is there a way to change the display name (that contains spaces or special characters) for a facet without changing the value of the facet field? For example, if my facet field name is 'category', I want to change the display name of the facet to 'Categories and Stuff'. I've experimented with this: <str name="facet.field">{!ex=dt key=Categories and Stuff}category</str> I'm not really sure what 'ex=dt' does, but it appears that 'key' sets the desired display name? If there are spaces in the 'key' value, the display name gets cut off. What am I doing wrong? Any help is greatly appreciated.
Why is clean option of dataImportHandler not taking effect?
Hi, I am using the solr dataImportHandler for doing data import. Since the amount of data is large and changes frequently, I used the suggested approach described in http://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport for delta import. However I noticed that after the first full-import, the second time round when I issued the following to trigger the delta import it did a full-import again. Only the second time the command is issued does it do the delta. Does anyone have the same experience and know why it is doing that? http://localhost:8080/solr/test_db/dataimport?command=full-import&clean=false Also I tried to see the query used by dynamically turning the log level up to finest using the admin, but nothing was logged or displayed in the console. Is there a way to see the query used? In my db-data-config.xml below is what I configured:

select ma.EID as ID, ms.NAME
from table1 ms, table2 ma
where ms.number is not null
  and ms.number = ma.number
  and ('${dataimporter.request.clean}' <> 'false'
       or ma.lastupdatedate > TO_DATE('${dataimporter.last_index_time}', 'YYYY-MM-DD HH24:MI:SS'))

Thanks
Re: solr: how to change display name of a facet?
On Thu, May 3, 2012 at 2:26 PM, okayndc bodymo...@gmail.com wrote: [...] I've experimented with this: <str name="facet.field">{!ex=dt key=Categories and Stuff}category</str> I'm not really sure what 'ex=dt' does but it's obvious that 'key' is the desired display name? If there are spaces in the 'key' value, the display name gets cut off. What am I doing wrong? http://wiki.apache.org/solr/LocalParams "For a non-simple parameter value, enclose it in single quotes" ex excludes filters tagged with a value. See http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
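Following Yonik's pointer, the fix for the original question is to wrap the multi-word key in single quotes. A small sketch of how the corrected parameter would be URL-encoded and sent (the facet field and key come from the question; everything else is illustrative):

```python
from urllib.parse import urlencode

# Per the LocalParams wiki: a local-param value containing spaces must be
# enclosed in single quotes, otherwise it is cut off at the first space.
facet_field = "{!ex=dt key='Categories and Stuff'}category"

params = urlencode({
    "q": "*:*",
    "facet": "true",
    "facet.field": facet_field,  # the whole local-params prefix is one value
})
```

The facet then comes back in the response keyed as "Categories and Stuff" instead of "category".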
Re: solr: how to change display name of a facet?
Awesome, thanks! On Thu, May 3, 2012 at 2:32 PM, Yonik Seeley yo...@lucidimagination.comwrote: On Thu, May 3, 2012 at 2:26 PM, okayndc bodymo...@gmail.com wrote: [...] I've experimented with this: <str name="facet.field">{!ex=dt key=Categories and Stuff}category</str> I'm not really sure what 'ex=dt' does but it's obvious that 'key' is the desired display name? If there are spaces in the 'key' value, the display name gets cut off. What am I doing wrong? http://wiki.apache.org/solr/LocalParams "For a non-simple parameter value, enclose it in single quotes" ex excludes filters tagged with a value. See http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: access document by primary key
Is this still true? Assuming that I know that there haven't been updates, or that I don't care to see a different version of the document, are the term QP or the raw QP faster than the real-time get handler? On Fri, Mar 11, 2011 at 3:12 PM, Yonik Seeley yo...@lucidimagination.comwrote: On Fri, Mar 11, 2011 at 5:58 PM, onlinespend...@gmail.com onlinespend...@gmail.com wrote: what's the quickest and most efficient way to access a doc by its primary key? suppose I already know a document's unique id and simply want to fetch it without issuing a sophisticated query. Bypassing the normal lucene query parser does give a speed up. If your id field is of type string, try the raw query parser; or if you're on trunk, try the term query parser for other field types (like numerics) that need a translation from external to internal format. Example: q={!raw f=id}MYDOCUMENTID -Yonik http://lucidimagination.com
Re: access document by primary key
On Thu, May 3, 2012 at 3:01 PM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: Is this still true? Assuming that I know that there haven't been updates, or that I don't care to see a different version of the document, are the term QP or the raw QP faster than the real-time get handler? Sort of different things... query parsers only parse queries, not execute them. If you're looking for documents by ID though, the realtime-get handler should be the fastest, esp in a distributed setup. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
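The two options Yonik mentions can be sketched as URL builders. The handler paths here follow the Solr 4 example solrconfig.xml (/get for realtime get, /select for queries); adjust them to your own configuration.

```python
from urllib.parse import urlencode

def realtime_get_url(base, doc_id):
    # Realtime get handler: fetch by unique key, sees uncommitted docs too.
    return f"{base}/get?{urlencode({'id': doc_id})}"

def term_query_url(base, doc_id, field='id'):
    # Term query parser: bypasses full query parsing for an exact-key lookup.
    return f"{base}/select?{urlencode({'q': '{!term f=%s}%s' % (field, doc_id)})}"
```

For a plain by-ID fetch the realtime-get handler is the one to prefer, per the reply above.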
Re: Searching by location – What do I send to Solr?
Hi, This is quite a challenge. I know there are situations when you can get by with the google maps api or similar, but they limit the number of requests and I need more than that; unfortunately for the full service they charge a fortune! So, going back to my question, does anyone have any ideas or suggestions of a good solution? Search for London -> *Convert London to Long/Lat* -> Send Query to Solr -> Return Query -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-by-location-What-do-I-send-to-Solr-tp3959296p3960231.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Searching by location – What do I send to Solr?
Have you tried www.geonames.org ? - Michael On Thu, 2012-05-03 at 12:20 -0700, Spadez wrote: Hi, This is quite a challenge. I know there are situations when you can get by with google maps api or similar, but they limit the number of requests and I need more than that, unfortunatly for the full service they charge a fortune! So, going back to my question, does anyone have any ideas or suggestions of a good solution? Search for London-*Convert London to Long/Lat*-Send Query to Solr-Return Query -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-by-location-What-do-I-send-to-Solr-tp3959296p3960231.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: EmbeddedSolrServer and StreamingUpdateSolrServer
Hi, Can someone officially confirm that it is not supported by current Solr version to use both EmbeddedSolrServer(For Full indexing) and StreamingUpdateSolrServer(For Incremental indexing ) to update the same index? How can I request for enhancement in the next version? I think that this requirement is valid and very useful; Any disagreements? Thanks, PC Rao. -- View this message in context: http://lucene.472066.n3.nabble.com/EmbeddedSolrServer-and-StreamingUpdateSolrServer-tp3889073p3960266.html Sent from the Solr - User mailing list archive at Nabble.com.
Sorting result first which come first in sentance
Hi all, I need a suggestion. I have many titles like:
1 bomb blast in kabul
2 kabul bomb blast
3 3 people killed in serial bomb blast in kabul
I want the 2nd result to come first when a user searches for kabul, because kabul is in the 1st position in that sentence. Similarly, the 1st result should come 2nd and the 3rd should come last. Please suggest how to implement this. Regards Jonty
synonyms
Thanks a lot to help me to find one solution. I am going to use multi-valued string field. regards, Carlos Andres Garcia Garcia @grayknight14
Re: DataImportHandler only importing 3 fields on all entities
I discovered the schema.xml file about 2 minutes before I got your response. It was very enlightening :-) Thanks for the tips about dynamicFields! On May 3, 2012, at 1:02 PM, Jack Krupansky wrote: Those three field names are already in the Solr example schema. Either manually add your desired fields to the schema, change their names (column vs. sourceColName) to fields that do exist in your Solr schema, give them names that end with one of the dynamicField suffixes (such as *_s), or enable the * dynamicField rule in the Solr schema. -- Jack Krupansky -Original Message- From: Parmeley, Michael Sent: Thursday, May 03, 2012 1:39 PM To: solr-user@lucene.apache.org Subject: DataImportHandler only importing 3 fields on all entities I have a data-config.xml declaring some entities and no matter what fields I declare in the entities the only ones it will index are id, name, and description. So fields like firstname, email, url don't appear in the index. They also don't appear in the schema browser. Am I doing something wrong? What is so special about id, name, and description that they always appear?

<document>
  <entity name="Project" processor="SqlEntityProcessor" query="select * from project">
    <field column="projectid" name="id" />
    <field column="name" name="name" />
    <field column="description" name="description" />
  </entity>
  <entity name="Person" processor="SqlEntityProcessor" query="select * from person">
    <field column="firstname" name="firstName" />
    <field column="lastname" name="lastName" />
    <field column="email" name="email" />
    <field column="phonenumber" name="phoneNumber" />
    <field column="login" name="login" />
    <field column="displayname" name="displayName" />
    <field column="updateduser" name="updatedUser" />
  </entity>
  <entity name="Script" processor="SqlEntityProcessor" query="select * from script">
    <field column="scriptid" name="id" />
    <field column="name" name="name" />
    <field column="description" name="description" />
    <field column="url" name="url" />
  </entity>
</document>
RE: Searching by location - What do I send to Solr?
this is called geocoding and is properly a subject for GIS types. it can be non trivial and the data you need to set it up may not be cheap. i can't address the UK application, but i am somewhat familiar with the US problem space, and in the US 5 digit postal (zip) codes don't map to discrete locations, they map to bundles of postal delivery routes. you need, i think, to research how UK postal codes actually work and what data sources are available so that you can frame your problem appropriately. richard -Original Message- From: Spadez [mailto:james_will...@hotmail.com] Sent: Thu 5/3/2012 3:20 PM To: solr-user@lucene.apache.org Subject: Re: Searching by location - What do I send to Solr? Hi, This is quite a challenge. I know there are situations when you can get by with the google maps api or similar, but they limit the number of requests and I need more than that; unfortunately for the full service they charge a fortune! So, going back to my question, does anyone have any ideas or suggestions of a good solution? Search for London -> *Convert London to Long/Lat* -> Send Query to Solr -> Return Query -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-by-location-What-do-I-send-to-Solr-tp3959296p3960231.html Sent from the Solr - User mailing list archive at Nabble.com.
how to present html content in browse
I am indexing records from database using DIH. The content of my record is in html format. When I use browse I would like to show the content in html format, not in text format. Any ideas? -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-present-html-content-in-browse-tp3960327.html Sent from the Solr - User mailing list archive at Nabble.com.
how to limit solr indexing to specific number of rows
I am doing database import using solr DIH. I would like to limit the solr indexing to a specific number. In other words, if Solr reaches indexing 100 records I want the database import to stop importing. Not sure if there is any particular setting that would tell solr that I only want to import 100 rows from the database and index those 100 records. I tried to give a select query with ROMNUM=100 (using oracle) in data-config.xml, but it gave an error. Any ideas!!! Thanks in Advance Srini -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-limit-solr-indexing-to-specific-number-of-rows-tp3960344.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to limit solr indexing to specific number of rows
Hi, What is the error that you are getting? ROWNUM works fine with DIH, I have tried and tested it with Solr 3.1. One thing that comes to my mind is the query that you are using to implement ROWNUM. Did you replace the < in the query by a &lt; in data-config.xml? Like ROWNUM &lt;= 100? On Thu, May 3, 2012 at 4:11 PM, srini softtec...@gmail.com wrote: I am doing database import using solr DIH. I would like to limit the solr indexing to specific number. In other words If Solr reaches indexing 100 records I want to database import to stop importing. Not sure if there is any particular setting that would tell solr that I only want to import 100 rows from database and index those 100 records. I tried to give select query with ROMNUM=100 (using oracle) in data-config.xml, but it gave error. Any ideas!!! Thanks in Advance Srini -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-limit-solr-indexing-to-specific-number-of-rows-tp3960344.html Sent from the Solr - User mailing list archive at Nabble.com. -- Thanks and Regards Rahul A. Warawdekar
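If escaping is the problem, a data-config.xml entity along these lines should work (the entity and table names here are illustrative, not from the original post):

<!-- The "<" lives inside an XML attribute, so it must be written as &lt; -->
<entity name="item"
        processor="SqlEntityProcessor"
        query="select * from my_table where ROWNUM &lt;= 100">
  <field column="id" name="id" />
</entity>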
Looking for a Senior Search Architect/Developer - Bloomberg LP (NYC)
Hello All, We are currently looking for a Senior Solr Search Architect to work on a large scale real time search system. Our search system scales to petabytes of data, and its a pretty interesting design problem. You can get more information on the job and apply using the link, put in my name as reference. Feel free to contact me if you have any more questions. C/C++ advanced knowledge is good to have but not a requirement. http://careers.bloomberg.com/hire/jobs/job32519.html thanks, Anirudha Jadhav Sr. Developer Bloomberg LP
Re: Searching by location – What do I send to Solr?
I have heard that GeoNames is a great source for name/location information. They even have UK postal codes: http://www.geonames.org/postal-codes/postal-codes-uk.html -- Jack Krupansky -Original Message- From: Michael Della Bitta Sent: Thursday, May 03, 2012 3:32 PM To: solr-user@lucene.apache.org Subject: Re: Searching by location – What do I send to Solr? Have you tried www.geonames.org ? - Michael On Thu, 2012-05-03 at 12:20 -0700, Spadez wrote: Hi, This is quite a challenge. I know there are situations when you can get by with google maps api or similar, but they limit the number of requests and I need more than that, unfortunatly for the full service they charge a fortune! So, going back to my question, does anyone have any ideas or suggestions of a good solution? Search for London-*Convert London to Long/Lat*-Send Query to Solr-Return Query -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-by-location-What-do-I-send-to-Solr-tp3959296p3960231.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Searching by location – What do I send to Solr?
I discounted geonames to start with but it actually looks pretty good. I may be stretching the limit of my question here, but say I did go with geonames, if I go back to my model and add a bit: Search for London -> Convert London to Long/Lat -> Send Query to Solr -> Return Query Since my main website is coded in Python, but Solr works in Java, if I was to create or use an existing script to allow me to convert London to Long/Lat, would it make more sense for this operation to be done in Python or Java? In Python it would integrate better with my website, but in Java it would integrate better with Solr. Also would one language be more suitable or faster for this kind of operation? Again, I might be pushing the boundaries of what I can ask on here, but if anyone can chime in with their opinion I would really appreciate it. ~ James -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-by-location-What-do-I-send-to-Solr-tp3959296p3960666.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR 3.5 Index Optimization not producing single .cfs file
By default, the default merge policy (TieredMergePolicy) won't create the CFS if the segment is very large (> 10% of the total index size). Likely that's what you are seeing? If you really must have a CFS (how come?) then you can call TieredMergePolicy.setNoCFSRatio(1.0) -- not sure how/where this is exposed in Solr though. LogMergePolicy also has the same behaviour/method... Mike McCandless http://blog.mikemccandless.com On Thu, May 3, 2012 at 5:18 AM, pravesh suyalprav...@yahoo.com wrote: Hi, I've migrated the search servers to the latest stable release (SOLR-3.5) from SOLR-1.4.1. We've fully recreated the index for this. After the index completes, when I'm optimizing the index it is not merging the index into a single .cfs file as was being done with the 1.4.1 version. We've set <useCompoundFile>true</useCompoundFile>. Is it something related to the new MergePolicy being used from SOLR 3.x onwards (I suppose it is TieredMergePolicy with the 3.x version)? If yes, should I change it to LogByteSizeMergePolicy? Does this change require a complete rebuild OR will it apply incrementally? Regards Pravesh -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-3-5-Index-Optimization-not-producing-single-cfs-file-tp3958619.html Sent from the Solr - User mailing list archive at Nabble.com.
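Since Mike notes he's not sure where this is exposed in Solr, the following solrconfig.xml fragment is only a hypothetical sketch: Solr 3.x applies nested merge-policy properties via setters, so setNoCFSRatio(1.0) might be reachable like this. Untested.

<!-- Hypothetical: force compound files even for the largest merged segment -->
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <double name="noCFSRatio">1.0</double>
</mergePolicy>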
Re: Sorting result first which come first in sentance
as for versions below 4.0, it's not possible because of lucene's score model. position information is stored, but only used to support phrase queries. it just tells us whether a document matched; we can't use it to boost a document. A similar problem is: how to implement proximity boost. for 2 search terms, we need to return all docs that contain these 2 terms, but if they occur as a phrase, we give the doc the largest boost. if there is a word between them, we give it a smaller one. if there are 2 words between them, we give it a smaller score still. all these ranking algorithms need a more flexible score model. I don't know whether the latest trunk takes this into consideration. On Fri, May 4, 2012 at 3:43 AM, Jonty Rhods jonty.rh...@gmail.com wrote: Hi all, I need suggetion: I Hi all, I need suggetion: I have many title like: 1 bomb blast in kabul 2 kabul bomb blast 3 3 people killed in serial bomb blast in kabul I want 2nd result should come first while user search by kabul. Because kabul is on 1st postion in that sentance. Similarly 1st result should come on 2nd and 3rd should come last. Please suggest me hot to implement this.. Regard Jonty
Re: Parent-Child relationship
Right. See: http://lucene.apache.org/core/old_versioned_docs/versions/3_4_0/api/contrib-join/org/apache/lucene/search/join/package-summary.html I guess refinement wasn't a good word choice. The basic join stuff has been in Solr for a while (2272), but 3076 refers to exposing functionality that currently exists in Lucene for use in Solr. So depending on what you want to do with joins, it may already be in Solr. Best Erick On Thu, May 3, 2012 at 12:42 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Erick, Generally I agree, but could you please expand on your definition of "refinement"? What does it mean? I suggested SOLR-3076 because index time had been mentioned. On Thu, May 3, 2012 at 5:35 PM, Erick Erickson erickerick...@gmail.comwrote: Solr join has been implemented for quite some time, see: https://issues.apache.org/jira/browse/SOLR-2272 but only on trunk. 3076 is a refinement as I understand it. FWIW Erick On Thu, May 3, 2012 at 3:01 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello, Here is my favorite ones: http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html https://issues.apache.org/jira/browse/SOLR-3076 On Thu, May 3, 2012 at 10:17 AM, tamanjit.bin...@yahoo.co.in tamanjit.bin...@yahoo.co.in wrote: Hi, I just wanted to get some information about whether Parent-Child relationship between documents which Lucene has been talking about has been implemented in Solr or not? I know join patch is available, would that be the only solution? And another question, as and when this will be possible (if its not done already), would such a functionality (whether join or defining such relations at index time) would be available across different cores? -- View this message in context: http://lucene.472066.n3.nabble.com/Parent-Child-relationship-tp3958259.html Sent from the Solr - User mailing list archive at Nabble.com.
-- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Searching by location – What do I send to Solr?
The fact that they're python and java is largely beside the point I think. Solr just sees a URL, the fact that your Python app gets in there first and does stuff with the query wouldn't affect Solr at all. Also, I tend to like keeping Solr fairly lean so any work I can offload to the application I usually do. YMMV Best Erick On Thu, May 3, 2012 at 6:43 PM, Spadez james_will...@hotmail.com wrote: I discounted geonames to start with but it actually looks pretty good. I may be stretching the limit of my question here, but say I did go with geonames, if I go back to my model and add a bit: Search for London-Convert London to Long/Lat-Send Query to Solr-Return Query Since my main website is coded in Python, but Solr works in Java, if I was to create or use an existing script to allow me to convert London to Long/Lat, would it make more sense for this operation to be done in Python or Java? In Python it would integrate better with my website, but in Java it would integrate better with Solr. Also would one language be more suitable or faster for this kind of operation? Again, I might be pushing the boundaries of what I can ask on here, but if anyone can chime in with their opinion I would really appreciate it. ~ James -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-by-location-What-do-I-send-to-Solr-tp3959296p3960666.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting result first which come first in sentance
I am using solr version 3.4
Re: Dynamic core creation works in 3.5.0 fails in 3.6.0: At least one core definition required at run-time for Solr 3.6.0?
Hi Sami: On Thu, May 3, 2012 at 12:34 AM, Sami Siren ssi...@gmail.com wrote: I believe that this particular change was part of https://issues.apache.org/jira/browse/SOLR-1730. The ability to start solr with no cores seems like a reasonable feature so I would classify this as a bug. Not sure what others think about it. Thanks for responding. I will open a ticket and see what the devs have to say. Matthew
Re: how to present html content in browse
Make two fields: one that indexes the stripped HTML and another that stores the full HTML. You can use copyField so that you do not have to submit the html page twice. You would mark the stripped field 'indexed=true stored=false' and the full text field the other way around. The full text field should be a String type. On Thu, May 3, 2012 at 1:04 PM, srini softtec...@gmail.com wrote: I am indexing records from database using DIH. The content of my record is in html format. When I use browse I would like to show the content in html format, not in text format. Any ideas? -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-present-html-content-in-browse-tp3960327.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com
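A sketch of Lance's two-field setup in schema.xml. Field and type names are illustrative; the "text_html" type is assumed to include HTMLStripCharFilterFactory in its analyzer chain so the tags are removed at index time.

<!-- Raw HTML: stored for display in /browse, never searched directly -->
<field name="content_html" type="string" indexed="false" stored="true"/>
<!-- Stripped text: searched, not stored; its analyzer strips the tags -->
<field name="content_text" type="text_html" indexed="true" stored="false"/>
<!-- Index the same submitted value in both fields without sending it twice -->
<copyField source="content_html" dest="content_text"/>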
Re: Sorting result first which come first in sentance
for this version, you may consider using payload for position boost. you can save boost values in payload. I have used it in lucene api where anchor text should weigh more than normal text. but I haven't used it in solr. some searched urls: http://wiki.apache.org/solr/Payloads http://digitalpebble.blogspot.com/2010/08/using-payloads-with-dismaxqparser-in.html On Fri, May 4, 2012 at 9:51 AM, Jonty Rhods jonty.rh...@gmail.com wrote: I am using solr version 3.4
Re: Solr Merge during off peak times
On 5/2/2012 5:54 AM, Prakashganesh, Prabhu wrote: We have a fairly large scale system - about 200 million docs and fairly high indexing activity - about 300k docs per day with peak ingestion rates of about 20 docs per sec. I want to work out what a good mergeFactor setting would be by testing with different mergeFactor settings. I think the default of 10 might be high, I want to try with 5 and compare. Unless I know when a merge starts and finishes, it would be quite difficult to work out the impact of changing mergeFactor. I want to be able to measure how long merges take, run queries during the merge activity and see what the response times are etc.. With a lot of indexing activity, if you are attempting to avoid large merges, I would think you would want a higher mergeFactor, not a lower one, and do occasional optimizes during non-peak hours. With a small mergeFactor, you will be merging a lot more often, and you are more likely to encounter merges of already-merged segments, which can be very slow. My index is nearing 70 million documents. I've got seven shards - six large indexes with about 11.5 million docs each, and a small index that I try to keep below half a million documents. The small index contains the newest documents, between 3.5 and 7 days worth. With this setup and the way I manage it, large merges pretty much never happen. Once a minute, I do an update cycle. This looks for and applies deletions, reinserts, and new document inserts. New document inserts happen only on the small index, and there are usually a few dozen documents to insert on each update cycle. Deletions and reinserts can happen on any of the seven shards, but there are not usually deletions and reinserts on every update cycle, and the number of reinserts is usually very very small. Once an hour, I optimize the small index, which takes about 30 seconds. Once a day, I optimize one of the large indexes during non-peak hours, so every large index gets optimized once every six days. 
This takes about 15 minutes, during which deletes and reinserts are not applied, but new document inserts continue to happen. My mergeFactor is set to 35. I wanted a large value here, and this particular number has a side effect -- uniformity in segment filenames on the disk during full rebuilds. Lucene uses a base-36 segment numbering scheme. I usually end up with less than 10 segments in the larger indexes, which means they don't do merges. The small index does do merges, but I have never had a problem with those merges going slowly. Because I do occasionally optimize, I am fairly sure that even when I do have merges, they happen with 35 very small segment files, and leave the large initial segment alone. I have not tested this theory, but it seems the most sensible way to do things, and I've found that Lucene/Solr usually does things in a sensible manner. If I am wrong here (using 3.5 and its improved merging), I would appreciate knowing. Thanks, Shawn
Re: [Solr 4.0] soft commit with API of Solr 4.0
: Is there way to perform soft commit from code in Solr 4.0 ? : Is it possible only from solrconfig.xml through enabling autoSoftCommit : with maxDocs and/or maxTime attributes? http://wiki.apache.org/solr/NearRealtimeSearch links to: http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22 Mentions: softCommit=true -Hoss
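Per the UpdateXmlMessages wiki page Hoss links to, the soft commit can be requested in the update message itself, for example:

<commit softCommit="true"/>

softCommit=true can also be passed as a request parameter on the /update handler.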
Re: Sorting result first which come first in sentance
: 1 bomb blast in kabul : : 2 kabul bomb blast : : 3 3 people killed in serial bomb blast in kabul ... : I want 2nd result should come first while user search by kabul. : : Because kabul is on 1st postion in that sentance. Similarly 1st result : should come on 2nd and 3rd should come last. One way to implement this would be with a SpanFirstQuery, but I don't believe there are any QParsers in Solr (out of the box) that produce SpanFirstQueries, so you'd have to customize. An alternative approach that people have been doing since long before Span Queries existed is to index a special marker token (that won't ever appear in your text) at the beginning of your field values, and then search for a sloppy phrase containing your marker token and the word your user is looking for -- because sloppy phrase queries inherently score documents higher the closer the terms are to each other. So if you choose $^$^$ as your marker token, your three sentences would become... 1: $^$^$ bomb blast in kabul 2: $^$^$ kabul bomb blast 3: $^$^$ 3 people killed in serial bomb blast in kabul And your query would be something like "$^$^$ kabul"~1000 -Hoss
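A small sketch of the marker-token scheme Hoss describes; the helper names are mine, not from the post, and the marker value is the one used in his example.

```python
# Any token that can never occur in real text works as the marker. Documents
# are indexed with it prepended; the query becomes a sloppy phrase, so titles
# where the search term sits nearer the start (i.e. nearer the marker) score
# higher.
MARKER = "$^$^$"

def index_value(title):
    # Value to send to Solr at index time
    return f"{MARKER} {title}"

def first_position_query(term, slop=1000):
    # Sloppy phrase: smaller distance marker->term => higher score
    return f'"{MARKER} {term}"~{slop}'
```

Applied to the example titles, "kabul bomb blast" would be indexed as "$^$^$ kabul bomb blast" and would outscore the others for the query built by first_position_query("kabul").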
Re: Sorting result first which come first in sentance
A Lucene SpanFirstQuery (with a boost) would do it, but you'd have to find a query parser that supports it, and most don't. You could also keep a copy of the title as a string field and then use a trailing wildcard to check if the title begins with a term and boost it: title_s:Kabul*^2.0 -- Jack Krupansky -Original Message- From: Jonty Rhods Sent: Thursday, May 03, 2012 3:43 PM To: solr-user@lucene.apache.org Subject: Sorting result first which come first in sentance Hi all, I need suggetion: I Hi all, I need suggetion: I have many title like: 1 bomb blast in kabul 2 kabul bomb blast 3 3 people killed in serial bomb blast in kabul I want 2nd result should come first while user search by kabul. Because kabul is on 1st postion in that sentance. Similarly 1st result should come on 2nd and 3rd should come last. Please suggest me hot to implement this.. Regard Jonty
Faceting on a date field multiple times
Hi. I would like to be able to do a facet on a date field, but with different ranges (in a single query). For example, I would like to show:
- #documents by day for the last week
- #documents by week for the last couple of months
- #documents by year for the last several years
Is there a way to do this without hitting solr 3 times? thanks Ian