Re: Solr: How to index range-pair fields?
Sorry Venkat, this is pushing beyond my immediate knowledge. You'd just need to experiment. But the document still looks a bit wrong; specifically, I don't understand where those extra 366 values are coming from. It should just be two-dimensional coordinates: the first for the start of the range, the second for the end. You seem to have 2 extra useless ones.

Regards,
Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

On 21 August 2015 at 21:29, vaedama sudheer.u...@gmail.com wrote:

Alexandre,

Fantastic answer! I think having a start position would work nicely with my use-case :) Also, I would prefer to do the date math during indexing.

*Question #1:* Can you please tell me if this doc looks correct (given that I am not yet bothered about factoring the year into my use-case)?

Student X was `absent` between these dates:
Jan 1, 2015 and Jan 15, 2015
Feb 13, 2015 and Feb 16, 2015 (assuming that Feb 13 is the 43rd day of 2015 and Feb 16 is the 46th)
March 19, 2015 and March 25, 2015

Also, X was `present` between these dates:
Jan 25, 2015 and Jan 30, 2015
Feb 1, 2015 and Feb 12, 2015

{
  id: X,
  state: [absent, present],
  presentDays: [ [01, 15, 366, 366], [43, 46, 366, 366], [78, 84, 366, 366] ],
  absentDays: [ [25, 30, 366, 366], [32, 43, 366, 366] ]
}

*Question #2:* Since I need timestamp-level granularity, what is the appropriate way to store the field?

Student X was `absent` between epoch times: 1420104600 (9:30 AM, Jan 1, 2015) and 1421341200 (5:00 PM, Jan 15, 2015)

Is it possible to change *worldBounds* to take a polygon structure where I can represent millisecond-level granularity?

Thanks in advance,
Venkat Sudheer Reddy Aedama

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-How-to-index-range-pair-fields-tp4224369p4224582.html
Sent from the Solr - User mailing list archive at Nabble.com.
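As a side note on the example document above, the day-of-year numbers can be double-checked with plain java.time (a standalone sketch, not tied to any Solr API). Doing so shows that, counting Jan 1 as day 1, Feb 13, 2015 is actually day 44 rather than 43 (and Feb 16 is day 47), while the other values in the doc (32, 43, 78, 84) line up:

```java
import java.time.LocalDate;

// Minimal sketch: compute 1-based day-of-year values for the ranges in the
// example document (2015 is not a leap year).
public class DayOfYear {
    static int dayOfYear(int month, int day) {
        return LocalDate.of(2015, month, day).getDayOfYear();
    }

    public static void main(String[] args) {
        System.out.println(dayOfYear(2, 13)); // 44 -- the example assumes 43, off by one
        System.out.println(dayOfYear(2, 16)); // 47 -- the example assumes 46
        System.out.println(dayOfYear(3, 19)); // 78 -- matches the example
        System.out.println(dayOfYear(3, 25)); // 84 -- matches the example
        System.out.println(dayOfYear(2, 1));  // 32 -- matches the example
        System.out.println(dayOfYear(2, 12)); // 43 -- matches the example
    }
}
```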
Too many updates received since start
Hi,

Can anyone explain to me the possible causes of this warning?

too many updates received since start - startingUpdates no longer overlaps with our currentUpdates

This warning triggers a full recovery for the shard that throws the warning.

- Best regards
Re: How to use DocumentAnalysisRequestHandler in java
Hi,

Faceting is indeed the best way to do it. Here is how it would look in Java:

SolrQuery query = new SolrQuery();
query.setQuery("id:" + docId);
query.setFacet(true);
query.addFacetField("text"); // You can add all the fields you want to inspect
query.setFacetMinCount(1); // Otherwise you'll also get tokens that are not in your document
QueryResponse rsp = this.index.query(query);
// Now look at the results (for field "text")
FacetField facetField = rsp.getFacetField("text");
for (Count field : facetField.getValues()) {
    System.out.println(field.getName());
}

Xavier.

On 20/08/2015 22:20, Upayavira wrote:

On Thu, Aug 20, 2015, at 04:34 PM, Jean-Pierre Lauris wrote:

Hi, I'm trying to obtain the indexed tokens from a document id, in order to see exactly what has been indexed. It seems that DocumentAnalysisRequestHandler does that, but I couldn't figure out how to use it in Java. The doc says I must provide a ContentStream, but the available init() method only takes a NamedList as a parameter.
https://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/handler/DocumentAnalysisRequestHandler.html
Could somebody provide me with a short example of how to get index information from a document id?

If you are talking about what I think you are, then that is used by the Admin UI to implement the Analysis tab. You pass in a document, and it returns it analysed. As Alexandre says, faceting may well get you there if you want to query a document already in your index.

Upayavira

--
Xavier Tannier
Associate Professor / Maître de conférences (HDR)
Univ. Paris-Sud
LIMSI-CNRS (bât. 508, bureau 12, RdC)
B.P. 133
91403 ORSAY CEDEX
FRANCE
http://www.limsi.fr/~xtannier/
tel: 0033 (0)1 69 85 80 12
fax: 0033 (0)1 69 85 80 88
---
Re: Collapse Expand
Can you explain your use case a little more? Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Aug 21, 2015 at 5:43 PM, Kiran Sai Veerubhotla sai.sq...@gmail.com wrote: how can i use collapse expand on the docValues with json facet api?
Re: Too many updates received since start
On 8/22/2015 11:51 AM, Yago Riveiro wrote:

My heap is about 24G and I tuned it using this link: https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr. Shawn has updated it since I used it, and some configurations are no longer in that document. I see pauses of about 6s in my GC logs; my index has a high indexing rate, 1000 docs/s. I'm running Java 7u25; maybe by upgrading to Java 8 the GC pauses would be reduced. I don't know if it is safe to use Java 8 in production with Solr ...

If I remember right, you are running SolrCloud, which means you're on at least 4.x. I have not heard about any problems running Solr on Java 8, but I only have concrete information for 4.x and later. I've heard indirectly that 3.x does work, but I haven't confirmed that rumor. I am running 4.9.1 on Java 8 for one of my indexes and it is working very well.

Whether you use Java 7 or 8, you should definitely use the latest release. OpenJDK 7 and later is good, but the Oracle version is recommended.

Thanks,
Shawn
Re: Can TrieDateField fields be null?
TrieDateField fields can be null; the field is simply not present in the document. I just verified this with 4.10.

How are you indexing? I suspect that somehow the program that's sending things to Solr is putting the default time in. What version of Solr?

Best,
Erick

On Sat, Aug 22, 2015 at 4:04 PM, Henrique O. Santos hensan...@gmail.com wrote:

Hello,

Just a simple question: can TrieDateField fields be null? I have a schema with the following field and type:

<field name="started_at" type="date" indexed="true" stored="true" docValues="true"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>

Every time I index a document with no value for this field, the current time gets indexed and stored. Is there any way to make this field null? My use case for this collection requires that I check whether that date field is already filled or not.

Thank you,
Henrique.
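One common cause of this behaviour (an assumption here, since the client code isn't shown) is a default value on the field definition in the schema; a field declared like the first sketch below will silently get the current time whenever a document omits it:

```xml
<!-- Hypothetical: with default="NOW", Solr fills in the current time
     for any document indexed without a value for this field. -->
<field name="started_at" type="date" indexed="true" stored="true"
       docValues="true" default="NOW"/>

<!-- Without the default attribute, the field simply stays absent (null)
     when no value is supplied. -->
<field name="started_at" type="date" indexed="true" stored="true"
       docValues="true"/>
```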
Re: Too many updates received since start
On 8/22/2015 3:50 PM, Yago Riveiro wrote:

I'm using Java 7u25 (Oracle version) with Solr 4.6.1. It works well at 98% of the throughput, but during some full GCs the issue arises. A full sync for one shard is more than 50G. Is there any configuration for the number of docs a replica can be behind the leader?

It looks like the number of docs is configurable in 5.1 and later:
https://issues.apache.org/jira/browse/SOLR-6359

There is apparently a caveat related to SolrCloud recovery, which I am having trouble grasping: the 20% newest existing transaction log of the core to be recovered must be newer than the 20% oldest existing transaction log of the good core.

Thanks,
Shawn
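For anyone landing on this thread later: per SOLR-6359, the transaction-log retention becomes configurable in the updateLog section of solrconfig.xml. A sketch, with illustrative values (the defaults and exact limits should be checked against the release you run):

```xml
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <!-- Keep more records per log file and more log files, so a replica
       can fall further behind the leader before a full recovery
       (full index replication) is triggered. -->
  <int name="numRecordsToKeep">500</int>
  <int name="maxNumLogsToKeep">20</int>
</updateLog>
```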
Re: Solr performance is slow with just 1GB of data indexed
Hi Shawn,

Yes, I've increased the heap size to 4GB already, and I'm using a machine with 32GB RAM. Is it recommended to further increase the heap size, to say 8GB or 16GB?

Regards,
Edwin

On 23 Aug 2015 10:23, Shawn Heisey apa...@elyograg.org wrote:

On 8/22/2015 7:31 PM, Zheng Lin Edwin Yeo wrote:

I'm using Solr 5.2.1, and I've indexed about 1GB of data into Solr. However, I find that clustering is exceedingly slow after I index this 1GB of data. It took almost 30 seconds to return the cluster results when I set it to cluster the top 1000 records, and still takes more than 3 seconds when I set it to cluster the top 100 records. Is this speed normal? Because I understand Solr can index terabytes of data without the performance being impacted so much, but now the collection is slowing down even with just 1GB of data.

Have you increased the heap size? If you simply start Solr 5.x with the included script and don't use any commandline options, Solr will only have a 512MB heap. This is *extremely* small. A significant chunk of that 512MB heap will be required just to start Jetty and Solr, so there's not much memory left for manipulating the index data and serving queries. Assuming you have at least 4GB of RAM, try adding "-m 2g" to the start commandline.

Thanks,
Shawn
Re: Too many updates received since start
I'm using Java 7u25 (Oracle version) with Solr 4.6.1. It works well at 98% of the throughput, but during some full GCs the issue arises. A full sync for one shard is more than 50G.

Is there any configuration for the number of docs a replica can be behind the leader?

On Sat, Aug 22, 2015 at 8:53 PM, Shawn Heisey apa...@elyograg.org wrote:

On 8/22/2015 11:51 AM, Yago Riveiro wrote:

My heap is about 24G and I tuned it using this link: https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr. Shawn has updated it since I used it, and some configurations are no longer in that document. I see pauses of about 6s in my GC logs; my index has a high indexing rate, 1000 docs/s. I'm running Java 7u25; maybe by upgrading to Java 8 the GC pauses would be reduced. I don't know if it is safe to use Java 8 in production with Solr ...

If I remember right, you are running SolrCloud, which means you're on at least 4.x. I have not heard about any problems running Solr on Java 8, but I only have concrete information for 4.x and later. I've heard indirectly that 3.x does work, but I haven't confirmed that rumor. I am running 4.9.1 on Java 8 for one of my indexes and it is working very well. Whether you use Java 7 or 8, you should definitely use the latest release. OpenJDK 7 and later is good, but the Oracle version is recommended.

Thanks,
Shawn
Can TrieDateField fields be null?
Hello,

Just a simple question: can TrieDateField fields be null? I have a schema with the following field and type:

<field name="started_at" type="date" indexed="true" stored="true" docValues="true"/>
<fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>

Every time I index a document with no value for this field, the current time gets indexed and stored. Is there any way to make this field null? My use case for this collection requires that I check whether that date field is already filled or not.

Thank you,
Henrique.
Solr performance is slow with just 1GB of data indexed
Hi,

I'm using Solr 5.2.1, and I've indexed about 1GB of data into Solr. However, I find that clustering is exceedingly slow after I index this 1GB of data. It took almost 30 seconds to return the cluster results when I set it to cluster the top 1000 records, and still takes more than 3 seconds when I set it to cluster the top 100 records.

Is this speed normal? Because I understand Solr can index terabytes of data without the performance being impacted so much, but now the collection is slowing down even with just 1GB of data.

Below is my clustering configuration in solrconfig.xml:

<requestHandler name="/clustering" startup="lazy" enable="${solr.clustering.enabled:true}" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">1000</int>
    <str name="wt">json</str>
    <str name="indent">true</str>
    <str name="df">text</str>
    <str name="fl">null</str>
    <bool name="clustering">true</bool>
    <bool name="clustering.results">true</bool>
    <str name="carrot.title">subject content tag</str>
    <bool name="carrot.produceSummary">true</bool>
    <int name="carrot.fragSize">20</int>
    <!-- the maximum number of labels per cluster -->
    <int name="carrot.numDescriptions">20</int>
    <!-- produce sub clusters -->
    <bool name="carrot.outputSubClusters">false</bool>
    <str name="LingoClusteringAlgorithm.desiredClusterCountBase">7</str>
    <!-- Configure the remaining request handler parameters. -->
    <str name="defType">edismax</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>

Regards,
Edwin
Re: Solr performance is slow with just 1GB of data indexed
On 8/22/2015 7:31 PM, Zheng Lin Edwin Yeo wrote:

I'm using Solr 5.2.1, and I've indexed about 1GB of data into Solr. However, I find that clustering is exceedingly slow after I index this 1GB of data. It took almost 30 seconds to return the cluster results when I set it to cluster the top 1000 records, and still takes more than 3 seconds when I set it to cluster the top 100 records. Is this speed normal? Because I understand Solr can index terabytes of data without the performance being impacted so much, but now the collection is slowing down even with just 1GB of data.

Have you increased the heap size? If you simply start Solr 5.x with the included script and don't use any commandline options, Solr will only have a 512MB heap. This is *extremely* small. A significant chunk of that 512MB heap will be required just to start Jetty and Solr, so there's not much memory left for manipulating the index data and serving queries. Assuming you have at least 4GB of RAM, try adding "-m 2g" to the start commandline.

Thanks,
Shawn
Re: Number of requests to each shard is different with and without using of grouping
M is the number of ids you want for each group, specified by group.limit. It's unrelated to the number of rows requested.

On 21 Aug 2015 19:54, SolrUser1543 osta...@gmail.com wrote:

Ramkumar R. Aiyengar wrote:

Grouping does need 3 phases. The phases are: (2) For the N groups, each shard is asked for the top M ids (M is configurable per request).

What exactly do you mean by "M is configurable per request"? How exactly is it configurable, and what is the relation between N (which is the initial rows number) and M?
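Concretely, N and M come from two different request parameters. A grouping request like the following (the field name is illustrative) asks for the top 10 groups (N, from rows) with up to 5 documents returned per group (M, from group.limit):

```
/select?q=*:*&group=true&group.field=category&rows=10&group.limit=5
```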
Re: Too many updates received since start
You can try following the suggestions at the link below, which describes a similar issue, and see if that helps:

http://lucene.472066.n3.nabble.com/ColrCloud-IOException-occured-when-talking-to-server-at-td4061831.html

Thnx

On Sat, Aug 22, 2015 at 9:05 AM, Yago Riveiro yago.rive...@gmail.com wrote:

Hi,

Can anyone explain to me the possible causes of this warning?

too many updates received since start - startingUpdates no longer overlaps with our currentUpdates

This warning triggers a full recovery for the shard that throws the warning.

- Best regards
Re: Too many updates received since start
My heap is about 24G and I tuned it using this link:

https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr

Shawn has updated it since I used it, and some configurations are no longer in that document.

I see pauses of about 6s in my GC logs; my index has a high indexing rate, 1000 docs/s.

I'm running Java 7u25; maybe by upgrading to Java 8 the GC pauses would be reduced. I don't know if it is safe to use Java 8 in production with Solr ...

- Best regards
Re: Collapse Expand
I'm using the JSON Facet API for nested faceting on docValues. I'm trying to improve the query time, and I read in a blog that query time on docValues can be improved with Collapse/Expand.

On 22-Aug-2015, at 9:29 am, Joel Bernstein joels...@gmail.com wrote:

Can you explain your use case a little more?

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Aug 21, 2015 at 5:43 PM, Kiran Sai Veerubhotla sai.sq...@gmail.com wrote:

How can I use Collapse/Expand on docValues with the JSON Facet API?
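For reference, Collapse/Expand is expressed as a post-filter query parser plus the expand component, independently of the JSON Facet API; a typical request (the field name is illustrative) looks like:

```
/select?q=*:*&fq={!collapse field=groupId}&expand=true&expand.rows=5
```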