Re: Partial Counts in SOLR

2014-03-13 Thread Salman Akram
Well some of the searches take minutes. Below are some stats about this particular index that I am talking about: Index size = 400GB (Using CommonGrams so without that the index is around 180GB) Position File = 280GB Total Docs = 170 million (just indexed for searching - for highlighting

Re: More Maintenance Releases?

2014-03-13 Thread Shawn Heisey
On 3/12/2014 6:27 PM, Erick Erickson wrote: Wondering if 4.7 is a natural point to do this. See Uwe's announcement that as of Solr 4.8, Solr/Lucene will _require_ Java 1.7 rather than Java 1.6. I know some organizations will not be able to make this transition easily, thus I suspect

Solr Cloud Segments and Merging Issues

2014-03-13 Thread Varun Rajput
I am using Solr 4.6.0 in cloud mode. The setup is of 4 shards, 1 on each machine with a zookeeper quorum running on 3 other machines. The index size on each shard is about 15GB. I noticed that the number of segments in second shard was 42 and in the remaining shards was between 25-30. I am

Re: Result merging takes too long

2014-03-13 Thread remi tassing
Hi Erick, I've used the fl=id parameter to avoid retrieving the actual documents (step 4 in your mail) but the problem still exists. Any ideas on how to find the merging time (step 3)? Remi On Tue, Mar 11, 2014 at 7:29 PM, Erick Erickson erickerick...@gmail.com wrote: In SolrCloud there are a

Re: Solr Cloud Segments and Merging Issues

2014-03-13 Thread remi tassing
Hi Varun, I would just like to say that I have the same two problems you've mentioned and I couldn't figure out a way to solve them. For the 2nd I've posted a question a couple of days ago, title: Result merging takes too long Remi On Thu, Mar 13, 2014 at 3:44 PM, Varun Rajput

Re: Re-index Parent-Child Schema

2014-03-13 Thread Mikhail Khludnev
Hello Vijay, You can try FieldCollapsing, Join, Block-join, or just concatenate both fields and search on the concatenation. On Thu, Mar 13, 2014 at 7:16 AM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote: Hi, I've inherited a Solr application with a Schema that contains parent-child

ClassCastException when streaming response

2014-03-13 Thread Marius Dumitru Florea
Hi guys, The following code server.queryAndStreamResponse(new SolrQuery("*:*"), new StreamingResponseCallback() { public void streamSolrDocument(SolrDocument doc) { } public void streamDocListInfo(long numFound, long start, Float maxScore) { } }); throws Caused by:

RE: IDF maxDocs / numDocs

2014-03-13 Thread Markus Jelsma
Oh yes, i see what you mean. I would try SOLR-1632 and have distributed IDF, but it seems to be broken now. -Original message- From:Steven Bower smb-apa...@alcyon.net Sent: Wednesday 12th March 2014 21:47 To: solr-user solr-user@lucene.apache.org Subject: Re: IDF maxDocs / numDocs

Network path for data directory

2014-03-13 Thread Prasi S
Hi, I have a Solr index directory on one machine. I want a second Solr instance on a different server to use this index. Is it possible to specify a path on a remote machine as the data directory? Thanks, Prasi

RE: Network path for data directory

2014-03-13 Thread Suresh Soundararajan
Prasi, It is not possible to use the index files of one Solr instance for a second instance. The reason is that, while booting, the Solr instance locks the schema and index files to make sure other instances won't update them. As you mentioned like want

Re: Solr to return the list of matched fields

2014-03-13 Thread heaven
Hi, thank you, when it is good for visual review it is hard to work with this data. What I need is to build something like this: | Name | Twitter Profile | Topics | Site Title | Site Description | Site content | | John Doe | Yes| No | Yes | No

Re: ClassCastException when streaming response

2014-03-13 Thread Marius Dumitru Florea
On Thu, Mar 13, 2014 at 10:41 AM, Marius Dumitru Florea mariusdumitru.flo...@xwiki.com wrote: Hi guys, The following code server.queryAndStreamResponse(new SolrQuery("*:*"), new StreamingResponseCallback() { public void streamSolrDocument(SolrDocument doc) { } public void

regex in Solr Query

2014-03-13 Thread Priti Solanki
Hi, I am trying to fetch all the records for 2005. I have an int field pubdateraw: 20130508. Not working - select?q=pubdateraw:/2013*/ Not working - select?q=pubdateraw:/.2013*./ Is it possible to use a regex on an int field in Solr 4.5? To get the record with 20130508, how am I supposed to write my

Re: regex in Solr Query

2014-03-13 Thread Ahmet Arslan
Hi Priti, That's an interesting question; I wonder about the answer myself too. Does a prefix query work with int? q=pubdateraw:2013* ? In the meantime, as a workaround, try range queries. q=pubdateraw:{20130101 TO 20131231} On Thursday, March 13, 2014 12:45 PM, Priti Solanki pritiatw...@gmail.com

Re: regex in Solr Query

2014-03-13 Thread Raymond Wiker
Regular expressions are a text-matching mechanism, so you shouldn't expect to be able to use them on numeric data. If your timestamps are of the form you indicate, you should be able to filter on pubdateraw:[2005 TO 2005]. On Thu, Mar 13, 2014 at 11:45 AM, Priti Solanki

Re: regex in Solr Query

2014-03-13 Thread Priti Solanki
Both work!! pubdateraw:[2005 TO 2005] pubdateraw:[20050101 TO 20051231] Thanks Raymond for sharing the useful info as well. On Thu, Mar 13, 2014 at 4:30 PM, Raymond Wiker rwi...@gmail.com wrote: Regular expressions is a text-matching mechanism, so you shouldn't expect to be able to
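
The range-query workaround from this thread can be sketched in Java as a small helper that builds the query string for one calendar year. This is only a minimal sketch, assuming the field stores dates as yyyyMMdd integers (as pubdateraw does in the thread); the class and method names are hypothetical and not part of Solr or SolrJ.

```java
import java.util.Locale;

/** Hypothetical helper: builds a Solr range query for one year of a yyyyMMdd int field. */
final class YearRangeQuery {
    private YearRangeQuery() {}

    static String forYear(String field, int year) {
        if (year < 1 || year > 9999) {
            throw new IllegalArgumentException("year out of range: " + year);
        }
        // Square brackets make both endpoints inclusive in Solr range syntax,
        // so 1 January and 31 December are both matched.
        return String.format(Locale.ROOT, "%s:[%04d0101 TO %04d1231]", field, year, year);
    }
}
```

For example, YearRangeQuery.forYear("pubdateraw", 2005) yields the second query confirmed above. Note that curly braces, as in the earlier suggestion, would exclude the two endpoint dates.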

RE: use local param in solrconfig fq for access-control

2014-03-13 Thread Andreas Owen
I have given up on this idea and made a wrapper which adds an fq with the user roles to each request -Original Message- From: Andreas Owen [mailto:a...@conx.ch] Sent: Tuesday, 11 March 2014 23:32 To: solr-user@lucene.apache.org Subject: use local param in solrconfig fq for access-control i

Re: Partial Counts in SOLR

2014-03-13 Thread Dmitry Kan
1. What is your Solr version? In the 4.x family, proximity searches have been optimized, among other query types. 2. Do you use filter queries? What is the situation with the cache utilization ratios? Optimize (i.e. bump up the respective cache sizes) if you have low hit ratios and many

RE: Re[2]: NOT SOLVED searches for single char tokens instead of from 3 uppwards

2014-03-13 Thread Andreas Owen
I have gotten nearly everything to work. There are two queries where I don't get back what I want. avaloq frage 1 - only returns if I set minGramSize=1 while indexing yh_cug - query parser doesn't remove _ but the indexer does (WDF) so there is no match Is

Problem adding fields when indexing a pdf

2014-03-13 Thread Croci Francesco Luigi (ID SWS)
When I index a pdf I would like to manually add the document's title in a field named rmDocumentTitle. I defined the field in the schema.xml, but when I query Solr I see that the field was not created... Am I doing something wrong? Below the code snippet, schema and solrconfig.xml Thank you

Problem adding fields when indexing a pdf (add-on)

2014-03-13 Thread Croci Francesco Luigi (ID SWS)
I tried to define a new field test in the schema (<field name="test" type="string" indexed="true" stored="true" multiValued="true"/>) and added req.setParam("literal.test", "test title"); in the code. The field (test) is there O_O. Can someone explain to me the difference? Why rmDocumentTitle is not there while

RE: Problem adding fields when indexing a pdf (add-on)

2014-03-13 Thread Croci Francesco Luigi (ID SWS)
Ok, I renamed the field rmDocumentTitle to rmdocumenttitle and now the field is there! Are there some naming rules for field names? No uppercase? Greetings Francesco -Original Message- From: Croci Francesco Luigi (ID SWS) [mailto:fcr...@id.ethz.ch] Sent: Thursday, 13 March

Re: Problem adding fields when indexing a pdf (add-on)

2014-03-13 Thread Gora Mohanty
On 13 March 2014 18:33, Croci Francesco Luigi (ID SWS) fcr...@id.ethz.ch wrote: Ok, I renamed the field rmDocumentTitle to rmdocumenttitle and now the field is there! Are there some naming rules for field names? No uppercase? No. We have used mixed-case names in the past. Are you

RE: Problem adding fields when indexing a pdf (add-on)

2014-03-13 Thread Croci Francesco Luigi (ID SWS)
Yes, in my test class I always do server.deleteByQuery("*:*", 5); at first. As you can see I have fullText and signatureField defined. And they are there. The only difference is that they are not manually set. Can it be that if you use the literal.* parameter you have to use lowercase? Regards

RE: Problem adding fields when indexing a pdf (add-on)

2014-03-13 Thread Croci Francesco Luigi (ID SWS)
Ok. Maybe I found the problem: in the solrconfig.xml I have <str name="lowernames">true</str> I set it to false and now rmDocumentTitle is there too... Regards Francesco -Original Message- From: Croci Francesco Luigi (ID SWS) [mailto:fcr...@id.ethz.ch] Sent: Thursday, 13 March 2014 14:39
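
The effect of lowernames seen in this thread can be illustrated with a short Java sketch. It assumes the documented behaviour of the extracting handler's lowernames option (field names mapped to lowercase, with non-alphanumeric characters replaced by underscores); it is an approximation for illustration, not Solr's own code, and the class name is hypothetical.

```java
/** Hypothetical illustration of the field-name mapping applied when lowernames=true. */
final class LowerNames {
    private LowerNames() {}

    static String apply(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char ch = name.charAt(i);
            // Letters and digits are lowercased; everything else becomes '_'.
            sb.append(Character.isLetterOrDigit(ch) ? Character.toLowerCase(ch) : '_');
        }
        return sb.toString();
    }
}
```

Under this assumption, a literal field sent as rmDocumentTitle would arrive as rmdocumenttitle, which matches no mixed-case field in the schema; that is consistent with the field appearing only after it was renamed to lowercase or after lowernames was set to false.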

Solr 4 Dynamic filed : Indexing and Searching

2014-03-13 Thread Shanaka Jayasundera
Hello Team, I am trying to index meta data of html pages, my setup is Nutch 2.2.1 and Solr 4.7.0 I can confirm Nutch is parsing meta tags and feed data to index on Solr. But I am unable to see meta tags when I query data. schema.xml configuration I've done, To accept indexing meta tags I've

Re: Partial Counts in SOLR

2014-03-13 Thread Salman Akram
1- SOLR 4.6 2- We do but right now I am talking about plain keyword queries just sorted by date. Once this is better will start looking into caches which we already changed a little. 3- As I said the contents are not stored in this index. Some other metadata fields are but with normal queries its

Re: Solr 4 Dynamic filed : Indexing and Searching

2014-03-13 Thread Furkan KAMACI
Hi; I use Nutch and Solr to index meta tags. When you declare that: <dynamicField name="meta_*" type="string" stored="true" indexed="true"/> It should work. However I have a question. You have that field for copy: metatag.keywords but your dynamic field is meta_* I mean it should have underscore

solr result in miliseconds

2014-03-13 Thread Kishan Parmar
Hello, how can I get the time a query took in Solr, in milliseconds, for example: 7 results found in 0.00456 milliseconds? Regards, Kishan Parmar Software Developer +91 95 100 77394 Jay Shree Krishnaa !!

Re: solr result in miliseconds

2014-03-13 Thread Ahmet Arslan
Hi Kishan, Solr response already includes that info in QTime section. Aren't you seeing it? If you don't see it try setting omitHeaders=false On Thursday, March 13, 2014 6:12 PM, Kishan Parmar kishan@gmail.com wrote: Hello, how to get milliseconds result function in solr gives result

Re: solr result in miliseconds

2014-03-13 Thread Ahmet Arslan
Hi, Oops, I miswrote: it is omitHeader, not omitHeaders. Please see: http://wiki.apache.org/solr/CommonQueryParameters#omitHeader Ahmet On Thursday, March 13, 2014 6:37 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi Kishan, Solr response already includes that info in QTime section. Aren't you
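
As a rough illustration of the QTime answer in this thread, the Java sketch below pulls QTime out of a JSON response body (wt=json). It assumes the usual responseHeader layout and uses a plain regular expression rather than a JSON parser to stay dependency-free; the class name is hypothetical. QTime is reported in whole milliseconds.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts QTime (whole milliseconds) from a Solr JSON response. */
final class QTimeReader {
    private static final Pattern QTIME = Pattern.compile("\"QTime\"\\s*:\\s*(\\d+)");

    private QTimeReader() {}

    static OptionalInt qTimeMillis(String jsonResponse) {
        Matcher m = QTIME.matcher(jsonResponse);
        // Empty when the header is missing, e.g. when omitHeader=true was sent.
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }
}
```

A real client would more likely read the value through its response object or a proper JSON parser; the regular expression is used here only to keep the sketch self-contained.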

Re: Solr Cloud Segments and Merging Issues

2014-03-13 Thread Shawn Heisey
On 3/13/2014 1:44 AM, Varun Rajput wrote: I am using Solr 4.6.0 in cloud mode. The setup is of 4 shards, 1 on each machine with a zookeeper quorum running on 3 other machines. The index size on each shard is about 15GB. I noticed that the number of segments in second shard was 42 and in the

Re: Solr 4 Dynamic filed : Indexing and Searching

2014-03-13 Thread Shanaka Jayasundera
Hi Furkan, Thanks, I've checked with only the dynamic field as well. Have you done any other configuration changes to get it working? Can you give me some examples of your meta tags, e.g. metatag.keywords? Tx, Shanaka On Thursday, 13 March 2014, Furkan KAMACI furkankam...@gmail.com wrote: Hi;

Re: Solr Cloud Segments and Merging Issues

2014-03-13 Thread Varun Rajput
Hi Remi, I read your post and like you, I have also identified that running solr 4.6.0 in cloud mode results in higher response time which has something to do with merging of documents from the various shards. Looking at the source code, we couldn't understand why it would take so much time for

Re: Solr 4 Dynamic filed : Indexing and Searching

2014-03-13 Thread Furkan KAMACI
Hi; When I check my documents I see an example: meta_keywords. It should work. You may have a problem on the Nutch side. Here is a link for it: http://wiki.apache.org/nutch/IndexMetatags On the other hand, dynamic fields in Solr are explained here:

Re: Solr Cloud error with shard update

2014-03-13 Thread cpk
In case anyone else runs across this issue, I think we've found a work-around. We're seeing the same behavior with Solr 4.6.0 and 4.7. DataImportHandler loads documents, but the updates to the replica fail because of the limited support for the BigDecimal type in SolrCloud. We've successfully

Re: Delta import throws java heap space exception

2014-03-13 Thread Richard Marquina Lopez
Hi Furkan, sure, this is my data-config.xml: dataConfig document entity name=item pk=id dataSource=store_db onError=skip query=SELECT IT.* FROM item AS IT JOIN order AS ORD ON IT.order_id=ORD.id WHERE (IT.status=1 AND ORD.status=1) deltaQuery=SELECT IT.* FROM item IT,

Re: Solr Cloud Segments and Merging Issues

2014-03-13 Thread Varun Rajput
Hey Shawn, The config with the old policy used to be the literal name mergeFactor. With TieredMergePolicy, there are now three settings that must be changed in order to actually be the same as what mergeFactor used to do. The following config snippet is the equivalent config to a mergeFactor

Help me understand these newrelic graphs

2014-03-13 Thread Software Dev
Here are some screenshots of our Solr Cloud cluster via New Relic http://postimg.org/gallery/2hyzyeyc/ We currently have a 5 node cluster and all indexing is done on separate machines and shipped over. Our machines are running on SSDs with 18G of RAM (index size is 8G). We only have 1 shard at

Re: Solr Cloud error with shard update

2014-03-13 Thread Shawn Heisey
On 3/13/2014 12:54 PM, cpk wrote: We're seeing the same behavior with Solr 4.6.0 and 4.7. DataImportHandler loads documents, but the updates to the replica fail because of the limited support for the BigDecimal type in SolrCloud. We've successfully worked around the issue by setting

Re: single node causing cluster-wide outage

2014-03-13 Thread Avishai Ish-Shalom
A little more information: it seems the issue is happening after we get an OutOfMemory error on a facet query. On Wed, Mar 12, 2014 at 11:06 PM, Avishai Ish-Shalom avis...@fewbytes.com wrote: Hi all! After upgrading to Solr 4.6.1 we encountered a situation where a cluster outage was traced to a

Re: Help me understand these newrelic graphs

2014-03-13 Thread ralph tice
I think your response time includes add operations, which generally return very quickly and, due to their sheer number, are pulling down the average response time of your queries. New Relic should break out requests based on which handler they're hitting but they don't seem to.

Re: Help me understand these newrelic graphs

2014-03-13 Thread Ahmet Arslan
Hi, Ralph's comment makes sense. We can confirm his explanation. What happens when you select only QueryComponent and FacetComponent in the first graph (requests response time)? On Friday, March 14, 2014 12:18 AM, ralph tice ralph.t...@gmail.com wrote: I think your response time is including the

Re: Help me understand these newrelic graphs

2014-03-13 Thread Otis Gospodnetic
Hi, I think NR has support for breaking by handler, no? Just checked - no. Only webapp controller, but that doesn't apply to Solr. SPM should be more helpful when it comes to monitoring Solr - you can filter by host, handler, collection/core, etc. -- you can see the demo -

Re: Help me understand these newrelic graphs

2014-03-13 Thread Software Dev
Ahh.. it's including the add operation. That makes sense then. A bit silly on NR's part that they don't break it down. Otis, our index is only 8G so I don't consider that big by any means but our queries can get a bit complex with a bit of faceting. Do you still think it makes sense to shard? How

Solr supports log-based recovery?

2014-03-13 Thread shushuai zhu
Hi, I noticed the following post indicating that Solr could recover uncommitted data from the transaction log: http://www.opensourceconnections.com/2013/04/25/understanding-solr-soft-commits-and-data-durability/ which contradicts Solr's web site:

Re: Solr supports log-based recovery?

2014-03-13 Thread Otis Gospodnetic
Skimmed this, but yes, docs are durable thanks to transaction log that can replay on start. Otis Solr ElasticSearch Support http://sematext.com/ On Mar 13, 2014 8:25 PM, shushuai zhu ss...@yahoo.com wrote: Hi, I noticed the following post indicating that Solr could recover not-committed

Please Enable Wiki Editing

2014-03-13 Thread Greg Gilles
Hi, Please update my account so I can edit the wiki https://wiki.apache.org/solr.  GregG  /  greg22...@yahoo.com Specifically, I was installing Solr on Windows using Tomcat following the instructions on https://wiki.apache.org/solr/SolrInstall, and had some issues with the instructions and

Re: Please Enable Wiki Editing

2014-03-13 Thread Erick Erickson
Done, thanks! We can always use more editors who contribute their experiences... On Thu, Mar 13, 2014 at 8:01 PM, Greg Gilles greggil...@yahoo.com wrote: Hi, Please update my account so I can edit the wiki https://wiki.apache.org/solr. GregG / greg22...@yahoo.com Specifically, I was

Re: Zookeeper latencies and pending requests - Solr 4.3

2014-03-13 Thread Chris W
Any help on this is much appreciated. Is it better to use more cores for zookeeper (as opposed to 1 core machine)? On Wed, Mar 12, 2014 at 4:28 PM, Chris W chris1980@gmail.com wrote: Hi Furkan Load on the network is very low when read workload is on the cluster. During indexing, a few

Re: Please Enable Wiki Editing

2014-03-13 Thread Alexandre Rafalovitch
What about SEO? If somebody gives me Google Analytics access, I would be happy to dig around that for a while to see if people can actually find stuff on the Wiki. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is

Check my thinking on this, wildcard matching in phrases.

2014-03-13 Thread Erick Erickson
or why haven't I thought of this before? I'm once again being faced with the recurring problem of phrase searches with wildcards. It'll lead to index bloat, but that's acceptable in this situation, at least until proved not so. The surround query parser can deal with wildcards and proximity, but

Re: Check my thinking on this, wildcard matching in phrases.

2014-03-13 Thread Alexandre Rafalovitch
Different but (conceptually) similar? http://robotlibrarian.billdueber.com/2012/03/boosting-on-exactish-anchored-phrase-matching-in-solr-sst-4/index.html Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the

solr securing index files

2014-03-13 Thread Prasi S
Hi, Is there any way to secure the Solr index directory? I have many users on a server and I want to restrict file access to only the administrator. Does securing the index directory affect Solr's access to the folder? Thanks, Prasi

Re: Zookeeper latencies and pending requests - Solr 4.3

2014-03-13 Thread Shawn Heisey
On 3/13/2014 7:24 PM, Chris W wrote: Any help on this is much appreciated. Is it better to use more cores for zookeeper (as opposed to 1 core machine)? I would guess that disk latency is the biggest bottleneck for zookeeper. Unless the SolrCloud install is quite large, I don't think that much

Re: Help me understand these newrelic graphs

2014-03-13 Thread Otis Gospodnetic
It really depends; it's hard to give a definitive instruction without more pieces of info. E.g. if your CPUs are all maxed out and you already have a high number of concurrent queries, then sharding may not be of any help at all. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr