Re: set fq operator independently

2014-03-05 Thread Dmitry Kan
Hi Andreas, You should be able to say: (-organisations:[* TO *] -roles:[* TO *]) OR (+organisations:(150 42) +roles:(174 72)) Study your queries with the debugQuery=true HTTP parameter; at times this is invaluable. Dmitry On Wed, Mar 5, 2014 at 2:54 AM, Andreas Owen a...@conx.ch wrote: i want to
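A minimal sketch of issuing the suggested filter query with debugQuery=true, assuming a local Solr core at a hypothetical URL (the archive stripped the asterisks from the range queries; Solr's "field has any value" range is written [* TO *]):

```python
from urllib.parse import urlencode

# Hypothetical core URL; adjust to your deployment.
base = "http://localhost:8983/solr/collection1/select"

params = {
    "q": "*:*",
    # The combined filter: either both fields are empty, or both match
    # the listed ids. [* TO *] matches any document with a value.
    "fq": "(-organisations:[* TO *] -roles:[* TO *]) OR "
          "(+organisations:(150 42) +roles:(174 72))",
    "debugQuery": "true",  # dumps the parsed query in the response
    "wt": "json",
}
url = base + "?" + urlencode(params)
print(url)
```

The debugQuery section of the response shows exactly how the fq was parsed, which is the quickest way to spot operator surprises.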

Re: set fq operator independently

2014-03-05 Thread Shawn Heisey
On 3/4/2014 5:54 PM, Andreas Owen wrote: i want to use the following in fq and i need to set the operator to OR. My q.op is AND but I need OR in fq. I have read about ofq but that is for putting OR between multiple fq. Can I set the operator for fq? (-organisations:[ TO *] -roles:[ TO

Re: set fq operator independently

2014-03-05 Thread Mikhail Khludnev
And if you need to cache OR legs separately, here is the workaround http://blog.griddynamics.com/2014/01/segmented-filter-cache-in-solr.html On Wed, Mar 5, 2014 at 12:31 PM, Shawn Heisey s...@elyograg.org wrote: On 3/4/2014 5:54 PM, Andreas Owen wrote: i want to use the following in fq and i

Re: SOLR OutOfMemoryError Java heap space

2014-03-05 Thread Angel Tchorbadjiiski
Hi Toke, thank you for the mail. On 04.03.2014 11:20, Toke Eskildsen wrote: Angel Tchorbadjiiski [angel.tchorbadjii...@antibodies-online.com] wrote: [Single shard / 2 cores Solr 4.6.1, 65M docs / 50GB, 20 facet fields] The OS in use is a 64bit linux with an OpenJDK 1.7 Java with 48G RAM.

Re: SOLR OutOfMemoryError Java heap space

2014-03-05 Thread Angel Tchorbadjiiski
Hi Shawn, It may be your facets that are killing you here. As Toke mentioned, you have not indicated what your max heap is. 20 separate facet fields with millions of documents will use a lot of fieldcache memory if you use the standard facet.method, fc. Try adding facet.method=enum to all your
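A hedged sketch of what that request might look like; the field names here are stand-ins, not the poster's actual 20 facet fields:

```python
from urllib.parse import urlencode

# Stand-in names for the real facet fields.
facet_fields = ["category", "brand", "supplier"]

params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.method", "enum"),  # enum iterates the term dictionary instead
                               # of building a fieldcache entry per field
]
params += [("facet.field", f) for f in facet_fields]
qs = urlencode(params)
print(qs)
```

facet.method can also be set per field with f.&lt;fieldname&gt;.facet.method if only some fields need the enum treatment.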

Re: Solr is NoSQL database or not?

2014-03-05 Thread Furkan KAMACI
Hi; As I said: the first link you provided, http://en.wikipedia.org/wiki/NoSQL, lists ElasticSearch as a Document Store, plus a note that it is a search engine. What are the main differences between ElasticSearch and Solr that make ElasticSearch a NoSQL store but not Solr? I think that

Re: SOLR OutOfMemoryError Java heap space

2014-03-05 Thread Toke Eskildsen
On Wed, 2014-03-05 at 09:59 +0100, Angel Tchorbadjiiski wrote: On 04.03.2014 11:20, Toke Eskildsen wrote: Angel Tchorbadjiiski [angel.tchorbadjii...@antibodies-online.com] wrote: [Single shard / 2 cores Solr 4.6.1, 65M docs / 50GB, 20 facet fields] The OS in use is a 64bit linux with an

Implementing a customised tokenizer

2014-03-05 Thread epnRui
I have managed to understand how to properly implement and change the words on a CharFilter and a Filter, but I fail to understand how the Tokenizer works... I also fail to find any tutorials on the thing.. Could you provide some example implementation of incrementToken and how to manipulate the

Re: SOLR OutOfMemoryError Java heap space

2014-03-05 Thread Angel Tchorbadjiiski
On 05.03.2014 11:51, Toke Eskildsen wrote: On Wed, 2014-03-05 at 09:59 +0100, Angel Tchorbadjiiski wrote: On 04.03.2014 11:20, Toke Eskildsen wrote: Angel Tchorbadjiiski [angel.tchorbadjii...@antibodies-online.com] wrote: [Single shard / 2 cores Solr 4.6.1, 65M docs / 50GB, 20 facet fields]

Re: SOLR OutOfMemoryError Java heap space

2014-03-05 Thread Angel Tchorbadjiiski
Hi Shawn, On 05.03.2014 10:05, Angel Tchorbadjiiski wrote: Hi Shawn, It may be your facets that are killing you here. As Toke mentioned, you have not indicated what your max heap is. 20 separate facet fields with millions of documents will use a lot of fieldcache memory if you use the

Re: Replicating Between Solr Clouds

2014-03-05 Thread Toby Lazar
Unless Solr is your system of record, aren't you already replicating your source data across the WAN? If so, could you load Solr in colo B from your colo B data source? You may be duplicating some indexing work, but at least your colo B Solr would be more closely in sync with your colo B

Re: Implementing a customised tokenizer

2014-03-05 Thread Ahmet Arslan
Hi Rui, I think ClassicTokenizerImpl.jflex file is good start for understanding tokenizers. http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ClassicTokenizerImpl.jflex Please see other *.jflex files in source tree. But

to reduce indexing time

2014-03-05 Thread sweety
Before indexing , this was the memory layout, System Memory : 63.2% ,2.21 gb JVM Memory : 8.3% , 81.60mb of 981.38mb I have indexed 700 documents of total size 12MB. Following are the results i get : Qtime: 8122, System time : 00:00:12.7318648 System Memory : 65.4% ,2.29 gb JVM Memory : 15.3% ,

Re: to reduce indexing time

2014-03-05 Thread Ahmet Arslan
Hi, Batch/bulk indexing is the way to go for speed. * Disable the autoSoftCommit feature for the bulk indexing. * Disable the transaction log for the bulk indexing. After you finish bulk indexing, you can re-enable the above. Again, you are too generous with a 1 second refresh rate (autoSoftCommit maxTime).
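The batching part of this advice can be sketched as follows; client.add and client.commit are placeholders for whatever client library is in use (SolrJ, solrnet, etc.), not real calls:

```python
def batches(docs, size):
    """Yield successive fixed-size batches from a list of documents."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

docs = [{"id": str(n)} for n in range(1000)]
sent = 0
for batch in batches(docs, 250):
    # client.add(batch)   # hypothetical: one HTTP round trip per batch,
    #                     # no commit inside the loop
    sent += len(batch)
# client.commit()         # a single explicit commit after the bulk load
print(sent)
```

The point is that commits (and soft commits) are expensive relative to adds, so during a bulk load they should happen once at the end, not per document.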

Re: ANNOUNCE: Apache Solr Reference Guide for 4.7

2014-03-05 Thread Cassandra Targett
You know, I didn't even notice that. It did go up to 30M. I've made a note to look into that before we release the 4.8 version to see if it can be reduced at all. I suspect the screenshots are causing it to balloon - we made some changes to the way they appear in the PDF for 4.7 which may be the

Re: ANNOUNCE: Apache Solr Reference Guide for 4.7

2014-03-05 Thread Steve Rowe
Not sure if it’s relevant anymore, but a few years ago Atlassian resolved as “won’t fix” a request to configure the exported PDF compression ratio: https://jira.atlassian.com/browse/CONF-21329. Their suggestion: zip the PDF. I tried that - the resulting zip size is roughly 9MB, so it’s definitely

Apache Solr Configuration Problem (Japanese Language)

2014-03-05 Thread Andy Alexander
I am trying to pass a string of Japanese characters to an Apache Solr query. The string in question is '製品'. When a search is passed without any arguments, it brings up all of the indexed information, including all of the documents that have this particular string in them, however when this

Re: SOLR OutOfMemoryError Java heap space

2014-03-05 Thread Shawn Heisey
On 3/5/2014 4:40 AM, Angel Tchorbadjiiski wrote: Hi Shawn, On 05.03.2014 10:05, Angel Tchorbadjiiski wrote: Hi Shawn, It may be your facets that are killing you here. As Toke mentioned, you have not indicated what your max heap is. 20 separate facet fields with millions of documents will

Re: to reduce indexing time

2014-03-05 Thread Shawn Heisey
On 3/5/2014 7:47 AM, sweety wrote: Before indexing , this was the memory layout, System Memory : 63.2% ,2.21 gb JVM Memory : 8.3% , 81.60mb of 981.38mb I have indexed 700 documents of total size 12MB. Following are the results i get : Qtime: 8122, System time : 00:00:12.7318648 System

Re: Facets, termvectors, relevancy and Multi word tokenizing

2014-03-05 Thread epnRui
Hi guys, So, I keep facing this problem which I can't solve. I thought it was due to HTML anchors containing the name of the hashtag, and thus repeating it, but it's not. So the use case is: 1 - I need to consider hashtags as tokens. 2 - The hashtag has to show up in the facets. Right now if I

Re: to reduce indexing time

2014-03-05 Thread sweety
Now I have batch indexed, with batches of 250 documents. These were the results. After 7,000 documents: Qtime: 46894, System time: 00:00:55.9384892, JVM memory: 249.02mb, 24.8%. This shows quite a reduction in timing. After 70,000 documents: Qtime: 480435, System time : 00:09:29.5206727 System

query problem

2014-03-05 Thread Kishan Parmar
hi there my schema file is this--- <?xml version="1.0" encoding="UTF-8" ?> <schema name="example" version="1.2"> <types> <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/> <fieldType name="int" class="solr.TrieIntField" precisionStep="0"

Re: to reduce indexing time

2014-03-05 Thread Greg Walters
It doesn't sound like you have much of an understanding of java's garbage collection. You might read http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html to get a better understanding of how it works and why you're seeing different levels of memory utilization at any

Re: query problem

2014-03-05 Thread Ahmet Arslan
Hi, I suspect q=State:tamil nadu is parsed as State:tamil text:nadu. You can confirm this by adding debugQuery=on. Either use quotes, q=State:"tamil nadu", or use the term query parser, q={!term f=State}tamil nadu Ahmet On Wednesday, March 5, 2014 8:29 PM, Kishan Parmar kishan@gmail.com wrote: hi
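The three query forms discussed here behave quite differently; a small sketch showing how each would be URL-encoded before being sent to Solr (the field and values come from the thread, the encoding step is standard):

```python
from urllib.parse import quote

# Unquoted: only the first word is bound to the State field; the second
# falls back to the default field, parsing as State:tamil text:nadu.
raw    = 'State:tamil nadu'
# Quoted phrase: both words stay together in the State field.
quoted = 'State:"tamil nadu"'
# Term query parser: the value bypasses analysis entirely and matches
# the exact stored term -- the right tool for non-analysed string fields.
termq  = '{!term f=State}tamil nadu'

for q in (raw, quoted, termq):
    print("q=" + quote(q))
```

Note that the curly braces and the bang in the local-params syntax must survive URL encoding intact, which quote() handles.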

Re: query problem

2014-03-05 Thread Kishan Parmar
Thanks, but still no change in output --- q=State:"tamil nadu" parses as q: "State:\"tamil nadu\"" Regards, Kishan Parmar Software Developer +91 95 100 77394 Jay Shree Krishnaa !! 2014-03-06 0:17 GMT+05:30 Ahmet Arslan iori...@yahoo.com: Hi, I suspect q=State:tamil nadu parsed as

Re: Automate search results filtering based on scoring

2014-03-05 Thread Jeff Wartes
It's worth mentioning that scores should not be considered comparable across queries, so equating "confidence" and "score" is a tricky proposition. That is, the maxScore for the search field1:foo may be 10.0, and the maxScore for "field1:bar" may be 1.0, but that doesn't mean the top result for

Solr Cloud Cores, Zookeepers and Zk Data

2014-03-05 Thread KNitin
Hi I am trying to understand the flow between zk and SolrCloud nodes during writes and restarts. *Writes*: When an indexing job runs, it looks like the leader for every shard is identified from zk, the write requests go to the leader, and then eventually the data flows to the replicas.

Indexing huge data

2014-03-05 Thread Rallavagu
All, Wondering about best practices/common practices to index/re-index huge amounts of data in Solr. The data is about 6 million entries in the db and other sources (the data is not located in one resource). Trying a solrj based solution to collect data from different resources to index into

Re: Indexing huge data

2014-03-05 Thread Otis Gospodnetic
Hi, 6M is really not huge these days. 6B is big, though also still not huge any more. What seems to be the bottleneck? Solr or DB or network or something else? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Wed, Mar 5,

Re: Solr Cloud Cores, Zookeepers and Zk Data

2014-03-05 Thread KNitin
I should also mention that the watch count is in the order of 400-500 but the maxClientConnections is 100. Not sure if this has to do with the issue but just putting it out there On Wed, Mar 5, 2014 at 11:37 AM, KNitin nitin.t...@gmail.com wrote: Hi I am trying to understand the flow

Re: ANNOUNCE: Apache Solr Reference Guide for 4.7

2014-03-05 Thread Robert Muir
I debugged the PDF a little. FWIW, the following code (using iText) takes it to 9MB: public static void main(String args[]) throws Exception { Document document = new Document(); PdfSmartCopy copy = new PdfSmartCopy(document, new FileOutputStream("/home/rmuir/Downloads/test.pdf"));

Re: Indexing huge data

2014-03-05 Thread Rallavagu
It seems the latency is introduced by collecting the data from different sources and putting it together, rather than the actual Solr indexing. I would say all these activities are contributing equally. So, is it normal to expect indexing to run for long? Wondering what to expect

Re: Wildcard search not working if the query contains numbers along with special characters.

2014-03-05 Thread Kashish
Hi, Pls help me with this. -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-if-the-query-conatins-numbers-along-with-special-characters-tp4119608p4121457.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: to reduce indexing time

2014-03-05 Thread sweety
I will surely read about JVM garbage collection. Thanks a lot, all of you. But is the time required for my indexing good enough? I don't know about the ideal timings. I think that my indexing is taking more time.

Re: to reduce indexing time

2014-03-05 Thread Ahmet Arslan
Hi, One thing to consider: I think solrnet uses XML updates, and there is XML parsing overhead with that. Switching to SolrJ or CSV can yield additional gains. http://wiki.apache.org/lucene-java/ImproveIndexingSpeed Ahmet On Wednesday, March 5, 2014 10:13 PM, sweety sweetyshind...@yahoo.com wrote:

Re: Indexing huge data

2014-03-05 Thread Otis Gospodnetic
Hi, It depends. Are docs huge or small? Server single core or 32 core? Heap big or small? etc. etc. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Wed, Mar 5, 2014 at 3:02 PM, Rallavagu rallav...@gmail.com wrote: It

Re: Wildcard search not working if the query contains numbers along with special characters.

2014-03-05 Thread Ahmet Arslan
Hi Kashish, This is confusing. You gave the following example: the query 1999/99* should return ARABIAN NIGHTS #01 (1999/99). However you said "I cannot ignore parentheses or other special characters..." The two contradict each other. Since you are after autocomplete you might be interested in

Re: SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no default specified via 'df' param

2014-03-05 Thread Erick Erickson
Right, that patch is really about fixing the distribution solrconfig file... What you need to do is (and I'm assuming you're running SolrCloud) is change the solrconfig.xml file, push it up to ZK with the client tools and restart all the nodes in your collection, or reload all the cores. I don't

Re: Solr Filter Cache Size

2014-03-05 Thread Erick Erickson
This, BTW, is an ENORMOUS number of cached queries. Here's a rough guide: each entry will be (length of query) + maxDoc/8 bytes long. Think of the filterCache as a map where the key is the query and the value is a bitmap large enough to hold maxDoc bits. BTW, I'd kick this back to the default
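Erick's rule of thumb is easy to turn into arithmetic; using the 65M-document index mentioned earlier in this digest as the example size:

```python
def filter_cache_entry_bytes(query, max_doc):
    """Rough size of one filterCache entry: the query string plus a
    bitset holding one bit per document in the index."""
    return len(query) + max_doc // 8

# A 50-character fq over a 65M-document index costs roughly 8 MB
# per cached filter -- a few hundred distinct filters is gigabytes.
size = filter_cache_entry_bytes("x" * 50, 65_000_000)
print(size)  # 8125050 bytes, i.e. ~8.1 MB
```

The query-string part is negligible; the bitset term dominates, which is why filterCache sizing scales with maxDoc rather than with query complexity.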

Re: Elevation and core create

2014-03-05 Thread Erick Erickson
Well, if you're going to go that route, how about developing a patch for QEV? Of course there may be a very good reason it wasn't done there, I haven't looked at the code Best, Erick On Mon, Mar 3, 2014 at 1:07 PM, David Stuart d...@axistwelve.com wrote: HI Erick, Thanks for the response.

Re: Indexing huge data

2014-03-05 Thread Rallavagu
Otis, Good points. I guess you are suggesting that it depends on the resources. The documents are 100k each; the pre-processing server is a 2 CPU VM running with 4G RAM. So, that could be a relatively small machine to process such an amount of data?? On 3/5/14, 12:27 PM, Otis Gospodnetic wrote:

Re: SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no default specified via 'df' param

2014-03-05 Thread eShard
Hi Erick, Let me make sure I understand you: I'm NOT running SolrCloud; so I just have to put the default field in ALL of my solrconfig.xml files and then restart and that should be it? Thanks for your reply,

Re: Re[2]: query parameters

2014-03-05 Thread Erick Erickson
You can just use OR GQ clauses can be most any legal query. On Mar 3, 2014 4:31 PM, Andreas Owen a...@conx.ch wrote: ok i like the logic, you can do much more. i think this should do it for me: (-organisations:[* TO *] -roles:[* TO *]) (+organisations:(150 42) +roles:(174 72)) i

Re: query problem

2014-03-05 Thread Ahmet Arslan
Hi Kishan, can you please give us example document query pair that query should retrieve that document. e.g. query q=State:tamil nadu should return what document text? Ahmet On Wednesday, March 5, 2014 9:04 PM, Kishan Parmar kishan@gmail.com wrote: Thanks , but still no change in output 

Re: SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no default specified via 'df' param

2014-03-05 Thread eShard
Ok, I updated all of my solrconfig.xml files and I restarted the tomcat server AND the errors are still there on 2 out of 10 cores Am I not reloading correctly? Here's my /browse handler: <requestHandler name="/browse" class="solr.SearchHandler"> <lst name="defaults"> <str

Re: Elevation and core create

2014-03-05 Thread Axis12
Hi Erick The patch is in progress. Looking at the code I can't figure out why this restriction was added I'll create a jira issue and post. Thanks for your help Regards Sent from my iPhone On 5 Mar 2014, at 20:36, Erick Erickson erickerick...@gmail.com wrote: Well, if you're going to go

Re: Indexing huge data

2014-03-05 Thread Otis Gospodnetic
Hi, Each doc is 100K? That's on the big side, yes, and the server seems on the small side, yes. Hence the speed. :) Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Wed, Mar 5, 2014 at 3:37 PM, Rallavagu

Re: Wildcard search not working if the query contains numbers along with special characters.

2014-03-05 Thread Kashish
Hi Ahmet, Let me explain with another scenario. There is a title - ARABIAN NIGHTS - 1999/99. Now in autocomplete, if I give 1999/99, in the backend I append an asterisk to it and form the Solr URL this way: q=titleName:1999/99* I get the above mentioned title - so it works perfectly. Now let's add

Re: Indexing huge data

2014-03-05 Thread Jack Krupansky
Make sure you're not doing a commit on each individual document add. Commit every few minutes or every few hundred or few thousand documents is sufficient. You can set up auto commit in solrconfig.xml. -- Jack Krupansky -Original Message- From: Rallavagu Sent: Wednesday, March 5,

Re: to reduce indexing time

2014-03-05 Thread Toby Lazar
I believe SolrJ uses XML under the covers. If so, I don't think you would improve performance by switching to SolrJ, since the client would convert it to XML before sending it on the wire. Toby *** Toby Lazar Capital Technology Group Email: tla...@capitaltg.com

Re: Wildcard search not working if the query contains numbers along with special characters.

2014-03-05 Thread Ahmet Arslan
Hi, Forget about patternReplaceCharFilter for a moment. Your example is more clear this time. q=titleName:1999/99* should return following two docs: d1) JULIUS CAESER (1999/99) d2) ARABIAN NIGHTS - 1999/99 This is achievable with the following type.  1) MappingCharFilterFactory with

Re: to reduce indexing time

2014-03-05 Thread Ahmet Arslan
Hi Toby, SolrJ uses javabin by default. Ahmet On Wednesday, March 5, 2014 11:31 PM, Toby Lazar tla...@capitaltg.com wrote: I believe SolrJ uses XML under the covers.  If so, I don't think you would improve performance by switching to SolrJ, since the client would convert it to XML before

Re: ANNOUNCE: Apache Solr Reference Guide for 4.7

2014-03-05 Thread Chris Hostetter
Thanks to Alexandre for pointing this out Let's use SOLR-5819 for any followup investigation/discussion so it doesn't get lost in the ANNOUNCE thread... https://issues.apache.org/jira/browse/SOLR-5819 : Date: Wed, 5 Mar 2014 14:49:41 -0500 : From: Robert Muir rcm...@gmail.com : Reply-To:

Re: to reduce indexing time

2014-03-05 Thread Toby Lazar
Thanks Ahmet for the correction. I used wireshark to capture an UpdateRequest to Solr and saw this XML: <add><doc boost="1.0"><field name="caseID">123</field><field name="caseName">blah</field></doc></add> and figured that javabin was only for the responses. Does wt apply to how solrj sends requests to solr?

Re: to reduce indexing time

2014-03-05 Thread Shawn Heisey
On 3/5/2014 2:31 PM, Toby Lazar wrote: I believe SolrJ uses XML under the covers. If so, I don't think you would improve performance by switching to SolrJ, since the client would convert it to XML before sending it on the wire. Until recently, SolrJ always used XML by default for requests and

Re: to reduce indexing time

2014-03-05 Thread Toby Lazar
OK, I was using HttpSolrServer since I haven't yet migrated to CloudSolrServer. I added the line: solrServer.setRequestWriter(new BinaryRequestWriter()) after creating the server object and now see the difference through wireshark. Is it fair to assume that this usage is multi-thread safe?

Re: to reduce indexing time

2014-03-05 Thread Shawn Heisey
On 3/5/2014 2:58 PM, Toby Lazar wrote: OK, I was using HttpSolrServer since I haven't yet migrated to CloudSolrServer. I added the line: solrServer.setRequestWriter(new BinaryRequestWriter()) after creating the server object and now see the difference through wireshark. Is it fair to

Problem with indexing xml using DataImportHandler and XPath

2014-03-05 Thread Farhan Ali
Hi, I am a newbie to Solr and I am trying to index some xml documents using DIH and XPath but I am unable to do it. I get a response message of successful indexing but no document is added to the index. I do not know what I'm doing wrong. This is my data config xml file dataConfig

Re: Re[2]: query parameters

2014-03-05 Thread Erick Erickson
Bah. meant FQ clauses can be most any legal query. Erick On Wed, Mar 5, 2014 at 3:49 PM, Erick Erickson erickerick...@gmail.com wrote: You can just use OR GQ clauses can be most any legal query. On Mar 3, 2014 4:31 PM, Andreas Owen a...@conx.ch wrote: ok i like the logic, you can do much

Re: Elevation and core create

2014-03-05 Thread Erick Erickson
Right, that's perfectly appropriate. Feel free to attach unfinished versions of the patch! Just comment that it's not finished and people may have time to take a look at what you've done so far and make comments. Sometimes this saves you from a whole bunch of work :)... Best, Erick On Wed, Mar

Re: Indexing huge data

2014-03-05 Thread Erick Erickson
Here's the easiest thing to try to figure out where to concentrate your energies. Just comment out the server.add call in your SolrJ program. Well, and any commits you're doing from SolrJ. My bet: Your program will run at about the same speed it does when you actually index the docs,

RE: Indexing huge data

2014-03-05 Thread Susheel Kumar
One more suggestion is to collect/prepare the data in CSV format (a 1-2 million sample, depending on size) and then import the data directly into Solr using the CSV handler via curl. This will give you the pure indexing time and highlight the differences. Thanks, Susheel -Original Message- From: Erick Erickson
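A sketch of preparing such a request; the core URL is hypothetical, and the two-row payload stands in for the prepared sample:

```python
import csv
import io
from urllib.parse import urlencode

# Hypothetical core URL; Solr's CSV handler is typically reachable at
# /update/csv (or /update with a text/csv Content-Type).
base = "http://localhost:8983/solr/collection1/update/csv"
params = {"commit": "true", "separator": ",", "header": "true"}
url = base + "?" + urlencode(params)

# Build a tiny CSV payload; in practice this would be the prepared
# 1-2 million row sample.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "title"])
writer.writerow(["1", "sample doc"])
payload = buf.getvalue()

print(url)
print(payload)
# The actual upload would then be something like:
#   curl "<url>" --data-binary @sample.csv -H 'Content-type:text/csv'
```

Because the CSV path skips client-side document object construction and XML serialization entirely, the measured time isolates Solr's own indexing cost.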

Re: Problem with indexing xml using DataImportHandler and XPath

2014-03-05 Thread Farhan Ali
Sorry, figured out my problem. It was a stupid mistake on my part. Once again, sorry for that. Thanks Farhan On Wed, Mar 5, 2014 at 7:14 PM, Farhan Ali farhan@gmail.com wrote: Hi, I am a newbie to Solr and I am trying to index some xml documents using DIH and XPath but I am unable to do it.

Re: Problem with indexing xml using DataImportHandler and XPath

2014-03-05 Thread Erick Erickson
NP, Been there, done that, got the t-shirt :)... On Wed, Mar 5, 2014 at 9:51 PM, Farhan Ali farhan@gmail.com wrote: Sorry figured out my problem. It was stupid mistake on my part. Once again sorry for that Thanks Farhan On Wed, Mar 5, 2014 at 7:14 PM, Farhan Ali farhan@gmail.com

Re: SOLRJ and SOLR compatibility

2014-03-05 Thread Michael Sokolov
On 3/5/2014 1:36 AM, Shawn Heisey wrote: On 3/4/2014 8:15 PM, Michael Sokolov wrote: Thanks, Tim, it's great to hear you say that! I tried to make that point myself with various patches, but they never really got taken up by committers, so I kind of gave up, but I agree with you 100% this is a

Re: query problem

2014-03-05 Thread Kishan Parmar
Thanks, my documents are xml files. I am attaching a document, and in my project I have to search on each field defined in schema.xml, and my output from Solr should be like: { "responseHeader": { "status": 0, "QTime": 1, "params": { "indent": "true", "q": "State:Delhi",

Re: query problem

2014-03-05 Thread Gora Mohanty
On 6 March 2014 11:23, Kishan Parmar kishan@gmail.com wrote: Thanks, my documents are xml files i am attaching that document in this and in my project i have to search from each field defined in schema.xml [...] The type for State in your schema is string which is a non-analysed field

need suggestions for storing TBs of structured data in SolrCloud

2014-03-05 Thread Chia-Chun Shih
Hi, I am planning a system for searching TBs of structured data in SolrCloud. I need suggestions for handling such a huge amount of data in SolrCloud (e.g., number of shards per collection, number of nodes, etc.). Here are some specs of the system: 1. Raw data is 35,000 CSV files per day.