Re: Injecting synonyms into Solr

2015-05-04 Thread Zheng Lin Edwin Yeo
I would like to check: will this method of splitting the synonyms into multiple files use up a lot of memory? I'm trying it with about 10 files, and the collection cannot be loaded due to insufficient memory. Although my machine currently only has 4GB of memory, I only have 500,000

Editing the Solr Wiki

2015-05-04 Thread Nicole Butterfield
Dear Solr Admins, I'm writing on behalf of Manning Publications regarding the Solr wiki page:  https://wiki.apache.org/solr/.  I would like to edit the book listings on the Solr wiki to include our new MEAP Taming Search: http://www.manning.com/turnbull/. I have already set up an account with

Storing SolrCloud index data in Amazon S3

2015-05-04 Thread Vijay Bhoomireddy
Hi, Just wondering whether there is a provision to store SolrCloud index data on Amazon S3? Please let me know any pointers. Regards Vijay

Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina
Dear Solr Community, I have a recent computer with 8 GB of RAM, on which I installed Ubuntu 14.04, Solr 5.0, and Java 7. This is a brand new installation. Everything works fine, but I would like to increase SOLR_JAVA_MEM (to 40% of the total RAM available). So I edited bin/solr.in.sh # Increase Java Min/Max Heap

Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina
Dear Solr Users, I have a brand new computer with 8 GB of RAM, on which I installed Ubuntu 14.04, Solr 5.0, and Java 7. I indexed 92,000,000 docs (small text files, ~2 KB each) with around 30 fields. Everything works fine, but each Tuesday I need to delete some docs, so I create a batch file containing lines like

Re: Solr Cloud reclaiming disk space from deleted documents

2015-05-04 Thread Rishi Easwaran
Sadly, with the size of our complex, splitting and adding more HW is not a viable long-term solution. I guess the options we have are to run optimize regularly and/or become aggressive in our merges proactively, even before SolrCloud gets into this situation. Thanks, Rishi.

Re: Multiple index.timestamp directories using up disk space

2015-05-04 Thread Rishi Easwaran
Thanks for the responses, Mark and Ramkumar. The question I had was: why does Solr need 2 copies at any given time, leading to 2x disk space usage? I am not sure whether this information is published anywhere, and the lack of it makes HW estimation almost impossible for large-scale deployments. Even if the copies

Re: Storing SolrCloud index data in Amazon S3

2015-05-04 Thread Toke Eskildsen
On Mon, 2015-05-04 at 10:03 +0100, Vijay Bhoomireddy wrote: Just wondering whether there is a provision to store SolrCloud index data on Amazon S3? Please let me know any pointers. Not to my knowledge. From what I can read, Amazon S3 is intended for bulk data and has really poor latency. For

Re: Injecting synonyms into Solr

2015-05-04 Thread Roman Chyla
It shouldn't matter. By the way, try a URL instead of a file path. I think the underlying loading mechanism uses Java File, so it could work. On May 4, 2015 2:07 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Would like to check, will this method of splitting the synonyms into multiple files use up

Solr Cloud

2015-05-04 Thread Jilani Shaik
Hi All, Do we have any monitoring tools for Apache SolrCloud, similar to Apache Ambari, which is used for Hadoop clusters? Basically, I am looking for a tool similar to Apache Ambari that will give us various metrics as graphs and charts, along with detailed information for each node in Hadoop

Re: analyzer, indexAnalyzer and queryAnalyzer

2015-05-04 Thread Steven White
Thanks Doug. This is extremely helpful. It is much appreciated that you took the time to write it all. Do we have a Solr / Lucene wiki with such "did you know?" write-ups? If not, just having this kind of knowledge in an email isn't good enough, as it won't be as searchable as a wiki. Steve On

Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Shawn Heisey
On 5/4/2015 3:19 AM, Bruno Mannina wrote: All work fine but each Tuesday I need to delete some docs inside, so I create a batch file with inside line like this: /home/solr/solr-5.0.0/bin/post -c docdb -commit no -d "<delete><query>f1:58644</query></delete>" /home/solr/solr-5.0.0/bin/post -c docdb

Re: Solr Cloud reclaiming disk space from deleted documents

2015-05-04 Thread Shawn Heisey
On 5/4/2015 4:55 AM, Rishi Easwaran wrote: Sadly with the size of our complex, splitting and adding more HW is not a viable long term solution. I guess the options we have are to run optimize regularly and/or become aggressive in our merges proactively even before solr cloud gets into this

Re: Injecting synonyms into Solr

2015-05-04 Thread Shawn Heisey
On 5/4/2015 12:07 AM, Zheng Lin Edwin Yeo wrote: Would like to check, will this method of splitting the synonyms into multiple files use up a lot of memory? I'm trying it with about 10 files and that collection is not able to be loaded due to insufficient memory. Although currently my

Re: Solr Cloud

2015-05-04 Thread Shawn Heisey
On 5/4/2015 6:16 AM, Jilani Shaik wrote: Do we have any monitoring tools for Apache Solr Cloud? similar to Apache Ambari which is used for Hadoop Cluster. Basically I am looking for tool similar to Apache Ambari, which will give us various metrics in terms of graphs and charts along with

Re: Multiple index.timestamp directories using up disk space

2015-05-04 Thread Walter Underwood
One segment is in use, being searched. That segment (and others) is merged into a new segment. After the new segment is ready, searches are directed to the new copy and the old copies are deleted. That is why two copies are needed. If you cannot provide 2X the disk space, you will not have a

Re: analyzer, indexAnalyzer and queryAnalyzer

2015-05-04 Thread Shawn Heisey
On 5/4/2015 6:29 AM, Steven White wrote: Thanks Doug. This is extremely helpful. It is much appreciated that you took the time to write it all. Do we have a Solr / Lucene wiki with such "did you know?" write-ups? If not, just having this kind of knowledge in an email isn't good enough as it

How to get exact match along with text edge_ngram

2015-05-04 Thread Vishal Swaroop
We have item_name indexed as text edge_ngram which returns like results... Please suggest what will be the best approach (like string index (in addition to ...edge_ngram... or using copyField...) to search ALSO for exact matches? e.g. url should return item_name as abc entries only... I tried

Re: Multiple index.timestamp directories using up disk space

2015-05-04 Thread Rishi Easwaran
Walter, Unless I am missing something here... I completely get that when a few segments merge, Solr requires 2x the space of those segments to accomplish this. Usually any index has multiple segment files, so this fragmented 2x space consumption is not an issue, even as merged segments grow bigger. But

Re: Solr Cloud reclaiming disk space from deleted documents

2015-05-04 Thread Rishi Easwaran
Thanks Shawn. Yeah, regular optimize might be the route we take if this becomes a recurring issue. I remember that in our old multicore deployment, CPU used to spike and the core became almost unresponsive. My guess is that with the SolrCloud architecture, any slack by the leader while optimizing is picked up

Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina
ok I have this OOM error in the log file ... # # java.lang.OutOfMemoryError: Java heap space # -XX:OnOutOfMemoryError=/home/solr/solr-5.0.0/bin/oom_solr.sh 8983/home/solr/solr-5.0.0/server/logs # Executing /bin/sh -c /home/solr/solr-5.0.0/bin/oom_solr.sh

Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Scott Dawson
Bruno, You have the wrong kind of dash (a long dash) in front of the Xmx flag. Could that be causing a problem? Regards, Scott On Mon, May 4, 2015 at 5:06 AM, Bruno Mannina bmann...@free.fr wrote: Dear Solr Community, I have a recent computer with 8Go RAM, I installed Ubuntu 14.04 and SOLR
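
A quick way to catch this kind of problem is to scan the configuration line for dash-like characters that are not the plain ASCII hyphen. The sketch below is only an illustration: the helper name find_suspicious_dashes and the two sample lines are assumptions, not text posted in the thread.

```python
import unicodedata

ASCII_HYPHEN = "-"


def find_suspicious_dashes(line):
    """Return (column, Unicode name) for every dash that is not the ASCII hyphen."""
    hits = []
    for col, ch in enumerate(line):
        # Unicode category "Pd" is dash punctuation (en dash, em dash, ...).
        # The ASCII hyphen is also "Pd", so it is excluded explicitly.
        if ch != ASCII_HYPHEN and unicodedata.category(ch) == "Pd":
            hits.append((col, unicodedata.name(ch)))
    return hits


# Hypothetical sample lines in the style of bin/solr.in.sh.
good_line = 'SOLR_JAVA_MEM="-Xms2g -Xmx2g"'
bad_line = 'SOLR_JAVA_MEM="-Xms2g \u2013Xmx2g"'  # an en dash in front of Xmx

if __name__ == "__main__":
    print(find_suspicious_dashes(good_line))
    print(find_suspicious_dashes(bad_line))
```

Word processors and some mail clients silently replace a hyphen with a longer dash, so a flag that looks right on screen may not be recognised by the JVM as an option. Note that this check only covers dash punctuation; other look-alike characters would need their own test.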

Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina
I increased formdataUploadLimitInKB to 2048000 and the problem is the same, with the same error. Any ideas? On 04/05/2015 16:38, Bruno Mannina wrote: ok I have this OOM error in the log file ... # # java.lang.OutOfMemoryError: Java heap space #

Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Shawn Heisey
On 5/4/2015 8:38 AM, Bruno Mannina wrote: ok I have this OOM error in the log file ... # # java.lang.OutOfMemoryError: Java heap space # -XX:OnOutOfMemoryError=/home/solr/solr-5.0.0/bin/oom_solr.sh 8983/home/solr/solr-5.0.0/server/logs # Executing /bin/sh -c

Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina
Yes, it works! Perfect, Scott. For my config 3g does not work, but 2g does. Thanks! On 04/05/2015 16:50, Scott Dawson wrote: Bruno, You have the wrong kind of dash (a long dash) in front of the Xmx flag. Could that be causing a problem? Regards, Scott On Mon, May 4, 2015 at 5:06 AM,

Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina
Yes, that was it! I increased SOLR_JAVA_MEM to 2g (with 8 GB of RAM I would give it more, but Solr fails to start with 3g on my brand new computer). Thanks! On 04/05/2015 17:03, Shawn Heisey wrote: On 5/4/2015 8:38 AM, Bruno Mannina wrote: ok I have this OOM error in the log file ... # #

Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Shawn Heisey
On 5/4/2015 9:09 AM, Bruno Mannina wrote: Yes ! it works !!! Scott perfect For my config 3g do not work, but 2g yes ! If you can't start Solr with a 3g heap, chances are that you are running a 32-bit version of Java. A 32-bit Java cannot go above a 2GB heap. A 64-bit JVM requires a
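
As a rough check, the text printed by java -version can be searched for a 64-bit marker. This is a heuristic sketch, assuming HotSpot/OpenJDK-style output in which 64-bit builds include "64-Bit" in the VM line; the function name looks_64_bit is hypothetical, and the sample text mirrors the output reported elsewhere in this thread.

```python
def looks_64_bit(java_version_output):
    """Heuristic: report True if the java -version text mentions a 64-bit VM."""
    return "64-bit" in java_version_output.lower()


# Sample modelled on the java -version output quoted in this thread.
sample_output = (
    "java version 1.7.0_79\n"
    "OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)\n"
    "OpenJDK Server VM (build 24.79-b02, mixed mode)\n"
)

if __name__ == "__main__":
    if looks_64_bit(sample_output):
        print("The VM line mentions 64-Bit.")
    else:
        print("No 64-Bit marker found; this may be a 32-bit JVM.")
```

The absence of the marker is only a hint, not proof, so it is worth confirming against the installed Java package before drawing conclusions about heap limits.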

Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina
Shawn, thanks a lot for this comment. Here is the information I have; it says nothing about 32 or 64 bits... solr@linux:~$ java -version java version 1.7.0_79 OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2) OpenJDK Server VM (build 24.79-b02, mixed mode) solr@linux:~$

Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Shawn Heisey
On 5/4/2015 10:28 AM, Bruno Mannina wrote: solr@linux:~$ java -version java version 1.7.0_79 OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2) OpenJDK Server VM (build 24.79-b02, mixed mode) solr@linux:~$ solr@linux:~$ uname -a Linux linux 3.13.0-51-generic

Answer engine - NLP related question

2015-05-04 Thread bbarani
Hi, Note: I have very basic knowledge on NLP.. I am working on an answer engine prototype where when the user enters a keyword and searches for it we show them the answer corresponding to that keyword (rather than displaying multiple documents that match the keyword) For Ex: When user

apache 5.1.0 under apache web server

2015-05-04 Thread Tim Dunphy
Hey all, I need to run solr 5.1.0 on port 80 with some basic apache authentication. Normally, under earlier versions of solr I would set it up to run under tomcat, then connect it to apache web server using mod_jk. However 5.1.0 seems totally different. I see that tomcat support has been removed

Re: apache 5.1.0 under apache web server

2015-05-04 Thread Shawn Heisey
On 5/4/2015 1:04 PM, Tim Dunphy wrote: I need to run solr 5.1.0 on port 80 with some basic apache authentication. Normally, under earlier versions of solr I would set it up to run under tomcat, then connect it to apache web server using mod_jk. However 5.1.0 seems totally different. I see

Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina
OK, I have noted all this information, thanks! I will update if needed. 2 GB seems to be OK. On 04/05/2015 18:46, Shawn Heisey wrote: On 5/4/2015 10:28 AM, Bruno Mannina wrote: solr@linux:~$ java -version java version 1.7.0_79 OpenJDK Runtime Environment (IcedTea 2.5.5)

Re: Injecting synonyms into Solr

2015-05-04 Thread Zheng Lin Edwin Yeo
Yes, the underlying mechanism uses Java. But the collection isn't able to load when Solr starts up, so it didn't return anything even when I used a URL. Is it just due to my machine not having enough memory? Regards, Edwin On 4 May 2015 20:12, Roman Chyla roman.ch...@gmail.com wrote: It

Re: SolrCloud+HDFS disappointed indexing performance

2015-05-04 Thread xinwu
Can someone help me?

Re: Solr Cloud

2015-05-04 Thread Anirudha Jadhav
The JMX metrics are good; you can start there. Let's talk offline for more. -Ani On Mon, May 4, 2015 at 10:51 PM, Jilani Shaik jilani24...@gmail.com wrote: Thanks Shawn, It has provided the pointers of open source, I am really interested to look for open source solution, I have basic knowledge

Re: Solr Cloud

2015-05-04 Thread Jilani Shaik
Thanks Shawn, Your reply provided pointers to open source tools. I am really interested in finding an open source solution; I have basic knowledge of Ganglia and Nagios. I have gone through Sematext, and our company is already using New Relic in this space. But I am interested in open source similar to

Union and intersection methods in solr DocSet

2015-05-04 Thread Gajendra Dadheech
I have a requirement where I need to find matching docsets for different queries and then do either a union or an intersection on those docsets, e.g.: DocSet docset1 = Searcher.getDocSet(query1); DocSet docset2 = Searcher.getDocSet(query2); DocSet finalDocset = docset1.intersection(docset2); Is this
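
To make the intended semantics concrete, here is a small sketch that uses plain Python sets of document ids as a stand-in for Solr's DocSet. It is an analogy only, not the Solr API; the ids and the function names are made up for illustration.

```python
def intersect_docsets(first, second):
    """Documents matched by both queries."""
    return first & second


def union_docsets(first, second):
    """Documents matched by at least one of the queries."""
    return first | second


# Hypothetical document ids matched by two different queries.
docset1 = {1, 4, 7, 9}
docset2 = {4, 5, 9, 12}

if __name__ == "__main__":
    print(sorted(intersect_docsets(docset1, docset2)))
    print(sorted(union_docsets(docset1, docset2)))
```

Intersection corresponds to requiring both queries to match (a logical AND), and union to accepting either (a logical OR).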

Re: Answer engine - NLP related question

2015-05-04 Thread Upayavira
What you seem to be asking for is POS (parts of speech) analysis. You can use OpenNLP to do that for you, likely outside of Solr. OpenNLP will identify nouns, verbs, etc in your sentences. The question is, can you identify certain of those types to be filtered out from your queries? A simple bit

Solr 5.0 - uniqueKey case insensitive ?

2015-05-04 Thread Bruno Mannina
Dear Solr users, I have a problem with Solr 5.0 (and not with Solr 3.6). What kind of field can I use for my uniqueKey field named code if I want it to be case insensitive? On Solr 3.6, I defined a string_ci field like this: <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true"

Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-04 Thread Chris Hostetter
: On SOLR3.6, I defined a string_ci field like this: : : <fieldType name="string_ci" class="solr.TextField" : sortMissingLast="true" omitNorms="true"> : <analyzer> : <tokenizer class="solr.KeywordTokenizerFactory"/> : <filter class="solr.LowerCaseFilterFactory"/> : </analyzer> : </fieldType> :

Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-04 Thread Bruno Mannina
Hello Chris, yes, I confirm that on my Solr 3.6 it has worked fine for several years, and each doc added with the same code is updated, not added. To be clearer, I receive docs with a field named pn, which is the uniqueKey and is always in uppercase, so I must define in my schema.xml field name=id

Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Chris Hostetter
XY-ish problem -- if you are deleting a bunch of documents by id, why have you switched from using delete-by-id to using delete-by-query? What drove that decision? Did you try using delete-by-query in your 3.6 setup? : my f1 field is my key field. It is unique. ... : On my old solr
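
The difference between the two delete styles is easiest to see in the XML update messages themselves. The sketch below builds both forms with the Python standard library; the field name f1 and the id 58644 come from the thread, while the second id and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def delete_by_id_xml(ids):
    """Build a delete message that lists unique key values directly."""
    root = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(root, "id").text = str(doc_id)
    return ET.tostring(root, encoding="unicode")


def delete_by_query_xml(query):
    """Build a delete message that selects documents with a query."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 58645 is a made-up second id, used only to show batching.
    print(delete_by_id_xml([58644, 58645]))
    print(delete_by_query_xml("f1:58644"))
```

When the field being matched is the unique key, one delete-by-id message can carry many ids at once, which avoids issuing a separate query-based delete for each document.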

Re: Optimal configuration for high throughput indexing

2015-05-04 Thread Vinay Pothnis
Hi Shawn, Thanks for your inputs. The 12GB is for solr. I did read through your wiki and your G1 related recommended settings are already included. Tried a lower memory config (7G) as well and it did not result in any better results. Right now, in the process of changing the updates to use Solrj

Re: Optimal configuration for high throughput indexing

2015-05-04 Thread Shawn Heisey
On 5/4/2015 2:36 PM, Vinay Pothnis wrote: But nonetheless, we will give the latest solrJ client + cloudSolrServer a try. * Yes, the documents are pretty small. * We are using G1 collector and there are no major GCs; however, there are a lot of minor GCs, sometimes going up to 2s per minute

Re: apache 5.1.0 under apache web server

2015-05-04 Thread Shawn Heisey
On 5/4/2015 1:50 PM, Tim Dunphy wrote: However it sounds like you're sure it's supposed to work this way. Can I get some advice on this error? If you tried copying JUST the .war file with any version from 4.3 on, something similar would happen. At the request of many of our more advanced

Re: Optimal configuration for high throughput indexing

2015-05-04 Thread Vinay Pothnis
Hi Erick, Thanks for your inputs. I think long before we had made a conscious decision to skip solrJ client and use plain http. I think it might have been because at the time solrJ client was queueing update in its memory or something. But nonetheless, we will give the latest solrJ client +

Re: apache 5.1.0 under apache web server

2015-05-04 Thread Chris Hostetter
: I need to run solr 5.1.0 on port 80 with some basic apache authentication. : Normally, under earlier versions of solr I would set it up to run under : tomcat, then connect it to apache web server using mod_jk. the general gist of what you should look into is running Solr (via ./bin/solr) on

Re: apache 5.1.0 under apache web server

2015-05-04 Thread Tim Dunphy
The container in the default 5.x install is a completely unmodified Jetty 8.x (soon to be Jetty 9.x) with a stripped and optimized config. The config for Jetty is similar to tomcat, you just need to figure out how to make it work with Apache like you would with Tomcat. Incidentally, at