Re: Tomcat creates a thread for each SOLR core

2014-04-15 Thread Atanas Atanasov
Hello again, The current situation is that, after setting the two options to not load the cores on startup and ramBufferSizeMB=32, Tomcat is stable and responsive, and threads reach 60 at most. Browsing and storing are fast. I should note that I have many cores with a small number of documents.

[ANNOUNCE] Apache Solr 4.7.2 released.

2014-04-15 Thread Robert Muir
April 2014, Apache Solr™ 4.7.2 available. The Lucene PMC is pleased to announce the release of Apache Solr 4.7.2. Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted

Re: filter capabilities are limited?

2014-04-15 Thread horot
The variables being compared are of a string data type. I cannot apply mathematical functions to them, nor the type conversion that would be needed (string to integer). Will I be able to build a circuit without changing a filter?

Re: Solr join and lucene scoring

2014-04-15 Thread mm
Thank you for the clarification. We really need scoring with solr joins, but as you can see I'm not a specialist in solr development. We would like to hire somebody with more experience to write a qparser plugin for scoring in joins and donate the source code to the community. Any

Autocomplete with Case-insensitive feature

2014-04-15 Thread Sunayana
Hi All, I have been trying out the autocomplete feature in Solr 4.7.1 using Suggester. I have configured it to display phrase suggestions as well. The problem is that if I type game I get game or phrases containing game as suggestions, but if I type Game *no suggestion is displayed at all*. How can I get

Re: Autocomplete with Case-insensitive feature

2014-04-15 Thread Dmitry Kan
Hi, Configure LowerCaseFilterFactory on the query side of your field type config. Dmitry On Tue, Apr 15, 2014 at 10:50 AM, Sunayana sunayana...@wipro.com wrote: Hi All, I have been trying out this autocomplete feature in Solr4.7.1 using Suggester. I have configured it to display phrase
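
The effect of lowercasing on the query side can be illustrated with a small Python sketch. This is a toy stand-in, not the Solr Suggester: the entries in INDEXED_SUGGESTIONS and the suggest() helper are hypothetical, and the sketch assumes the index side already stores lowercased text.

```python
# Toy stand-in for a suggester: the index holds lowercased entries,
# the way an index-time lowercase filter would store them (assumption).
INDEXED_SUGGESTIONS = ["game", "game of thrones", "board game night"]


def suggest(user_input, lowercase_query=True):
    """Return indexed entries containing the (optionally lowercased) input."""
    needle = user_input.lower() if lowercase_query else user_input
    return [entry for entry in INDEXED_SUGGESTIONS if needle in entry]


# Case-sensitive lookup: "Game" never matches the lowercased index entries.
without_filter = suggest("Game", lowercase_query=False)

# Lowercasing the query first, as a query-side lowercase filter would.
with_filter = suggest("Game", lowercase_query=True)
```

With the query lowercased before lookup, "Game" and "game" return the same suggestions; without it, the capitalized input finds nothing.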

Re: More Robust Search Timeouts (to Kill Zombie Queries)?

2014-04-15 Thread Salman Akram
Looking at this, sharding seems to be the best and simplest option to handle such queries. On Wed, Apr 2, 2014 at 1:26 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello Salman, Let me drop a few thoughts on

Re: Autocomplete with Case-insensitive feature

2014-04-15 Thread Sunayana
Hi, Did you mean changing the field type to <fieldType name="text_auto" class="solr.TextField" positionIncrementGap="100" indexed="true" stored="false" multiValued="true"> <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory"/>

Indexing Big Data With or Without Solr

2014-04-15 Thread Vineet Mishra
Hi All, I have worked with Solr 3.5 to implement real-time search on some 100 GB of data. That worked fine but was a little slow on complex queries (multiple group/joined queries). Now I want to index some real Big Data (around 4 TB or even more). Can SolrCloud be a solution for it? If not, what could be

Re: Class not found ICUFoldingFilter (SOLR-4852)

2014-04-15 Thread Ronak Kirit
Hello Shawn, Thanks for your reply. Yes, I have defined ${solr.solr.home} explicitly, and all the mentioned jars are present in ${solr.solr.home}/lib. solr.log also shows that those files are getting added once (grep icu4 solr.log). I could see the lines in the log, INFO - 2014-04-15 15:40:21.448;

Re: Class not found ICUFoldingFilter (SOLR-4852)

2014-04-15 Thread ronak kirit
Hello Shawn, Thanks for your reply. Yes, I have defined ${solr.solr.home} explicitly, and all the mentioned jars are present in ${solr.solr.home}/lib. solr.log also shows that those files are getting added once (grep icu4 solr.log). I could see the lines in the log, INFO - 2014-04-15 15:40:21.448;

Re: Error Arising from when I start to crawl

2014-04-15 Thread Cihad Guzel
Hi Ridwan, This error is not related to Solr. Solr is used in IndexerJob for Nutch. This error is thrown from InjectorJob. It is related to Nutch and Gora. Check your HBase and Nutch configuration. Make sure that HBase runs correctly and that you use the correct version. For more accurate information,

Re: Analysis Tool Not Working for CharFilterFactory?

2014-04-15 Thread Alexandre Rafalovitch
Which version of Solr? I think there was a bug in the UI. You can check network traffic to confirm. On 15/04/2014 5:32 pm, Steve Huckle steve.huc...@gmail.com wrote: I have used a CharFilterFactory in my schema.xml for fieldType text_general, so that queries for cafe and café return the same

Re: multiple analyzers for one field

2014-04-15 Thread Michael Sokolov
A blog post is a great idea, Alex! I think I should wait until I have a complete end-to-end implementation done before I write about it though, because I'd also like to include some tips about configuring the new suggesters with Solr (the documentation on the wiki hasn't quite caught up yet,

Re: multiple analyzers for one field

2014-04-15 Thread Alexandre Rafalovitch
Your call, though from experience this sounds like either two or no blog posts. I certainly have killed a bunch of good articles by waiting for perfection :-) On 15/04/2014 7:01 pm, Michael Sokolov msoko...@safaribooksonline.com wrote: A blog post is a great idea, Alex! I think I should wait

Re: Indexing Big Data With or Without Solr

2014-04-15 Thread Furkan KAMACI
Hi Vineet; I've been using SolrCloud for this kind of Big Data and I think that you should consider using it. If you have any problems you can ask here. Thanks; Furkan KAMACI 2014-04-15 13:20 GMT+03:00 Vineet Mishra clearmido...@gmail.com: Hi All, I have worked with Solr 3.5 to

Bug within the solr query parser (version 4.7.1)

2014-04-15 Thread Johannes Siegert
Hi, I have updated my Solr instance from 4.5.1 to 4.7.1. Now the parsed query does not seem to be correct. Query: q=*:*&fq=title:TE&debug=true Before the update the parsed filter query is +title:te +title:t +title:e. After the update the parsed filter query is +((title:te
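
A request like the one in this report can be assembled with the Python standard library, as a minimal sketch. It assumes the three parameters are joined with & as in an ordinary URL query string; the base URL (host, port, and core name) is a hypothetical placeholder, not taken from the message.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical endpoint; host, port, and core name are assumptions.
BASE_URL = "http://localhost:8983/solr/collection1/select"

# The three parameters from the report: main query, filter query, debug flag.
params = [("q", "*:*"), ("fq", "title:TE"), ("debug", "true")]

# Keep '*' and ':' unescaped so the string stays readable.
query_string = urlencode(params, safe="*:")
request_url = BASE_URL + "?" + query_string

# Parsing the string back confirms each parameter is separate.
round_trip = parse_qs(query_string)
```

Sending request_url to a running instance and reading the parsed_filter_queries entry of the debug output is how the before and after behavior would be compared.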

clusterstate.json does not reflect current state of down versus active

2014-04-15 Thread Rich Mayfield
Solr 4.7.1. I am trying to orchestrate a fast restart of a SolrCloud (4.7.1). I was hoping clusterstate.json would reflect the up/down state of each core as well as whether or not a given core was leader. clusterstate.json is not kept up to date with what I see going on in my logs though -

Re: clusterstate.json does not reflect current state of down versus active

2014-04-15 Thread Shawn Heisey
On 4/15/2014 8:58 AM, Rich Mayfield wrote: I am trying to orchestrate a fast restart of a SolrCloud (4.7.1). I was hoping clusterstate.json would reflect the up/down state of each core as well as whether or not a given core was leader. clusterstate.json is not kept up to date with what

Re: Empty documents in Solr\lucene 3.6

2014-04-15 Thread Shawn Heisey
On 4/15/2014 9:41 AM, Alexey Kozhemiakin wrote: We've faced a strange data corruption issue with an old Solr setup (3.6) at one of our clients. When we do a query (id:X OR id:Y) we get 2 nodes: one contains normal doc data, the other is empty (<doc/>). We've looked inside the Lucene index using Luke -

Empty documents in Solr\lucene 3.6

2014-04-15 Thread Alexey Kozhemiakin
Dear Community, We've faced a strange data corruption issue with an old Solr setup (3.6) at one of our clients. When we do a query (id:X OR id:Y) we get 2 nodes: one contains normal doc data, the other is empty (<doc/>). We've looked inside the Lucene index using Luke - same story, one of the documents is

Race condition in Leader Election

2014-04-15 Thread Rich Mayfield
I see something similar where, given ~1000 shards, both nodes spend a LOT of time sorting through the leader election process. Roughly 30 minutes. I too am wondering - if I force all leaders onto one node, then shut down both, then start up the node with all of the leaders on it first, then

RE: Empty documents in Solr\lucene 3.6

2014-04-15 Thread Alexey Kozhemiakin
The system was up and running for a long time (months) without any updates. There were no crashes for sure, at least the support team says so. Logs indicate that at some point there was not enough disk space (caused by weekend index optimization). Were there any other similar cases, or is it unique for

Re: Race condition in Leader Election

2014-04-15 Thread Mark Miller
We have to fix that then. --  Mark Miller about.me/markrmiller On April 15, 2014 at 12:20:03 PM, Rich Mayfield (mayfield.r...@gmail.com) wrote: I see something similar where, given ~1000 shards, both nodes spend a LOT of time sorting through the leader election process. Roughly 30 minutes.

Re: Empty documents in Solr\lucene 3.6

2014-04-15 Thread Shawn Heisey
On 4/15/2014 10:22 AM, Alexey Kozhemiakin wrote: The system was up and running for a long time (months) without any updates. There were no crashes for sure, at least the support team says so. Logs indicate that at some point there was not enough disk space (caused by weekend index optimization).

Re: What's the actual story with new morphline and hadoop contribs?

2014-04-15 Thread Wolfgang Hoschek
The solr morphline jars are integrated with solr by way of the solr specific solr/contrib/map-reduce module. Ingestion from Flume into Solr is available here: http://flume.apache.org/FlumeUserGuide.html#morphlinesolrsink FWIW, for our purposes we see no role for DataImportHandler anymore.

Re: What is Overseer?

2014-04-15 Thread Chris Hostetter
: So, is Overseer really only an implementation detail or something that Solr : Ops guys need to be very aware of? Most people don't ever need to worry about the overseer - it's magic and it will take care of itself. The recent work on adding support for an overseer role in 4.7 was

cache warming questions

2014-04-15 Thread Matt Kuiper
Hello, I have a few questions regarding how Solr caches are warmed. My understanding is that there are two ways to warm internal Solr caches (only one way for document cache and lucene FieldCache): Auto warming - occurs when there is a current searcher handling requests and new searcher is

Re: Question regarding solrj

2014-04-15 Thread Prashant Golash
Sorry for not replying!!! It was the wrong version of SolrJ that the client was using (as it was third-party code, we couldn't find out earlier). After fixing the version, things seem to be working fine. Thanks for your response!!! On Sun, Apr 13, 2014 at 7:26 PM, Erick Erickson

Distributed commits in CloudSolrServer

2014-04-15 Thread Peter Keegan
I have a SolrCloud index, 1 shard, with a leader and one replica, and 3 ZKs. The Solr indexes are behind a load balancer. There is one CloudSolrServer client updating the indexes. The index schema includes 3 ExternalFileFields. When the CloudSolrServer client issues a hard commit, I observe that

Re: multiple analyzers for one field

2014-04-15 Thread Michael Sokolov
Ha! You were right. Thanks for the nudge; here's my post: http://blog.safariflow.com/2014/04/15/search-suggestions-with-solr-2/ there's code at http://github.com/safarijv/ifpress-solr-plugin cheers -Mike On 04/15/2014 08:18 AM, Alexandre Rafalovitch wrote: Your call, though from experience

Re: Empty documents in Solr\lucene 3.6

2014-04-15 Thread Dmitry Kan
Alexey, 1. Can you take a backup of the index and run the index checker with -fix option? Will it modify the index at all? 2. Are all the missing fields configured as stored? Are they marked as required in the schema or optional? Dmitry On Tue, Apr 15, 2014 at 7:22 PM, Alexey Kozhemiakin

Re: What is Overseer?

2014-04-15 Thread Jack Krupansky
I should have suggested three levels in my question: 1) important to average users, 2) expert-only, and 3) internal implementation detail. Yes, expert-only does have a place, but it is good to mark features as such. -- Jack Krupansky -Original Message- From: Chris Hostetter Sent:

Re: Distributed commits in CloudSolrServer

2014-04-15 Thread Mark Miller
Inline responses below. -- Mark Miller about.me/markrmiller On April 15, 2014 at 2:12:31 PM, Peter Keegan (peterlkee...@gmail.com) wrote: I have a SolrCloud index, 1 shard, with a leader and one replica, and 3 ZKs. The Solr indexes are behind a load balancer. There is one CloudSolrServer

Transformation on a numeric field

2014-04-15 Thread Jean-Sebastien Vachon
Hi All, I am looking for a way to index a numeric field and its value divided by 1 000 into another numeric field. I thought about using a CopyField with a PatternReplaceFilterFactory to keep only the first few digits (cutting the last three). Solr complains that I can not have an analysis

Re: Transformation on a numeric field

2014-04-15 Thread Rafał Kuć
Hello! You can achieve that using an update processor, for example look here: http://wiki.apache.org/solr/ScriptUpdateProcessor What you would have to do, in general, is create a script that would take the value of the field, divide it by 1000 and put it in another field - the target numeric

Re: Transformation on a numeric field

2014-04-15 Thread Jack Krupansky
You can use an update processor. The stateless script update processor will let you write arbitrary JavaScript code, which can do this calculation. You should be able to figure it out from the wiki: http://wiki.apache.org/solr/ScriptUpdateProcessor My e-book has plenty of script examples for
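
The calculation discussed in this thread can be sketched in Python. This only illustrates the arithmetic; per the replies, the processor script itself would be written in JavaScript. The field names value and value_thousands and the add_scaled_field() helper are hypothetical, and integer division is an assumption based on the stated goal of cutting the last three digits.

```python
def add_scaled_field(doc, source="value", target="value_thousands", divisor=1000):
    """Return a copy of doc with target set to source // divisor.

    Field names are hypothetical. Integer division drops the last three
    digits of a non-negative value when divisor is 1000.
    """
    updated = dict(doc)  # leave the caller's document untouched
    if source in updated:
        updated[target] = int(updated[source]) // divisor
    return updated


# A document as it might look before indexing (made-up values).
example = add_scaled_field({"id": "doc1", "value": 1234567})
```

Documents without the source field pass through unchanged, which keeps the step safe to apply to every incoming document.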

Odd extra character duplicates in spell checking

2014-04-15 Thread Ed Smiley
Hi, I am going to make this question pretty short, so I don’t overwhelm with technical details until the end. I suspect that some folks may be seeing this issue without the particular configuration we are using. What our problem is: 1. Correctly spelled words are returning as not spelled

Re: svn vs GIT

2014-04-15 Thread Jeff Wartes
I guess I should've double-checked it was still the case before saying anything, but I'm glad to be proven wrong. Yes, it worked nicely for me when I tried today, which should simplify my life a bit. On 4/14/14, 4:35 PM, Shawn Heisey s...@elyograg.org wrote: On 4/14/2014 12:56 PM, Ramkumar R.

Re: cache warming questions

2014-04-15 Thread Erick Erickson
bq: What does it mean that items will be regenerated or prepopulated from the current searcher's cache... You're right, the values aren't cached. They can't be since the internal Lucene document id is used to identify docs, and due to merging the internal ID may bear no relation to the old
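
The point about internal document ids can be illustrated with a toy Python sketch of autowarming, under the assumption that warming re-runs keys from the old cache against the new searcher instead of copying cached values. The two index snapshots, make_searcher(), and autowarm() are hypothetical, and taking keys in insertion order is a simplification.

```python
def make_searcher(index):
    """Return a search function over a toy index of {internal_id: text}."""
    return lambda term: sorted(i for i, text in index.items() if term in text)


def autowarm(old_cache, new_searcher, autowarm_count):
    """Rebuild up to autowarm_count entries by re-running old keys."""
    new_cache = {}
    for query in list(old_cache)[:autowarm_count]:
        # Only the key is reused; the value is recomputed from scratch.
        new_cache[query] = new_searcher(query)
    return new_cache


# Hypothetical snapshots: internal ids shift after a merge and a new doc.
old_index = {0: "red apple", 1: "green pear", 2: "red pear"}
new_index = {0: "red apple", 1: "red pear", 2: "red plum"}

old_search = make_searcher(old_index)
old_cache = {term: old_search(term) for term in ("red", "pear")}
warmed = autowarm(old_cache, make_searcher(new_index), autowarm_count=1)
```

Here the old entry for red pointed at ids 0 and 2, while the warmed entry points at 0, 1, and 2, so copying the old value would have returned stale ids.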

Re: More Robust Search Timeouts (to Kill Zombie Queries)?

2014-04-15 Thread Steve Davids
I have also experienced a similar problem on our cluster, so I went ahead and opened SOLR-5986 to track the issue. I know Apache Blur has implemented a mechanism to kill these long-running term enumerations; it would be fantastic if Solr could get a similar mechanism. -Steve On Apr 15, 2014, at 5:23

Tipping point of solr shards (Num of docs / size)

2014-04-15 Thread Mukesh Jha
Hi Gurus, In my Solr cluster I have multiple shards, each containing ~500,000,000 documents, with the total index size being ~1 TB. I was just wondering how much more I can keep adding to a shard before we reach a tipping point and performance starts to degrade? Also as best practice what

Re: deleting large amount data from solr cloud

2014-04-15 Thread Vinay Pothnis
Another update: I removed the replicas - to avoid the replication doing a full copy. I am able to delete sizeable chunks of data. But the overall index size remains the same even after the deletes. It does not seem to go down. I understand that Solr would do this in the background - but I don't seem to

Re: Tipping point of solr shards (Num of docs / size)

2014-04-15 Thread Vinay Pothnis
You could look at this link to understand the factors that affect SolrCloud performance: http://wiki.apache.org/solr/SolrPerformanceProblems Especially the sections about RAM and disk cache. If the index grows too big for one node, it can lead to performance issues. From the looks of

Re: Tipping point of solr shards (Num of docs / size)

2014-04-15 Thread Mukesh Jha
My index size per shard varies between 250 GB and 1 TB. The cluster is performing well even now, but I thought it's high time to change it so that a shard doesn't get too big. On Wed, Apr 16, 2014 at 10:25 AM, Vinay Pothnis poth...@gmail.com wrote: You could look at this link to understand about the