Re: Issue with multivalued fields in UIMA

2014-10-30 Thread 5ton3
I had to overcome this issue, as I needed to analyze multivalued fields. The fact that UIMA doesn't analyse multivalued fields is a known bug in UIMA. With the help of Maryam, I solved the issue. The JIRA issue, along with a working patch, can be found here:

issue related to blank value in datefield

2014-10-30 Thread Aman Tandon
Hi, I want to set the value -00-00T00:00:00Z for a date field where I do not have a value. When I index the date field with that value, it gets indexed as 0002-11-30T00:00:00Z. What is the reason behind this? With Regards Aman Tandon

Re: Solr Memory Usage

2014-10-30 Thread Toke Eskildsen
On Wed, 2014-10-29 at 23:37 +0100, Will Martin wrote: This command only touches OS level caches that hold pages destined for (or not) the swap cache. Its use means that disk will be hit on future requests, but in many instances the pages were headed for ejection anyway. It does not have

prefix length in fuzzy search solr 4.10.1

2014-10-30 Thread elisabeth benoit
Hello all, Is there a parameter in the Solr 4.10.1 API that lets the user set the prefix length in fuzzy search? Best regards, Elisabeth

Re: Sharding configuration

2014-10-30 Thread Anca Kopetz
Hi, We did some tests with 4 shards / 4 different tomcat instances on the same server and the average latency was smaller than the one when having only one shard. We also tested 2 shards on different servers and the performance results were also worse. It seems that the sharding does not make

Re: SolrCloud : node recovery fails with No registered leader was found

2014-10-30 Thread yann180
Hi guys, just wondering if any solution was found for this? I have a similar problem - Solr 4.7.2, 2-server cloud, single replicated shard. At random times one of the servers dies with the same message as in the title of this thread. I was hoping there might be a solution? (upgrading Solr is

Re: Exporting Error in 4.10.1

2014-10-30 Thread Dmitry Kan
Hi, Luke has a feature of index exporting, given that output format suits your needs (xml). https://github.com/DmitryKey/luke/releases/tag/luke-4.10.1 http://dmitrykan.blogspot.fi/2014/09/exporting-lucene-index-to-xml-with-luke.html It does not have the option to export select fields only,

Re: phrase query in solr 4

2014-10-30 Thread Dmitry Kan
On top of what Shawn rightly said, two things: 1. Try to benchmark the solution yourself (best bet), with and without the shingles. Then you know better and have a story with numbers to tell. 2. If you go with the shingles approach, consider removing duplicates with
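
To make the shingles idea concrete, here is a minimal sketch in plain Python. It is only an illustration of what word shingles and duplicate removal look like, not Solr's own shingle filter, and the function names are made up for this example. The message above is cut off before it names the filter it recommends, so none is assumed here.

```python
def shingles(text, size=2):
    """Return word n-grams ("shingles") of the given size, in order."""
    words = text.lower().split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def unique_in_order(items):
    """Drop repeated items while keeping the first occurrence of each."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


tokens = shingles("what if what if not")
print(tokens)                   # every two-word shingle, duplicates included
print(unique_in_order(tokens))  # the same shingles with repeats removed
```

Benchmarking with and without such tokens, as suggested above, is still the only way to know whether the larger index pays off for a given corpus.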

Re: Score phrases higher than the records containing the words?

2014-10-30 Thread hschillig
How can I tell if the stop words issue is resolved? This is what I get when I turn debugging on: http://apaste.info/0Uz When I put: q=title:(what if) OR title:"what if"^10 I get this: rawquerystring: title:(what if) OR title:\"what if\"^10, querystring: title:(what if) OR

Re: Score phrases higher than the records containing the words?

2014-10-30 Thread hschillig
Edit: I filtered my query to author:randall so I could see the score that it's getting from the query. This is the score of the record that contains what if: score: 0.004032644 The other two books are getting this score: score: 0.0069850935 So... the boost is obviously not hitting that record. I

Re: Exporting Error in 4.10.1

2014-10-30 Thread Joseph Obernberger
Thank you Dmitry. Any ideas why the Solr /export is not working for me? I forgot to mention that this is Solr Cloud. I believe I've defined the field correctly, and I've also tried using another field (title), but I get the same error: Title must have DocValues to use this feature.. My goal is

Re: Exporting Error in 4.10.1

2014-10-30 Thread Joel Bernstein
Solr 4.10 is the very first release of the export feature. It does require that all fields being sorted and exported have docValues = true in the schema. This is likely to change in the future, but DocValues will likely always provide the best indexing option for sorting and exporting full result
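
As a rough sketch of what an export request looks like once the fields have docValues enabled in the schema, the Python below only builds the URL. The host, the collection name, and the field names are assumptions for illustration, and the parameter set (q, sort, fl) reflects my reading of the 4.10 export handler rather than anything stated in this thread.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def export_url(base, collection, query, sort, fields):
    """Build a /export request URL; every sort and fl field needs docValues."""
    params = urlencode({"q": query, "sort": sort, "fl": ",".join(fields)})
    return f"{base}/{collection}/export?{params}"


# Hypothetical host, collection, and fields.
url = export_url("http://localhost:8983/solr", "collection1",
                 "*:*", "id asc", ["id", "title"])
print(url)
print(parse_qs(urlsplit(url).query))  # decoded view of the parameters
```

If a field named in sort or fl lacks docValues, the handler is expected to reject the request with an error like the one quoted earlier in the thread.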

Re: Sharding configuration

2014-10-30 Thread Shawn Heisey
On 10/30/2014 4:32 AM, Anca Kopetz wrote: We did some tests with 4 shards / 4 different tomcat instances on the same server and the average latency was smaller than the one when having only one shard. We also tested 2 shards on different servers and the performance results were also worse.

Design optimal Solr Schema

2014-10-30 Thread tomas.kalas
Hello, I have a problem with the design of a schema in Solr. I have a transcript of a telephone conversation in this format. I parse it into individual fields. I have this schema: <?xml version="1.0"?> <add> <doc> <field name="id">01.cn</field> <field name="t">0<br /> 1<br /> 2<br /> 2 <br /> 3 <br /> </field> <field name="st">0.00<br />

Re: Design optimal Solr Schema

2014-10-30 Thread Jorge Luis Betancourt González
Are you going to use the values stored in Solr to display the data in HTML? For searching purposes I suggest deleting all the HTML tags and storing the plain text. For this you could use the HTMLStripCharFilterFactory char filter, which will clean your content and only pass the actual text which
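
To show the effect of stripping markup before analysis, here is a small Python sketch using only the standard library. It is not how HTMLStripCharFilterFactory is implemented; it just illustrates the kind of text that would be left for indexing, and the class and function names are made up for this example.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only the text content, ignoring every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup):
    """Return the text of `markup` with tags removed and whitespace collapsed."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


print(strip_html("0<br /> 1<br /> 2<br /> 2 <br /> 3 <br />"))
```

Doing this inside the analysis chain keeps the stored value untouched, so the original HTML can still be returned for display if that is needed.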

Re: Sharding configuration

2014-10-30 Thread Anca Kopetz
Hi, You are right, it is a mistake in my phrase, for the tests with 4 shards/ 4 instances, the latency was worse (therefore *bigger*) than for the tests with one shard. In our case, the query rate is high. Thanks, Anca On 10/30/2014 03:48 PM, Shawn Heisey wrote: On 10/30/2014 4:32 AM, Anca

Re: Score phrases higher than the records containing the words?

2014-10-30 Thread hschillig
The other ones are still rating higher. I think it's because the other two titles contain what 3 times.. the more it says what, the higher it scores. I'm not sure what else can be done. Does anybody else have any ideas?

Re: facet on field aliases of same field

2014-10-30 Thread Dan Field
Thanks Michael. We’re looking into the use of localparams now. On 29 Oct 2014, at 12:56, Michael Ryan mr...@moreover.com wrote: It is indeed possible. Just need to use a different syntax. As far as I know, the facet parameters need to be local parameters, like this...

Re: Design optimal Solr Schema

2014-10-30 Thread Alexandre Rafalovitch
I am afraid, it is not very clear what you are trying to do here (the sentence below). Could you explain again the business level results. Are you trying to search for words within particular given time range? Can those words span the segments? Or are you trying to find segments with all their

Re: Slow forwarding requests to collection leader

2014-10-30 Thread Matt Hilt
Thanks for the info Daniel. I will go forth and make a better client. On Oct 29, 2014, at 2:28 AM, Daniel Collins danwcoll...@gmail.com wrote: I kind of think this might be working as designed, but I'll be happy to be corrected by others :) We had a similar issue which we discovered by

Solr And query

2014-10-30 Thread vsriram30
Hi All, This might be a simple question. I tried to find a solution, but not exactly finding what I want. I have the following fields f1, f2 and f3. I want to do an AND query in these fields. If I want to search for single word in these 3 fields, then I am facing no problem. I can simply

Re: Migrating cloud to another set of machines

2014-10-30 Thread Jakov Sosic
On 10/30/2014 04:47 AM, Otis Gospodnetic wrote: Hi/Bok Jakov, 2) sounds good to me. It means no down-time. 1) means stoppage. If stoppage is not OK, but falling behind with indexing new content is OK, you could: * add a new cluster * start reading from old index and indexing into the new

Re: issue related to blank value in datefield

2014-10-30 Thread Chris Hostetter
Solr has never really worked well with years prior to 1 because the specs for how they should be formatted/parsed -- in particular related to year 0 -- have always been painfully ambiguous/contradictory. https://issues.apache.org/jira/browse/SOLR-2773 If you are really trying to deal with year 0

Boosting on field-not-empty

2014-10-30 Thread Håvard Wahl Kongsgård
Hi, a simple question: how do I boost field-not-empty? For some reason Solr (4.6) returns rows with empty fields first (while the fields are not part of the search query). I came across this old thread http://grokbase.com/t/lucene/solr-user/125e4yenha/boosting-on-field-empty-or-not , but no solution

Automating Solr

2014-10-30 Thread Craig Hoffman
Simple question: What is the best way to automate re-indexing Solr? Set up a CRON job / curl script? Thanks, Craig -- Craig Hoffman w: http://www.craighoffmanphotography.com FB: www.facebook.com/CraigHoffmanPhotography TW: https://twitter.com/craiglhoffman

Re: Automating Solr

2014-10-30 Thread Alexandre Rafalovitch
You don't reindex Solr. You reindex data into Solr. So, this depends on where your data is coming from and how often it changes. If the data does not change, there is no point re-indexing it. And how do you get the data into Solr in the first place? Regards, Alex. Personal:

Re: Automating Solr

2014-10-30 Thread Craig Hoffman
Right, of course. The data changes every few days. According to this article, you can run a CRON Job to create a new index. http://www.finalconcept.com.au/article/view/apache-solr-hints-and-tips On Thu, Oct 30, 2014 at 12:04 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: You don't reindex

Re: Automating Solr

2014-10-30 Thread Craig Hoffman
The data gets into Solr via MySQL script. -- Craig Hoffman w: http://www.craighoffmanphotography.com FB: www.facebook.com/CraigHoffmanPhotography TW: https://twitter.com/craiglhoffman On Oct 30, 2014, at 12:11 PM, Craig Hoffman mountain@gmail.com wrote: Right, of course. The

Re: Automating Solr

2014-10-30 Thread Håvard Wahl Kongsgård
Then you have to run it again and again 30. okt. 2014 19:18 skrev Craig Hoffman mountain@gmail.com følgende: The data gets into Solr via MySQL script. -- Craig Hoffman w: http://www.craighoffmanphotography.com FB: www.facebook.com/CraigHoffmanPhotography TW:

Re: Automating Solr

2014-10-30 Thread Alexandre Rafalovitch
Do you mean DataImportHandler? If so, you can create full and incremental queries and trigger them - from CRON - as often as you would like. E.g. 1am nightly. Regards, Alex. On 30 October 2014 14:17, Craig Hoffman mountain@gmail.com wrote: The data gets into Solr via MySQL script.

Re: Automating Solr

2014-10-30 Thread Ramzi Alqrainy
Simply add this line to your crontab with the crontab -e command: 0,30 * * * * /usr/bin/wget http://solr_host:8983/solr/core_name/dataimport?command=full-import This will run a full import every 30 minutes. Replace solr_host and core_name with your configuration. *Using delta-import command* Delta
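
The same request can be built from a small script instead of a bare wget line, which makes it easier to add options without worrying about shell quoting. This is only a sketch: solr_host and core_name are the placeholders from the message above, port 8983 is taken from that URL, and the helper name is made up. The network call is left commented out so the snippet runs without a Solr server.

```python
from urllib.parse import urlencode


def dataimport_url(host, core, command="full-import", **options):
    """Build a DataImportHandler URL; extra options become query parameters."""
    params = {"command": command}
    params.update(options)
    return f"http://{host}:8983/solr/{core}/dataimport?{urlencode(params)}"


url = dataimport_url("solr_host", "core_name", clean="true", optimize="true")
print(url)

# To actually trigger the import from cron, something like this could follow:
# import urllib.request
# urllib.request.urlopen(url).read()
```

Passing command="delta-import" instead would request an incremental import, provided the handler has delta queries configured.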

Re: Boosting on field-not-empty

2014-10-30 Thread Ramzi Alqrainy
You can use FunctionQuery that allows one to use the actual value of a field and functions of those fields in a relevancy score. Two function will help you, which are : *exists* exists(field|function) returns true if a value exists for a given document. Example use: exists(myField) will return
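
As a sketch of how exists() could be turned into a boost, the Python below builds request parameters that add a bonus to documents where the field has a value. It assumes the edismax query parser with its bf (additive boost function) parameter and the if() function; the field name myField comes from the example above, while the bonus value and helper name are arbitrary choices for illustration.

```python
from urllib.parse import urlencode


def boosted_params(user_query, field, bonus=10):
    """Return query parameters that add `bonus` to the score when `field` exists."""
    return {
        "defType": "edismax",
        "q": user_query,
        # if(test, value_if_true, value_if_false) wrapped around exists(field)
        "bf": f"if(exists({field}),{bonus},0)",
    }


params = boosted_params("solr", "myField")
print(params["bf"])
print(urlencode(params))  # URL-encoded form, ready to append after "select?"
```

The size of the bonus has to be tuned against real scores, since an additive boost that is too small will not change the ordering at all.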

Re: Automating Solr

2014-10-30 Thread Craig Hoffman
Thanks! One more question. WGET seems to be choking on my URL, in particular the # and the & character. What’s the best method of escaping? http://My Host :8983/solr/#/articles/dataimport//dataimport?command=full-import&clean=true&optimize=true -- Craig Hoffman w:

Re: Automating Solr

2014-10-30 Thread Michael Della Bitta
You probably just need to put double quotes around the url. On 10/30/14 15:27, Craig Hoffman wrote: Thanks! One more question. WGET seems to be choking on my URL, in particular the # and the & character. What’s the best method of escaping? http://My Host

Re: Automating Solr

2014-10-30 Thread Shawn Heisey
On 10/30/2014 1:27 PM, Craig Hoffman wrote: Thanks! One more question. WGET seems to be choking on my URL, in particular the # and the & character. What’s the best method of escaping? http://My Host :8983/solr/#/articles/dataimport//dataimport?command=full-import&clean=true&optimize=true Putting
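
Besides shell quoting, the # in that URL matters on its own: everything after # is a fragment, which an HTTP client does not send to the server. The sketch below uses Python's urlsplit to show the difference between an admin UI address and a handler address. The host solr_host and the handler path /solr/articles/dataimport are assumptions based on the URL in the question, not something confirmed in this thread.

```python
from urllib.parse import urlsplit

# Hypothetical host; "articles" is taken to be the core name.
ADMIN_UI = ("http://solr_host:8983/solr/#/articles/dataimport//dataimport"
            "?command=full-import&clean=true&optimize=true")
HANDLER = ("http://solr_host:8983/solr/articles/dataimport"
           "?command=full-import&clean=true&optimize=true")


def sent_to_server(url):
    """Return the path and query an HTTP client would request (no fragment)."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


print(sent_to_server(ADMIN_UI))  # only the admin page path survives
print(sent_to_server(HANDLER))   # path plus the import parameters
```

With the admin UI address the import parameters end up inside the fragment, so the server never sees them, which is why the handler address is the one to call from a script.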

Re: Automating Solr

2014-10-30 Thread Craig Hoffman
Thanks everyone. I got it working. -- Craig Hoffman w: http://www.craighoffmanphotography.com FB: www.facebook.com/CraigHoffmanPhotography TW: https://twitter.com/craiglhoffman On Oct 30, 2014, at 1:48 PM, Shawn Heisey apa...@elyograg.org wrote: On 10/30/2014 1:27 PM, Craig

Ideas for debugging poor SolrCloud scalability

2014-10-30 Thread Ian Rose
Howdy all - The short version is: We are not seeing Solr Cloud performance scale (even close to) linearly as we add nodes. Can anyone suggest good diagnostics for finding scaling bottlenecks? Are there known 'gotchas' that make Solr Cloud fail to scale? In detail: We have used Solr (in

Re: Ideas for debugging poor SolrCloud scalability

2014-10-30 Thread Shawn Heisey
On 10/30/2014 2:23 PM, Ian Rose wrote: My methodology is as follows. 1. Start up K Solr servers. 2. Remove all existing collections. 3. Create N collections, with numShards=K for each. 4. Start load testing. Every minute, print the number of successful updates and the number of failed

Re: Boosting on field-not-empty

2014-10-30 Thread Håvard Wahl Kongsgård
Thanks :) On Thu, Oct 30, 2014 at 7:49 PM, Ramzi Alqrainy ramzi.alqra...@gmail.com wrote: You can use FunctionQuery that allows one to use the actual value of a field and functions of those fields in a relevancy score. Two function will help you, which are : *exists*

Re: Ideas for debugging poor SolrCloud scalability

2014-10-30 Thread Ian Rose
If you want to increase QPS, you should not be increasing numShards. You need to increase replicationFactor. When your numShards matches the number of servers, every single server will be doing part of the work for every query. I think this is true only for actual queries, right? I am

Re: Ideas for debugging poor SolrCloud scalability

2014-10-30 Thread Matt Hilt
If you are issuing writes to shard non-leaders, then there is a large overhead for the eventual redirect to the leader. I noticed a 3-5 times performance increase by making my write client leader aware. On Oct 30, 2014, at 2:56 PM, Ian Rose ianr...@fullstory.com wrote: If you want to

Re: Ideas for debugging poor SolrCloud scalability

2014-10-30 Thread Shawn Heisey
On 10/30/2014 2:56 PM, Ian Rose wrote: I think this is true only for actual queries, right? I am not issuing any queries, only writes (document inserts). In the case of writes, increasing the number of shards should increase my throughput (in ops/sec) more or less linearly, right? No, that

Missing Records

2014-10-30 Thread AJ Lemke
Hi All, We have a SOLR cloud instance that has been humming along nicely for months. Last week we started experiencing missing records. Admin DIH Example: Fetched: 903,993 (736/s), Skipped: 0, Processed: 903,993 (736/s) A *:* search claims that there are only 903,902 this is the first full

Re: Indexing documents/files for production use

2014-10-30 Thread Olivier Austina
Thank you Alexandre, Jürgen and Erick for your replies. It is clear for me. Regards Olivier 2014-10-28 23:35 GMT+01:00 Erick Erickson erickerick...@gmail.com: And one other consideration in addition to the two excellent responses so far In a SolrCloud environment, SolrJ via

Re: Missing Records

2014-10-30 Thread S.L
I am curious: how many shards do you have, and what replication factor are you using? On Thu, Oct 30, 2014 at 5:27 PM, AJ Lemke aj.le...@securitylabs.com wrote: Hi All, We have a SOLR cloud instance that has been humming along nicely for months. Last week we started experiencing

Master Slave set up in Solr Cloud

2014-10-30 Thread S.L
Hi All, As I previously reported due to no overlap in terms of the documents in the SolrCloud replicas of the index shards, I have turned off the replication and basically have there shards with a replication factor of 1. It obviously seems will not be scalable due to the fact that the same core

Re: Migrating cloud to another set of machines

2014-10-30 Thread Otis Gospodnetic
I think ZK stuff may actually be easier to handle, no? Add new ones to the existing ZK cluster and then remove the old ones. Won't this work smoothly? Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Thu, Oct

Re: Solr And query

2014-10-30 Thread vsriram30
Actually I found out how to form the query. I just need to use, q=f1:(word1 word2) AND f2:(word3 word4) AND f3:(word5 word6) Thanks, V.Sriram -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-And-query-tp4166685p4166744.html Sent from the Solr - User mailing list

Re: Sharding configuration

2014-10-30 Thread Erick Erickson
This is not too surprising. There are additional hops necessary for a cloud setup. This is the sequence, let's say there are 4 shards and the rows parameter on the query is 10 and you're sorting by score node1 receives request. node1 sends the request out to each shard node1 receives the top 10
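
The first half of that sequence (each shard returns its own top hits, and the receiving node merges them) can be sketched in a few lines of Python. This is only a model of the general scatter-gather pattern with made-up scores and document ids, not Solr's implementation, and the later step of fetching stored fields for the winners is not shown.

```python
import heapq


def merge_top(shard_results, rows=10):
    """Merge per-shard lists of (score, doc_id) and keep the best `rows` hits."""
    candidates = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(rows, candidates)


# Made-up results from two shards; each shard already returned its own top hits.
shard_a = [(0.9, "a"), (0.5, "b")]
shard_b = [(0.8, "c"), (0.1, "d")]
print(merge_top([shard_a, shard_b], rows=3))
```

Even in this toy form the cost is visible: the coordinating node has to wait for every shard and look at rows hits from each of them before it can answer, which is extra work a single-shard setup never does.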

Re: Score phrases higher than the records containing the words?

2014-10-30 Thread Erick Erickson
So what happens if you increase the boost to 100? or 20? The problem is that boosting will always be more art than science. What about the other 3 possibilities I mentioned? Basically, you have to tweak things to fit your corpus, and it's often an empirically determined thing. Best, Erick On

Re: Slow forwarding requests to collection leader

2014-10-30 Thread Erick Erickson
Matt: You might want to look at SolrJ, in particular with the use of CloudSolrServer. The big benefit here is that it'll route the docs to the correct leader for each shard rather than relying on the nodes to communicate with each other. Here's a SolrJ example. NOTE: it used

Re: Boosting on field-not-empty

2014-10-30 Thread Erick Erickson
bq: ...while the fields are not part of the search query I'm really confused. The presence or absence of fields that aren't part of the search should be totally irrelevant to scoring. Are you perhaps sorting by a different field? It'd help if you showed us the query you're sending, a sample of

Re: issue related to blank value in datefield

2014-10-30 Thread Aman Tandon
Hi Chris, Thanks for replying. but if your goal, as you said, is to index -00-00T00:00:00Z for documents that have no value in the date field -- i have to ask why? I was just trying to index the fields returned by my MySQL and I found this issue. So I asked in the group. Sorry for writing

Re: Ideas for debugging poor SolrCloud scalability

2014-10-30 Thread Erick Erickson
Your indexing client, if written in SolrJ, should use CloudSolrServer which is, in Matt's terms, "leader aware". It divides up the documents to be indexed into packets where each doc in the packet belongs on the same shard, and then sends the packet to the shard leader. This avoids a lot of
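
For clients not written in Java, the packet idea can be sketched as follows. This is a simplified model only: the CRC32-based shard_for function is a stand-in chosen for this example, and SolrCloud's real document router hashes the uniqueKey differently, so a real leader-aware client would have to follow the cluster's actual routing and look up the current leaders in ZooKeeper.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id, num_shards):
    """Stand-in routing function (NOT Solr's real hash): map an id to a shard."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def packets_by_shard(docs, num_shards):
    """Group documents so each packet holds docs that belong to one shard."""
    packets = defaultdict(list)
    for doc in docs:
        packets[shard_for(doc["id"], num_shards)].append(doc)
    return dict(packets)


docs = [{"id": f"doc{i}"} for i in range(8)]
for shard, packet in sorted(packets_by_shard(docs, 3).items()):
    # Each packet would be sent straight to that shard's leader.
    print(shard, [d["id"] for d in packet])
```

The point of the grouping is that no node has to forward documents to another node after receiving them, which is the overhead described in this thread.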

Re: Missing Records

2014-10-30 Thread Erick Erickson
First question: Is there any possibility that some of the docs have duplicate IDs (uniqueKeys)? If so, then some of the docs will be replaced, which will lower your returns. One way to figure this out is to go to the admin screen, and if numDocs is less than maxDoc, then documents have been replaced. Also,
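
The effect of duplicate uniqueKeys is easy to model: a later document with the same id replaces the earlier one, so the number of searchable documents ends up below the number fetched. In the report above the gap is 903,993 - 903,902 = 91 documents. The Python below is only a toy model with made-up records and helper names, not a description of how Solr stores documents.

```python
from collections import Counter


def index(docs):
    """Simulate uniqueKey overwrite: the last doc with a given id wins."""
    store = {}
    for doc in docs:
        store[doc["id"]] = doc
    return store


def duplicate_ids(docs):
    """Return the ids that occur more than once in the source data."""
    counts = Counter(doc["id"] for doc in docs)
    return sorted(doc_id for doc_id, n in counts.items() if n > 1)


fetched = [{"id": "1", "v": "a"}, {"id": "2", "v": "b"}, {"id": "1", "v": "c"}]
store = index(fetched)
print(len(fetched), len(store))  # fetched count versus searchable count
print(duplicate_ids(fetched))    # ids worth checking in the source database
```

Running a duplicate check like this against the ids coming out of the source query is a quick way to confirm or rule out this explanation before looking at the cluster itself.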

Re: Migrating cloud to another set of machines

2014-10-30 Thread Erick Erickson
Jakov: Be particularly aware of the ADDREPLICA collections API command here: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica That allows you to specify exactly which node the new replica should be on, so you can force it to be on the new HW. Here's
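
A request for that command could be assembled like this. The sketch only builds the URL; the host addresses, collection name, shard name, and node name are placeholders invented for the example, and the parameter names (action, collection, shard, node) are the ones I understand the Collections API to accept for ADDREPLICA.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def addreplica_url(base, collection, shard, node):
    """Build a Collections API ADDREPLICA request targeting a specific node."""
    params = urlencode({
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "node": node,  # places the new replica on the new hardware
    })
    return f"{base}/admin/collections?{params}"


# Hypothetical cluster details.
url = addreplica_url("http://old-host:8983/solr", "collection1",
                     "shard1", "192.168.1.21:8983_solr")
print(url)
print(parse_qs(urlsplit(url).query))
```

One such call per shard would put a replica of every shard on the new machines, after which the old replicas could be retired once the new ones are active.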

Re: issue related to blank value in datefield

2014-10-30 Thread Chris Hostetter
: I was just trying to index the fields returned by my msql and i found this If you are importing dates from MySql where you have -00-00T00:00:00Z as the default value, you should actually be getting an error, last time I checked, but this explains the right way to tell the MySQL JDBC

Re: Solr And query

2014-10-30 Thread Erick Erickson
Right, but do be aware of one thing. The form f1:(word1 word2) has an implicit OR between them based on q.op which is specified in your solrconfig.xml file for the request handler you're using. This is no problem, but if you ever specify q.op as AND either in solrconfig.xml or as an explicit

Re: Slow forwarding requests to collection leader

2014-10-30 Thread CP Mishra
+1 for CloudSolrServer CloudSolrServer also has built in fault tolerance (i.e. if the master shard is not reachable then it adds to the replica) and much better error reporting than ConcurrentUpdateSolrServer. The only downside is lack of batching. As long as you are adding documents in decent

Re: Solr And query

2014-10-30 Thread vsriram30
Thanks Eric. I tried q.op=AND and noticed that it is equivalent to specifying, q=f1:word1 word2 AND f2:word3 word4 AND f3:word5 word6 -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-And-query-tp4166685p4166760.html Sent from the Solr - User mailing list archive at

Re: Ideas for debugging poor SolrCloud scalability

2014-10-30 Thread Ian Rose
Thanks for the suggestions so far, all. 1) We are not using SolrJ on the client (not using Java at all) but I am working on writing a smart router so that we can always send to the correct node. I am certainly curious to see how that changes things. Nonetheless even with the overhead of extra

Re: Solr And query

2014-10-30 Thread Erick Erickson
U. That may be true for your particular example data set, but not in the general case, so don't be fooled. q.op=AND is equivalent to q=f1:(word1 AND word2) AND f2:(word3 AND word4) AND f3:(word5 AND word6) This query q=f1:word1 word2 AND f2:word3 word4 AND f3:word5 word6 would not match a
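
One way to avoid depending on q.op at all is to write the operators out explicitly when the query string is built. The Python sketch below produces the fully spelled-out form given above; the helper names are made up for this example and the field names and words are the ones used in this thread.

```python
def field_clause(field, words, op="AND"):
    """Return field:(w1 OP w2 ...) with the operator written out explicitly."""
    joined = f" {op} ".join(words)
    return f"{field}:({joined})"


def build_query(clauses, op="AND"):
    """Join one clause per field with AND; `op` applies inside each clause."""
    return " AND ".join(field_clause(f, w, op) for f, w in clauses.items())


query = build_query({
    "f1": ["word1", "word2"],
    "f2": ["word3", "word4"],
    "f3": ["word5", "word6"],
})
print(query)
```

A query built this way means the same thing regardless of how q.op is set in solrconfig.xml or on the request.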

Re: Ideas for debugging poor SolrCloud scalability

2014-10-30 Thread Erick Erickson
I'm really confused: bq: I am not issuing any queries, only writes (document inserts) bq: It's clear that once the load test client has ~40 simulated users bq: A cluster of 3 shards over 3 Solr nodes *should* support a higher QPS than 2 shards over 2 Solr nodes, right QPS is usually used to