Re: Bad fieldNorm when using morphologic synonyms

2013-12-26 Thread Isaac Hebsh
Attached a patch to the JIRA issue. Reviews are welcome. On Thu, Dec 19, 2013 at 7:24 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Roman, do you have any results? Created SOLR-5561. Robert, if I'm wrong, you are welcome to close that issue. On Mon, Dec 9, 2013 at 10:50 PM, Isaac Hebsh

Re: LocalParam for nested query without escaping?

2013-12-19 Thread Isaac Hebsh
Created SOLR-5560. On Tue, Dec 10, 2013 at 8:48 AM, William Bell billnb...@gmail.com wrote: Sounds like a bug. On Mon, Dec 9, 2013 at 1:16 PM, Isaac Hebsh isaac.he...@gmail.com wrote: If so, can someone suggest how a query should be escaped (securely and correctly)? Should I escape

Re: Bad fieldNorm when using morphologic synonyms

2013-12-19 Thread Isaac Hebsh
Roman, do you have any results? Created SOLR-5561. Robert, if I'm wrong, you are welcome to close that issue. On Mon, Dec 9, 2013 at 10:50 PM, Isaac Hebsh isaac.he...@gmail.com wrote: You can see the norm value in the explain text when setting debugQuery=true. If the same item gets

Re: Bad fieldNorm when using morphologic synonyms

2013-12-09 Thread Isaac Hebsh
Hi Robert and Manuel. The DefaultSimilarity indeed sets discountOverlap to true by default. BUT, the *factory*, aka DefaultSimilarityFactory, when called by IndexSchema (the getSimilarity method), explicitly sets this value to the value of its corresponding class member. This class member is

Re: LocalParam for nested query without escaping?

2013-12-09 Thread Isaac Hebsh
If so, can someone suggest how a query should be escaped (securely and correctly)? Should I escape only the quote mark (and the backslash itself)? On Fri, Dec 6, 2013 at 2:59 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Obviously, there is the option of an external parameter ({... v=$nestedq

Re: Global query parameters to facet query

2013-12-09 Thread Isaac Hebsh
Created SOLR-5542. Does anyone else want it? On Thu, Dec 5, 2013 at 8:55 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi, It seems that a facet query does not use the global query parameters (for example, field aliasing for the edismax parser). We make intensive use of facet queries (in some

Re: Bad fieldNorm when using morphologic synonyms

2013-12-09 Thread Isaac Hebsh
, Isaac Hebsh isaac.he...@gmail.com wrote: Hi Robert and Manuel. The DefaultSimilarity indeed sets discountOverlap to true by default. BUT, the *factory*, aka DefaultSimilarityFactory, when called by IndexSchema (the getSimilarity method), explicitly sets this value

Re: Bad fieldNorm when using morphologic synonyms

2013-12-06 Thread Isaac Hebsh
like it's broken. On Thu, Dec 5, 2013 at 1:53 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi, we implemented a morphologic analyzer, which stems words at index time. For various reasons, we index both the original word and the stem (at the same position, of course). The stemming is done

LocalParam for nested query without escaping?

2013-12-06 Thread Isaac Hebsh
We want to set a LocalParam on a nested query. When querying with the inline v parameter, it works fine: http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1 AND {!lucene df=text v="TERM2 TERM3 \"TERM4 TERM5\""} the parsedquery_toString is +id:TERM1 +(text:term2
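A minimal sketch of building that request on the client, assuming that inside a double-quoted local-param value only the backslash and the double quote need escaping (the question raised later in this thread, not confirmed here). The helper names are hypothetical; verify the escaping rules against your Solr version before using this on untrusted input.

```python
from urllib.parse import urlencode

def escape_local_param_value(value: str) -> str:
    """Wrap a string as a double-quoted LocalParams value.

    Assumption: inside "...", a backslash escapes the next character,
    so backslashes are escaped first, then double quotes.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def nested_query(df: str, text: str) -> str:
    """Build a nested {!lucene ...} clause with an inline, quoted v value."""
    return f"{{!lucene df={df} v={escape_local_param_value(text)}}}"

q = "TERM1 AND " + nested_query("text", 'TERM2 TERM3 "TERM4 TERM5"')
params = {"debugQuery": "true", "defType": "lucene", "df": "id", "q": q}
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
```

The resulting q is the same nested clause shown in the message above, with the inner phrase quotes escaped.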

Re: LocalParam for nested query without escaping?

2013-12-06 Thread Isaac Hebsh
Obviously, there is the option of an external parameter ({... v=$nestedq}&nestedq=...). This is a good solution, but it is not practical when there are many such nested queries. Any ideas? On Friday, December 6, 2013, Isaac Hebsh wrote: We want to set a LocalParam on a nested query. When querying
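A sketch of the external-parameter approach generated automatically for any number of nested queries. Each nested clause's text goes into its own request parameter and is referenced with v=$name, so the text itself needs no local-params escaping. The nq0, nq1, ... parameter names and the helper are hypothetical.

```python
from urllib.parse import urlencode

def build_params(main: str, nested: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Combine a main query with nested {!lucene} clauses whose text is
    passed in separate request parameters and dereferenced via v=$name.

    `nested` holds (df, raw_query_text) pairs.
    """
    clauses = [main]
    extra: list[tuple[str, str]] = []
    for i, (df, text) in enumerate(nested):
        name = f"nq{i}"  # hypothetical naming scheme; names only need to be unique
        clauses.append(f"{{!lucene df={df} v=${name}}}")
        extra.append((name, text))  # raw text, sent as-is
    return [("q", " AND ".join(clauses))] + extra

params = build_params("id:TERM1", [("text", 'TERM2 TERM3 "TERM4 TERM5"')])
query_string = urlencode(params)  # append to .../select?
```

Because the parameter names are generated, adding more nested queries only grows the list; nothing has to be escaped by hand.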

Bad fieldNorm when using morphologic synonyms

2013-12-05 Thread Isaac Hebsh
Hi, we implemented a morphologic analyzer, which stems words at index time. For various reasons, we index both the original word and the stem (at the same position, of course). The stemming is done for a specific language, so other languages are not stemmed at all. Because of that, two documents with
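To see how the extra stem tokens can skew norms, here is a sketch of the Lucene 4.x DefaultSimilarity length norm, boost / sqrt(numTerms), where discountOverlaps subtracts tokens with a position increment of 0 (such as a stem indexed at the same position as its word). It ignores the lossy one-byte encoding Lucene applies to stored norms, and the token lists are hypothetical.

```python
import math

def length_norm(tokens: list[tuple[str, int]], discount_overlaps: bool, boost: float = 1.0) -> float:
    """Approximate DefaultSimilarity.lengthNorm before byte encoding.

    tokens: (term, position_increment) pairs; assumed non-empty.
    A token with position increment 0 counts as an overlap.
    """
    length = len(tokens)
    overlaps = sum(1 for _, pos_inc in tokens if pos_inc == 0)
    num_terms = length - overlaps if discount_overlaps else length
    return boost * (1.0 / math.sqrt(num_terms))

# Two hypothetical two-word documents.
stemmed = [("walked", 1), ("walk", 0), ("dogs", 1), ("dog", 0)]  # stems added
unstemmed = [("word_a", 1), ("word_b", 1)]                       # other language, no stems
```

With discountOverlaps off, the stemmed document gets 0.5 against about 0.707 for the unstemmed one, purely because of the stem tokens; with it on, both get the same norm.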

Global query parameters to facet query

2013-12-05 Thread Isaac Hebsh
Hi, It seems that a facet query does not use the global query parameters (for example, field aliasing for the edismax parser). We make intensive use of facet queries (in some cases, we have many facet.query parameters for a single q), and using LocalParams in each facet.query is not convenient.
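One client-side workaround is to prefix every facet.query with the same short local-params string, pointing back to the top-level parameters through $-dereferencing (here qf=$qf), so the field definitions are written only once. A sketch under that assumption; the helper name is hypothetical, and whether edismax inside a facet.query also honors other global parameters (such as per-field aliases) should be checked on your version.

```python
def facet_params(q: str, qf: str, facet_queries: list[str]) -> list[tuple[str, str]]:
    """Build request params where every facet.query reuses the global qf.

    Each facet.query gets a {!edismax qf=$qf} prefix; the $qf value is
    dereferenced from the top-level qf parameter of the same request.
    """
    params = [("q", q), ("defType", "edismax"), ("qf", qf), ("facet", "true")]
    for text in facet_queries:
        params.append(("facet.query", f"{{!edismax qf=$qf}}{text}"))
    return params
```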

Re: Bad fieldNorm when using morphologic synonyms

2013-12-05 Thread Isaac Hebsh
, Ahmet Arslan iori...@yahoo.com wrote: Hi Isaac, Did you consider omitting norms completely for that field? omitNorms=true Are you using solr.RemoveDuplicatesTokenFilterFactory? On Thursday, December 5, 2013 8:55 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi, we implemented a morphologic

Re: Solr Result Tagging

2013-10-27 Thread Isaac Hebsh
Hi, Try using facet.query on each part; you will get the total number of hits for every OR clause. If you need this info per document, the answers might appear when specifying debugQuery=true. If that info is useful, try adding [explain] to the fl param (probably requires registering the augmenter plugin
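A sketch of the facet.query suggestion: the clauses that are OR-ed together in q are also sent as individual facet.query parameters, so each facet count is the number of documents in the result set matching that clause. The helper name and the example clauses are hypothetical.

```python
def facet_queries_for_or_clauses(clauses: list[str]) -> list[tuple[str, str]]:
    """OR the clauses into q and add one facet.query per clause.

    Facet counts are computed over the main result set, and since q is the
    OR of all clauses, each count equals the hits for that single clause.
    """
    q = " OR ".join(f"({c})" for c in clauses)
    params = [("q", q), ("facet", "true")]
    params += [("facet.query", c) for c in clauses]
    return params
```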

Re: Profiling Solr Lucene for query

2013-10-01 Thread Isaac Hebsh
Hi Dmitry, I'm trying to examine your suggestion to create a frontend node. It sounds pretty useful. I saw that every node in a Solr cluster can serve requests for any collection, even if it does not hold a core of that collection. Because of that, I thought that adding a new node to the cluster

Re: Profiling Solr Lucene for query

2013-10-01 Thread Isaac Hebsh
for reading the index, or more CPUs because the merging process might be more CPU intensive). Isn't it possible? On Wed, Oct 2, 2013 at 12:42 AM, Shawn Heisey s...@elyograg.org wrote: On 10/1/2013 2:35 PM, Isaac Hebsh wrote: Hi Dmitry, I'm trying to examine your suggestion to create a frontend

Considerations about setting maxMergedSegmentMB

2013-09-30 Thread Isaac Hebsh
Hi, Trying to solve a query performance issue, we suspect the number of index segments, which might slow the query (due to I/O seeks, which happen for each term in the query, multiplied by the number of segments). We are on Solr 4.3 (TieredMergePolicy with a mergeFactor of 4). We can reduce the number of

Re: Data duplication using Cloud+HDFS+Mirroring

2013-09-30 Thread Isaac Hebsh
Hi Greg, Did you get an answer? I'm interested in the same question. More generally, what are the benefits of HdfsDirectoryFactory, besides the transparent restore of the shard contents in case of a disk failure, and the ability to rebuild the index using MR? Is the following statement accurate? blocks of a

Re: Getting a query parameter in a TokenFilter

2013-09-21 Thread Isaac Hebsh
/SOLR-5053 What would you do? On Tue, Sep 17, 2013 at 10:31 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi everyone, We developed a TokenFilter. It should act differently depending on a parameter supplied in the query (for the query chain only, not the index one, of course). We found no way

Getting a query parameter in a TokenFilter

2013-09-17 Thread Isaac Hebsh
Hi everyone, We developed a TokenFilter. It should act differently depending on a parameter supplied in the query (for the query chain only, not the index one, of course). We found no way to pass that parameter into the TokenFilter flow. I guess the root cause is that TokenFilter is a pure

documentCache and lazyFieldLoading

2013-08-29 Thread Isaac Hebsh
Hi, We've investigated a memory dump, which was taken after frequent OOM incidents. The main issue we found was many millions of LazyField instances, taking ~2GB of memory, even though queries request only about 10 small fields. We've found that LazyDocument creates a LazyField object

Re: documentCache and lazyFieldLoading

2013-08-29 Thread Isaac Hebsh
Thanks Hoss. 1. We currently use Solr 4.3.0. 2. I understand this architecture of LazyFields, but I did not understand why multiple LazyFields should be created for a multivalued field. You can't load only part of them. If you request the field, you will get ALL of its values. So 100 (or more)

Re: Sending shard requests to all replicas

2013-07-31 Thread Isaac Hebsh
Thanks to Ryan Ernst: my issue is a duplicate of SOLR-4449. I think that this proposal might be very useful (some supporting links are attached there; worth reading). On Tue, Jul 30, 2013 at 11:49 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi, I submitted a new JIRA for this: https

Re: Sending shard requests to all replicas

2013-07-27 Thread Isaac Hebsh
_why_ you so often have a slow shard and whether the problem could be cured with, say, better warming queries on the shards... Best Erick On Fri, Jul 26, 2013 at 8:23 AM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi! When SolrCloud executes a query, it creates shard requests, which

Re: Sending shard requests to all replicas

2013-07-27 Thread Isaac Hebsh
or not.. :) On Sun, Jul 28, 2013 at 1:06 AM, Shawn Heisey s...@elyograg.org wrote: On 7/27/2013 3:33 PM, Isaac Hebsh wrote: I have about 40 shards. repFactor=2. The cause of slower shards is very interesting, and this is the main approach we took. Note that in every query, it is another shard

Sending shard requests to all replicas

2013-07-26 Thread Isaac Hebsh
Hi! When SolrCloud executes a query, it creates shard requests, which are sent to one replica of each shard. Total QTime is determined by the slowest shard response (plus some extra time). [For simplicity, let's assume that no stored fields are requested.] I suffer from a situation where in
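A simplified model of the behaviour described above: with one replica per shard, the request waits for the slowest chosen replica; if every replica were queried and the first response per shard used (the idea tracked in SOLR-4449), it would wait for the slowest shard's fastest replica. The latencies are hypothetical, and the model ignores merge time and the extra load of duplicate requests.

```python
def qtime_single_replica(latencies: dict[str, list[float]], chosen: dict[str, int]) -> float:
    """One replica per shard is asked; the request waits for the slowest response."""
    return max(latencies[shard][chosen[shard]] for shard in latencies)

def qtime_all_replicas(latencies: dict[str, list[float]]) -> float:
    """Every replica is asked and the fastest response per shard is used;
    the request still waits for the slowest shard."""
    return max(min(times) for times in latencies.values())

# Hypothetical per-replica response times in ms (repFactor=2).
latencies = {"shard1": [20.0, 25.0], "shard2": [22.0, 400.0], "shard3": [30.0, 18.0]}
chosen = {"shard1": 0, "shard2": 1, "shard3": 0}
```

Here one unlucky replica pushes the single-replica time to 400 ms, while the all-replicas variant finishes at 22 ms, at the cost of roughly doubling the shard-request load with repFactor=2.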

MoinMoin Dump

2013-07-17 Thread Isaac Hebsh
Hi, There was a thread about viewing the Solr Wiki offline, about 6 months ago. I'm interested, too. It seems that a manual (cron?) dump would do the job... Would it be too much to ask one of the admins to manually create such a dump? (http://moinmo.in/HelpOnMoinCommand/ExportDump) Otis, is

Re: Wildcards and Phrase queries

2013-06-23 Thread Isaac Hebsh
. You could try with higher Solr versions too. If it does not work, please let us know. https://issues.apache.org/jira/secure/attachment/12579832/ComplexPhrase-4.2.1.zip From: Isaac Hebsh isaac.he...@gmail.com To: solr-user@lucene.apache.org Sent: Saturday

Re: Wildcards and Phrase queries

2013-06-22 Thread Isaac Hebsh
wanted to use these for production. I confess I don't know what state they were left in or why they were never committed. FWIW, Erick On Wed, Jun 19, 2013 at 10:08 AM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi, I'm trying to understand the status of enabling wildcards

Wildcards and Phrase queries

2013-06-19 Thread Isaac Hebsh
Hi, I'm trying to understand the status of enabling wildcards in phrase queries. Lucene JIRA issue: https://issues.apache.org/jira/browse/LUCENE-1486 Solr JIRA issue: https://issues.apache.org/jira/browse/SOLR-1604 It looks like these issues are not going to be solved in the close

OutOfMemory while indexing (PROD environment!)

2013-06-06 Thread Isaac Hebsh
Hi everyone, My SolrCloud cluster (4.3.0) went into production a few days ago. Docs are being indexed into Solr using the /update requestHandler, as POST requests with a text/xml content-type. The collection is sharded into 36 pieces, and each shard has two replicas. There are 36 nodes (each

Re: Prevention of heavy wildcard queries

2013-06-02 Thread Isaac Hebsh
28, 2013 at 7:08 AM, Isaac Hebsh isaac.he...@gmail.com wrote: I don't want to affect the (correctness of the) real query parsing, so creating a QParserPlugin is risky. Instead, if I parse the query in my search component, it will be detached from the real query parsing, (obviously

Prevention of heavy wildcard queries

2013-05-27 Thread Isaac Hebsh
Hi. Searching for terms with a leading wildcard is solved with ReversedWildcardFilterFactory. But what about terms with a wildcard at both the start AND the end? This query is heavy, and I want to disallow such queries from my users. I'm looking for a way to make these queries fail. I guess there
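A rough client-side pre-check for such terms, for illustration only. It does not understand escaping or the full query syntax, so it can disagree with Solr's real parser; an authoritative check belongs in the query parsing itself. Function names are hypothetical.

```python
WILDCARDS = "*?"

def is_double_wildcard(term: str) -> bool:
    """True for terms like *foo* or ?oo* (a wildcard at both start and end)."""
    return (
        len(term) >= 3
        and term[0] in WILDCARDS
        and term[-1] in WILDCARDS
        and any(ch not in WILDCARDS for ch in term)
    )

def find_heavy_terms(q: str) -> list[str]:
    """Split raw query text on whitespace, drop an optional field prefix and
    surrounding parentheses/quotes, and collect double-wildcard terms."""
    heavy = []
    for token in q.split():
        term = token.strip('()"')
        if ":" in term:
            term = term.split(":", 1)[1]
        term = term.strip('()"')
        if is_double_wildcard(term):
            heavy.append(term)
    return heavy
```

A request whose find_heavy_terms result is non-empty could then be rejected before it reaches Solr.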

Re: Prevention of heavy wildcard queries

2013-05-27 Thread Isaac Hebsh
this way, you are changing semantics - but don't need to touch the syntax definition; of course, you may also change the grammar and allow only one instance of wildcard (or some combination) but for that you should probably use LUCENE-5014 roman On Mon, May 27, 2013 at 2:18 PM, Isaac Hebsh

Re: Prevention of heavy wildcard queries

2013-05-27 Thread Isaac Hebsh
, or just above, the wildcard processor also make sure you are setting your qparser for FQ queries, ie. fq={!nw}foo On Mon, May 27, 2013 at 5:01 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Thanks Roman. Based on some of your suggestions, will the steps below do the work? * Create

Re: SurroundQParser does not analyze the query text

2013-05-17 Thread Isaac Hebsh
:38 , Isaac Hebsh wrote: Hi, I'm trying to use Surround Query Parser for two reasons, which are not covered by proximity slops: 1. find documents with two words within a given distance, *unordered* 2. given two lists of words, find documents with (at least) one word from list A and (at least

Bloom Filters

2013-05-17 Thread Isaac Hebsh
Hi everyone.. I'm indexing docs into Solr using the update request handler, by POSTing data to the REST endpoint (not SolrJ, not DIH). My indexer should return an indication of whether the document existed in the collection before or not, based on its ID. The obvious solution is to perform a
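One way a Bloom filter could help here (an assumption about the intended design, since the message is cut off): the indexer remembers every ID it has sent, a negative answer means the ID is definitely new, and only a positive answer needs a real lookup against Solr. A minimal stdlib-only sketch; the sizes are arbitrary.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: false positives are possible, false negatives are not."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))

seen_ids = BloomFilter(num_bits=8 * 1024 * 1024, num_hashes=7)  # ~1 MB, arbitrary sizing
```

The filter only knows IDs this indexer has added, never forgets deleted IDs, and must be rebuilt (for example from the index) after a restart.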

SurroundQParser does not analyze the query text

2013-05-16 Thread Isaac Hebsh
Hi, I'm trying to use Surround Query Parser for two reasons, which are not covered by proximity slops: 1. find documents with two words within a given distance, *unordered* 2. given two lists of words, find documents with (at least) one word from list A and (at least) one word from list B, within
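Both requirements map to Surround's proximity operators: N for unordered and W for ordered proximity, with OR groups for the word lists. Because the Surround parser does not run the field's analyzer, this sketch lowercases terms itself, assuming the index-time analyzer does little more than lowercase. Helper names are hypothetical.

```python
def _or_group(terms: list[str]) -> str:
    """Lowercase terms and OR them together in prefix notation."""
    lowered = [t.lower() for t in terms]
    return lowered[0] if len(lowered) == 1 else f"OR({', '.join(lowered)})"

def surround_near(list_a: list[str], list_b: list[str], distance: int, ordered: bool = False) -> str:
    """At least one term of list_a within `distance` of at least one term of list_b."""
    if distance < 1:
        raise ValueError("distance must be positive")
    op = "W" if ordered else "N"
    return f"{{!surround}}{distance}{op}({_or_group(list_a)}, {_or_group(list_b)})"
```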

Re: Basic auth on SolrCloud /admin/* calls

2013-03-29 Thread Isaac Hebsh
Hi Tim, Are you running Solr 4.2? (In 4.0 and 4.1, the Collections API didn't return any failure message; see SOLR-4043.) As far as I know, you can't tell Solr to use authentication credentials when communicating with other nodes. It's a bigger issue.. for example, if you want to protect the

Re: Combining Solr Indexes at SolrCloud

2013-03-29 Thread Isaac Hebsh
Let's say you have machine A and machine B, and you want to shut down B. If all the shards on B have replicas (on A), you can shut down B instantly. If there is a shard on B that has no replica, you should create one on machine A (using the Core API), let it replicate the whole shard contents, and then you
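The check described above is simple to express. A sketch over a hypothetical, simplified view of the cluster state (shard name mapped to the nodes holding a replica); in practice this information comes from ZooKeeper, and you would also want the other replicas to be active.

```python
def shards_without_other_replica(cluster: dict[str, list[str]], node: str) -> list[str]:
    """Return the shards hosted on `node` that have no replica on any other node.

    Shutting down `node` is safe only if this list is empty.
    """
    return sorted(
        shard
        for shard, nodes in cluster.items()
        if node in nodes and not any(n != node for n in nodes)
    )

cluster = {"shard1": ["A", "B"], "shard2": ["B"], "shard3": ["A"]}
```

For this example, shard2 must first be replicated to A before B can be shut down.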

Solr 4.2 - DocValues on id field

2013-03-13 Thread Isaac Hebsh
Hi, The example schema.xml in Solr 4.2 does not define the id field as docValues=true. Any good reason? (other than index backward compatibility with previous versions...) If my common case is fl=id (and no other field), DocValues is a classic fit for me. Am I right?

Any documentation on Solr MBeans?

2013-03-07 Thread Isaac Hebsh
Hi, I'm trying to monitor some Solr behaviour using JMX. It looks like a great job was done there, but I can't find any documentation on the MBeans themselves. For example, the DirectUpdateHandler2 attributes: what is the difference between adds and cumulative_adds? Does adds count the last X seconds

Re: Timestamp field is changed on update

2013-02-28 Thread Isaac Hebsh
). This solution exactly covers my case. Thank you! On Wed, Feb 20, 2013 at 11:33 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Nobody responded to my JIRA issue :( Should I commit this patch into SVN's trunk, and set the issue as Resolved? On Sun, Feb 17, 2013 at 9:26 PM, Isaac Hebsh isaac.he

update fails if one doc is wrong

2013-02-26 Thread Isaac Hebsh
Hi. I add documents to Solr by POSTing them to the UpdateHandler, as bulks of add commands (DIH is not used). If one document contains any invalid data (e.g. string data in a numeric field), Solr returns HTTP 400 Bad Request, and the whole bulk fails. I'm searching for a way to tell Solr to
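Until Solr offers such an option, a client-side fallback is to bisect a rejected batch until the offending documents are isolated. A sketch; `send` is a hypothetical callable that POSTs a batch to /update and reports success. Depending on the version, documents before the bad one may already have been added; re-sending them should simply overwrite them by uniqueKey, assuming the default overwrite behaviour.

```python
from typing import Callable, Sequence

def index_with_isolation(docs: Sequence[dict], send: Callable[[Sequence[dict]], bool]) -> list[dict]:
    """Send docs as one batch; if it is rejected, split it in halves and
    retry recursively. Returns the documents that fail on their own."""
    if not docs:
        return []
    if send(docs):
        return []
    if len(docs) == 1:
        return [docs[0]]
    mid = len(docs) // 2
    return index_with_isolation(docs[:mid], send) + index_with_isolation(docs[mid:], send)
```

With one bad document in a batch of n, this costs on the order of log2(n) extra requests instead of resending documents one by one.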

Re: Timestamp field is changed on update

2013-02-20 Thread Isaac Hebsh
Nobody responded to my JIRA issue :( Should I commit this patch into SVN's trunk, and set the issue as Resolved? On Sun, Feb 17, 2013 at 9:26 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Thank you Alex. Atomic Update allows you to add new values into a multivalued field, for example... It means

Re: Timestamp field is changed on update

2013-02-17 Thread Isaac Hebsh
Thank you Alex. Atomic Update allows you to add new values into a multivalued field, for example... It means that the original document is being read (using RealTimeGet, which depends on updateLog). There is no reason the list of operations (add/set/inc) should not include a create-only

Re: Timestamp field is changed on update

2013-02-16 Thread Isaac Hebsh
I opened a JIRA for this improvement request (attached a patch to DistributedUpdateProcessor). It's my first JIRA; please review it... (Or, if someone has an easier solution, tell us...) https://issues.apache.org/jira/browse/SOLR-4468 On Fri, Feb 15, 2013 at 8:13 AM, Isaac Hebsh isaac.he

Re: Timestamp field is changed on update

2013-02-16 Thread Isaac Hebsh
created in the system? I think an external create timestamp would be a lot more useful. wunder On Feb 16, 2013, at 12:37 PM, Isaac Hebsh wrote: I opened a JIRA for this improvement request (attached a patch to DistributedUpdateProcessor). It's my first JIRA; please review

Re: Timestamp field is changed on update

2013-02-16 Thread Isaac Hebsh
...@odoko.co.uk wrote: I think what Walter means is make the thing that sends it to Solr set the timestamp when it does so. Upayavira On Sat, Feb 16, 2013, at 08:56 PM, Isaac Hebsh wrote: Hi, I do have an externally-created timestamp, but some minutes may pass before it will be sent

Re: How to limit queries to specific IDs

2013-02-12 Thread Isaac Hebsh
them as a MUST clause, like +(original query) +id:(1 2 3 4). Third possibility, see https://issues.apache.org/jira/browse/SOLR-2429, but the short form is: fq={!cache=false}restoffq On Mon, Feb 11, 2013 at 2:41 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi everyone. I have queries

How to limit queries to specific IDs

2013-02-11 Thread Isaac Hebsh
Hi everyone. I have queries that should be bounded to a set of IDs (the uniqueKey field of my schema). My client front-end sends two Solr requests: In the first one, it wants to get the top X IDs. This result should return very fast. No time to waste on highlighting. This is a very standard query.
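For the second request, one option (also suggested in this thread) is to pass the IDs as a filter query, with {!cache=false} so a one-off ID list does not fill the filterCache. A sketch, assuming the uniqueKey field is named id, the IDs contain no query-syntax characters, and the default operator is OR; the helper is hypothetical.

```python
def restrict_to_ids(user_query: str, ids: list[str], cache_filter: bool = False) -> list[tuple[str, str]]:
    """Build params that limit a query to a known set of uniqueKey values."""
    id_clause = "id:(" + " ".join(ids) + ")"
    fq = id_clause if cache_filter else "{!cache=false}" + id_clause
    return [("q", user_query), ("fq", fq), ("rows", str(len(ids)))]
```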

Re: Trying to understand soft vs hard commit vs transaction log

2013-02-08 Thread Isaac Hebsh
Shawn, what about 'flush to disk' behaviour with MMapDirectoryFactory? On Fri, Feb 8, 2013 at 11:12 AM, Prakhar Birla prakharbi...@gmail.com wrote: Great explanation Shawn! BTW, soft-committed documents will not be recovered on a JVM crash. On 8 February 2013 13:27, Shawn Heisey

Re: IP Address as number

2013-02-07 Thread Isaac Hebsh
Small addition: To support queries, I probably have to implement an analyzer (query time)... Can an analyzer be configured on a numeric (i.e. non-TEXT) field? On Thu, Feb 7, 2013 at 6:48 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi. I have to index a field which contains an IP address. Users
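Converting the address on the client side, both before indexing and when building the query, avoids needing a query-time analyzer on a numeric field. A sketch assuming IPv4 only and a long field (hypothetically named ip_l), since unsigned 32-bit values above 127.255.255.255 do not fit in a Java int; a CIDR block becomes a numeric range query.

```python
import ipaddress

def ip_to_long(ip: str) -> int:
    """Map a dotted IPv4 address to an unsigned 32-bit integer."""
    return int(ipaddress.IPv4Address(ip))

def cidr_to_range_query(field: str, cidr: str) -> str:
    """Translate a CIDR block into an inclusive numeric range query on `field`."""
    net = ipaddress.IPv4Network(cidr, strict=False)
    return f"{field}:[{int(net.network_address)} TO {int(net.broadcast_address)}]"
```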

Re: Servlet Filter for randomizing core names

2013-02-04 Thread Isaac Hebsh
. I feel that we can achieve some improvement in this case... On Mon, Feb 4, 2013 at 12:45 AM, Shawn Heisey s...@elyograg.org wrote: On 2/3/2013 3:24 PM, Isaac Hebsh wrote: Thanks Shawn for your quick answer. When using collection name, Solr will choose the leader, when available

Re: Servlet Filter for randomizing core names

2013-02-04 Thread Isaac Hebsh
/2013 12:06 PM, Isaac Hebsh wrote: LBHttpSolrServer is only a SolrJ feature, isn't it? I think that Solr does not balance queries among cores in the same server. You can claim that it's a non-issue, if a single core can completely serve multiple queries at the same time, and passing requests

Re: Servlet Filter for randomizing core names

2013-02-03 Thread Isaac Hebsh
works well here. Wouldn't utilizing all the cores be useful? On Sun, Feb 3, 2013 at 11:49 PM, Shawn Heisey s...@elyograg.org wrote: On 2/3/2013 1:18 PM, Isaac Hebsh wrote: Hi. I have a SolrCloud cluster, which contains some servers. Each server runs multiple cores. I want to distribute

Re: Distributed search

2013-01-28 Thread Isaac Hebsh
, and the boost is pretty impressive (roughly 2-5x faster for a complicated query) Ming On Mon, Jan 28, 2013 at 10:54 AM, Isaac Hebsh isaac.he...@gmail.com wrote: Does adding replicas (on additional servers) help to improve search performance? It is known that each query goes to all the shards

Re: secure Solr server

2013-01-27 Thread Isaac Hebsh
You can define a security filter in WEB-INF\web.xml, on specific url patterns. You might want to set the url pattern to /admin/*. [find examples here: http://stackoverflow.com/questions/7920092/how-can-i-bypass-security-filter-in-web-xml ] On Sun, Jan 27, 2013 at 8:07 PM, Mingfeng Yang

Re: uniqueKey field type

2013-01-23 Thread Isaac Hebsh
... On Thu, Jan 24, 2013 at 3:31 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, I think trie type fields add value only if you do range queries on them, and it sounds like that is not your use case. Otis Solr ElasticSearch Support http://sematext.com/ On Jan 23, 2013 2:53 PM, Isaac

Re: Solr cache considerations

2013-01-20 Thread Isaac Hebsh
(http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html). Find a bottleneck _then_ tune. Premature optimization and all that Several tens of millions of docs isn't that large unless the text fields are enormous. Best Erick On Sat, Jan 19, 2013 at 2:32 PM, Isaac Hebsh

Re: Solr cache considerations

2013-01-19 Thread Isaac Hebsh
. openSearcher=false makes sense when you are using hard-commits together with soft-commits, as the soft-commit is dealing with opening/closing searchers, you don't need hard commits to do it. Tomás On Fri, Jan 18, 2013 at 2:20 AM, Isaac Hebsh isaac.he...@gmail.com wrote: Unfortunately

Re: Solr cache considerations

2013-01-17 Thread Isaac Hebsh
integrity. Not to mention that your tlog will be huge. Not to mention that there is some memory usage for each document in the tlog. Hard commits roll over the tlog, flush the in-memory tlog pointers, close index segments, etc. Best Erick On Thu, Jan 17, 2013 at 1:29 PM, Isaac Hebsh