Solr developer IRC channel

2013-06-10 Thread Yonik Seeley
FYI, I've created a #solr-dev IRC channel for those who contribute to Solr's development. There used to be more of a "community" feel on some of the IRC channels that's since been lost, so I'm trying to get some of that back with a smaller subset of people interested in developing Solr. The channe

Re: Clear cache used by Solr

2013-06-07 Thread Yonik Seeley
On Fri, Jun 7, 2013 at 7:32 AM, Erick Erickson wrote: > I really question whether this is valuable. Much of Solr performance > is there explicitly because of caches Right, and it's also the case that certain solr features are coded with the cache in mind (i.e. they will be utilized for a single r

Re: facet.missing=true returns null records with zero count also

2013-06-05 Thread Yonik Seeley
On Wed, Jun 5, 2013 at 6:11 PM, Chris Hostetter wrote: > and think that conceptually it > doesn't make sense for facet.missing to consider facet.mincount. +1 "facet.missing" asks for the missing count - regardless of what it is. Although it might make sense in some use cases to make facet.missin

Re: Getting tons of EofException with jetty/SolrCloud

2013-05-31 Thread Yonik Seeley
On Fri, May 31, 2013 at 4:11 PM, Jason Hellman wrote: > Those are default, though autoSoftCommit is commented out by default. > > Keep in mind about the hard commit running every 15 seconds: it is not > updating your searchable data (due to the openSearcher=false setting). In > theory, your da

Re: Overlapping onDeckSearchers=2

2013-05-27 Thread Yonik Seeley
On Mon, May 27, 2013 at 7:11 AM, Jack Krupansky wrote: > The intent is that optimize is obsolete and should no longer be used That's incorrect. People need to understand the cost of optimize, and that its use is optional. It's up to the developer to figure out if the benefits of calling optimiz

Re: Replica shards not updating their index when update is sent to them

2013-05-20 Thread Yonik Seeley
On Mon, May 20, 2013 at 4:21 PM, Sebastián Ramírez wrote: > When I send an update to a non-leader (replica) shard (B), the updated > results are reflected in the leader shard (A) and in the other replica > shard (C), but not in the shard that received the update (B). I've never seen that before.

Re: Transaction Logs Leaking FileDescriptors

2013-05-16 Thread Yonik Seeley
Yonik http://lucidworks.com On Wed, May 15, 2013 at 6:04 PM, Steven Bower wrote: > They are visible to ls... > > > On Wed, May 15, 2013 at 5:49 PM, Yonik Seeley wrote: > >> On Wed, May 15, 2013 at 5:20 PM, Steven Bower wrote: >> > when the TransactionLog objects are dereferenc

Re: Solr group.order not work

2013-05-15 Thread Yonik Seeley
"group.order" is not a valid parameter. You're probably looking for "group.sort" -Yonik http://lucidworks.com On Wed, May 15, 2013 at 9:30 PM, alexzhang wrote: > I use the Solr 4.0.0 +, when I try to sort the results which within one > group, it does not work? > > The wiki I referenced: http://

Re: Transaction Logs Leaking FileDescriptors

2013-05-15 Thread Yonik Seeley
On Wed, May 15, 2013 at 5:06 PM, Steven Bower wrote: > This leads me to believe that the > TransactionLog is not properly closing all of its files before getting rid > of the object... I tried some ad hoc tests, and I can't reproduce this behavior yet. There must be some other code path that inc

Re: Transaction Logs Leaking FileDescriptors

2013-05-15 Thread Yonik Seeley
On Wed, May 15, 2013 at 5:20 PM, Steven Bower wrote: > when the TransactionLog objects are dereferenced > their RandomAccessFile object is not closed.. Have the files been deleted (unlinked from the directory), or are they still visible via "ls"? -Yonik http://lucidworks.com

Re: Transaction Logs Leaking FileDescriptors

2013-05-15 Thread Yonik Seeley
On Wed, May 15, 2013 at 5:20 PM, Steven Bower wrote: > I'm hunting through the UpdateHandler code to try and find where this > happens now.. UpdateLog.addOldLog() -Yonik http://lucidworks.com

Re: Transaction Logs Leaking FileDescriptors

2013-05-15 Thread Yonik Seeley
Hmmm, we keep open a number of tlog files based on the number of records in each file (so we always have a certain amount of history), but IIRC, the number of tlog files is also capped. Perhaps there is a bug when the limit to tlog files is reached (as opposed to the number of documents in the tlo

Re: Function queries

2013-05-15 Thread Yonik Seeley
On Wed, May 15, 2013 at 7:25 AM, sathish_ix wrote: > Hi , i would like to get all documents when searching for a keyword. > > http://localhost:8080/solr/select?q=caram&rows=_val_:"docfreq(SEARCH_TERM,'caram')" > > Searching for 'caram', there are 200 documents, but iam getting first 10 > documents

Re: How to improve performance of geodist()

2013-05-13 Thread Yonik Seeley
On Mon, May 13, 2013 at 1:12 PM, Nicholas Ding wrote: > I'm using geodist() in a recip boost function. I noticed a performance > impact to the response time. I did a profiling session, the geodist() > calculation took 30% of CPU time. Are you also using an "fq" with geofilt to narrow down the num

Re: stats cache

2013-05-07 Thread Yonik Seeley
On Tue, May 7, 2013 at 12:48 PM, J Mohamed Zahoor wrote: > Hi > > I am computing lots of stats as part of a query… > looks like the solr caching is not helping here… > > Does solr cache stats of a query? No. Neither the facet counts nor the stats part of a request is cached. The query cache only cache

Re: [solr 3.4] anomaly during distributed facet query with 102 shards

2013-04-25 Thread Yonik Seeley
On Thu, Apr 25, 2013 at 8:32 AM, Dmitry Kan wrote: > Are there any distrib facet gurus on the list? I would be ready to try > sensible ideas, including on the source code level, if someone of you could > give me a hand. The Lucene/Solr Revolution conference is coming up next week, so I think many

Re: Reordered DBQ.

2013-04-23 Thread Yonik Seeley
On Tue, Apr 23, 2013 at 3:51 PM, Marcin Rzewucki wrote: > Recently I noticed a lot of "Reordered DBQs detected" messages in logs. As > far as I checked in logs it could be related with deleting documents, but > not sure. Do you know what is the reason of those messages ? For high throughput index

Re: Too many close, count -1

2013-04-22 Thread Yonik Seeley
Can you tell what operations cause this to happen? I've added a comment to https://issues.apache.org/jira/browse/SOLR-4749 where we're looking at some related issues around CoreContainer, but perhaps it should get its own issue. -Yonik http://lucidworks.com On Mon, Apr 22, 2013 at 7:57 PM, yri

Re: Bug? JSON output changes when switching to solr cloud

2013-04-22 Thread Yonik Seeley
Thanks David, I've confirmed this is still a problem in trunk and opened https://issues.apache.org/jira/browse/SOLR-4746 -Yonik http://lucidworks.com On Sun, Apr 21, 2013 at 11:16 PM, David Parks wrote: > We just took an installation of 4.1 which was working fine and changed it to > run as sol

Re: Solr cloud and batched updates

2013-04-21 Thread Yonik Seeley
On Sun, Apr 21, 2013 at 11:57 AM, Timothy Potter wrote: > There's no problem here, but I'm curious about how batches of updates > are handled on the Solr server side in Solr cloud? > > Going over the code for DistributedUpdateProcessor and > SolrCmdDistributor, it appears that the batch is broken

Re: Solr 4.2 fl issue

2013-04-18 Thread Yonik Seeley
When using a field name that doesn't follow conventions (basically like Java identifiers), try this: fl=field(098765-765-788558-7654_userid) Or enclose it in quotes if it's really a whacky field name: fl=field("098765-765-788558-7654_userid") -Yonik http://lucidworks.com On Thu, Apr 18, 2013 a

Re: TooManyClauses: maxClauseCount is set to 1024

2013-04-18 Thread Yonik Seeley
Can you provide a full stack trace of the exception? There's a maxClauseCount in solrconfig.xml that you can increase to work around the issue. -Yonik http://lucidworks.com On Thu, Apr 18, 2013 at 7:31 AM, sawanverma wrote: > Its quite confusing about this error. > > I had a situation where i

Re: Why filter query doesn't use the same query parser as the main query?

2013-04-17 Thread Yonik Seeley
On Tue, Apr 16, 2013 at 9:44 PM, Roman Chyla wrote: > Is there some profound reason why the defType is not passed onto the filter > query? defType is a convenience so that the main query parameter "q" can directly be the user query (without specifying its type like "edismax"). Filter queries are

Re: Function Query performance in combination with filters

2013-04-16 Thread Yonik Seeley
On Tue, Apr 16, 2013 at 7:51 AM, Rogalon wrote: > Hi, > I am using pretty complex function queries to completely customize (not only > boost) the score of my result documents that are retrieved from an index of > approx 10e7 documents. To get to an acceptable level of performance I > combine my qu

Re: Combining join queries

2013-04-11 Thread Yonik Seeley
On Wed, Apr 10, 2013 at 7:33 AM, Upayavira wrote: > On Wed, Apr 10, 2013, at 12:22 PM, Upayavira wrote: >> I'm sure the best way for me to solve this issue myself is to ask it >> publicly, so... >> >> If I have two {!join} queries that select a collection of documents >> each, how do I create a fi

Re: Boost parameter with query function - how to pass in complex params?

2013-04-07 Thread Yonik Seeley
On Sun, Apr 7, 2013 at 10:11 AM, dc tech wrote: > Yonik: > Pasted the wrong URL as I was trying various things. > > I did not work with OR > http://localhost:8983/solr/cars/select?fl=text,score&defType=edismax&q=suv&boost=query($boostq,1)&boostq=toyota%20OR%20honda&debug=true > > See dumps below.

Re: Boost parameter with query function - how to pass in complex params?

2013-04-07 Thread Yonik Seeley
On Sun, Apr 7, 2013 at 8:39 AM, dc tech wrote: > Yonik, > Many thanks. > The OR is still not working... here is the full URL > 1. Honda or Toyota individually work > http://localhost:8983/solr/cars/select?fl=text,score&defType=edismax&q=suv&boost=query($boostq,1)&boostq=honda > http://localhost:89

Re: Boost parameter with query function - how to pass in complex params?

2013-04-06 Thread Yonik Seeley
On Sat, Apr 6, 2013 at 9:42 AM, dc tech wrote: > See example below > 1. Search for SUVs and boost Honda models > q=suv&boost=query({! v='honda'},1) > > 2. Search for SUVs and boost Honda OR toyota model > > a) Using OR in the query does NOT work >q=suv&boost=query({! v='honda or toyota'},

Re: Compressed Fields in 4.2.1

2013-04-04 Thread Yonik Seeley
On Thu, Apr 4, 2013 at 7:41 PM, Jamie Johnson wrote: > I had read somewhere that text fields by default were compressed in 4.2.1, > is this the case? If not how do I enable compression of stored text fields? Compressed stored fields are the default since 4.1 -Yonik http://lucidworks.com

Re: Nested queries with proximity/slop

2013-03-21 Thread Yonik Seeley
https://issues.apache.org/jira/browse/SOLR-4625 -Yonik http://lucidworks.com On Tue, Mar 19, 2013 at 11:12 PM, Yonik Seeley wrote: > On Tue, Mar 19, 2013 at 8:52 PM, Michael Ryan wrote: >> I was wondering if anyone is aware of an existing Jira for this bug... >> >&g

Re: Nested queries with proximity/slop

2013-03-19 Thread Yonik Seeley
On Tue, Mar 19, 2013 at 8:52 PM, Michael Ryan wrote: > I was wondering if anyone is aware of an existing Jira for this bug... > > _query_:"\"a b\"~2" > ...is parsed as... > PhraseQuery(someField:"a b") > ...instead of the expected... > PhraseQuery(someField:"a b"~2) > > _query_:"\"a b\""~2 > ...is

Re: NPE when adding docs in 4.2

2013-03-16 Thread Yonik Seeley
On Sat, Mar 16, 2013 at 11:36 AM, J Mohamed Zahoor wrote: > aahha… i used a replication factor of 0. > I thought 0 means no replication of original.. > > Should that be 1 if i want no replication? Think of it as the number of copies of a book at a library. replicationFactor is the number of copi

Re: Is Lucene's DrillSideways something suitable for Solr?

2013-03-12 Thread Yonik Seeley
On Tue, Mar 12, 2013 at 10:27 PM, Alexandre Rafalovitch wrote: > Lucene seems to get a new DrillSideways functionality on top of its own > facet implementation. > > I would love to have something like that in Solr Solr has had multi-select faceting for 4 years now. My understanding of DrillSidewa

Re: Dynamic schema design: feedback requested

2013-03-11 Thread Yonik Seeley
On Mon, Mar 11, 2013 at 5:51 PM, Chris Hostetter wrote: > : I guess my main point is, we shouldn't decide a priori that using the > : API means you can no longer hand edit. > > and my point is we should build a feature where solr has the ability to > read/write some piece of information, we should

Re: Dynamic schema design: feedback requested

2013-03-11 Thread Yonik Seeley
On Mon, Mar 11, 2013 at 2:50 PM, Chris Hostetter wrote: > > : > 2) If you wish to use the /schema REST API for read and write operations, > : > then schema information will be persisted under the covers in a data store > : > whose format is an implementation detail just like the index file format.

Re: Dynamic schema design: feedback requested

2013-03-11 Thread Yonik Seeley
On Wed, Mar 6, 2013 at 7:50 PM, Chris Hostetter wrote: > 2) If you wish to use the /schema REST API for read and write operations, > then schema information will be persisted under the covers in a data store > whose format is an implementation detail just like the index file format. This really n

Re: Distributed Search and the Stale Check

2013-02-25 Thread Yonik Seeley
> On my particular benchmark rig, each stale check call accounted for an > additional ~10ms. That's insane! It's still not even clear to me how the stale check works (reliably). Couldn't the server still close the connection between the stale check and the send of data by the client? -Yonik

Re: Field collapsing bad performances, schema redesign

2013-02-04 Thread Yonik Seeley
On Mon, Feb 4, 2013 at 10:34 AM, Mickael Magniez wrote: > group.ngroups=true This is currently very inefficient - if you can live without retrieving the total number of groups, performance should be much better. -Yonik http://lucidworks.com
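The advice above can be sketched as a small request builder that leaves group.ngroups off unless the caller explicitly asks for the group count. This is only an illustration: the helper name, the core path, and the field name "category" are assumptions, not taken from the thread; group, group.field, and group.ngroups are the standard grouping parameters.

```python
from urllib.parse import urlencode


def grouped_query_url(base_url, query, group_field, want_group_count=False):
    """Build a grouped-search URL (hypothetical helper, not part of Solr).

    group.ngroups is only added on request, since computing the total
    number of groups is described above as very inefficient.
    """
    params = [
        ("q", query),
        ("group", "true"),
        ("group.field", group_field),
    ]
    if want_group_count:
        params.append(("group.ngroups", "true"))
    return base_url + "?" + urlencode(params)


# Assumed endpoint and field name, for illustration only.
BASE = "http://localhost:8983/solr/collection1/select"
fast_url = grouped_query_url(BASE, "*:*", "category")
slow_url = grouped_query_url(BASE, "*:*", "category", want_group_count=True)
print(fast_url)
print(slow_url)
```

Keeping the expensive parameter opt-in means a paging UI that never shows the total number of groups does not pay for it.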

Re: Solr4.1 changing result order FIFO to LIFO

2013-02-03 Thread Yonik Seeley
On Sun, Feb 3, 2013 at 7:46 AM, Erick Erickson wrote: > Nope. Problem is that the tie breaker is the internal Lucene Doc id. Which > a long time ago was invariant, that is a document indexed later always had > a larger internal doc id. But the various merge policies can combine > segments such th

Re: Join across cores on same shard.

2013-02-02 Thread Yonik Seeley
On Sat, Feb 2, 2013 at 5:49 AM, Marcin Rzewucki wrote: > I meant I get fields from parent core only. Is it possible to get fields > from both cores using join query? Not yet. Joins are currently only for filtering. -Yonik http://lucidworks.com
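Since joins only filter, a cross-core join is normally sent as an fq on the core whose documents should come back. A minimal sketch, assuming the field and core names used elsewhere in this thread (parent_id, child_id, core2); the helper names and the core1 endpoint are hypothetical:

```python
from urllib.parse import urlencode


def join_filter(from_field, to_field, from_index, inner_query="*:*"):
    """Return a {!join} filter string; inner_query selects docs in from_index."""
    return "{!join from=%s to=%s fromIndex=%s}%s" % (
        from_field, to_field, from_index, inner_query)


def select_url(base_url, query, filters):
    """Build a select URL with one fq parameter per filter (hypothetical helper)."""
    params = [("q", query)] + [("fq", f) for f in filters]
    return base_url + "?" + urlencode(params)


fq = join_filter("parent_id", "child_id", "core2")
# Assumed endpoint: the request goes to the core whose fields are returned.
url = select_url("http://localhost:8983/solr/core1/select", "*:*", [fq])
print(fq)
print(url)
```

Only documents from the core that receives the request appear in the response; fields from the fromIndex core would have to be fetched with a second query.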

Re: expert question about SolrReplication

2013-02-01 Thread Yonik Seeley
On Fri, Feb 1, 2013 at 4:13 AM, Bernd Fehling wrote: > A question to the experts, > > why is the replicated index copied from its temporary location > (index.x) > to the real index directory and NOT moved? The intent is certainly to move and not copy (provided the Directory supports it).

Re: Solr 4.1.0 index leaving write.lock file

2013-02-01 Thread Yonik Seeley
On Fri, Feb 1, 2013 at 5:41 PM, dm_tim wrote: > I've been using Solr 4.1.0 for a little while now and I just noticed that > when I index any core I have the write.lock file doesn't go away until I > stop the server where solr is running. Sounds like it's working as it should. The write lock is j

Re: Join across cores on same shard.

2013-02-01 Thread Yonik Seeley
You're missing the query to do the join on: fq={!join from=parent_id to=child_id fromIndex=core2}*:* We should have a better error message rather than a NPE of course... -Yonik http://lucidworks.com On Fri, Feb 1, 2013 at 3:45 PM, Marcin Rzewucki wrote: > Check below if that's better for you:

Re: queryResultCache *very* low hit ratio

2013-01-29 Thread Yonik Seeley
One other thing that some auto-warming of the query result cache can achieve is loading FieldCache entries for sorting / function queries so real user queries don't experience increased latency. If you remove all auto-warming of the query result cache, you may want to add static warming entries fo

Re: Solr 4.1 Custom Hashing DIH

2013-01-25 Thread Yonik Seeley
On Fri, Jan 25, 2013 at 4:09 PM, davers wrote: > I'm not sure I understand. I thought ID had to be unique. Right - the group becomes part of the ID (the prefix), not the whole ID. > for example > I have the following > > [ > { "id" : 1, "groupid" : 1 }, > { "id" : 2, "groupid" : 1}, > { "id" : 3

Re: Solr 4.1 Custom Hashing DIH

2013-01-25 Thread Yonik Seeley
On Fri, Jan 25, 2013 at 3:59 PM, davers wrote: > I want to shard on groupid instead of id but it doesn't seem to be working. That's not yet implemented. Currently you need to put the group in the ID. From the release notes: * Simple multi-tenancy through enhanced document routing: - The "comp

Re: Solr 4.1 Custom Hashing DIH

2013-01-25 Thread Yonik Seeley
On Fri, Jan 25, 2013 at 1:56 PM, davers wrote: > When I used 4.0 I could use my DIH on any shard and the documents would be > distributed based on the internal hashing algorithm and end up distributed > evenly across my three shards. > > I have just upgraded to Solr 4.1 and I have noticed that my

Re: JSON query syntax

2013-01-24 Thread Yonik Seeley
On Thu, Jan 24, 2013 at 8:55 PM, Otis Gospodnetic wrote: > Yes, this is JSON, so right > there it may be better, but for instance I see "v" here which to a regular > human may not be as nice as "value" if that is what "v" stands for. One goal was to reuse the parsers/parameter names. A completel

JSON query syntax

2013-01-24 Thread Yonik Seeley
Although "lucene" syntax tends to be quite concise, nice looking, and easy to build by hand (the web browser is a major debugging tool for me), some people prefer to use a more "structured" query language that's easier to build up programmatically. XML fits the bill, but people tend to prefer JSON

Re: Issues with docFreq/docCount on SolrCloud

2013-01-23 Thread Yonik Seeley
On Wed, Jan 23, 2013 at 6:15 PM, Markus Jelsma wrote: > We need, and i think many SolrCloud users are going to need this as well, to > make sure replicas don't deviate too much from each other, because if they do > documents are certainly going to jump positions. The synchronization that would be ne

Re: JSON with order-preserving update commands?

2013-01-23 Thread Yonik Seeley
On Wed, Jan 23, 2013 at 9:50 AM, Craig Ching wrote: > The problem I have is that JSON is not specified to preserve order of > keys. JSON is a serialization format, and readers/writers can preserve order if they wish to. If you send JSON to solr in a specific order, that order will definitely be r

Re: SolrCloud index recovery

2013-01-22 Thread Yonik Seeley
On Tue, Jan 22, 2013 at 4:37 PM, Marcin Rzewucki wrote: > Sorry, my mistake. I did 2 tests: in the 1st I removed just index directory > and in 2nd test I removed both index and tlog directory. Log lines I've > sent are related to the first case. So Solr could read tlog directory in > that moment.

Re: SolrCloud :: Distributed query processing

2013-01-18 Thread Yonik Seeley
Hopefully the explanation here will shed some light on this: https://issues.apache.org/jira/browse/SOLR-3912 -Yonik http://lucidworks.com On Fri, Jan 18, 2013 at 2:59 PM, Mishkin, Ernest wrote: > Hello, > > I'm trying to reconcile my understanding of how distributed queries are > handled by So

Re: how to get abortOnConfigurationError=false working

2013-01-17 Thread Yonik Seeley
On Thu, Jan 17, 2013 at 3:40 PM, snake wrote: > Ok so is there any other way to stop this problem I am having where any site > can break solr by deleting their collection? > Seems odd everyone would vote to remove a feature that would make solr more > stable. I agree. abortOnConfigurationError was m

Re: 400 error with boost and exists()

2013-01-16 Thread Yonik Seeley
On Wed, Jan 16, 2013 at 6:42 PM, Walter Underwood wrote: > Ah, that would be it. Does 4.0 also give a stack trace if you call a function > that doesn't exist? Stack trace still appears in the logs, but the error message returned seems OK: http://localhost:8983/solr/query?q=*:*&defType=edismax&b

Re: 400 error with boost and exists()

2013-01-16 Thread Yonik Seeley
On Wed, Jan 16, 2013 at 6:35 PM, Walter Underwood wrote: > None of the variants worked. I started with that syntax for both exists() and > if(). All gave the same stack trace. --wunder These boolean functions are new for 4.0, but it looks like you're using 3.3? -Yonik http://lucidworks.com

Re: 400 error with boost and exists()

2013-01-16 Thread Yonik Seeley
On Wed, Jan 16, 2013 at 6:11 PM, Walter Underwood wrote: > I got the syntax from: > http://lucidworks.lucidimagination.com/display/solr/Function+Queries Oops, I've alerted our tech writers! It should be fixed now. exists(field|function) returns true if a value exists for a given document. Exam

Re: Solr exception when parsing XML

2013-01-16 Thread Yonik Seeley
On Tue, Jan 15, 2013 at 3:55 PM, Alexandre Rafalovitch wrote: > Basically, the > recommendation is to avoid CDATA and automatically encode characters such > as yours, as well as less/more and ampersand. Unfortunately that doesn't even work. Just as a raw control character like a 0 byte is invali

Re: SolrJ | Atomic Updates | How works exactly?

2013-01-13 Thread Yonik Seeley
On Sun, Jan 13, 2013 at 1:51 PM, Uwe Clement wrote: > What is the best the most performant way to update a large document? That *is* the best way to update a large document that we currently have. Although it re-indexes under the covers, it ensures that it's atomic, and it's faster because it doe

Re: Difference between IntField and TrieIntField in Lucene 4.0

2013-01-12 Thread Yonik Seeley
On Sat, Jan 12, 2013 at 4:56 PM, jefferyyuan wrote: > Looked at Lucene Javadoc, seems we can run range query, filter, sorting on > IntField. > Also seems IntField is also indexed as trie structure. > Javadoc for IntField: You're reading the javadoc for *lucene* IntField. Unfortunately, Lucene has

Re: parsing debug output for readability

2013-01-10 Thread Yonik Seeley
On Thu, Jan 10, 2013 at 6:16 PM, Petersen, Robert wrote: > Thanks, debug.explain.structured=true helps a lot! Could you also tell me > what these `#8;#0;#0;#0;#1; strings represent in the debug output? That's internally how a number is encoded into a string (5 bytes, the first being binary 8, t

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Yonik Seeley
On Fri, Jan 4, 2013 at 1:35 PM, Alexandre Rafalovitch wrote: > Hmm. Doesn't that make (logical) index=collection? And (physical) > index=core? Which creates duplication of terminology and at the same time > can cause confusion between highest logical and lowest physical level. That's why I've avo

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Yonik Seeley
On Fri, Jan 4, 2013 at 2:26 AM, Per Steffensen wrote: > Our biggest problem is that we really haven't decided once and for all and > made sure to reflect the decision consistently across code and > documentation. As long as we haven't I believe it is still ok to change our > minds. IMO, I *think* it

Re: What is group.query?

2013-01-03 Thread Yonik Seeley
From http://wiki.apache.org/solr/FieldCollapsing "Return a single group of documents that also match the given query." ''' We can find the top documents that also match arbitrary queries with the group.query command (much like facet.query). For example, we could use this to find the top 3 docume

Re: Solr Collection API doesn't seem to be working

2013-01-03 Thread Yonik Seeley
axShardsPerNode = msgStrToInt(message, MAX_SHARDS_PER_NODE, 1); > > Remember than replicationFactor decides how many "instances" of you shard > you will get, so a value of 1 does not provide you any replication. > > > On 1/3/13 3:46 AM, Yonik Seeley wrote: >> >> On We

Re: Re: Terminology question: Core vs. Collection vs...

2013-01-03 Thread Yonik Seeley
On Thu, Jan 3, 2013 at 9:17 AM, Darren Govoni wrote: > I think what's confusing about your explanation below is when you have a > situation where there is no replication factor. That's possible too, yes? > > So in that case, is each core of a shard of a collection, still referred to > as a replica

Re: Solr Collection API doesn't seem to be working

2013-01-02 Thread Yonik Seeley
On Wed, Jan 2, 2013 at 9:21 PM, davers wrote: > So by providing the correct replicationFactor parameter for the number of > servers has fixed my issue. > > So can you not provide a higher replicationFactor than you have live_nodes? > What if you want to add more replicants to the collection in the

Re: "order" question on solr multi value field

2012-12-19 Thread Yonik Seeley
On Tue, Dec 18, 2012 at 8:24 PM, Robert Muir wrote: > I agree with James. Actually lucene tests will fail if a codec violates this. > > Actually it goes much deeper than this. > > From the lucene apis, when you call IndexReader.document() with your > storedfieldVisitor, it must visit the fields in

Re: Will SolrCloud always slice by ID hash?

2012-12-18 Thread Yonik Seeley
On Tue, Dec 18, 2012 at 2:20 PM, Scott Stults wrote: > I'm going to be building a Solr cluster and I want to have a rolling set of > slices so that I can keep a fixed number of days in my collection. If I > send an update to a particular slice leader, will it always hash the unique > key and (prob

Re: Order SOLR 4 output

2012-12-18 Thread Yonik Seeley
On Tue, Dec 18, 2012 at 4:58 AM, roySolr wrote: > Hello, > > I have a really simple question i think: > > What is the order of the fields that are in the SOLR response? > In SOLR 3.1 it was alphabetic but in SOLR 4 it isn't anymore. Is it > configurable? > > I want to know this because i have test

Re: small QTime but slow results to user

2012-12-15 Thread Yonik Seeley
On Sat, Dec 15, 2012 at 1:11 PM, S L wrote: > My virtual machine has 6GB of RAM. Tomcat is currently configured to use 4GB > of it. The size of the index is 5.4GB for 3 million records which averages > out to 1.8KB per record. I can look at trimming the data, having fewer > records in the index to

Re: small QTime but slow results to user

2012-12-15 Thread Yonik Seeley
On Sat, Dec 15, 2012 at 12:04 PM, S L wrote: > Thanks everyone for the responses. > > I did some more queries and watched disk activity with iostat. Sure enough, > during some of the slow queries the disk was pegged at 100% (or more.) > > The requirement for the app I'm building is to be able to r

Re: small QTime but slow results to user

2012-12-14 Thread Yonik Seeley
On Fri, Dec 14, 2012 at 3:43 PM, S L wrote: > Does anyone have an idea why a query that takes solr just half a second (500 > ms) to execute would take 3 seconds to transfer the data? Normally this is due to slow reading of the stored fields (i.e. slow disk IO). For scalability, we don't read all

Re: The shard called `properties`

2012-12-13 Thread Yonik Seeley
Seems like we should just bite the bullet and do it right. {"collection1": { "config" : "myconf" "router" : "compositeId", "shards" : { "shard1" : {... -Yonik http://lucidworks.com > - mark > > On Dec 6,

Re: Sort speed asc vs desc - is desc slower?

2012-12-12 Thread Yonik Seeley
On Wed, Dec 12, 2012 at 5:49 PM, Michael Ryan wrote: > When sorting a TrieLongField, should there be any expected difference in > query speed when sorting ascending vs sorting descending? I'm seeing desc > queries sometimes take 10x longer than asc queries. I can provide more > details if neces

Re: SolrCloud - Query performance degrades with multiple servers

2012-12-12 Thread Yonik Seeley
On Wed, Dec 12, 2012 at 5:03 PM, sausarkar wrote: > We still could replicate the issue in 4.1 branch i.e. queries going to one > server (numShards=1) is being distributed among all the servers which is > creating CPU spikes in all the servers in the cloud. Do you think this > behavior is as expect

Re: SolrCloud - Query performance degrades with multiple servers

2012-12-11 Thread Yonik Seeley
OK, I tried to reproduce it on trunk, and I can't (i.e. everything is looking fine). rm -rf example/solr/zoo_data cp -rp example example2 cp -rp example example3 cd example java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=1 -jar start.jar cd exa

Re: SolrCloud - Query performance degrades with multiple servers

2012-12-11 Thread Yonik Seeley
On Thu, Dec 6, 2012 at 8:08 PM, sausarkar wrote: > Ok we think we found out the issue here. When solrcloud is started without > specifying numShards argument solrcloud starts with a single shard but still > thinks that there are multiple shards, so it forwards every single query to > all the nodes

Re: SOLR4 (sharded) and join query

2012-12-09 Thread Yonik Seeley
On Thu, Dec 6, 2012 at 6:47 PM, Erick Erickson wrote: > see: http://wiki.apache.org/solr/DistributedSearch > > joins aren't supported in distributed search. Any time you have more than > one shard in SolrCloud, you are, by definition, doing distributed search. It is supported, but there is a limi

Re: Minimum HA Setup with SolrCloud

2012-12-06 Thread Yonik Seeley
On Thu, Dec 6, 2012 at 8:42 PM, Jack Krupansky wrote: > And this is precisely why the mystery remains - because you're only > describing half the picture! Describe the rest of the picture - including > what exactly those two zks can and can't do, including resolution of ties > and the concept of "

Re: Solr 4 : Optimize very slow

2012-12-06 Thread Yonik Seeley
On Thu, Dec 6, 2012 at 12:17 PM, Sandeep Mestry wrote: > I followed the advice Michael and the timings reduced to couple of hours now > from 6-8 hours :-) Just changing from mmap to NIO, eh? What does your system look like? operating system, JVM, drive, memory, etc? -Yonik http://lucidworks.com

Re: Minimum HA Setup with SolrCloud

2012-12-06 Thread Yonik Seeley
On Thu, Dec 6, 2012 at 5:55 PM, Jack Krupansky wrote: > I trust that you have the right answer, Mark, but maybe I'm just struggling > to parse this statement: "the remaining two machines do not constitute a > majority." > > If you start with 3 zk and lose one, you have an ensemble that does not >

Re: Minimum HA Setup with SolrCloud

2012-12-06 Thread Yonik Seeley
On Thu, Dec 6, 2012 at 5:21 PM, Jack Krupansky wrote: > If 1 is the minimum, what is the 3 "minimum" all about? The minimum for running an ensemble (a cluster) and having any sort of fault tolerance? > The zk web page does say "Three ZooKeeper servers is the minimum recommended > size for an ens

Re: The shard called `properties`

2012-12-06 Thread Yonik Seeley
On Wed, Dec 5, 2012 at 5:17 PM, Mark Miller wrote: > See the custom hashing issue - the UI has to be updated to ignore this. > > Unfortunately, it seems that clients have to be hard coded to realize > properties is not a shard unless we add another nested layer. Yeah, I talked about this a while

Re: Minimum HA Setup with SolrCloud

2012-12-06 Thread Yonik Seeley
On Thu, Dec 6, 2012 at 9:56 AM, Markus Jelsma wrote: > The quorum is the minimun, so it depends on how many you have running in the > ensemble. If it's three or four, then two is the quorum I think that for 4 ZK servers, then 3 would be the quorum? -Yonik http://lucidworks.com
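The arithmetic behind that answer can be worked through in a few lines. This is a sketch assuming ZooKeeper's default majority quorum (strictly more than half of the ensemble); the function names are made up for illustration:

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that is a strict majority of the ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble must contain at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 2, 3, 4, 5):
    print("ensemble=%d quorum=%d tolerated_failures=%d"
          % (n, quorum_size(n), tolerated_failures(n)))
```

Under that assumption, 3 servers need 2 for a quorum and 4 servers need 3, so a fourth server adds no extra fault tolerance over three; both survive the loss of exactly one machine.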

Re: Solr Query Parameter : ids - What is this used for?

2012-12-03 Thread Yonik Seeley
On Mon, Dec 3, 2012 at 10:55 PM, deniz wrote: > Hello, as it is clear in the title too, i wanna know for what solr uses this > parameter... i see it on a sharding env on cloud, so i guess it is related > with cloud but still there is no explanation about it in any of wiki pages > that i have check

Re: Solr 4, optimizing while doing other updates?

2012-11-27 Thread Yonik Seeley
On Tue, Nov 27, 2012 at 3:21 PM, Shawn Heisey wrote: > but even way back then, rumblings on the mailing list said "don't optimize > for performance reasons." Count me amongst the dissenters. Optimize can make a lot of sense, and that's why it still exists. People should be careful to not assum

Re: SynonymFilterFactory breaking WordDelimiterFilterFactory output

2012-11-23 Thread Yonik Seeley
Sounds like perhaps the SynonymFilter is losing the positionIncrement of 0 (which makes the first two tokens overlap)? You could perhaps verify with the analysis debugging on the admin page. -Yonik http://lucidworks.com On Tue, Nov 20, 2012 at 10:55 PM, Chris Book wrote: > Hello, I've recently u

Re: SolrCloud and exernal file fields

2012-11-22 Thread Yonik Seeley
On Tue, Nov 20, 2012 at 4:16 AM, Martin Koch wrote: > around 7M documents in the index; each document has a 45 character ID. 7M documents isn't that large. Is there a reason why you need so many shards (16 in your case) on a single box? -Yonik http://lucidworks.com

Re: sort by function error

2012-11-12 Thread Yonik Seeley
field the query working. > > BTW, I just find out that the version of solr we are using is an old copy of > 4.0 snapshot before the alpha release. Could that be the problem? we have > some customized parsers so it will take quite some time to upgrade. > > > Ben > ______

Re: Is leading wildcard search turned on by default in Solr 3.6.1?

2012-11-12 Thread Yonik Seeley
On Tue, Nov 13, 2012 at 2:27 AM, wrote: > I'm surprised that this has not been logged as a defect. The fact that this > is ON by default, means someone can bring down a server; this is bad enough to > categorize this as a security issue. It's all relative. There are tons of queries that can tak

Re: How to speed up Facet count (Big index) ??!!!!

2012-11-12 Thread Yonik Seeley
On Mon, Nov 12, 2012 at 8:39 PM, Aeroox Aeroox wrote: > Hi folks, > > I have a solr index with up to 50M documents. A document contains 62 fields > (docid, name, location). > > The facet count took 1 to 2 minutes with these params: > > http://.../select/?q=solr& > version=2.2&start=0&rows=

Re: sort by function error

2012-11-12 Thread Yonik Seeley
On Mon, Nov 12, 2012 at 5:24 AM, Kuai, Ben wrote: > more information, problem only happens when I have both sort by function > and grouping in query. I haven't been able to duplicate this with a few ad-hoc queries. Could you give your complete request (or at least all of the relevant grouping

Re: zkcli issues

2012-11-11 Thread Yonik Seeley
On Sun, Nov 11, 2012 at 10:39 PM, Nick Chase wrote: > So I'm trying to use ZkCLI without success. I DID start and stop Solr in > non-cloud mode, so everything is extracted and it IS finding zookeeper*.jar. > However, now it's NOT finding SolrJ. Not sure about your specific problem in this case,

Re: indexing CSV using Solr 3.6.1

2012-11-10 Thread Yonik Seeley
My guess is that this might have to do with the fact that you are on Windows, and shell escaping is different (i.e. curl isn't getting all of the parameters and hence isn't sending everything to Solr). My first recommendation would be to install cygwin to get a UNIX command line environment like L

Re: SolrCloud and distributed search

2012-10-26 Thread Yonik Seeley
On Fri, Oct 26, 2012 at 10:14 AM, Bill Au wrote: > I am currently using one master with multiple slaves so I do have high > availability for searching now. > > My index does fit on a single machine and a single query does not take too > long to execute. But I do want to take advantage of high ava

Re: Occasional Solr performance issues

2012-10-22 Thread Yonik Seeley
On Mon, Oct 22, 2012 at 4:39 PM, Michael Della Bitta wrote: > Has the Solr team considered renaming the optimize function to avoid > leading people down the path of this antipattern? If it were never the right thing to do, it could simply be removed. The problem is that it's sometimes the right t

Re: Why does SolrIndexSearcher.java enforce mutual exclusion of filter and filterList?

2012-10-21 Thread Yonik Seeley
On Sun, Oct 21, 2012 at 3:57 PM, Aaron Daubman wrote: > Greetings, > > I'm wondering if somebody would please explain why > SolrIndexSearcher.java enforces mutual exclusion of filter and > filterList > (e.g. see: > https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/sol

Re: differences of LockFactory between solr 3.6.1 and 4.0.0?

2012-10-17 Thread Yonik Seeley
On Wed, Oct 17, 2012 at 9:33 AM, Bernd Fehling wrote: > Hi list, > > while checking the runtime behavior of solr 4.0.0 I noticed that the > handling > of write.lock seems to be different. > > With solr 3.6.1 after calling optimize the index is optimized and write.lock > removed. > This tells m

Re: Why is SolrDispatchFilter using 90% of the Time?

2012-10-10 Thread Yonik Seeley
> When I look at the distribution of the Response-time I notice > 'SolrDispatchFilter.doFilter()' is taking up 90% of the time. That's pretty much the top-level entry point to Solr (from the servlet container), so it's normal. -Yonik http://lucidworks.com
