Re: Numerical ids for terms?

2011-04-13 Thread Toke Eskildsen
On Tue, 2011-04-12 at 11:41 +0200, Gregor Heinrich wrote: Hi -- has there been any effort to create a numerical representation of Lucene indices. That is, to use the Lucene Directory backend as a large term-document matrix at index level. As this would require bijective mapping between

Re: New facet module

2011-07-11 Thread Toke Eskildsen
On Sat, 2011-07-09 at 05:44 +0200, Shai Erera wrote: The taxonomy is global to the index, but I think it will be interesting to explore per-segment taxonomy, and how it can be used to improve indexing or search perf (hopefully both). I have struggled with this for some time and still haven't

Re: Source Control

2012-11-06 Thread Toke Eskildsen
if I had a place to make a public repository (which admittedly is easy enough with GitHub et al). - Toke Eskildsen, State and University Library, Denmark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional

Re: Optimize facets when actually single valued?

2012-11-13 Thread Toke Eskildsen
On Tue, 2012-11-13 at 19:50 +0100, Yonik Seeley wrote: The original version of Solr (SOLAR when it was still inside CNET) did this - a multiValued field with a single value was output as a single value, not an array containing a single value. Some people wanted more predictability (always an

Re: Optimize facets when actually single valued?

2012-11-16 Thread Toke Eskildsen
On Wed, 2012-11-14 at 14:46 +0100, Robert Muir wrote: On Tue, Nov 13, 2012 at 11:41 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Dynamically changing response formats sounds horrible. I don't understand how this is related with my proposal to automatically use a different data

Re: pro coding style

2012-12-03 Thread Toke Eskildsen
On Sat, 2012-12-01 at 17:18 +0100, Per Steffensen wrote: With change/merge-tracking in both systems, the important thing must be that you do not have to throw the tracked information away before you attempt to get your changes into the main repository. People write commit messages in many

RE: Polymorphic Index

2010-10-21 Thread Toke Eskildsen
Mark Harwood [markharw...@yahoo.co.uk]: Given a large range of IDs (eg your 300 million) you could constrain the number of unique terms using a double-hashing technique e.g. Pick a number n for the max number of unique terms you'll tolerate e.g. 1 million and store 2 terms for every primary

RE: Polymorphic Index

2010-10-21 Thread Toke Eskildsen
From: Mark Harwood [markharw...@yahoo.co.uk] Good point, Toke. Forgot about that. Of course doubling the number of hash algos used to 4 increases the space massively. Maybe your hashing-idea could work even with collisions? Using your original two-hash suggestion, we're just about sure to get

Re: Polymorphic Index

2010-10-22 Thread Toke Eskildsen
On Fri, 2010-10-22 at 11:23 +0200, eks dev wrote: Both of these solutions are just better way to do it wrong :) The real solution is definitely somewhere around ParallelReader usage. The problem with parallel is with updates of documents. The IndexWriter takes terms and queries for

RE: Lucene project announcement

2010-11-19 Thread Toke Eskildsen
development. My gut feeling says the latter, but then again, I'm biased by being firmly in the low-level group. Regards, Toke Eskildsen

Where do I go with ordinals?

2010-11-20 Thread Toke Eskildsen
of projects can benefit, but I would very much like to hear some thoughts on this. Thank you for listening, Toke Eskildsen

Collator-based facet sorting in Solr

2012-09-11 Thread Toke Eskildsen
Claudio Ranieri and I briefly discussed collator based sorting for facets in the thread Problem with accented words sorting on the solr-user mailing list. Here's the idea: Solr faceting supports sorting by either count or index order. Claudio and I both need the order to be collator-based. My

Re: Collator-based facet sorting in Solr

2012-09-12 Thread Toke Eskildsen
On Tue, 2012-09-11 at 17:23 +0200, Robert Muir wrote: Just a concern where things could act a little funky: today for example, If I set strength=primary, then it's going to fold Test and test to the same unique term, but under this scheme you would have bytesTest and bytestest as two terms.

Re: VOTE: release 4.0

2012-09-24 Thread Toke Eskildsen
On Mon, 2012-09-24 at 06:11 +0200, Robert Muir wrote: Artifacts are here: http://s.apache.org/lusolr40rc0 Sorry to interrupt as a non-voter, but I am afraid that https://issues.apache.org/jira/browse/SOLR-3875 might be a blocker for 4.0. Maybe a veteran could take a quick look? - Toke Eskildsen

Proper use of TermsEnum.seek?

2011-02-21 Thread Toke Eskildsen
My low-memory sorting/faceting-hacking requires terms to be accessed by ordinals. With Lucene 4.0 I cannot depend on TermsEnums supporting ord() and seek(long), so the code switches to a cache that keeps track of every X terms if they are not implemented. When the terms for an ordinal is
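The fallback described in this post, a cache that "keeps track of every X terms" for when ord() and seek(long) are unavailable, can be pictured roughly as below. This is only a sketch under stated assumptions, not the code from the post: the class name SparseOrdinalCache is hypothetical, and a java.util.TreeSet stands in for a terms dictionary that can seek by term but not by ordinal. No Lucene API is used.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration, not Lucene code. The TreeSet plays the role of a
// sorted terms dictionary that supports seek-by-term but not seek-by-ordinal.
class SparseOrdinalCache {
    private final TreeSet<String> terms;     // sorted terms
    private final List<String> checkpoints;  // the terms at ordinals 0, X, 2X, ...
    private final int interval;              // the "X" from the post

    SparseOrdinalCache(TreeSet<String> terms, int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        this.terms = terms;
        this.interval = interval;
        this.checkpoints = new ArrayList<>();
        int ord = 0;
        for (String term : terms) {           // one full pass to record every X-th term
            if (ord % interval == 0) {
                checkpoints.add(term);
            }
            ord++;
        }
    }

    // Resolve an ordinal by seeking to the nearest checkpoint at or before it,
    // then stepping forward at most interval - 1 terms.
    String termForOrdinal(int ord) {
        if (ord < 0 || ord >= terms.size()) {
            throw new IndexOutOfBoundsException("ordinal " + ord);
        }
        String start = checkpoints.get(ord / interval);
        Iterator<String> it = terms.tailSet(start, true).iterator();
        String term = it.next();
        for (int i = 0; i < ord % interval; i++) {
            term = it.next();
        }
        return term;
    }
}
```

The trade-off in such a scheme is the usual one: a smaller X means more cached terms and fewer forward steps per lookup, a larger X means less memory and more stepping.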

Re: Proper use of TermsEnum.seek?

2011-02-22 Thread Toke Eskildsen
, Toke Eskildsen

Re: Proper use of TermsEnum.seek?

2011-02-25 Thread Toke Eskildsen
the TermState could hold a reference to the BytesRef itself, if it is needed by the implementation? Regards, Toke Eskildsen

Re: A lucene performance test

2012-08-14 Thread Toke Eskildsen
2000 is strange. Worse performance than 5000 When I ran your test (results attached), the index at 20M had 76 files and the index at 50M had 46 files and I got the same slowdown at 20M as you did. More segments = more merge overhead. Thank you for sharing your test measurements, Toke

Re: Lucene tests killed one other SSD - Policeman Jenkins

2013-08-20 Thread Toke Eskildsen
/reviews/ssd-reliability-failure-rate,2923-3.html It is a bit old and does not speak well for the Vertex 2 series. So just to conclude: Lucene kills SSDs :-) I am an accomplice to murder!? Oh Noes! - Toke Eskildsen, happily using an old 160GB Intel X25 SSD with 11TB written and 3 reallocated

RE: VOTE: solr no longer webapp

2013-05-02 Thread Toke Eskildsen
? - Toke Eskildsen

Re: VOTE: solr no longer webapp

2013-05-08 Thread Toke Eskildsen
as the most important component and that makes us somewhat blind to the situations where Solr is just another cog in a complex machinery. As the choice of how Solr is deployed is highly relevant for users and maintenance guys, hearing their point of view is important. - Toke Eskildsen, State

Re: VOTE: solr no longer webapp

2013-05-13 Thread Toke Eskildsen
for large scale projects but can also be used for small scale - Toke Eskildsen, State and University Library, Denmark

Re: OOM failures caused by java 1.7.0_09?

2012-12-21 Thread Toke Eskildsen
-a' and check that max user processes is sufficiently large. If the limit is fairly low, your reboot might explain why switching to 1.7.0_10 seemed to be the solution, as you probably had less running applications after reboot. - Toke Eskildsen

Line length in Lucene/Solr code

2013-02-25 Thread Toke Eskildsen
line width to be consistent. With that in mind, I suggest that the code style recommendation is expanded with the notion that a maximum of x characters/line should be used, where x is something more than 80. Judging by a quick search, 120 chars seems to be a common choice. Regards, Toke

Re: Log level cleanup

2013-03-20 Thread Toke Eskildsen
. What is gained by logging queries outside of the standard logging framework? Wouldn't it be better to create a logger with an agreed-upon name, such as queries or interaction? - Toke Eskildsen

RE: [ANNOUNCE] Solr wiki editing change

2013-03-25 Thread Toke Eskildsen
Steve Rowe [sar...@gmail.com]: From now on, only people who appear on http://wiki.apache.org/solr/ContributorsGroup will be able to create/modify/delete wiki pages. TokeEskildsen would like to be added to the list and would like spammers to suffer greatly.

Re: Solr Ref Guide vs. Wiki

2014-04-07 Thread Toke Eskildsen
at least know that I can stop searching. - Toke Eskildsen, State and University Library, Denmark

maxThreads in Jetty

2014-04-25 Thread Toke Eskildsen
give the same throughput with lower memory requirements. By the logic above, maxThreads of 100 or maybe 200 would be an appropriate default for Jetty with Solr. So why the 10,000? - Toke Eskildsen, State and University Library, Denmark

Re: maxThreads in Jetty

2014-04-25 Thread Toke Eskildsen
if a limitation on threads is on the radar for Solr 5? Thank you, Toke Eskildsen, State and University Library, Denmark

RE: maxThreads in Jetty

2014-04-26 Thread Toke Eskildsen
if there is no real limit on burst rate. - Toke Eskildsen

RE: maxThreads in Jetty

2014-04-27 Thread Toke Eskildsen
by this. Tomcat and Jetty default to allowing 200 threads. Solr will not scale with container defaults, which is why the example sets maxThreads to 10000. Are you talking about performance or deadlocks? - Toke Eskildsen

RE: maxThreads in Jetty

2014-04-27 Thread Toke Eskildsen
Shawn Heisey [s...@elyograg.org] wrote: On 4/27/2014 12:29 AM, Toke Eskildsen wrote: Are you talking about performance or deadlocks? Deadlocks. It's not a performance thing -- with only 200 threads allowed, Jetty will refuse to start the additional threads that a large Solr install wants

RE: maxThreads in Jetty

2014-04-29 Thread Toke Eskildsen
to have it as part of the Solr server instead of outside. Regards, Toke Eskildsen

Re: (Issue) How improve solr facet performance

2014-05-23 Thread Toke Eskildsen
values or something third like I/O. - Toke Eskildsen, State and University Library, Denmark

Re: Slow searching limited but high rows across many shards all with high hits

2014-11-17 Thread Toke Eskildsen
resources from the system while performing the search. Can you outline what you are doing? Related to that, why are you running 50+ shards on each machine, when you're doing search across all shards? Why not fewer shards/machine and less distribution overhead? - Toke Eskildsen, State and University

RE: Slow searching limited but high rows across many shards all with high hits

2014-11-17 Thread Toke Eskildsen
a "sounds insane, but it's probably correct"-mindset. Anyway, setup accepted, problem acknowledged, your possibly re-usable solution not understood. - Toke Eskildsen

Re: Slow searching limited but high rows across many shards all with high hits

2014-11-18 Thread Toke Eskildsen
interesting as ID-resolving would not take up as much of the overall processing time. But it would make it possible to scale that number up (top-1 or above). - Toke Eskildsen, State and University Library, Denmark

Re: Slow searching limited but high rows across many shards all with high hits

2014-11-18 Thread Toke Eskildsen
will still be able to benefit from doing the other one. I noticed that. Multiplying solutions are awesome. - Toke Eskildsen, State and University Library, Denmark

facet.mincount in SolrCloud

2014-06-16 Thread Toke Eskildsen
the logic here: When my request is for mincount 0, when does it ever make sense to have terms with count=0 returned from any shard? - Toke Eskildsen, State and University Library, Denmark

Re: facet.mincount in SolrCloud

2014-06-16 Thread Toke Eskildsen
of SOLR-5894, having mincount = 1 is essential there, but it seems like it would provide a speed-up to all distributed faceting with a sparse result set. Regards, Toke Eskildsen, State and University Library, Denmark

RE: facet.mincount in SolrCloud

2014-06-16 Thread Toke Eskildsen
ysee...@gmail.com [ysee...@gmail.com] On Behalf Of Yonik Seeley [yo...@heliosearch.com] wrote: On Mon, Jun 16, 2014 at 8:39 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: I do not understand the logic here: When my request is for mincount 0, when does it ever make sense to have terms

Determining NumericType for a field

2014-12-10 Thread Toke Eskildsen
to determine with certainty, I could use a way of performing a best-guess. On a similar note, does Lucene have a concept of single and multi-value stored fields or do I have to infer that by iterating all the documents and check each one? - Toke Eskildsen, State and University Library, Denmark

Re: Determining NumericType for a field

2014-12-15 Thread Toke Eskildsen
manner. Thanks for the pointer. As far as I can see, the demo is very explicit about the type of DocValues being long, so no auto-guessing there. It's a very interesting idea though, with seamless DV-enabling. Thank you, Toke Eskildsen, State and University Library, Denmark

Re: Determining NumericType for a field

2014-12-15 Thread Toke Eskildsen
On Mon, 2014-12-15 at 11:33 +0100, Michael McCandless wrote: On Mon, Dec 15, 2014 at 4:53 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: [Toke: Limit on faceting with many references] Hmm that's probably the DocTermOrds 16 MB internal addressing limit? Yes, we've hit that one before

Re: Determining NumericType for a field

2014-12-15 Thread Toke Eskildsen
for Disk. But thanks for the suggested fix. You could copy the code too to use newer Lucene versions… We looked at that sometime back and the code tentacles reached too far for us to dare grapple with. Regards, Toke Eskildsen, State and University Library, Denmark

Re: DocValues instead of stored values

2015-03-03 Thread Toke Eskildsen
FunctionValues that, unfortunately for us, are limited to single-value. We'll have to extract the multi-values explicitly with faceting or export, as Joel suggests, for the time being. - Toke Eskildsen, State and University Library, Denmark

Re: DocValues instead of stored values

2015-03-02 Thread Toke Eskildsen
or in the reference guide (my Google-fu is weak). - Toke Eskildsen, State and University Library, Denmark

DocValues instead of stored values

2015-03-02 Thread Toke Eskildsen
=*. If a field is referenced explicitly with fl=myfield and is DocValued but not stored, return the DocValued value. * State that DocValued fields, that are not stored, should be returned with a flag: resolvedv=true - Toke Eskildsen, State and University Library, Denmark

RE: Optimize maxSegments=2 not working right with Solr 4.10.2

2015-02-25 Thread Toke Eskildsen
- much difference between 2 or 4 (or 10) segments. - Toke Eskildsen From: Tom Burton-West [tburt...@umich.edu] Sent: 25 February 2015 18:11 To: dev@lucene.apache.org Subject: Fwd: Optimize maxSegments=2 not working right with Solr 4.10.2 No replies

Re: Change line length setting in eclipse to 120 chars

2015-04-20 Thread Toke Eskildsen
On Sat, 2015-04-18 at 10:07 +0300, Shai Erera wrote: Our dev-tools/eclipse configure the project to break lines on 80 characters. Are there objections to change it to 120? Line length was discussed back in 2013 (search for Line length in Lucene/Solr code) and AFAIR the conclusion was not to

Re: Moving to git?

2015-05-30 Thread Toke Eskildsen
contributions? - Toke Eskildsen

Better DocSetCollector

2015-08-01 Thread Toke Eskildsen
this sound reasonable? Should I open a JIRA? Attempt a patch? - Toke Eskildsen

Re: Better DocSetCollector

2015-08-03 Thread Toke Eskildsen
-is-it-worth-reusing-arrays-in-java In the case of an update-tracked structure, the cost of zeroing is linear to the amount of changed values. This makes it even harder to determine the best strategy as it will be tied to concrete index size and query pattern. - Toke Eskildsen, State and University

Re: Better DocSetCollector

2015-08-11 Thread Toke Eskildsen
that it looks very promising. - Toke Eskildsen, State and University Library, Denmark

Re: Main query runs in both phases of distributed search

2015-08-28 Thread Toke Eskildsen
of resolving its ordinal, then doing a lookup in the counter structure. Unfortunately that does not work for Numerics. - Toke Eskildsen, State and University Library, Denmark

Re: Introducing Alba, a small framework to simplify Solr plugins development

2015-09-14 Thread Toke Eskildsen
med filter would (guessing here) be a matter of writing a small alba-annotated class that takes the filter-ID as input and returns the corresponding custom-made Filter, which really is just a list of docIDs underneath (probably represented as a bitmap). - Toke Eskildsen, State and University

Named filters (was: Introducing Alba, a small framework to simplify Solr plugins development)

2015-09-16 Thread Toke Eskildsen
nk you for bringing it to my attention, Toke Eskildsen

Re: Introducing Alba, a small framework to simplify Solr plugins development

2015-09-14 Thread Toke Eskildsen
ny feedback is very welcome. I know very little about writing plugins, so I am in no position to qualify how much alba helps with that: From what I can see in your GitHub repository, it seems very accessible though. Thank

Sanity checking in Solr

2015-09-29 Thread Toke Eskildsen
in(rows, maxDoc) # ScoreDoc Objects temporarily , which can trigger excessive garbage # collection. # Alternative: Use pagination (https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results) - Toke Eskildsen, State and University L

Re: Sanity checking in Solr

2015-09-30 Thread Toke Eskildsen
d. I'll take a closer look at how the debug mechanism ties into Solr. If sanity checking fits well, I'll try and make a proof of concept and a JIRA. - Toke Eskildsen, State and University Library, Denmark

Re: Welcome Toke Eskildsen as a Lucene/Solr committer

2017-02-14 Thread Toke Eskildsen
Thank you for the invitation and the warm welcome. I am a 43 year old Danish man, with a family and a job at the Royal Danish Library, where I have been working mostly with search-related technology for 10 years. I have done a fair bit of Lucene/Solr hacking during the years, with focus on

Re: First stumble

2017-02-14 Thread Toke Eskildsen
page got published? I never got past the "Publish lucene site"-page and my current sort-correction is still in staging. Maybe someone else OK'ed the change? Thank you, Toke Eskildsen, Danish Royal Library

First stumble

2017-02-14 Thread Toke Eskildsen
That did not take long... The initiation rite of adding my name to the committers list went well until it was time to publish. The Publish lucene site at https://cms.apache.org/lucene/publish shows nothing under "Authors:" and when I press "View Diff", the browser waits until I close the tab. I

Re: First stumble

2017-02-16 Thread Toke Eskildsen
On Wed, 2017-02-15 at 22:37 +, Toke Eskildsen wrote: > Jan Høydahl <jan@cominvent.com> wrote: > > https://ci.apache.org/builders/lucene-site-production > [...] > have been in contact with INFRA (Gavin McDonald on the HipChat- > channel) and he kicked something loo

Re: First stumble

2017-02-15 Thread Toke Eskildsen
Jan Høydahl wrote: > https://ci.apache.org/builders/lucene-site-production [...] > Toke, could you report this to INFRA perhaps? Looks like it has been failing > for several days... I have been in contact with INFRA (Gavin McDonald on the HipChat-channel) and he kicked

Re: HitQueue.getSentinelObject() and performance

2017-01-20 Thread Toke Eskildsen
osed win from pre-allocating the sentinels gets shadowed by overall processing. It only works well when hitcount is near top-N, where "near" is one of those things that are really hard to measure properly. - Toke Eskildsen -

Re: Solr configuration format fracturing

2016-09-28 Thread Toke Eskildsen
icles from source X, remove if that source is deprecated",   "type", "ImportantText",   "stored", "true",   ... }, ... } It would be great to have the content from such a documentation field pop up in the schema browser in the GUI. - Toke Eskildsen, State and University Library, Denmark

Re: Future of FieldCache in Solr

2016-11-23 Thread Toke Eskildsen
scenario be supported if uninversion is removed? - Toke Eskildsen, State and University Library, Denmark

Re: Future of FieldCache in Solr

2016-11-24 Thread Toke Eskildsen
r a way that a random end-user can easily do faceting on analyzed terms, leveraging all the nice built-in filters in Solr. - Toke Eskildsen, State and University Library, Denmark

Re: Baby steps as new committer

2017-08-15 Thread Toke Eskildsen
n for 5 & 6 + master. Was that correct? Thank you, Toke Eskildsen

Re: Baby steps as new committer

2017-08-16 Thread Toke Eskildsen
nd that I to stay clear of any 'x'-versions, should they be created by others. Thank you, Toke Eskildsen, Royal Danish Library

Baby steps as new committer

2017-08-15 Thread Toke Eskildsen
be closed? Will an accept be reflected at the Apache repo or should one close the pull-request without accept, and commit the code directly to the Apache-repo (by whatever method is easiest for transferring code between git repos)? Thank you, Toke Eskildsen, Royal Dani

Re: Baby steps as new committer

2017-08-21 Thread Toke Eskildsen
in isolation. Ah yes. That's me being overly cautious of (non-existing) unrelated changes. Cherry-pick with hash is the clean way. Thank you, Toke Eskildsen, Royal Danish Library

Re: Baby steps as new committer

2017-08-21 Thread Toke Eskildsen
branch_7x and cherry-pick the changed files, check that everything works and commit. My I-think-I-am-doing-the-right-thing confidence level is rising, but I'll keep asking for sanity-checks for some time. - Toke Eskildsen, Royal Danis

Re: Lucene/Solr 7.6

2018-11-08 Thread Toke Eskildsen
grade to new major versions. Thanks, Toke Eskildsen

Re: Lucene/Solr 7.6

2018-11-08 Thread Toke Eskildsen
2000+ line patch that has not been reviewed. It seems a bit forced to add it to 7.6, but on the other hand it will be tested thoroughly as part of the release process. What is the best action here? - Toke Eskildsen, Roya

DocValues, retrieval performance and policy

2018-09-24 Thread Toke Eskildsen
Doc Values, maybe someone could explain what the problem is or direct me towards more information? - Toke Eskildsen, Royal Danish Library

Re: DocValues, retrieval performance and policy

2018-09-24 Thread Toke Eskildsen
disagreement about improving docValues in the ways > you suggest. You are right about that. I apologize if I was being unclear: It is not the concrete patch I am asking about, that's just how this started. I am asking for background on why it is considered misuse to use Doc Values for docume

Re: DocValues, retrieval performance and policy

2018-09-24 Thread Toke Eskildsen
> So I think as usual, "it depends". I would like to think so, as that implies that it does make sense to consider if changes to Doc Values codec representation cause a performance regression, when using them to populate documents. - Toke Eskildsen

Re: Solr Size Limitation upto 32 kb limitation

2019-01-17 Thread Toke Eskildsen
it did not solve your problem. Cc: to Kranthi as he might have mailinglist-related delivery problems. - Toke Eskildsen, Royal Danish Library

Re: [1/4] lucene-solr:master: LUCENE-8374 part 1/4: Reduce reads for sparse DocValues

2018-12-04 Thread Toke Eskildsen
Toke Eskildsen wrote: > Gus Heck wrote: >> Precommit appears to be failing related to this series of commits > I apologize and will correct it right away. Fixed. ant precommit now passes for me on master. Thanks for the note Gus, To

Re: [1/4] lucene-solr:master: LUCENE-8374 part 1/4: Reduce reads for sparse DocValues

2018-12-03 Thread Toke Eskildsen
From: Gus Heck wrote: > Precommit appears to be failing related to this series of commits Thank you. I clearly did not perform this step, even if I thought I did. I apologize and will correct it right away. Toke Eskild

Re: Improving DocValue sorting performance

2019-03-13 Thread Toke Eskildsen
as the thread is a month old. - Toke Eskildsen, Royal Danish Library

Re: [jira] Created: (LUCENE-1257) Port to Java5

2008-04-09 Thread Toke Eskildsen
On Tue, 2008-04-08 at 18:48 -0500, robert engels wrote: That is opposite of my testing:... The 'foreach' is consistently faster. The time difference is independent of the size of the array. What I know about JVM implementations, the foreach version SHOULD always be faster - because

Sorting with little memory: A suggestion

2010-03-19 Thread Toke Eskildsen
an easy alternative to buying more RAM would be nice. I would like to hear if Exposed sounds like a feasible idea to the more seasoned Lucene developers. Regards, Toke Eskildsen

RE: Sorting with little memory: A suggestion

2010-03-19 Thread Toke Eskildsen
methods both for simple Locale and for custom sorting, so I guess it would be the same for Exposed. Regards, Toke Eskildsen

RE: Sorting with little memory: A suggestion

2010-03-19 Thread Toke Eskildsen
- to my knowledge - loads the Strings into memory. For my quick test, this means a tripling of memory usage for the sort field when indexing collatorKeys? Regards, Toke Eskildsen

RE: Sorting with little memory: A suggestion

2010-03-19 Thread Toke Eskildsen
vs. the 10M*log2(10M)/8 = 27MB for a compressed order array. Still, depending on how little space a byte-array will take in flex, using the indexed collator key approach might turn out to be the best choice in a lot of cases as it works really well for incremental updates. Regards, Toke
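The 10M*log2(10M)/8 estimate in this post can be checked with a few lines of arithmetic. The sketch below is an illustration only: the class and method names are hypothetical, and it assumes the "compressed order array" stores one fixed-width ordinal per term. With the fractional log2 the result is about 29.07 million bytes, or roughly 27.7 MiB, which appears to be where the quoted 27MB comes from; rounding up to whole bits (24 per entry) gives 30 million bytes.

```java
// Hypothetical helper for checking the size estimate; not taken from the thread.
class OrderArraySize {
    // Whole bits needed to represent any ordinal in [0, count - 1].
    static int bitsPerOrdinal(long count) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be >= 1");
        }
        return count == 1 ? 1 : 64 - Long.numberOfLeadingZeros(count - 1);
    }

    // Bytes for a bit-packed array holding `count` ordinals at whole-bit width.
    static long packedBytes(long count) {
        return (count * bitsPerOrdinal(count) + 7) / 8;
    }

    // The idealised estimate from the post: count * log2(count) / 8.
    static double idealBytes(long count) {
        return count * (Math.log(count) / Math.log(2)) / 8.0;
    }

    public static void main(String[] args) {
        long count = 10_000_000L;
        System.out.println("bits per ordinal: " + bitsPerOrdinal(count));
        System.out.println("packed bytes:     " + packedBytes(count));
        System.out.printf("ideal MiB:        %.2f%n", idealBytes(count) / (1 << 20));
    }
}
```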

Changing the subject for a JIRA-issue (Was: [jira] Created: (LUCENE-2335) optimization: when sorting by field, if index has one segment and field values are not needed, do not load String[] into field

2010-04-06 Thread Toke Eskildsen
The current subject and description of https://issues.apache.org/jira/browse/LUCENE-2335 is obsolete due to new knowledge. Is it possible to change it? If not, what is the policy here? To open a new issue and close the old one? Cc: To Michael McCandless as he is the reporter of the issue. If

[jira] Created: (SOLR-2412) Multipath hierarchical faceting

2011-03-09 Thread Toke Eskildsen (JIRA)
: 4.0 Environment: Fast IO when huge hierarchies are used Reporter: Toke Eskildsen Hierarchical faceting with slow startup, low memory overhead and fast response. Distinguishing features as compared to SOLR-64 and SOLR-792 are * Multiple paths per document * Query-time

[jira] Updated: (SOLR-2412) Multipath hierarchical faceting

2011-03-09 Thread Toke Eskildsen (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Toke Eskildsen updated SOLR-2412: - Attachment: SOLR-2412.patch Alpha-level patch (aka Proof Of Concept). Works with trunk@1066767

[jira] Commented: (SOLR-2412) Multipath hierarchical faceting

2011-03-16 Thread Toke Eskildsen (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007411#comment-13007411 ] Toke Eskildsen commented on SOLR-2412: -- The syntax for calling is kept close to SOLR

[jira] Commented: (SOLR-2403) Problem with facet.sort=lex, shards, and facet.mincount

2011-03-16 Thread Toke Eskildsen (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007497#comment-13007497 ] Toke Eskildsen commented on SOLR-2403: -- Dividing by shard count is fairly risky

[jira] Commented: (SOLR-2403) Problem with facet.sort=lex, shards, and facet.mincount

2011-03-16 Thread Toke Eskildsen (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007513#comment-13007513 ] Toke Eskildsen commented on SOLR-2403: -- My first example was hills, while the second

[jira] [Commented] (SOLR-2396) add [ICU]CollationField

2011-03-22 Thread Toke Eskildsen (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009627#comment-13009627 ] Toke Eskildsen commented on SOLR-2396: -- A rough idea: It seems that ICU Collator Keys

[jira] [Commented] (SOLR-2396) add [ICU]CollationField

2011-03-22 Thread Toke Eskildsen (JIRA)
[ https://issues.apache.org/jira/browse/SOLR-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009665#comment-13009665 ] Toke Eskildsen commented on SOLR-2396: -- The JavaDoc for CollationKey is very explicit

[jira] [Commented] (LUCENE-3079) Facetiing module

2011-06-27 Thread Toke Eskildsen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055402#comment-13055402 ] Toke Eskildsen commented on LUCENE-3079: This is quite another design than

[jira] [Commented] (LUCENE-3079) Facetiing module

2011-06-27 Thread Toke Eskildsen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055480#comment-13055480 ] Toke Eskildsen commented on LUCENE-3079: SOLR-2412/LUCENE-2369 were created

[jira] [Commented] (LUCENE-3079) Facetiing module

2011-06-28 Thread Toke Eskildsen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056377#comment-13056377 ] Toke Eskildsen commented on LUCENE-3079: The patch compiles neatly against a 3x

[jira] [Commented] (LUCENE-3079) Facetiing module

2011-06-28 Thread Toke Eskildsen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056517#comment-13056517 ] Toke Eskildsen commented on LUCENE-3079: Some preliminary performance testing: I
