Re: Measuring SOLR performance

2013-07-31 Thread Roman Chyla
specific, have real effect. Thanks, roman On Tue, Jul 30, 2013 at 11:01 PM, Shawn Heisey s...@elyograg.org wrote: On 7/30/2013 6:59 PM, Roman Chyla wrote: I have been wanting some tools for measuring performance of SOLR, similar to Mike McCandless' lucene benchmark. so

Re: Measuring SOLR performance

2013-08-01 Thread Roman Chyla
self.gen.next() File solrjmeter.py, line 229, in changed_dir os.chdir(new) OSError: [Errno 20] Not a directory: '/home/dmitry/projects/lab/solrjmeter/queries/demo/demo.queries' Best, Dmitry On Wed, Jul 31, 2013 at 7:21 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, probably

Re: Measuring SOLR performance

2013-08-01 Thread Roman Chyla
/2013 6:59 PM, Roman Chyla wrote: I have been wanting some tools for measuring performance of SOLR, similar to Mike McCandless' lucene benchmark. so yet another monitor was born, is described here: http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/ I tested

Re: How to uncache a query to debug?

2013-08-01 Thread Roman Chyla
When you set your cache (solrconfig.xml) to size=0, you are not using a cache. so you can debug more easily roman On Thu, Aug 1, 2013 at 1:12 PM, jimtronic jimtro...@gmail.com wrote: I have a query that runs slow occasionally. I'm having trouble debugging it because once it's cached, it runs

Re: Measuring SOLR performance

2013-08-01 Thread Roman Chyla
Hi, here is a short post describing the results of the yesterday run with added parameters as per Shawn's recommendation, have fun getting confused ;) http://29min.wordpress.com/2013/08/01/measuring-solr-performance-ii/ roman On Wed, Jul 31, 2013 at 12:32 PM, Roman Chyla roman.ch...@gmail.com

Re: Measuring SOLR performance

2013-08-01 Thread Roman Chyla
On Thu, Aug 1, 2013 at 6:11 PM, Shawn Heisey s...@elyograg.org wrote: On 8/1/2013 2:08 PM, Roman Chyla wrote: Hi, here is a short post describing the results of the yesterday run with added parameters as per Shawn's recommendation, have fun getting confused ;) http://29min.wordpress.com

Re: Measuring SOLR performance

2013-08-02 Thread Roman Chyla
.items(): Dmitry On Thu, Aug 1, 2013 at 6:41 PM, Roman Chyla roman.ch...@gmail.com wrote: Dmitry, Can you post the entire invocation line? roman On Thu, Aug 1, 2013 at 7:46 AM, Dmitry Kan solrexp...@gmail.com wrote: Hi Roman, When I try to run with -q /home/dmitry

Re: Measuring SOLR performance

2013-08-05 Thread Roman Chyla
/demo.queries, but there is no such path in the fresh checkout. Nice to have the -t param. Dmitry On Sat, Aug 3, 2013 at 5:01 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, Thanks, It was a teething problem, fixed now, please try the fresh checkout AND add the following to your

Re: Measuring SOLR performance

2013-08-06 Thread Roman Chyla
, Dmitry Kan wrote: Of three URLs you asked for, only the 3rd one gave response: snip The rest report 404. On Mon, Aug 5, 2013 at 8:38 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, So I think the admin pages are different on your version of solr, what do you

Re: Measuring SOLR performance

2013-08-07 Thread Roman Chyla
Thanks! Dmitry On Wed, Aug 7, 2013 at 6:54 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, I've modified the solrjmeter to retrieve data from under the core (the -t parameter) and the rest from the /solr/admin - I could test it only against 4.0

Re: Percolate feature?

2013-08-09 Thread Roman Chyla
On Fri, Aug 9, 2013 at 11:29 AM, Mark static.void@gmail.com wrote: *All* of the terms in the field must be matched by the query, not vice-versa. Exactly. This is why I was trying to explain it as a reverse search. I just realized I describe it as a *large list of known keywords when

Re: Percolate feature?

2013-08-09 Thread Roman Chyla
On Fri, Aug 9, 2013 at 2:56 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I'll look into this. Thanks for the concrete example as I don't even : know which classes to start to look at to implement such a feature. Either roman isn't understanding what you are asking for, or i'm not --

Re: Measuring SOLR performance

2013-08-12 Thread Roman Chyla
In case it matters: Python 2.7.3, ubuntu, solr 4.3.1. Thanks, Dmitry On Thu, Aug 8, 2013 at 2:22 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, The command seems good. Are you sure your shell is not doing something funny with the params? You could try: python solrjmeter.py

Re: Measuring SOLR performance

2013-08-13 Thread Roman Chyla
at 0x7fc6d4040fd0 is not JSON serializable Regards, D. On Tue, Aug 13, 2013 at 8:10 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, On Mon, Aug 12, 2013 at 9:36 AM, Dmitry Kan solrexp...@gmail.com wrote: Hi Roman, Good point. I managed to run the command with -C

Re: Measuring SOLR performance

2013-08-22 Thread Roman Chyla
turnarounds, Dmitry On Wed, Aug 14, 2013 at 1:32 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, oh yes, late night fixes... :) The latest commit should make it work for you. Thanks! roman On Tue, Aug 13, 2013 at 3:37 AM, Dmitry Kan solrexp...@gmail.com wrote: Hi

Re: Measuring SOLR performance

2013-09-02 Thread Roman Chyla
=/admin/cores (which suggests that this is the right value to be used for cores), and not with adminPath=/admin. Bottom line, this core configuration is not self-evident. Dmitry On Fri, Aug 23, 2013 at 4:18 AM, Roman Chyla roman.ch...@gmail.com wrote

Re: Measuring SOLR performance

2013-09-03 Thread Roman Chyla
) at kg.apc.cmd.UniversalRunner.clinit(UniversalRunner.java:55) at kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109) at kg.apc.cmd.UniversalRunner.clinit(UniversalRunner.java:55) On Tue, Sep 3, 2013 at 2:50 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry

Re: Dynamic Query Analyzer

2013-09-03 Thread Roman Chyla
You don't need to index fields several times, you can index it just into one field, and use the different query analyzers just to build the query. We're doing this for authors, for example - if query language says =author:einstein, the query parser knows this field should be analyzed differently

Web App Engineer at Harvard-Smithsonian Astrophysical Observatory, full time, indefinite contract

2013-10-07 Thread Roman Chyla
online at: http://www.cfa.harvard.edu/hr/postings/13-32.html Thank you, Roman -- Dr. Roman Chyla ADS, Harvard-Smithsonian Center for Astrophysics roman.ch...@gmail.com

Re: Solr's Filtering approaches

2013-10-12 Thread Roman Chyla
David, We have a similar query in astrophysics, a user can select an area of the sky, many stars out there. I am long overdue in creating a Jira issue, but here you have another efficient mechanism for searching a large number of ids

Re: Complex Queries in solr

2013-10-20 Thread Roman Chyla
i just tested whether our 'beautiful' parser supports it, and funnily enough, it does :-) https://github.com/romanchyla/montysolr/commit/f88577345c6d3a2dbefc0161f6bb07a549bc6b15 but i've (kinda) given up hope that people need powerful query parsers in the lucene world, the LUCENE-5014 is there

Re: Compound words

2013-10-28 Thread Roman Chyla
Hi Parvesh, I think you should check the following jira https://issues.apache.org/jira/browse/SOLR-5379. You will find there links to other possible solutions/problems:-) Roman On 28 Oct 2013 09:06, Erick Erickson erickerick...@gmail.com wrote: Consider setting expand=true at index time. That

Re: Recherche avec et sans espaces

2013-11-04 Thread Roman Chyla
Hi Antoine, I'll permit myself to respond in English, cause my written French is slower;-) Your problem is well known amongst Solr users: the query parser splits tokens on whitespace, so the analyser never sees the input 'la redoutte' but receives 'la' 'redoutte'. You can of course enclose your

Inconsistent number of hits returned by two solr instances (from the same index!)

2013-11-06 Thread Roman Chyla
Hello, We have two solr searchers/instances (read-only). They read the same index, but they did not return the same #hits for a particular query. Log is below, but to summarize: first server always returns 576 hits, the second server returns: 440, 440, 576, 576... These are just few seconds

Re: Inconsistent number of hits returned by two solr instances (from the same index!)

2013-11-06 Thread Roman Chyla
/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Wed, Nov 6, 2013 at 4:23 PM, Roman Chyla roman.ch...@gmail.com wrote: Hello, We have two solr searchers/instances (read

Re: Inconsistent number of hits returned by two solr instances (from the same index!)

2013-11-07 Thread Roman Chyla
/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Wed, Nov 6, 2013 at 6:40 PM, Roman Chyla roman.ch...@gmail.com wrote: No, and I should add that this query was not against

building custom cache - using lucene docids

2013-11-22 Thread Roman Chyla
Hi, docids are 'ephemeral', but i'd still like to build a search cache with them (they allow for the fastest joins). i'm seeing docids keep changing with updates (especially, in the last index segment) - as per https://issues.apache.org/jira/browse/LUCENE-2897 That would be fine, because i could

Re: building custom cache - using lucene docids

2013-11-23 Thread Roman Chyla
with openSearcher=false don't open new searchers, which is why changes aren't visible until a softCommit or a hard commit with openSearcher=true despite the fact that the segments are closed. FWIW, Erick Best Erick On Sat, Nov 23, 2013 at 12:40 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi

Re: building custom cache - using lucene docids

2013-11-25 Thread Roman Chyla
of operations, which is something I'm not all that familiar with so I'll leave explanations to others. Thank you, it is useful to get insights from various sides, roman On Sat, Nov 23, 2013 at 8:22 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Erick, Many thanks for the info

Re: building custom cache - using lucene docids

2013-11-25 Thread Roman Chyla
, which is something I'm not all that familiar with so I'll leave explanations to others. On Sat, Nov 23, 2013 at 8:22 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Erick, Many thanks for the info. An additional question: Do i understand you correctly that when two segments get merged

Re: building custom cache - using lucene docids

2013-11-25 Thread Roman Chyla
a state (of previous index) - as they can be shared by threads that build the cache Best, roman On Sat, Nov 23, 2013 at 9:40 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi, docids are 'ephemeral', but i'd still like to build a search cache with them (they allow for the fastest joins

Re: building custom cache - using lucene docids

2013-11-25 Thread Roman Chyla
:54 PM, Roman Chyla roman.ch...@gmail.com wrote: On Mon, Nov 25, 2013 at 12:54 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Roman, I don't fully understand your question. After segment is flushed it's never changed, hence segment-local docids are always the same. Due to merge

Caches contain deleted docs (?)

2013-11-27 Thread Roman Chyla
Hi, I'd like to check - there is something I don't understand about cache - and I don't know if it is a bug or a feature. The following calls return a cache: FieldCache.DEFAULT.getTerms(reader, idField); FieldCache.DEFAULT.getInts(reader, idField, false); the resulting arrays *will* contain

Re: Caches contain deleted docs (?)

2013-11-27 Thread Roman Chyla
expected. Segments are write-once. It's been a long standing design that deleted data will be reclaimed on segment merge, but not before. It's pretty expensive to change the terms loaded on the fly to respect deleted document's removed data. Best, Erick On Wed, Nov 27, 2013 at 4:07 PM, Roman
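The behaviour described in this snippet (write-once segments, deleted data reclaimed only on merge) can be pictured with a small Python toy. It is not Lucene code: the `Segment` class, `build_cache` and `merge` are hypothetical names invented for illustration, and the live-docs flags are only a rough stand-in for how deletions are tracked.

```python
# Toy model, not Lucene code: class and function names are hypothetical.

class Segment:
    def __init__(self, values):
        self.values = list(values)             # written once, never modified
        self.live = [True] * len(self.values)  # deletions only flip these flags

    def delete(self, docid):
        self.live[docid] = False


def build_cache(segment):
    """Un-inverted view of a field: one entry per docid, deleted or not."""
    return list(segment.values)


def merge(segments):
    """Merging copies only live documents; this is when deleted data goes away."""
    merged = [v for s in segments for v, alive in zip(s.values, s.live) if alive]
    return Segment(merged)


seg = Segment(["a", "b", "c"])
seg.delete(1)

cache = build_cache(seg)                  # still holds the value of deleted doc 1
live_only = [v for docid, v in enumerate(cache) if seg.live[docid]]
merged_cache = build_cache(merge([seg]))  # value of doc 1 is gone after the merge
```

In this sketch a consumer of the cache has to consult the live flags itself, which mirrors the point made in the reply: the cached arrays are not rewritten when a document is deleted.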

Re: Bad fieldNorm when using morphologic synonyms

2013-12-09 Thread Roman Chyla
Isaac, is there an easy way to recognize this problem? We also index synonym tokens in the same position (like you do, and I'm sure that our positions are set correctly). I could test whether the default similarity factory in solrconfig.xml had any effect (before/after reindexing). --roman On

Re: Commit Issue in Solr 3.4

2014-02-08 Thread Roman Chyla
I would be curious what the cause is. Samarth says that it worked for over a year /and supposedly docs were being added all the time/. Did the index grow considerably in the last period? Perhaps he could attach visualvm while it is in the 'black hole' state to see what is actually going on. I

Re: Commit Issue in Solr 3.4

2014-02-08 Thread Roman Chyla
objects with holding to some big object etc/. Btw if i study the graph, i see that there *are* warning signs. That's the point of testing/measuring after all, IMHO. --roman On 8 Feb 2014 13:51, Shawn Heisey s...@elyograg.org wrote: On 2/8/2014 11:02 AM, Roman Chyla wrote: I would be curious what

Re: Solr4 performance

2014-02-12 Thread Roman Chyla
And perhaps one other, but very pertinent, recommendation is: allocate only as little heap as is necessary. By allocating more, you are working against the OS caching. To know how much is enough is bit tricky, though. Best, roman On Wed, Feb 12, 2014 at 2:56 PM, Shawn Heisey

Re: APACHE SOLR: Pass a file as query parameter and then parse each line to form a criteria

2014-02-13 Thread Roman Chyla
Hi Rajeev, You can take this: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201307.mbox/%3CCAEN8dyX_Am_v4f=5614eu35fnhb5h7dzkmkzdfwvrrm1xpq...@mail.gmail.com%3E I haven't created the jira yet, but I have improved the plugin. Recently, I have seen a use case of passing 90K identifiers

Re: filtering/faceting by a big list of IDs

2014-02-13 Thread Roman Chyla
Hi Tri, Look at this: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201307.mbox/%3CCAEN8dyX_Am_v4f=5614eu35fnhb5h7dzkmkzdfwvrrm1xpq...@mail.gmail.com%3E Roman On 13 Feb 2014 03:39, Tri Cao tm...@me.com wrote: Hi Joel, Thanks a lot for the suggestion. After thinking more about

Re: w/10 ? [was: Partial Counts in SOLR]

2014-03-24 Thread Roman Chyla
perhaps useful, here is an open source implementation with near[digit] support, incl analysis of proximity tokens. When days become longer maybe it will be packaged into a nice lib...:-) https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/grammars/ADS.g On 25 Mar 2014 00:14, Salman

Re: What is the usage of solr.NumericPayloadTokenFilterFactory

2014-05-17 Thread Roman Chyla
Hi, What will replace spans, if spans are nuked ? Roman On 17 May 2014 09:15, Ahmet Arslan iori...@yahoo.com wrote: Hi, Payloads are used to store arbitrary data along with terms. You can influence score with these arbitrary data. See :

Re: Anti-Pattern in lucent-join jar?

2014-12-04 Thread Roman Chyla
+1, additionally (as it follows from your observation) the query can get out of sync with the index, if eg it was saved for later use and ran against newly opened searcher Roman On 4 Dec 2014 10:51, Darin Amos dari...@gmail.com wrote: Hello All, I have been doing a lot of research in building

Re: Anti-Pattern in lucent-join jar?

2014-12-05 Thread Roman Chyla
keys, hence it exclude such leakage across different searchers. On Fri, Dec 5, 2014 at 6:43 AM, Roman Chyla roman.ch...@gmail.com wrote: +1, additionally (as it follows from your observation) the query can get out of sync with the index, if eg it was saved for later use and ran against newly

Re: Anti-Pattern in lucent-join jar?

2014-12-05 Thread Roman Chyla
? I might not have followed you, this discussion challenges my understanding of Lucene and SOLR. Darin On Dec 5, 2014, at 12:47 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Mikhail, I think you are right, it won't be a problem for SOLR, but it is likely an antipattern inside

Re: Queries not supported by Lucene Query Parser syntax

2015-01-01 Thread Roman Chyla
Hi Leonid, I didn't look into solr qparser for a long time, but I think you should be able to combine different query parsers in one query. Look at the SolrQueryParser code, maybe now you can specify custom query parser for every clause (?), something like: foo AND {!lucene}bar. I don't know, but worth

Re: shards per disk

2015-01-20 Thread Roman Chyla
I think this makes sense to (ie. the setup), since the search is getting 1K documents each time (for textual analysis, ie. they are probably large docs), and use Solr as a storage (which is totally fine) then the parallel multiple drive i/o shards speed things up. The index is probably large, so

New UI for SOLR-based projects

2015-01-30 Thread Roman Chyla
Hi everybody, There exists a new open-source implementation of a search interface for SOLR. It is written in Javascript (using Backbone), currently in version v1.0.19 - but new features are constantly coming. Rather than describing it in words, please see it in action for yourself at

Re: New UI for SOLR-based projects

2015-01-30 Thread Roman Chyla
, Roman On 30 Jan 2015 21:51, Shawn Heisey apa...@elyograg.org wrote: On 1/30/2015 1:07 PM, Roman Chyla wrote: There exists a new open-source implementation of a search interface for SOLR. It is written in Javascript (using Backbone), currently in version v1.0.19 - but new features are constantly

Re: SOLR - any open source framework

2015-01-06 Thread Roman Chyla
), but that was one year ago... On Tue, Jan 6, 2015 at 5:20 PM, Vishal Swaroop vishal@gmail.com wrote: Thanks Roman... I will check it... Maybe it's off topic but how about Angular... On Jan 6, 2015 5:17 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Vishal, Alexandre, Here is another one

Re: SOLR - any open source framework

2015-01-06 Thread Roman Chyla
Hi Vishal, Alexandre, Here is another one, using Backbone, just released v1.0.16 https://github.com/adsabs/bumblebee you can see it in action: http://ui.adslabs.org/ While it primarily serves our own needs, I tried to architect it to be extendible (within reasonable limits of code, man power)

Re: Mutli term synonyms

2015-04-29 Thread Roman Chyla
I'm not sure I understand - the autophrasing filter will allow the parser to see all the tokens, so that they can be parsed (and multi-token synonyms) identified. So if you are using the same analyzer at query and index time, they should be able to see the same stuff. are you using multi-token

Re: Mutli term synonyms

2015-04-29 Thread Roman Chyla
should have retrieved it; but it doesnt. What could I be doing wrong? On Wed, Apr 29, 2015 at 2:10 AM, Roman Chyla roman.ch...@gmail.com wrote: I'm not sure I understand - the autophrasing filter will allow the parser to see all the tokens, so that they can be parsed (and multi-token

Re: Mutli term synonyms

2015-04-29 Thread Roman Chyla
, start: 0, docs: [] }, debug: { rawquerystring: tween 20, querystring: tween 20, parsedquery: name:tweenx20, parsedquery_toString: name:tweenx20, explain: {}, Thank you, Kaushik On Wed, Apr 29, 2015 at 4:00 PM, Roman Chyla roman.ch...@gmail.com wrote

Re: Injecting synonymns into Solr

2015-05-04 Thread Roman Chyla
It shouldn't matter. Btw try a url instead of a file path. I think the underlying loading mechanism uses java File , it could work. On May 4, 2015 2:07 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Would like to check, will this method of splitting the synonyms into multiple files use up

Re: Mutli term synonyms

2015-04-29 Thread Roman Chyla
, Roman Chyla roman.ch...@gmail.com wrote: Hi Kaushik, I meant to compare tween 20 against tween 20. Your autophrase filter replaces whitespace with x, but your synonym filter expects whitespaces. Try that. Roman On Apr 29, 2015 2:27 PM, Kaushik kaushika...@gmail.com wrote: Hi
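The mismatch Roman points out here (the autophrase step emits `tweenx20`, while the synonym list is keyed on `tween 20`) can be sketched in Python. This is a toy illustration only: `autophrase` and `expand_synonyms` are hypothetical stand-ins rather than Solr filters, and the synonym entries are invented.

```python
# Toy illustration only; these functions are hypothetical stand-ins, not Solr filters.

def autophrase(text, phrases, replacement="x"):
    """Join known multi-word phrases into one token by replacing the spaces."""
    for phrase in phrases:
        text = text.replace(phrase, phrase.replace(" ", replacement))
    return text


def expand_synonyms(token, synonyms):
    """Look up a token in a synonym map keyed by the exact token text."""
    return synonyms.get(token, [token])


phrases = ["tween 20"]
token = autophrase("tween 20", phrases)  # "tweenx20", as in the parsed query

# Invented synonym entries: one map keyed with whitespace, one with the "x" form.
synonyms_with_space = {"tween 20": ["examplexsynonym"]}
synonyms_with_x = {"tweenx20": ["examplexsynonym"]}

miss = expand_synonyms(token, synonyms_with_space)  # key never matches the token
hit = expand_synonyms(token, synonyms_with_x)       # key matches the joined token
```

Under these assumptions, the lookup only succeeds when the synonym keys use the same joined form that the earlier step produces.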

Re: How to use BitDocSet within a PostFilter

2015-08-03 Thread Roman Chyla
Hi, inStockSkusBitSet.get(currentChildDocNumber) Is that child a lucene id? If yes, does it include offset? Every index segment starts at a different point, but docs are numbered from zero. So to check them against the full index bitset, I'd be doing Bitset.exists(indexBase + docid) Just one
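The offset arithmetic hinted at in this snippet can be sketched in Python. This is not the Lucene API; the segment sizes, the function names and the set used as a bitset are made up for illustration. Each segment numbers its documents from zero, so a segment-local id has to be shifted by the segment's base before it is checked against a bitset built over the full index.

```python
# Toy model, not the Lucene API: all names and numbers here are hypothetical.

def segment_bases(segment_sizes):
    """Return the starting global docid (the base) of each segment."""
    bases = []
    total = 0
    for size in segment_sizes:
        bases.append(total)
        total += size
    return bases


def in_global_bitset(bitset, doc_base, local_docid):
    """Check a segment-local docid against a bitset built over the whole index."""
    return (doc_base + local_docid) in bitset


segment_sizes = [5, 3, 4]             # three segments with 5, 3 and 4 docs
bases = segment_bases(segment_sizes)  # bases are 0, 5 and 8
global_bitset = {1, 9}                # global docids that are set

# Local docid 1 refers to a different document in every segment.
hits = [in_global_bitset(global_bitset, base, 1) for base in bases]
```

Without the base added, local docid 1 would appear to match in every segment, which is the kind of error the question in the snippet is probing for.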

Re: Forking Solr

2015-10-17 Thread Roman Chyla
I've taken the route of extending solr, the repo checks out solr and builds on top of that. The hard part was to figure out how to use solr test classes and the default location for integration tests, but once there, it is relatively easy. Google for montysolr, the repo is on github. Roman On Oct

Re: Scramble data

2015-10-08 Thread Roman Chyla
Or you could also apply XSL to returned records: https://wiki.apache.org/solr/XsltResponseWriter On Thu, Oct 8, 2015 at 5:06 PM, Uwe Reh wrote: > Hi, > > my suggestions are probably too simple, because they are not a real > protection of privacy. But maybe one fits

Re: Reverse query?

2015-10-02 Thread Roman Chyla
I'd like to offer another option: you say you want to match long query into a document - but maybe you won't know whether to pick "Mad Max" or "Max is" (not mentioning the performance hit of "*mad max*" search - or is it not the case anymore?). Take a look at the NGram tokenizer (say size of 2;

Jetty refuses connections

2016-05-16 Thread Roman Chyla
Hi, I'm hoping someone has seen/encountered a similar problem. We have solr instances with all Jetty threads in BLOCKED state. The application does not respond to any http requests. It is SOLR 4.9 running inside docker on Amazon EC2. Jetty is 8.1 and there is an nginx proxy in front of it (with

Re: The most efficient way to get un-inverted view of the index?

2016-08-17 Thread Roman Chyla
are available. --roman On Tue, Aug 16, 2016 at 9:54 PM, Joel Bernstein <joels...@gmail.com> wrote: > You'll want to use org.apache.lucene.index.DocValues. The DocValues api has > replaced the field cache. > > > > > > Joel Bernstein > http://joelsolr.blogspot.com/ > >

The most efficient way to get un-inverted view of the index?

2016-08-16 Thread Roman Chyla
I need to read data from the index in order to build a special cache. Previously, in SOLR4, this was accomplished with FieldCache or DocTermOrds. Now I'm struggling to see which API to use; there are many of them. On the lucene level: UninvertingReader.getNumericDocValues (and others)

Re: The most efficient way to get un-inverted view of the index?

2016-08-17 Thread Roman Chyla
} transformer.process(docBase, i); i++; } } } } On Wed, Aug 17, 2016 at 1:22 PM, Roman Chyla <roman.ch...@gmail.com> wrote: > Joel, thanks, but which of them? I've counted at least 4, if not more, > different ways of how to get DocValues. Are there many functi

storing large text fields in a database? (instead of inside index)

2018-02-20 Thread Roman Chyla
Hello, We have a use case of a very large index (slave-master; for unrelated reasons the search cannot work in the cloud mode) - one of the fields is a very large text, stored mostly for highlighting. To cut down the index size (for purposes of replication/scaling) I thought I could try to save

Re: storing large text fields in a database? (instead of inside index)

2018-02-21 Thread Roman Chyla
Elasticsearch Consulting Support Training - http://sematext.com/ > > > > > On 20 Feb 2018, at 20:39, Roman Chyla <roman.ch...@gmail.com> wrote: > > > > Say there is a high load and I'd like to bring a new machine and let it > > replicate the index, if 10

Re: storing large text fields in a database? (instead of inside index)

2018-02-20 Thread Roman Chyla
east. > > On Tue, Feb 20, 2018 at 10:27 AM, Roman Chyla <roman.ch...@gmail.com> > wrote: > > > Hello, > > > > We have a use case of a very large index (slave-master; for unrelated > > reasons the search cannot work in the cloud mode) - one of the fields is
