Re: Semantic autocomplete with Solr

2012-02-14 Thread Roman Chyla
done something along these lines: https://svnweb.cern.ch/trac/rcarepo/wiki/InspireAutoSuggest#Autosuggestautocompletefunctionality but you would need MontySolr for that - https://github.com/romanchyla/montysolr roman On Tue, Feb 14, 2012 at 11:10 PM, Octavian Covalschi

Re: Regexp and speed

2012-11-30 Thread Roman Chyla
, 1749708, 1744494] On Fri, Nov 30, 2012 at 12:13 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi, Some time ago we did some measurements of the performance of the regexp queries and found that they are VERY FAST! We can't be grateful enough, it saves many days/lives ;) This was an old

Re: Multi word synonyms

2012-11-30 Thread Roman Chyla
Try separating multi word synonyms with a null byte simple\0syrup,sugar\0syrup,stock\0syrup see https://issues.apache.org/jira/browse/LUCENE-4499 for details roman On Sun, Feb 5, 2012 at 10:31 PM, Zac Smith z...@trinkit.com wrote: Thanks for your response. When I don't include the

Re: Can a field with defined synonym be searched without the synonym?

2012-12-12 Thread Roman Chyla
@wunder It is a misconception (well, supported by that wiki description) that the query time synonym filter has these problems. It is actually the default parser that is causing these problems. Look at this if you still think that index time synonyms are a cure for all:

Re: Can a field with defined synonym be searched without the synonym?

2012-12-12 Thread Roman Chyla
, it was TV and television. Documents with TV had higher scores than those with television. wunder On Dec 12, 2012, at 9:45 AM, Roman Chyla wrote: @wunder It is a misconception (well, supported by that wiki description) that the query time synonym filter has these problems. It is actually

Re: MoreLikeThis supporting multiple document IDs as input?

2012-12-26 Thread Roman Chyla
Jay Luker has written MoreLikeThese which is probably what you want. You may give it a try, though I am not sure if it works with Solr4.0 at this point (we didn't port it yet)

Re: Getting Lucense Query from Solr query (Or converting Solr Query to Lucense's query)

2013-01-07 Thread Roman Chyla
if you are inside solr, as it seems to be the case, you can do this: QParserPlugin qplug = req.getCore().getQueryPlugin(LuceneQParserPlugin.NAME); QParser parser = qplug.createParser("PATIENT_GENDER:Male OR STUDY_DIVISION:\"Cancer Center\"", null, req.getParams(), req); Query q = parser.parse();

Re: unittest fail (sometimes) for float field search

2013-01-08 Thread Roman Chyla
apparently, it fails also with @SuppressCodecs("Lucene3x") roman On Tue, Jan 8, 2013 at 6:15 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi, I have a float field 'read_count' - and a unittest like: assertQ(req("q", "read_count:1.0"), "//doc/int[@name='recid'][.='9218920

Re: unittest fail (sometimes) for float field search

2013-01-08 Thread Roman Chyla
The test checks we are properly getting/indexing data - we index database and fetch parts of the documents separately from mongodb. You can look at the file here:

Re: unittest fail (sometimes) for float field search

2013-01-09 Thread Roman Chyla
8, 2013 at 7:34 PM, Roman Chyla roman.ch...@gmail.com wrote: The test checks we are properly getting/indexing data - we index database and fetch parts of the documents separately from mongodb. You can look at the file here: https://github.com/romanchyla/montysolr/blob

Re: Large data importing getting rollback with solr

2013-01-22 Thread Roman Chyla
hi, it is probably correct to revisit your design/requirements, but if you still find you need it, then there may be a different way. DIH is using a writer to commit documents; you can detect errors inside these and try to recover - ie. in some situations, you want to commit, instead of calling

Re: Getting Lucense Query from Solr query (Or converting Solr Query to Lucense's query)

2013-02-04 Thread Roman Chyla
You could use LocalSolrQueryRequest to create the request, but it is not necessary; if all you need is to get the lucene query parser, just do: import org.apache.lucene.queryparser.classic.QueryParser; QueryParser qp = new QueryParser(Version.LUCENE_40, defaultField, new SimpleAnalyzer()); Query q =

Re: Anyone else see this error when running unit tests?

2013-02-04 Thread Roman Chyla
Me too, it fails randomly with test classes. We use Solr4.0 for testing, no maven, only ant. --roman On 4 Feb 2013 20:48, Mike Schultz mike.schu...@gmail.com wrote: Yes. Just today actually. I had some unit test based on AbstractSolrTestCase which worked in 4.0 but in 4.1 they would fail

Re: what do you use for testing relevance?

2013-02-13 Thread Roman Chyla
, -- Steffen On Tuesday, February 12, 2013 at 23:03 , Roman Chyla wrote: Hi, I do realize this is a very broad question, but still I need to ask it. Suppose you make a change into the scoring formula. How do you test/know/see what impact it had? Any framework out

Re: [ANN] vifun: tool to help visually tweak Solr boosting

2013-02-25 Thread Roman Chyla
Oh, wonderful! Thank you :) I was hacking some simple python/R scripts that can do a similar job for qf... the idea was to let the algorithm create possible combinations of params and compare that against the baseline. Would it be possible/easy to instruct the tool to harvest results for

Re: Formal Query Grammar

2013-02-27 Thread Roman Chyla
Or if you prefer EBNF, look here (but it differs slightly from the grammar Jack linked to): https://github.com/romanchyla/montysolr/blob/master/contrib/antlrqueryparser/grammars/StandardLuceneGrammar.g roman On Wed, Feb 27, 2013 at 1:38 PM, Jack Krupansky j...@basetechnology.comwrote: Right

Re: How can I limit my Solr search to an arbitrary set of 100,000 documents?

2013-03-08 Thread Roman Chyla
hi Andy, It seems like a common type of operation and I would be also curious what others think. My take on this is to create a compressed intbitset and send it as a query filter, then have the handler decompress/deserialize it, and use it as a filter query. We have already done experiments with
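The compressed-bitset idea above can be sketched as a round trip in plain JDK Java. This is a toy codec, not the actual MontySolr plugin; the class and method names are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.BitSet;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BitSetFilterCodec {
    // Encode a set of document ids as a gzipped, Base64-encoded bitset string
    // (compact enough to travel as a request parameter or content stream).
    public static String encode(int[] docIds) throws IOException {
        BitSet bits = new BitSet();
        for (int id : docIds) bits.set(id);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(bits.toByteArray());
        }
        return Base64.getEncoder().encodeToString(bos.toByteArray());
    }

    // Decode back into a BitSet that a handler could wrap as a filter query.
    public static BitSet decode(String data) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) out.write(buf, 0, n);
        }
        return BitSet.valueOf(out.toByteArray());
    }
}
```

Dense id sets compress extremely well this way, which is what makes shipping 100,000 ids per request plausible.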

Re: How can I limit my Solr search to an arbitrary set of 100,000 documents?

2013-03-08 Thread Roman Chyla
to be able to do 100,000 random access disk IOs in 2 seconds, let alone process the results. wunder On Mar 8, 2013, at 9:32 AM, Roman Chyla wrote: hi Andy, It seems like a common type of operation and I would be also curious what others think. My take on this is to create a compressed

How to plug a new ANTLR grammar

2011-09-13 Thread Roman Chyla
Hi, The standard lucene/solr parsing is nice but not really flexible. I saw questions and discussion about ANTLR, but unfortunately never a working grammar, so... maybe you find this useful: https://github.com/romanchyla/montysolr/tree/master/src/java/org/apache/lucene/queryParser/iqp/antlr In

Re: How to plug a new ANTLR grammar

2011-09-14 Thread Roman Chyla
Query from the queries produced in the tree parsing. Hope this helps. Peter On Tue, Sep 13, 2011 at 3:16 PM, Jason Toy jason...@gmail.com wrote: I'd love to see the progress on this. On Tue, Sep 13, 2011 at 10:34 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi, The standard lucene

Re: ANTLR SOLR query/filter parser

2011-09-22 Thread Roman Chyla
Hi, I agree that people can register arbitrary qparsers, however the question might have been understood differently - about the ANTLR parser that can handle what solr qparser does (and that one is looking at _query_: and similar stuff -- or at local params, which is what can be copy-pasted into the

Is there anything like MultiSearcher?

2011-02-05 Thread Roman Chyla
Dear Solr experts, Could you recommend some strategies or perhaps tell me if I approach my problem from a wrong side? I was hoping to use MultiSearcher to search across multiple indexes in Solr, but there is no such thing and MultiSearcher was removed according to this post:

Re: Is there anything like MultiSearcher?

2011-02-05 Thread Roman Chyla
be also faster Cheers, roman On Sat, Feb 5, 2011 at 10:02 PM, Bill Bell billnb...@gmail.com wrote: Why not just use sharding across the 2 cores? On 2/5/11 8:49 AM, Roman Chyla roman.ch...@gmail.com wrote: Dear Solr experts, Could you recommend some strategies or perhaps tell me if I approach

multiple localParams for each query clause

2011-03-02 Thread Roman Chyla
Hi, Is it possible to set local arguments for each query clause? example: {!type=x q.field=z}something AND {!type=database}something I am pulling together result sets coming from two sources, Solr index and DB engine - however I realized that local parameters apply only to the whole query -

Re: multiple localParams for each query clause

2011-03-02 Thread Roman Chyla
-in-solr/ On 3/2/2011 10:24 AM, Roman Chyla wrote: Hi, Is it possible to set local arguments for each query clause? example: {!type=x q.field=z}something AND {!type=database}something I am pulling together result sets coming from two sources, Solr index and DB engine - however I realized

Re: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-08 Thread Roman Chyla
Hi, what you want to do is not that difficult, you can use json, eg.:

    try:
        conn = urllib.urlopen(url, params)
        page = conn.read()
        rsp = simplejson.loads(page)
        conn.close()
        return rsp
    except Exception, e:
        log.error(str(e))

Re: 4.0.ALPHA vs 4.0 branch/trunk - what is best for upgrade?

2012-07-15 Thread Roman Chyla
, I'd update what you have to look at /solr/collection1 rather than simply /solr. It's still the default core, so simple URLs without the core name will still work. It won't affect HTTP communication. Just file system location. On Jul 14, 2012, at 9:54 PM, Roman Chyla wrote: Hi

java.lang.AssertionError: System properties invariant violated.

2012-07-17 Thread Roman Chyla
Hello, (Please excuse cross-posting, my problem is with a solr component, but the underlying issue is inside the lucene test-framework) I am porting 3x unittests to the solr/lucene trunk. My unittests are OK and pass, but in the end fail because the new rule checks for modified system properties. I

Re: java.lang.AssertionError: System properties invariant violated.

2012-07-18 Thread Roman Chyla
Thank you! I haven't really understood the LuceneTestCase.classRules before this. roman On Wed, Jul 18, 2012 at 3:11 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I am porting 3x unittests to the solr/lucene trunk. My unittests are : OK and pass, but in the end fail because the new

Re: using Solr to search for names

2012-07-22 Thread Roman Chyla
Or for names that are more involved, you can use special tokenizer/filter chain and index different variants of the name into one index example: https://github.com/romanchyla/montysolr/blob/solr-trunk/contrib/adsabs/src/java/org/apache/lucene/analysis/synonym/AuthorSynonymFilter.java roman On

Re: Batch Search Query

2013-03-28 Thread Roman Chyla
Apologies if you already do something similar, but perhaps of general interest... One (different approach) to your problem is to implement a local fingerprint - if you want to find documents with overlapping segments, this algorithm will dramatically reduce the number of segments you
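One common local fingerprinting scheme of the kind mentioned above is winnowing: hash all k-grams, then keep only the minimum hash of each sliding window. A minimal sketch, with illustrative names and String.hashCode standing in for a proper rolling hash:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class Winnowing {
    // Hashes of all k-character substrings (k-grams) of the text.
    static List<Integer> kgramHashes(String text, int k) {
        List<Integer> hashes = new ArrayList<>();
        for (int i = 0; i + k <= text.length(); i++) {
            hashes.add(text.substring(i, i + k).hashCode());
        }
        return hashes;
    }

    // From each window of w consecutive k-gram hashes keep the minimum.
    // Two documents sharing a substring of length >= w + k - 1 are then
    // guaranteed to share at least one selected fingerprint, while the
    // total number of stored fingerprints shrinks dramatically.
    public static Set<Integer> fingerprints(String text, int k, int w) {
        List<Integer> h = kgramHashes(text, k);
        Set<Integer> selected = new LinkedHashSet<>();
        for (int i = 0; i + w <= h.size(); i++) {
            int min = h.get(i);
            for (int j = i + 1; j < i + w; j++) min = Math.min(min, h.get(j));
            selected.add(min);
        }
        return selected;
    }
}
```

Candidate matches are then found by intersecting fingerprint sets instead of comparing every segment against every segment.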

Re: Batch Search Query

2013-03-28 Thread Roman Chyla
. On Thu, Mar 28, 2013 at 11:16 AM, Roman Chyla roman.ch...@gmail.com wrote: Apologies if you already do something similar, but perhaps of general interest... One (different approach) to your problem is to implement a local fingerprint - if you want to find documents with overlapping

Re: Query Parser OR AND and NOT

2013-04-15 Thread Roman Chyla
should be: -city:H* OR zip:30* On Mon, Apr 15, 2013 at 12:03 PM, Peter Schütt newsgro...@pstt.de wrote: Hello, I do not really understand the query language of the SOLR-Queryparser. I use SOLR 4.2 and I have nearly 20 sample address records in the SOLR-Database. I only use the q

Re: Query Parser OR AND and NOT

2013-04-15 Thread Roman Chyla
http://labs.adsabs.harvard.edu/adsabs/search/?q=%28-abstract%3Ablack%29+AND+abstract%3Ahole*&db_key=ASTRONOMY&sort_type=DATE roman On Mon, Apr 15, 2013 at 12:25 PM, Peter Schütt newsgro...@pstt.de wrote: Hello, Roman Chyla roman.ch...@gmail.com wrote in news:caen8dywjrl

Why filter query doesn't use the same query parser as the main query?

2013-04-16 Thread Roman Chyla
Hi, Is there some profound reason why the defType is not passed onto the filter query? Both query and filterQuery are created inside the QueryComponent, however differently: QParser parser = QParser.getParser(rb.getQueryString(), defType, req); QParser fqp = QParser.getParser(fq, null, req);

Re: Why filter query doesn't use the same query parser as the main query?

2013-04-17 Thread Roman Chyla
, Apr 16, 2013 at 9:44 PM, Roman Chyla roman.ch...@gmail.com wrote: Is there some profound reason why the defType is not passed onto the filter query? defType is a convenience so that the main query parameter q can directly be the user query (without specifying its type like edismax). Filter

Re: List of Solr Query Parsers

2013-05-06 Thread Roman Chyla
Hi Jan, Please add this one http://29min.wordpress.com/category/antlrqueryparser/ - I can't edit the wiki. This parser is written with ANTLR on top of the lucene modern query parser. There is a version which implements the Lucene standard QP as well as a version which includes proximity operators,

Re: List of Solr Query Parsers

2013-05-06 Thread Roman Chyla
syntax is really something I think we should get into the default lucene parser. Can't wait to have a look at your code. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com 6. mai 2013 kl. 15:41 skrev Roman Chyla roman.ch...@gmail.com: Hi Jan, Please add this one http

RE: Solr Cloud with large synonyms.txt

2013-05-07 Thread Roman Chyla
We have synonym files bigger than 5MB, so even with compression that would probably be failing (not using solr cloud yet) Roman On 6 May 2013 23:09, David Parks davidpark...@yahoo.com wrote: Wouldn't it make more sense to only store a pointer to a synonyms file in zookeeper? Maybe just make the

RE: Solr Cloud with large synonyms.txt

2013-05-08 Thread Roman Chyla
David, have you seen the finite state automata the synonym lookup is built on? The lookup is very efficient and fast. You have a point though, it is going to fail for someone. Roman On 8 May 2013 03:11, David Parks davidpark...@yahoo.com wrote: I can see your point, though I think edge cases

Re: Portability of Solr index

2013-05-10 Thread Roman Chyla
Hi Mukesh, This seems like something lucene developers should be aware of - you have probably spent quite some time to find the problem/solution. Could you create a JIRA ticket? Roman On 10 May 2013 03:29, mukesh katariya mukesh.katar...@e-zest.in wrote: There is a problem with Base64 encoding.

Re: List of Solr Query Parsers

2013-05-22 Thread Roman Chyla
. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com 6. mai 2013 kl. 19:58 skrev Roman Chyla roman.ch...@gmail.com: Hi Jan, My login is RomanChyla Thanks, Roman On 6 May 2013 10:00, Jan Høydahl jan@cominvent.com wrote: Hi Roman, This sounds great

Re: Prevention of heavy wildcard queries

2013-05-27 Thread Roman Chyla
You are right that starting to parse the query before the query component can soon get very ugly and complicated. You should take advantage of the flex parser, it is already in lucene contrib - but if you are interested in the better version, look at

Re: Prevention of heavy wildcard queries

2013-05-27 Thread Roman Chyla
overcome this issue? you can also try modifying the standard solr parser, or even the JavaCC generated classes I believe many people do just that (or some sort of preprocessing) roman On Mon, May 27, 2013 at 10:15 PM, Roman Chyla roman.ch...@gmail.com wrote: You are right that starting to parse

Re: Solr/Lucene Analayzer That Writes To File

2013-05-28 Thread Roman Chyla
You can store them and then use different analyzer chains on it (stored, doesn't need to be indexed) I'd probably use the collector pattern se.search(new MatchAllDocsQuery(), new Collector() { private AtomicReader reader; private int i = 0; @Override public boolean

Re: how are you handling killer queries?

2013-06-03 Thread Roman Chyla
I think you should take a look at the TimeLimitingCollector (it is used also inside SolrIndexSearcher). My understanding is that it will stop your server from consuming unnecessary resources. --roman On Mon, Jun 3, 2013 at 4:39 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: How are

Two instances of solr - the same datadir?

2013-06-04 Thread Roman Chyla
Hello, I need your expert advice. I am thinking about running two instances of solr that share the same data directory. The *reason* being: the indexing instance is constantly building cache after every commit (we have a big cache) and this slows it down. But indexing doesn't need much RAM, only the

Re: Two instances of solr - the same datadir?

2013-06-04 Thread Roman Chyla
the index manually: curl 'http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1' But this is not an ideal solution; I'd like for the read-only server to discover index changes on its own. Any pointers? Thanks, roman On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla roman.ch

Re: Two instances of solr - the same datadir?

2013-06-04 Thread Roman Chyla
explicitly for this purpose, including the automatic discovery of changes to the data on the index master. Jason On Jun 4, 2013, at 1:50 PM, Roman Chyla roman.ch...@gmail.com wrote: OK, so I have verified the two instances can run alongside, sharing the same datadir All update handlers

Re: Two instances of solr - the same datadir?

2013-06-05 Thread Roman Chyla
to tell the searcher the index has changed, then call commit when called (more complex coding, but good if the index changes on an ad-hoc basis). Note, doing things this way isn't really suitable for an NRT environment. HTH, Peter On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla roman.ch

Re: Two instances of solr - the same datadir?

2013-06-05 Thread Roman Chyla
, 2013 at 12:07 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Peter, Thank you, I am glad to read that this usecase is not alien. I'd like to make the second instance (searcher) completely read-only, so I have disabled all the components that can write. (being lazy ;)) I'll probably use http

Re: Two instances of solr - the same datadir?

2013-06-07 Thread Roman Chyla
the index. If I'm reading that right, you'd set an autoCommit on 'zero docs changing', or just 'every N seconds'? Did that work? Best of luck! Tim On 5 June 2013 10:19, Roman Chyla roman.ch...@gmail.com wrote: So here it is for a record how I am solving it right now: Write-master

Re: New operator.

2013-06-17 Thread Roman Chyla
Hello Yanis, We are probably using something similar - eg. 'functional operators' - eg. edismax() to treat everything inside the bracket as an argument for edismax, or pos() to search for authors based on their position. And invenio() which is exactly what you describe, to get results from

Re: Avoiding OOM fatal crash

2013-06-17 Thread Roman Chyla
I think you can modify the response writer and stream results instead of building them first and then sending in one go. I am using this technique to dump millions of docs in json format - but in your case you may have to figure out how to dump during streaming if you don't want to save data to

Re: UnInverted multi-valued field

2013-06-19 Thread Roman Chyla
On Wed, Jun 19, 2013 at 5:30 AM, Jochen Lienhard lienh...@ub.uni-freiburg.de wrote: Hi @all. We have the problem that after an update the index takes to much time for 'warm up'. We have some multivalued facet-fields and during the startup solr creates the messages: INFO: UnInverted

Re: cores sharing an instance

2013-06-29 Thread Roman Chyla
Cores can be reloaded, they are inside solrcore loader /I forgot the exact name/, and they will have different classloaders /that's a servlet thing/, so if you want singletons you must load them outside of the core, using a parent classloader - in case of jetty, this means writing your own jetty

Re: cores sharing an instance

2013-07-01 Thread Roman Chyla
other cores then? thank you Roman On Jun 29, 2013, at 10:58 AM, Roman Chyla roman.ch...@gmail.com wrote: Cores can be reloaded, they are inside solrcore loader /I forgot the exact name/, and they will have different classloaders /that's a servlet thing/, so if you want singletons you must

Re: Solr large boolean filter

2013-07-02 Thread Roman Chyla
Hello @, This thread 'kicked' me into finishing some long-past task of sending/receiving a large boolean (bitset) filter. We have been using bitsets with solr before, but now I sat down and wrote it as a qparser. The use cases, as you have discussed, are: - necessity to send a long list of ids as

Re: Solr large boolean filter

2013-07-02 Thread Roman Chyla
Wrong link to the parser, should be: https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/java/org/apache/solr/search/BitSetQParserPlugin.java On Tue, Jul 2, 2013 at 1:25 PM, Roman Chyla roman.ch...@gmail.com wrote: Hello @, This thread 'kicked' me into finishing some long

Re: Two instances of solr - the same datadir?

2013-07-02 Thread Roman Chyla
for the readers and 'native' for the writer, which seems to work OK roman On Fri, Jun 7, 2013 at 9:05 PM, Roman Chyla roman.ch...@gmail.com wrote: I have auto commit after 40k RECs/1800secs. But I only tested with manual commit, but I don't see why it should work differently. Roman On 7 Jun 2013 20

Re: Solr large boolean filter

2013-07-02 Thread Roman Chyla
in solr as a content stream? It makes base64 compression not necessary. AFAIK url length is limited somehow, anyway. On Tue, Jul 2, 2013 at 9:32 PM, Roman Chyla roman.ch...@gmail.com wrote: Wrong link to the parser, should be: https://github.com/romanchyla/montysolr/blob/master/contrib

Re: Two instances of solr - the same datadir?

2013-07-02 Thread Roman Chyla
, and it works fine - no contention. Which version of Solr are you using? Perhaps there's been a change in behaviour? Peter On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla roman.ch...@gmail.com wrote: as i discovered, it is not good to use 'native' locktype in this scenario, actually there is a note

Re: Surround query parser not working?

2013-07-03 Thread Roman Chyla
Hi Niran, all, Please look at JIRA LUCENE-5014. There you will find a Lucene parser that does both analysis and span queries, equivalent to combination of lucene+surround, and much more The ticket needs your review. Roman

Re: What are the options for obtaining IDF at interactive speeds?

2013-07-03 Thread Roman Chyla
Hi Kathryn, I wonder if you could index all your terms as separate documents and then construct a new query (2nd pass) q=term:term1 OR term:term2 OR term:term3 and use func to score them: idf(other_field, field(term)) - the 'term' field cannot be multi-valued, obviously. Other than that, if
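For reference, the value the idf() function query surfaces is Lucene's classic similarity formula. A minimal sketch of the computation itself, outside Solr (illustrative class name):

```java
public class Idf {
    // Classic Lucene (TFIDFSimilarity) inverse document frequency:
    // idf(t) = 1 + ln(numDocs / (docFreq + 1)).
    // The +1 in the denominator avoids division by zero for unseen terms.
    public static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }
}
```

Rarer terms (smaller docFreq) score higher, which is the signal the second-pass query above is after.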

Re: Two instances of solr - the same datadir?

2013-07-04 Thread Roman Chyla
, 2013 at 7:59 PM, Roman Chyla roman.ch...@gmail.com wrote: Interesting, we are running 4.0 - and solr will refuse to start (or reload) the core. But from looking at the code I am not seeing it is doing any writing - but I should dig more... Are you sure it needs to do writing

Re: SOLR 4.0 frequent admin problem

2013-07-04 Thread Roman Chyla
Yes :-) see SOLR-118, seems an old issue... On 4 Jul 2013 06:43, David Quarterman da...@corexe.com wrote: Hi, About once a week the admin system comes up with SolrCore Initialization Failures. There's nothing in the logs and SOLR continues to work in the application it's supporting and in

Re: Sending Documents via SolrServer as MapReduce Jobs at Solrj

2013-07-05 Thread Roman Chyla
I don't want to sound negative, but I think it is a valid question to consider - for the lack of information and a certain mental rigidity may make it sound bad - first of all, it is probably not for a few gigabytes of data and I can imagine that building indexes at the side where the data lives is much

Re: What are the options for obtaining IDF at interactive speeds?

2013-07-08 Thread Roman Chyla
:35 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Kathryn, I wonder if you could index all your terms as separate documents and then construct a new query (2nd pass) q=term:term1 OR term:term2 OR term:term3 and use func to score them *idf(other_field,field(term

Re: joins in solr cloud - good or bad idea?

2013-07-08 Thread Roman Chyla
Hello, The joins are not the only idea, you may want to write your own function (ValueSource) that can implement your logic. However, I think you should not throw away the regex idea (as being slow), before trying it out - because it can be faster than the joins. Your problem is that the number

Re: solr way to exclude terms

2013-07-08 Thread Roman Chyla
One of the approaches is to create, at index time, a new field based on the stopwords (ie. accept only stopwords :)) - ie. if the document contains them, you index 1 - and use q=apple&fq=bad_apple:0 This has many limitations (in terms of flexibility), but it will be superfast roman On Mon, Jul 8, 2013
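An index-time sketch of that flag field (the class, method, and bad_apple field names are illustrative; in Solr this logic would live in an update processor or the indexing client):

```java
import java.util.Set;

public class StopwordFlagger {
    private final Set<String> stopwords;

    public StopwordFlagger(Set<String> stopwords) {
        this.stopwords = stopwords;
    }

    // Value for the hypothetical bad_apple field: 1 if any stopword
    // occurs in the document text, else 0. Queries then filter with
    // fq=bad_apple:0, which is a cheap cached filter.
    public int flag(String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (stopwords.contains(token)) return 1;
        }
        return 0;
    }
}
```

The speed comes from precomputing the membership test once at index time instead of scanning term lists at query time.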

Re: Solr large boolean filter

2013-07-08 Thread Roman Chyla
the feature described at http://www.elasticsearch.org/blog/terms-filter-lookup/ Would be a cool addition, IMHO. Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Jul 2, 2013 at 1:25 PM, Roman Chyla roman.ch...@gmail.com

Re: Best way to call asynchronously - Custom data import handler

2013-07-09 Thread Roman Chyla
Other than using futures and callables? Runnables ;-) Other than that you will need an async request (ie. client). But in case somebody else is looking for an easy recipe for the server-side async: public void handleRequestBody(...) { if (isBusy()) { rsp.add("message", "Batch processing is already
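The server-side busy flag described above can be sketched in plain Java with an AtomicBoolean and a single-threaded executor (illustrative names, no Solr dependencies):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

public class BatchHandler {
    private final AtomicBoolean busy = new AtomicBoolean(false);
    // Daemon worker thread so the JVM can still exit while idle.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // Returns a response message immediately; the batch job runs in the
    // background. compareAndSet makes the busy check race-free.
    public String handleRequest(Runnable batchJob) {
        if (!busy.compareAndSet(false, true)) {
            return "Batch processing is already running";
        }
        executor.submit(() -> {
            try {
                batchJob.run();
            } finally {
                busy.set(false); // always release the flag, even on failure
            }
        });
        return "Batch job accepted";
    }
}
```

The client then polls (or re-submits) until the handler reports it is free again.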

Re: amount of values in a multi value field - is denormalization always the best option?

2013-07-10 Thread Roman Chyla
On Wed, Jul 10, 2013 at 5:37 PM, Marcelo Elias Del Valle mvall...@gmail.com wrote: Hello, I have asked a question recently about solr limitations and some about joins. It comes that this question is about both at the same time. I am trying to figure how to denormalize my data so I

Re: Performance of cross join vs block join

2013-07-12 Thread Roman Chyla
Hi Mikhail, I have commented on your blog, but it seems I have done something wrong, as the comment is not there. Would it be possible to share the test setup (script)? I have found out that the crucial thing with joins is the number of 'joins' [hits returned] and it seems that the experiments I have

Re: ACL implementation: Pseudo-join performance Atomic Updates

2013-07-15 Thread Roman Chyla
On Sun, Jul 14, 2013 at 1:45 PM, Oleg Burlaca oburl...@gmail.com wrote: Hello Erick, Join performance is most sensitive to the number of values in the field being joined on. So if you have lots and lots of distinct values in the corpus, join performance will be affected. Yep, we have a

Re: ACL implementation: Pseudo-join performance Atomic Updates

2013-07-16 Thread Roman Chyla
cool Erick On Mon, Jul 15, 2013 at 6:52 PM, Roman Chyla roman.ch...@gmail.com wrote: On Sun, Jul 14, 2013 at 1:45 PM, Oleg Burlaca oburl...@gmail.com wrote: Hello Erick, Join performance is most sensitive to the number of values in the field being joined on. So

Re: Range query on a substring.

2013-07-16 Thread Roman Chyla
Well, I think this is slightly too categorical - a range query on a substring can be thought of as a simple range query. So, for example the following query: lucene 1* becomes behind the scenes: lucene (10|11|12|13|14|1abcd) the issue there is that it is a string range, but it is a range query
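The equivalence hinted at above - a prefix query is just a string range - can be illustrated with a hypothetical helper (real Lucene uses PrefixQuery/TermRangeQuery rather than anything like this):

```java
public class PrefixRange {
    // A prefix query p* matches exactly the string range [p, q), where q is p
    // with its last character incremented. (Assumes the last character is not
    // already the maximum code unit \uFFFF.)
    public static boolean inPrefixRange(String term, String prefix) {
        String upper = prefix.substring(0, prefix.length() - 1)
                + (char) (prefix.charAt(prefix.length() - 1) + 1);
        return term.compareTo(prefix) >= 0 && term.compareTo(upper) < 0;
    }
}
```

So "1*" and the string range ["1", "2") select the same terms - which is why both expand to the same term enumeration behind the scenes.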

Re: Range query on a substring.

2013-07-16 Thread Roman Chyla
if it was useful for me. Thanks. Kind regards. On 16 July 2013 20:07, Roman Chyla roman.ch...@gmail.com wrote: Well, I think this is slightly too categorical - a range query on a substring can be thought of as a simple range query. So, for example the following query: lucene 1

Re: Searching w/explicit Multi-Word Synonym Expansion

2013-07-17 Thread Roman Chyla
Hi all, What I find very 'sad' is that Lucene/SOLR contain all the necessary components for handling multi-token synonyms; the Finite State Automaton works perfectly for matching these items; the biggest problem is IMO the old query parser, which splits things on spaces and doesn't know to be

Re: Searching w/explicit Multi-Word Synonym Expansion

2013-07-17 Thread Roman Chyla
the synonym phrase problem. Yes, progress is being made, but we're not there yet. -- Jack Krupansky -Original Message- From: Roman Chyla Sent: Wednesday, July 17, 2013 9:58 AM To: solr-user@lucene.apache.org Subject: Re: Searching w/explicit Multi-Word Synonym Expansion Hi all, What I

Re: Searching w/explicit Multi-Word Synonym Expansion

2013-07-17 Thread Roman Chyla
-the-shelf, today, no patches required.) -- Jack Krupansky -Original Message- From: Roman Chyla Sent: Wednesday, July 17, 2013 11:44 AM To: solr-user@lucene.apache.org Subject: Re: Searching w/explicit Multi-Word Synonym Expansion OK, let's do a simple test instead of making claims - take

Re: Searching w/explicit Multi-Word Synonym Expansion

2013-07-17 Thread Roman Chyla
Hi Dave, On Wed, Jul 17, 2013 at 2:03 PM, dmarini david.marini...@gmail.com wrote: Roman, As a developer, I understand where you are coming from. My issue is that I specialize in .NET, haven't done java dev in over 10 years. As an organization we're new to solr (coming from endeca) and

Re: ACL implementation: Pseudo-join performance Atomic Updates

2013-07-17 Thread Roman Chyla
a JIRA, there's no harm in it. Best Erick On Tue, Jul 16, 2013 at 1:32 PM, Roman Chyla roman.ch...@gmail.com wrote: Erick, I wasn't sure this issue is important, so I wanted first solicit some feedback. You and Otis expressed interest, and I could create the JIRA

Re: Getting a large number of documents by id

2013-07-18 Thread Roman Chyla
Look at speed of reading the data - likely, it takes long time to assemble a big response, especially if there are many long fields - you may want to try SSD disks, if you have that option. Also, to gain better understanding: Start your solr, start jvisualvm and attach to your running solr. Start

Re: short-circuit OR operator in lucene/solr

2013-07-22 Thread Roman Chyla
Deepak, I think your goal is to gain something in speed, but most likely the function query will be slower than the query without score computation (the filter query) - this stems from the fact how the query is executed, but I may, of course, be wrong. Would you mind sharing measurements you

Re: Performance of cross join vs block join

2013-07-22 Thread Roman Chyla
take some time. But I guess I should measure it. I haven't made notes so now I am having a hard time backtracking :) roman It seems to me cross segment join works well. On Mon, Jul 22, 2013 at 3:08 AM, Roman Chyla roman.ch...@gmail.comwrote: ah, in case you know the solution, here ant output

Re: Processing a lot of results in Solr

2013-07-23 Thread Roman Chyla
Hello Matt, You can consider writing a batch processing handler, which receives a query and instead of sending results back, it writes them into a file which is then available for streaming (it has its own UUID). I am dumping many GBs of data from solr in few minutes - your query + streaming

Re: Processing a lot of results in Solr

2013-07-24 Thread Roman Chyla
that streaming writer works? What does it stream docList or docSet? Thanks On Wed, Jul 24, 2013 at 5:57 AM, Roman Chyla roman.ch...@gmail.com wrote: Hello Matt, You can consider writing a batch processing handler, which receives a query and instead of sending results back, it writes

Re: Processing a lot of results in Solr

2013-07-24 Thread Roman Chyla
acceptable (~ within minutes) ? Thanks, Matt On 7/23/13 6:57 PM, Roman Chyla roman.ch...@gmail.com wrote: Hello Matt, You can consider writing a batch processing handler, which receives a query and instead of sending results back, it writes them into a file which is then available

Re: How to debug an OutOfMemoryError?

2013-07-24 Thread Roman Chyla
_One_ idea would be to configure your Java to dump the heap on the OOM error - you can then load the dump into some analyzers, e.g. Eclipse, and that may give you the desired answers (I unfortunately don't remember off the top of my head how to activate the dump, but Google will give you the answer)
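The dump activation the post couldn't recall is a pair of standard HotSpot options, not anything Solr-specific. A sketch, assuming the stock Solr 4.x Jetty launcher (`start.jar`) and a made-up dump path:

```shell
# Write a heap dump (.hprof) when the JVM hits OutOfMemoryError, then
# open it in Eclipse MAT or jvisualvm to see what filled the heap.
java -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/var/tmp/solr-oom.hprof \
     -jar start.jar
```

Make sure the dump path has enough free space: the file can be as large as the heap itself.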

Re: Document Similarity Algorithm at Solr/Lucene

2013-07-24 Thread Roman Chyla
This paper contains an excellent algorithm for plagiarism detection, but beware: the published version had a mistake in the algorithm - look for corrections - I can't find them now, but I know they have been published (perhaps by one of the co-authors). You could do it with solr, to create an index
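As a rough illustration of the index-based approach (this is a deliberately simplified sketch, not the paper's algorithm): index word shingles per document and flag pairs whose shingle overlap is suspiciously high.

```python
# Toy shingle-overlap similarity for plagiarism screening.
# Real systems index the shingles (e.g. in Solr/Lucene) instead of
# comparing documents pairwise in memory.

def shingles(text, n=3):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(a, b, n=3):
    # Fraction of the smaller document's shingles shared with the other.
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / min(len(sa), len(sb))

src = "the quick brown fox jumps over the lazy dog"
copy = "indeed the quick brown fox jumps over a lazy dog"
print(round(overlap(src, copy), 2))  # → 0.57
```

In a Solr setup the shingles would come from a `ShingleFilter` in the analysis chain, so candidate matches are retrieved by querying the index rather than scanning all documents.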

Re: Using Solr to search between two Strings without using index

2013-07-25 Thread Roman Chyla
Hi, I think you are pushing it too far - there is no 'string search' without an index. And besides, these things are just better done by a few lines of code - and if your array is too big, then you should create the index... roman On Thu, Jul 25, 2013 at 9:06 AM, Rohit Kumar
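The "few lines of code" alternative the post alludes to is just a linear scan over the in-memory array; names below (`search`, the sample titles) are illustrative, not from the thread.

```python
# Plain substring scan over an in-memory list of strings - no index.
# Past a certain array size, building a real index pays off instead.

def search(haystack, needle):
    needle = needle.lower()
    return [s for s in haystack if needle in s.lower()]

docs = ["Solr in Action", "Lucene Revolution", "Introduction to IR"]
print(search(docs, "lucene"))  # → ['Lucene Revolution']
```

This is O(total text length) per query, which is exactly why "string search without an index" stops scaling.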

Re: processing documents in solr

2013-07-27 Thread Roman Chyla
Dear list, I've written a special processor exactly for this kind of operation https://github.com/romanchyla/montysolr/tree/master/contrib/adsabs/src/java/org/apache/solr/handler/batch This is how we use it http://labs.adsabs.harvard.edu/trac/ads-invenio/wiki/SearchEngineBatch It is capable of

Re: paging vs streaming. spawn from (Processing a lot of results in Solr)

2013-07-27 Thread Roman Chyla
Mikhail, If your solution gives lazy loading of solr docs /and thus streaming of huge result lists/ it should be big YES! Roman On 27 Jul 2013 07:55, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Otis, You gave links to 'deep paging' when I asked about response streaming. Let me

Re: paging vs streaming. spawn from (Processing a lot of results in Solr)

2013-07-27 Thread Roman Chyla
/m-khl/solr-patches/compare/streaming#L2R57 hence, no facets with streaming, yet as well as memory consumption. This test shows how it works https://github.com/m-khl/solr-patches/compare/streaming#L15R115 all other code purposed for distributed search. On Sat, Jul 27, 2013 at 4:44 PM, Roman

Re: processing documents in solr

2013-07-27 Thread Roman Chyla
On Sat, Jul 27, 2013 at 4:17 PM, Shawn Heisey s...@elyograg.org wrote: On 7/27/2013 11:38 AM, Joe Zhang wrote: I have a constantly growing index, so not updating the index can't be practical... Going back to the beginning of this thread: when we use the vanilla *:*+pagination approach,

Re: Solr-4663 - Alternatives to use same data dir in different cores for optimal cache performance

2013-07-28 Thread Roman Chyla
Hi, Yes, it can be done, if you search the mailing list for 'two solr instances same datadir', you will find a post where I am describing our setup - it works well even with automated deployments. How do you measure performance? I am asking because one reason for us having the same setup is sharing the

Measuring SOLR performance

2013-07-30 Thread Roman Chyla
Hello, I have been wanting some tools for measuring the performance of SOLR, similar to Mike McCandless's Lucene benchmark. So yet another monitor was born; it is described here: http://29min.wordpress.com/2013/07/31/measuring-solr-query-performance/ I tested it on the problem of garbage collectors (see
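A tiny timing harness in the spirit of solrjmeter, shown as a hedged sketch: `run_query` below is a stub standing in for the HTTP round trip to `/select`, and the statistics collected (mean/max wall-clock latency) are the simplest slice of what the real tool reports.

```python
# Minimal client-side query-latency measurement. Replace run_query's
# body with a real HTTP request to measure an actual Solr instance.

import time
import statistics

def run_query(q):
    time.sleep(0.001)  # stand-in for the network + query round trip
    return {"numFound": 0}

def measure(queries, repeats=3):
    samples = []
    for q in queries:
        for _ in range(repeats):
            t0 = time.perf_counter()
            run_query(q)
            samples.append((time.perf_counter() - t0) * 1000.0)
    return {"mean_ms": statistics.mean(samples), "max_ms": max(samples)}

print(measure(["*:*", "title:solr"]))
```

Client-side timing like this includes network and serialization cost, which is often the point: it measures what users actually experience, not just Solr's internal QTime.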

Re: Measuring SOLR performance

2013-07-31 Thread Roman Chyla
solrjmeter.py, line 66, in error traceback.print_stack() Cannot contact: http://localhost:8983/solr complains about URL, clicking which leads properly to the admin page... solr 4.3.1, 2 cores shard Dmitry On Wed, Jul 31, 2013 at 3:59 AM, Roman Chyla roman.ch...@gmail.comwrote: Hello, I have

Re: Measuring SOLR performance

2013-07-31 Thread Roman Chyla
of default G1 as 'bad', and that these G1 parameters, even if they don't seem G1 specific, have real effect. Thanks, roman On Tue, Jul 30, 2013 at 11:01 PM, Shawn Heisey s...@elyograg.org wrote: On 7/30/2013 6:59 PM, Roman Chyla wrote: I have been wanting some tools for measuring performance
