WordDelimiterFilter preserveOriginal position increment

2012-10-23 Thread Jay Luker
Hi, I'm having an issue with the WDF preserveOriginal=1 setting and the matching of a phrase query. Here's an example of the text that is being indexed: ...obtained with the Southern African Large Telescope,SALT... A lot of our text is extracted from PDFs, so this kind of formatting junk is

Re: WordDelimiterFilter preserveOriginal position increment

2012-10-23 Thread Jay Luker
to not be a problem in 4.x. Thanks, --jay On Tue, Oct 23, 2012 at 10:45 AM, Shawn Heisey s...@elyograg.org wrote: On 10/23/2012 8:16 AM, Jay Luker wrote: From looking at the analysis debugger I can see that the WDF is getting the term Telescope,SALT and correctly splitting on the comma

Re: NumericRangeQuery: what am I doing wrong?

2011-12-15 Thread Jay Luker
On Wed, Dec 14, 2011 at 5:02 PM, Chris Hostetter hossman_luc...@fucit.org wrote: I'm a little lost in this thread ... if you are programmatically constructing a NumericRangeQuery object to execute in the JVM against a Solr index, that suggests you are writing some sort of Solr plugin (or

NumericRangeQuery: what am I doing wrong?

2011-12-14 Thread Jay Luker
I can't get NumericRangeQuery or TermQuery to work on my integer id field. I feel like I must be missing something obvious. I have a test index that has only two documents, id:9076628 and id:8003001. The id field is defined like so: <field name="id" type="tint" indexed="true" stored="true" required="true"

Re: NumericRangeQuery: what am I doing wrong?

2011-12-14 Thread Jay Luker
On Wed, Dec 14, 2011 at 2:04 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, seems like it should work, but there are two things you might try: 1) just execute the query in Solr: id:[1 TO 100]. Does that work? Yep, that works fine. 2) I'm really grasping at straws here, but it's

Re: RegexQuery performance

2011-12-12 Thread Jay Luker
On Sat, Dec 10, 2011 at 9:25 PM, Erick Erickson erickerick...@gmail.com wrote: My off-the-top-of-my-head notion is you implement a Filter whose job is to emit some special tokens when you find strings like this that allow you to search without regexes. For instance, in the example you give,

Re: RegexQuery performance

2011-12-10 Thread Jay Luker
appreciated. Thanks! --jay In other words, this could be an XY problem. Best, Erick On Thu, Dec 8, 2011 at 11:14 AM, Robert Muir rcm...@gmail.com wrote: On Thu, Dec 8, 2011 at 11:01 AM, Jay Luker lb...@reallywow.com wrote: Hi, I am trying to provide a means to search our corpus

RegexQuery performance

2011-12-08 Thread Jay Luker
Hi, I am trying to provide a means to search our corpus of nearly 2 million fulltext astronomy and physics articles using regular expressions. A small percentage of our users need to be able to locate, for example, certain types of identifiers that are present within the fulltext (grant numbers,

Re: PatternTokenizer failure

2011-11-30 Thread Jay Luker
On Tue, Nov 29, 2011 at 9:37 AM, Michael Kuhlmann k...@solarier.de wrote: Jay, I think the problem is this: You're checking whether the character preceding the array of at least one whitespace is not a hyphen. However, when you've more than one whitespace, like this: foo- \n bar then
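The multiple-whitespace case can be reproduced outside Solr with a small sketch. Python's `re` module stands in for the Java regex engine here, and the pattern `(?<!-)\s+` is only an assumption about what the original tokenizer pattern looked like; the `split_strict` variant is one possible adjustment, not something proposed in the thread.

```python
import re

# Assumed pattern (not taken from the thread): one or more whitespace
# characters whose first character is not preceded by a hyphen.
NAIVE = re.compile(r"(?<!-)\s+")

# Hypothetical adjustment: the match may not start after a hyphen OR after
# another whitespace character, so it cannot begin in the middle of a run.
STRICT = re.compile(r"(?<![-\s])\s+")


def split_naive(text):
    """Split with the assumed original pattern."""
    return NAIVE.split(text)


def split_strict(text):
    """Split with the adjusted pattern."""
    return STRICT.split(text)


if __name__ == "__main__":
    sample = "foo- \n bar"
    # The first space follows the hyphen, so the lookbehind rejects it, but
    # the regex engine simply retries at the next whitespace character.
    print(split_naive(sample))
    print(split_strict(sample))
```

With the assumed pattern, the engine skips the space directly after the hyphen and then matches starting at the newline, so the hyphenated text is still split; the adjusted lookbehind refuses to start a match anywhere inside a whitespace run that follows a hyphen.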

Re: InvalidTokenOffsetsException when using MappingCharFilterFactory, DictionaryCompoundWordTokenFilterFactory and Highlighting

2011-11-30 Thread Jay Luker
I am having a similar issue with OffsetExceptions during highlighting. In all of the explanations and bug reports I'm reading there is a mention that this is all the result of a problem with HTMLStripCharFilter. But my analysis chains don't (that I'm aware of) make use of HTMLStripCharFilter, so can

PatternTokenizer failure

2011-11-28 Thread Jay Luker
Hi all, I'm trying to use PatternTokenizer and not getting expected results. Not sure where the failure lies. What I'm trying to do is split my input on whitespace except in cases where the whitespace is preceded by a hyphen character. So to do this I'm using a negative lookbehind assertion in

Re: Document has fields with different update frequencies: how best to model

2011-06-11 Thread Jay Luker
. It does not seem external file field is the use case for this. On 10 June 2011 20:13, Jay Luker lb...@reallywow.com wrote: Take a look at ExternalFileField [1]. It's meant for exactly what you want to do here. FYI, there is an issue with caching of the external values introduced in v1.4

Re: Document has fields with different update frequencies: how best to model

2011-06-10 Thread Jay Luker
Take a look at ExternalFileField [1]. It's meant for exactly what you want to do here. FYI, there is an issue with caching of the external values introduced in v1.4 but, thankfully, resolved in v3.2 [2] --jay [1] http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html

Re: Solr performance

2011-05-11 Thread Jay Luker
On Wed, May 11, 2011 at 7:07 AM, javaxmlsoapdev vika...@yahoo.com wrote: I have some 25 odd fields with stored=true in schema.xml. Retrieving 5,000 records takes a few secs. I also tried passing fl and only including one field in the response, but the response time is still the same. What are

Re: Text Only Extraction Using Solr and Tika

2011-05-05 Thread Jay Luker
Hi Emyr, You could try using the extractOnly=true parameter [1]. Of course, you'll need to repost the extracted text manually. --jay [1] http://wiki.apache.org/solr/ExtractingRequestHandler#Extract_Only On Thu, May 5, 2011 at 9:36 AM, Emyr James emyr.ja...@sussex.ac.uk wrote: Hi All, I

tika/pdfbox knobs & levers

2011-04-13 Thread Jay Luker
Hi all, I'm wondering if there are any knobs or levers I can set in solrconfig.xml that affect how pdfbox text extraction is performed by the extraction handler. I would like to take advantage of pdfbox's ability to normalize diacritics and ligatures [1], but that doesn't seem to be the default

Re: UIMA example setup w/o OpenCalais

2011-04-08 Thread Jay Luker
so by simply removing the OpenCalaisAnnotator from the execution pipeline, commenting out line 124 of the file: solr/contrib/uima/src/main/resources/org/apache/uima/desc/OverridingParamsExtServicesAE.xml Hope this helps, Tommaso 2011/4/7 Jay Luker lb...@reallywow.com Hi, I'd like

UIMA example setup w/o OpenCalais

2011-04-07 Thread Jay Luker
Hi, I'd like to experiment with the UIMA contrib package, but I have issues with the OpenCalais service's ToS and would rather not use it. Is there a way to adapt the UIMA example setup to use only the AlchemyAPI service? I tried simply leaving out the OpenCalais api key but I get

Re: Highlight snippets for a set of known documents

2011-04-01 Thread Jay Luker
=foobar&fq={!q.op=OR}(id:1 id:5 id:11) Regards Stefan On Thu, Mar 31, 2011 at 6:40 PM, Jay Luker lb...@reallywow.com wrote: Hi all, I'm trying to get highlight snippets for a set of known documents and I must be doing something wrong because it's only sort of working. Say my query is foobar

Help with parsing configuration using SolrParams/NamedList

2011-02-16 Thread Jay Luker
Hi, I'm trying to use a CustomSimilarityFactory and pass in per-field options from the schema.xml, like so: <similarity class="org.ads.solr.CustomSimilarityFactory"> <lst name="field_a"> <int name="min">500</int> <int name="max">1</int> <float name="steepness">0.5</float> </lst> <lst

Re: Sending binary data as part of a query

2011-02-01 Thread Jay Luker
On Mon, Jan 31, 2011 at 9:22 PM, Chris Hostetter hossman_luc...@fucit.org wrote: that class should probably have been named ContentStreamUpdateHandlerBase or something like that -- it tries to encapsulate the logic that most RequestHandlers using ContentStreams (for updating) need to worry

Sending binary data as part of a query

2011-01-28 Thread Jay Luker
Hi all, Here is what I am interested in doing: I would like to send a compressed integer bitset as a query to solr. The bitset integers represent my document ids and the result I want to get back is the facet data for those documents. I have successfully created a QueryComponent class that,
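As a rough illustration of the client side of this idea, here is one way a set of document ids could be packed into a compressed bitset and turned into a request-safe string. The message does not say which bitset layout or compression scheme was used, so everything in this sketch (the function names, zlib, URL-safe base64, one bit per document id) is an assumption for illustration only.

```python
import base64
import zlib


def encode_doc_ids(doc_ids):
    """Pack non-negative integer ids into a bitset, compress it, and
    return a URL-safe ASCII string (hypothetical wire format)."""
    ids = list(doc_ids)
    if any(d < 0 for d in ids):
        raise ValueError("document ids must be non-negative")
    # One bit per possible id, up to the largest id present.
    bits = bytearray(max(ids) // 8 + 1) if ids else bytearray()
    for d in ids:
        bits[d >> 3] |= 1 << (d & 7)
    return base64.urlsafe_b64encode(zlib.compress(bytes(bits))).decode("ascii")


def decode_doc_ids(payload):
    """Inverse of encode_doc_ids: recover the sorted list of ids."""
    bits = zlib.decompress(base64.urlsafe_b64decode(payload))
    return [i * 8 + b for i, byte in enumerate(bits) for b in range(8) if (byte >> b) & 1]


if __name__ == "__main__":
    payload = encode_doc_ids([1, 5, 11])
    print(payload)
    print(decode_doc_ids(payload))
```

A server-side component would need the matching decode step in Java; the sketch only shows that the round trip is lossless and that sparse bitsets compress well.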

Re: Using jetty's GzipFilter in the example solr.war

2010-11-15 Thread Jay Luker
On Sun, Nov 14, 2010 at 12:49 AM, Kiwi de coder kiwio...@gmail.com wrote: Try putting your filter at the top of web.xml (instead of the middle or bottom). I tried this a few days ago and it is just a simple solution (not sure whether the spec says to put it on top or whether it is a bug). Thank you. An explanation of why this worked is

Using jetty's GzipFilter in the example solr.war

2010-11-13 Thread Jay Luker
Hi, I thought I'd try turning on gzip compression but I can't seem to get jetty's GzipFilter to actually compress my responses. I unpacked the example solr.war and tried adding variations of the following to the web.xml (and then rejar-ed), but as far as I can tell, jetty isn't actually

Re: documentCache clarification

2010-10-29 Thread Jay Luker
On Thu, Oct 28, 2010 at 7:27 PM, Chris Hostetter hossman_luc...@fucit.org wrote: The queryResultCache is keyed on Query,Sort,Start,Rows,Filters and the value is a DocList object ... http://lucene.apache.org/solr/api/org/apache/solr/search/DocList.html Unlike the Document objects in the

Re: documentCache clarification

2010-10-28 Thread Jay Luker
On Wed, Oct 27, 2010 at 9:13 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : schema.) My evidence for this is the documentCache stats reported by : solr/admin. If I request rows=10&fl=id followed by : rows=10&fl=id,title I would expect to see the 2nd request result in : a 2nd insert to

documentCache clarification

2010-10-27 Thread Jay Luker
Hi all, The solr wiki says this about the documentCache: The more fields you store in your documents, the higher the memory usage of this cache will be. OK, but if I have enableLazyFieldLoading set to true and in my request parameters specify fl=id, then the number of fields per document

Re: documentCache clarification

2010-10-27 Thread Jay Luker
On Wednesday 27 October 2010 16:39:44 Jay Luker wrote: Hi all, The solr wiki says this about the documentCache: The more fields you store in your documents, the higher the memory usage of this cache will be. OK, but if I have enableLazyFieldLoading set to true and in my request parameters

Re: Autocommit not happening

2010-07-23 Thread Jay Luker
For the sake of any future googlers I'll report my own clueless but thankfully brief struggle with autocommit. There are two parts to the story: Part One is where I realize my autoCommit config was not contained within my updateHandler. In Part Two I realized I had typed autocommit rather than