Hi,
I'm having an issue with the WDF preserveOriginal=1 setting and the
matching of a phrase query. Here's an example of the text that is
being indexed:
...obtained with the Southern African Large Telescope,SALT...
A lot of our text is extracted from PDFs, so this kind of formatting
junk is
to not be a problem in 4.x.
Thanks,
--jay
On Tue, Oct 23, 2012 at 10:45 AM, Shawn Heisey s...@elyograg.org wrote:
On 10/23/2012 8:16 AM, Jay Luker wrote:
From looking at the analysis debugger I can see that the WDF is
getting the term Telescope,SALT and correctly splitting on the
comma
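A toy sketch of the behaviour being described, for illustration only: it is not Lucene's WordDelimiterFilter, and it ignores case-change splitting, catenation options, and token positions. The function name and its parameter are hypothetical. It only shows that with preserveOriginal enabled, the unsplit term is kept alongside the parts produced by splitting on the comma.

```python
import re

def split_with_preserve_original(token, preserve_original=True):
    """Toy approximation: split a token on runs of non-alphanumeric characters.

    Hypothetical helper, not part of Solr or Lucene.
    """
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    # Keep the original token as well, unless splitting changed nothing.
    if preserve_original and parts != [token]:
        return [token] + parts
    return parts

print(split_with_preserve_original("Telescope,SALT"))
# ['Telescope,SALT', 'Telescope', 'SALT']
print(split_with_preserve_original("Telescope,SALT", preserve_original=False))
# ['Telescope', 'SALT']
```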
On Wed, Dec 14, 2011 at 5:02 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
I'm a little lost in this thread ... if you are programmatically constructing
a NumericRangeQuery object to execute in the JVM against a Solr index,
that suggests you are writing some sort of Solr plugin (or
I can't get NumericRangeQuery or TermQuery to work on my integer id
field. I feel like I must be missing something obvious.
I have a test index that has only two documents, id:9076628 and
id:8003001. The id field is defined like so:
<field name="id" type="tint" indexed="true" stored="true" required="true" />
On Wed, Dec 14, 2011 at 2:04 PM, Erick Erickson erickerick...@gmail.com wrote:
Hmmm, seems like it should work, but there are two things you might try:
1) Just execute the query in Solr: id:[1 TO 100]. Does that work?
Yep, that works fine.
2) I'm really grasping at straws here, but it's
On Sat, Dec 10, 2011 at 9:25 PM, Erick Erickson erickerick...@gmail.com wrote:
My off-the-top-of-my-head notion is you implement a
Filter whose job is to emit some special tokens when
you find strings like this that allow you to search without
regexes. For instance, in the example you give,
appreciated.
Thanks!
--jay
In other words, this could be an XY problem
Best
Erick
On Thu, Dec 8, 2011 at 11:14 AM, Robert Muir rcm...@gmail.com wrote:
On Thu, Dec 8, 2011 at 11:01 AM, Jay Luker lb...@reallywow.com wrote:
Hi,
I am trying to provide a means to search our corpus
Hi,
I am trying to provide a means to search our corpus of nearly 2
million fulltext astronomy and physics articles using regular
expressions. A small percentage of our users need to be able to
locate, for example, certain types of identifiers that are present
within the fulltext (grant numbers,
On Tue, Nov 29, 2011 at 9:37 AM, Michael Kuhlmann k...@solarier.de wrote:
Jay,
I think the problem is this:
You're checking whether the character preceding the array of at least one
whitespace is not a hyphen.
However, when you've more than one whitespace, like this:
foo- \n bar
then
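The multi-whitespace case above can be reproduced outside Solr. This is a minimal sketch that assumes the pattern in question was something like (?<!-)\s+, since the original pattern is not shown here. The second pattern is one possible adjustment, not necessarily the fix proposed in the thread. Python's re module is used for brevity; Java's java.util.regex treats these lookbehinds the same way.

```python
import re

# Assumed original: whitespace run not preceded by a hyphen.
NAIVE = re.compile(r"(?<!-)\s+")
# Possible adjustment: the run must not be preceded by a hyphen OR by
# whitespace, so a match can only begin at the start of a whitespace run.
STRICT = re.compile(r"(?<![-\s])\s+")

text = "foo- \n bar"

# The first space follows the hyphen, so no match starts there, but the
# newline follows a space, so the naive pattern matches from the newline on.
print(NAIVE.split(text))           # ['foo- ', 'bar']
print(STRICT.split(text))          # ['foo- \n bar']
print(STRICT.split("foo  bar baz"))  # ['foo', 'bar', 'baz']
```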
I am having a similar issue with OffsetExceptions during highlighting.
In all of the explanations and bug reports I'm reading there is a
mention that this is all the result of a problem with HTMLStripCharFilter.
But my analysis chains don't (that I'm aware of) make use of
HTMLStripCharFilter, so can
Hi all,
I'm trying to use PatternTokenizer and not getting expected results.
Not sure where the failure lies. What I'm trying to do is split my
input on whitespace except in cases where the whitespace is preceded
by a hyphen character. So to do this I'm using a negative look behind
assertion in
. It does not
seem external file field is the use case for this.
On 10 June 2011 20:13, Jay Luker lb...@reallywow.com wrote:
Take a look at ExternalFileField [1]. It's meant for exactly what you
want to do here.
FYI, there is an issue with caching of the external values introduced
in v1.4
Take a look at ExternalFileField [1]. It's meant for exactly what you
want to do here.
FYI, there is an issue with caching of the external values introduced
in v1.4 but, thankfully, resolved in v3.2 [2]
--jay
[1]
http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html
On Wed, May 11, 2011 at 7:07 AM, javaxmlsoapdev vika...@yahoo.com wrote:
I have some 25-odd fields with stored=true in schema.xml. Retrieving
5,000 records takes a few secs. I also tried passing fl and only
including one field in the response, but the response time is still the same. What are
Hi Emyr,
You could try using the extractOnly=true parameter [1]. Of course,
you'll need to repost the extracted text manually.
--jay
[1] http://wiki.apache.org/solr/ExtractingRequestHandler#Extract_Only
On Thu, May 5, 2011 at 9:36 AM, Emyr James emyr.ja...@sussex.ac.uk wrote:
Hi All,
I
Hi all,
I'm wondering if there are any knobs or levers I can set in
solrconfig.xml that affect how pdfbox text extraction is performed by
the extraction handler. I would like to take advantage of pdfbox's
ability to normalize diacritics and ligatures [1], but that doesn't
seem to be the default
so by simply removing the OpenCalaisAnnotator from
the execution pipeline, commenting out line 124 of the file:
solr/contrib/uima/src/main/resources/org/apache/uima/desc/OverridingParamsExtServicesAE.xml
Hope this helps,
Tommaso
2011/4/7 Jay Luker lb...@reallywow.com
Hi,
I'd like
Hi,
I'd like to experiment with the UIMA contrib package, but I have
issues with the OpenCalais service's ToS and would rather not use it.
Is there a way to adapt the UIMA example setup to use only the
AlchemyAPI service? I tried simply leaving out the OpenCalais API key
but I get
=foobar&fq={!q.op=OR}(id:1 id:5 id:11)
Regards
Stefan
On Thu, Mar 31, 2011 at 6:40 PM, Jay Luker lb...@reallywow.com wrote:
Hi all,
I'm trying to get highlight snippets for a set of known documents and
I must be doing something wrong because it's only sort of working.
Say my query is foobar
Hi,
I'm trying to use a CustomSimilarityFactory and pass in per-field
options from the schema.xml, like so:
<similarity class="org.ads.solr.CustomSimilarityFactory">
  <lst name="field_a">
    <int name="min">500</int>
    <int name="max">1</int>
    <float name="steepness">0.5</float>
  </lst>
lst
On Mon, Jan 31, 2011 at 9:22 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
that class should probably have been named ContentStreamUpdateHandlerBase
or something like that -- it tries to encapsulate the logic that most
RequestHandlers using ContentStreams (for updating) need to worry
Hi all,
Here is what I am interested in doing: I would like to send a
compressed integer bitset as a query to Solr. The bitset integers
represent my document ids, and the result I want to get back is the
facet data for those documents.
I have successfully created a QueryComponent class that,
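For anyone picturing the client side of this idea, here is a rough sketch. The encoding (one bit per document id, zlib, URL-safe base64) and both function names are assumptions made for illustration. The format actually used in this thread is not shown, and the server-side QueryComponent is not sketched.

```python
import base64
import zlib

def encode_doc_ids(doc_ids):
    """Pack document ids into a bitset, compress it, and make it URL-safe.

    Hypothetical wire format: bit (id % 8) of byte (id // 8) marks a document.
    """
    size = max(doc_ids) + 1 if doc_ids else 0
    bits = bytearray((size + 7) // 8)
    for doc_id in doc_ids:
        bits[doc_id // 8] |= 1 << (doc_id % 8)
    return base64.urlsafe_b64encode(zlib.compress(bytes(bits))).decode("ascii")

def decode_doc_ids(payload):
    """Inverse of encode_doc_ids: recover the sorted list of document ids."""
    bits = zlib.decompress(base64.urlsafe_b64decode(payload))
    return [i * 8 + j for i, byte in enumerate(bits) for j in range(8) if byte >> j & 1]

payload = encode_doc_ids([1, 5, 11])
print(decode_doc_ids(payload))  # [1, 5, 11]
```

A sparse set of very large ids compresses well under this scheme because the bitset is mostly zero bytes, but a dense set gains little from zlib.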
On Sun, Nov 14, 2010 at 12:49 AM, Kiwi de coder kiwio...@gmail.com wrote:
Try putting your filter at the top of web.xml (instead of the middle or bottom). I tried
this for a few days and it is just a simple solution (not sure whether the spec says to put it
on top or whether it is a bug)
Thank you.
An explanation of why this worked is
Hi,
I thought I'd try turning on gzip compression but I can't seem to get
jetty's GzipFilter to actually compress my responses. I unpacked the
example solr.war and tried adding variations of the following to the
web.xml (and then rejar-ed), but as far as I can tell, jetty isn't
actually
On Thu, Oct 28, 2010 at 7:27 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
The queryResultCache is keyed on Query,Sort,Start,Rows,Filters and the
value is a DocList object ...
http://lucene.apache.org/solr/api/org/apache/solr/search/DocList.html
Unlike the Document objects in the
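To make the description above concrete, here is a small sketch that models the cache as a plain dictionary. It is a simplified picture, not Solr's implementation: CacheKey and make_key are hypothetical names, and the cached value is reduced to a list of document ids standing in for a DocList.

```python
from collections import namedtuple

# Hypothetical key type mirroring the description: Query, Sort, Start, Rows, Filters.
CacheKey = namedtuple("CacheKey", "query sort start rows filters")

def make_key(query, sort, start, rows, filters):
    # Filters are stored as a tuple so the whole key is hashable.
    return CacheKey(query, sort, start, rows, tuple(filters))

query_result_cache = {}

# The cached value stands in for a DocList: document ids, not stored fields.
key = make_key("title:salt", "score desc", 0, 10, ["type:article"])
query_result_cache[key] = [9076628, 8003001]

# The same query, sort, window, and filters produce an equal key, so this is a hit.
same = make_key("title:salt", "score desc", 0, 10, ["type:article"])
print(query_result_cache.get(same))   # [9076628, 8003001]

# Changing rows changes the key, so this is a miss.
other = make_key("title:salt", "score desc", 0, 20, ["type:article"])
print(query_result_cache.get(other))  # None
```

Note that the field list (fl) does not appear anywhere in this key as described.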
On Wed, Oct 27, 2010 at 9:13 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
: schema.) My evidence for this is the documentCache stats reported by
: solr/admin. If I request rows=10&fl=id followed by
: rows=10&fl=id,title I would expect to see the 2nd request result in
: a 2nd insert to
Hi all,
The solr wiki says this about the documentCache: The more fields you
store in your documents, the higher the memory usage of this cache
will be.
OK, but if I have enableLazyFieldLoading set to true and in my request
parameters specify fl=id, then the number of fields per document
On Wednesday 27 October 2010 16:39:44 Jay Luker wrote:
Hi all,
The solr wiki says this about the documentCache: The more fields you
store in your documents, the higher the memory usage of this cache
will be.
OK, but if i have enableLazyFieldLoading set to true and in my request
parameters
For the sake of any future googlers I'll report my own clueless but
thankfully brief struggle with autocommit.
There are two parts to the story: Part One is where I realize my
autoCommit config was not contained within my updateHandler. In
Part Two I realized I had typed autocommit rather than
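Both mistakes are easy to check for mechanically. The sketch below is a hypothetical helper, not a Solr tool: it parses a solrconfig.xml string and reports any autocommit-like element that is misspelled or that sits outside updateHandler. The sample config strings are made up for the example.

```python
import xml.etree.ElementTree as ET

def find_autocommit_problems(solrconfig_xml):
    """Return a list of human-readable problems with autoCommit placement or spelling."""
    root = ET.fromstring(solrconfig_xml)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    problems = []
    for elem in root.iter():
        if elem.tag.lower() != "autocommit":
            continue
        if elem.tag != "autoCommit":
            problems.append(f"<{elem.tag}> should be spelled <autoCommit>")
        parent = parent_of.get(elem)
        if parent is None or parent.tag != "updateHandler":
            problems.append(f"<{elem.tag}> is not inside <updateHandler>")
    return problems

BAD = """<config>
  <autocommit><maxDocs>1000</maxDocs></autocommit>
  <updateHandler class="solr.DirectUpdateHandler2"/>
</config>"""

GOOD = """<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit><maxDocs>1000</maxDocs></autoCommit>
  </updateHandler>
</config>"""

print(find_autocommit_problems(BAD))
print(find_autocommit_problems(GOOD))  # []
```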