Well, the source code is all there, if you need to know _exactly_. Run it
under a debugger. Run it under paid IntelliJ with Chronon if you will be doing
it a lot.
Same with the Solr Admin UI: just open a developer console in the browser and
you have every web call documented right when you want it.
Roxana,
You've been asked a couple of times by several people to explain your
business needs (at a level higher than Solr itself). As it is, you are
slowly getting deeper and deeper into Solr's internals, where there
might be an easier answer if we knew what you are trying to achieve.
It is a very general question. So, the general answer is yes.
To get a sample of what's possible, I recommend you check out the Solr
Revolution presentations from this year and the presentations and videos
from last year. There were at least a couple that you may find interesting.
Definitely 5.x. Lots of new goodies. It is true that some of the
startup scripts are different and the example schemas could be
slightly confusing if you are following a book, but I think it is well worth
starting off on the right foot. Just remember: no "collection1" anymore; all
cores/collections are explicit.
Begging at the Dev list is probably more efficient, though I am sure
most of them are hanging around here as well.
Regards,
Alex.
P.s. Sorry, I wish I could help. Not a committer.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
> Hi Alex,
>
> What's the title of your book? An amazon link would be useful too.
>
> Thanks!
> Rob
>
You need to tell the second call which documents to update. Are you doing
that?
There may also be a wrinkle in the URP order, but let's get the first step
working first.
On 22 Oct 2015 12:59 pm, "Roxana Danger"
wrote:
> yes, it's working now... but I can not use
I don't think DIH supports siblings. Have you thought of using an XSLT
processor before sending the XML to Solr? Or using it instead of DIH
during the update (not a well-known part of Solr):
You are doing things out of order. It's DIH, URP, then indexer. Any
attempt to subvert that order for the record being indexed will end in
problems.
Have you considered doing a dual path? Index, then update. Of course,
your fields all need to be stored for that.
Also, perhaps you need to rethink
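The "index, then update" second step could use Solr's atomic update syntax. A sketch (the field names id, price, and tags are made up for illustration; atomic updates require all fields to be stored):

```json
[
  {"id": "doc1",
   "price": {"set": 99},
   "tags": {"add": "sale"}}
]
```

This is POSTed to /update with Content-Type: application/json.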
When you run a full-import, Solr will try to delete old documents
before importing the new ones. If there are several top-level entities,
they step on each other's feet.
Use preImportDeleteQuery to avoid that.
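A hedged sketch of what that looks like in a DIH config (entity name, query, and the doc_type field are invented; the delete query should match only the documents that entity owns):

```xml
<document>
  <entity name="jobs" preImportDeleteQuery="doc_type:job"
          query="SELECT id, title FROM jobs">
    <field column="id" name="id"/>
    <field column="title" name="title"/>
  </entity>
</document>
```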
olr/reed_jobs/update/details?commit=true
> but it returns immediately with status 0 but does not execute the update...
> How should the update be called for reindex/update all the imported docs.
> with my chain?
>
>
> Best regards,
> Roxana
>
>
> On 22 October 2015 at 14:14, A
On 20 October 2015 at 10:26, Lee Carroll wrote:
> B*ll*cks, before posting I spent an hour searching for issues, honest.
> Soon as I post within seconds I find
>
> https://issues.apache.org/jira/browse/SOLR-5800
We are always glad to be of help. Including by
Sounds like a mission impossible given the number of inner joins.
However, what are you _actually_ trying to do? Are you trying to
reindex the data? Do you actually have the data to reindex?
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
This sounds like an attempt to create an auto-complete using n-grams
in text. In which case, Ted Sullivan's writing might be of relevance:
http://lucidworks.com/blog/author/tedsullivan/
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
It went very well. Lots of interesting talks.
I believe you (Mr. Bell) were even mentioned by Ted Sullivan for
voting for his Jira proposal on the AutophrasingFilter. The talk was
extremely interesting and I intend to follow up on it. :-)
The slides are starting to come up already. Mine are at:
I suspect these questions should go the Lucene Dev list instead. This
one is more for those who build on top of standard Solr.
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
On 16 October 2015 at 12:07, Ryan Josal
Could you use the new nested facets syntax? http://yonik.com/solr-subfacets/
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
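For illustration, a sub-facet request in the JSON Facet API (field names genre and author are made up) might look like:

```json
{
  "genres": {
    "type": "terms",
    "field": "genre",
    "facet": {
      "top_authors": {"type": "terms", "field": "author", "limit": 2}
    }
  }
}
```

This would be sent as the json.facet request parameter.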
On 11 October 2015 at 09:51, Peter Sturge wrote:
> Been trying to coerce Group
What about Streaming Expressions? Could they be used here? Disclaimer:
I have not used them myself yet.
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
On
I don't think that particular functionality is anything directly to do
with Solr.
You will have a server component that will index the web page (I am
guessing) into Solr. That same component can generate the preview image.
Your frontend UI will get the URL/id from Solr and display the related
image.
Solr
Hi Mark,
Have you gone through a Solr tutorial yet? If/when you do, you will
see you don't need to code any of this. It is configured as part of
the web-facing total offering, which is tweaked by XML configuration
files (or REST API calls). And most of the standard pipelines are
already
>
> At this point, I can't even figure out how to narrow down my confusion so
> that I can post concise questions to the group. But I'll get there
> eventually, starting with removing the wordbreak checker for the time-being.
> Your response was encouraging, at least.
>
> Mark
Have you tried just having two separate endpoints each with its own
definition of DIH and URP? Then, you just hit those end-points one at
a time in whatever order you need.
Seems easier than a custom switching logic.
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a
file?
> Thank you very much,
> Roxana
>
>
>
> On 30 September 2015 at 14:48, Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
>
>> Have you tried just having two separate endpoints each with its own
>> definition of DIH and URP? Then, you just hit those end-
Mark,
Thank you for your valuable feedback. The newbie's views are always appreciated.
The Admin UI command is designed for creating a collection based on
a configuration you already have. Obviously, it makes that point
somewhat less than obvious.
To create a new collection with
I think (I lost the library link) you would need to build a bridge by
doing a custom Analyzer or Tokenizer and then using the library under
the covers. Would be a nice contribution to open-source if you managed
to achieve that.
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and
How about you do indexing on a completely different node and then swap
the index into production using Solr aggregate aliases?
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CreateormodifyanAliasforaCollection
The problem here is that deleting existing content is
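As a sketch (collection and alias names are invented), the swap itself is one Collections API call:

```
http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=production&collections=build_20151023
```

Re-running CREATEALIAS with the same alias name points it at the new collection.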
But they would still compete for the servlet engine's threads. Putting
them on different ports will not change anything. Now, if you wanted
to put them on different network interfaces, that could be something.
But I do not think it is possible, as the select and update are both
just configuration
You may find the following articles interesting:
http://discovery-grindstone.blogspot.ca/2014/01/searching-in-solr-analyzing-results-and.html
( a whole epic journey)
https://dzone.com/articles/indexing-chinese-solr
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a
Sanity check: did you restart Solr or reload the core after you
updated your schema definition? In the Admin UI's Schema
Browser, you should be able to see all the fields you defined. Are
those fields there?
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a
You could probably do this as a custom UpdateRequestProcessor
that would take your submitted document, run a query, and expand it into
a bunch of documents. So, do the ID mapping internally. But you would
need the IDs/uniqueKeys.
Definitely nothing out of the box, that I can think of.
ces of those terms for any leaflet
> document.
> Could you give me a clue about how is the best way to perform it?
> Perhaps, the best way is (as Walter suggests) to do all the queries every
> time, as needed.
> Regards,
>
> Francisco
>
> El jue., 10 de sept. de 2015 a la(s)
Can you tell us a bit more about the business case? Not the current
technical one. Because it is entirely possible Solr can solve the
higher-level problem out of the box without you doing manual term
comparisons. In which case, your problem scope is not quite right.
Regards,
Alex.
Solr
A sanity check question: was this test done with a completely new
index after you enabled docValues? Not just "delete all" but actually
deleting the index directory and rebuilding from scratch? If it still
happens after such a thorough cleanup, it might be a bug.
Regards,
Alex.
Solr Analyzers,
Could you make a small index from scratch using a subset of data and
see if the problem happens anyway? If yes, you have a test case. If
no, you may need to do a full rebuild to be fully assured.
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
Both versions seem to be painful in that they will retrieve the URL content
multiple times. The first version is definitely wrong. The second version
is probably wrong because both inner and outer entities have the same
name. I would try giving a different name to the inner entity and seeing if
What about DIH's own XSL pre-processor? It is XSL param on
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-TheXPathEntityProcessor
No other ideas, unfortunately, I don't
You can define any number of handler end-point definitions.
Also, you can pass the update chain name as part of the URL
parameters. So, it can be different for each call if you want.
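For example (core and chain names are made up):

```
http://localhost:8983/solr/mycore/update?update.chain=dedupe&commit=true
```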
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
Yes, please:
http://www.amazon.com/Solr-Troubleshooting-Maintenance-Alexandre-Rafalovitch/dp/1491920149/
:-)
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
On 4 September 2015 at 10:30, Yonik Seeley <ysee...@gmail.
Yonik,
Is this all visible on query debug level? Would it be effective to ask
to run both queries with debug enabled and to share the expanded query
value? Would that show up the differences between Lucene
implementations you described?
(Looking for troubleshooting tips to reuse).
Regards,
So, basically, for each car you want to generate a query with the same
parameters (e.g. make) and then say where in the results for that
query your particular car would be. Right?
I think the only way is to run the query and see where the car is
in the results. So, custom code of some sort.
That's a good point. What is the query sorting on?
Shayan, can you give an example of a query with sorting/etc shown.
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
On 3 September 2015 at 16:24, Chris Hostetter
Put the IgnoreCommit on the default handler to stop clients from
forcing the commit:
http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/IgnoreCommitOptimizeUpdateProcessorFactory.html
Then have a separate normal handler and send your real commits through
that if you
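A sketch of such a chain (a statusCode of 200 makes Solr silently acknowledge, rather than reject, client-issued commits):

```xml
<updateRequestProcessorChain name="ignore-client-commits" default="true">
  <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
    <int name="statusCode">200</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```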
An fq has to calculate the result bit set for every document to be able
to cache it. A q will only calculate it for the documents it matches on,
and there is some intersection hopping going on.
Are you seeing this performance hit on the first query only or on every
one? I would expect on the first query only.
ntly indexed as
> fieldType=text_general.
>
>
>
> <processor class="solr.processor.SignatureUpdateProcessorFactory">
>   <bool name="enabled">true</bool>
>   <str name="signatureField">content</str>
>   <bool name="overwriteDupes">false</bool>
>   <str name="fields">content</str>
>   <str name="signatureClass">solr.processor.Lookup3Signature</str>
> </processor>
> Regards,
> Edwin
>
>
> On 3 September 2015 at 09:46, Alexandre Rafalovitch <arafa...@gmail.com>
And that's because you have an incomplete chain. If you look at the
full example in solrconfig.xml, it shows:

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Notice the last two processors.
Have you looked at the Admin Web UI in detail yet? On the
"Overview" page, the right-hand side lists a bunch of
directories. You want the one that says "Instance". Then your
solrconfig.xml is in the "conf" directory under that.
Regards,
Alex.
P.s. Welcome!
Solr Analyzers,
http://blog.griddynamics.com/2015/08/scoring-join-party-in-solr-53.html
shows how to keep updates in a separate core. Notice that it is an
intermediate-level article for query syntax.
For Persian text analysis, there is a pre-built analyzer definition in
the techproducts example; start from that.
On 1 September 2015 at 08:29, Mikhail Khludnev
wrote:
> Last check to check, make sure that you don't have deleted document in the
> index for a while. You can check in at SolrAdmin.
What's the significance of that particular advice? Is something in the
join including
gave
>> > >> but put results into a separate string field. Then, you group on that
>> > >> field. You cannot actually group on the long text field, that would
>> > >> kill any performance. So a signature is your proxy.
>> > >>
>>
ch.
> and i want some filter for persian.
> that pre-built text_fa doesn't satisfied me.have you better perisan filter
> than that?or a soulotion to have this filter in persian?
> tnx.
>
> On Tue, Sep 1, 2015 at 5:21 AM, Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
On 1 September 2015 at 09:10, Mikhail Khludnev
wrote:
>> Not many
>> people know about it, may help to disambiguate the syntax.
>>
> Oh. C'mon! it's announced for ages http://yonik.com/solr/query-syntax/
Not everybody reads and keeps track of every feature of Solr.
Is this for multi-datacenter? If so, you may want to review Apple's
presentation at the last Solr Revolution:
https://www.youtube.com/watch?v=_Erkln5WWLw&index=2&list=PLU6n9Voqu_1FM8nmVwiWWDRtsEjlPqhgP
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
search or other functions like
> highlighting?
>
> Yes, the content must be in my index, unless I do a copyField to do
> de-duplication on that field.. Will that help?
>
> Regards,
> Edwin
>
>
> On 1 September 2015 at 10:04, Alexandre Rafalovitch <arafa...@gmail.com&
Can't you just treat it as String?
Also, do you actually want those documents in your index in the first
place? If not, have you looked at De-duplication:
https://cwiki.apache.org/confluence/display/solr/De-Duplication
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a
If you use DataImportHandler, you can combine LineEntityProcessor with
RegexTransformer to split each line into a bunch of fields.
Erik's version might be better with tabs, though, to avoid CSV's
requirements on escaping commas, quotes, etc. And maybe trim those
fields a bit, either in awk or in a URP inside Solr.
But it would definitely work.
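A sketch of such an entity (the file path, regex, and field names are invented; LineEntityProcessor exposes each line as rawLine, which RegexTransformer then splits into the named groups):

```xml
<entity name="lines" processor="LineEntityProcessor"
        url="file:///path/to/data.tsv"
        transformer="RegexTransformer">
  <field column="rawLine" regex="^(.*?)\t(.*?)\t(.*)$"
         groupNames="id,name,description"/>
</entity>
```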
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
This is both a very specific and a very general question at the same time.
The way indexing and search are both done is via analyzer chains, as
defined in your schema. So, you need to check what the definition is
for the field you search and then play with that.
There is the Analysis screen in the Web
Have you seen:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201212.mbox/%3c1354991310424-4025359.p...@n3.nabble.com%3E
https://wiki.apache.org/solr/SpatialForTimeDurations
https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/
Regards,
Alex.
Solr
The standard answer is that exposing the API is a REALLY bad idea. To
start with, you can issue delete commands through the API, and
they can be escaped in multiple different ways.
Plus, you have the Admin UI there as well to manipulate the cores as well
as to see the configuration files for
Thanks for the email from the future. It is good to start to prepare
for 5.3.1 now that 5.3 is nearly out.
Joking aside (and assuming Solr 5.2.1), what exactly are you trying to
achieve? Solr should not actually be exposed to the users directly. It
should be hiding in a backend only visible to
These look like requirements for a generic Solr search, maybe with
focus on proximity and/or phrase matching. Perhaps some white-listing
filter if you have a fixed set of words you care about. E.g. with
KeepWordFilter in the analyzer chain.
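A minimal fieldType sketch (the file name keepwords.txt is an assumption; it would hold the fixed word list, one per line):

```xml
<fieldType name="text_keywords" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```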
It should work (at first glance); copyField does support wildcards.
Do you have a field called text? Also, your field name and your field
type are both named text. Not sure that is the best idea.
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
Are you by any chance doing stored=true on the fields you want to search?
If so, you may want to switch to just indexed=true. Of course, they will
then not come back in the results, but do you really want to sling
huge content fields around?
The other option is to do lazy loading (enableLazyFieldLoading) and not
Sorry Venkat, this is pushing beyond my immediate knowledge. You'd
just need to experiment.
But the document still looks a bit wrong; specifically, I don't
understand where those extra 366 values are coming from. It should
be just two-dimensional coordinates, the first one for the start of the
range,
I can't find the discussion/presentation about it (about 2 years ago),
but basically you can use a LatLon geographic field to do this.
You represent the start date/time on the X axis and the end date/time on
the Y axis. Then, for search, you intersect it with a rectangle of your
desired check dates.
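The intersection logic itself is easy to sanity-check outside Solr. A minimal Python sketch of the mapping (the epoch date is an assumption for illustration; in Solr the point would go into a spatial field and the rectangle into a geo query):

```python
from datetime import date

EPOCH = date(2015, 1, 1)  # arbitrary reference date for the mapping

def to_point(start, end):
    """Map a date range to an (x, y) point: x = start offset, y = end offset."""
    return ((start - EPOCH).days, (end - EPOCH).days)

def overlaps(point, query_start, query_end):
    """A stored range overlaps the query window iff its point falls inside
    the rectangle x <= query_end and y >= query_start."""
    qs = (query_start - EPOCH).days
    qe = (query_end - EPOCH).days
    x, y = point
    return x <= qe and y >= qs

# A booking from Jan 10 to Jan 20, 2015:
p = to_point(date(2015, 1, 10), date(2015, 1, 20))
print(overlaps(p, date(2015, 1, 15), date(2015, 1, 25)))  # True: windows overlap
print(overlaps(p, date(2015, 2, 1), date(2015, 2, 5)))    # False: disjoint
```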
Hopefully
If you can use + and -, please do so. That's what Lucene uses
under the covers (MUST, SHOULD, MUST_NOT); anything else is mapped to
that.
You can also enable the debug flag on your queries and see exactly how
the other forms (e.g. AND) are mapped to the underlying Lucene
queries.
Regards,
On 21 August 2015 at 15:32, vaedama sudheer.u...@gmail.com wrote:
presentDays: [ [01 15 366 366], [13, 16, 366, 366], [19, 25, 366, 366] ]
This does not look right. Your January 1 2015 should map to a single
number, representing 'X' in the coordinates. Your January 15 2015
should map to another
A transformer on the outer entity will run before the inner entity is
invoked. So, you might be able to remove the list of files to ignore
before the inner entity starts extracting from them.
You could also pre-generate a list of files by doing ls/find with your
requirements and then just read
These look right.
Then, you just play around with mapping. Your dates to coordinates
could be as granular as you want as long as they fit into data type.
And with this being school, your epochs might be smaller (e.g.
semesters) and kept as a separate number.
Regards,
Alex.
Solr
If this is for a quick test, have you tried just faceting on that
field with the document ID set through the query? Faceting returns the
indexed/tokenized items.
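For example (core, field, and document ID are invented):

```
http://localhost:8983/solr/mycore/select?q=id:doc1&rows=0&facet=true&facet.field=myfield&facet.limit=-1
```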
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
On 20 August 2015 at
Reload will pick up the new schema definitions. But all the indexed
content will stay as-is and will probably start causing problems if
you changed analyzer definitions significantly.
You will probably have to reindex from scratch or from an external source.
Sorry.
Solr Analyzers, Tokenizers, Filters, URPs
I am not sure I understand the problem statement. Is it speed? Memory
usage? Something very specific about SolrCloud?
To me it seems the problem is that your fq's _are_ getting cached when
you may not want them to be, as the list is different every time. You
could disable that cache.
Or you could try
Have you tried this with cache=false?
https://wiki.apache.org/solr/CommonQueryParameters#Caching_of_filters
Because the internal representation of the field value may already be
doing what you want, and the caching of non-repeating filters is what is
slowing it down.
I would just do that as a
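Disabling caching for an individual filter is done with a local parameter (field and value are invented):

```
fq={!cache=false}category:books
```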
This is beyond my direct area of expertise, but one way to look at
this would be:
1) Create new collections offline. Down to each of the 6000 clients
having its own private collection (embedded SolrJ/server). Or some
sort of mini-hubs, e.g. a server per N clients.
2) Bring those collections into
on values in other fields. And then just order by
it.
Is that right?
On Fri, Aug 14, 2015 at 10:58 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:
Clarification: In the client that is doing the _indexing_/sending data
to Solr. Not the one doing the querying.
And custom URP if you can't change
From the "teach a man to fish" category of advice (since I don't know the
actual answer):
Did you try the Analysis screen in the Admin UI? If you check the Verbose
output mark, you will see all the offsets and can easily confirm the
detailed behavior for yourself.
Regards,
Alex.
Solr Analyzers,
I would not be surprised if the default value is assigned AFTER all the
copyField work is done. That would make a lot more sense.
So, you may want to try setting that default value earlier in the
indexing process, specifically by creating a custom
UpdateRequestProcessor chain and using the DefaultValue URP:
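A sketch of such a chain (the field name status and the value unknown are placeholders):

```xml
<updateRequestProcessorChain name="add-defaults">
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">status</str>
    <str name="value">unknown</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```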
What's the search string? Or is the search string irrelevant and
that's just your compulsory ordering.
Assuming anything that searches has to be returned and has to fit into
that order, I would frankly just map your special codes all together
to some sort of 'sort order' number.
So, Code=C =
what you are saying about mapping Code to numbers. But can you
help with some examples of actual solr queries on how to do this?
Thanks
On Fri, Aug 14, 2015 at 2:46 PM, Alexandre Rafalovitch arafa...@gmail.com
wrote:
What's the search string? Or is the search string irrelevant and
that's
, URPs and even a newsletter:
http://www.solr-start.com/
On 14 August 2015 at 23:57, Alexandre Rafalovitch arafa...@gmail.com wrote:
My suggestion was to do the mapping in the client, before you hit
Solr. Or in a custom UpdateRequestProcessor. Because only your client
app knows the order you
On 13 August 2015 at 12:19, Scott Derrick sc...@tnstaafl.net wrote:
If i specify a search q=foo bar , Is there a way to set a default field if
a field is not given?
You want the 'df' parameter, unless I misunderstood the question. If you are
using the default query parser (i.e. not eDisMax, etc.),
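For example (the field name title is made up):

```
q=foo bar&df=title
```

Terms without an explicit field: prefix are then searched against title.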
Correct. In fact, faceting pulls its values normally from the indexed
terms anyway. It completely ignores stored.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
On 13 August 2015 at 19:49, Nagasharath sharathrayap...@gmail.com wrote:
If I just
Did you look at release notes for Solr versions after your own?
I am pretty sure some similar things were identified and/or resolved
for 5.x. It may not help if you cannot migrate, but would at least
give a confirmation and maybe workaround on what you are facing.
Regards,
Alex.
Solr
Setup new core instance directory:
/var/solr/data/demo
...
Failed to create core 'demo' due to: Error CREATEing SolrCore 'demo': Unable
to create core [demo] Caused by: /var/solr/data/demo/data
Was one of these entries typed by hand? Because I see 'data/demo' and
'demo/data'. Which does not
(shooting in the dark)
What does your data directory look like? File sizes, etc. And which
operating system? 4GB is where the Windows FAT filesystem has a size
limit, but it really should not be that.
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
I thought the Embedded server was good for a scenario where you wanted
quickly to build a core with lots of documents locally. And then, move
the core into production and swap it in. So you minimize the network
traffic.
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a
I wonder if that's also something that could be resolved by having a
custom network-level handler, at a pure Java level.
I seem to vaguely recall it was possible.
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
On 5 August 2015
What do you get at just http://localhost:8080/ ?
My guess would be that you may have already had something else on that
port and your Solr instance did not actually start.
If in doubt, I would test that by bringing your Solr instance down and
trying to revisit the URL. You should get a generic
Did you re-index and commit completely after the definition switch?
Looks like internal representation conflict.
Regards,
Alex.
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/
On 4 August 2015 at 11:31, wwang525 wwang...@gmail.com wrote:
Hi
Are you watching the lucene-dev list? The discussion is happening there.
In short, the preparations have started, but there are things to
clean up and no RCs have been out yet. I don't think even a branch has
been cut yet.
So, a while to go still.
Solr Analyzers, Tokenizers, Filters, URPs and
That's still a VERY open question. The answer is yes, but the details
depend on the shape and source of your data, and the searches you are
anticipating.
Is this a lot of entries with a small number of fields, or a
relatively small number of entries with huge field counts? Do you
need to
Seems simple enough that the source answers all the questions:
https://github.com/apache/lucene-solr/blob/lucene_solr_4_9/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/EnglishPossessiveFilter.java#L66
It just looks for a couple of versions of apostrophe followed by s or S.
Just to reconfirm, are you indexing file content? Because if you are,
you need to be aware that most PDFs do not extract well, as they do
not have the text flow preserved.
If you are indexing PDF files, I would run a sample through Tika
directly (that's what Solr uses under the covers anyway) and
Well,
if it is just file names, I'd probably use the SolrJ client, maybe with
Java 8: read the file names, split each name into parts with regular
expressions, stuff the parts into different field names, and send to Solr.
Java 8 has FileSystem walkers, etc., to make it easier.
You could do it with DIH, but it
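The suggestion above is SolrJ, but the splitting step is the same in any language. A sketch in Python (the naming convention artist_album_track.mp3 and all field names are invented for illustration):

```python
import re
from pathlib import Path

# Hypothetical file naming convention: <artist>_<album>_<track>.mp3
NAME_RE = re.compile(r"^(?P<artist>[^_]+)_(?P<album>[^_]+)_(?P<track>\d+)\.mp3$")

def to_solr_doc(filename):
    """Split one file name into a dict of Solr fields, or None if it does not match."""
    m = NAME_RE.match(filename)
    if not m:
        return None
    doc = m.groupdict()
    doc["id"] = filename  # use the file name itself as the uniqueKey
    return doc

# Walk a directory and build the documents to send to Solr
docs = [d for d in (to_solr_doc(p.name) for p in Path(".").glob("*.mp3")) if d]
print(to_solr_doc("beatles_abbeyroad_01.mp3"))
# {'artist': 'beatles', 'album': 'abbeyroad', 'track': '01', 'id': 'beatles_abbeyroad_01.mp3'}
```

The resulting dicts would then be POSTed to Solr as JSON.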
They are not that predictable. Somebody has to volunteer to be a
release manager and then there is a flurry of cleanups, release
candidates, etc.
You can see all that on the Lucene-Dev mailing list. For example, a
5.3 has been proposed (as an idea) on July 30th. But not much happened
since. But
Have you tried copyField with different field type for different
fields yet? That would be my first step. Make the copied field
indexed-only, not stored for efficiency.
And you can then either search against that copied field directly or
use eDisMax against both fields and give that field a
So, what you want is to duplicate a specific token, rename one of the
copies, and inject it with the same offset as the original. So GATE =>
gate, _gate, but gate => gate.
That, to me, is a custom token filter. You can probably use
KeywordRepeatFilterFactory as a base:
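For reference, the stock filter this builds on duplicates each token at the same position so a later stage can alter one copy; a sketch of a typical chain (the stemmer choice is just an example):

```xml
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.KeywordRepeatFilterFactory"/>
  <filter class="solr.PorterStemFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```

A custom filter would do the same duplication internally but rewrite one copy (e.g. prefix it with _) instead of relying on a stemmer.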
Thank you for the update.
The MS Word format changed significantly from .doc to .docx, so it has a
different parser, I suspect. I would not be surprised if the old
binary-format parser missed something exotic in the documents
(e.g. the content of text boxes or frames).
Regards,
Alex.
Solr