I've got a very difficult project to tackle. I've been tasked with using
schemaless mode to index json files that we receive. The structure of the
json files will always be very different as we're receiving files from
different customers totally unrelated to one another. We are attempting to
build
Bumping this.
I'm seeing the error mentioned earlier in the thread - Unable to download
segment filename completely. Downloaded 0!=size often in my logs. I'm
dealing with a situation where maxDoc count is growing at a faster rate
than numDocs and is now almost twice as large. I'm not optimizing
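The maxDoc/numDocs gap described above is deleted (or re-indexed) documents that haven't been merged away yet; a tiny sketch of the arithmetic (class and method names are illustrative, not Solr API):

```java
// maxDoc counts deleted-but-unmerged docs too, so the difference between
// maxDoc and numDocs is the number of deletes awaiting a merge/optimize.
public class IndexStats {
    public static int deletedDocs(int maxDoc, int numDocs) {
        return maxDoc - numDocs;
    }
    // Fraction of the index occupied by deleted documents.
    public static double deletedRatio(int maxDoc, int numDocs) {
        return maxDoc == 0 ? 0.0 : (double) deletedDocs(maxDoc, numDocs) / maxDoc;
    }
}
```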
I'm writing a custom update request handler that will poll a hot
directory for Solr xml files and index anything it finds there. The custom
class implements Runnable, and when the run method is called the loop
starts to do the polling. How can I tell Solr to load this class on startup
to fire off
<listener event="newSearcher" class="com.bestbuy.search.foundation.solr.DynamicIndexerEventListener" />
Then in the newSearcher() method I start up the thread for my polling
UpdateRequestHandler.
This seems to work, but if anyone has a better (or more tested) approach
please let us know.
-Jay
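A minimal sketch of the hot-directory polling approach described above, using only the JDK (the class name, poll interval, and file glob are hypothetical, not part of Solr):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Hypothetical sketch: poll a "hot" directory for Solr XML files.
public class HotDirPoller implements Runnable {
    private final Path dir;
    private volatile boolean running = true;

    public HotDirPoller(Path dir) { this.dir = dir; }

    // One polling pass: return the .xml files currently in the directory.
    public List<Path> pollOnce() {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, "*.xml")) {
            for (Path p : ds) found.add(p);
        } catch (IOException e) {
            // directory may briefly be unavailable; treat as "nothing found"
        }
        return found;
    }

    @Override
    public void run() {
        while (running) {
            for (Path p : pollOnce()) {
                // index the file here, then move or delete it so it
                // isn't picked up again on the next pass
            }
            try { Thread.sleep(1000); } catch (InterruptedException e) { running = false; }
        }
    }

    public void stop() { running = false; }
}
```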
On Mon, Jul 9, 2012 at 2:33 PM, Jay Hill jayallenh
On Fri, Feb 24, 2012 at 3:31 PM, Jay Hill jayallenh...@gmail.com wrote:
I have a situation where I want to show the term counts as is done in the
TermsComponent, but *only* for terms that are *matched* in a query, so I
get something returned like this (pseudo code):
q=title:(golf swing)
doc
title: golf legends show how to improve your golf swing on the golf course
I have a project where we need to search 1B docs and still return results in
under 700ms. The problem is, we are using geofiltering and that is happening *
before* the queries, so we have to geofilter on the 1B docs to restrict our
set of docs first, and then do the query on a name field. But it seems that
I'm on a project where we have 1B docs sharded across 20 servers. We're not
in production yet and we're doing load tests now. We're sending load to hit
100qps per server. As the load increases we're seeing query times
sporadically increasing to 10 seconds, 20 seconds, etc. at times. What
we're
We're on the trunk:
4.0-2011-10-26_08-46-59 1189079 - hudson - 2011-10-26 08:51:47
Client timeouts are set to 4 seconds.
Thanks,
-Jay
On Thu, Jan 26, 2012 at 1:40 PM, Mark Miller markrmil...@gmail.com wrote:
On Jan 26, 2012, at 1:28 PM, Jay Hill wrote:
I've tried setting the following
if a response wasn't received w/in the timeAllowed, and if
partialResults is true, then that shard would not be waited on for results.
is that correct?
thanks,
-jay
On Thu, Jan 26, 2012 at 2:23 PM, Jay Hill jayallenh...@gmail.com wrote:
We're on the trunk:
4.0-2011-10-26_08-46-59 1189079 - hudson
What does /no_coord mean in the dismax scoring output? I've looked
through the wiki, mail archives, lucidfind, and can't find any reference.
--
¡jah!
UnInvertedField is similar to Lucene's FieldCache, except, while the
FieldCache cannot work with multivalued fields, UnInvertedField is designed
for that very purpose. So since your f_dcperson field is multivalued, by
default you use UnInvertedField. You're not doing anything wrong, that's
default
I've worked with a lot of different Solr implementations, and one area that
is emerging more and more is using Solr in combination with other big data
solutions. My company, Lucid Imagination, has added a two-day course to our
upcoming Lucene Revolution conference, Scaling Search with Big Data and
I don't think I understand what you're trying to do. Are you trying to
preserve all facets after a user clicks on a facet, and thereby triggers a
filter query, which excludes the other facets? If that's the case, you can
use local parameters to tag the filter queries so they are not used for the
Looks good, thanks Tom.
-Jay
On Fri, Apr 15, 2011 at 8:55 AM, Burton-West, Tom tburt...@umich.edu wrote:
Thanks everyone.
I updated the wiki. If you have a chance please take a look and check to
make sure I got it right on the wiki.
Dismax works by first selecting the highest scoring sub-query of all the
sub-queries that were run. If I want to search on three fields, manu, name
and features, I can configure dismax like this:
<requestHandler name="search_dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str
As Hoss mentioned earlier in the thread, you can use the statistics page
from the admin console to view the current number of segments. But if you
want to know by looking at the files, each segment will have a unique
prefix, such as _u. There will be one unique prefix for every segment in
the
You mentioned that dismax does not support wildcards, but edismax does. Not
sure if dismax would have solved your other problems, or whether you just
had to shift gears because of the wildcard issue, but you might want to have
a look at edismax.
-Jay
http://www.lucidimagination.com
On Mon, Jan
You can always try something like this out in the analysis.jsp page,
accessible from the Solr Admin home. Check out that page and see how it
allows you to enter text to represent what was indexed, and text for a
query. You can then see if there are matches. Very handy to see how the
various
Removing those components is not likely to impact performance very much, if
at all. I would focus on other areas when tuning performance, such as
looking at memory usage and configuration, query design, etc. But there isn't
any harm in removing them either. Why not do some load tests with the
I'm having trouble getting the core CREATE command to work with relative
paths in the solr.xml configuration.
I'm working with a layout like this:
/opt/solr [this is solr.solr.home: $SOLR_HOME]
/opt/solr/solr.xml
/opt/solr/core0/ [this is the template core]
/opt/solr/core0/conf/schema.xml [etc.]
A merge factor of 100 is very high and out of the norm. Try starting with a
value of 10. I've never seen a running system with a value anywhere near
this high.
Also, what is your setting for ramBufferSizeMB?
-Jay
On Tue, Aug 17, 2010 at 10:46 AM, rajini maski rajinima...@gmail.com wrote:
yeah
Working with SolrJ I'm doing a query using the StatsComponent, and the
stats.facet parameter. I'm not able to set multiple fields for the
stats.facet parameter using SolrJ. Here is the query I'm trying to create:
I was wondering about the production readiness of the new-in-trunk spatial
functionality. Is anyone using this in a production environment?
-Jay
I've done a lot of recency boosting to documents, and I'm wondering why you
would want to do that at index time. If you are continuously indexing new
documents, what was recent when it was indexed becomes, over time, less
recent. Are you unsatisfied with your current performance with the boost
I've got a situation where I'm looking to build an auto-suggest where any
term entered will lead to suggestions. For example, if I type "wine" I want
to see suggestions like this:
french *wine* classes
*wine* book discounts
burgundy *wine*
etc.
I've tried some tricks with shingles, but the only
The fieldNorm is computed like this: fieldNorm = lengthNorm * documentBoost
* documentFieldBoosts
and the lengthNorm is: lengthNorm = 1/(numTermsInField)**.5
[note that the value is encoded as a single byte, so there is some precision
loss]
So the values are not pre-set for the lengthNorm, but
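The formulas above can be sketched directly (this mirrors the math as stated, not Lucene's actual Similarity implementation, and it ignores the lossy one-byte encoding step):

```java
// lengthNorm = 1/sqrt(numTermsInField)
// fieldNorm  = lengthNorm * documentBoost * documentFieldBoost
public class NormDemo {
    public static float lengthNorm(int numTermsInField) {
        return (float) (1.0 / Math.sqrt(numTermsInField));
    }
    public static float fieldNorm(int numTerms, float docBoost, float fieldBoost) {
        return lengthNorm(numTerms) * docBoost * fieldBoost;
    }
}
```

So a 4-term field has lengthNorm 0.5, and a 16-term field 0.25: longer fields score lower, all else being equal.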
Yes, if omitNorms=true, then no lengthNorm calculation will be done, and the
fieldNorm value will be 1.0, and lengths of the field in question will not
be a factor in the score.
To see an example of this you can do a quick test. Add two text fields,
and on one set omitNorms=true:
<field name="foo"
Yes, it will be recorded and available to view after the presentation.
-Jay
On Thu, Feb 25, 2010 at 2:19 PM, Bernadette Houghton
bernadette.hough...@deakin.edu.au wrote:
Yonik, can you please advise whether this event will be recorded and
available for later download? (It starts 5am our time
Looks like multi-threaded support was added to the DIH recently:
http://issues.apache.org/jira/browse/SOLR-1352
-Jay
On Fri, Feb 19, 2010 at 6:27 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
Glen may be referring to LuSql indexing with multiple threads?
Does/can DIH do that, too?
Set the tie parameter to 1.0. This param is set between 0.0 (pure
disjunction maximum) and 1.0 (pure disjunction sum):
http://wiki.apache.org/solr/DisMaxRequestHandler#tie_.28Tie_breaker.29
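The tie parameter's effect can be sketched numerically (a toy illustration of the dismax blending formula, not Solr code; non-negative sub-scores assumed):

```java
// score = max(subScores) + tie * sum(the other subScores)
// tie = 0.0 keeps only the best sub-query's score (pure disjunction max),
// tie = 1.0 sums all of them (pure disjunction sum).
public class TieDemo {
    public static double dismaxScore(double[] subScores, double tie) {
        double max = 0.0, sum = 0.0;
        for (double s : subScores) {
            if (s > max) max = s;
            sum += s;
        }
        return max + tie * (sum - max);
    }
}
```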
-Jay
On Thu, Feb 18, 2010 at 4:24 AM, bharath venkatesh
bharathv6.proj...@gmail.com wrote:
Hi ,
With a mergeFactor set to anything > 1 you would never have only one segment
- unless you optimized. So Lucene will never naturally merge all the
segments into one. Unless, I suppose, the mergeFactor was set to 1, but I've
never tested that. It's hard to picture how that would work.
If I
Thanks for clearing that up guys, I misspoke slightly. It's just that, in a
running system, it's probably very rare that there is only a single segment
for any meaningful length of time. Unless that merge-down-to-one occurs
right when indexing stops there will almost always be a new (small)
My colleague at Lucid Imagination, Tom Hill, will be presenting a free
webinar focused on analysis in Lucene/Solr. If you're interested, please
sign up and join us.
Here is the official notice:
We'd like to invite you to a free webinar our company is offering next
Thursday, 28 January, at 2PM
A couple of follow up questions:
- What type of garbage collector is in use?
- How often are you optimizing the index?
- In solrconfig.xml what is the setting for <mainIndex><ramBufferSizeMB>?
- Right before and after you see this pause, check the output of
http://host:port/solr/admin/system,
It's definitely still an issue. I've seen this with at least four different
Solr implementations. It clearly seems to be a problem when there is a large
field cache. It would be bad enough if the stats.jsp was just slow to load
(usually takes 1 to 2 minutes), but when monitoring memory usage with
Actually my cases were all with customers I work with, not just one case. A
common practice is to monitor cache stats to tune the caches properly. Also,
noting the warmup times for new IndexSearchers, etc. I've worked with people
that have excessive auto-warm count values which is causing
The version of Tika in the 1.4 release definitely parses the most current
Office formats (.docx, .pptx, etc.) and they index as expected.
-Jay
On Mon, Jan 4, 2010 at 6:02 PM, Peter Wolanin peter.wola...@acquia.com wrote:
You must have been searching old documentation - I think tika 0.3+ has
I've noticed this as well, usually when working with a large field cache. I
haven't done in-depth analysis of this yet, but it seems like when the stats
page is trying to pull data from a large field cache it takes quite a long
time.
Are you doing a lot of sorting? If so, what are the field types
Also, what is your heap size and the amount of RAM on the machine?
I've also noticed that, when watching memory usage through JConsole or
YourKit while loading the stats page, the memory usage spikes dramatically -
are you seeing this as well?
-Jay
On Thu, Dec 24, 2009 at 9:12 AM, Jay Hill
I'm on a project where I'm trying to determine the size of the field cache.
We're seeing lots of memory problems, and I suspect that the field cache is
extremely large, but I'm trying to get exact counts on what's in the field
cache.
One thing that struck me as odd in the output of the stats.jsp
On Sat, Dec 19, 2009 at 11:37 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
On Sat, Dec 19, 2009 at 2:25 PM, Jay Hill jayallenh...@gmail.com wrote:
One thing that struck me as odd in the output of the stats.jsp page is
that
the field cache always shows a String type for a field, even
Oh, forgot to add (just to keep the thread complete), the field is being
used for a sort, so it was able to use TrieDoubleField.
Thanks again,
-Jay
On Sat, Dec 19, 2009 at 12:21 PM, Jay Hill jayallenh...@gmail.com wrote:
This field is of class type solr.SortableDoubleField.
I'm actually
I don't think your queries are actually nested queries. Nested queries key
off of the magic field name _query_. You're right however that there is
very little in the way of documentation or examples of nested queries. If
you haven't seen this blog about them yet you might find this a helpful
There is a text_rev field type in the example schema.xml file in the
official release of 1.4. It uses the ReversedWildcardFilterFactory to reverse
a field. You can do a copyField from the field you want to use for leading
wildcard searches to a field using the text_rev field, and then do a regular
The replication admin page on slaves used to have an auto-reload set to
reload every few seconds. In the official 1.4 release this doesn't seem to
be working, but it does in a nightly build from early June. Was this changed
on purpose or is this a bug? I looked through CHANGES.txt to see if
Here is a brief example of how to use SolrJ with the
ExtractingRequestHandler:
ContentStreamUpdateRequest req = new
ContentStreamUpdateRequest("/update/extract");
req.addFile(fileToIndex);
req.setParam("literal.id", getId(fileToIndex));
You can set up multiple request handlers each with their own configuration
file. For example, in addition to the config you listed you could add
something like this:
<requestHandler name="/dataimport-two"
  class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str
So assuming you set up a few sample sort queries to run in the firstSearcher
config, and had very low query volume during that ten minutes so that there
were no evictions before a new Searcher was loaded, would those queries run
by the firstSearcher be passed along to the cache for the next
Have a look at the VelocityResponseWriter (
http://wiki.apache.org/solr/VelocityResponseWriter). It's in the contrib
area, but the wiki has instructions on how to move it into your core Solr.
Solr uses response writers to return results. The default is XML but
responses can be returned in JSON,
1.4 has a good chance of being released next week. There was a hope that it
might make it this week, but another bug in Lucene 2.9.1 was found, pushing
things back just a little bit longer.
-Jay
http://www.lucidimagination.com
On Thu, Oct 29, 2009 at 11:43 AM, beaviebugeater
http://issues.apache.org/jira/browse/SOLR-1501
On Fri, Oct 9, 2009 at 6:10 AM, Jay Hill jayallenh...@gmail.com wrote:
In the past setting rows=n with the full-import command has stopped the
DIH
importing at the number I passed in, but now this doesn't seem to be
working. Here is the command I'm using
Use copyField to copy to a field with a field type like this:
<fieldType name="special" class="solr.TextField"
  positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter
You could use separate DIH config files for each of your three tables. This
might be overkill, but it would keep them separate. The DIH is not limited
to one request handler setup, so you could create a unique handler for each
case with a unique name:
<requestHandler name="/indexer/table1"
Shouldn't that be: java -Dsolr.solr.home=multicore -jar start.jar
and then hit url: http://localhost:8983/solr/core0/admin/ or
http://localhost:8983/solr/core1/admin/
-Jay
http://www.lucidimagination.com
On Fri, Oct 9, 2009 at 1:17 PM, Jason Rutherglen jason.rutherg...@gmail.com
wrote:
I
: Started SocketConnector @ 0.0.0.0:8983
And http://localhost:8983/solr/admin yields a 404 error.
On Fri, Oct 9, 2009 at 1:27 PM, Jay Hill jayallenh...@gmail.com wrote:
Shouldn't that be: java -Dsolr.solr.home=multicore -jar start.jar
and then hit url: http://localhost:8983/solr/core0/admin
In the past setting rows=n with the full-import command has stopped the DIH
importing at the number I passed in, but now this doesn't seem to be
working. Here is the command I'm using:
curl '
http://localhost:8983/solr/indexer/mediawiki?command=full-import&rows=100'
But when 100 docs are imported
approaches are to use either the TermsComponent (new in Solr
1.4) or faceting.
On Wed, Oct 7, 2009 at 1:51 AM, Jay Hill jayallenh...@gmail.com wrote:
Have a look at a blog I posted on how to use EdgeNGrams to build an
auto-suggest tool:
http://www.lucidimagination.com/blog/2009/09/08/auto
Correct me if I'm wrong, but wasn't the ISOLatin1AccentFilterFactory
deprecated in favor of:
<charFilter class="solr.MappingCharFilterFactory"
  mapping="mapping-ISOLatin1Accent.txt"/>
in 1.4?
-Jay
http://www.lucidimagination.com
On Wed, Oct 7, 2009 at 1:44 AM, Shalin Shekhar Mangar
Have a look at a blog I posted on how to use EdgeNGrams to build an
auto-suggest tool:
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
You could easily add filter queries to this approach. For example, the
query used in the blog could add
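The core of the EdgeNGram idea can be sketched as follows (a toy version of what an edge n-gram filter produces at index time; the class and method names are illustrative):

```java
import java.util.*;

// Front-side edge n-grams: "wine" with minGram=1, maxGram=4 is indexed
// as [w, wi, win, wine], so any typed prefix matches the full term.
public class EdgeNGrams {
    public static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, term.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }
}
```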
When working with SolrJ I have typically batched a Collection of
SolrInputDocument objects before sending them to the Solr server. I'm
working with the latest nightly build and using the ExtractingRequestHandler
to index documents, and everything is working fine. Except I haven't been
able to
For security reasons (say I'm indexing very sensitive data, medical records
for example) is there a way to encrypt data that is stored in Solr? Some
businesses I've encountered have such needs and this is a barrier to them
adopting Solr to replace other legacy systems. Would it require a
Use: ?q=*:*
-Jay
http://www.lucidimagination.com
On Mon, Sep 14, 2009 at 4:18 PM, Jonathan Vanasco jvana...@2xlp.com wrote:
I'm using Solr for seach and faceted browsing
Is it possible to have solr search for 'everything' , at least as far as q
is concerned ?
The request handlers I've
With dismax you can use q.alt when the q param is missing:
q.alt=*:*
should work.
-Jay
On Mon, Sep 14, 2009 at 5:38 PM, Jonathan Vanasco jvana...@2xlp.com wrote:
Thanks Jay & Matt
I tried *:* on my app, and it didn't work
I tried it on the solr admin, and it did
I checked the solr config
The two jar files are all you should need, and the configuration is correct.
However I noticed that you are on Solr 1.3. I haven't tested the Lucid
KStemmer on a non-Lucid-certified distribution of 1.3. I have tested it on
recent versions of 1.4 and it works fine (just tested with the most recent
Will do Shalin.
-Jay
http://www.lucidimagination.com
On Fri, Sep 11, 2009 at 9:23 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
Jay, it would be great if you can add this example to the Solrj wiki:
http://wiki.apache.org/solr/Solrj
On Fri, Sep 11, 2009 at 5:15 AM, Jay Hill
field as a snippet.
On Thu, Sep 10, 2009 at 7:45 PM, Jay Hill jayallenh...@gmail.com wrote:
Set up the query like this to highlight a field named "content":
SolrQuery query = new SolrQuery();
query.setQuery("foo");
query.setHighlight(true).setHighlightSnippets(1); //set other params
highlighted, even if the search term only occurs in the
first line of a 300 page field. I'm not sure if mergeContinuous will
do that, or if it will miss everything after the last line that
contains the search term.
On Fri, Sep 11, 2009 at 10:42 AM, Jay Hill jayallenh...@gmail.com wrote:
It's
RequestHandlers are configured in solrconfig.xml. If no components are
explicitly declared in the request handler config then the defaults are used.
They are:
- QueryComponent
- FacetComponent
- MoreLikeThisComponent
- HighlightComponent
- StatsComponent
- DebugComponent
If you wanted to have a
All you have to do is use the start and rows parameters to get the
results you want. For example, the query for the first page of results might
look like this,
?q=solr&start=0&rows=10 (other params omitted). So you'll start at the
beginning (0) and get 10 results. The next page would be
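The start/rows arithmetic can be sketched like this (1-based page numbers assumed; the helper names are illustrative):

```java
// Solr's start offset is 0-based: page N at `rows` per page begins
// at (N - 1) * rows.
public class Paging {
    public static int startFor(int page, int rows) {
        return (page - 1) * rows;
    }
    public static String pageParams(int page, int rows) {
        return "start=" + startFor(page, rows) + "&rows=" + rows;
    }
}
```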
If you need an alternative to using the TermsComponent for auto-suggest,
have a look at this blog on using EdgeNGrams instead of the TermsComponent.
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
-Jay
http://www.lucidimagination.com
On Wed,
Set up the query like this to highlight a field named "content":
SolrQuery query = new SolrQuery();
query.setQuery("foo");
query.setHighlight(true).setHighlightSnippets(1); // set other params as
needed
query.setParam("hl.fl", "content");
QueryResponse queryResponse
Unfortunately you can't sort on a multi-valued field. In order to sort on a
field it must be indexed but not multi-valued.
Have a look at the FieldOptions wiki page for a good description of what
values to set for different use cases:
http://wiki.apache.org/solr/FieldOptionsByUseCase
-Jay
This seems to work:
?q=field\ name:something
Probably not a good idea to have field names with whitespace though.
-Jay
2009/8/28 Marcin Kuptel marcinkup...@gmail.com
Hi,
Is there a way to query solr about fields which names contain whitespaces?
Indexing such data does not cause any
wrote:
On Aug 7, 2009, at 5:23pm, Jay Hill wrote:
I'm using the MoreLikeThisHandler with a content stream to get documents
from my index that match content from an html page like this:
http://localhost:8080/solr/mlt?stream.url=http://www.sfgate.com/cgi-bin/article.cgi
?f=/c/a/2009/08/06
I'm using the MoreLikeThisHandler with a content stream to get documents
from my index that match content from an html page like this:
http://localhost:8080/solr/mlt?stream.url=http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2009/08/06/SP5R194Q13.DTLmlt.fl=bodyrows=4debugQuery=true
But, not
affected and not a resultSet. DIH expects one and hence the exception.
Cheers
Avlesh
On Tue, Aug 4, 2009 at 1:49 AM, Jay Hill jayallenh...@gmail.com
wrote:
Is it possible for the DataImportHandler to update records in the table it
is querying? For example, say I have
Is it possible for the DataImportHandler to update records in the table it
is querying? For example, say I have a query like this in my entity:
query=select field1, field2 from someTable where hasBeenIndexed=false
Is there a way I can mark each record processed by updating the
hasBeenIndexed
Check the system request handler: http://localhost:8983/solr/admin/system
Should look something like this:
<lst name="lucene">
  <str name="solr-spec-version">1.3.0.2009.07.28.10.39.42</str>
  <str name="solr-impl-version">1.4-dev 797693M - jayhill - 2009-07-28
10:39:42</str>
  <str name="lucene-spec-version">2.9-dev</str>
I'm doing some testing with field collapsing, and early results look good.
One thing seems odd to me however. I would expect to get back one block of
results, but I get two - the first one contains the collapsed results, the
second one contains the full non-collapsed results:
<result name="response"
I am trying to run full and delta imports with the commit=false option, but
it doesn't seem to take effect - after the import a commit always happens no
matter what params I send. I've looked at the source and unless I'm missing
something it doesn't seem to process the commit param.
Here's the
We had the same thing to deal with recently, and a great solution was posted
to the list. Create a stopwords filter on the field you're using for your
spell checking, and then populate a custom stopwords file with known
misspelled words:
<fieldType name="textSpell" class="solr.TextField"
My bad, I had a configuration setting overriding this value. Sorry for the
mistake.
-Jay
On Wed, Jul 15, 2009 at 12:07 PM, Jay Hill jayallenh...@gmail.com wrote:
I am trying to run full and delta imports with the commit=false option, but
it doesn't seem to take effect - after the import
Actually, my good after all. The parameter does not take effect. If
commit=false is passed in a commit still happens.
Will open a JIRA and supply a patch shortly.
-Jay
On Wed, Jul 15, 2009 at 5:50 PM, Jay Hill jayallenh...@gmail.com wrote:
My bad, I had a configuration setting overriding
We're building a spell index from a field in our main index with the
following configuration:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
Francis, your question is a little vague. Are you looking for the
configuration for connecting the DIH to a JNDI datasource set up in
Weblogic?
<dataSource
  name="dsDb"
  jndiName="java:comp/env/jdbc/myWeblogicDatasource"
  type="JdbcDataSource"
  user="" />
-Jay
On Mon, Jul 6,
Just to be sure: You mentioned that you adjusted schema.xml - did you
re-index after making your changes?
-Jay
On Wed, Jul 8, 2009 at 7:07 AM, Yang Lin beckl...@gmail.com wrote:
Thanks for your reply. But it doesn't work.
Yang
2009/7/8 Yao Ge yao...@gmail.com
Try with fl=* or fl=*,score
I haven't tried this myself, but it sounds like what you're looking for is
enabling remote streaming:
http://wiki.apache.org/solr/ContentStream#head-7179a128a2fdd5dde6b1af553ed41735402aadbf
As the link above shows you should be able to enable remote streaming like
this: <requestParsers
Mathieu, have a look at Solr's DataImportHandler. It provides a
configuration-based approach to index different types of datasources
including relational databases and XML files. In particular have a look at
the XpathEntityProcessor (
Thanks Noble, I gave those examples a try.
If I use <field column="body" xpath="/book/body/chapter/p" /> I only get
the text from the last p element, not from all elements.
If I use <field column="body" xpath="/book/body/chapter" flatten="true"/>
or <field column="body" xpath="/book/body/chapter/" flatten="true"/> I
It is not multivalued. The intention is to get all text under the body
element into one body field in the index that is not multivalued.
Essentially everything within the body element minus the markup.
Thanks,
-Jay
On Thu, Jul 2, 2009 at 8:55 AM, Fergus McMenemie fer...@twig.me.uk wrote:
I'm on the trunk, built on July 2: 1.4-dev 789506
Thanks,
-Jay
On Thu, Jul 2, 2009 at 11:33 AM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
On Thu, Jul 2, 2009 at 11:38 PM, Mark Miller markrmil...@gmail.com
wrote:
Shalin Shekhar Mangar wrote:
It selects all matching nodes.
Thanks Fergus, setting the field to multivalued did work:
<field column="body" xpath="/book/body/chapter/p" flatten="true"/>
gets all the p elements as multivalue fields in the body field.
The only thing is, the body field is used by some other content sources, so
I have to look at the implications
I'm using the DIH to index records from a relational database. No problems,
everything works great. But now, due to the size of index (70GB w/ 25M+
docs) I need to shard and want the DIH to distribute documents evenly
between two shards. Current approach is to modify the sql query in the
config
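One sketch of distributing documents evenly across shards from the DIH side is a hash-modulo scheme on the primary key, with each shard's DIH query carrying a matching WHERE clause. The thread doesn't confirm this exact approach, and the table and column names below are hypothetical:

```java
// Route each document to one of numShards shards deterministically.
public class ShardRouter {
    public static int shardFor(String id, int numShards) {
        // Math.floorMod keeps the result non-negative even when hashCode is negative.
        return Math.floorMod(id.hashCode(), numShards);
    }
    // A DIH-style per-shard SQL query, e.g. for shard 0 of 2:
    // "select * from docs where mod(id, 2) = 0"
    public static String dihQuery(int shard, int numShards) {
        return "select * from docs where mod(id, " + numShards + ") = " + shard;
    }
}
```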
I'm using the XPathEntityProcessor to parse an xml structure that looks like
this:
<book>
  <author>Joe Smith</author>
  <title>World Atlas</title>
  <body>
    <chapter>
      <p>Content I want is here</p>
      <p>More content I want is here.</p>
      <p>Still more content here.</p>
I'm having some trouble getting the PlainTextEntityProcessor to populate a
field in an index. I'm using the TemplateTransformer to fill 2 fields, and
have a timestamp field in schema.xml, and these fields make it into the
index. Only the plain text data is missing. Here is my configuration:
Regarding being able to search SCHOLKOPF (o with no umlaut) and match
SCHÖLKOPF (with umlaut) try using the ISOLatin1AccentFilterFactory in your
analysis chain:
<filter class="solr.ISOLatin1AccentFilterFactory" />
This filter removes accented chars and replaces them with non-accented
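A rough JDK-only equivalent of what such an accent-folding filter does (a sketch, not the filter's actual implementation):

```java
import java.text.Normalizer;

// Decompose accented characters (NFD) and strip the combining marks, so
// SCHÖLKOPF and SCHOLKOPF normalize to the same form at index and query time.
public class AccentFolder {
    public static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", ""); // drop combining marks
    }
}
```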
In order to get the the values you want for the service field you will need
to change the fieldType definition in schema.xml for service to use
something that doesn't alter your original values. Try the "string"
fieldType to start and look at the fieldType definition for "string". I'm
guessing you
Use the fl param to ask for only the fields you need, but also keep hl=true.
Something like this:
http://localhost:8080/solr/select/?q=bear&version=2.2&start=0&rows=10&indent=on&hl=true&fl=id
Note that fl=id means the only field returned in the XML will be the id
field.
Highlights are still returned
Try using the admin analysis tool
(http://host:port/solr/admin/analysis.jsp)
to see what the analysis chain is doing to your query. Enter the field name
("question" in your case) and for "Field value (Index)" enter "customize" (since
that's what's in the document). For "Field value (Query)" enter "customer".