Hi,
We recently started testing Solr 5. Our indexer creates a MapReduce job that
uses SolrJ 5 to index documents to our SolrCloud. Until now, we used Solr
4.10.3 with SolrJ 4.8.0. Our Hadoop distribution is Cloudera 5.
The problem is, SolrJ 5 uses httpclient-4.3.1 while Hadoop is installed
with
Hi,
I have a requirement to create a single concatenated token out of all the
tokens produced by the last stage of my analyzer chain.
*Suppose my analyzer chain is:*
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" catenateAll="1" splitOnNumerics="1"
Shenghua(Daniel) Wan wansheng...@gmail.com wrote:
Actually, I am currently interested in how to boost merging/optimizing
performance of single solr instance.
We have the same challenge (we build static 900GB shards one at a time and the
final optimization takes 8 hours with only 1 CPU core at
Can I ask you why you need to concatenate the tokens? Maybe we can find a
better solution than concatenating all the tokens into one single big token.
I find it difficult to understand the reasons behind tokenising, token
filtering and then un-tokenizing again :)
It would be great if you explain a little
Hi,
I was testing the highlight feature and played with the techproducts
example.
It appears that the highlighting works on Mozilla Firefox, but not on
Google Chrome.
For your information
Benjamin
Did you look in the example schema files? None of them have
_version_ set as docValues.
Best,
Erick
On Tue, Jun 16, 2015 at 1:44 AM, forest_soup tanglin0...@gmail.com wrote:
For the _version_ field in the schema.xml, do we need to set it be
docValues=true?
<field name="_version_" type="long"
e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training)
Typo correction:
e.g. Intent for solr training: fq=id:(234 456 545) title:(solr training)
With Regards
Aman Tandon
On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon amantandon...@gmail.com
wrote:
We have some business logic to
The suggesters are built to return whole fields. You _might_
be able to add multiple fragments to a multiValued
entry and get fragments, I haven't tried that though
and I suspect you'd actually get the same thing.
This is an XY problem IMO. Please describe exactly what
you're trying to
yep seems that’s the answer. The highlighting is done separately by the
rails app, so I’ll look into proper solr highlighting.
thanks a lot for the use of your ears, much improved understanding!
cheers,
Alistair
--
mov eax,1
mov ebx,0
int 80h
On 16/06/2015 16:33, Erick Erickson
I agree with Alessandro: the behavior you're describing
is _not_ correct at all given your description. So either
1> There's something interesting about your configuration
that doesn't seem important that you haven't told us,
although what it could be is a mystery to me too ;)
2> it's
Hmmm. First, highlighting should work here. If you have it configured
to work on the dc.description field.
As to whether the phrase "management changes" is near enough, I
pretty much guarantee it is. This is where the admin/analysis page can
answer this type of question authoritatively since it's
Sounds like a question better asked in one of the Cloudera support
forums, 'cause all I can do is guess ;).
I suppose, theoretically, that you could check out the Solr5
code and substitute the httpclient-4.2.5.jar in the build system,
recompile and go, but that's totally a guess based on zero
yes prolly not a bug. The highlighting is on but nothing is highlighted.
Perhaps this text is triggering it?
'consider the impacts of land management changes'
that would seem reasonable. It’s not a direct match so no highlighting
(the highlighting does work on a direct match) but 'management
Hi, there's a guy who has already asked a question similar to this, and I'm
basically going off what he did. It's exactly what I'm doing: taking a file
path from a database and using TikaEntityProcessor to analyze the document.
The link to his question is here.
We have some business logic to search the user query against user intent or
to find the exact matching products.
e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training)
As we can see, it is a phrase query, so it will take more time than a single
stemmed-token query. There are also 5-7
On 6/16/2015 9:24 AM, Erick Erickson wrote:
Sounds like a question better asked in one of the Cloudera support
forums, 'cause all I can do is guess ;).
I suppose, theoretically, that you could check out the Solr5
code and substitute the httpclient-4.2.5.jar in the build system,
recompile and
Hi, Toke,
Did you try MapReduce with solr? I think it should be a good fit for your
use case.
On Tue, Jun 16, 2015 at 5:02 AM, Toke Eskildsen t...@statsbiblioteket.dk
wrote:
Shenghua(Daniel) Wan wansheng...@gmail.com wrote:
Actually, I am currently interested in how to boost
This might be an issue with your stemmer: if management is stemmed to
manage and changes is stemmed to change, then the terms match. You
can use the solr admin UI to test your indexing and query analysis
chains to see if this is happening.
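As a rough illustration of how such a collision happens, here is a toy suffix-stripper (invented for this example; real stemmers like Porter are far more sophisticated):

```python
def toy_stem(word):
    # Crude suffix stripping, only to show why "management" and "manage"
    # can end up as the same indexed term.
    for suffix in ("ment", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Both sides of each pair collapse to the same term, so the phrase matches.
print(toy_stem("management"), toy_stem("changes"))  # manage change
```

On a real index the Analysis screen in the admin UI is the authoritative check, since it runs the actual configured chain.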
On 6/16/2015 3:22 AM, Alistair Young wrote:
Hiya,
I thought it might be useful to list the logging errors as well. Here they
are. There are just three.
WARN  FileDataSource  FileDataSource.basePath is empty. Resolving to:
/home/paden/Downloads/solr-5.1.0/server/.
ERROR DocBuilder
Exception while processing: file document :
: For the _version_ field in the schema.xml, do we need to set it be
: docValues=true?
you *can* add docValues, but it is not required.
There is an open discussion about whether we should add docValues to
the _version_ field (or even switch completely to indexed=false) in this
jira...
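For reference, the two schema variants under discussion would look roughly like this (a sketch; the attribute set is the one quoted earlier in the thread):

```xml
<!-- current definition, as quoted in the original question -->
<field name="_version_" type="long" indexed="true" stored="true"/>

<!-- optional docValues variant being discussed in the jira -->
<field name="_version_" type="long" indexed="true" stored="true" docValues="true"/>
```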
Hadoop has a switch that lets you use your jar rather than the one hadoop
carries.
Google for HADOOP_OPTS.
good luck.
On Tue, Jun 16, 2015 at 7:23 AM, adfel70 adfe...@gmail.com wrote:
Hi,
We recently started testing solr 5, our indexer creates mapreduce job that
uses solrj5 to index documents
: Have you tried this syntax ?
:
: facet=true&facet.field={!ex=st key=terms facet.limit=5
: facet.prefix=ap}query_terms&facet.field={!key=terms2
: facet.limit=1}query_terms&rows=0&facet.mincount=1
:
: This seems the proper syntax, I found it here :
yeah, local params are supported for specifying
I think it makes it bold on bold, which won't be particularly visible.
On Tue, Jun 16, 2015, at 06:52 AM, Sznajder ForMailingList wrote:
Hi,
I was testing the highlight feature and played with the techproducts
example.
It appears that the highlighting works on Mozilla Firefox, but not on
Thanks guys. The syntax facet.field={!key=abc
facet.limit=10}facetFieldName works.
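To make the working syntax easier to reuse, here is a small Python sketch (the helper name is invented; field and key values are the ones from this thread) that assembles such a facet request query string:

```python
from urllib.parse import urlencode, parse_qs

def facet_with_key(field, key, limit):
    # Local params ({!key=... facet.limit=...}) go in front of the field name.
    return urlencode({
        "facet": "true",
        "facet.field": "{!key=%s facet.limit=%d}%s" % (key, limit, field),
        "rows": 0,
    })

qs = facet_with_key("facetFieldName", "abc", 10)
# Decoding it back shows the local-params prefix survived URL encoding.
print(parse_qs(qs)["facet.field"][0])  # {!key=abc facet.limit=10}facetFieldName
```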
On Tue, Jun 16, 2015 at 11:22 AM, Chris Hostetter hossman_luc...@fucit.org
wrote:
: Have you tried this syntax ?
:
: facet=true&facet.field={!ex=st key=terms facet.limit=5
:
Hi,
Any guesses how I could achieve this behaviour?
With Regards
Aman Tandon
On Tue, Jun 16, 2015 at 8:15 PM, Aman Tandon amantandon...@gmail.com
wrote:
e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training)
Typo correction:
e.g. Intent for solr training: fq=id:(234 456 545)
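The corrected syntax can also be generated programmatically; a minimal helper (function name invented for this example):

```python
def build_id_fq(ids):
    # Solr ORs the terms inside the parentheses: fq=id:(234 456 545)
    return "id:(%s)" % " ".join(str(i) for i in ids)

print(build_id_fq([234, 456, 545]))  # id:(234 456 545)
```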
The long content is from when I tried to index PDF files. As some PDF files
have a lot of words in the content, it leads to the *UTF8 encoding is
longer than the max length 32766* error.
I think the problem is that the content size of the PDF file exceeds 32766
characters?
I'm trying to accomplish
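A side note on that error: the 32766 limit applies to the UTF-8 *byte* length of a single indexed term, not the character count, so multi-byte text hits it sooner. A quick check (helper name invented):

```python
def exceeds_term_limit(term, limit=32766):
    # Lucene rejects a single indexed term whose UTF-8 encoding is
    # longer than 32766 bytes; byte length != character count.
    return len(term.encode("utf-8")) > limit

print(exceeds_term_limit("a" * 32766))       # False: exactly at the limit
print(exceeds_term_limit("\u00e9" * 20000))  # True: 2 bytes per char in UTF-8
```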
Have you looked at the spellchecker? Because that sounds much more like
what you're asking about than the suggester.
Spell checking is more what you're asking for, have you even looked at that
after it was suggested?
bq: Also, when I do a search, it shouldn't be returning whole fields,
but just to return
Yes, I've looked at that before, but I was told that the newer version of
Solr has its own suggester and does not need to use the spellchecker anymore?
So it's not necessary to use the spellchecker inside the suggester anymore?
Regards,
Edwin
On 17 June 2015 at 11:56, Erick Erickson
Hi,
We have some master data and some content data. Master data would be things
like userid, name, email id etc.
Our content data for example is a blog.
The blog has certain fields which are comma-separated IDs that point to the
master data.
E.g. UserIDs of people who have commented on a
Hi Erick,
Thank you so much, it will be helpful for me to learn how to save the state
of a token. I had no idea how to save the state of previous tokens, so it
was difficult to generate a concatenated token at the end.
Is there anything I should read to learn more about it?
With
I really question the premise, but have a look at:
https://issues.apache.org/jira/browse/SOLR-7193
Note that this is not committed and I haven't reviewed
it so I don't have anything to say about that. And you'd
have to implement it as a custom Filter.
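To make the state-saving concrete outside of Lucene: the buffering logic boils down to consuming the whole upstream stream before emitting anything. A toy sketch (purely illustrative Python, not the SOLR-7193 code; real Lucene filters would use captureState()/restoreState()):

```python
class ConcatenateAllFilter:
    """Buffers every token from the upstream stage, then emits a single
    concatenated token. A plain list plays the role of the saved state."""

    def __init__(self, upstream, sep=""):
        self.saved = list(upstream)  # save the state of all previous tokens
        self.sep = sep
        self.emitted = False

    def next_token(self):
        if self.emitted or not self.saved:
            return None  # end of stream
        self.emitted = True
        return self.sep.join(self.saved)

f = ConcatenateAllFilter(["wi", "fi", "sony"])
print(f.next_token())  # wifisony
print(f.next_token())  # None
```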
Best,
Erick
On Tue, Jun 16, 2015 at 5:55
: You can get raw query (and other debug information) with debug=true
: parameter.
more specifically -- if you are writing a custom SearchComponent, and
want to access the underlying Query object produced by the parsers that
SolrIndexSearcher has executed, you can do so the same way the debug
I think your advice on future incremental updates is very useful. I will
keep an eye on that.
Actually, I am currently interested in how to boost merging/optimizing
performance of single solr instance.
Parallelism at MapReduce level does not help merging/optimizing much,
unless Solr/Lucene
For the _version_ field in the schema.xml, do we need to set it be
docValues=true?
<field name="_version_" type="long" indexed="true" stored="true"/>
As we noticed, there are FieldCache entries for _version_ in the Solr stats:
http://lucene.472066.n3.nabble.com/file/n4212123/IMAGE%245A8381797719FDA9.jpg
--
Hi,
You can get raw query (and other debug information) with debug=true
parameter.
Regards,
Tomoko
2015-06-16 8:10 GMT+09:00 KNitin nitin.t...@gmail.com:
Hi,
We have a few custom solrcloud components that act as value sources inside
solrcloud for boosting items in the index. I want to get
in line :
2015-06-16 4:43 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
Thanks Benedetti,
I've change to the AnalyzingInfixLookup approach, and it is able to start
searching from the middle of the field.
However, is it possible to make the suggester show only part of the
content
Ok. Thank you Chris.
It is a custom Query parser.
I will check my Query parser to see where it injects the slop of 1.
On Tue, Jun 16, 2015 at 3:26 AM, Chris Hostetter hossman_luc...@fucit.org
wrote:
: I encounter this peculiar case with solr 4.10.2 where the parsed query
: doesn't seem to be logical.
Hi Phanindra,
Have you tried this syntax ?
facet=true&facet.field={!ex=st key=terms facet.limit=5
facet.prefix=ap}query_terms&facet.field={!key=terms2
facet.limit=1}query_terms&rows=0&facet.mincount=1
This seems the proper syntax, I found it here :
https://issues.apache.org/jira/browse/SOLR-4717
Is
Hi Ariya,
I think Hossman pointed out to you that the slop of 1 is fine in your use case :)
Of course, only in the case that span queries were what you were expecting!
Cheers
2015-06-16 10:13 GMT+01:00 ariya bala ariya...@gmail.com:
Ok. Thank you Chris.
It is a custom Query parser.
I will check my Query
Hiya,
I've been looking for documentation that would point to where I could modify,
or explain why, 'near neighbours' are returned from a phrase search. If I search
for:
manage change
I get back a document that contains 'this will help in your management of'
... lots more words ... 'changes'. It's
Can you show us how the query is parsed ?
You didn't tell us anything about the query parser you are using.
Enabling debugQuery=true will show you how the query is parsed, and this
will be quite useful for us.
Cheers
2015-06-16 11:22 GMT+01:00 Alistair Young alistair.yo...@uhi.ac.uk:
Hiya,
For the fieldCache, what determines the entries_count?
Does each search request containing a sort on a non-docValues field
contribute one entry to entries_count?
For example, search A ( q=owner:1&sort=maildate asc ) and search B (
q=owner:2&sort=maildate asc ) will contribute 2 field cache
it's a useful behaviour. I'd just like to understand where it's deciding
the document is relevant. Debug output is:
<lst name="debug">
<str name="rawquerystring">dc.description:manage change</str>
<str name="querystring">dc.description:manage change</str>
<str
According to your debug you are using a default Lucene Query Parser.
This surprises me, as I would expect with that query a match with distance 0
between the two terms.
Are you sure nothing else in that field matches the phrase query?
From the documentation
Lucene supports finding words are