mapreduce job using soirj 5

2015-06-16 Thread adfel70
Hi, We recently started testing solr 5. Our indexer creates a mapreduce job that uses solrj5 to index documents to our SolrCloud. Until now, we used solr 4.10.3 with solrj 4.8.0. Our hadoop dist is cloudera 5. The problem is, solrj5 is using httpclient-4.3.1 while hadoop is installed with

How to create concatenated token

2015-06-16 Thread Aman Tandon
Hi, I have a requirement to create a concatenated token from all the tokens produced by the last item of my analyzer chain. Suppose my analyzer chain is: <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" splitOnNumerics="1"

Re: solr/lucene index merge and optimize performance improvement

2015-06-16 Thread Toke Eskildsen
Shenghua(Daniel) Wan wansheng...@gmail.com wrote: Actually, I am currently interested in how to boost merging/optimizing performance of single solr instance. We have the same challenge (we build static 900GB shards one at a time and the final optimization takes 8 hours with only 1 CPU core at

Re: How to create concatenated token

2015-06-16 Thread Alessandro Benedetti
Can I ask you why you need to concatenate the tokens ? Maybe we can find a better solution to concat all the tokens in one single big token . I find it difficult to understand the reasons behind tokenising, token filtering and then un-tokenizing again :) It would be great if you explain a little

Highlight in Velocity UI on Google Chrome

2015-06-16 Thread Sznajder ForMailingList
Hi, I was testing the highlight feature and played with the techproducts example. It appears that the highlighting works on Mozilla Firefox, but not on Google Chrome. For your information Benjamin

Re: Do we need to add docValues=true to _version_ field in schema.xml?

2015-06-16 Thread Erick Erickson
Did you look in the example schema files? None of them have _version_ set as docValues. Best, Erick On Tue, Jun 16, 2015 at 1:44 AM, forest_soup tanglin0...@gmail.com wrote: For the _version_ field in the schema.xml, do we need to set it to be docValues=true? <field name="_version_" type="long"

Re: How to create concatenated token

2015-06-16 Thread Aman Tandon
e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training) typo error e.g. Intent for solr training: fq=id:(234 456 545) title:(solr training) With Regards Aman Tandon On Tue, Jun 16, 2015 at 8:13 PM, Aman Tandon amantandon...@gmail.com wrote: We have some business logic to

Re: Solr's suggester results

2015-06-16 Thread Erick Erickson
The suggesters are built to return whole fields. You _might_ be able to add multiple fragments to a multiValued entry and get fragments, I haven't tried that though and I suspect that actually you'd get the same thing.. This is an XY problem IMO. Please describe exactly what you're trying to

Re: phrase matches returning near matches

2015-06-16 Thread Alistair Young
yep seems that’s the answer. The highlighting is done separately by the rails app, so I’ll look into proper solr highlighting. thanks a lot for the use of your ears, much improved understanding! cheers, Alistair -- mov eax,1 mov ebx,0 int 80h On 16/06/2015 16:33, Erick Erickson

Re: phrase matches returning near matches

2015-06-16 Thread Erick Erickson
I agree with Alessandro, the behavior you're describing is _not_ correct at all given your description. So either 1) there's something interesting about your configuration that doesn't seem important that you haven't told us, although what it could be is a mystery to me too ;) 2) it's

Re: phrase matches returning near matches

2015-06-16 Thread Erick Erickson
Hmmm. First, highlighting should work here, if you have it configured to work on the dc.description field. As to whether the phrase 'management changes' is near enough, I pretty much guarantee it is. This is where the admin/analysis page can answer this type of question authoritatively since it's

Re: mapreduce job using soirj 5

2015-06-16 Thread Erick Erickson
Sounds like a question better asked in one of the Cloudera support forums, 'cause all I can do is guess ;). I suppose, theoretically, that you could check out the Solr5 code and substitute the httpclient-4.2.5.jar in the build system, recompile and go, but that's totally a guess based on zero

Re: phrase matches returning near matches

2015-06-16 Thread Alistair Young
yes prolly not a bug. The highlighting is on but nothing is highlighted. Perhaps this text is triggering it? 'consider the impacts of land management changes’ that would seem reasonable. It’s not a direct match so no highlighting (the highlighting does work on a direct match) but 'management

TikaEntityProcessor Not Finding My Files

2015-06-16 Thread Paden
Hi, there's a guy who's already asked a question similar to this and I'm basically going off what he did here. It's exactly what I'm doing which is taking a file path from a database and using TikaEntityProcessor to analyze the document. The link to his question is here.

Re: How to create concatenated token

2015-06-16 Thread Aman Tandon
We have some business logic to search the user query in user intent or to find the exact matching products. e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training) As we can see it is a phrase query, so it will take more time than the single stemmed token query. There are also 5-7

Re: mapreduce job using soirj 5

2015-06-16 Thread Shawn Heisey
On 6/16/2015 9:24 AM, Erick Erickson wrote: Sounds like a question better asked in one of the Cloudera support forums, 'cause all I can do is guess ;). I suppose, theoretically, that you could check out the Solr5 code and substitute the httpclient-4.2.5.jar in the build system, recompile and

Re: solr/lucene index merge and optimize performance improvement

2015-06-16 Thread Shenghua(Daniel) Wan
Hi, Toke, Did you try MapReduce with solr? I think it should be a good fit for your use case. On Tue, Jun 16, 2015 at 5:02 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Shenghua(Daniel) Wan wansheng...@gmail.com wrote: Actually, I am currently interested in how to boost

Re: phrase matches returning near matches

2015-06-16 Thread Terry Rhodes
This might be an issue with your stemmer. If management is stemmed to manage and changes is stemmed to change, then the terms match. You can use the solr admin UI to test your indexing and query analysis chains to see if this is happening. On 6/16/2015 3:22 AM, Alistair Young wrote: Hiya,
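
The stemming effect described above can be sketched in a few lines of Python. This is only an illustration: `toy_stem` is a made-up suffix stripper, not the stemmer Solr would actually apply, and `phrase_matches` is a hypothetical helper that checks for adjacent stemmed terms. The first sample text is the one quoted elsewhere in this thread ('consider the impacts of land management changes').

```python
# Toy illustration of how stemming can make a phrase query match.
# NOTE: toy_stem is a hypothetical suffix stripper for demonstration only;
# it is NOT the algorithm used by any Solr/Lucene stemming filter.
SUFFIXES = ("ment", "es", "s", "e")


def toy_stem(word):
    """Repeatedly strip a known suffix while at least 4 characters remain."""
    word = word.lower()
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= 4:
                word = word[: -len(suffix)]
                changed = True
                break
    return word


def phrase_matches(text, phrase):
    """Return True if the stemmed phrase occurs as adjacent stemmed tokens."""
    doc = [toy_stem(t) for t in text.split()]
    query = [toy_stem(t) for t in phrase.split()]
    n = len(query)
    return any(doc[i:i + n] == query for i in range(len(doc) - n + 1))


print(toy_stem("management"), toy_stem("manage"))
print(phrase_matches("consider the impacts of land management changes",
                     "manage change"))
print(phrase_matches("this will help in your management of lots more words changes",
                     "manage change"))
```

With this toy stemmer the adjacent words 'management changes' match the phrase, while the two words separated by other terms do not, which is why the admin UI analysis page is the authoritative place to check what the real chain produces.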

Re: TikaEntityProcessor Not Finding My Files

2015-06-16 Thread Paden
I thought it might be useful to list the logging errors as well. Here they are. There are just three. WARN FileDataSource FileDataSource.basePath is empty. Resolving to: /home/paden/Downloads/solr-5.1.0/server/. ERROR DocBuilder Exception while processing: file document :

Re: Do we need to add docValues=true to _version_ field in schema.xml?

2015-06-16 Thread Chris Hostetter
: For the _version_ field in the schema.xml, do we need to set it to be : docValues=true? you *can* add docValues, but it is not required. There is an open discussion about whether we should add docValues to the _version_ field (or even switch completely to indexed=false) in this jira...

Re: mapreduce job using soirj 5

2015-06-16 Thread Shenghua(Daniel) Wan
Hadoop has a switch that lets you use your jar rather than the one hadoop carries. Google for HADOOP_OPTS. Good luck. On Tue, Jun 16, 2015 at 7:23 AM, adfel70 adfe...@gmail.com wrote: Hi, We recently started testing solr 5, our indexer creates mapreduce job that uses solrj5 to index documents

Re: Facet on same field in different ways

2015-06-16 Thread Chris Hostetter
: Have you tried this syntax ? : : facet=true&facet.field={!ex=st key=terms facet.limit=5 : facet.prefix=ap}query_terms&facet.field={!key=terms2 : facet.limit=1}query_terms&rows=0&facet.mincount=1 : : This seems the proper syntax, I found it here : yeah, local params are supported for specifying

Re: Highlight in Velocity UI on Google Chrome

2015-06-16 Thread Upayavira
I think it makes it bold on bold, which won't be particularly visible. On Tue, Jun 16, 2015, at 06:52 AM, Sznajder ForMailingList wrote: Hi, I was testing the highlight feature and played with the techproducts example. It appears that the highlighting works on Mozilla Firefox, but not on

Re: Facet on same field in different ways

2015-06-16 Thread Phanindra R
Thanks guys. The syntax facet.field={!key=abc facet.limit=10}facetFieldName works. On Tue, Jun 16, 2015 at 11:22 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : Have you tried this syntax ? : : facet=true&facet.field={!ex=st key=terms facet.limit=5 :
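
As a rough sketch of the syntax confirmed above, the following Python builds a query string that facets on the same field twice, each time with its own key and per-facet local params. The helper name `build_facet_query` and the `q=*:*` default are assumptions made for illustration; only the `{!key=... facet.limit=...}fieldName` form comes from the thread.

```python
from urllib.parse import urlencode, parse_qs


def build_facet_query(field, variants, rows=0, mincount=1):
    """Build a URL-encoded query string with one facet.field per variant.

    `variants` is a list of (key, local_params_dict) pairs; each pair becomes
    facet.field={!key=<key> <name>=<value> ...}<field>.
    The q=*:* value is an assumption for this sketch.
    """
    params = [
        ("q", "*:*"),
        ("rows", str(rows)),
        ("facet", "true"),
        ("facet.mincount", str(mincount)),
    ]
    for key, local in variants:
        pairs = [("key", key), *local.items()]
        local_params = " ".join(f"{name}={value}" for name, value in pairs)
        params.append(("facet.field", "{!" + local_params + "}" + field))
    return urlencode(params)


query_string = build_facet_query(
    "query_terms",
    [
        ("terms", {"facet.limit": 5, "facet.prefix": "ap"}),
        ("terms2", {"facet.limit": 1}),
    ],
)
print(query_string)
print(parse_qs(query_string)["facet.field"])
```

Building the parameters as a list of pairs, rather than a dict, is what allows facet.field to be repeated; `urlencode` then takes care of escaping the braces and spaces.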

Re: How to create concatenated token

2015-06-16 Thread Aman Tandon
Hi, Any guesses as to how I could achieve this behaviour? With Regards Aman Tandon On Tue, Jun 16, 2015 at 8:15 PM, Aman Tandon amantandon...@gmail.com wrote: e.g. Intent for solr training: fq=id: 234, 456, 545 title(solr training) typo error e.g. Intent for solr training: fq=id:(234 456 545)

Re: Solr's suggester results

2015-06-16 Thread Zheng Lin Edwin Yeo
The long content is from when I tried to index PDF files. As some PDF files have a lot of words in the content, it will lead to the *UTF8 encoding is longer than the max length 32766 error.* I think the problem is the content size of the PDF file exceeds 32766 characters? I'm trying to accomplish
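
One detail worth checking here: the 32766 limit in that error applies to the UTF-8 encoded length of a single indexed term in bytes, not to the number of characters. A minimal Python sketch of the difference follows; the helper names are hypothetical, and the constant simply mirrors the number in the error message.

```python
# The limit from the error message; it counts UTF-8 bytes of one term.
MAX_TERM_BYTES = 32766


def utf8_length(text):
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def exceeds_term_limit(text):
    """True if the text, indexed as one single term, would be too long."""
    return utf8_length(text) > MAX_TERM_BYTES


ascii_text = "a" * 32766      # 1 byte per character
accented_text = "é" * 20000   # 2 bytes per character in UTF-8

print(len(ascii_text), utf8_length(ascii_text), exceeds_term_limit(ascii_text))
print(len(accented_text), utf8_length(accented_text), exceeds_term_limit(accented_text))
```

So a value with fewer than 32766 characters can still trip the limit when it contains multi-byte characters, and the limit only matters when the whole value ends up as one term (for example in an untokenized field).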

Re: Solr's suggester results

2015-06-16 Thread Erick Erickson
Have you looked at spellchecker? Because that sounds much more like what you're asking about than suggester. Spell checking is more what you're asking for, have you even looked at that after it was suggested? bq: Also, when I do a search, it shouldn't be returning whole fields, but just to return

Re: Solr's suggester results

2015-06-16 Thread Zheng Lin Edwin Yeo
Yes I've looked at that before, but I was told that the newer version of Solr has its own suggester, and does not need to use spellchecker anymore? So it's not necessary to use the spellchecker inside suggester anymore? Regards, Edwin On 17 June 2015 at 11:56, Erick Erickson

Joins with comma separated values

2015-06-16 Thread Advait Suhas Pandit
Hi, We have some master data and some content data. Master data would be things like userid, name, email id etc. Our content data for example is a blog. The blog has certain fields which are comma separated ids that point to the master data. E.g. UserIDs of people who have commented on a

Re: How to create concatenated token

2015-06-16 Thread Aman Tandon
Hi Erick, Thank you so much, it will be helpful for me to learn how to save the state of a token. I had no idea how to save the state of previous tokens, which made it difficult to generate a concatenated token at the end. Is there anything I should read to learn more about it? With

Re: How to create concatenated token

2015-06-16 Thread Erick Erickson
I really question the premise, but have a look at: https://issues.apache.org/jira/browse/SOLR-7193 Note that this is not committed and I haven't reviewed it so I don't have anything to say about that. And you'd have to implement it as a custom Filter. Best, Erick On Tue, Jun 16, 2015 at 5:55
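
To make the idea of saving the state of earlier tokens concrete, here is a language-neutral sketch in Python. It is not the Lucene TokenFilter API and not the code from SOLR-7193; `concatenate_tokens` is a hypothetical generator that buffers every incoming token and emits a single joined token once the input is exhausted.

```python
def concatenate_tokens(tokens, separator=""):
    """Consume a token stream and yield one concatenated token at the end.

    Illustration only: a real custom filter would do the equivalent
    buffering inside the analysis chain.
    """
    buffered = []
    for token in tokens:
        # Keep the state of every token seen so far.
        buffered.append(token)
    if buffered:
        # Emit a single token only after the upstream stream is finished.
        yield separator.join(buffered)


print(list(concatenate_tokens(["solr", "training"])))
print(list(concatenate_tokens(["solr", "training"], separator="_")))
print(list(concatenate_tokens([])))
```

The key point is that nothing can be emitted until the upstream tokens are exhausted, so the filter has to hold state across calls rather than transform tokens one at a time.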

Re: Raw lucene query for a given solr query

2015-06-16 Thread Chris Hostetter
: You can get raw query (and other debug information) with debug=true : parameter. more specifically -- if you are writing a custom SearchComponent, and want to access the underlying Query object produced by the parsers that SolrIndexSearcher has executed, you can do so the same way the debug

Re: solr/lucene index merge and optimize performance improvement

2015-06-16 Thread Shenghua(Daniel) Wan
I think your advice on future incremental update is very useful. I will keep an eye on that. Actually, I am currently interested in how to boost merging/optimizing performance of single solr instance. Parallelism at MapReduce level does not help merging/optimizing much, unless Solr/Lucene

Do we need to add docValues=true to _version_ field in schema.xml?

2015-06-16 Thread forest_soup
For the _version_ field in the schema.xml, do we need to set it to be docValues=true? <field name="_version_" type="long" indexed="true" stored="true"/> As we noticed there are FieldCache for _version_ in the solr stats: http://lucene.472066.n3.nabble.com/file/n4212123/IMAGE%245A8381797719FDA9.jpg --

Re: Raw lucene query for a given solr query

2015-06-16 Thread Tomoko Uchida
Hi, You can get raw query (and other debug information) with debug=true parameter. Regards, Tomoko 2015-06-16 8:10 GMT+09:00 KNitin nitin.t...@gmail.com: Hi, We have a few custom solrcloud components that act as value sources inside solrcloud for boosting items in the index. I want to get
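
A small Python sketch of using that parameter from a client: it builds the request parameters with debug=true and pulls the query-related entries out of a JSON response. The helper names and the sample response body are made up for illustration (the query `title:solr` is a placeholder, not taken from the thread); the debug section keys shown are the ones Solr uses for the raw and parsed query.

```python
import json
from urllib.parse import urlencode

# Keys of the debug section that describe the query itself.
DEBUG_KEYS = ("rawquerystring", "querystring", "parsedquery", "parsedquery_toString")


def build_debug_params(query):
    """URL-encoded parameters asking Solr for debug output as JSON."""
    return urlencode({"q": query, "debug": "true", "wt": "json"})


def extract_query_debug(response_text):
    """Return the query-related debug entries from a JSON response body."""
    debug = json.loads(response_text).get("debug", {})
    return {key: debug[key] for key in DEBUG_KEYS if key in debug}


# Hypothetical response body, hand-written for this example.
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0},
  "debug": {
    "rawquerystring": "title:solr",
    "querystring": "title:solr",
    "parsedquery": "title:solr",
    "parsedquery_toString": "title:solr"
  }
}
"""

print(build_debug_params("title:solr"))
print(extract_query_debug(SAMPLE_RESPONSE)["parsedquery"])
```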

Re: Solr's suggester results

2015-06-16 Thread Alessandro Benedetti
in line : 2015-06-16 4:43 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com: Thanks Benedetti, I've change to the AnalyzingInfixLookup approach, and it is able to start searching from the middle of the field. However, is it possible to make the suggester to show only part of the content

Re: Phrase query get converted to SpanNear with slop 1 instead of 0

2015-06-16 Thread ariya bala
Ok. Thank you Chris. It is a custom Query parser. I will check where my Query parser injects the slop of 1. On Tue, Jun 16, 2015 at 3:26 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : I encounter this peculiar case with solr 4.10.2 where the parsed query : doesn't seem to be logical.

Re: Facet on same field in different ways

2015-06-16 Thread Alessandro Benedetti
Hi Phanindra, Have you tried this syntax ? facet=true&facet.field={!ex=st key=terms facet.limit=5 facet.prefix=ap}query_terms&facet.field={!key=terms2 facet.limit=1}query_terms&rows=0&facet.mincount=1 This seems the proper syntax, I found it here : https://issues.apache.org/jira/browse/SOLR-4717 Is

Re: Phrase query get converted to SpanNear with slop 1 instead of 0

2015-06-16 Thread Alessandro Benedetti
Hi Ariya, I think Hossman pointed out that the slop of 1 is fine in your use case :) Assuming, of course, that span queries were what you were expecting! Cheers 2015-06-16 10:13 GMT+01:00 ariya bala ariya...@gmail.com: Ok. Thank you Chris. It is a custom Query parser. I will check my Query

phrase matches returning near matches

2015-06-16 Thread Alistair Young
Hiya, I've been looking for documentation that would point to where I could modify or explain why 'near neighbours' are returned from a phrase search. If I search for: manage change I get back a document that contains this will help in your management of lots more words... changes. It's

Re: phrase matches returning near matches

2015-06-16 Thread Alessandro Benedetti
Can you show us how the query is parsed ? You didn't tell us anything about the query parser you are using. Enabling debugQuery=true will show you how the query is parsed and this will be quite useful for us. Cheers 2015-06-16 11:22 GMT+01:00 Alistair Young alistair.yo...@uhi.ac.uk: Hiya,

What contribute to a Solr core's FieldCache entry_count?

2015-06-16 Thread forest_soup
For the fieldCache, what determines the entries_count? Does each search request containing a sort on a non-docValues field contribute one entry to the entries_count? For example, search A ( q=owner:1&sort=maildate asc ) and search b ( q=owner:2&sort=maildate asc ) will contribute 2 field cache

Re: phrase matches returning near matches

2015-06-16 Thread Alistair Young
it's a useful behaviour. I'd just like to understand where it's deciding the document is relevant. debug output is: <lst name="debug"> <str name="rawquerystring">dc.description:manage change</str> <str name="querystring">dc.description:manage change</str> <str

Re: phrase matches returning near matches

2015-06-16 Thread Alessandro Benedetti
According to your debug you are using the default Lucene Query Parser. This surprises me, as I would expect with that query a match with distance 0 between the 2 terms. Are you sure there is nothing else in that field that matches the phrase query ? From the documentation Lucene supports finding words are