Re: get all tokens from TokenStream in my custom filter

2017-11-20 Thread Emir Arnautović
Hi Kumar, > Emir, I need all tokens of the query in the incrementToken() function, not only the > current token. That was just an example - the point was that you need to set attributes. You can read all tokens from the previous stream, do whatever is needed with them and, when ready, set attributes and return

Solr - How to Clear the baseDir folder after the DIH import

2017-11-20 Thread Karan Saini
Hi guys, Solr version: 6.6.1. I am able to import PDF files into Solr using the DIH, and indexing works as expected. But I wish to clear the folder C:/solr-6.6.1/server/solr/core_K2_Depot*/Depot* after the indexing process finishes successfully. Please suggest, if there

Re: Leading wildcard searches very slow

2017-11-20 Thread Emir Arnautović
Hi Sundeep, The simplified explanation is that terms are indexed to be more prefix-search friendly (and that is why Amrit suggested that you index terms reversed if you want leading wildcards). If you use a leading wildcard, there is no structure to limit the terms that can be matched and the engine has to
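A minimal sketch of the idea in this reply, assuming the term dictionary can be modelled as a sorted Python list (the real Lucene structure is more elaborate). The function names and sample terms are made up for illustration. A prefix lookup can jump to the right spot with binary search, while a leading wildcard has nothing to narrow the search unless the terms are also stored reversed, which is the approach Solr's ReversedWildcardFilterFactory is built around.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix):
    """Find terms starting with `prefix` by binary search plus a short scan."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append(term)
    return matches


def suffix_matches_scan(terms, suffix):
    """Leading wildcard without help: every term has to be inspected."""
    return [t for t in terms if t.endswith(suffix)]


def build_reversed(terms):
    """Second, reversed copy of the dictionary (hypothetical helper)."""
    return sorted(t[::-1] for t in terms)


def suffix_matches_reversed(reversed_terms, suffix):
    """Leading wildcard rewritten as a prefix lookup on reversed terms."""
    hits = prefix_matches(reversed_terms, suffix[::-1])
    return sorted(t[::-1] for t in hits)


# Illustrative data only.
terms = sorted(["batman", "spiderman", "superman", "manual", "mango"])
reversed_terms = build_reversed(terms)

print(prefix_matches(terms, "man"))                    # trailing wildcard: man*
print(suffix_matches_reversed(reversed_terms, "man"))  # leading wildcard: *man
```

The cost of the reversed copy is a larger index, which is the usual trade-off when leading wildcards have to be fast.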

Re: Solr LTR plugin - Training

2017-11-20 Thread ilayaraja
Yes, that works. Thanks. - --Ilay -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Solr cloud in kubernetes

2017-11-20 Thread Björn Häuser
Hi Raja, we are using SolrCloud as a StatefulSet and every pod has its own storage attached to it. Thanks Björn > On 20. Nov 2017, at 05:59, rajasaur wrote: > > Hi Bjorn, > > I'm trying a similar approach now (to get SolrCloud working on Kubernetes). I > have run

Re: Solr7: Very High number of threads on aggregator node

2017-11-20 Thread Nawab Zada Asad Iqbal
@rick I see many indexing settings, but I don't see any settings related to querying (i.e., number of threads etc.) in solrconfig. What would be the relevant part for this area? In Jetty, the threadpool is set to 1. @Toke: I have a webserver which uses Solr for querying; this, I guess, is pretty typical.

Re: Fwd: CVE-2017-3163 - SOLR-5.2.1 version

2017-11-20 Thread Rick Leir
Pad, read the CVE. Do you have an affected version of Solr? Do you have the replication feature enabled in solrconfig.xml? Note that it might be enabled by default. Test directory traversal on your system: can you read files remotely? No? Then you are finished. A better plan: upgrade to a newer

Issue facing with spell text field containing hyphen

2017-11-20 Thread Chirag Garg
Hi Team, I am facing an issue with strings containing a hyphen when searching the spell field. My Solr core is on solr-6.6.0. Steps to reproduce: 1. My search string is "spider-man". 2. When I search in Solr with the query spell:*spider-*, it shows numDocs=0 even though the content is present. 3. But

Fwd: CVE-2017-3163 - SOLR-5.2.1 version

2017-11-20 Thread padmanabhan gonesani
Please help me here -- Forwarded message -- From: padmanabhan gonesani Date: Mon, Nov 13, 2017 at 5:12 PM Subject: CVE-2017-3163 - SOLR-5.2.1 version To: gene...@lucene.apache.org Hi Team, *Description:* Apache Solr could allow a remote attacker to

Trailing wild card searches very slow in Solr

2017-11-20 Thread Sundeep T
Hi, We have several indexed string fields which are not tokenized and do not have docValues enabled. When we do trailing wildcard searches on these fields, they run very slowly. We were thinking that since these fields are indexed, such queries should run pretty quickly. We are using

Re: Trailing wild card searches very slow in Solr

2017-11-20 Thread Erick Erickson
You already asked that question and got several answers, did you not see them? If you did see them, what is unclear? Best, Erick On Mon, Nov 20, 2017 at 9:33 AM, Sundeep T wrote: > Hi, > > We have several indexed string fields which is not tokenized and does not > have

Merging of index in Solr

2017-11-20 Thread Zheng Lin Edwin Yeo
Hi, Does anyone know how long merging in Solr usually takes? I am currently merging about 3.5TB of data; it has been running for more than 28 hours and has not completed yet. The merge is running on an SSD. I am using Solr 6.5.1. Regards, Edwin

Deep Paging with cursorMark throws error

2017-11-20 Thread Webster Homer
I am developing an application that uses cursorMark deep paging. It's a Java client using SolrJ. Currently the client is built with the Solr 6.2 SolrJ jars, but the test server is a Solr 7.1 server. I am getting this error: Error from server at http://XX:8983/solr/sial-catalog-product:
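For readers unfamiliar with cursorMark paging, here is a rough sketch of the client-side loop, not the poster's code. It assumes a hypothetical `fetch_page` callable that wraps the HTTP request and returns a simplified dict with `docs` and `nextCursorMark`; the fake server below uses plain offsets as cursor marks, whereas real marks are opaque strings. The parts taken from Solr's documented behaviour are: start with `cursorMark=*`, sort on the uniqueKey field (assumed here to be `id`), and stop when the returned mark equals the one that was sent.

```python
def fetch_all(fetch_page, rows=2):
    """Walk a result set with cursorMark until the mark stops changing."""
    cursor = "*"  # documented starting value
    docs = []
    while True:
        page = fetch_page({
            "q": "*:*",
            "sort": "id asc",  # sort must include the uniqueKey field
            "rows": rows,
            "cursorMark": cursor,
        })
        docs.extend(page["docs"])
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:
            break  # same mark returned: nothing left to read
        cursor = next_cursor
    return docs


def make_fake_server(ids):
    """Stand-in for Solr, for illustration only: cursor marks are offsets."""
    ordered = sorted(ids)

    def fetch_page(params):
        mark = params["cursorMark"]
        start = 0 if mark == "*" else int(mark)
        chunk = ordered[start:start + params["rows"]]
        next_mark = str(start + len(chunk)) if chunk else mark
        return {"docs": [{"id": i} for i in chunk], "nextCursorMark": next_mark}

    return fetch_page


all_docs = fetch_all(make_fake_server(["e", "b", "a", "d", "c"]))
print([d["id"] for d in all_docs])
```

Passing the request function in as a parameter keeps the loop independent of the client library version, so the same logic can be exercised without a running server.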

Re: Solr7: Very High number of threads on aggregator node

2017-11-20 Thread Rick Leir
Nawab, here is why it would be good to share the solrconfigs: I had a suspicion that you might be using the same solrconfig for version 7 and 4.5. That is unlikely to work well. But I could be way off base. Rick -- Sorry for being brief. Alternate email is rickleir at yahoo dot com

Solr regex phrase query syntax

2017-11-20 Thread Chuming Chen
Hi All, According to http://lucene.apache.org/core/7_1_0/core/org/apache/lucene/util/automaton/RegExp.html, Lucene supports repeat expressions: repeatexp ::= repeatexp ? (zero or one occurrence) | repeatexp * (zero or more occurrences) | repeatexp +

Re: Solr7: Very High number of threads on aggregator node

2017-11-20 Thread Toke Eskildsen
Nawab Zada Asad Iqbal wrote: > I have a webserver which uses solr for querying, this i guess is pretty > typical. At times, there are 50 users sending queries at a given second. > Sometimes, the queries take a few seconds to finish (i.e., if the max across > all shards is 5

Re: Trailing wild card searches very slow in Solr

2017-11-20 Thread Sundeep T
Hi Erick. I initially asked this question regarding leading wildcards. This was a typo, and what I meant was that trailing wildcard queries were slow. So queries like "text:hello*" are slow. We were expecting that, since the string field is already indexed, the searches would be fast, but that seems to be

Re: Trailing wild card searches very slow in Solr

2017-11-20 Thread Erick Erickson
At first glance you have a mis-configured setup. The most glaring issue is that you're trying to search a 150G index in 1G of memory. bq: String field (not tokenized) is docValues=true, indexed=true and stored=true OK, this is kind of unusual to query but if the field just contains single tokens

Re: Do i need to reindex after changing similarity setting

2017-11-20 Thread Walter Underwood
Similarity is query time. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 20, 2017, at 4:57 PM, Nawab Zada Asad Iqbal wrote: > > Hi, > > I want to switch to Classic similarity instead of BM25 (default in solr7). > Do I need

Do i need to reindex after changing similarity setting

2017-11-20 Thread Nawab Zada Asad Iqbal
Hi, I want to switch to Classic similarity instead of BM25 (default in solr7). Do I need to reindex all cores after this? Or is it only a query time setting? Thanks Nawab

Re: Issue facing with spell text field containing hyphen

2017-11-20 Thread Rick Leir
Chirag, some scattered clues: StandardTokenizer splits on punctuation, so your spell field might not contain spider-man. When you do a wildcard search, the analysis chain can be different from what you expected. Cheers -- Rick On November 20, 2017 9:58:54 AM EST, Chirag Garg
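To illustrate the first clue, here is a rough sketch that assumes a tokenizer which lowercases and splits on every non-alphanumeric character. That is only an approximation of StandardTokenizer, and the helper names are made up. It also assumes the wildcard pattern is compared against the indexed terms as typed, without being tokenized itself.

```python
import re
from fnmatch import fnmatchcase


def simple_tokenize(text):
    """Approximation only: lowercase, then split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def wildcard_hits(indexed_terms, pattern):
    """Compare a wildcard pattern with each indexed term, unanalyzed."""
    return [t for t in indexed_terms if fnmatchcase(t, pattern)]


indexed = simple_tokenize("Spider-Man")

print(indexed)                              # the hyphen is gone from the index
print(wildcard_hits(indexed, "*spider-*"))  # no indexed term contains "spider-"
print(wildcard_hits(indexed, "spider*"))    # matches the token "spider"
```

If the field's analysis chain keeps the hyphen (for example a keyword-style tokenizer), this explanation does not apply and the cause lies elsewhere; the Analysis screen in the Solr admin UI shows what actually gets indexed.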

OutOfMemoryError in 6.5.1

2017-11-20 Thread Walter Underwood
When I ran load benchmarks with 6.3.0, an overloaded cluster would get super slow but keep functioning. With 6.5.1, we hit 100% CPU, then start getting OOMs. That is really bad, because it means we need to reboot every node in the cluster. Also, the JVM OOM hook isn’t running the process

Re: Error when indexing EML files in Solr 7.1.0

2017-11-20 Thread Zheng Lin Edwin Yeo
Hi, Any updates regarding the error? Regards, Edwin On 16 November 2017 at 10:21, Zheng Lin Edwin Yeo wrote: > Hi Karthik, > > Thanks for the update. > > I see from the JIRA that it is still unresolved, meaning we can't index > EML files to Solr 7.1.0 for the time

Re: Trailing wild card searches very slow in Solr

2017-11-20 Thread Sundeep T
Hi Erick, Thanks for the reply. Here are more details on our setup. Setup/schema details: 100 million doc Solr core; String field (not tokenized) is docValues=true, indexed=true and stored=true; field is almost unique in the index, around 80 million are unique; no commits on index; all

Re: Deep Paging with cursorMark throws error

2017-11-20 Thread Webster Homer
As I suspected this was a bug in my code. We use KIE Drools to configure our queries, and there was a conflict between two rules. On Mon, Nov 20, 2017 at 4:09 PM, Webster Homer wrote: > I am developing an application that uses cursorMark deep paging. It's a > java client

Re: Issue facing with spell text field containing hyphen

2017-11-20 Thread Chirag garg
Hi Rick, Actually, my spell field also contains text with a hyphen, i.e. it contains "spider-man", but even then I am not able to search for it. Regards, Chirag

Re: OutOfMemoryError in 6.5.1

2017-11-20 Thread Bernd Fehling
Hi Walter, you can check whether the JVM OOM hook is acknowledged by the JVM and set up in the JVM. The options are "-XX:+PrintFlagsFinal -version". You can modify your bin/solr script and tweak the function "launch_solr" at the end of the script. Replace "-jar start.jar" with "-XX:+PrintFlagsFinal

Re: Solr regex phrase query syntax

2017-11-20 Thread Mikhail Khludnev
Hello, Chuming. It doesn't. The closest thing is to create a TermAutomatonQuery. On Mon, Nov 20, 2017 at 11:03 PM, Chuming Chen wrote: > Hi All, > > According to http://lucene.apache.org/core/7_1_0/core/org/apache/lucene/ > util/automaton/RegExp.html. Lucene supports repeat

Re: Trailing wild card searches very slow in Solr

2017-11-20 Thread Erick Erickson
Well, define "slow". Conceptually, a large OR clause is created that contains all the terms that start with the indicated text. (Actually, a PrefixQuery should be formed.) That said, I'd expect hello* to be reasonably fast as not many terms _probably_ start with 'hello'. Not the same at all for,
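The conceptual expansion described here can be sketched as follows, assuming the field's term dictionary is a sorted list. The function names and sample terms are hypothetical, and Lucene does not literally build a query string like this. The point is only that the amount of work grows with the number of terms sharing the prefix.

```python
from bisect import bisect_left


def prefix_range(sorted_terms, prefix):
    """Index range [lo, hi) of the terms that start with `prefix`."""
    lo = bisect_left(sorted_terms, prefix)
    # "\uffff" sorts after any ordinary character, so this is an upper bound.
    hi = bisect_left(sorted_terms, prefix + "\uffff")
    return lo, hi


def count_prefix_terms(sorted_terms, prefix):
    """How many clauses the conceptual OR query would contain."""
    lo, hi = prefix_range(sorted_terms, prefix)
    return hi - lo


def expand_to_or_clause(field, sorted_terms, prefix):
    """Spell out the conceptual OR clause for a trailing wildcard."""
    lo, hi = prefix_range(sorted_terms, prefix)
    return " OR ".join(f"{field}:{t}" for t in sorted_terms[lo:hi])


# Illustrative data only.
terms = sorted(["hello", "hello1", "help", "hero", "world"])

print(expand_to_or_clause("text", terms, "hello"))
print(count_prefix_terms(terms, "hello"))  # few clauses for a long prefix
print(count_prefix_terms(terms, "he"))     # more clauses for a short prefix
```

On a field with tens of millions of nearly unique values, a short prefix can select a very large share of the dictionary, which is where the cost comes from.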

Re: How to get a solr core to persist

2017-11-20 Thread Amanda Shuman
Hi Shawn, I did as you suggested and created the core by hand - I copied the files from the existing core, including the index files (data directory) and changed the core.properties file to the new core name (core_new) and restarted. Now I'm having a different issue - it says it is Optimized but

Re: A problem of tracking the commits of Lucene using SHA num

2017-11-20 Thread TOM
Dear Shawn and Chris, Thanks very much for your replies and help, and sorry for my mistakes as a first-time user of mailing lists. On 11/9/2017 5:13 PM, Shawn wrote: > Where did this information originate? My SHA data come from the paper "On the Naturalness of Buggy Code" (Baishakhi Ray, et al.

Re: Help with complex boolean search queries

2017-11-20 Thread Gajendra Dadheech
Hey Ankit, Try this tool for a better view of your debug output, and then if you have any specific question, do let me know: http://splainer.io/ On Sun, Oct 29, 2017 at 2:34 AM, Ankit Shah wrote: > Hi, > I am new to the solr community, and have this weird problem with