Re: FreeTextSuggester throwing error "token must not contain separator byte"

2017-07-24 Thread govind nitk
Hi Angel, please share the freesuggester defined in the config. I guess you might have mentioned whitespace as separator in the freesuggester definition as : Which is creating the trouble. On Tue, Jul 25, 2017 at 9:01 AM, Erick Erickson wrote: > The shingle filter

Re: FreeTextSuggester throwing error "token must not contain separator byte"

2017-07-24 Thread Erick Erickson
The shingle filter may use space as the separator between shingles that it generates. The admin/analysis page is your friend. On Jul 24, 2017 2:45 PM, "Angel Todorov" wrote: > Hi Rick, > > Yep, that's really weird, because I am using the StandardTokenizerFactory, > which
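
To illustrate the point about shingles, here is a minimal sketch of how joining adjacent tokens with a space puts a 0x20 byte inside each generated token. This is not Lucene's ShingleFilter; the `shingles` function, its parameters, and the space separator are assumptions made for illustration, and unlike a real shingle filter it emits only the multi-token shingles.

```python
def shingles(tokens, max_size=2, sep=" "):
    """Join runs of 2..max_size adjacent tokens with `sep` (hypothetical helper)."""
    out = []
    for start in range(len(tokens)):
        for size in range(2, max_size + 1):
            if start + size <= len(tokens):
                out.append(sep.join(tokens[start:start + size]))
    return out


# Tokens as a tokenizer might emit them: no whitespace inside any single token.
tokens = ["0", "0", "2", "r", "allen", "r"]
generated = shingles(tokens)

# Every generated shingle now carries the separator byte 0x20.
for shingle in generated:
    print(shingle.encode("ascii").hex(" "), 0x20 in shingle.encode("ascii"))
```

If the suggester is configured with the same byte as its own separator, tokens like these would plausibly trigger the reported error, which is why checking the analysis chain is the suggested first step.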

Lucene index corruption and recovery

2017-07-24 Thread Putul S
While trying to upgrade 100G index from Solr 4 to 5, check index (actually updater) indicates that the index is corrupted. Hence, I ran check index to fix the index which showed broken segment warning and then deleted those documents. I then ran index update on the fixed index which upgraded fine

Re: index version - replicable versus searching

2017-07-24 Thread Erick Erickson
Actually, I'm surprised that the slave returns the new document and I suspect that there's actually a commit on the master, but no new searcher is being opened. On replication, the slave copies all _closed_ segments from the master whether or not they have been opened for searching. Hmmm, a

index version - replicable versus searching

2017-07-24 Thread Stanonik, Ronald
I'm testing replication on solr 5.5.0. I set up one master and one slave. The index versions match; that is, master(replicable), master(searching), and slave(searching) are the same. I make a change to the index on the master, but do not commit yet. As expected, the version master(replicable)

Re: FreeTextSuggester throwing error "token must not contain separator byte"

2017-07-24 Thread Angel Todorov
Hi Rick, Yep, that's really weird, because I am using the StandardTokenizerFactory, which is supposed to remove whitespace. Also tried the WhitespaceTokenizerFactory. I'll have a look at other analyzers or if nothing works maybe implement my own. I am using a Shingle filter right after the

Re: FreeTextSuggester throwing error "token must not contain separator byte"

2017-07-24 Thread Rick Leir
Angel, The 0x20 byte is an ASCII space character, which is a separator in most contexts. Breaking the buffer at spaces, you can see 6 non-space tokens. Have a look at your analysis chain and see why you are getting this. Cheers -- Rick On July 24, 2017 4:27:00 PM EDT, Angel Todorov
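
The byte-level reading above can be checked with a short sketch. The hex string is copied from the error message quoted in this thread; the variable names are illustrative only.

```python
# Hex dump of the offending token, copied from the error message in this thread.
hex_dump = "30 20 30 20 32 20 72 20 61 6c 6c 65 6e 20 72"

raw = bytes.fromhex(hex_dump)   # fromhex skips the spaces between byte pairs
text = raw.decode("ascii")      # every byte here is plain ASCII

# Split at the ASCII space (0x20) to recover the individual tokens.
tokens = text.split(" ")

print(repr(text))
print(tokens, len(tokens))
```

Splitting at 0x20 gives six tokens, which matches the count given in the reply.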

FreeTextSuggester throwing error "token must not contain separator byte"

2017-07-24 Thread Angel Todorov
Hi guys, I am trying to set up the FreeTextSuggester/ Lookup Factory in a suggester definition in SOLR. Unfortunately while the index is building, I am encountering the following errors: *"msg":"tokens must not contain separator byte; got token=[30 20 30 20 32 20 72 20 61 6c 6c 65 6e 20 72] but

Re: How to use javacc with QueryParser.jj

2017-07-24 Thread Nawab Zada Asad Iqbal
I guess, I finally found the answer here: http://codegouge.blogspot.com/2014/01/modifying-solr-queryparser.html " If you're doing development in Solr trunk and want to adjust the QueryParser, take a look at the JavaCC grammar file at

How to use javacc with QueryParser.jj

2017-07-24 Thread Nawab Zada Asad Iqbal
[Subject changed for reposting] Good morning, If I want to change something in the lucene-solr/solr/core/src/java/org/apache/solr/parser/QueryParser.jj, what is the workflow to generate the new Java code? Thanks Nawab On Fri, Jul 21, 2017 at 7:33 PM, Nawab Zada Asad Iqbal

Re: mm = 1 and multi-field searches (update)

2017-07-24 Thread Michael Joyner
We are using qf= as in: QF: plain_abstract_en^0.1 plain_abstract_text_general^0.5 plain_abstract_text_ws^2 plain_subhead_text_ws^2 plain_subhead_text_general^0.5 plain_subhead_en^0.1 plain_title_text_ws^2 plain_title_text_general^0.5 plain_title_en^0.1 keywords_text_ws^2

Re: LambdaMART XML model to JSON

2017-07-24 Thread Ryan Yacyshyn
Hi Alessandro, Ok no prob. The script-based approach seems to work just fine for me right now! Ryan On Mon, 24 Jul 2017 at 18:56 alessandro.benedetti wrote: > hi Ryan, > the issue you mentioned was mine : > https://sourceforge.net/p/lemur/feature-requests/144/ > > My

Re: atomic updates in conjunction with optimistic concurrency

2017-07-24 Thread Susheel Kumar
luceneMatchVersion is something that can cause differences. You may want to try version 6.3 to be on the same page as Amrit's code. On Fri, Jul 21, 2017 at 6:08 PM, Hendrik Haddorp wrote: > Thanks for trying to reproduce my issue. > > I'm using a Solr Cloud, my collection was

Re: Graph Visualizing tool

2017-07-24 Thread mganeshs
Hi, Thanks for the suggestion. But my csv is based on documents which have node_id and edges in the same document. The tool you suggested seems to ask for two different entries, nodes separately and edges separately. My documents look like this: node_id, in_edges_ss ( multi

Re: Apache Solr 4.10.x - Collection Reload times out

2017-07-24 Thread alessandro.benedetti
1) nope, no big tlog or replaying problem 2) Solr just seems frozen. Not responsive and nothing in the log. Now I just tried to restart after the Zookeeper config deploy and on restart the log completely freezes and the instances don't come up... If I clean the indexes and then start, this

Re: LambdaMART XML model to JSON

2017-07-24 Thread alessandro.benedetti
hi Ryan, the issue you mentioned was mine: https://sourceforge.net/p/lemur/feature-requests/144/ My bad, it got lost in a sea of "To Dos". I still think it could be a good contribution to the library, but at the moment I think going with a custom script/app to do the transformation is the way to

Re: DIH, multiple sources, cores and search: single core with multiple entities or single core per source with search across multiple cores?

2017-07-24 Thread Rick Leir
Giovanni, Start with your search results page and work back from there. Decide what fields you want to display in a results page, then plan for your Solr document to contain all these fields. Now you will need a program to ingest the data from whatever database, and create documents for Solr.

DIH, multiple sources, cores and search: single core with multiple entities or single core per source with search across multiple cores?

2017-07-24 Thread Giovanni De Stefano
Hello guys, I need to index content coming from different sources (db, filesystems, …). Those sources share most fields, only a few are specific to the source. Content coming from different sources changes at different rates. Some sources will generate hundreds of thousands of documents, some

Re: LambdaMART XML model to JSON

2017-07-24 Thread Ryan Yacyshyn
Here's something that'll create a JSON model that can be directly uploaded into Solr: https://github.com/ryac/lambdamart-xml-to-json It'll map the feature IDs to the names found in the feature-store as well. I had this error when uploading model: Model type does not exist