Limit indexed documents.
Hello, I have a few questions about indexing data. Are there any hardware or software limits on indexing data? And is there a maximum number of indexed documents? Thanks for your answers. -- View this message in context: http://lucene.472066.n3.nabble.com/Limit-indexed-documents-tp4212913.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tokenizer or Filter ?
I only used the Solr UI analyzer for my test; or must I index the data first? I used this XML in my schema:

    <fieldType name="direction1" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="&lt;d1&gt;.*&lt;/d1&gt;" replacement=""/>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
      </analyzer>
    </fieldType>

This is my result: http://lucene.472066.n3.nabble.com/file/n4179496/dir1.png
-- View this message in context: http://lucene.472066.n3.nabble.com/Tokenizer-or-Filter-tp4178346p4179496.html
Re: Tokenizer or Filter ?
Jack, thanks for the help, but if I use PatternReplaceCharFilterFactory on, for example, this input:

<d1>text d1</d1><d2>text d2</d2><d1>text d1</d1><d2>text 2 ok</d2>

then the output contains only the segment <d2>text 2 ok</d2>. This happens whenever a <d2>text d2</d2> sits between <d1> elements, as in <d1>...</d1><d2>...</d2><d1>...</d1>. So the filter apparently matches from the first <d1> to the last </d1>; anything in between is not skipped but replaced as well (by a space, when I set the replacement to a space). So would it not be better to use the update processor? If you describe it well in your book, I will buy it.
-- View this message in context: http://lucene.472066.n3.nabble.com/Tokenizer-or-Filter-tp4178346p4179477.html
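The behaviour described above is what a greedy quantifier does in java.util.regex, the regex engine that Solr, being a Java application, uses for its pattern-based filters. The following is only a minimal sketch in plain Java, not Solr code: the class name ReluctantD1 and the helper strip are made up for illustration, and the input is the example from the post with its angle brackets written out. It compares the greedy pattern <d1>.*</d1> with the reluctant variant <d1>.*?</d1>.

```java
import java.util.regex.Pattern;

public class ReluctantD1 {
    // Hypothetical helper: removes every match of the regex from the input,
    // similar in spirit to a pattern-replace filter with an empty replacement.
    static String strip(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "<d1>text d1</d1><d2>text d2</d2><d1>text d1</d1><d2>text 2 ok</d2>";

        // Greedy: .* runs from the first <d1> to the LAST </d1>,
        // so the <d2> block in between is swallowed as well.
        System.out.println(strip("<d1>.*</d1>", input));
        // prints <d2>text 2 ok</d2>

        // Reluctant: .*? stops at the nearest </d1>,
        // so each <d1> block is removed on its own.
        System.out.println(strip("<d1>.*?</d1>", input));
        // prints <d2>text d2</d2><d2>text 2 ok</d2>
    }
}
```

If the same holds inside the char filter, changing the pattern attribute to the reluctant form (with the angle brackets escaped as in the schema) may be enough, but that is an assumption to verify on the analysis screen.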
Re: Tokenizer or Filter ?
Oh yes, that is it. Thank you very much for your patience. One last question: which regex flavor does Solr actually use, POSIX or PCRE? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Tokenizer-or-Filter-tp4178346p4179505.html
Re: Tokenizer or Filter ?
Thanks, Jack, for your advice. Could you please explain a little more how it works? The Apache wiki is not very clear to me. Can I write some JavaScript code when I want to filter some data? In this case I have <d1>bla bla bla</d1> <d2> bla bla bla </d2> <d1>bla bla bla </d1> and I want to filter <d2> bla bla bla </d2>. But in another case I want to filter all <d1>...</d1>, so I suppose I should apply it to the indexed data and filter from that? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Tokenizer-or-Filter-tp4178346p4179173.html
Re: Tokenizer or Filter ?
I used the same regex and unfortunately it doesn't work. Or should I change the regex somehow? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Tokenizer-or-Filter-tp4178346p4178389.html
Tokenizer or Filter ?
Hello, I have a question: should I use a tokenizer or a filter? I need to separate 2 channels. I wrote about this here earlier, but I realized that it is probably not possible with Solr's basic tools, so I'm trying to write my own tool for this task. I have this input:

<d1>Hello</d1><d2>Hello</d2><d1>How are you ?</d1><d2>Fine and you're?</d2>

d1 - direction1
d2 - direction2

I want to output only d1 and then search for some words within that result. For example, the output should be:

Output: [<d1>Hello</d1>,<d1>How are you?</d1><d1></d1>]

I wrote my idea in Java, but I don't know where to incorporate it: in a Filter or in a Tokenizer? Any advice on how to start? I probably have to extend some Lucene class and plug my modified version in there, don't I? Here is my code:

    package test1;

    import java.util.Arrays;

    public class Test1 {
        public static void main(String[] args) {
            String dialogue = "<d1>Hello</d1><d2>Hello</d2><d1>How are you ?</d1><d2>Fine and you're?</d2>";
            // Split at every boundary between a closing </dN> tag and the next opening <dN> tag.
            String[] input = dialogue.split("(?<=</d[12]>)\\d*(?=<d[12]>)");

            // First pass: count the d1 segments to size the result array.
            int countD1 = 0;
            for (String input1 : input) {
                if (input1.startsWith("<d1>")) {
                    countD1++;
                }
            }

            // Second pass: copy the d1 segments into the result array.
            String[] d1 = new String[countD1];
            int array = 0;
            for (String input1 : input) {
                if (input1.startsWith("<d1>")) {
                    d1[array] = input1;
                    array++;
                }
            }

            String d1Out = Arrays.toString(d1);
            System.out.println(d1Out); // Return d1Out
        }
    }

Thanks for your advice.
-- View this message in context: http://lucene.472066.n3.nabble.com/Tokenizer-or-Filter-tp4178346.html
Differentiate direction.
Hello, is it possible to differentiate direction within one field? I have an interview with tags such as <d1>Talking first person</d1> <d2>Talking second person</d2><d1>First person</d1><d2>Second person</d2> etc., and I want to search only the replies from the first person. Must I split it into several fields, or should I use some delimiter such as <d1>...</d1>, or is there another solution? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Differentiate-direction-tp4174963.html
Re: Design optimal Solr Schema
Thanks for the help, but as Alex wrote, I used the synonym filter and it does what I want. When I add, for example, Hello, Hi to the synonyms, and the sentence is Hello how are you and my query is Hi how are you, it finds the sentence too. -- View this message in context: http://lucene.472066.n3.nabble.com/Design-optimal-Solr-Schema-tp4166632p4173690.html
Alternative synonymum
Hello, I want to search in transcripts of phone conversations. The machine that transcribes the conversation to text produces alternatives for each word. For example, take the sentence Hello how are you:

Segment 1: Hello, Halo, Hollow
Segment 2: How, Bow

When I want to search for, say, Halo how are you, I use the synonym filter: for Hello I set the alternatives Halo, Hollow ... It works, but if the same word appears in a later segment with other alternatives, for example How, Know, and I add that to the synonym filter too on a new line, then the word How gets all the alternatives: How, Know, Bow. If I then search for Hello Know, it also finds sentences where that alternative was never offered. In this case it finds the example sentence Hello how are you: in that sentence the word how has only the alternative Bow, but the value Know was saved for it from the other occurrence too.

Is it possible to handle this case, for example by segments: when I know that segment 1 contains specific words, use only those as synonyms there, and do the same at further positions but with another segment number? Thanks, I hope you understand what I mean.
-- View this message in context: http://lucene.472066.n3.nabble.com/Alternative-synonymum-tp4173694.html
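The requirement above can be stated as: alternatives belong to one segment of one transcript, not to a word globally. A minimal sketch of that matching rule in plain Java follows. It is not Solr or Lucene code; the class SegmentAlternatives, the method matches and the sample data are hypothetical and only illustrate the intended behaviour. In an index the same effect would presumably come from storing each segment's alternatives at the same token position at index time, rather than in a shared synonyms file, but that mapping is an assumption.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SegmentAlternatives {
    // Returns true if the query words match consecutive segments, each query
    // word being one of the alternatives recorded for that particular segment.
    static boolean matches(List<Set<String>> segments, List<String> query) {
        for (int start = 0; start + query.size() <= segments.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < query.size() && ok; i++) {
                String word = query.get(i).toLowerCase(Locale.ROOT);
                ok = segments.get(start + i).contains(word);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical transcript: segment 2 offers only how/bow, never know.
        List<Set<String>> sentence = List.of(
                Set.of("hello", "halo", "hollow"),
                Set.of("how", "bow"),
                Set.of("are"),
                Set.of("you"));

        System.out.println(matches(sentence, List.of("Halo", "how", "are", "you"))); // true
        System.out.println(matches(sentence, List.of("Hello", "Know")));             // false
    }
}
```

With a global synonym list the second query would match, which is the problem described; keeping the alternatives per segment avoids it.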
Re: Design optimal Solr Schema
Oh no, I wanted to reply in this topic, where you helped me with the synonym filter: http://lucene.472066.n3.nabble.com/Alternative-searching-td4172339.html but I had this topic open too; I was checking my answer in Google Translate and pasted it here. My task has changed: I no longer have to search up to a specific time, only within a phrase, but with alternatives. The synonym filter is a good idea, but when a specific word has different alternatives in different cases, that is the problem I am dealing with now. I asked about it in this topic: http://lucene.472066.n3.nabble.com/Alternative-synonymum-td4173694.html Sorry for the chaos. -- View this message in context: http://lucene.472066.n3.nabble.com/Design-optimal-Solr-Schema-tp4166632p4173748.html
Alternative searching
Hello, is it possible in Solr to search for alternative words from some field? For example, I want to search for a phrase from a range: at the first position I want to have, probably in one field, hello, hi, cheerio; at the second, my; at the third, name; at the fourth, is; at the fifth, Tomas, John, Paul. And if I send the query "My Tomas"~3 / "My John"~3 / "Hi Paul"~4, it should always find the required match. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Alternative-searching-tp4172339.html
Re: Alternative searching
OK, and how do you think I should get the data into the fields? And how will it recognize that it is one term? -- View this message in context: http://lucene.472066.n3.nabble.com/Alternative-searching-tp4172339p4172349.html
Re: Alternative searching
It's OK: when I use the example with the synonym filter, it works, but I don't know how to transfer this text to the schema. -- View this message in context: http://lucene.472066.n3.nabble.com/Alternative-searching-tp4172339p4172356.html
Re: Alternative searching
I mean something like this:

First Position   Second Position   Third Position   Fourth Position   Fifth Position
--------------   ---------------   --------------   ---------------   --------------
Hello            My                Name             Is                Paul
Hi                                                                    Tomas
Cheerio                                                               John

It is like one sentence, and I think it should not be spread over several docs.
-- View this message in context: http://lucene.472066.n3.nabble.com/Alternative-searching-tp4172339p4172361.html
Re: Design optimal Solr Schema
Thanks for your help. OK, I will try to explain it once more; sorry for my English. I need some functions in my search.

1.) I will naturally have a lot of documents, and I want to find out whether a phrase occurs with its words at most, for example, 5 words apart. I used w:"Good morning"~5 (it works in the Solr example, but I don't know how to do it in my project).
2.) Find a word (phrase) up to a certain time, for example Good morning up to time 5.25.
3.) And, if possible, the order of the words. I'm using the Solarium client for highlighting and I want to highlight words in this order: Hello How Are you. For example, if the field contains the words *hello* you are *how are you* and the searched words are not in order, then skip them. This is not necessary, though.

Primarily I have a problem with the first 2 points: how to make an ideal schema and parse the data from the source file. I've done a demo with basic searching: on one page I have a form, and the results are links to files by id (I use the filename as id). When I click a link, I set a query parameter, and on the result page I get the data needed to display the result. The result file is a table with the whole transcribed interview and highlighted results. Thanks for the help.
-- View this message in context: http://lucene.472066.n3.nabble.com/Design-optimal-Solr-Schema-tp4166632p4166793.html
Re: Design optimal Solr Schema
Oh yes, I want to display the stored data in an HTML file. I have 2 pages: on one page there is a form, and I show the results there. A result is a link (by ID) to the file containing the whole conversation, which is shown on the second page. And what did you mean by separating each conversation interaction? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Design-optimal-Solr-Schema-tp4166632p4166805.html
Design optimal Solr Schema
Hello, I have a problem with the design of a schema in Solr. I have a transcript of a telephone conversation, which I parse into individual fields. I have this schema:

    <?xml version="1.0"?>
    <add>
      <doc>
        <field name="id">01.cn</field>
        <field name="t">0<br /> 1<br /> 2<br /> 2 <br /> 3 <br /> </field>
        <field name="st">0.00<br /> 1.54<br /> 1.54<br /> 1.54 <br /> 1.57 <br /> </field>
        <field name="et">1.54<br /> 1.54<br /> 1.57<br /> 1.57 <br /> 1.7 <br /> </field>
        <field name="w">_SILENCE_<br /> s<br /> HELLO<br /> HALLO <br /> _DELETE_ <br /> </field>
        <field name="p">0.00<br /> 1<br /> 1<br /> 2.06115e-009 <br /> 1 <br /> </field>
        <field name="c">0<br /> 0<br /> 0<br /> 0 <br /> 0 <br /> </field>
      </doc>
    </add>

I display it in an HTML document, which is why I used the <br />. This is the original document:

    T=0 ST=0.00 ET=1.54 W=_SILENCE_ P=0.00 C=0
    T=1 ST=1.54 ET=1.54 W=s P=1 C=0
    T=2 ST=1.54 ET=1.57 W=HELLO P=1 C=0
    T=2 ST=1.54 ET=1.57 W=HALLO P=2.06115e-009 C=0
    T=3 ST=1.57 ET=1.70 W=_DELETE_ P=1 C=0
    T=3 ST=1.57 ET=1.70 W=NO P=2.06115e-009 C=0
    T=4 ST=1.70 ET=2.12 W=HOW P=1 C=0
    T=5 ST=2.12 ET=2.18 W=ARE_ P=0.25 C=0
    T=5 ST=2.12 ET=2.18 W=_DELETE_ P=0.25 C=0
    ..
    ..

    Id = filename
    T  = Segment
    ST = Start time
    ET = End time
    W  = Word
    P  = Probability
    C  = Channel

I want to search, for example, for a word that occurs up to time 1.57: (w:HeLLO) AND (t:[0 TO 1.57]). But if I have all the data of a file in one field each (t, st, et ...), it doesn't work: it also finds all files where hello occurs later than 1.57. Do you have any ideas how to do it? Thanks a lot for your help.
-- View this message in context: http://lucene.472066.n3.nabble.com/Design-optimal-Solr-Schema-tp4166632.html
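The combined query above can only work if each word stays attached to its own times, which is not the case when all values of a file are packed into one field per attribute. A rough sketch of the alternative, one record per transcript line, is shown below in plain Java without any Solr dependency. The class TranscriptRows and its methods are hypothetical, and using ST (start time) as the value compared against 1.57 is an assumption, since the post puts the range on t. Each record could then become its own Solr document, with the filename stored as an extra field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TranscriptRows {
    // Parses one line such as "T=2 ST=1.54 ET=1.57 W=HELLO P=1 C=0"
    // into key/value pairs (T, ST, ET, W, P, C).
    static Map<String, String> parseLine(String line) {
        Map<String, String> row = new HashMap<>();
        for (String pair : line.trim().split("\\s+")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                row.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return row;
    }

    // Keeps the rows whose word equals `word` (ignoring case) and whose
    // start time is not later than maxStart. Assumes every row has an ST value.
    static List<Map<String, String>> find(List<String> lines, String word, double maxStart) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (String line : lines) {
            Map<String, String> row = parseLine(line);
            if (word.equalsIgnoreCase(row.get("W"))
                    && Double.parseDouble(row.get("ST")) <= maxStart) {
                hits.add(row);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "T=0 ST=0.00 ET=1.54 W=_SILENCE_ P=0.00 C=0",
                "T=2 ST=1.54 ET=1.57 W=HELLO P=1 C=0",
                "T=4 ST=1.70 ET=2.12 W=HOW P=1 C=0");

        // Word and time are checked on the same record, so a HELLO that
        // starts after 1.57 would not be returned.
        System.out.println(find(lines, "HeLLO", 1.57).size()); // 1
    }
}
```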