Re: how to use HTMLStripCharFilter in solrJ?

2018-07-05 Thread Ahmet Arslan
Hi Arturas, 

Here are some things to try :

1) HTMLStripCharFilter stripper = new HTMLStripCharFilter(
       strReader.markSupported() ? strReader : new BufferedReader(strReader));

2) Consider using HTML Strip update processor factory. 

3) Create a custom Lucene analyzer using html strip char filter and white space 
tokenizer. Use the "invoking the analyzer" example given in 
http://lucene.apache.org/core/7_4_0/core/org/apache/lucene/analysis/package-summary.html
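For option 2, a sketch of the solrconfig.xml wiring might look something like the following; the chain name and field name here are illustrative, not from the original mail:

```xml
<updateRequestProcessorChain name="strip-html">
  <!-- Strips HTML markup from the named field before indexing -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">content</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The client would then select the chain with update.chain=strip-html on the update request, which keeps the stripping server-side instead of in the SolrJ client.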

Ahmet



On Thursday, July 5, 2018, 9:53:58 AM GMT+3, Arturas Mazeika 
 wrote:





Hi Solr Folk,

What would be the easiest way to use some of the Solr and Lucene components
in SolrJ?

I am pretty amazed at how much thought and careful engineering went into some
of the individual components to cover the wild real world effectively, and I
wonder whether one could re-use some of them in other contexts.

Essentially, I wanted to strip the HTML code and store the output in Solr
(for various reasons [0]). I approached the problem pragmatically: googled
for "HTMLStripCharFilter and example", got to [1], checked which jar I need
for that (solr-core), googled for the pom dependencies [2], and integrated
this into my SolrJ app:

StringReader strReader = new StringReader(content);
HTMLStripCharFilter stripper = new HTMLStripCharFilter(new BufferedReader(strReader));
StringBuilder o = new StringBuilder();
char[] cbuf = new char[1024 * 10];
while (true) {
    int count = stripper.read(cbuf);
    if (count == -1)
        break; // end-of-stream mark is -1
    if (count > 0)
        o.append(cbuf, 0, count);
}
stripper.close();
doc.addField("content_stripped", o.toString());
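The read loop above is the standard way to drain a Reader. As a stdlib-only sketch of the same pattern (a plain StringReader stands in for the char filter here, so this compiles and runs without Lucene on the classpath):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReaderDrain {

    // Drain any Reader into a String with a fixed-size buffer,
    // the same loop shape used with HTMLStripCharFilter above.
    public static String drain(Reader reader) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] cbuf = new char[1024 * 10];
        while (true) {
            int count = reader.read(cbuf);
            if (count == -1)
                break; // -1 marks end of stream
            if (count > 0)
                out.append(cbuf, 0, count);
        }
        reader.close();
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // With HTMLStripCharFilter wrapped around the reader, the markup
        // would be removed; a bare StringReader just echoes its input.
        System.out.println(drain(new StringReader("hello world"))); // prints: hello world
    }
}
```

Swapping `new StringReader(...)` for `new HTMLStripCharFilter(new BufferedReader(strReader))` gives exactly the loop from the mail.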


The dependencies were downloaded [3], but when I start the program nothing
happens (I have a feeling that a web server is being started).

Comments?

Cheers,
Arturas

References

[0] Reasons may vary from optimizing highlighting of the text for the end
user to exposing oneself to individual components of solr at the deepest
level, analysis of impact to algorithms like machine learning or data
management

[1]
https://www.programcreek.com/java-api-examples/index.php?api=org.apache.lucene.analysis.charfilter.HTMLStripCharFilter

[2] pom.xml:

  <dependencies>
    <dependency>
      <groupId>org.apache.solr</groupId>
      <artifactId>solr-solrj</artifactId>
      <version>7.3.0</version>
    </dependency>

    <dependency>
      <groupId>org.apache.solr</groupId>
      <artifactId>solr-core</artifactId>
      <version>7.3.0</version>
    </dependency>
  </dependencies>

[3]Included Jars:
hppc-0.7.3.jar already exists in destination.
jackson-annotations-2.5.4.jar already exists in destination.
jackson-core-2.5.4.jar already exists in destination.
jackson-databind-2.5.4.jar already exists in destination.
jackson-dataformat-smile-2.5.4.jar already exists in destination.
caffeine-2.4.0.jar already exists in destination.
guava-14.0.1.jar already exists in destination.
protobuf-java-3.1.0.jar already exists in destination.
t-digest-3.1.jar already exists in destination.
commons-cli-1.2.jar already exists in destination.
commons-codec-1.10.jar already exists in destination.
commons-collections-3.2.2.jar already exists in destination.
commons-configuration-1.6.jar already exists in destination.
commons-fileupload-1.3.2.jar already exists in destination.
commons-io-2.5.jar already exists in destination.
commons-lang-2.6.jar already exists in destination.
dom4j-1.6.1.jar already exists in destination.
gmetric4j-1.0.7.jar already exists in destination.
metrics-core-3.2.2.jar already exists in destination.
metrics-ganglia-3.2.2.jar already exists in destination.
metrics-graphite-3.2.2.jar already exists in destination.
metrics-jetty9-3.2.2.jar already exists in destination.
metrics-jvm-3.2.2.jar already exists in destination.
javax.servlet-api-3.1.0.jar already exists in destination.
tools.jar already exists in destination.
joda-time-2.2.jar already exists in destination.
log4j-1.2.17.jar already exists in destination.
eigenbase-properties-1.1.5.jar already exists in destination.
antlr4-runtime-4.5.1-1.jar already exists in destination.
calcite-core-1.13.0.jar already exists in destination.
calcite-linq4j-1.13.0.jar already exists in destination.
avatica-core-1.10.0.jar already exists in destination.
commons-exec-1.3.jar already exists in destination.
commons-lang3-3.6.jar already exists in destination.
commons-math3-3.6.1.jar already exists in destination.
curator-client-2.8.0.jar already exists in destination.
curator-framework-2.8.0.jar already exists in destination.
curator-recipes-2.8.0.jar already exists in destination.
hadoop-annotations-2.7.4.jar already exists in destination.
hadoop-auth-2.7.4.jar already exists in destination.
hadoop-common-2.7.4.jar already exists in destination.
hadoop-hdfs-2.7.4.jar already exists in destination.
htrace-core-3.2.0-incubating.jar already exists in destination.

Re: coord in SolR 7

2018-02-18 Thread Ahmet Arslan
Hi Andreas,

Can weak AND (WAND) be used in your use case?

https://issues.apache.org/jira/browse/LUCENE-8135

Ahmet




On Monday, February 12, 2018, 1:44:38 PM GMT+3, Moll, Dr. Andreas 
 wrote: 





Hi,

I try to upgrade our SolR installation from SolR 5 to 7.
We use a customized similarity class that heavily depends on the coordination 
factor to scale the similarity for OR-queries with multiple terms.
Since SolR 7 this feature has been removed. Is there any hook to implement this 
in our own similarity class with SolR 7?

Best regards

Andreas Moll



Re: Difference between UAX29URLEmailTokenizerFactory and ClassicTokenizerFactory

2017-11-24 Thread Ahmet Arslan


Hi Zheng,

UAX29URLEmailTokenizer recognizes URLs and e-mails. It does not split them;
it keeps each one as a single token.

StandardTokenizer produces two or more tokens for such an entity.

Please try both on the analysis page and use whichever suits your requirements.

Ahmet
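A toy illustration of the difference, using plain java.util.regex rather than the actual Lucene tokenizers (the patterns here are deliberate simplifications, only meant to show why an e-mail-aware tokenizer matters):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailTokenDemo {

    // Naive splitting on non-word characters, roughly what a
    // punctuation-splitting tokenizer does to an e-mail address.
    public static List<String> naiveSplit(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\W+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Email-aware splitting: anything that looks like an e-mail
    // address stays one token; everything else splits into words.
    private static final Pattern TOKEN =
        Pattern.compile("[\\w.+-]+@[\\w.-]+|\\w+");

    public static List<String> emailAware(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }

    public static void main(String[] args) {
        String s = "mail foo.bar@example.com today";
        System.out.println(naiveSplit(s));  // e-mail broken into pieces
        System.out.println(emailAware(s));  // e-mail kept whole
    }
}
```

The second behaviour is what UAX29URLEmailTokenizer provides; the first is closer to what you get when the tokenizer has no URL/e-mail rules.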



On Friday, November 24, 2017, 11:46:57 AM GMT+3, Zheng Lin Edwin Yeo 
 wrote: 





Hi,

I am indexing email addresses into Solr via EML files. Currently, I am
using ClassicTokenizerFactory with LowerCaseFilterFactory. However, I also
found that we can also use UAX29URLEmailTokenizerFactory with
LowerCaseFilterFactory.

Does anyone have any recommendation on which Tokenizer is better?

I am currently using Solr 6.5.1, and planning to upgrade to Solr 7.1.0.

Regards,
Edwin


Re: get all tokens from TokenStream in my custom filter

2017-11-19 Thread Ahmet Arslan
 
Hi Kumar,
I checked the code base and I couldn't find a peek method either. However, I
found LookaheadTokenFilter, which may be useful to you.
I figure this is a Lucene question, and you may receive more answers on
the Lucene user list.
Ahmet
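As a stdlib-only illustration of the buffering idea behind lookahead filtering (this is not the Lucene LookaheadTokenFilter API, just the concept it implements for TokenStreams):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LookaheadBuffer {

    // Buffers tokens from an upstream iterator so a consumer can
    // peek n positions ahead without losing the stream position.
    private final Iterator<String> upstream;
    private final List<String> buffer = new ArrayList<>();

    public LookaheadBuffer(Iterator<String> upstream) {
        this.upstream = upstream;
    }

    // Look at the token n positions ahead (0 = next token), or null.
    public String peek(int n) {
        while (buffer.size() <= n && upstream.hasNext()) {
            buffer.add(upstream.next());
        }
        return n < buffer.size() ? buffer.get(n) : null;
    }

    // Consume and return the next token, or null when exhausted.
    public String next() {
        if (!buffer.isEmpty()) return buffer.remove(0);
        return upstream.hasNext() ? upstream.next() : null;
    }

    public static void main(String[] args) {
        LookaheadBuffer lb =
            new LookaheadBuffer(Arrays.asList("a", "b", "c").iterator());
        System.out.println(lb.peek(1)); // prints: b
        System.out.println(lb.next());  // prints: a (peek did not consume)
        System.out.println(lb.next());  // prints: b
    }
}
```

In a real filter the buffered items would be captured attribute states rather than strings, but the peek/consume discipline is the same.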


On Sunday, November 19, 2017, 10:16:21 PM GMT+3, kumar gaurav 
<kg2...@gmail.com> wrote:  
 
Hi friends,
Thank you very much for your replies.
I still could not solve the problem.
Emir, I need all tokens of the query inside the incrementToken() function,
not only the current token.
Modassar, if I do not end or close the stream, all tokens are blank and
only the last token is indexed.
Ahmet, I could not find a peek or advance method :(

Please help me, guys.

On Fri, Nov 17, 2017 at 10:10 PM, Ahmet Arslan <iori...@yahoo.com> wrote:

Hi Kumar,
If I am not wrong, I think there is a method named something like peek(2) or
advance(2). Some filters access tokens ahead and perform some logic.
Ahmet

On Wednesday, November 15, 2017, 10:50:55 PM GMT+3, kumar gaurav
<kg2...@gmail.com> wrote:

Hi

I need to get the full field value from a TokenStream in my custom filter class.

I am using this:

stream.reset();
while (tStream.incrementToken()) {
    term += " " + charTermAttr.toString();
}
stream.end();
stream.close();

This ends the stream; no tokens are produced if I use this.

I want to get the full string without hampering token creation.

Eric! Are you there? :)  Anyone, please help?


Re: get all tokens from TokenStream in my custom filter

2017-11-17 Thread Ahmet Arslan
Hi Kumar,
If I am not wrong, I think there is a method named something like peek(2) or
advance(2). Some filters access tokens ahead and perform some logic.
Ahmet

On Wednesday, November 15, 2017, 10:50:55 PM GMT+3, kumar gaurav
wrote:
 
Hi

I need to get the full field value from a TokenStream in my custom filter class.

I am using this:

stream.reset();
while (tStream.incrementToken()) {
    term += " " + charTermAttr.toString();
}
stream.end();
stream.close();

This ends the stream; no tokens are produced if I use this.

I want to get the full string without hampering token creation.

Eric! Are you there? :)  Anyone, please help?

Re: Keeping the index naturally ordered by some field

2017-10-01 Thread Ahmet Arslan


Hi Alex,

Lucene has this capability (borrowed from Nutch) under the
org.apache.lucene.index.sorter package. I think it has been integrated into
Solr, but I could not find the Jira issue.

Ahmet
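In Solr this surfaces as SortingMergePolicyFactory in solrconfig.xml. A hedged sketch of the configuration (the sort field and the wrapped policy here are illustrative):

```xml
<indexConfig>
  <!-- Keep each merged segment internally sorted by the given field -->
  <mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory">
    <str name="sort">timestamp desc</str>
    <str name="wrapped.prefix">inner</str>
    <str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>
  </mergePolicyFactory>
</indexConfig>
```

Note this only sorts documents within each segment; results at search time can still interleave across segments, which matches the 'jumps' Alex describes below.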
 
 
 On Sunday, October 1, 2017, 10:22:45 AM GMT+3, alexpusch  
wrote: 





Hello,
We've got a pretty big index (~1B small docs). I'm interested in managing
the index so that the search results would be naturally sorted by a certain
numeric field, without specifying the actual sort field in query time.

My first attempt was using SortingMergePolicyFactory. I've found that this
provides only partial success. The results were occasionally sorted, but
overall there were 'jumps' in the ordering.

After some research I found an excellent blog post that taught me that
TieredMergePolicy merges non-consecutive segments, and thus creates several
segments with interlacing ordering. I've tried replacing the merge policy
with LogByteSizeMergePolicy, but results are still inconsistent.

The post is from 2011, and it's not clear to me whether today
LogByteSizeMergePolicy merges only consecutive segments, or whether it can
merge non-consecutive segments as well.

Is there an approach that will allow me to achieve this goal?

Solr version: 6.0

Thanks, Alex.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Help with Query/Function for conditional boost

2017-08-16 Thread Ahmet Arslan
Hi Shamik,

I believe the 5-arg map function can be used here. Here is a link that may
inspire you:
http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/

Ahmet
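For instance, with edismax something along these lines might work; this is a sketch only, and the multipliers and range bounds are illustrative:

```
boost=map(termfreq(source,'help'),1,10000,10,1)
```

That is, when termfreq(source,'help') falls in [1,10000] (the term is present), the score is multiplied by 10, otherwise by 1. An equivalent, arguably clearer form would be if(termfreq(source,'help'),10,1).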





On Wednesday, August 16, 2017, 11:06:28 PM GMT+3, Shamik Bandopadhyay 
 wrote:


Hi,

  I'm trying to create a function to dynamically boost a field based
on specific values of another searchable field. Here's an example:

I've the following query fields with default boost.

qf=text^2 title^4 command^8

Also, there's a default boost on the source field.

bq=source:help^10 source:forum^5

Among the searchable fields, command gets the highest preference. On top of
that, I would like to boost results from source help further when the
query term exists in the command field. With my current settings, documents
from forum appear at the top when a search term is found in the command
field. Increasing the boost on source:help didn't make any difference.

Just wondering if it's possible to write a function which will
conditionally boost command field for documents tagged with source=help

if(termfreq(source,'help'), command^8, command^1)

The above function is just for reference to show what I'm trying to achieve.

Any pointers will be helpful.

Thanks,
Shamik

Re: QueryParser changes query by itself

2017-08-15 Thread Ahmet Arslan
Hi Bernd,

In LUCENE-3758, a new member field was added to the ComplexPhraseQuery class,
but we didn't change its hashCode method accordingly. This caused anomalies in
Solr; Yonik found the bug and fixed hashCode. Your e-mail somehow reminded me
of this.
Could it be the QueryCache and the hashCode implementation of the Query
subclasses? Maybe your good and bad examples produce the same hashCode, and
this is confusing the query cache in Solr?
Can you disable the query cache to test it?
By the way, which query parser are you using? I believe SynonymQuery is
produced by BM25 similarity, right?

Ahmet


On Friday, August 11, 2017, 2:48:07 PM GMT+3, Bernd Fehling 
 wrote:


We just noticed a very strange problem with the Solr 6.4.2 QueryParser.
The QueryParser changes the query by itself from time to time.
This happens when reloading a search request several times at a higher rate.

Good example:
...
textth:waffenhandel
  
...
textth:waffenhandel
textth:waffenhandel
  +SynonymQuery(Synonym(textth:"arms sales" 
textth:"arms trade"...
  +Synonym(textth:"arms sales" textth:"arms 
trade"...


Bad example:
...
textth:waffenhandel
  
...
textth:waffenhandel
textth:waffenhandel
  +textth:rss
  +textth:rss

As you can see in the bad example, after several reloads the parsed query
changed to the term "rss". But the original query string has no "rss"
substring at all. That is really strange.

Anyone seen this before?

Single index, Solr 6.4.2.

Regards
Bernd

Re: RE: Comparison of Solr with Sharepoint Search

2017-08-14 Thread Ahmet Arslan
Hi,
https://manifoldcf.apache.org is used to crawl content from SharePoint and
index it into Solr.

Ahmet


On Monday, August 14, 2017, 9:05:20 PM GMT+3, jmahuang  
wrote:


Sir,

Can SOLR search existing SharePoint document libraries and lists?

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Comparison-of-Solr-with-Sharepoint-Search-tp498534p4350502.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Token "states" not getting lemmatized by Solr?

2017-08-10 Thread Ahmet Arslan
Hi Omer,
Your analysis chain does not include a stem filter (lemmatizer).
Assuming you are dealing with English text, you can use KStemFilterFactory or
SnowballPorterFilterFactory.
Ahmet
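A hedged sketch of what adding a stemmer to the chain could look like in the schema (the fieldType name is illustrative; adjust the rest of the chain to match your existing text_general definition):

```xml
<fieldType name="text_en_stem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- KStem is a light English stemmer, e.g. states -> state -->
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
```

After changing the field type you would need to reindex, since stemming is applied at index time as well as query time.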


On Thursday, August 10, 2017, 9:33:08 PM GMT+3, OTH  
wrote:


Hi,

Regarding 'analysis chain':

I'm using Solr 6.4.1, and in the managed-schema file, I find the following:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>


Regarding the Admin UI >> Analysis page:  I just tried that, and to be
honest, I can't seem to get much useful info out of it, especially in terms
of lemmatization.

For example, for any text I enter in it to "analyse", all it does is seem
to tell me which analysers (if that's the right term?) are being used for
the selected field / fieldtype, and for each of these analyzers, it would
give some very basic info, like text, raw_bytes, etc.  Eg, for the input
"united" in the "field value (index)" box, having "text_general" selected
for fieldtype, all I get is this:

ST   text=united  raw_bytes=[75 6e 69 74 65 64]  start=0  end=6  positionLength=1  type=<ALPHANUM>  position=1
SF   text=united  raw_bytes=[75 6e 69 74 65 64]  start=0  end=6  positionLength=1  type=<ALPHANUM>  position=1
LCF  text=united  raw_bytes=[75 6e 69 74 65 64]  start=0  end=6  positionLength=1  type=<ALPHANUM>  position=1

Placing the mouse cursor on "ST", "SF", or "LCF" shows a tooltip saying
"org.apache.lucene.analysis.standard.StandardTokenizer",
"org...core.StopFilter", and "org...core.LowerCaseFilter", respectively.


So - should 'states' not be lemmatized to 'state' using these settings?
(If not, then I would need to figure out how to use a different lemmatizer)

Thanks

On Thu, Aug 10, 2017 at 10:28 PM, Erick Erickson 
wrote:

> saying the field is "text_general" is not sufficient, please post the
> analysis chain defined in your schema.
>
> Also the admin UI>>analysis page will help you figure out exactly what
> part of the analysis chain does what.
>
> Best,
> Erick
>
> On Thu, Aug 10, 2017 at 8:37 AM, OTH  wrote:
> > Hello,
> >
> > It seems for me that the token "states" is not getting lemmatized to
> > "state" by Solr.
> >
> > Eg, I have a document with the value "united states of america".
> > This document is not returned when the following query is issued:
> > q=name:state^1+name:america^1+name:united^1
> > However, all documents which contain the token "state" are indeed
> returned,
> > with the above query.
> > The "united states of america" document is returned if I change "state"
> in
> > the query to "states"; so:
> > q=name:states^1+name:america^1+name:united^1
> >
> > At first I thought maybe the lemmatization isn't working for some reason.
> > However, when I changed "united" in the query to "unite", then it did
> still
> > return the "united states of america" document:
> > q=name:states^1+name:america^1+name:unite^1
> > Which means that the lemmatization is working for the token "united", but
> > not for the token "states".
> >
> > The "name" field above is defined as "text_general".
> >
> > So it seems to me, that perhaps the default Solr lemmatizer does not
> > lemmatize "states" to "state"?
> > Can anyone confirm if this is indeed the expected behaviour?
> > And what can I do to change it?
> > If I need to put in a customer lemmatizer, then what would be the (best)
> > way to do that?
> >
> > Much thanks
> > Omer
>

Re: Indexing a CSV that contains double quotes

2017-08-07 Thread Ahmet Arslan
Hi Devon,
I mean this:
curl 'http://10.0.1.24:8983/solr/products/update?commit=true&encapsulator=%22'
--data-binary @solrItmList.csv -H 'Content-type:application/csv'
Ahmet

On Monday, August 7, 2017, 9:00:13 PM GMT+3, O'Shaughnessy, Devon 
<dev...@ulfoods.com> wrote:

Hi Ahmet,

I'm afraid I don't understand, do you think you could clarify a little bit?

Thanks,

Devon O'Shaughnessy
Developer/Analyst
Upper Lakes Foods
p: 800.879.1265 | ext: 4135
w: upperlakesfoods.com
 
From: Ahmet Arslan <iori...@yahoo.com.INVALID>
Sent: Monday, August 7, 2017 12:07:35 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing a CSV that contains double quotes

Hi Devon,
I think you need to supply the encapsulator=" parameter-value pair.
Ahmet

On Monday, August 7, 2017, 7:57:45 PM GMT+3, O'Shaughnessy, Devon
<dev...@ulfoods.com> wrote:

Hello all,

I'm pretty new at Solr, having only worked with it for a couple of weeks, and
I'm guessing I'm having a newbie problem of some sort. I'm a little confused
about how Solr works with double quotes within strings. I'm uploading a CSV to
Solr once a day containing some item data, some of which contains quotes, and
I'm getting some errors. I'll do my best to explain my problem.

Here is my schema:

(field definitions stripped by the mail archive)

The command I am using to update the data:

curl 'http://10.0.1.24:8983/solr/products/update?commit=true' --data-binary
@solrItmList.csv -H 'Content-type:application/csv'

This is the error I receive in response:

(screenshot not included) If for some reason the image doesn't show, it's an
XML response indicating an IOException with the message "CSVLoader: input=null,
line=2014, can't read line: 2013 values={NO LINES AVAILABLE}" and a code of 400.

In the solr.log file, the java.io.IOException is explained further:

"(line 2013) invalid char between encapsulated token and delimiter"

Here is an example of the data coming from the CSV that is giving me trouble.

(Headings at the top of the CSV)

Item Number,Item Description,Item Combined,Item Status,Item Cat1,Cat1
Description,Item Cat2,Cat2 Description,Item Cat3,Cat3 Description,Keywords

(Specific entry that Solr stops at.)

152600,YOGURT "PARFAIT PRO" LF,152600 YOGURT "PARFAIT PRO"
LF,A,1002,Dairy,2231,Yogurt,11408,Yogurt Bulk,"PARFAIT INC FAT FOODS FREE GF
GLUTEN INC LOW MILL MILLS PARFAIT PRO PRO" SMART SNACK VANILLA VAQNILLA YOGURT

Notice the double quotes in Item Description, Item Combined, and Keywords.

So the strange thing is, if I remove the Keywords field from the schema and
generate a CSV that does not include the Keywords data, but otherwise make no
other changes, the data loads just fine, even though there are still double
quotes in the Item Description and Item Combined fields.

I know there shouldn't be any double quotes in the data, which I am working on
getting rectified, but I'm just wondering: why is this an issue with one of my
fields but not others, seeing as they have the same data type?

Wow, this email ended up really long for such a simple question! Any
enlightenment would be much appreciated.

Thanks,

Devon O'Shaughnessy
Developer/Analyst
Upper Lakes Foods
p: 800.879.1265 | ext: 4135
w: upperlakesfoods.com



Re: Indexing a CSV that contains double quotes

2017-08-07 Thread Ahmet Arslan
Hi Devon,
I think you need to supply encapsulator=" parameter-value pair.
Ahmet
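Spelled out, the suggestion is to pass the encapsulator parameter on the update request, URL-encoding the double-quote character, along the lines of:

```
curl 'http://10.0.1.24:8983/solr/products/update?commit=true&encapsulator=%22' \
  --data-binary @solrItmList.csv -H 'Content-type:application/csv'
```

This tells Solr's CSV loader which character wraps quoted field values, so embedded quotes inside a field are parsed consistently.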


On Monday, August 7, 2017, 7:57:45 PM GMT+3, O'Shaughnessy, Devon 
wrote:

Hello all,

I'm pretty new at Solr, having only worked with it for a couple of weeks, and
I'm guessing I'm having a newbie problem of some sort. I'm a little confused
about how Solr works with double quotes within strings. I'm uploading a CSV to
Solr once a day containing some item data, some of which contains quotes, and
I'm getting some errors. I'll do my best to explain my problem.

Here is my schema:

(field definitions stripped by the mail archive)

The command I am using to update the data:

curl 'http://10.0.1.24:8983/solr/products/update?commit=true' --data-binary
@solrItmList.csv -H 'Content-type:application/csv'

This is the error I receive in response:

(screenshot not included) If for some reason the image doesn't show, it's an
XML response indicating an IOException with the message "CSVLoader: input=null,
line=2014, can't read line: 2013 values={NO LINES AVAILABLE}" and a code of 400.

In the solr.log file, the java.io.IOException is explained further:

"(line 2013) invalid char between encapsulated token and delimiter"

Here is an example of the data coming from the CSV that is giving me trouble.

(Headings at the top of the CSV)

Item Number,Item Description,Item Combined,Item Status,Item Cat1,Cat1
Description,Item Cat2,Cat2 Description,Item Cat3,Cat3 Description,Keywords

(Specific entry that Solr stops at.)

152600,YOGURT "PARFAIT PRO" LF,152600 YOGURT "PARFAIT PRO"
LF,A,1002,Dairy,2231,Yogurt,11408,Yogurt Bulk,"PARFAIT INC FAT FOODS FREE GF
GLUTEN INC LOW MILL MILLS PARFAIT PRO PRO" SMART SNACK VANILLA VAQNILLA YOGURT

Notice the double quotes in Item Description, Item Combined, and Keywords.

So the strange thing is, if I remove the Keywords field from the schema and
generate a CSV that does not include the Keywords data, but otherwise make no
other changes, the data loads just fine, even though there are still double
quotes in the Item Description and Item Combined fields.

I know there shouldn't be any double quotes in the data, which I am working on
getting rectified, but I'm just wondering: why is this an issue with one of my
fields but not others, seeing as they have the same data type?

Wow, this email ended up really long for such a simple question! Any
enlightenment would be much appreciated.

Thanks,

Devon O'Shaughnessy
Developer/Analyst
Upper Lakes Foods
p: 800.879.1265 | ext: 4135
w: upperlakesfoods.com


Re: Highlighting words with special characters

2017-07-19 Thread Ahmet Arslan
Hi,
Maybe the name of UAX29URLEmailTokenizer is deceiving you? It does *not*
tokenize URLs and e-mails. Actually it recognises them and emits each as a
single token.
Ahmet

On Wednesday, July 19, 2017, 12:00:05 PM GMT+3, Lasitha Wattaladeniya 
 wrote:

Update,

I changed the UAX29URLEmailTokenizerFactory to StandardTokenizerFactory and
now it shows highlighted text fragments in the indexed email text.

But I don't understand this behavior. Can someone shed some light please

On 18 Jul 2017 14:18, "Lasitha Wattaladeniya"  wrote:

> Further more, ngram field has following tokenizer/filter chain in index
> and query
>
> UAX29URLEmailTokenizerFactory (only in index)
> stopFilterFactory
> LowerCaseFilterFactory
> ASCIIFoldingFilterFactory
> EnglishPossessiveFilterFactory
> StemmerOverrideFilterFactory (only in query)
> NgramTokenizerFactory (only in index)
>
> Regards,
> Lasitha
>
> On 18 Jul 2017 14:11, "Lasitha Wattaladeniya"  wrote:
>
>> Hi devs,
>>
>> I have set up Solr highlighting with the default setup (only changed the
>> fragsize to 0 to match any field length). It worked fine, but recently I
>> discovered it doesn't highlight words with special characters in the
>> middle.
>>
>> For an example, let's say I have indexed email address test.f...@ran.com
>> to a ngram field. And when I search for the partial text fsdg, I get the
>> results but it's not highlighted. It works in all other scenarios as
>> expected.
>>
>> The ngram field has termVectors, termPositions, termOffsets set to true.
>>
>> Can somebody please suggest me, what may be wrong here?
>>
>> (sorry for the unstructured text. Typed using a mobile phone )
>>
>> Regards
>> Lasitha
>>
>


Re: Solr Analyzer for Vietnamese

2017-07-13 Thread Ahmet Arslan

Hi Eirik,
I believe the "ICU tokenizer" does a decent job on text written in
non-alphabetic scripts.
Ahmet

On Monday, May 22, 2017, 10:32:22 AM GMT+3, Eirik Hungnes 
 wrote:


Hi,

There doesn't seem to be any Tokenizer / Analyzer for Vietnamese built in
to Lucene at the moment. Does anyone know if something like this exists
today or is planned for? We found this
https://github.com/CaoManhDat/VNAnalyzer made by Cao Manh Dat, but not sure
if it's up to date. Any info highly appreciated!

Thanks,

Eirik

Re: How to get field names of dynamic field

2017-04-14 Thread Ahmet Arslan
Hi Midas,

LukeRequestHandler shows that information.

Ahmet

On Friday, April 14, 2017, 1:16:09 PM GMT+3, Midas A  
wrote:
Actually, I am looking for an API.

On Fri, Apr 14, 2017 at 3:36 PM, Andrea Gazzarini  wrote:

> I can see those names in the "Schema browser" of the admin UI, so I guess
> using the (Lucene?) API it shouldn't be hard to get this info.
>
> I don't know if the schema API (or some other service) offers this.
>
> Andrea
>
> On 14 Apr 2017 10:03, "Midas A"  wrote:
>
> > Hi,
> >
> >
> > Can I get all the fields created for a dynamic field in Solr?
> >
> > For example,
> > my dynamic field is by_*
> >
> > and I have indexed
> > by_color
> > by_size
> > etc.
> >
> > I want to retrieve all these field names.
> > Is there any way to do this based on some query?
> >
>

Re: KeywordTokenizer and multiValued field

2017-04-12 Thread Ahmet Arslan
I don't understand the first option: what is "each value"? KeywordTokenizer
emits a single token, analogous to the string type.



On Wednesday, April 12, 2017, 7:45:52 PM GMT+3, Walter Underwood 
 wrote:
Does the KeywordTokenizer make each value into a unitary string or does it take 
the whole list of values and make that a single string?

I really hope it is the former. I can’t find this in the docs (including 
JavaDocs).

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


Re: Filtering results by minimum relevancy score

2017-04-12 Thread Ahmet Arslan
Hi,
I cannot find it. However, it should be something like:
q=hello&fq={!frange l=0.5}query($q)

Ahmet
On Wednesday, April 12, 2017, 10:07:54 PM GMT+3, Ahmet Arslan 
<iori...@yahoo.com.INVALID> wrote:
Hi David,
A function query named "query" returns the score for the given subquery.
Combined with the frange query parser this is possible. I tried it in the
past. I am searching for the original post. I think it was Yonik's post.
https://cwiki.apache.org/confluence/display/solr/Function+Queries


Ahmet


On Wednesday, April 12, 2017, 9:45:17 PM GMT+3, David Kramer 
<david.kra...@shoebuy.com> wrote:
The idea is to not return poorly matching results, not to limit the number of 
results returned.  One query may have hundreds of excellent matches and another 
query may have 7. So cutting off by the number of results is trivial but not 
useful.

Again, we are not doing this for performance reasons. We’re doing this because 
we don’t want to show products that are not very relevant to the search terms 
specified by the user for UX reasons.

I had hoped that the responses would have been more focused on "it can't be
done" or "here's how to do it" than "you don't want to do it". I'm still left
not knowing whether it's even possible. The one concrete answer of using
frange doesn't help, as referencing score in either the q or the fq produces
an "undefined field" error.

Thanks.

On 4/11/17, 8:59 AM, "Dorian Hoxha" <dorian.ho...@gmail.com> wrote:

    Can't the filter be used in cases when you're paginating in
    sharded-scenario ?
    So if you do limit=10, offset=10, each shard will return 20 docs ?
    While if you do limit=10, _score<=last_page.min_score, then each shard will
    return 10 docs ? (they will still score all docs, but merging will be
    faster)
    
    Makes sense ?
    
    On Tue, Apr 11, 2017 at 12:49 PM, alessandro.benedetti <a.benede...@sease.io
    > wrote:
    
    > Can i ask what is the final requirement here ?
    > What are you trying to do ?
    >  - just display less results ?
    > you can easily do at search client time, cutting after a certain amount
    > - make search faster returning less results ?
    > This is not going to work, as you need to score all of them as Erick
    > explained.
    >
    > Function query ( as Mikhail specified) will run on a per document basis (
    > if
    > I am correct), so if your idea was to speed up the things, this is not
    > going
    > to work.
    >
    > It makes much more sense to refine your system to improve relevancy if 
your
    > concern is to have more relevant docs.
    > If your concern is just to not show that many pages, you can limit that
    > client side.
    >
    >
    >
    >
    >
    >
    > -
    > ---
    > Alessandro Benedetti
    > Search Consultant, R&D Software Engineer, Director
    > Sease Ltd. - www.sease.io
    > --
    > View this message in context: http://lucene.472066.n3.
    > nabble.com/Filtering-results-by-minimum-relevancy-score-
    > tp4329180p4329295.html
    > Sent from the Solr - User mailing list archive at Nabble.com.
    >
    

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Ahmet Arslan
Hi David,
A function query named "query" returns the score for the given subquery.
Combined with the frange query parser this is possible. I tried it in the
past. I am searching for the original post. I think it was Yonik's post.
https://cwiki.apache.org/confluence/display/solr/Function+Queries


Ahmet


On Wednesday, April 12, 2017, 9:45:17 PM GMT+3, David Kramer 
 wrote:
The idea is to not return poorly matching results, not to limit the number of 
results returned.  One query may have hundreds of excellent matches and another 
query may have 7. So cutting off by the number of results is trivial but not 
useful.

Again, we are not doing this for performance reasons. We’re doing this because 
we don’t want to show products that are not very relevant to the search terms 
specified by the user for UX reasons.

I had hoped that the responses would have been more focused on “it can’t be 
done” or “here’s how to do it” than “you don’t want to do it”.  I’m still left 
not knowing if it’s even possible. The one concrete answer of using frange 
doesn’t help as referencing score in either the q or the fq produces an 
“undefined field” error.

Thanks.

On 4/11/17, 8:59 AM, "Dorian Hoxha"  wrote:

    Can't the filter be used in cases when you're paginating in
    sharded-scenario ?
    So if you do limit=10, offset=10, each shard will return 20 docs ?
    While if you do limit=10, _score<=last_page.min_score, then each shard will
    return 10 docs ? (they will still score all docs, but merging will be
    faster)
    
    Makes sense ?
    
    On Tue, Apr 11, 2017 at 12:49 PM, alessandro.benedetti  wrote:
    
    > Can i ask what is the final requirement here ?
    > What are you trying to do ?
    >  - just display less results ?
    > you can easily do at search client time, cutting after a certain amount
    > - make search faster returning less results ?
    > This is not going to work, as you need to score all of them as Erick
    > explained.
    >
    > Function query ( as Mikhail specified) will run on a per document basis (
    > if
    > I am correct), so if your idea was to speed up the things, this is not
    > going
    > to work.
    >
    > It makes much more sense to refine your system to improve relevancy if 
your
    > concern is to have more relevant docs.
    > If your concern is just to not show that many pages, you can limit that
    > client side.
    >
    >
    >
    >
    >
    >
    > -
    > ---
    > Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
    > Sease Ltd. - www.sease.io
    > --
> View this message in context: http://lucene.472066.n3.nabble.com/Filtering-results-by-minimum-relevancy-score-tp4329180p4329295.html
    > Sent from the Solr - User mailing list archive at Nabble.com.
    >
    


Re: Filtering results by minimum relevancy score

2017-04-10 Thread Ahmet Arslan
Hi,
I remember that this is possible via frange query parser.But I don't have the 
query string at hand.
Ahmet
On Monday, April 10, 2017, 9:00:09 PM GMT+3, David Kramer 
 wrote:
I’ve done quite a bit of searching on this.  Pretty much every page I find says 
it’s a bad idea and won’t work well, but I’ve been asked to at least try it to 
reduce the number of completely unrelated results returned.  We are not trying 
to normalize the number, or display it as a percentage, and I understand why 
those are not mathematically sound.  We are relying on Solr for pagination, so 
we can’t just filter out low scores from the results.

I had assumed that you could use score in the filter query, but that doesn’t 
appear to be the case.  Is there a special way to reference it, or is there 
another way to attack the problem?  It seems like something that should be 
allowed and possible.

Thanks.

Re: How on EARTH do I remove 's in schema file?

2017-03-19 Thread Ahmet Arslan
Hi Donato,

How about using ApostropheFilterFactory ?


http://lucene.apache.org/core/6_4_2/analyzers-common/org/apache/lucene/analysis/tr/ApostropheFilter.html

Ahmet

On Sunday, March 19, 2017 4:08 PM, donato  wrote:



Then why is it not working? It doesn't make sense at all? And in the Tag field, 
it appears NOTHING is happening... It states something about a DefaultAnalyzer?

I am seriously at a loss... this seems like a simple solution that shouldn't be 
this hard! And it should be somewhat expected.

Why is the Tag field not using the analyzer and filters I have in place?



From: John Blythe [via Lucene] 
Sent: Sunday, March 19, 2017 9:04:29 AM
To: donato
Subject: Re: How on EARTH do I remove 's in schema file?

StandardTokenizer IS removing it. The token you see in each line is what is
passed _in_ to the tokenizer. The next line shows what came out.

On Sun, Mar 19, 2017 at 9:00 AM donato <[hidden 
email]> wrote:

> And here is my most recent schema.xml file after your suggestions... *
>  DOWNLOAD HERE*
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-on-EARTH-do-I-remove-s-in-schema-file-tp4325709p4325841.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
--
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | [hidden email]
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713







--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-on-EARTH-do-I-remove-s-in-schema-file-tp4325709p4325843.html

Sent from the Solr - User mailing list archive at Nabble.com.


Re: Distinguish exact match from wildcard match

2017-03-02 Thread Ahmet Arslan
Hi,

how about q=code_text:bolt*=code_text:bolt

Ahmet

On Thursday, March 2, 2017 4:41 PM, Сергей Твердохлеб  
wrote:



Hi,

is there way to separate exact match from wildcard match in solr response?
e.g. there are two documents: {code_text:bolt} and {code_text:bolter}. When
I search for "bolt" I want to get both results, but somehow grouped, so I
can determine either it was found with exact or non-exact match.

Thanks.

-- 
Regards,
Sergey Tverdokhleb


Re: CPU Intensive Scoring Alternatives

2017-02-21 Thread Ahmet Arslan
Hi,

The new default similarity is BM25. 
Maybe explicitly set the similarity back to TF-IDF and see how it goes?

Ahmet


On Tuesday, February 21, 2017 4:28 AM, Fuad Efendi  wrote:
Hello,


Default TF-IDF performs poorly with the indexed 200 millions documents.
Query "Michael Jackson" may run 300ms, and "Michael The Jackson" over 3
seconds. eDisMax. Because default operator "OR" and stopword "The" we have
50-70 millions documents as a query result, and scoring is CPU intensive.
What to do? Our typical queries return over million documents, and response
times of simple queries ranges from 50 milliseconds to 5-10 seconds
depending on result set.

This was just an exaggerated example with stopword “the”, but even simplest
query “Michael Jackson” runs 300ms instead of 3ms just because huge number
of hits and TF-IDF calculations. Solr 6.3.


Thanks,

--

Fuad Efendi

(416) 993-2060

http://www.tokenizer.ca
Search Relevancy, Recommender Systems 


Re: Stemming and accents

2017-02-10 Thread Ahmet Arslan
Hi,

I have experimented before, and found that Snowball is sensitive to 
accents/diacritics.
Please see for more details: 
http://www.sciencedirect.com/science/article/pii/S0306457315001053

Ahmet



On Friday, February 10, 2017 11:27 AM, Dominique Bejean 
 wrote:
Hi,

Is the SnowballPorterFilter sensitive to the accents for French for
instance ?

If I use both SnowballPorterFilter and ASCIIFoldingFilter, do I have to
configure ASCIIFoldingFilter after SnowballPorterFilter  ?

Regards.

Dominique
-- 
Dominique Béjean
06 08 46 12 43 


Re: Dismax query special characters

2017-01-29 Thread Ahmet Arslan
Hi,

I don't think dismax recognizes AND OR.
Special characters for dismax are + - and quotes.

In your example, the ampersand may be causing you trouble, due to URL encoding... 
Ahmet

On Sunday, January 29, 2017 12:17 AM, Jarosław Grązka 
 wrote:



Hi,

Reading Solr documentation about dismax query
https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser i
understood dismax query parser can interpret following special chars:
AND,OR,+,-,quotes (for phrases) and should ignore all others like
||,NOT,&&,~,^ etc and treat them as simple strings.

But when i try query as follows:
{
  "limit" : 10,
  params:{
  defType:"dismax",
  q:"Difference && Java",
  q.op:"OR",
  qf:"body",
  indent: "on"
  }
}
This && operator works as AND.

I also got exceptions for this query:
{
  "limit" : 10,
  params:{
  defType:"dismax",
  q:"Difference && Java NOT",
  q.op:"OR",
  qf:"body",
  indent: "on"
  }
}

Did I misunderstand something? Shouldn't it treat 'NOT' as just a String?


Re: Empty Highlight Problem - Solr 6.3.0

2016-12-24 Thread Ahmet Arslan
Hi,

Did you try increasing hl.maxAnalyzedChars ?
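For reference, a request raising that window might look like the sketch below (field names and the value are assumptions; by default highlighting only analyzes the first 51200 characters of a field, which is why hits deep inside large PDF bodies can produce empty snippets):

```python
# Sketch: highlighting parameters with an enlarged analysis window so
# matches far into a long document body can still yield snippets.
params = {
    "q": "content:keyword",
    "hl": "true",
    "hl.fl": "content",
    "hl.maxAnalyzedChars": 1_000_000,  # default is 51200
}
```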

Ahmet



On Friday, December 23, 2016 10:47 PM, Furkan KAMACI  
wrote:
Hi All,

I'm trying highlighter component at Solr 6.3. I have a problem when I index
PDF files. I know that given keyword exists at result document (it is
returned as result because of a hit at document as well), highlighting
field is empty at response.

I'm suspicious about it happens documents which has large content. How can
I solve this problem. I've tried Standard Highlighter and FastVector
Highlighter (termVectors, termPositions, and termOffsets are enabled for hl
fields) but result is same?

Kind Regards,
Furkan KAMACI


Re: Stemming with SOLR

2016-12-15 Thread Ahmet Arslan
Hi,

KStemFilter returns legitimate English words, please use it.

Ahmet



On Thursday, December 15, 2016 6:17 PM, Lasitha Wattaladeniya 
 wrote:
Hello devs,

I'm trying to develop this indexing and querying flow where it converts the
words to its original form (lemmatization). I was doing bit of research
lately but the information on the internet is very limited. I tried using
hunspellfactory but it doesn't convert the word to it's original form,
instead it gives suggestions for some words (hunspell works for some
english words correctly but for some it gives multiple suggestions or no
suggestions, i used the en_us.dic provided by openoffice)

I know this is a generic problem in searching, so is there anyone who can
point me to correct direction or some information :)

Best regards,
Lasitha Wattaladeniya
Software Engineer

Mobile : +6593896893
Blog : techreadme.blogspot.com


Re: Searching for a term which isn't a part of an expression

2016-12-15 Thread Ahmet Arslan
Hi,


Span query family would be a pure query-time solution, SpanNotQuery in 
particular.


SpanQuery include = new SpanTermQuery(new Term(FIELD, "world"));


SpanQuery exclude = new SpanNearQuery(new SpanQuery[] {
        new SpanTermQuery(new Term(FIELD, "hello")),
        new SpanTermQuery(new Term(FIELD, "world"))},
    0,      // slop
    true);  // in order

SpanQuery finalQuery = new SpanNotQuery(include, exclude);
This finalQuery is supposed to retrieve documents that have the term "world" but 
not as a part of "hello world".



Is your list of phrases query dependent? If yes how many phrases per-query?

Or you have a global list of phrases?

Ahmet

On Thursday, December 15, 2016 10:32 AM, Dean Gurvitz <dea...@gmail.com> wrote:



Hi,
The list of phrases wil be relatively dynamic, so changing the indexing
process isn't a very good solution for us.

We also considered using a PostFilter or adding a SearchComponent to filter
out the "bad" results, but obviously a true query-time support would be a
lot better.



On Wed, Dec 14, 2016 at 10:52 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
wrote:

> Hi,
>
> Do you have a common list of phrases that you want to prohibit partial
> match?
> You can index those phrases in a special way, for example,
>
> This is a new world hello_world hot_dog tap_water etc.
>
> ahmet
>
>
> On Wednesday, December 14, 2016 9:20 PM, deansg <dea...@gmail.com> wrote:
> We would like to enable queries for a specific term that doesn't appear as
> a
> part of a given expression. Negating the expression will not help, as we
> still want to return items that contain the term independently, even if
> they
> contain full expression as well.
> For example, we would like to search for items that have the term "world"
> but not as a part of "hello world". If the text is: "This is a new world.
> Hello world", we would still want to return the item, as "world" appears
> independently as well as a part of "Hello world". However, we will not want
> to return items that only have the expression "hello world" in them.
> Does Solr support these types of queries? We thought about using regex, but
> since the text is tokenized I don't think that will be possible.
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Searching-for-a-term-which-isn-t-a-part-of-an-expression-tp4309746.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Searching for a term which isn't a part of an expression

2016-12-14 Thread Ahmet Arslan
Hi,

Do you have a common list of phrases that you want to prohibit partial match?
You can index those phrases in a special way, for example,

This is a new world hello_world hot_dog tap_water etc.

ahmet


On Wednesday, December 14, 2016 9:20 PM, deansg  wrote:
We would like to enable queries for a specific term that doesn't appear as a
part of a given expression. Negating the expression will not help, as we
still want to return items that contain the term independently, even if they
contain full expression as well.
For example, we would like to search for items that have the term "world"
but not as a part of "hello world". If the text is: "This is a new world.
Hello world", we would still want to return the item, as "world" appears
independently as well as a part of "Hello world". However, we will not want
to return items that only have the expression "hello world" in them.
Does Solr support these types of queries? We thought about using regex, but
since the text is tokenized I don't think that will be possible.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Searching-for-a-term-which-isn-t-a-part-of-an-expression-tp4309746.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Unicode Character Problem

2016-12-10 Thread Ahmet Arslan
Hi Furkan,

I am pretty sure this is a pdf extraction thing.
Turkish characters caused us trouble in the past during extracting text from 
pdf files.
You can confirm by performing manual copy-paste from original pdf file.

Ahmet


On Friday, December 9, 2016 8:44 PM, Furkan KAMACI  
wrote:
Hi,

I'm trying to index Turkish characters. These are what I see at my index (I
see both of them at different places of my content):

aç �klama
açıklama

These are same words but indexed different (same weird character at first
one). I see that there is not a weird character when I check the original
PDF file.

What do you think about it. Is it related to Solr or Tika?

PS: I use text_general for analyser of content field.

Kind Regards,
Furkan KAMACI 


Re: Wildcard searches with space in TextField/StrField

2016-11-25 Thread Ahmet Arslan
Hi,

You could try this:

drop wildcard stuff altogether:
1) Employ edgengramfilter at index time.
2) Use plain searches at query time.
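The index-time expansion in step 1 can be sketched like this (gram sizes are assumptions; this only illustrates what an EdgeNGramFilter emits, so a plain query for "john" matches with no wildcard operator):

```python
# Rough sketch of EdgeNGramFilter output at index time: every prefix of
# the token becomes its own indexed term.
def edge_ngrams(token, min_gram=1, max_gram=10):
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]

indexed_terms = edge_ngrams("johnson")
```

The trade-off, as noted elsewhere in this list, is a larger index in exchange for cheap prefix queries.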

Ahmet



On Friday, November 25, 2016 4:59 PM, Sandeep Khanzode 
 wrote:
Hi All,

Can someone please assist with this query?

My data consists of:
1.] John Doe
2.] John V. Doe
3.] Johnson Doe
4.] Johnson V. Doe
5.] John Smith
6.] Johnson V. Smith
7.] Matt Doe
8.] Matt V. Doe
9.] Matt Doe
10.] Matthew V. Doe
11.] Matthew Smith

12.] Matthew V. Smith

Querying ...
(a) Matt/Matt* should return records 7-12
(b) John/John* should return records 1-6
(c) Doe/Doe* should return records 1-4, 7-10
(d) Smith/Smith* should return records 5,6,11,12
(e) V/V./V.*/V* should return records 2,4,6,8,10,12
(f) V. Doe/V. Doe* should return records 2,4,8,10
(g) John V/John V./John V*/John V.* should return record 2
(h) V. Smith/V. Smith* should return records 6,12

Any guidance would be appreciated!
I have tried ComplexPhraseQueryParser, but with a single token like Doe*, there 
is an error that indicates that the query is being identified as a prefix 
query. I may be missing something in the syntax.
 SRK 


On Thursday, November 24, 2016 11:16 PM, Sandeep Khanzode 
 wrote:


Hi All, Erick,
Please suggest. Would like to use the ComplexPhraseQueryParser for searching 
text (with wildcard) that may contain special characters.
For example ...
John* should match John V. Doe
John* should match Johnson Smith
Bruce-Willis* should match Bruce-Willis
V.* should match John V. F. Doe
SRK 

On Thursday, November 24, 2016 5:57 PM, Sandeep Khanzode 
 wrote:


Hi,
This is the typical TextField with ... 




SRK 

On Thursday, November 24, 2016 1:38 AM, Reth RM  
wrote:


what is the fieldType of those records?  
On Tue, Nov 22, 2016 at 4:18 AM, Sandeep Khanzode 
 wrote:

Hi Erick,
I gave this a try. 
These are my results. There is a record with "John D. Smith", and another named 
"John Doe".

1.] {!complexphrase inOrder=true}name:"John D.*" ... does not fetch any 
results. 

2.] {!complexphrase inOrder=true}name:"John D*" ... fetches both results. 



Second observation: There is a record with "John D Smith"
1.] {!complexphrase inOrder=true}name:"John*" ... does not fetch any results. 

2.] {!complexphrase inOrder=true}name:"John D*" ... fetches that record. 

3.] {!complexphrase inOrder=true}name:"John D S*" ... fetches that record. 

SRK

On Sunday, November 13, 2016 7:43 AM, Erick Erickson 
 wrote:


 Right, for that kind of use case you want complexPhraseQueryParser,
see: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser

Best,
Erick

On Sat, Nov 12, 2016 at 9:39 AM, Sandeep Khanzode
 wrote:
> Thanks, Erick.
>
> I am actually not trying to use the String field (prefer a TextField here).
> But, in my comparisons with TextField, it seems that something like phrase
> matching with whitespace and wildcard (like, 'my do*' or say, 'my dog*', or
> say, 'my dog has*') can only be accomplished with a string type field,
> especially because, with a WhitespaceTokenizer in TextField, the space will
> be lost, and all tokens will be individually considered. Am I missing
> something?
>
> SRK
>
>
> On Friday, November 11, 2016 10:05 PM, Erick Erickson
>  wrote:
>
>
> You have to query text and string fields differently, that's just the
> way it works. The problem is getting the query string through the
> parser as a _single_ token or as multiple tokens.
>
> Let's say you have a string field with the "a b" example. You have a
> single token
> a b that starts at offset 0.
>
> But with a text field, you have two tokens,
> a at position 0
> b at position 1
>
> But when the query parser sees "a b" (without quotes) it splits it
> into two tokens, and only the text field has both tokens so the string
> field won't match.
>
> OTOH, when the query parser sees "a\ b" it passes this through as a
> single token, which only matches the string field as there's no
> _single_ token "a b" in the text field.
>
> But a more interesting question is why you want to search this way.
> String fields are intended for keywords, machine-generated IDs and the
> like. They're pretty useless for searching anything except
> 1> exact tokens
> 2> prefixes
>
> While if you have "my dog has fleas" in a string field, you _can_
> search "*dog*" and get a hit but the performance is poor when you get
> a large corpus. Performance for "my*" will be pretty good though.
>
> In all this sounds like an XY problem, what's the use-case you're
> trying to solve?
>
> Best,
> Erick
>
>
>
> On Thu, Nov 10, 2016 at 10:11 PM, Sandeep Khanzode
>  wrote:
>> Hi Erick, Reth,
>>
>> The 'a\ b*' as well as 
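Erick's string-vs-text distinction above can be sketched as follows (a deliberately simplified model, not actual Lucene analysis code):

```python
# Sketch: a text field tokenizes on whitespace, a string field keeps the
# whole value as one token. The parsed query "a b" (two tokens) can only
# match the text field; the escaped "a\ b" (one token) only the string field.
def text_field_tokens(value):
    return value.split()      # whitespace tokenization (simplified)

def string_field_tokens(value):
    return [value]            # entire value is a single token

doc = "a b"
text_terms = text_field_tokens(doc)      # ["a", "b"]
string_terms = string_field_tokens(doc)  # ["a b"]
```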

Re: Problem with Han character in ICUFoldingFilter

2016-10-30 Thread Ahmet Arslan
Hi Eyal,

ICUFoldingFilter uses http://site.icu-project.org under the hood.
If you think there is a bug, it is better to ask its mailing list.

Ahmet



On Sunday, October 30, 2016 3:41 PM, "eyal.naam...@exlibrisgroup.com" 
 wrote:
Hi,

I was wondering if anyone ran into the following issue, or a similar one:
In Han script there are two separate characters - 宅 (FA04) and 宅 (5B85).
It seems that ICUFoldingFilter converts FA04 to 5B85, which results in the 
wrong character being indexed.
Does anyone have any idea if and how this can be resolved? Is there an option 
to add an exception rule to ICUFoldingFilter?
Thanks,
Eyal


Re: Solr 5.3.1 - Synonym is not working as expected

2016-10-25 Thread Ahmet Arslan
Hi,

If your index is pure Chinese, I would do the expansion on query time only.
Simply replace English query term with Chinese translations.
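That query-time replacement could be sketched like this (the mapping entry is an assumption taken from the nasonex example quoted below):

```python
# Sketch: expand English query terms with their Chinese translations at
# query time instead of relying on index-time synonyms.
translations = {"nasonex": "内舒拿"}

def expand_query(q):
    out = []
    for term in q.split():
        out.append(term)
        zh = translations.get(term.lower())
        if zh:
            out.append(zh)
    return " ".join(out)

expanded = expand_query("nasonex")
```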

Ahmet



On Tuesday, October 25, 2016 12:30 PM, soundarya  
wrote:
We are using Solr 5.3.1 version as our search engine. This setup is provided
by 
the Bitnami cloud and the amazon AMI is ami-50a47e23.

We have a website which has content in Chinese. We use Nutch crawler to
crawl 
the entire website and index it to the Solr collection. We have configured
few 
fields including the text field with Chinese tokenizers. When users search with 
Chinese characters, we are able to see the relevant results. We wanted to
see 
the same results when user types in English or Pinyin characters. So, we
have 
included synonym file and respective tokenizer added to the schema.xml file.
We 
are not able to get any results after doing these changes. Below is the 
configuration we did in schema.xml. The synonym file is a mapping of Chinese 
word with equivalent English and pinyin words.















The output with debugQuery enabled is shown below. The synonym configured for 
the English word is actually picked up, but we still see no results:

"rawquerystring":"nasonex",
"querystring":"nasonex",
"parsedquery":"(text:nasonex text:内舒拿)/no_coord",
"parsedquery_toString":"text:nasonex text:内舒拿",
"QParser":"LuceneQParser"


Below is the output when we try to use the analysis tool.

ST
  text     raw_bytes                      start  end  positionLength  type     position
  nasonex  [6e 61 73 6f 6e 65 78]         0      7    1                        1

SF
  text     raw_bytes                      start  end  positionLength  type     position
  nasonex  [6e 61 73 6f 6e 65 78]         0      7    1                        1
  内舒拿    [e5 86 85 e8 88 92 e6 8b bf]   0      7    1               SYNONYM  1

CJKWF
  text     raw_bytes                      start  end  positionLength  type     position
  nasonex  [6e 61 73 6f 6e 65 78]         0      7    1                        1
  内舒拿    [e5 86 85 e8 88 92 e6 8b bf]   0      7    1               SYNONYM  1

LCF
  text     raw_bytes                      start  end  positionLength  type     position
  nasonex  [6e 61 73 6f 6e 65 78]         0      7    1                        1
  内舒拿    [e5 86 85 e8 88 92 e6 8b bf]   0      7    1               SYNONYM  1

CJKBF
  text     raw_bytes                      start  end  positionLength  type     position
  nasonex  [6e 61 73 6f 6e 65 78]         0      7    1                        1
  内舒拿    [e5 86 85 e8 88 92 e6 8b bf]   0      7    1               SYNONYM  1


Please help us regarding this issue. 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-5-3-1-Synonym-is-not-working-as-expected-tp4302913.html
Sent from the Solr - User mailing list archive at Nabble.com. 


Re: Lowercase all characters in String

2016-10-11 Thread Ahmet Arslan
Hi,

KeywordTokenizer and LowerCaseFilter should suffice. Optionally you can add 
TrimFilter too.
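A simplified model of that chain (not actual Lucene code) looks like:

```python
# Sketch of KeywordTokenizer + LowerCaseFilter (+ optional TrimFilter):
# the value stays a single token, symbols and internal whitespace are
# preserved, and only the case changes -- so "One" does not match "One Way".
def keyword_lowercase(value, trim=True):
    token = value.strip() if trim else value
    return [token.lower()]    # one token out, never split

tokens = keyword_lowercase("  One Way  ")
```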

Ahmet


On Tuesday, October 11, 2016 5:24 PM, Zheng Lin Edwin Yeo 
 wrote:
Hi,

Would like to find out, what is the best way to lowercase all the text,
while preserving all the tokens.

As I need to preserve every character of the text (including symbols and
white space), I'm using String. However, I can't put the
LowerCaseFilterFactory in String.

I found that we can use WhitespaceTokenizerFactory, followed by
LowerCaseFilterFactory. Although WhitespaceTokenizerFactory can preserve
the symbols, it will still split on Whitespace, which is what we do not
want. This is because we may have words like 'One' and 'One Way'. If we use
the WhitespaceTokenizerFactory and search for 'One', it will return records
with 'One Way' too, which is what we do not want.

Is there other way which we can achieve this?

I'm using Solr 6.2.1.

Regards,
Edwin


Re: Preceding special characters in ClassicTokenizerFactory

2016-10-03 Thread Ahmet Arslan
Hi Andy,

WordDelimeterFilter has "types" option. There is an example file named 
wdftypes.txt in the source tree that preserves #hashtags and @mentions. If you 
follow this path, please use Whitespace tokenizer.

Ahmet



On Monday, October 3, 2016 9:52 PM, "Whelan, Andy"  wrote:
Hello,
I am guessing that what I am looking for is probably going to require extending 
StandardTokenizerFactory or ClassicTokenizerFactory. But I thought I would ask 
the group here before attempting this. We are indexing documents from an 
eclectic set of sources. There is, however, a heavy interest in computing and 
social media sources. So computer terminology and social media terms (terms 
beginning with hashes (#), @ symbols, etc.) are terms that we would like to 
have searchable.

We are considering the ClassicTokenizerFactory because we like the fact that it 
does not use the Unicode standard annex 
UAX#29 word boundary rules. 
It preserves email addresses, internet domain names, etc.  We would also like 
to use it as the tokenizer element of index and query analyzers that would 
preserve @<rest of token> or #<rest of token> patterns.

I have seen examples where folks are replacing the StandardTokenizerFactory in 
their analyzer with stream combinations made up of charFilters,  
WhitespaceTokenizerFactory, etc. as in the following article 
http://www.prowave.io/indexing-special-terms-using-solr/ to remedy such 
problems.

Example:
 
 
 
 
 
 
 
 
 
 
 
 


I am just wondering if anyone knew of a smart way (without extending classes) 
to actually preserve most of the ClassicTokenizerFactory functionality without 
getting rid of leading special characters? The "Solr In Action" book (page 179) 
claims that it is hard to extend the StandardTokenizerFactory. I'm assuming 
this is the same for ClassicTokenizerFactory.

Thanks
-Andrew


Re: StrField with Wildcard Search

2016-09-08 Thread Ahmet Arslan
Hi,


I think AutomatonQuery is used.
http://opensourceconnections.com/blog/2013/02/21/lucene-4-finite-state-automaton-in-10-minutes-intro-tutorial/
https://lucene.apache.org/core/5_4_0/core/org/apache/lucene/search/AutomatonQuery.html
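Conceptually (a toy model, not the real automaton machinery):

```python
# Sketch: a wildcard like John* is compiled to an automaton and
# intersected with the term dictionary. For a StrField the whole value
# "John Doe" is one term, so the prefix automaton accepts it; a
# tokenized (and lowercased) TextField holds "john" and "doe" instead.
string_field_terms = ["John Doe"]      # single term: the full value
text_field_terms = ["doe", "john"]     # tokenized, lowercased terms

def wildcard_matches(terms, prefix):
    return [t for t in terms if t.startswith(prefix)]

str_hits = wildcard_matches(string_field_terms, "John")
```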

Ahmet


On Thursday, September 8, 2016 3:54 PM, Sandeep Khanzode 
<sandeep_khanz...@yahoo.com> wrote:



Hi,

Okay.

So it seems that the wildcard searches will perform a (sort-of) dictionary 
search where they will inspect every (full keyword) token at search time, and 
do a match instead of a match on pre-created index-time tokens with TextField. 
However, the wildcard/fuzzy functionality will still be provided no matter the 
approach...
 
SRK


On Thursday, September 8, 2016 5:05 PM, Ahmet Arslan 
<iori...@yahoo.com.INVALID> wrote:



Hi,

EdgeNGram and Wildcard may be used to achieve the same goal: prefix search or 
starts with search.

Lets say, wildcard enumerates the whole inverted index, thus it may get slower 
for very large databases.
With this one no index time manipulation is required.

EdgeNGram does its magic at index time, indexes a lot of tokens, all possible 
prefixes.
Index size gets bigger, query time no wildcard operator required in this one.

Ahmet




On Thursday, September 8, 2016 12:35 PM, Sandeep Khanzode 
<sandeep_khanz...@yahoo.com.INVALID> wrote:
Hello,
There are quite a few links that detail the difference between StrField and 
TextField. Also links that explain that, even though the field is indexed, it 
is not tokenized and stored as a single keyword, as can be verified by the 
debug analysis on Solr admin and CURL debugQuery options.
What I am unable to understand is how a wildcard works on StrFields? For 
example, if the name is "John Doe" and I search for "John*", I get that match. 
Which means, that somewhere deep within, maybe a Trie or Dictionary 
representation exists that allows this search with a partial string.
I would have assumed that wildcard would match on TextFields which allow 
(Edge)NGramFilters, etc.  -- SRK 


Re: StrField with Wildcard Search

2016-09-08 Thread Ahmet Arslan
Hi,

EdgeNGram and Wildcard may be used to achieve the same goal: prefix search or 
starts with search.

Lets say, wildcard enumerates the whole inverted index, thus it may get slower 
for very large databases.
With this one no index time manipulation is required.

EdgeNGram does its magic at index time, indexes a lot of tokens, all possible 
prefixes.
Index size gets bigger, query time no wildcard operator required in this one.

Ahmet



On Thursday, September 8, 2016 12:35 PM, Sandeep Khanzode 
 wrote:
Hello,
There are quite a few links that detail the difference between StrField and 
TextField. Also links that explain that, even though the field is indexed, it 
is not tokenized and stored as a single keyword, as can be verified by the 
debug analysis on Solr admin and CURL debugQuery options.
What I am unable to understand is how a wildcard works on StrFields? For 
example, if the name is "John Doe" and I search for "John*", I get that match. 
Which means, that somewhere deep within, maybe a Trie or Dictionary 
representation exists that allows this search with a partial string.
I would have assumed that wildcard would match on TextFields which allow 
(Edge)NGramFilters, etc.  -- SRK 


Re: changed query parsing between 4.10.4 and 5.5.3?

2016-09-07 Thread Ahmet Arslan
Hi,

The tilde in the former looks interesting. 
I think it related to proximity search.
What query parser is this? 

Ahmet



On Wednesday, September 7, 2016 10:52 AM, Bernd Fehling 
 wrote:
Hi list,

while going from SOLR 4.10.4 to 5.5.3 I noticed a change in query parsing.
4.10.4
text:star text:trek
  text:star text:trek
  (+((text:star text:trek)~2))/no_coord
  +((text:star text:trek)~2)

5.5.3
text:star text:trek
  text:star text:trek
  (+(+text:star +text:trek))/no_coord
  +(+text:star +text:trek)

There are very many new features and changes between this two versions.
It looks like a change in query parsing.
Can someone point me to the solr or lucene jira about the changes?
Or even give a hint how to get my "old" query parsing back?

Regards
Bernd


Re: Blank/Null value search in term filter

2016-09-05 Thread Ahmet Arslan


Hi Kishore,

Usually, query clause below is used for the task: 
(+*:* -queryField:[* TO *]) OR queryField:A
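Assembling that filter query might be sketched as (field and value are taken from the question; the `+*:*` base is required so the negative clause has a document set to subtract from):

```python
# Sketch: build the "field is missing OR field equals A" filter query.
field, value = "queryField", "A"
fq = "(+*:* -%s:[* TO *]) OR %s:%s" % (field, field, value)
```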

Ahmet




On Monday, September 5, 2016 2:33 PM, Kamal Kishore Aggarwal 
<kkroyal@gmail.com> wrote:
Thanks Ahmet for your response and nice suggestion.

But, I was looking if there any way out without making any configuration
change.

Please suggest.

On 02-Sep-2016 9:37 PM, "Ahmet Arslan" <iori...@yahoo.com> wrote:

>
>
> Hi Kishore,
>
> You can employ an impossible token value (say XX) for null values.
> This can be done via default value update processor factory.
> You index some placeholder token for null values.
> fq={!terms f='queryField' separator='|'}A|XX would fetch docs with A or
> null values.
> Ahmet
>
> On Friday, September 2, 2016 2:03 PM, Kamal Kishore Aggarwal <
> kkroyal@gmail.com> wrote:
>
>
>
> Hi,
>
> We are using solr 5.4.1.
>
> We are using term filter for multiple value matching purpose.
> Example: fq={!terms f='queryField' separator='|'}A|B
>
> A, B, C are the possible field values for solr field "queryField". There
> can docs with null values for the same field. Now, how can I create a term
> filter in above fashion that fetches docs with A or null values.
>
> Please suggest.
>
> Regards
> Kamal
>


Re: Blank/Null value search in term filter

2016-09-02 Thread Ahmet Arslan


Hi Kishore,

You can employ an impossible token value (say XX) for null values.
This can be done via default value update processor factory.
You index some placeholder token for null values.
fq={!terms f='queryField' separator='|'}A|XX would fetch docs with A or null 
values.
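Building that filter string can be sketched as (the "XX" placeholder is the assumed impossible token indexed in place of null values):

```python
# Sketch: terms query parser filter matching either a real value or the
# placeholder token indexed for null values.
def terms_filter(field, values, separator="|"):
    return "{!terms f='%s' separator='%s'}%s" % (
        field, separator, separator.join(values))

fq = terms_filter("queryField", ["A", "XX"])
```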
Ahmet

On Friday, September 2, 2016 2:03 PM, Kamal Kishore Aggarwal 
 wrote:



Hi,

We are using solr 5.4.1.

We are using term filter for multiple value matching purpose.
Example: fq={!terms f='queryField' separator='|'}A|B

A, B, C are the possible field values for solr field "queryField". There
can docs with null values for the same field. Now, how can I create a term
filter in above fashion that fetches docs with A or null values.

Please suggest.

Regards
Kamal


Re: Sorting non-english text

2016-08-25 Thread Ahmet Arslan
Hi,

I think there is a dedidated fieldType for this:


https://cwiki.apache.org/confluence/display/solr/Language+Analysis#LanguageAnalysis-UnicodeCollation

Ahmet
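A collated sort field per that page looks roughly like this (a sketch; ICUCollationField lives in the analysis-extras contrib, so the ICU jars must be on the classpath, and the locale/field names are illustrative):

```xml
<fieldType name="collated_de" class="solr.ICUCollationField"
           locale="de" strength="primary"/>
<field name="name_sort_de" type="collated_de" indexed="true" stored="false"/>
<copyField source="name" dest="name_sort_de"/>
<!-- then sort with: sort=name_sort_de asc -->
```

One such field type (and sort field) is typically defined per language/locale.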


On Thursday, August 25, 2016 9:08 PM, Vasu Y <vya...@gmail.com> wrote:
Thank you Ahmet.

I have a couple of questions on using CollationKeyAnalyzer:
1) Is it enough to specify this Analyzer in schema.xml as shown below or do
i need to pass any parameters like language etc.?
2) Do we need to define one CollationKeyAnalyzer  per language?
3) I also noticed that there is one more analyzer called
ICUCollationKeyAnalyzer; how does CollationKeyAnalyzer compare against
ICUCollationKeyAnalyzer in terms of memory usage & performance?
4) When looking at javadoc for CollationKeyAnalyzer, I noticed there are
some WARNINGS that says JVM vendor, version & patch, collation strength
needs to be same between indexing & query time. Does it mean, if for
example, I update JVM patch-version, then already indexed documents whose
indexed fields used CollationKeyAnalyzer needs to be re-indexed or else we
cannot query them?


  


Thanks,
Vasu


On Thu, Aug 25, 2016 at 7:59 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
wrote:

> Hi Vasu,
>
> There is a field type or something like that (CollationKeyAnalyzer) for
> language specific sorting.
>
> Ahmet
>
>
>
> On Thursday, August 25, 2016 12:29 PM, Vasu Y <vya...@gmail.com> wrote:
> Hi,
> I have a text field which can contain values (multiple tokens) in English;
> to support sorting, I had a <copyField> in schema.xml to copy this to a new
> field of type "lowercase" (defined as below).
> I also have text fields of type text_de, text_es, text_fr, ja, cn etc. I
> intend to do a <copyField> to copy them to a new field of type "lowercase" to
> support sorting.
>
> Would this "lowercase" field type work well for sorting non-English fields
> that are non-tokenized (or are single-term) or do you suggest to use a
> different tokenizer & filter?
>
>  <fieldType name="lowercase" class="solr.TextField"
>   positionIncrementGap="100">
>    <analyzer>
>      <tokenizer class="solr.KeywordTokenizerFactory"/>
>      <filter class="solr.LowerCaseFilterFactory"/>
>    </analyzer>
>  </fieldType>
>
> Thanks,
> Vasu
>


Re: Sorting non-english text

2016-08-25 Thread Ahmet Arslan
Hi Vasu,

There is a field type or something like that (CollationKeyAnalyzer) for 
language specific sorting.

Ahmet



On Thursday, August 25, 2016 12:29 PM, Vasu Y  wrote:
Hi,
I have a text field which can contain values (multiple tokens) in English;
to support sorting, I had a <copyField> in schema.xml to copy this to a new
field of type "lowercase" (defined as below).
I also have text fields of type text_de, text_es, text_fr, ja, cn etc. I
intend to do a <copyField> to copy them to a new field of type "lowercase" to
support sorting.

Would this "lowercase" field type work well for sorting non-English fields
that are non-tokenized (or are single-term) or do you suggest to use a
different tokenizer & filter?

 <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>


Thanks,
Vasu


Re: Wildcard search not working

2016-08-12 Thread Ahmet Arslan
Hi Christian,

Please use the following filter before/above the stemmer.


Plus, you may want to add :


  
  
  

Ahmet
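The filter names in Ahmet's reply were stripped by the archive. A common fix for the "wildcards vs. stemming" mismatch that matches his "before/above the stemmer" hint is KeywordRepeatFilterFactory — this is an assumption, not necessarily the exact snippet he posted:

```xml
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- emits each token twice, one marked as keyword so the stemmer skips it;
       the index then holds both "roche" and any stemmed form -->
  <filter class="solr.KeywordRepeatFilterFactory"/>
  <filter class="solr.PorterStemFilterFactory"/>
  <!-- drops the duplicate when stemming did not change the token -->
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```

With the unstemmed forms preserved in the index, a wildcard query like r?che can match them.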



On Thursday, August 11, 2016 9:31 PM, "Ribeaud, Christian (Ext)" 
<christian.ribe...@novartis.com> wrote:
Hi Ahmet,

Many thanks for your reply. I had a look at the URL you pointed out but, 
honestly, I have to admit that I did not fully understand you.
Let's be a bit more concrete. Following the schema snippet for the 
corresponding field:

...




 









...

What is wrong with this schema? Respectively, what should I change to be able 
to correctly do wildcard searches?

Many thanks for your time. Cheers,

christian
--
Christian Ribeaud
Software Engineer (External)
NIBR / WSJ-310.5.17
Novartis Campus
CH-4056 Basel



-Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Donnerstag, 11. August 2016 16:00
To: solr-user@lucene.apache.org; Ribeaud, Christian (Ext)
Subject: Re: Wildcard search not working

Hi Christian,

The query r?che may not return at least the same number of matches as roche 
depending on your analysis chain.
The difference is that roche is analyzed but r?che isn't. Wildcard queries are 
executed on the indexed/analyzed terms.
For example, if roche is indexed/analyzed as roch, the query r?che won't match 
it.

Please see : https://wiki.apache.org/solr/MultitermQueryAnalysis

Ahmet



On Thursday, August 11, 2016 4:42 PM, "Ribeaud, Christian (Ext)" 
<christian.ribe...@novartis.com> wrote:
Hi,

What would be the reasons making the wildcard search for Lucene Query Parser 
NOT working?

We are using Solr 5.4.1 and, using the admin console, I am triggering for 
instance searches with term 'roche' in a specific core. Everything fine, I am 
getting for instance two matches. I would expect at least the same number of 
matches with term 'r?che'. However, this does NOT happen. I am getting zero 
matches. Same problem occurs with 'r*che'. 'roch?' does not work either, but 
'roch*' works.

Switching debug mode brings following output:

"debug": {
"rawquerystring": "roch?",
"querystring": "roch?",
"parsedquery": "text:roch?",
"parsedquery_toString": "text:roch?",
"explain": {},
"QParser": "LuceneQParser",
...

Any idea? Thanks and cheers,

christian


Re: Wildcard search not working

2016-08-11 Thread Ahmet Arslan
Hi Christian,

The query r?che may not return at least the same number of matches as roche 
depending on your analysis chain.
The difference is that roche is analyzed but r?che isn't. Wildcard queries are 
executed on the indexed/analyzed terms.
For example, if roche is indexed/analyzed as roch, the query r?che won't match 
it.

Please see : https://wiki.apache.org/solr/MultitermQueryAnalysis

Ahmet



On Thursday, August 11, 2016 4:42 PM, "Ribeaud, Christian (Ext)" 
 wrote:
Hi,

What would be the reasons making the wildcard search for Lucene Query Parser 
NOT working?

We are using Solr 5.4.1 and, using the admin console, I am triggering for 
instance searches with term 'roche' in a specific core. Everything fine, I am 
getting for instance two matches. I would expect at least the same number of 
matches with term 'r?che'. However, this does NOT happen. I am getting zero 
matches. Same problem occurs with 'r*che'. 'roch?' does not work either, but 
'roch*' works.

Switching debug mode brings following output:

"debug": {
"rawquerystring": "roch?",
"querystring": "roch?",
"parsedquery": "text:roch?",
"parsedquery_toString": "text:roch?",
"explain": {},
"QParser": "LuceneQParser",
...

Any idea? Thanks and cheers,

christian


Re: Query optimization

2016-07-29 Thread Ahmet Arslan
Ups I forgot the link:
http://yonik.com/solr/paging-and-deep-paging/




On Friday, July 29, 2016 9:51 AM, Ahmet Arslan <iori...@yahoo.com> wrote:
Hi Midas,

Please search 'deep paging' over the documentation, mailing list, etc.
Solr Deep Paging and Sorting


Ahmet

On Friday, July 29, 2016 9:21 AM, Midas A <test.mi...@gmail.com> wrote:



please reply .


On Fri, Jul 29, 2016 at 10:26 AM, Midas A <test.mi...@gmail.com> wrote:

> a) My index size is 10 GB; for higher start values, the query response gets slow.
> What should I do to optimize this query for higher start values?
>


Re: Query optimization

2016-07-29 Thread Ahmet Arslan
Hi Midas,

Please search 'deep paging' over the documentation, mailing list, etc.
Solr Deep Paging and Sorting


Ahmet
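The cursor-based paging from that post looks roughly like this (a sketch; it assumes Solr 4.7+, and the sort must end with the uniqueKey field as a tiebreaker):

```text
# first page
q=*:*&rows=100&sort=timestamp asc,id asc&cursorMark=*
# subsequent pages: pass the nextCursorMark value from the previous response
q=*:*&rows=100&sort=timestamp asc,id asc&cursorMark=<nextCursorMark>
```

Unlike a large start= value, each page costs roughly the same no matter how deep you go.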
On Friday, July 29, 2016 9:21 AM, Midas A  wrote:



please reply .


On Fri, Jul 29, 2016 at 10:26 AM, Midas A  wrote:

> a) My index size is 10 GB; for higher start values, the query response gets slow.
> What should I do to optimize this query for higher start values?
>


Re: No need white space split

2016-07-25 Thread Ahmet Arslan
Hi,

Maybe you can simply use the string field type?
Or KeywordTokenizerFactory?

Ahmet
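A sketch of the KeywordTokenizerFactory option (the field and type names here are made up for illustration):

```xml
<fieldType name="exact_ci" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- the whole field value becomes one token; no whitespace split -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="product_name_exact" type="exact_ci" indexed="true" stored="true"/>
```

A query for "CORSAIR ValueSelect" against such a field then matches only the full value, not "CORSAIR XMS 2GB".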



On Monday, July 25, 2016 4:38 PM, Shashi Roushan  
wrote:
Hi All,

I am Shashi.

I am using Solr 6.1. I want to get a result only when the whole word matches.
Actually, I want to avoid the whitespace split.

Whenever we search for "CORSAIR ValueSelect", I want the result only
"CORSAIR ValueSelect",currently I am getting one more result as "CORSAIR
XMS 2GB".

Can any one help me?


Re: Find part of long query in shorter fields

2016-07-21 Thread Ahmet Arslan
Hi,

If you want to disable operators altogether please use dismax instead of 
edismax.
In dismax, only the + and - unary operators are supported, if I am not wrong.
I don't remember the situation of quotations for the phrase query.

Ahmet
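A minimal request along those lines, reusing the qf fields from Chantal's config (a sketch, not a drop-in replacement for her handler):

```text
q=Braun Series 9 9095CC Men's Electric Shaver&defType=dismax&qf=brand_split^6 name&mm=2<-1 5<-30% 8<10%
```

With dismax, words like "and" or tokens like "Wet/Dry" in the product name are treated as plain terms rather than query operators.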



On Tuesday, July 19, 2016 8:29 PM, CA  wrote:
Just for the records:

After realizing that with „defType=dismax“ I really do get the expected output 
I’ve found out what I need to change in my edismax configuration:

<str name="lowercaseOperators">false</str>

Then this will work:
> q=Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew 
> Charger
> // edismax with qf/pf : „name“ and „brand“ field
> 
Not returned anymore:
> name: "Braun Series Clean CCR2 Cleansing Dock Cartridges Lemonfresh 
> Formula Cartrige (Compatible with Series 7,5,3) 2 pc“
> brand: Braun
> 
Best hit:
> name: "Braun 9095cc Series 9 Electric Shaver“
> brand: Braun


Actually, as I’d like to disable operators in the query altogether (if 
possible), I’m wondering whether I should not be using the old dismax in the 
first place.

Cheers,

Chantal


Re: Find part of long query in shorter fields

2016-07-16 Thread Ahmet Arslan
Hi Chantal,

Please see https://issues.apache.org/jira/browse/LUCENE-7148


ahmet



On Saturday, July 16, 2016 3:48 PM, CA  wrote:
Hello all,

our index contains product offers from online shops. The fields we are indexing 
have all rather short values: the name of the product, the brand, the price, 
category and some fields containing identifiers like ASIN, GTIN etc. if 
available. We do not index the description texts.

The regular user search uses the „edismax“ and queries the above mentioned 
fields which works fine for short inputs like „iphone 6s“.

Now, we have to support a different kind of query which won’t be user input but 
using complete product names like those we store ourselves but not necessarily 
names that are actually part of our data set. This means that the input query 
can be relatively long. The output of the query is planned to consist of a More 
Like This list. So, in effect the query should have at least one hit that is 
hopefully close enough, and the actual result will be a More Like This list 
sourced by that one hit.

I have tried to get this to work based on the „edismax“ setup for the regular 
user search but this does not work well when the input is longer than what we 
have stored as similar product. Here is an example:


## Step 1: Input (not stored in our index):
"Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew 
Charger“ (input to edismax without quotes)

(a) This input does not produce any results with our current edismax config 
(details at the end of the e-mail).
(b) When I relax the „mm“ parameter to "2<-1 5<-30% 8<10%“, I get one hit with 
the following name:
=> "Braun Series Clean CCR2 Cleansing Dock Cartridges Lemonfresh Formula 
Cartrige (Compatible with Series 7,5,3) 2 pc“


## Step 2: When I reduce the input manually to the following:
"Braun Series 9 9095CC Men's Electric Shaver“

The above shortened input returns a very good hit with the name:
=> "Braun 9095cc Series 9 Electric Shaver"


My Question:

Is it possible, and if so - how, to have the query input:
"Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew 
Charger“ (input to edismax without quotes)
return (also or only) the hit with the name:
=> "Braun 9095cc Series 9 Electric Shaver"
and maybe even give it a high score.

I have tried to use „explainOther“ (output see at the end of this e-mail) but I 
have a really hard time reading it. In some cases, I’m not even able to 
understand where one clause ends and the next one starts (is it possible to 
have it returned in several lines?). Maybe someone can give me a hint on how to 
use that output or knows of some documentation on the i-net that explains how 
to make good use of it?


Looking at the input string, I was wondering:

(A) Is relaxing the „mm“ parameter really the way to go?
(B) Should I create another name field in schema.xml that basically has a 
different query chain, discarding the last words of a query input if too long. 
Or maybe it’s possible to make tokens in the first part of the input more 
„important“ (though I’m not sure this is generally the case)? Should I remove 
some of the filters from the query chain (like the ShingleFilter)?
(C) Can I configure something else or should I not use edismax for this?


Thank you for reading this,
any insight is highly appreciated!

Chantal


***

Following are the field configuration for the name field, the configuration of 
the edismax handler, and the output of „explainOther“ for the above example.



SCHEMA.XML — „name" field:














SOLRCONFIG.XML — MLT/EDISMAX


 
 all
 edismax

 *:*
 id,brand,name,price,score,popularity
 0.1
 brand_split^6 name
 brand_split^10 name^10
 2<-1 5<-30% 8<10%
 10
 20

 xml

 false
 brand_split^6 name price
 brand_split name price
 details
 




DEBUG — EXPLAIN OTHER

The „other“ document with id:2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b has the 
title "Braun 9095cc Series 9 Electric Shaver"



edismax
brand_split^6 name
brand_split^10 name^10
2<-1 5<-30% 8<10%
10
20
0.1

Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean 
and Renew Charger

id:2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b





Braun Series Clean CCR2 Cleansing Dock Cartridges 
Lemonfresh Formula Cartrige (Compatible with
Series 7,5,3) 2 pc

773d4bdb341c4dc438c481ac80de5abde08d85bf
Braun
97.122955




Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and 
Renew Charger


Braun Series 9 9095CC Men's Electric Shaver 

Re: Filter Query that matches all values of a field

2016-07-04 Thread Ahmet Arslan
Hi Vasu,


This question appears occasionally in the mailing list.
Please see https://issues.apache.org/jira/browse/LUCENE-7148

ahmet


On Monday, July 4, 2016 9:10 PM, Vasu Y  wrote:



Hi,
I have a single type field that can contain zero or more values (comma
separated values). This field stores some sort of access value.

In the filter, I am given a list of allowed values for the field and a
document must be considered if all values contained in its field must be
present in the allowed values specified in the filter.
How can i write filter query for this?

To illustrate this further:
If a field "field1" in a document contains (a1, a3, a5) values.

   1. Case #1) If the allowed values specified in the filter are (a1, a3,
   a4, a6) --> the document should not be considered since user doesn’t have
   access to “a5”.
   2. Case #2) If the allowed values specified in the filter are (a2, a4,
   a6) --> the document should not be considered since user doesn’t have
   access to “a1, a3, a5”.
   3. Case #3) If the allowed values specified in the filter are (a1, a3,
   a5) --> the document should be considered since user has access to all
   values of the field.
   4. Case #4) If the allowed values specified in the filter are (a1, a2,
   a3, a4, a5, a6) --> the document should be considered since user has access
   to all values of the field (and some more values also).


Thanks,
Vasu


Re: Data import handler in techproducts example

2016-07-02 Thread Ahmet Arslan
Hi Jonas,

Search for the solr-dataimporthandler-*.jar and place it under a lib directory (same 
level as the solr.xml file), along with the MySQL JDBC driver 
(mysql-connector-java-*.jar).

Please see:
https://cwiki.apache.org/confluence/display/solr/Lib+Directives+in+SolrConfig
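Alternatively, per the Lib Directives page, the jars can be referenced from solrconfig.xml; a sketch (the dir paths are assumptions that depend on your install layout):

```xml
<!-- in solrconfig.xml; relative paths resolve against the core's instanceDir -->
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar"/>
<lib dir="/opt/jdbc-drivers/" regex="mysql-connector-java-.*\.jar"/>
```

After adding the directives, restart Solr (or reload the core) and check the log for the jars being loaded.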




On Saturday, July 2, 2016 9:56 PM, Jonas Vasiliauskas 
 wrote:
Hey,

I'm quite new to solr and java environments. I have a goal for myself to 
import some data from mysql database in techproducts (core) example.

I have setup data import handler (DIH) for techproducts based on 
instructions here https://wiki.apache.org/solr/DIHQuickStart , but looks 
like solr doesn't load DIH libraries, could someone please explain in 
quick words on how to check if DIH is loaded and if not - how can I load 
it ?

Stacktrace is here: http://pastebin.ca/3654347

Thanks,


Re: an advice: why not to add a searching model for mailing list

2016-07-02 Thread Ahmet Arslan
Hi Kent,


There are already two search systems for the task:
http://find.searchhub.org
http://search-lucene.com

Is this what you mean by saying 'search model'?
Ahmet



On Saturday, July 2, 2016 6:43 PM, Kent Mu  wrote:



hi all,
I wonder why we don't add a search feature for the mailing list, so that we can
filter and query the usage info by searching for specific words quickly.

Best Regards!
Kent


Re: Sorting & searching on the same field

2016-06-23 Thread Ahmet Arslan
Hi Jay,

I don't think it can be combined.
Mainly because: searching requires a tokenized field.
Sorting requires a single value (token) to be meaningful.

Ahmet
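The usual two-field workaround can be sketched like this (it assumes a "lowercase" fieldType built from KeywordTokenizerFactory + LowerCaseFilterFactory, as in the stock schemas; the field names are illustrative):

```xml
<field name="title" type="text_en" indexed="true" stored="true"/>
<!-- single lowercased token: usable for case-insensitive sorting -->
<field name="title_sort" type="lowercase" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>
<!-- search on title, sort with: sort=title_sort asc -->
```

The copyField keeps the two concerns in one source field from the client's point of view, even though two index fields exist underneath.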



On Thursday, June 23, 2016 7:43 PM, Jay Potharaju  wrote:
Hi,
I would like to have 1 field that can used for both searching and case
insensitive sorting. As far as i know the only way to do is to have two
fields one for searching (text_en) and one for sorting(lowercase & string).
Any ideas how the two can be combined into 1 field.


-- 
Thanks
Jay Potharaju


Re: How do we get terms suggestion from SuggestComponent?

2016-06-21 Thread Ahmet Arslan
Hi,

With grams parameter of FreeTextLookupFactory, no?

Ahmet



On Tuesday, June 21, 2016 1:19 PM, solr2020  wrote:
Thanks Ahmet.

It is working fine. Now I would like to get suggestions for multiple terms.
How do I get suggestions for multiple terms?


 


   
   
   
   
  


Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-we-get-terms-suggestion-from-SuggestComponent-tp4283399p4283584.html

Sent from the Solr - User mailing list archive at Nabble.com.


Re: How do we get terms suggestion from SuggestComponent?

2016-06-20 Thread Ahmet Arslan
Hi,

I think :

<str name="lookupImpl">FreeTextLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<int name="ngrams">3</int>
<str name="field">content</str>

Ahmet



On Monday, June 20, 2016 3:51 PM, solr2020  wrote:
Hi,

I am using solr.SuggestComponent for auto-suggestion, and it works fine. But the
problem is that it returns the whole field value as a suggestion instead of terms.
My requirement is that terms need to be returned as suggestions. How do we
achieve this with solr.SuggestComponent?

Thanks.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-we-get-terms-suggestion-from-SuggestComponent-tp4283399.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Phrase query proximity parameter doe not show up in parsed query string

2016-06-20 Thread Ahmet Arslan
Hi,

I think synonym_edismax is not part of solr.
Can you reproduce with the stock edismax?



On Monday, June 20, 2016 12:34 PM, preeti kumari  wrote:



Hi All,

My query looks like below :

q=((_query_:"{!synonym_edismax  qf='partnum' v='597871' bq='' mm=100
synonyms=true synonyms.constructPhrases=true
synonyms.ignoreQueryOperators=true}") OR (partnumcomp:597871* OR
partnum:"597871"~4 OR ngramc:"597 978 787 871"~4 OR partnumngramc:"597 978
787 871"~4 OR content:"597871"~500) AND (mocode:5S3 OR mocode:A01 ))

*numFound=0 by above query although i have data for " ngramc:"597 978 787
871"~4 OR partnumngramc:"597 978 787 871"~4 OR content:"597871"~500"*

When i checked for the parsed query looks like below and i could see the
proximity parameter "*~4*" is not passed to parsed query and so no results:

+(((+DisjunctionMaxQuery((partnum:597871))
())/no_coord) partnumcomp:597871* partnum:597871 PhraseQuery(ngramc:"597
978 787 871") PhraseQuery(partnumngramc:"597 978 787 871") content:597871)
+(mocode:5s3 mocode:a01)


+((+(partnum:597871) ())
partnumcomp:597871* partnum:597871 ngramc:"597 978 787 871"
partnumngramc:"597 978 787 871" content:597871) +(mocode:5s3
mocode:a01)

Please do let me know why the proximity parameter is not getting sent in
parsed query to solr. How to fix this. Is something wrong with the query?

Although repharasing the query to below works :

q=((partnumcomp:597871* OR partnum:"597871"~4 OR ngramc:"597 978 787 871"~4
OR partnumngramc:"597 978 787 871"~4 OR content:"597871"~500) AND
(mocode:5S3 OR mocode:A01 ) OR (_query_:"{!synonym_edismax  qf='partnum'
v='597871' bq='' mm=100 synonyms=true synonyms.constructPhrases=true
synonyms.ignoreQueryOperators=true}"))


Thanks
Preeti


Re: Can someone explain about Sweetspot Similarity ?

2016-06-19 Thread Ahmet Arslan
Hi,

Sweet spot is designed to punish too long or too short documents.

Did you reindex?

Can you see the mention of sweet spot in debugQuery=true response?

Ahmet



On Sunday, June 19, 2016 2:18 PM, dirmanhafiz  wrote:
Hi, I'm Dirman and I'm experimenting with Solr and SweetSpotSimilarity.
Can someone tell me whether min and max in SweetSpotSimilarity refer to document
length? My average document length is 3-205 tokens per doc. Also, why did the
SweetSpotSimilarity parameters I wrote in my schema.xml not work?



  <similarity class="solr.SweetSpotSimilarityFactory">
    <int name="lengthNormMin">10</int>
    <int name="lengthNormMax">50</int>
    <float name="lengthNormSteepness">0.5</float>
  </similarity>

   

I want documents whose length is outside the min/max range to be punished.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-someone-explain-about-Sweetspot-Similarity-tp4283248.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Error when searching with special characters

2016-06-18 Thread Ahmet Arslan


If a properly escaped ampersand throws a parse exception, this could be a bug.



On Saturday, June 18, 2016 7:12 PM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> 
wrote:
Hi,

It does not work with the back slash too.

But I found that it does not work for defType=lucene.
It will work if the defType=dismax or edismax.

What could be the reason that it did not work with the default
defType=lucene?

Regards,
Edwin



On 18 June 2016 at 01:04, Ahmet Arslan <iori...@yahoo.com.invalid> wrote:

> Hi,
>
> May be URL encoding issue?
> By the way, I would use back slash to escape special characters.
>
> Ahmet
>
> On Friday, June 17, 2016 10:08 AM, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com> wrote:
>
>
>
> Hi,
>
> I encountered this error when I tried to search with special characters,
> like "&" and "#".
>
> {
>   "responseHeader":{
> "status":400,
> "QTime":0},
>   "error":{
> "msg":"org.apache.solr.search.SyntaxError: Cannot parse
> '\"Research ': Lexical error at line 1, column 11.  Encountered: 
> after : \"\\\"Research \"",
> "code":400}}
>
>
> I have done the search by putting inverted commas, like: q="Research &
> Development"
>
> What could be the issue here?
>
> I'm facing this problem in both Solr 5.4.0 and Solr 6.0.1.
>
>
> Regards,
> Edwin
>


Re: Error when searching with special characters

2016-06-17 Thread Ahmet Arslan
Hi,

Maybe a URL encoding issue?
By the way, I would use back slash to escape special characters.

Ahmet
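Concretely, the two suggestions combined look roughly like this (illustrative; the field name is made up):

```text
raw query:    q=title:Research\ \&\ Development
URL-encoded:  q=title%3AResearch%5C%20%5C%26%5C%20Development
```

An unescaped & would otherwise terminate the q parameter at the HTTP layer before the query parser ever sees it, which fits the "URL encoding issue" guess.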

On Friday, June 17, 2016 10:08 AM, Zheng Lin Edwin Yeo  
wrote:



Hi,

I encountered this error when I tried to search with special characters,
like "&" and "#".

{
  "responseHeader":{
"status":400,
"QTime":0},
  "error":{
"msg":"org.apache.solr.search.SyntaxError: Cannot parse
'\"Research ': Lexical error at line 1, column 11.  Encountered: 
after : \"\\\"Research \"",
"code":400}}


I have done the search by putting inverted commas, like: q="Research &
Development"

What could be the issue here?

I'm facing this problem in both Solr 5.4.0 and Solr 6.0.1.


Regards,
Edwin


Re: Stemming

2016-06-16 Thread Ahmet Arslan


Hi Jamal,

Snowball requires a lowercase filter above it.
This is documented in the javadocs, but it is a small yet important detail.
Please use a lowercase filter after the whitespace tokenizer.


Ahmet
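The corrected chain would look something like this (a sketch; Sarfaraz's original filter list was stripped by the archive, so the tokenizer and stemmer shown are assumptions):

```xml
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- lowercase BEFORE the stemmer: Snowball expects lowercased input -->
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English"/>
</analyzer>
```

With the filters in this order, "Running", "running", and "runs" should all reduce to the same indexed term at both index and query time.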
On Thursday, June 16, 2016 10:13 PM, "Jamal, Sarfaraz" 
 wrote:



Hi Guys,

I have enabled stemming:
  




  

In the Admin Analysis, I type in running or runs and they both break down to 
run.
However when I search for run, runs, or running with an actual query -

It brings back three different sets of results.

Is that correct?

I would imagine that all three would bring back the exact same resultset?

Sas 


Re: wildcard search for string having spaces

2016-06-15 Thread Ahmet Arslan
Hi Roshan,

I think there are two options:

1) escape the space q=abc\ p*
2) use prefix query parser q={!prefix f=my_string}abc p

Ahmet


On Wednesday, June 15, 2016 3:48 PM, Roshan Kamble 
 wrote:
Hello,

I have below custom field type defined for solr 6.0.0

   

  
  


  
  

  


I am using above field to ensure that entire string is considered as single 
token and search should be case insensitive.

It works for most of the scenarios with wildcard search.
e.g. if my data is "abc.pqr" and "abc_pqr" and "abc pqr" then search with abc* 
gives these three results.

But I am not able to search with say abc p*

Search with query q="abc pqr" gives exact match and desired result.

I want to do wildcard search where criteria can include spaces like above 
example


i.e. if a space is present then I am not able to do a wildcard search.

Is there any way by which wildcard search can be achieved even if a space is 
present in the token?

Regards,
Roshan

The information in this email is confidential and may be legally privileged. It 
is intended solely for the addressee. Access to this email by anyone else is 
unauthorised. If you are not the intended recipient, any disclosure, copying, 
distribution or any action taken or omitted to be taken in reliance on it, is 
prohibited and may be unlawful. 


Re: Question about multiple fq parameters

2016-06-09 Thread Ahmet Arslan
Hi Mikhail,

Can you please explain what this mysterious op parameter is?
How is it related to range queries issued on date fields?

Thanks,
Ahmet


On Thursday, June 9, 2016 11:43 AM, Mikhail Khludnev 
 wrote:
Shawn,
I found "op" at
org.apache.solr.schema.DateRangeField.parseSpatialArgs(QParser, String).


On Thu, Jun 9, 2016 at 1:46 AM, Shawn Heisey  wrote:

> On 6/8/2016 2:28 PM, Steven White wrote:
> > ?q=*:*&q.op=OR&fq={!field+f=DateA+op=Intersects}[2020-01-01+TO+2030-01-01]
>
> Looking at this and checking the code for the Field query parser, I
> cannot see how what you have used above is any different than:
>
> fq=DateA:[2020-01-01 TO 2030-01-01]
>
> The "op=Intersects" parameter that you have included appears to be
> ignored by the parser code that I examined.
>
> If my understanding of the documentation and the code is correct, then
> you should be able to use this:
>
> fq=DateB:[2000-01-01 TO 2020-01-01] OR DateA:[2020-01-01 TO 2030-01-01]
>
> In my examples I have changed the URL encoded "+" character back to a
> regular space.
>
> Thanks,
> Shawn
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Scoring changes between 4.10 and 5.5

2016-06-09 Thread Ahmet Arslan
Hi,

I wondered the same before and failed to decipher TFIDFSimilarity.
Scoring looks like tf*idf*idf to me.

I appreciate someone who will shed some light on this.

Thanks,
Ahmet



On Friday, June 10, 2016 12:37 AM, Upayavira  wrote:
I've just done a very simple, single term query against a 4.10 system
and a 5.5 system, each with much the same data.

The score for the 4.10 system was essentially made up of the field
weight, which is:
   score = tf * idf 

Whereas, in the 5.5 system, there is an additional "query weight", which
is idf * query norm. If query norm is 1, then the final score is now:
  score = query_weight * field_weight
  = ( idf * 1 ) * (tf * idf)
  = tf * idf^2

Can anyone explain why this new "query weight" element has appeared in
our scores somewhere between 4.10 and 5.5?

Thanks!

Upayavira

4.10 score 
  "2937439": {
"match": true,
"value": 5.5993805,
"description": "weight(description:obama in 394012)
[DefaultSimilarity], result of:",
"details": [
  {
"match": true,
"value": 5.5993805,
"description": "fieldWeight in 394012, product of:",
"details": [
  {
"match": true,
"value": 1,
"description": "tf(freq=1.0), with freq of:",
"details": [
  {
"match": true,
"value": 1,
"description": "termFreq=1.0"
  }
]
  },
  {
"match": true,
"value": 5.5993805,
"description": "idf(docFreq=56010, maxDocs=5568765)"
  },
  {
"match": true,
"value": 1,
"description": "fieldNorm(doc=394012)"
  }
]
  }
]
5.5 score 
  "2502281":{
"match":true,
"value":28.51136,
"description":"weight(description:obama in 43472) [], result
of:",
"details":[{
"match":true,
"value":28.51136,
"description":"score(doc=43472,freq=1.0), product of:",
"details":[{
"match":true,
"value":5.339603,
"description":"queryWeight, product of:",
"details":[{
"match":true,
"value":5.339603,
"description":"idf(docFreq=31905,
maxDocs=2446459)"},
  {
"match":true,
"value":1.0,
"description":"queryNorm"}]},
  {
"match":true,
"value":5.339603,
"description":"fieldWeight in 43472, product of:",
"details":[{
"match":true,
"value":1.0,
"description":"tf(freq=1.0), with freq of:",
"details":[{
"match":true,
"value":1.0,
"description":"termFreq=1.0"}]},
  {
"match":true,
"value":5.339603,
"description":"idf(docFreq=31905,
maxDocs=2446459)"},
  {
"match":true,
"value":1.0,
"description":"fieldNorm(doc=43472)"}]}]}]},


Re: Question about multiple fq parameters

2016-06-08 Thread Ahmet Arslan
What is the meaning of 'op=Intersects' here?



On Thursday, June 9, 2016 12:20 AM, Mikhail Khludnev 
 wrote:
oh.. hold on. you might need the space in the later one

?=*=OR= {!field+f=DateB+op=Intersects v=$b}
{!field+f=DateA+op=Intersects
v=$a}=[2000-01-01+TO+2020-01-01]=[2020-01-01+TO+2030-01-01]&...

I don't tell you why


On Thu, Jun 9, 2016 at 12:17 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

>
> ?=*=OR=filter({!field+f=DateB+op=Intersects}[2000-01-01+TO+2020-01-01])
> filter({!field+f=DateA+op=Intersects}[2020-01-01+TO+2030-01-01])&...
>
> or
>
> ?=*=OR={!field+f=DateB+op=Intersects v=$b}
> {!field+f=DateA+op=Intersects
> v=$a}=[2000-01-01+TO+2020-01-01]=[2020-01-01+TO+2030-01-01]&...
>
>
>
> On Wed, Jun 8, 2016 at 11:46 PM, Steven White 
> wrote:
>
>> Thanks Mikhail.
>>
>> In that case, how do I force an OR for fq on DateA and DateB?  And to make
>> things more interesting, I will have other fq in my query on different
>> field such as ISBN, or the same field as DateA or DateB that will need to
>> be AND'ed with the query.
>>
>> Steve
>>
>> On Wed, Jun 8, 2016 at 4:31 PM, Mikhail Khludnev <
>> mkhlud...@griddynamics.com
>> > wrote:
>>
>> > C'mon Steve, filters fq=& are always intersected.
>> >
>> > On Wed, Jun 8, 2016 at 11:28 PM, Steven White 
>> > wrote:
>> >
>> > > Hi everyone,
>> > >
>> > > I cannot make sense of this so I hope someone here can shed some
>> light.
>> > >
>> > > The following gives me 0 hits (expected):
>> > >
>> > >
>> >
>> ?q=*:*&q.op=OR&fq={!field+f=DateA+op=Intersects}[2020-01-01+TO+2030-01-01]
>> > >
>> > > The following gives me hits (expected):
>> > >
>> > >
>> >
>> ?q=*:*&q.op=OR&fq={!field+f=DateB+op=Intersects}[2000-01-01+TO+2020-01-01]
>> > >
>> > > But the following (combine the above two) gives me 0 hits
>> (unexpected):
>> > >
>> > >
>> > >
>> >
>> ?q=*:*&q.op=OR&fq={!field+f=DateB+op=Intersects}[2000-01-01+TO+2020-01-01]&fq={!field+f=DateA+op=Intersects}[2020-01-01+TO+2030-01-01]
>> > >
>> > > What am I missing?
>> > >
>> > > Thanks.
>> > >
>> > > Steve
>> > >
>> >
>> >
>> >
>> > --
>> > Sincerely yours
>> > Mikhail Khludnev
>> > Principal Engineer,
>> > Grid Dynamics
>> >
>> > 

>> > 
>> >
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: carrot2 label understanding(clustering)

2016-06-08 Thread Ahmet Arslan
Hi,

This is search result clustering.
Carrot2 also assigns labels to clusters. It automatically generates those 
labels.

Ahmet
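The clustering engine from the Result Clustering page is configured roughly like this (a sketch based on that documentation; the field names are illustrative and the exact parameters vary by Solr version):

```xml
<searchComponent name="clustering" class="solr.clustering.ClusteringComponent">
  <lst name="engine">
    <str name="name">lingo</str>
    <!-- Lingo generates the human-readable cluster labels automatically -->
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>
  </lst>
</searchComponent>
```

At request time, parameters such as clustering=true, carrot.title, and carrot.snippet tell the engine which document fields to derive the labels from.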



On Wednesday, June 8, 2016 12:36 PM, Mugeesh Husain  wrote:
Hi,

I have a few question regarding clustering , i check out this link
https://cwiki.apache.org/confluence/display/solr/Result+Clustering.

In this article they implemented carrot2 for clustering.

Question:
1.) Is this classification or clustering? If not, then where is clustering
in Solr?
2.) If we use this approach, what are the labels of this Carrot2 clustering?


Thanks
Mugeesh 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/carrot2-label-understanding-clustering-tp4281200.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Getting a list of matching terms and offsets

2016-06-05 Thread Ahmet Arslan
Hi Lee,

May be you can find useful starting point on 
https://issues.apache.org/jira/browse/SOLR-1397

Please consider to contribute when you gather something working.

Ahmet




On Sunday, June 5, 2016 10:37 PM, Justin Lee <lee.justi...@gmail.com> wrote:
Thanks, yea, I looked at debug query too.  Unfortunately the output of
debug query doesn't quite do it.  For example, if you use a wildcard query,
it will simply explain the score associated with that wildcard query, not
the actual matching token.  In other words, if you search for "hour*" and
the actual matching text is "hours", debug query doesn't tell you that.
Instead, it just reports the score associated with "hour*".

The closest example I've ever found is this:

https://lucidworks.com/blog/2013/05/09/update-accessing-words-around-a-positional-match-in-lucene-4/

But this kind of approach won't let me use the full power of the Solr
ecosystem.  I'd basically be back to dealing with Lucene directly, which I
think is a step backwards.  I think the right approach is to write my own
SearchComponent, using the highlighter as a starting point.  But I wanted
to make sure there wasn't a simpler way.


On Sun, Jun 5, 2016 at 11:30 AM Ahmet Arslan <iori...@yahoo.com.invalid>
wrote:

> Well debug query has the list of token that caused match.
> If i am not mistaken i read an example about span query and spans thing.
> It was listing the positions of the matches.
> Cannot find the example at the moment..
>
> Ahmet
>
>
>
> On Sunday, June 5, 2016 9:10 PM, Justin Lee <lee.justi...@gmail.com>
> wrote:
> Thanks for the responses Alex and Ahmet.
>
> The TermVector component was the first thing I looked at, but what it gives
> you is offset information for every token in the document.  I'm trying to
> get a list of tokens that actually match the search query, and unless I'm
> missing something, the TermVector component doesn't give you that
> information.
>
> The TermSpans class does contain the right information, but again the hard
> part is: how do I reliably get a list of TokenSpans for the tokens that
> actually match the search query?  That's why I ended up in the highlighter
> source code, because the highlighter has to do just this in order to create
> snippets with accurate highlighting.
>
> Justin
>
>
> On Sun, Jun 5, 2016 at 9:09 AM Ahmet Arslan <iori...@yahoo.com.invalid>
> wrote:
>
> > Hi,
> >
> > May be org.apache.lucene.search.spans.TermSpans ?
> >
> >
> >
> > On Sunday, June 5, 2016 7:59 AM, Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> > It sounds like TermVector component's output:
> >
> https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component
> >
> > Perhaps with additional flags enabled (e.g. tv.offsets and/or
> > tv.positions).
> >
> > Regards,
> >Alex.
> > 
> > Newsletter and resources for Solr beginners and intermediates:
> > http://www.solr-start.com/
> >
> >
> >
> > On 5 June 2016 at 07:39, Justin Lee <lee.justi...@gmail.com> wrote:
> > > Is anyone aware of a way of getting a list of each matching token and
> > their
> > > offsets after executing a search?  The reason I want to do this is
> > because
> > > I have the physical coordinates of each token in the original document
> > > stored out of band, and I want to be able to highlight in the original
> > > document.  I would really like to have Solr return the list of matching
> > > tokens because then things like stemming and phrase matching will work
> as
> > > expected. I'm thinking of something like the highlighter component,
> > except
> > > instead of returning html, it would return just the matching tokens and
> > > their offsets.
> > >
> > > I have googled high and low and can't seem to find an exact answer to
> > this
> > > question, so I have spent the last few days examining the internals of
> > the
> > > various highlighting classes in Solr and Lucene.  I think the bulk of
> the
> > > action is in WeightedSpanTermExtractor and its interaction with
> > > getBestTextFragments in the Highlighter class.  But before I spend
> > anymore
> > > time on this I thought I'd ask (1) whether anyone knows of an easier
> way
> > of
> > > doing this, and (2) whether I'm at least barking up the right tree.
> > >
> > > Thanks much,
> > > Justin
> >
>


Re: Getting a list of matching terms and offsets

2016-06-05 Thread Ahmet Arslan
Well, the debug query output has the list of tokens that caused the match.
If I am not mistaken, I read an example about span queries and the Spans API.
It was listing the positions of the matches.
Cannot find the example at the moment..

Ahmet



On Sunday, June 5, 2016 9:10 PM, Justin Lee <lee.justi...@gmail.com> wrote:
Thanks for the responses Alex and Ahmet.

The TermVector component was the first thing I looked at, but what it gives
you is offset information for every token in the document.  I'm trying to
get a list of tokens that actually match the search query, and unless I'm
missing something, the TermVector component doesn't give you that
information.

The TermSpans class does contain the right information, but again the hard
part is: how do I reliably get a list of TokenSpans for the tokens that
actually match the search query?  That's why I ended up in the highlighter
source code, because the highlighter has to do just this in order to create
snippets with accurate highlighting.

Justin


On Sun, Jun 5, 2016 at 9:09 AM Ahmet Arslan <iori...@yahoo.com.invalid>
wrote:

> Hi,
>
> May be org.apache.lucene.search.spans.TermSpans ?
>
>
>
> On Sunday, June 5, 2016 7:59 AM, Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
> It sounds like TermVector component's output:
> https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component
>
> Perhaps with additional flags enabled (e.g. tv.offsets and/or
> tv.positions).
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
>
> On 5 June 2016 at 07:39, Justin Lee <lee.justi...@gmail.com> wrote:
> > Is anyone aware of a way of getting a list of each matching token and
> their
> > offsets after executing a search?  The reason I want to do this is
> because
> > I have the physical coordinates of each token in the original document
> > stored out of band, and I want to be able to highlight in the original
> > document.  I would really like to have Solr return the list of matching
> > tokens because then things like stemming and phrase matching will work as
> > expected. I'm thinking of something like the highlighter component,
> except
> > instead of returning html, it would return just the matching tokens and
> > their offsets.
> >
> > I have googled high and low and can't seem to find an exact answer to
> this
> > question, so I have spent the last few days examining the internals of
> the
> > various highlighting classes in Solr and Lucene.  I think the bulk of the
> > action is in WeightedSpanTermExtractor and its interaction with
> > getBestTextFragments in the Highlighter class.  But before I spend
> anymore
> > time on this I thought I'd ask (1) whether anyone knows of an easier way
> of
> > doing this, and (2) whether I'm at least barking up the right tree.
> >
> > Thanks much,
> > Justin
>


Re: Getting a list of matching terms and offsets

2016-06-05 Thread Ahmet Arslan
Hi,

May be org.apache.lucene.search.spans.TermSpans ?



On Sunday, June 5, 2016 7:59 AM, Alexandre Rafalovitch  
wrote:
It sounds like TermVector component's output:
https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component

Perhaps with additional flags enabled (e.g. tv.offsets and/or tv.positions).
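
A request sketch for the flags Alex mentions (the field name content is an assumption):

```
q=hour*&tv=true&tv.fl=content&tv.offsets=true&tv.positions=true
```

As Justin points out in his reply, this returns offsets and positions for every indexed term of each hit, not only the terms that matched the query.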

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/



On 5 June 2016 at 07:39, Justin Lee  wrote:
> Is anyone aware of a way of getting a list of each matching token and their
> offsets after executing a search?  The reason I want to do this is because
> I have the physical coordinates of each token in the original document
> stored out of band, and I want to be able to highlight in the original
> document.  I would really like to have Solr return the list of matching
> tokens because then things like stemming and phrase matching will work as
> expected. I'm thinking of something like the highlighter component, except
> instead of returning html, it would return just the matching tokens and
> their offsets.
>
> I have googled high and low and can't seem to find an exact answer to this
> question, so I have spent the last few days examining the internals of the
> various highlighting classes in Solr and Lucene.  I think the bulk of the
> action is in WeightedSpanTermExtractor and its interaction with
> getBestTextFragments in the Highlighter class.  But before I spend anymore
> time on this I thought I'd ask (1) whether anyone knows of an easier way of
> doing this, and (2) whether I'm at least barking up the right tree.
>
> Thanks much,
> Justin


Re: debugging solr query

2016-05-27 Thread Ahmet Arslan
Hi Jay,

Please separate the clauses. Feed one of them to the main q parameter with the
constant score operator ^= since you are sorting on structured fields (e.g. date):

q=fieldB:(123 OR 456)^=1.0
fq=dt1:[date1 TO *]
fq=dt2:[* TO NOW/DAY+1]
fq=fieldA:abc
sort=dt1 asc,field2 asc,fieldC desc

Play with the caches.
Also consider disabling caching, and/or supplying an execution order for the filter
queries.
Please see :  
https://lucidworks.com/blog/2012/02/10/advanced-filter-caching-in-solr/
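
The controls described in the linked post can be sketched as local params on each filter (the cost values here are illustrative):

```
fq={!cache=false cost=100}dt1:[date1 TO *]
fq={!cache=false cost=150}dt2:[* TO NOW/DAY+1]
```

With cache=false, filters run in ascending cost order, and a cost of 100 or more turns an eligible filter into a post-filter that is only checked against documents already matching the other clauses.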

Ahmet



On Friday, May 27, 2016 4:01 PM, Jay Potharaju <jspothar...@gmail.com> wrote:
I updated almost 1/3 of the data and ran my queries with new columns as
mentioned earlier. The query returns data in  almost half the time as
compared to before.
I am thinking that if I update all the columns there would not be much
difference in query response time.

Are there any suggestions on how handle filtering/querying/sorting on high
cardinality date fields?

Index size: 30Million
Solr: 4.3.1

Thanks

On Thu, May 26, 2016 at 6:04 AM, Jay Potharaju <jspothar...@gmail.com>
wrote:

> Hi,
> Thanks for the feedback. The queries I run are very basic filter queries
> with some sorting.
>
> q=*:*&fq=(dt1:[date1 TO *] && dt2:[* TO NOW/DAY+1]) && fieldA:abc &&
> fieldB:(123 OR 456)&sort=dt1 asc,field2 asc, fieldC desc
>
> I noticed that the date fields(dt1,dt2) are using date instead of tdate
> fields & there are no docValues set on any of the fields used for sorting.
>
> In order to fix this I plan to add a new field using tdate & docvalues
> where required to the schema & update the new columns only for documents
> that have fieldA set to abc. Once the fields are updated query on the new
> fields to measure query performance .
>
>
>- Would the new added fields be used effectively by the solr index
>when querying & filtering? What I am not sure is whether only populating
>small number of documents(fieldA:abc) that are used for the above query
>provide performance benefits.
>- Would there be a performance penalty because majority of the
>documents(!fieldA:abc) dont have values in the new columns?
>
> Thanks
>
> On Wed, May 25, 2016 at 8:40 PM, Jay Potharaju <jspothar...@gmail.com>
> wrote:
>
>> Any links that illustrate and talk about solr internals and how
>> indexing/querying works would be a great help.
>> Thanks
>> Jay
>>
>> On Wed, May 25, 2016 at 6:30 PM, Jay Potharaju <jspothar...@gmail.com>
>> wrote:
>>
>>> Hi,
>>> Thanks for the feedback. The queries I run are very basic filter queries
>>> with some sorting.
>>>
>>> q=*:*&fq=(dt1:[date1 TO *] && dt2:[* TO NOW/DAY+1]) && fieldA:abc &&
>>> fieldB:(123 OR 456)&sort=dt1 asc,field2 asc, fieldC desc
>>>
>>> I noticed that the date fields(dt1,dt2) are using date instead of tdate
>>> fields & there are no docValues set on any of the fields used for sorting.
>>>
>>> In order to fix this I plan to add a new field using tdate & docvalues
>>> where required to the schema & update the new columns only for documents
>>> that have fieldA set to abc. Once the fields are updated query on the new
>>> fields to measure query performance .
>>>
>>>
>>>- Would the new added fields be used effectively by the solr index
>>>when querying & filtering? What I am not sure is whether only populating
>>>small number of documents(fieldA:abc) that are used for the above query
>>>provide performance benefits.
>>>- Would there be a performance penalty because majority of the
>>>documents(!fieldA:abc) dont have values in the new columns?
>>>
>>>
>>> Thanks
>>> Jay
>>>
>>> On Tue, May 24, 2016 at 8:06 PM, Erick Erickson <erickerick...@gmail.com
>>> > wrote:
>>>
>>>> Try adding debug=timing, that'll give you an idea of what component is
>>>> taking all the time.
>>>> From there, it's "more art than science".
>>>>
>>>> But you haven't given us much to go on. What is the query? Are you
>>>> grouping?
>>>> Faceting on high-cardinality fields? Returning 10,000 rows?
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Tue, May 24, 2016 at 4:52 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
>>>> wrote:
>>>> >
>>>> >
>>>> > Hi,
>>>> >
>>>> > Is it QueryComponent taking time?
> >>>> > Or other components?
>>>> >
> >>>> > Also make sure there is plenty of RAM for OS cache.

Re: How can Most Popular Search be implemented in Solr?

2016-05-27 Thread Ahmet Arslan
Hi,

Solr does not explicitly save/maintain incoming queries.
* Some people save queries at the UI side.
* Some folks enable Solr logging and then extract useful query, numFound, 
QTime, etc information from logs: http://soleami.com
* Others identify searches that return zero documents (missing content 
detection)
* Commercial solutions : https://sematext.com/site-search-analytics/

You might be interested in this book:
http://rosenfeldmedia.com/books/search-analytics-for-your-site/
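
A minimal sketch of the log-mining approach mentioned above. The log line shape and parameter layout below are assumptions; real solr.log lines vary by version, so adapt the regex to your deployment.

```python
import re
from collections import Counter

def top_queries(log_lines, n=5):
    """Count q= parameters seen in Solr request log lines; return the most frequent."""
    counts = Counter()
    for line in log_lines:
        # Assumed log shape: "... path=/select params={q=foo&rows=10} hits=42"
        m = re.search(r"[{&]q=([^&}]+)", line)
        if m:
            counts[m.group(1)] += 1
    return counts.most_common(n)

sample = [
    "INFO  o.a.s.c.S.Request [col] webapp=/solr path=/select params={q=solr&rows=10} hits=12",
    "INFO  o.a.s.c.S.Request [col] webapp=/solr path=/select params={q=lucene&rows=10} hits=3",
    "INFO  o.a.s.c.S.Request [col] webapp=/solr path=/select params={q=solr&rows=20} hits=12",
]
print(top_queries(sample))  # [('solr', 2), ('lucene', 1)]
```

The same counting, bucketed by timestamp, gives "most popular over a given period".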

Ahmet



On Friday, May 27, 2016 9:01 AM, Syedabbasmehdi Rizvi 
 wrote:
Hi,

Below is my question:


I want to implement Most Popular Search in Solr. Is there any OOTB
functionality in Solr that can achieve this?
I have had a good look at StatsComponent as well as TermsComponent, but did not
achieve what I want. What I basically want is to get a set of keywords or
phrases that were searched most frequently over a given period of time in Solr.
Does Solr maintain a record of all the searches that were made? If not, how can
I get a record of these searches?

Could you please help me out.

Regards
Abbas




Re: how can we use multi term search along with stop words

2016-05-26 Thread Ahmet Arslan
Hi,

Are you firing a query with both trailing and leading wildcards?
Or did you just put the stars in for emphasis?

Please consider using normal queries, since you are already using a tokenized 
field.

By the way what is 'tollc soon'?

Ahmet



On Thursday, May 26, 2016 4:33 PM, Preeti Bhat <preeti.b...@shoregrp.com> wrote:
Hi Ahmet & Sid,

Thanks for the reply

I have the below requirement
1) If I search with say company_nm:*llc* then we should not return any results,
or only a few results where llc is embedded in other words like tollc soon. So I
had implemented the stopwords.
2) But If I search with say company_nm:*google llc* then it should return the 
result of google llc  and soon.

The problem here is 1st part is working perfectly, while the second part is not 
working.


Thanks and Regards,
Preeti Bhat
Shore Group Associates LLC
(C) +91-996-644-8187
www.ShoreGroupAssociates.com

-Original Message-
From: Siddhartha Singh Sandhu [mailto:sandhus...@gmail.com]
Sent: Thursday, May 26, 2016 6:54 PM
To: solr-user@lucene.apache.org; Ahmet Arslan
Subject: Re: how can we use multi term search along with stop words

Hi Preeti,

You can use the analysis tool in the Solr console to see how your queries are 
being tokenized. Based on your results you might need to make changes in 
"strings_ci".

Also, If you want to be able to search on stopwords you might want to remove 
solr.StopFilterFactory from indexing and query analyzer of "strings_ci". The 
stopwords.txt is present in the core conf directory. You will need to re-index 
after you make these changes.
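
A sketch of what Sid describes in the schema, assuming an analysis chain like the thread's strings_ci (the tokenizer and filter choices here are assumptions, since the original definition was mangled in transit); the stop filter line is simply dropped from both analyzers and the collection re-indexed:

```xml
<fieldType name="strings_ci" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- solr.StopFilterFactory removed so terms such as "llc" remain searchable -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```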

Regards,

Sid.


On Thu, May 26, 2016 at 7:26 AM, Ahmet Arslan <iori...@yahoo.com.invalid>
wrote:

> Hi Bhat,
>
> What do you mean by multi term search?
> In your first e-mail, your example uses quotes, which means
> phrase/proximity search.
>
> ahmet
>
>
>
> On Thursday, May 26, 2016 11:49 AM, Preeti Bhat
> <preeti.b...@shoregrp.com>
> wrote:
> HI All,
>
> Sorry for asking the same question again, but could someone please
> advise me on this.
>
>
> Thanks and Regards,
> Preeti Bhat
>
>
> From: Preeti Bhat
> Sent: Wednesday, May 25, 2016 2:22 PM
> To: solr-user@lucene.apache.org
> Subject: how can we use multi term search along with stop words
>
> HI,
>
> I am trying to search the field named company_nm with value "Google llc".
> We have the stopword on "llc", so when I try to search it returns 0
> results. Could anyone please guide me through the process of using
> stopwords in multi term search.
>
> Please note I am using solr 6.0.0 and using standard parser.
>
> 
>   
> 
> 
>  words="stopwords.txt" ignoreCase="true"/>
>   
>   
> 
> 
>  words="stopwords.txt" ignoreCase="true"/>
>   
>   
>   
> 
>   
> 
>  stored="true"/>
>
>
> Thanks and Regards,
> Preeti Bhat
>
>
>
> NOTICE TO RECIPIENTS: This communication may contain confidential
> and/or privileged information. If you are not the intended recipient
> (or have received this communication in error) please notify the
> sender and it-supp...@shoregrp.com immediately, and destroy this
> communication. Any unauthorized copying, disclosure or distribution of
> the material in this communication is strictly forbidden. Any views or
> opinions presented in this email are solely those of the author and do
> not necessarily represent those of the company. Finally, the recipient
> should check this email and any attachments for the presence of
> viruses. The company accepts no liability for any damage caused by any virus 
> transmitted by this email.

>



Re: sort by custom function of similarity score

2016-05-26 Thread Ahmet Arslan
Hi,

Probably, using the 'query' function query, which returns the score of a given 
query.
https://cwiki.apache.org/confluence/display/solr/Function+Queries#FunctionQueries-UsingFunctionQuery
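
Concretely, the suggestion might look like this (the parameter name qq is an arbitrary choice):

```
q=*:*&qq=title:solr&sort=product(2,query($qq)) desc
```

sort=product(2,score) desc fails because score is not available as a function input; wrapping the scoring query in query() recomputes the same score where a function is expected.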




On Thursday, May 26, 2016 1:59 PM, aanilpala  wrote:
Is it allowed to provide a sort function (sortspec) that uses the similarity
score? For example, something like the following:

sort=product(2,score) desc

It seems that it won't work. Is there an alternative way to achieve this?

using solr6

thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/sort-by-custom-function-of-similarity-score-tp4279228.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how can we use multi term search along with stop words

2016-05-26 Thread Ahmet Arslan
Hi Bhat,

What do you mean by multi term search?
In your first e-mail, your example uses quotes, which means phrase/proximity 
search.

ahmet



On Thursday, May 26, 2016 11:49 AM, Preeti Bhat  
wrote:
HI All,

Sorry for asking the same question again, but could someone please advise me on 
this.


Thanks and Regards,
Preeti Bhat


From: Preeti Bhat
Sent: Wednesday, May 25, 2016 2:22 PM
To: solr-user@lucene.apache.org
Subject: how can we use multi term search along with stop words

HI,

I am trying to search the field named company_nm with value "Google llc". We 
have the stopword on "llc", so when I try to search it returns 0 results. Could 
anyone please guide me through the process of using stopwords in multi term 
search.

Please note I am using solr 6.0.0 and using standard parser.


  



  
  



  
  
  

  




Thanks and Regards,
Preeti Bhat





Re: debugging solr query

2016-05-24 Thread Ahmet Arslan


Hi,

Is it QueryComponent taking time? 
Or other components?

Also make sure there is plenty of RAM for OS cache.

Ahmet

On Wednesday, May 25, 2016 1:47 AM, Jay Potharaju  wrote:



Hi,
I am trying to debug solr performance problems on an old version of solr,
4.3.1.
The queries are taking really long, in the range of 2-5 seconds!
Running filter query with only one condition also takes about a second.

There is memory available on the box for solr to use. I have been looking
at the following link but was looking for some more reference that would
tell me why a particular query is slow.

https://wiki.apache.org/solr/SolrPerformanceProblems

Solr version:4.3.1
Index size:128 GB
Heap:65 GB
Index size:75 GB
Memory usage:70 GB

Even though the available memory is high, not all of it is being used... I
would expect the complete index to be in memory, but it doesn't look like it
is. Any recommendations?

-- 
Thanks
Jay


Re: highlight don't work if df not specified

2016-05-23 Thread Ahmet Arslan
Hi Solomon,

How come
hl.q=blah blah&hl.fl=normal_text,title
would produce an "undefined field text" error message?

Please try
hl.q=blah blah&hl.fl=normal_text,title
just to verify there is a problem with the fielded queries.

Ahmet

On Monday, May 23, 2016 10:31 AM, michael solomon <micheal...@gmail.com> wrote:
Hi,
When I increase hl.maxAnalyzedChars, nothing happens.

AND

hl.q=blah blah&hl.fl=normal_text,title
I get:

"error":{
"metadata":[
  "error-class","org.apache.solr.common.SolrException",
  "root-error-class","org.apache.solr.common.SolrException"],
"msg":"undefined field text",
"code":400}}




On Sun, May 22, 2016 at 5:34 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
wrote:

> Hi,
>
> What happens when you increase hl.maxAnalyzedChars?
>
> OR
>
> hl.q=blah blah&hl.fl=normal_text,title
>
> Ahmet
>
>
>
> On Sunday, May 22, 2016 5:24 PM, michael solomon <micheal...@gmail.com>
> wrote:
>  "true" stored="true"/>
> 
>
>
> On Sun, May 22, 2016 at 5:18 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
> wrote:
>
> > Hi,
> >
> > Weird, are your fields stored?
> >
> >
> >
> > On Sunday, May 22, 2016 5:14 PM, michael solomon <micheal...@gmail.com>
> > wrote:
> > Thanks Ahmet,
> > It was a mistake in the question, sorry; in the query I wrote it properly.
> >
> >
> > On Sun, May 22, 2016 at 5:06 PM, Ahmet Arslan <iori...@yahoo.com.invalid
> >
> > wrote:
> >
> > > Hi,
> > >
> > > q=normal_text:"bla bla":"bla bla"
> > >
> > > should be
> > > q=+normal_text:"bla bla" +title:"bla bla"
> > >
> > >
> > >
> > > On Sunday, May 22, 2016 4:52 PM, michael solomon <micheal...@gmail.com
> >
> > > wrote:
> > > Hi,
> > > I'm I query multiple fields in solr:
> > > q=normal_text:"bla bla":"bla bla" 
> > >
> > > I turn on the highlighting, but it doesn't work even when I fill hl.fl.
> > > it work when I fill df(default field) parameter, but then it's
> highlights
> > > only one field.
> > > What the problem?
> > > Thanks,
> > > michael
> > >
> >
>


Re: highlight don't work if df not specified

2016-05-22 Thread Ahmet Arslan
Hi,

What happens when you increase hl.maxAnalyzedChars?

OR

hl.q=blah blah&hl.fl=normal_text,title

Ahmet



On Sunday, May 22, 2016 5:24 PM, michael solomon <micheal...@gmail.com> wrote:




On Sun, May 22, 2016 at 5:18 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
wrote:

> Hi,
>
> Weird, are your fields stored?
>
>
>
> On Sunday, May 22, 2016 5:14 PM, michael solomon <micheal...@gmail.com>
> wrote:
> Thanks Ahmet,
> It was a mistake in the question, sorry; in the query I wrote it properly.
>
>
> On Sun, May 22, 2016 at 5:06 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
> wrote:
>
> > Hi,
> >
> > q=normal_text:"bla bla":"bla bla"
> >
> > should be
> > q=+normal_text:"bla bla" +title:"bla bla"
> >
> >
> >
> > On Sunday, May 22, 2016 4:52 PM, michael solomon <micheal...@gmail.com>
> > wrote:
> > Hi,
> > I'm I query multiple fields in solr:
> > q=normal_text:"bla bla":"bla bla" 
> >
> > I turn on the highlighting, but it doesn't work even when I fill hl.fl.
> > it work when I fill df(default field) parameter, but then it's highlights
> > only one field.
> > What the problem?
> > Thanks,
> > michael
> >
>


Re: highlight don't work if df not specified

2016-05-22 Thread Ahmet Arslan
Hi,

Weird, are your fields stored?



On Sunday, May 22, 2016 5:14 PM, michael solomon <micheal...@gmail.com> wrote:
Thanks Ahmet,
It was a mistake in the question, sorry; in the query I wrote it properly.


On Sun, May 22, 2016 at 5:06 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
wrote:

> Hi,
>
> q=normal_text:"bla bla":"bla bla"
>
> should be
> q=+normal_text:"bla bla" +title:"bla bla"
>
>
>
> On Sunday, May 22, 2016 4:52 PM, michael solomon <micheal...@gmail.com>
> wrote:
> Hi,
> I'm I query multiple fields in solr:
> q=normal_text:"bla bla":"bla bla" 
>
> I turn on the highlighting, but it doesn't work even when I fill hl.fl.
> it work when I fill df(default field) parameter, but then it's highlights
> only one field.
> What the problem?
> Thanks,
> michael
>


Re: highlight don't work if df not specified

2016-05-22 Thread Ahmet Arslan
Hi,

q=normal_text:"bla bla":"bla bla"

should be 
q=+normal_text:"bla bla" +title:"bla bla"



On Sunday, May 22, 2016 4:52 PM, michael solomon  wrote:
Hi,
I query multiple fields in Solr:
q=normal_text:"bla bla":"bla bla" 

I turn on the highlighting, but it doesn't work even when I fill hl.fl.
it work when I fill df(default field) parameter, but then it's highlights
only one field.
What's the problem?
Thanks,
michael


Re: How to use a regex search within a phrase query?

2016-05-22 Thread Ahmet Arslan
Hi Erez,

I don't think it is possible to combine regex with phrase out-of-the-box.
However, there is https://issues.apache.org/jira/browse/LUCENE-5205 for the 
task.

Can't you define your query in terms of pure regex?
something like /[0-9]{3} .* [0-9]{4}/
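
The pure-regex route can be sanity-checked outside Solr first. Two caveats, both assumptions worth verifying against your setup: Lucene regexp queries are anchored to the whole term, and since content is tokenized, a regex over the whole phrase would need an untokenized (string) copy of the field. Python's fullmatch mimics the anchoring; an explicit separator class stands in for the " .* " above, which as written would demand two spaces:

```python
import re

# Whole-string match, like a Lucene regexp query (anchored at both ends).
pattern = re.compile(r"[0-9]{3}[ -][0-9]{4}")

print(bool(pattern.fullmatch("123 1234")))  # True
print(bool(pattern.fullmatch("123-1234")))  # True
print(bool(pattern.fullmatch("1234567")))   # False: no separator between the groups
```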

ahmet


On Sunday, May 22, 2016 1:37 PM, Erez Michalak  wrote:
Hey,
I'm developing a search application based on SOLR 5.3.1, and would like to add 
to it regex search capabilities on a specific tokenized text field named 
'content'.
Is it possible to combine the default regex syntax within a phrase query (and 
moreover, within a proximity search)? If so, please instruct me how..

Thanks in advance,
Erez Michalak

p.s.
Maybe the following example will make my question clearer:
The query content:/[0-9]{3}/ returns documents with (at least one) 3 digits 
token as expected.
However,

* the query content:"/[0-9]{3}/ /[0-9]{4}/" doesn't match the contents 
'123-1234' and '123 1234', even though they are tokenized to two tokens ('123' 
and '1234') which individually match each part of the query

* the query content:"/[0-9]{3}/ example" doesn't match the content '123 
example'

* even the query content:"/[0-9]{3}/" (same as the query that works but 
surrounded with quotation marks) doesn't return documents with 3 digits token!

* etc.





Re: indexing dovecot mailbox

2016-05-22 Thread Ahmet Arslan
Hi Andreas,

Exactly, SimplePostTool does not recognize/support the file-ending.

If they are text files, you can rename them with a *.txt extension, and the post
tool will grab them.

If you have some code to read those files, you can use SolrJ to roll your own 
indexer
https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/

Sorry, I am not familiar with this e-mail stuff; maybe Apache Tika can
read/recognize these mail files.

Ahmet



On Sunday, May 22, 2016 1:14 PM, Andreas Meyer <a.me...@nimmini.de> wrote:
Hello!

The files I want to index are IMAP-folders of dovecot, Maildir.

bitmachine1:/home/a.meyer/Postfach/cur # file 
1461583672.Vfe03I1000f4M981621.bitmachine1:2,S
1461583672.Vfe03I1000f4M981621.bitmachine1:2,S: SMTP mail, ASCII text

I can read them with Midnight Commander. Does it have something to do
with the file ending not being recognized?

Andreas


Ahmet Arslan <iori...@yahoo.com.INVALID> schrieb am 22.05.16 um 00:46:32 Uhr:

> Hi Meyer,
> 
> Not sure what "mailbox of dovecot" is, but SimplePostTool can recognize 
> certain file types.
> They (xml,json,...,log) are actually listed in the log msg in your email.
> 
> Can you describe the format of the files that you want to index?
> Are they text files?
> 
> ahmet
> 
> 
> 
> On Sunday, May 22, 2016 1:16 AM, Andreas Meyer <a.me...@nimmini.de> wrote:
> Hello!
> 
> Bear with me, I am new to solr and everything is very
> complex. Don't know how the thing is working.
> 
> I installed solr-5.5.1.tgz and got it running. Try to
> index a mailbox of dovecot with
> 
> # bin/post -c myfiles /home/a.meyer/Postfach
> 
> after I copied solr-schema.xml to /opt/solr/server/solr/myfiles/conf
> as schema.xml, but no files other than dovecot.index.log and 
> dovecot.mailbox.log
> are indexed.
> 
> # bin/post -c myfiles /home/a.meyer/Postfach
> /usr/lib64/jvm/jre/bin/java -classpath /opt/solr/dist/solr-core-5.5.1.jar 
> -Dauto=yes -Dc=myfiles -Ddata=files -Drecursive=yes 
> org.apache.solr.util.SimplePostTool /home/a.meyer/Postfach
> SimplePostTool version 5.0.0
> Posting files to [base] url http://localhost:8983/solr/myfiles/update...
> Entering auto mode. File endings considered are 
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> Entering recursive mode, max depth=999, delay=0s
> Indexing directory /home/a.meyer/Postfach (2 files, depth=0)
> POSTing file dovecot.index.log (text/plain) to [base]/extract
> POSTing file dovecot.mailbox.log (text/plain) to [base]/extract
> Indexing directory /home/a.meyer/Postfach/cur (0 files, depth=1)
> Indexing directory /home/a.meyer/Postfach/new (0 files, depth=1)
> Indexing directory /home/a.meyer/Postfach/tmp (0 files, depth=1)
> 2 files indexed.
> COMMITting Solr index changes to http://localhost:8983/solr/myfiles/update...
> Time spent: 0:00:02.976
> 
> I was hoping the post command would index the email in 
> /home/a.meyer/Postfach/cur,
> but it doesn't. The content of this folder looks like this:
> 
> -rw--- 1 a.meyer users   4764 25. Apr 13:27 
> 1461583672.Vfe03I1000f4M981621.bitmachine1:2,S
> -rw--- 1 a.meyer users 276318 26. Apr 17:48 
> 1461685694.Vfe03I1000f6M202284.bitmachine1:2,S
> -rw--- 1 a.meyer users   4578 27. Apr 17:16 
> 1461770179.Vfe03I10010aM756286.bitmachine1:2,S
> -rw--- 1 a.meyer users  16981  3. Mai 10:12 
> 1462263159.Vfe03I1000c5M88.bitmachine1:2,RS
> 
> What did I miss? Could need some help with this one.
> 
> Kind regards
> 
>   Andreas


Re: indexing dovecot mailbox

2016-05-21 Thread Ahmet Arslan


Hi,

You might be also interested in the MailEntityProcessor of DataImportHandler.

https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-EntityProcessors
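A minimal data-config.xml for it might look roughly like this (a sketch only: the account values are placeholders, and the exact attribute set should be checked against the DIH documentation for your Solr version):

```xml
<dataConfig>
  <document>
    <!-- placeholder IMAP account details; replace with your own -->
    <entity processor="MailEntityProcessor"
            user="someone@example.com"
            password="secret"
            host="imap.example.com"
            protocol="imaps"
            folders="INBOX"/>
  </document>
</dataConfig>
```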



On Sunday, May 22, 2016 3:46 AM, Ahmet Arslan <iori...@yahoo.com.INVALID> wrote:
Hi Meyer,

Not sure what "mailbox of dovecot" is, but SimplePostTool can recognize certain 
file types.
They (xml,json,...,log) are actually listed in the log msg in your email.

Can you describe the format of the files that you want to index?
Are they text files?

ahmet






Re: indexing dovecot mailbox

2016-05-21 Thread Ahmet Arslan
Hi Meyer,

Not sure what "mailbox of dovecot" is, but SimplePostTool can recognize certain 
file types.
They (xml,json,...,log) are actually listed in the log msg in your email.

Can you describe the format of the files that you want to index?
Are they text files?

ahmet





Re: Solrj 4.7.2 - slowing down over time

2016-05-19 Thread Ahmet Arslan
Hi,

EmbeddedSolrServer bypass the servlet container.
Please see : 
http://find.searchhub.org/document/a88f669d38513a76




On Thursday, May 19, 2016 6:23 PM, Roman Slavik  wrote:
Hi Ahmet,
thanks for your response, I appreciate it.

I thought that EmbeddedSolrServer was just a wrapper around Solr core
functionality. Solr 4.7.2 is (was?) distributed as a war file and I didn't
find any mention of a compatibility problem with Tomcat.
Maybe with jetty it would work slightly faster, but I don't think this
causes the problem we have.


Roman



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrj-4-7-2-slowing-down-over-time-tp4277519p429.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solrj 4.7.2 - slowing down over time

2016-05-18 Thread Ahmet Arslan
Hi Roman,

You said you were using EmbeddedSolrServer, also you mention Tomcat.
I don't think it is healthy to use both.
Also I wouldn't use EmbeddedSolrServer at all.
It is rarely used and there can be hidden things there.

Consider using jetty which is actually tested.

Since you commit every minute, optimize could be dropped too. 

Ahmet


On Thursday, May 19, 2016 12:24 AM, Joel Bernstein  wrote:
One thing to investigate is whether your caches are too large and gradually
filling up memory. It does sound like memory is getting tighter over time.
A memory profiler would be helpful in figuring out memory issues.

Moving to autoCommits would also eliminate any slowness due to overlapping
searchers from committing too frequently.



Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, May 18, 2016 at 10:39 AM, Roman Slavík  wrote:

> Hi all,
>
> we're using solr in our application and have problem that both searching
> and indexing is slowing down over time.
>
> Versions:
> -
> Java 1.7
> Solr 4.7.2
> Lucene 4.1 (luceneMatchVersion param in solrconfig.xml)
>
> App architecture:
> -
> We don't use solr as a standalone application; we integrated it into our
> app with the solrj library.
> It has 1 CoreContainer with 3 cores (let's call them alpha, beta, gamma).
> An EmbeddedSolrServer instance is used to work with each core.
> Application does both searching and indexing.
>
> What's going on:
> 
> After a Tomcat restart everything works great for 2-3 days. But after
> this time solr (all cores) starts to slow down until it's unusable and we
> have to restart Tomcat. Then it works fine again.
> For example, search time for a really complex query is 1.5 s when it works
> fine, but then it rises to more than 1 min. The same issue occurs with
> indexing: first fast, then slow.
>
> Searching:
> --
> Core alpha is used mainly for normal search, but sometimes for faceting too.
> Beta and gamma are only for facets.
> alpha: 25000 queries/day
> beta: 7000 queries/day
> gamma: 7000 queries/day
> We do lots of query joins, sometimes cross cores.
>
> Indexing:
> -
> We commit changes continuously over day. Number of commits is limited to 1
> commit/min for all three cores. So we do max 1440 commits daily. One commit
> contains between 1 and 100 docs.
> Method EmbeddedSolrServer.add(SolrInputDocument) is used and in the end
> EmbeddedSolrServer.commit().
> Every night we call EmbeddedSolrServer.optimize() on each core.
>
> Index size:
> ---
> alpha: 13,5 GB
> beta: 300 MB
> gamma: 600 MB
>
> Hardware:
> -
> Ubuntu 14.04
> 8 core CPU
> java heap space 22 GB RAM
> SSD drive with more than 50 GB free space
>
> Solr config (Same configuration is used for all cores):
> ---
> <directoryFactory class="${solr.directoryFactory:solr.MMapDirectoryFactory}"/>
> <luceneMatchVersion>LUCENE_41</luceneMatchVersion>
> (the remaining solrconfig.xml element names were stripped by the mailing-list
> archive; the surviving values show a native lockType, all caches configured
> with autowarmCount="0", and maxWarmingSearchers set to 2)
>
>
> Conclusion:
> ---
> Is something wrong in configuration? Or is this some kind of bug? Or...?
> Can you give me some advice how to resolve this problem please?
>
>
> Roman
>
>
>
>
>
>
>
>
>
>
>


Re: Precision, Recall, ROC in solr

2016-05-18 Thread Ahmet Arslan
Hi Tentri,

Evaluation in IR is primarily carried out via the traditional TREC-style (also
referred to as the Cranfield paradigm) evaluation methodology.
The evaluation methodology requires a document collection, a set of information 
needs (called topics or queries), and a set of query relevance judgments 
(qrels) (right answers).

qrels are the most labor-intensive element, since they require human assessors.

Once you have these three elements, then you can use evaluation scripts such as 
http://trec.nist.gov/trec_eval/trec_eval_latest.tar.gz
to calculate effectiveness metrics (recall, precision, etc).

See for an example :
http://www-personal.umich.edu/~kevynct/trec-web-2014/
ahmet



On Wednesday, May 18, 2016 3:37 PM, Tentri Oktaviani 
 wrote:
Hi solr users,

My final project in college is building a search engine. I'm using solr to
access and retrieve data from an ontology, which will later be used as a
corpus. I'm entirely new to all of these things (information retrieval,
ontology, python and solr).

There's a step in information retrieval where you evaluate the query results.
I'm planning to use Precision, Recall, and ROC scores for this. Is there any
way I can use a function in solr to calculate precision, recall, and ROC
scores? Whether from the solr interface or in the code behind it doesn't
matter.

Thank you in advance.

Regards,
Tentri


Re: Filter query (fq) on comma seperated value does not work

2016-05-16 Thread Ahmet Arslan
Hi,

I think stock example from official ref guide will do the trick.
Please see :
https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-RegularExpressionPatternTokenizer


<analyzer>
  <tokenizer class="solr.PatternTokenizerFactory" pattern=",\s*"/>
</analyzer>



Ahmet


On Tuesday, May 17, 2016 8:31 AM, SRINI SOLR <srini.s...@gmail.com> wrote:
Hi Ahmet / Team -
Thanks for your quick response...

Can you please help me out on this PatternTokenizer configuration...
Here we are using configuration as below ...

Also, I have made changes to the field value so that it is separated
by spaces instead of commas and indexed the data as such... And now I am
able to retrieve the expected results.

But can you still help me achieve the results using the comma, as
you suggested?

Thanks & Regards


On Mon, May 16, 2016 at 5:50 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
wrote:

> Hi,
>
> It's all about how you tokenize the category field.
> It looks like you are using a string type, which does not tokenize at all
> (i.e. it is kept verbatim).
> Please use a PatternTokenizer and configure it so that it splits on commas.
>
> Ahmet
>
>
>
> On Monday, May 16, 2016 2:11 PM, SRINI SOLR <srini.s...@gmail.com> wrote:
> Hi Team -
> Can you please help me out on the following ...
>
> I have the following field in the solr document, which has comma-separated
> values like below ..
>
> 1,456,768,345  doc1
> 456 doc2
> 1,456  doc3
>
> So - here I need to filter the search docs whose category contains
> 456...
> when I do the following ...
>
> fq=category:456
>
> it is returning only one document, doc2, which has only the category 456.
>
> But I need the other two also, which have this category 456
>
> Can you please help me out to achieve this ...
>
>
> Thanks & Regards
>


Re: easiest way to search parts of words

2016-05-16 Thread Ahmet Arslan
Hi Gates,

There are two approaches:

1) Use a wildcard query with star operator q=consult*

2) Create an index with EdgeNGramFilterFactory and issue a regular search 
q=consult

(2) will be faster, at the cost of a bigger index size.
You don't need to change anything for (1) if the execution time is satisfactory.
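For intuition, here is a tiny sketch (plain Java, not the actual Lucene filter; the minGram/maxGram values are made up) of the grams EdgeNGramFilterFactory would index for one token:

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGrams {
    // Leading substrings between minGram and maxGram characters long,
    // mimicking what EdgeNGramFilterFactory indexes for each token.
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= Math.min(maxGram, token.length()); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // "consultation" is indexed as cons, consu, ..., consultation,
        // so the plain query q=consult gets a hit without any wildcard.
        System.out.println(edgeNGrams("consultation", 4, 12).contains("consult")); // true
    }
}
```

This is why (2) trades index size for query speed: every prefix is stored up front, so no term expansion happens at query time.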

Ahmet



On Tuesday, May 17, 2016 12:11 AM, M Gates  wrote:

Hi

Wondering if someone can guide me on how to search partial words.

i.e. how to return the word 'consultation' by entering a query with just the
word 'consult'.

How does one do this in Solr ?

Thanks,
Mark


Re: Filter query (fq) on comma seperated value does not work

2016-05-16 Thread Ahmet Arslan
Hi,

It's all about how you tokenize the category field.
It looks like you are using a string type, which does not tokenize at all
(i.e. it is kept verbatim).
Please use a PatternTokenizer and configure it so that it splits on commas.
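A quick sketch of the difference (plain Java string splitting standing in for the analysis chain; the pattern ",\s*" is the comma example from the reference guide):

```java
import java.util.Arrays;
import java.util.List;

public class CommaSplit {
    // What PatternTokenizerFactory with pattern=",\s*" does at index time:
    // split the stored value into one token per category.
    static List<String> tokens(String value) {
        return Arrays.asList(value.split(",\\s*"));
    }

    public static void main(String[] args) {
        // As a string type, "1,456,768,345" stays one verbatim token, so
        // fq=category:456 matches only the doc whose whole value is "456".
        // Tokenized on commas, every category is individually searchable:
        System.out.println(tokens("1,456,768,345"));                 // [1, 456, 768, 345]
        System.out.println(tokens("1,456,768,345").contains("456")); // true
    }
}
```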

Ahmet



On Monday, May 16, 2016 2:11 PM, SRINI SOLR  wrote:
Hi Team -
Can you please help me out on the following ...

I have the following field in the solr document, which has comma-separated
values like below ..

1,456,768,345  doc1
456 doc2
1,456  doc3

So - here I need to filter the search docs whose category contains 456...
when I do the following ...

fq=category:456

it is returning only one document, doc2, which has only the category 456.

But I need the other two also, which have this category 456

Can you please help me out to achieve this ...


Thanks & Regards


Re: URL parameters combined with text param

2016-05-13 Thread Ahmet Arslan
Hi,

In the first debug query response, the special local-params words are
themselves parsed as query terms, so it is not working.
I am not sure the edismax query parser recognizes the _query_ field, but the
lucene query parser does. Try switching to the lucene query parser.

Also, if you can divide your query words into q and fq, the following will work:

q=hospital&defType=lucene&fq={!lucene q.op=AND v=$a}&a=Leapfrog


Ahmet
On Friday, May 13, 2016 9:01 AM, Bastien Latard - MDPI AG 
<lat...@mdpi.com.INVALID> wrote:



Thanks both!

I already tried "debug=true", but it doesn't tell me that much... Or at
least, I don't see any problem...
Below are the responses...

1. /select?q=hospital AND _query_:"{!q.op=AND
v=$a}"&fl=abstract,title&a=hospital Leapfrog&debug=true



responseHeader: status=0, QTime=280
params:
  q = hospital AND _query_:"{!q.op=AND v=$a}"
  a = hospital Leapfrog
  debug = true
  fl = abstract,title

debug:
  rawquerystring: hospital AND _query_:"{!q.op=AND v=$a}"
  parsedquery: (+(DisjunctionMaxQuery((abstract:hospit | title:hospit |
    authors:hospital | doi:hospital)) DisjunctionMaxQuery(((Synonym(abstract:and
    abstract:andqueri) abstract:queri) | (Synonym(title:and title:andqueri)
    title:queri) | (Synonym(authors:and authors:andquery) authors:query) |
    doi:and_query_:)) DisjunctionMaxQuery((abstract:"(q qopand) op and (v va) a" |
    title:"(q qopand) op and (v va) a" | authors:"(q qopand) op and (v va) a" |
    doi:"{!q.op=and v=$a}")))/no_coord
  QParser: ExtendedDismaxQParser

[...]





2. /select?q=_query_:"{!q.op=AND v='hospital'}" _query_:"{!q.op=AND
v=$a}"&a=hospital Leapfrog&debug=true



responseHeader: status=0, QTime=2
params:
  q = _query_:"{!q.op=AND v='hospital'}" _query_:"{!q.op=AND v=$a}"
  a = hospital Leapfrog
  debug = true

debug:
  rawquerystring: _query_:"{!q.op=AND v='hospital'}" _query_:"{!q.op=AND v=$a}"
  parsedquery: (+())/no_coord
  parsedquery_toString: +()
  QParser: ExtendedDismaxQParser

[...]



On 12/05/2016 17:06, Erick Erickson wrote:
> Try adding debug=query to your query and look at the parsed results.
> This shows you exactly what Solr sees rather than what you think
> it should.
>
> Best,
> Erick
>
> On Thu, May 12, 2016 at 6:24 AM, Ahmet Arslan <iori...@yahoo.com.invalid> 
> wrote:
>> Hi,
>>
>> Well, what happens
>>
>> q=hospital&fq={!lucene q.op=AND v=$a}&a=hospital Leapfrog
>>
>> OR
>>
>> q=+_query_:"{!lucene q.op=AND v='hospital'}" +_query_:"{!lucene q.op=AND
>> v=$a}"&a=hospital Leapfrog
>>
>>
>> Ahmet
>>
>>
>> On Thursday, May 12, 2016 3:28 PM, Bastien Latard - MDPI AG 
>> <lat...@mdpi.com.INVALID> wrote:
>> Hi Ahmet,
>>
>> Thanks for your answer, but this doesn't work on my local index.
>> q1 returns 2 results.
>>
>> http://localhost:8983/solr/my_core/select?q=hospital AND
>> _query_:"{!q.op=AND%20v=$a}"&fl=abstract,title&a=hospital Leapfrog
>> ==> returns 254 results (the same as
>> http://localhost:8983/solr/my_core/select?q=hospital )
>>
>> Kind regards,
>> Bastien
>>
>> On 11/05/2016 16:06, Ahmet Arslan wrote:
>>> Hi Bastien,
>>>
>>> Please use magic _query_ field, q=hospital AND _query_:"{!q.op=AND v=$a}"
>>>
>>> ahmet
>>>
>>>
>>> On Wednesday, May 11, 2016 2:35 PM, Latard - MDPI AG 
>>> <lat...@mdpi.com.INVALID> wrote:
>>> Hi Everybody,
>>>
>>> Is there a way to pass only some of the data by reference and some
>>> others in the q param?
>>>
>>> e.g.:
>>>
>>> q1.   http://localhost:8983/solr/my_core/select?q={!q.op=OR
>>> v=$a}&fl=abstract,title&a=hospital Leapfrog&debug=true
>>>
>>> q1a.  http://localhost:8983/solr/my_core/select?q=hospital AND
>>> Leapfrog&fl=abstract,title
>>>
>>> q2.  http://localhost:8983/solr/my_core/select?q=hospital AND
>>> ({!q.op=AND v=$a})&fl=abstract,title&a=hospital Leapfrog
>>>
>>> q1 & q1a  are returning the same results, but q2 is somehow not
>>> analyzing the $a parameter properly...
>>>
>>> Am I missing anything?
>>>
>>> Kind regards,
>>> Bastien Latard
>>> Web engineer
>>
>> Kind regards,
>> Bastien Latard
>> Web engineer
>> --
>> MDPI AG
>> Postfach, CH-4005 Basel, Switzerland
>> Office: Klybeckstrasse 64, CH-4057
>> Tel. +41 61 683 77 35
>> Fax: +41 61 302 89 18
>> E-mail:
>> lat...@mdpi.com
>> http://www.mdpi.com/


Kind regards,
Bastien Latard
Web engineer
-- 
MDPI AG
Postfach, CH-4005 Basel, Switzerland
Office: Klybeckstrasse 64, CH-4057
Tel. +41 61 683 77 35
Fax: +41 61 302 89 18
E-mail:
lat...@mdpi.com
http://www.mdpi.com/


Re: URL parameters combined with text param

2016-05-12 Thread Ahmet Arslan
Hi,

Well, what happens with:

q=hospital&fq={!lucene q.op=AND v=$a}&a=hospital Leapfrog

OR

q=+_query_:"{!lucene q.op=AND v='hospital'}" +_query_:"{!lucene q.op=AND
v=$a}"&a=hospital Leapfrog


Ahmet


On Thursday, May 12, 2016 3:28 PM, Bastien Latard - MDPI AG 
<lat...@mdpi.com.INVALID> wrote:
Hi Ahmet,

Thanks for your answer, but this doesn't work on my local index.
q1 returns 2 results.

http://localhost:8983/solr/my_core/select?q=hospital AND
_query_:"{!q.op=AND%20v=$a}"&fl=abstract,title&a=hospital Leapfrog
==> returns 254 results (the same as 
http://localhost:8983/solr/my_core/select?q=hospital )

Kind regards,
Bastien

On 11/05/2016 16:06, Ahmet Arslan wrote:
> Hi Bastien,
>
> Please use magic _query_ field, q=hospital AND _query_:"{!q.op=AND v=$a}"
>
> ahmet
>
>
> On Wednesday, May 11, 2016 2:35 PM, Latard - MDPI AG 
> <lat...@mdpi.com.INVALID> wrote:
> Hi Everybody,
>
> Is there a way to pass only some of the data by reference and some
> others in the q param?
>
> e.g.:
>
> q1.   http://localhost:8983/solr/my_core/select?q={!q.op=OR
> v=$a}&fl=abstract,title&a=hospital Leapfrog&debug=true
>
> q1a.  http://localhost:8983/solr/my_core/select?q=hospital AND
> Leapfrog&fl=abstract,title
>
> q2.  http://localhost:8983/solr/my_core/select?q=hospital AND
> ({!q.op=AND v=$a})&fl=abstract,title&a=hospital Leapfrog
>
> q1 & q1a  are returning the same results, but q2 is somehow not
> analyzing the $a parameter properly...
>
> Am I missing anything?
>
> Kind regards,
> Bastien Latard
> Web engineer


Kind regards,
Bastien Latard
Web engineer
-- 
MDPI AG
Postfach, CH-4005 Basel, Switzerland
Office: Klybeckstrasse 64, CH-4057
Tel. +41 61 683 77 35
Fax: +41 61 302 89 18
E-mail:
lat...@mdpi.com
http://www.mdpi.com/


Re: Error

2016-05-11 Thread Ahmet Arslan
Hi Midas,

It looks like you are committing too frequently; cache warming cannot catch up.
Either lower your commit rate, or disable cache auto-warming (autowarmCount=0).
You can also remove queries registered at newSearcher event if you have defined 
some.
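For reference, the knobs involved live in solrconfig.xml and look roughly like this (a sketch; the cache classes and sizes are placeholders taken from a stock config, and only autowarmCount and maxWarmingSearchers matter here):

```xml
<query>
  <!-- autowarmCount="0" stops each new searcher from re-running cached entries -->
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <!-- raising this only hides the symptom; committing less often is the real fix -->
  <maxWarmingSearchers>2</maxWarmingSearchers>
</query>
```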

Ahmet



On Wednesday, May 11, 2016 2:51 PM, Midas A  wrote:
Hi, I am getting the following error:

org.apache.solr.common.SolrException: Error opening new searcher.
exceeded limit of maxWarmingSearchers=2, try again later.



What should I do to resolve it?


Re: URL parameters combined with text param

2016-05-11 Thread Ahmet Arslan
Hi Bastien,

Please use magic _query_ field, q=hospital AND _query_:"{!q.op=AND v=$a}"

ahmet


On Wednesday, May 11, 2016 2:35 PM, Latard - MDPI AG  
wrote:
Hi Everybody,

Is there a way to pass only some of the data by reference and some 
others in the q param?

e.g.:

q1.   http://localhost:8983/solr/my_core/select?q={!q.op=OR
v=$a}&fl=abstract,title&a=hospital Leapfrog&debug=true

q1a.  http://localhost:8983/solr/my_core/select?q=hospital AND
Leapfrog&fl=abstract,title

q2.  http://localhost:8983/solr/my_core/select?q=hospital AND
({!q.op=AND v=$a})&fl=abstract,title&a=hospital Leapfrog

q1 & q1a  are returning the same results, but q2 is somehow not 
analyzing the $a parameter properly...

Am I missing anything?

Kind regards,
Bastien Latard
Web engineer
-- 
MDPI AG
Postfach, CH-4005 Basel, Switzerland
Office: Klybeckstrasse 64, CH-4057
Tel. +41 61 683 77 35
Fax: +41 61 302 89 18
E-mail:
lat...@mdpi.com
http://www.mdpi.com/


Re: How to search string

2016-05-11 Thread Ahmet Arslan
Hi,

You can be explicit about the field that you want to search on, e.g.
q=product_name:(Garmin Class A)

Or you can use the lucene query parser with the default field (df) parameter, e.g.
q={!lucene df=product_name}Garmin Class A

It's all about query parsers.

Ahmet


On Wednesday, May 11, 2016 9:12 AM, kishor  wrote:
I want to search for a product whose name is "Garmin Class A", so I expect the
result to match the product name "Garmin Class A", but it searches each word
separately and I don't know why or how this happens. Please guide me on how to
search a string in only one field, not in other fields.

"debug": {
  "rawquerystring": "Garmin Class A",
  "querystring": "Garmin Class A",
  "parsedquery": "(+(DisjunctionMaxQuery((product_name:Garmin))
    DisjunctionMaxQuery((product_name:Class))
    DisjunctionMaxQuery((product_name:A))) ())/no_coord",
  "parsedquery_toString": "+((product_name:Garmin) (product_name:Class)
    (product_name:A)) ()",
  "explain": {},



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-search-string-tp4276052.html
Sent from the Solr - User mailing list archive at Nabble.com. 


Re: How to search in solr for words like %rek Dr%

2016-05-11 Thread Ahmet Arslan
Hi Thrinadh,

Why don't you use a plain wildcard search? There are two operators, star and
question mark, for this purpose.

Ahmet


On Wednesday, May 11, 2016 4:31 AM, Thrinadh Kuppili  
wrote:
Thank you. Yes, I am aware that surrounding with quotes will result in a match
for the space, but I am trying to match words based on input which can't be
controlled.
I need to search solr for %rek Dr% and return all results which contain
"rek Dr" without quotes.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-search-in-solr-for-words-like-rek-Dr-tp4275854p4276027.html

Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet ignoring repeated word

2016-05-10 Thread Ahmet Arslan
+1 to Toke's facet and stats combo!



On Tuesday, May 10, 2016 11:21 AM, Toke Eskildsen  
wrote:
On Fri, 2016-04-29 at 08:55 +, G, Rajesh wrote:

> I am trying to implement a word cloud using Solr. The problem I have is that
> the Solr facet query ignores repeated words in a document, e.g.

Use a combination of faceting and stats:

1) Resolve candidate words with faceting, just as you have already done.

2) Create a stats-request with the same q as you used for faceting, with
a termfreq-function for each term in your facet result.


Working example from the techproducts-demo that comes with Solr:

http://localhost:8983/solr/techproducts/select
?q=name%3Addr
&fl=name&wt=json&indent=true
&stats=true
&stats.field={!sum=true func}termfreq('name', 'ddr')
&stats.field={!sum=true func}termfreq('name', '1GB')

where 'name' is the field ('comments' in your setup) and 'ddr' and '1GB'
are two terms ('absorbed', 'am', 'believe' etc. in your setup).


The result will be something like

"response": {
"numFound": 3,
...
"stats": {
"stats_fields": {
  "termfreq('name', 'ddr')": {
"sum": 6
  },
  "termfreq('name', '1GB')": {
"sum": 3
  }
}
  }


- Toke Eskildsen, State and University Library, Denmark


Re: how to find out how many times a word appears in a collection of documents?

2016-05-10 Thread Ahmet Arslan
Hi,

The fl parameter accepts multiple values; please try fl=title,link
ahmet



On Tuesday, May 10, 2016 2:26 PM, "liviuchrist...@yahoo.com.INVALID" 
<liviuchrist...@yahoo.com.INVALID> wrote:
Hi Ahmet, thank you very much. There would be another question: I can't make it
provide results from more than one field:
http://localhost:8983/solr/cuvinte/admin/luke?fl=_text_&?fl=title&?fl=link=100

Is my query syntax wrong? I need to get results from more than one field, for
example the words from the following fields: _text_, title, link.
It gives me a long Luke response (mangled by the mailing-list archive).
 Christian Fotache Tel: 0728.297.207 Fax: 0351.411.570

  From: Ahmet Arslan <iori...@yahoo.com.INVALID>

To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>; 
"liviuchrist...@yahoo.com" <liviuchrist...@yahoo.com> 
Sent: Tuesday, May 10, 2016 1:42 PM
Subject: Re: how to find out how many times a word appears in a collection of 
documents?
  
Hi Christian,

Collection wide term statistics can be accessed via TermsComponent or 
LukeRequestHandler.

Ahmet



On Tuesday, May 10, 2016 1:26 PM, "liviuchrist...@yahoo.com.INVALID" 
<liviuchrist...@yahoo.com.INVALID> wrote:
Hi everyone,
I need to "read" the solr/lucene index and see how many times each word appears
in all documents. For example: I have a collection of 1 mil documents and I
want to see a list like this:
the - 10 times
bread - 1000 times
spoon - 10 times
fork - 5 times
etc.
How do I do that???
Kind regards, Christian


Re: how to find out how many times a word appears in a collection of documents?

2016-05-10 Thread Ahmet Arslan
Hi Christian,

Collection wide term statistics can be accessed via TermsComponent or 
LukeRequestHandler.

Ahmet



On Tuesday, May 10, 2016 1:26 PM, "liviuchrist...@yahoo.com.INVALID" 
 wrote:
Hi everyone,
I need to "read" the solr/lucene index and see how many times each word appears
in all documents. For example: I have a collection of 1 mil documents and I
want to see a list like this:
the - 10 times
bread - 1000 times
spoon - 10 times
fork - 5 times
etc.
How do I do that???
Kind regards, Christian


Re: Facet ignoring repeated word

2016-05-09 Thread Ahmet Arslan
Hi,


I understand the word cloud part.
It looks like you want to use within-result-list term frequency information. In
your first mail, I thought you wanted within-document term frequency.

TermsComponent reports within-collection term frequency.

I am not sure how to retrieve within-resultList term frequency.
Traversing the result list and collecting term vector data seems plausible.
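That traversal amounts to a simple post-processing step. A small sketch (plain Java, independent of Solr; the two sample comments are the ones from the earlier mail):

```java
import java.util.*;

public class WordCloudCounts {
    // Count every occurrence of every word across a result list, so repeated
    // words inside one document are kept (unlike facet counts, which count
    // each term at most once per document).
    static Map<String, Integer> termFrequencies(List<String> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            for (String token : doc.toLowerCase().split("\\W+")) {
                if (!token.isEmpty())
                    counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> comments = Arrays.asList(
            "Projects, technology, features, performance",
            "Too many projects and technology, not enough people to run projects");
        Map<String, Integer> counts = termFrequencies(comments);
        System.out.println(counts.get("projects"));   // 3, where the facet reports 2
        System.out.println(counts.get("technology")); // 2
    }
}
```

In practice the input strings would come from the stored field (or term-vector data) of each document in the result list.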

Ahmet

 



On Monday, May 9, 2016 11:55 AM, "G, Rajesh" <r...@cebglobal.com> wrote:
Hi Ahmet,

Please let me know if I am not clear

Thanks
Rajesh



CEB India Private Limited. Registration No: U741040HR2004PTC035324. Registered 
office: 6th Floor, Tower B, DLF Building No.10 DLF Cyber City, Gurgaon, 
Haryana-122002, India.

This e-mail and/or its attachments are intended only for the use of the 
addressee(s) and may contain confidential and legally privileged information 
belonging to CEB and/or its subsidiaries, including SHL. If you have received 
this e-mail in error, please notify the sender and immediately, destroy all 
copies of this email and its attachments. The publication, copying, in whole or 
in part, or use or dissemination in any other way of this e-mail and 
attachments by anyone other than the intended person(s) is prohibited.

-Original Message-
From: G, Rajesh [mailto:r...@cebglobal.com]
Sent: Friday, May 6, 2016 1:08 PM
To: Ahmet Arslan <iori...@yahoo.com>; solr-user@lucene.apache.org
Subject: RE: Facet ignoring repeated word

Hi Ahmet,



Sorry, it is Word Cloud.




We have comments from a survey. We want to build a word cloud using the field
"comments".

e.g. for question 1 the comments are



Comment 1. Projects, technology, features, performance

Comment 2. Too many projects and technology, not enough people to run 
projects



I want to run a query for question 1 that will produce the below result



projects: 3

technology:2

features:1

performance:1

Too:1

Many:1

Enough:1

People:1

Run:1





Faceting produces the result below but ignores repeated words within a document 
(the count for "projects" is 2 instead of 3).



projects: 2

technology:2

features:1

performance:1

Too:1

Many:1

Enough:1

People:1

Run:1
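The gap between the two listings — faceting counts a term once per document (document frequency), while a word cloud needs total occurrences (term frequency) — can be sketched with a toy illustration (plain Python, not Solr code):

```python
from collections import Counter

comments = [
    "Projects, technology, features, performance",
    "Too many projects and technology, not enough people to run projects",
]
# crude tokenization: split on whitespace, strip punctuation, lowercase
docs = [[w.strip(",.").lower() for w in c.split()] for c in comments]

df = Counter(t for d in docs for t in set(d))  # what faceting reports
tf = Counter(t for d in docs for t in d)       # what a word cloud needs

print(df["projects"], tf["projects"])  # 2 3
```

"projects" appears in both documents (df = 2) but three times in total (tf = 3), which is exactly the discrepancy described above.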



TermVectorComponent produces the result as expected, but the counts are not 
grouped by word; they are grouped by document id.





[TermVectorComponent XML response; the archive stripped the markup, leaving 
only the per-document term-frequency values 1, 1, 2, 2.]

I wanted to know if it is possible to produce a result that is grouped by word 
and that does not ignore repeated words within a document. If that is not 
possible, I will have to write a script that takes the above Solr result, 
groups it by word, and sums the counts.



Thanks

Rajesh












-Original Message-----

From: Ahmet Arslan [mailto:iori...@yahoo.com]

Sent: Friday, May 6, 2016 12:39 PM

To: G, Rajesh <r...@cebglobal.com>; solr-user@lucene.apache.org

Subject: Re: Facet ignoring repeated word



Hi Rajesh,



Can you please explain what do you mean by "tag cloud"?

How it is related to a query?

Please explain your requirements.



Ahmet







On Friday, May 6, 2016 8:44 AM, "G," <r...@cebglobal.com> wrote:

Hi,



Can you please help? If there is a solution then it will be easy; otherwise I 
have to create a Python script that processes the results from 
TermVectorComponent, groups them by word across documents, and finds the word 
counts. The Python script will accept the exported Solr result as input.



Thanks

Rajesh








Re: Filter queries & caching

2016-05-08 Thread Ahmet Arslan
Hi,

As I understand it, filter() is useful in case you use an OR operator between 
two restricting clauses.
Recall that multiple fq parameters mean an implicit AND.

ahmet
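
For illustration, the two styles can be written out as request URLs (the host, collection, and field names below are made up):

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/mycoll/select"  # hypothetical endpoint

# Multiple fq parameters: implicit AND, each filter cached separately
# in the filterCache.
and_url = base + "?" + urlencode(
    [("q", "*:*"), ("fq", "color:red"), ("fq", "inStock:true")]
)

# filter() inside q (Solr 5.4+): allows ORing two cached filters,
# which multiple fq parameters (implicit AND) cannot express.
or_url = base + "?" + urlencode(
    [("q", "filter(color:red) OR filter(color:blue)")]
)

print(and_url)
print(or_url)
```

Both styles hit the filterCache; only the second can combine cached filters with OR.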



On Monday, May 9, 2016 4:02 AM, Jay Potharaju  wrote:
As mentioned above adding filter() will add the filter query to the cache.
This would mean that results are fetched from cache instead of running n
number of filter queries  in parallel.
Is it necessary to use the filter() option? I was under the impression that
all filter queries will get added to the "filtercache". What is the
advantage of using filter()?

From the doc: 
https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig
This cache is used by SolrIndexSearcher for filters (DocSets) for unordered
sets of all documents that match a query. The numeric attributes control
the number of entries in the cache.
Solr uses the filterCache to cache results of queries that use the fq
search parameter. Subsequent queries using the same parameter setting
result in cache hits and rapid returns of results. See Searching for a
detailed discussion of the fq parameter.

From Yonik's site: http://yonik.com/solr/query-syntax/#FilterQuery

(Since Solr 5.4)

A filter query retrieves a set of documents matching a query from the
filter cache. Since scores are not cached, all documents that match the
filter produce the same score (0 by default). Cached filters will be
extremely fast when they are used again in another query.


Thanks


On Fri, May 6, 2016 at 9:46 AM, Jay Potharaju  wrote:

> We have high query load and considering that I think the suggestions made
> above will help with performance.
> Thanks
> Jay
>
> On Fri, May 6, 2016 at 7:26 AM, Shawn Heisey  wrote:
>
>> On 5/6/2016 7:19 AM, Shawn Heisey wrote:
>> > With three separate
>> > fq parameters, you'll get three cache entries in filterCache from the
>> > one query.
>>
>> One more tidbit of information related to this:
>>
>> When you have multiple filters and they aren't cached, I am reasonably
>> certain that they run in parallel.  Instead of one complex filter, you
>> would have three simple filters running simultaneously.  For low to
>> medium query loads on a server with a whole bunch of CPUs, where there
>> is plenty of spare CPU power, this can be a real gain in performance ...
>> but if the query load is really high, it might be a bad thing.
>>
>> Thanks,
>> Shawn
>>
>>
>
>
> --
> Thanks
> Jay Potharaju




-- 
Thanks
Jay Potharaju


Re: Facet ignoring repeated word

2016-05-06 Thread Ahmet Arslan
Hi Rajesh,

Can you please explain what do you mean by "tag cloud"?
How it is related to a query?
Please explain your requirements.

Ahmet



On Friday, May 6, 2016 8:44 AM, "G," <r...@cebglobal.com> wrote:
Hi,

Can you please help? If there is a solution then it will be easy; otherwise I 
have to create a Python script that processes the results from 
TermVectorComponent, groups them by word across documents, and finds the word 
counts. The Python script will accept the exported Solr result as input.

Thanks
Rajesh





-Original Message-
From: G, Rajesh [mailto:r...@cebglobal.com]
Sent: Thursday, May 5, 2016 4:29 PM
To: Ahmet Arslan <iori...@yahoo.com>; solr-user@lucene.apache.org; 
erickerick...@gmail.com
Subject: RE: Facet ignoring repeated word

Hi,

TermVectorComponent works. I am able to find the repeated words within the 
same document, which faceting was not able to do. The problem I see is that 
TermVectorComponent produces results per document, so I have to combine the 
counts myself (e.g., the count of the word "my" is 6 across the list of 
documents). Can you please suggest a solution to group the counts by word 
across documents? Basically, we want to build a word cloud from the Solr result.


[TermVectorComponent XML response; the archive stripped the markup, leaving 
only a document id (1675) and term frequencies 4 and 2.]

http://localhost:8182/solr/dev/tvrh?q=*:*=true=comments=true=comments=1000


Hi Erick,
I need the count of repeated words to build word cloud

Thanks
Rajesh




-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Tuesday, May 3, 2016 6:19 AM
To: solr-user@lucene.apache.org; G, Rajesh <r...@cebglobal.com>
Subject: Re: Facet ignoring repeated word

Hi,

StatsComponent does not respect the query parameter. However you can feed a 
function query (e.g., termfreq) to it.
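
For example, the request below feeds termfreq into StatsComponent, so the reported sum gives the total occurrences of a word across the matched documents (illustrative; the host, collection, and field names are assumptions):

```python
from urllib.parse import urlencode

# stats.field with the {!func} local parameter makes Solr compute stats
# (sum, min, max, ...) of termfreq(comments,'projects') over every
# document matching q — the sum is the total term frequency.
params = urlencode({
    "q": "questionid:123",
    "stats": "true",
    "stats.field": "{!func}termfreq(comments,'projects')",
    "rows": "0",
})
url = "http://localhost:8182/solr/dev/select?" + params
print(url)
```

One such request per candidate word would be needed, so this suits checking a handful of terms rather than building a full word cloud.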

Instead consider using TermVectors or MLT's interesting terms.


https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component
https://cwiki.apache.org/confluence/display/solr/MoreLikeThis

Ahmet


On Monday, May 2, 2016 9:31 AM, "G, Rajesh" <r...@cebglobal.com> wrote:
Hi Erick/ Ahmet,

Thanks for your suggestion. Can TermsComponent take a query? I need the word 
counts of comments for a single question id, not for all documents. When I 
include the query q=questionid:123, I still see counts over all documents:

http://localhost:8182/solr/dev/terms?terms.fl=comments=true=1000=questionid=123

StatsComponent does not support text fields:

Field type 
textcloud_en{class=org.apache.solr.schema.TextField,analyzer=org.apache.solr.analysis.TokenizerChain,args={positionIncrementGap=100,
 class=solr.TextField}} is not currently supported

[the field type's XML definition followed here; the archive stripped the markup]

Thanks
Rajesh




Re: How to get all the docs whose field contain a specialized string?

2016-05-06 Thread Ahmet Arslan
Hi,


It looks like brand_s is defined as string, which is not tokenized.
Please do one of the following to retrieve "brand_s":"ibm hp"
 
a) use a tokenized field type
or
b) issue a wildcard query of q=ibm*

Ahmet
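
The difference can be mimicked in a few lines (a toy model for illustration, not Solr internals): a string field indexes the whole value as one token, so only an exact match hits, while a tokenized text field splits the value into words.

```python
docs = {"2": "ibm", "5": "ibm hp"}

# string field (brand_s): the whole value is one token -> exact match only
string_hits = [i for i, v in docs.items() if v == "ibm"]

# tokenized text field: value split into words -> 'ibm' matches both docs
token_hits = [i for i, v in docs.items() if "ibm" in v.split()]

print(string_hits, token_hits)  # ['2'] ['2', '5']
```

This is why q=brand_s:ibm returns only doc 2, while the same query against a tokenized copy of the field would return docs 2 and 5.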


On Friday, May 6, 2016 8:35 AM, 梦在远方  wrote:



Hi all,

I ran a query from the Solr admin UI, but the response is not what I expected. 
My steps follow.

First step: get all data.
http://127.0.0.1:8080/solr/example/select?q=*%3A*=json=true


response follows:


"response": {
  "numFound": 5,
  "start": 0,
  "docs": [
    { "id": "1", "goods_name_s": "cpu1", "brand_s": "amd",    "_version_": 1533546720443498500 },
    { "id": "2", "goods_name_s": "cpu2", "brand_s": "ibm",    "_version_": 1533546730775117800 },  // there is an 'ibm'
    { "id": "3", "goods_name_s": "cpu3", "brand_s": "intel",  "_version_": 1533546741316452400 },
    { "id": "4", "goods_name_s": "cpu4", "brand_s": "other",  "_version_": 1533546750936088600 },
    { "id": "5", "goods_name_s": "cpu5", "brand_s": "ibm hp", "_version_": 1533548604687384600 }   // there is an 'ibm'
  ]
}

Second step: query the records whose 'brand_s' contains 'ibm'.
http://127.0.0.1:8080/solr/example/select?q=brand_s%3Aibm=json=true


"response": {
  "numFound": 1,
  "start": 0,
  "docs": [
    { "id": "2", "goods_name_s": "cpu2", "brand_s": "ibm", "_version_": 1533546730775117800 }
  ]
}


My question is: why is only one doc found? There are two docs containing 'ibm'.

