Re: update please

2017-05-30 Thread Saman Rasheed
Hi Rick,


Thanks for coming back to me on this. By the way, it's 'Saman', but please call me Sam
like everyone else.


here we go:

~~

I have an English book whose contents I have successfully indexed into a
field called 'content',
with the following properties:





so if I need the number of occurrences of terms matching a regex, e.g. '*olomo*',
and my document contains 'Solomon' twice,
then it should give me 'Solomon' with a term frequency = 2.


I've tried going through the term vector section in the reference and various 
other posts
on the internet, but I still haven't managed to figure out how.


the nearest i found is the following syntax/way:


http://localhost:8983/solr/test/tvrh?q=content:[*%20TO%20*]=true=true=true


which brings my pc to a near halt for about a couple of minutes, and then it 
returns the term
frequency of every term! But I only need the term frequency of terms matching a
particular pattern/regex.


Is there a way to narrow it down to just one regex term, e.g. *thing*, so it
will find the term frequency of 'soothing',
'something' and 'everything' etc., each with their number of occurrences per
document?


thanks,



From: Rick Leir <rl...@leirtech.com>
Sent: 30 May 2017 16:45
To: solr-user@lucene.apache.org
Subject: Re: update please

Salman,
That is a week ago, which is a long while. And my Android does not display the 
archives link in a readable way. Would you mind repeating the question here? Be 
a bit verbose, sometimes it is better that way.
Cheers -- Rick


On May 30, 2017 12:29:34 PM EDT, Saman Rasheed <saman_rash...@hotmail.com> 
wrote:
>hi, can someone kindly update me on the question i raised on Mon, 22
>May, 17:14
>
>
>subject:
>
>
>without termfeq - returning the number of terms/or regex of terms in a
>document<http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201705.mbox/ajax/%3CVI1P190MB0334F019686D8591A6679F448CF80%40VI1P190MB0334.EURP190.PROD.OUTLOOK.COM%3E>
>
>
>thanks,

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com


update please

2017-05-30 Thread Saman Rasheed
hi, can someone kindly update me on the question i raised on Mon, 22 May, 17:14


subject:


without termfeq - returning the number of terms/or regex of terms in a 
document


thanks,


Re: JSON facet performance for aggregations

2017-05-25 Thread Saman Rasheed
hi yonik,


I like your work on Solr very much, and I'm hoping it can deliver what we are
looking to achieve here... and apologies for the direct approach, but I don't think I
have a choice: I submitted the request below to the mailing list and I still
haven't had a reply... and part of me is wondering whether it's because either I have
missed something very obvious, or maybe my approach to my problem is
using the wrong technology here!


The mailing list is not allowing me to send you a direct link to the issue
unless you want to see my message with a lot of XML

so i'm pasting the contents of my message below:

thanks,

~

I have an English book whose contents I have successfully indexed into a
field called 'content',
with the following properties:





so if I need the number of occurrences of terms matching a regex, e.g. '*olomo*',
and my document contains 'Solomon' twice,
then it should give me 'Solomon' with a term frequency = 2.


I've tried going through the term vector section in the reference and various 
other posts
on the internet, but I still haven't managed to figure out how.


the nearest i found is the following syntax/way:


http://localhost:8983/solr/test/tvrh?q=content:[*%20TO%20*]=true=true=true


which brings my pc to a near halt for about a couple of minutes, and then it 
returns the term
frequency of every term! But I only need the term frequency of terms matching a
particular pattern/regex.


Is there a way to narrow it down to just one regex term, e.g. *thing*, so it
will find soothing,
something, everything, each with their number of occurrences for the document?


thanks,



~





From: Yonik Seeley 
Sent: 24 May 2017 10:45
To: solr-user@lucene.apache.org
Subject: Re: JSON facet performance for aggregations

On Mon, May 8, 2017 at 11:27 AM, Yonik Seeley  wrote:
> I opened https://issues.apache.org/jira/browse/SOLR-10634 to address
> this performance issue.

OK, this has been committed.
A quick test shows about a 30x speedup when faceting on a
string/numeric docvalues field with 100K unique values and doing a
simple aggregation on another numeric field (and when the limit:-1).

-Yonik


without termfeq - returning the number of terms/or regex of terms in a document

2017-05-22 Thread Saman Rasheed
I have an English book whose contents I have successfully indexed into a
field called 'content', with the following properties:





so if I need the number of occurrences of terms matching a regex, e.g. '*olomo*',
and my document contains 'Solomon' twice, then it should give me 'Solomon' with a term frequency = 2.


I've tried going through the term vector section in the reference and various 
other posts on the internet, but I still haven't managed to figure out how.


the nearest i found is the following syntax/way:


http://localhost:8983/solr/test/tvrh?q=content:[*%20TO%20*]=true=true=true


which brings my pc to a near halt for about a couple of minutes, and then it 
returns the term frequency of every term! But I only need the term frequency of
terms matching a particular pattern/regex.


Is there a way to narrow it down to just one regex term, e.g. *thing*, so it
will find soothing, something, everything, each with their number of occurrences
for the document?


thanks,
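
For reference, the counts being asked for can be illustrated outside Solr. The following is only a minimal Python sketch, assuming the raw book text is available as a string; the function name `term_counts`, the sample sentence and the crude tokenizer are all made up for illustration and are not Solr's analysis chain, so real counts from an indexed field may differ.

```python
import re
from collections import Counter

def term_counts(text, pattern):
    """Count each lowercased word in `text` that fully matches `pattern`."""
    regex = re.compile(pattern)
    # Crude tokenizer (an assumption): runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if regex.fullmatch(w))

# Hypothetical sample text, not taken from the indexed book.
sample = "King Solomon listened; Solomon heard something soothing."
print(term_counts(sample, r".*olomo.*"))
print(term_counts(sample, r".*thing"))
```

This shows the desired shape of the answer (one count per matching term, per document), not a way to get it from the index.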



counting_number_of_term_in_a_doc

2017-04-26 Thread Saman Rasheed
Hi, I've been trying to figure out how to return the number of matching
words in a regex term lookup, with no luck.


Basically i have a large text document indexed, next when i do a regex term 
lookup like the following:


http://localhost:8983/solr/core1/terms?terms.fl=content=.*term.*=1


That returns all the words (up to 1000) that are either an exact match, start, 
end or contain the word 'term' successfully, see below:




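[XML response stripped by the archive; the surviving values appear to be status 0 and QTime 6, followed by 16 matching terms, each with a count of 1]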





What I need is the syntax to produce e.g. how many times the word 'min' or
'term' occurs in that document, either as a term by itself or as part of another term.


At the moment it only tells me that it occurs in '1' document which can be 
useful later on.


I've been looking at the cwiki page: 
https://cwiki.apache.org/confluence/display/solr/The+Terms+Component


and other articles on the net with no luck.


Can you please help.


Many thanks.
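
For readers following the thread, a regex lookup against the Terms component can be built roughly as below. This is a hedged Python sketch, not the exact request from the message above (whose parameter names were lost in the archive): `terms.fl`, `terms.regex` and `terms.limit` are Terms component parameters, while the helper name `terms_regex_url`, the limit of 1000 and `wt=json` are assumptions for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def terms_regex_url(base_url, field, regex, limit=1000):
    """Build a Terms component URL listing indexed terms that match `regex`."""
    params = {
        "terms.fl": field,      # field whose indexed terms are listed
        "terms.regex": regex,   # only terms matching this regex are returned
        "terms.limit": limit,   # assumed cap on the number of terms
        "wt": "json",           # assumed response format
    }
    # urlencode percent-escapes the regex so it survives in a URL.
    return f"{base_url}/terms?{urlencode(params)}"

url = terms_regex_url("http://localhost:8983/solr/core1", "content", ".*term.*")
print(url)
```

As the message notes, the number returned next to each term is the number of documents containing it, so this only answers which terms match, not how often each occurs in the document.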



termfreq usage/syntax

2017-04-25 Thread Saman Rasheed
hi Solr team, i'm starting to have fun with solr, and i'm in a big project that 
requires me to index some books and then do certain term lookups on them.


I'm using windows 10 and i've successfully managed to index a book containing 
more than 118,000 words! which is normal i guess.


in the solr admin UI, if i, for example do a look up on a term let's say 
'house', i type in 'fl' field the following:


termfreq(content,house)


and i get the following response:


{ "responseHeader":{ "status":0, "QTime":166, "params":{ "q":"*:*", 
"indent":"on", "fl":"termfreq(content,house)", "wt":"json", 
"_":"1493115416033"}}, "response":{"numFound":1,"start":0,"docs":[ { 
"termfreq(content,house)":200}] }}


and this is what I expect. However, I haven't been successful in doing an
approximate search on the word 'house', i.e. when 'house' is part of, or in the middle
of, a word, e.g. 'rehousing' or 'housing'.


What I'm looking for is syntax similar to: 'termfreq(content,*house*)'

which doesn't work.


i've had a look at the online wiki reference on the section below:


termfreq


Returns the number of times the term appears in the field for that document.


termfreq(text,'memory')



from: https://cwiki.apache.org/confluence/display/solr/Function+Queries


and i've substituted the words 'text' and 'memory' with the ones in my example 
above, still no luck on approximate searches.


not sure what i'm doing wrong here, can you please help.


regards,

Sam.
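
Since termfreq() is documented as taking a single term, one possible workaround is to expand the pattern into concrete terms first and then ask for one termfreq() per term. The Python sketch below is only an illustration under assumptions: the list of terms is hypothetical (it would have to come from somewhere else, such as a Terms component lookup), the `tf_N` aliases, helper names and core URL are made up, and terms containing quotes or backslashes are rejected rather than escaped.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def termfreq_fl(field, terms):
    """Build an `fl` value with one aliased termfreq() call per concrete term."""
    parts = []
    for i, term in enumerate(terms):
        # Keep the sketch simple: refuse terms that would need escaping.
        if "'" in term or "\\" in term:
            raise ValueError(f"term needs escaping: {term!r}")
        parts.append(f"tf_{i}:termfreq({field},'{term}')")
    return ",".join(parts)

def select_url(base_url, field, terms, query="*:*"):
    """Build a /select URL that returns the per-document count of each term."""
    params = {"q": query, "fl": termfreq_fl(field, terms), "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"

# Hypothetical terms that matched '*house*'; the core URL is an assumption.
matches = ["house", "housing", "rehousing"]
print(termfreq_fl("content", matches))
print(select_url("http://localhost:8983/solr/core1", "content", matches))
```

Each alias `tf_N` corresponds to `matches[N]`, so the response can be mapped back to the matching terms; for a very large number of matching terms the URL would become long and a POST request may be a better fit.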