Re: Two instances of solr - the same datadir?

2013-07-04 Thread Roman Chyla
I spent a lot of time over the past day playing with this setup and finally
made it work. Here are a few bits of interest:

- Solr 4.0
- linux, java7, local filesystem
- big index, 1 RW instance + 2 RO instances (sharing the same index)


The lock is acquired while Solr is writing data - if you happen to start
your RO instance at that moment and it uses the 'native' lock, startup will
fail. With the RW instance on the 'native' lock and the 2 RO instances on
the 'single' lock, the RO instances can start, but they eventually get into
trouble too - our index is big, so when core RELOAD is called while indexing
is under way, the RO instances time out.

Core reload with the 'native' lock seems to work fine - if you were lucky
and all instances managed to start - HOWEVER, the core is unresponsive until
it is fully loaded (makes sense), and that is actually terrible: your search
is gone for seconds or minutes.

The best setup is as described in my original post - RO instances MUST NOT
commit anything, nor use reload (because during reload Solr tries to
acquire the lock). Instead, they should just reopen the searcher - I repeat:
make sure that nothing is ever going to write on the RO instance. And
because there is no public API for reopening the searcher, I wrote a simple
handler which just calls:

req.getCore().getSearcher(true, false, null, false);

When called, the RO instances continue to handle requests using the old
searcher while the new one warms in the background; once it is ready, the
new searcher takes over. [To repeat: I am triggering this refresh from the
RW instance, which does
'curl http://foo/solr/myhandler?command=reopenSearcher']
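
For what it's worth, the RW-side trigger can be sketched as follows. This is only a minimal sketch in Python: the handler path and the command=reopenSearcher parameter come from the curl line above, while the host names, the port and the function names are assumptions made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URLs of the read-only instances; adjust to your hosts.
RO_INSTANCES = ["http://ro1:8983/solr", "http://ro2:8983/solr"]


def reopen_url(base_url, handler="myhandler"):
    """Build the URL of the custom handler that reopens the searcher."""
    query = urlencode({"command": "reopenSearcher"})
    return f"{base_url.rstrip('/')}/{handler}?{query}"


def notify_readers(instances=RO_INSTANCES, timeout=10, opener=urlopen):
    """Call the handler on every RO instance; return {url: HTTP status or error text}."""
    results = {}
    for base in instances:
        url = reopen_url(base)
        try:
            with opener(url, timeout=timeout) as response:
                results[url] = response.status
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            results[url] = f"failed: {exc}"
    return results
```

The RW instance would call notify_readers() after its own commit has finished, so the RO instances only ever reopen a searcher on a completed commit point.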


The bad thing: when an RO instance dies (e.g. an OOM error) while the RW
instance is in the middle of writing data, you can't restart the RO instance
(unless you use the 'single' lock or some other lock type).

HTH,

  roman




On Tue, Jul 2, 2013 at 5:35 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 Wouldn't it be better to do a RELOAD?

 http://wiki.apache.org/solr/CoreAdmin#RELOAD

 Michael Della Bitta



 On Tue, Jul 2, 2013 at 5:05 PM, Peter Sturge peter.stu...@gmail.com
 wrote:

  The RO instance commit isn't (or shouldn't be) doing any real writing,
 just
  an empty commit to force new searchers, autowarm/refresh caches etc.
  Admittedly, we do all this on 3.6, so 4.0 could have different behaviour
 in
  this area.
  As long as you don't have autocommit in solrconfig.xml, there wouldn't be
  any commits 'behind the scenes' (we do all our commits via a local solrj
  client so it can be fully managed).
  The only caveat might be NRT/soft commits, but I'm not too familiar with
  this in 4.0.
  In any case, your RO instance must be getting updated somehow, otherwise
  how would it know your write instance made any changes?
  Perhaps your write instance notifies the RO instance externally from
 Solr?
  (a perfectly valid approach, and one that would allow a 'single' lock to
  work without contention)
 
 
 
  On Tue, Jul 2, 2013 at 7:59 PM, Roman Chyla roman.ch...@gmail.com
 wrote:
 
   Interesting, we are running 4.0 - and Solr will refuse to start (or
   reload) the core. But from looking at the code I am not seeing that it is
   doing any writing - but I should dig more...
  
   Are you sure it needs to do writing? Because I am not calling commits,
 in
   fact I have deactivated *all* components that write into index, so
 unless
   there is something deep inside, which automatically calls the commit,
 it
   should never happen.
  
   roman
  
  
   On Tue, Jul 2, 2013 at 2:54 PM, Peter Sturge peter.stu...@gmail.com
   wrote:
  
Hmmm, single lock sounds dangerous. It probably works ok because
 you've
been [un]lucky.
For example, even with a RO instance, you still need to do a commit
 in
order to reload caches/changes from the other instance.
What happens if this commit gets called in the middle of the other
instance's commit? I've not tested this scenario, but it's very
  possible
with a 'single' lock the results are indeterminate.
If the 'single' lock mechanism is making assumptions e.g. no other
   process
will interfere, and then one does, the Lucene index could very well
 get
corrupted.
   
For the error you're seeing using 'native', we use native lockType
 for
   both
write and RO instances, and it works fine - no contention.
Which version of Solr are you using? Perhaps there's been a change in
behaviour?
   
Peter
   
   
On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla roman.ch...@gmail.com
   wrote:
   
 as i discovered, it is not good to use 'native' locktype in this
scenario,
 actually there is a note in the 

[Solr 4.2] deleteInstanceDir is added to CoreAdminHandler but is not supported in Unload CoreAdminRequest

2013-07-04 Thread Lyuba Romanchuk
Hi,

I need to unload a core and delete its instance directory at the same time.
Looking at the Solr 4.2 code, I don't see support for this parameter in
SolrJ.
Is there a fix or an open issue for this?

Best regards,
Lyuba
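
Until SolrJ exposes the parameter, one workaround is to call the CoreAdmin handler over plain HTTP. A minimal sketch in Python that only builds the request URL; the base URL and core name are made-up examples, and deleteInstanceDir is the parameter named in the subject line:

```python
from urllib.parse import urlencode


def unload_core_url(solr_base, core, delete_instance_dir=True):
    """Build a CoreAdmin UNLOAD request URL for the given core."""
    params = {"action": "UNLOAD", "core": core}
    if delete_instance_dir:
        # Parameter named in the subject line of this thread.
        params["deleteInstanceDir"] = "true"
    return f"{solr_base.rstrip('/')}/admin/cores?{urlencode(params)}"


# Hypothetical host and core name, for illustration only.
print(unload_core_url("http://localhost:8983/solr", "core0"))
```

Fetching that URL (with curl or any HTTP client) sends the request straight to CoreAdminHandler, bypassing the SolrJ request class.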


Re: Joins with SolrCloud

2013-07-04 Thread slevytam
Hi Yonik,

Thanks for the reply.  It was very helpful.

This may be a newbie question, but will this work on individual rows of a
query, or do all of the query's results need to be on the same shard?

ex.

if the main query would return 
- user15 (shard 1)
- user16 (shard 2)
- user17 (shard 3)

is it acceptable to have 
doc1 (shard 1)
whatever (shard 2)
yeah (shard 3)

for a join of 
- user15, doc1
- user16, whatever
- user17, yeah

or do all the results of the main query need to reside on the same shard as
all the results of the join?

Hopefully that's an understandable question.

Thanks,

slevytam



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Joins-with-SolrCloud-tp4073199p4075408.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Simple Moving Average of Query Durations

2013-07-04 Thread Alan Woodward
I started some work on https://issues.apache.org/jira/browse/SOLR-4735, which 
may help here.  Have been pulled away onto other things, but I want to get back 
to it soon.

Alan Woodward
www.flax.co.uk


On 3 Jul 2013, at 23:54, Otis Gospodnetic wrote:

 Hi Jan,
 
 http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue -
 SOLR-1792?
 
 Otis
 --
 Performance Monitoring -- http://sematext.com/spm
 Solr & ElasticSearch Support -- http://sematext.com/
 
 
 
 
 On Wed, Jul 3, 2013 at 5:59 PM, Jan Morlock jan.morl...@googlemail.com 
 wrote:
 Hi,
 
 we would like to observe the mean value of the average time per request for
 the last N (e.g. 20) queries (a.k.a. simple moving average) of our Solr
 server using Nagios. Does anybody know if such an observable is already
 implemented.
 
 If not, I think the perfect place for it would be the getStatistics() method
 inside
 solr/core/src/java/org/apache/solr/handler/RequestHandlerBase.java. Would
 you agree?
 
 Thank you very much.
 
 Best regards
 Jan
 
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Simple-Moving-Average-of-Query-Durations-tp4075312.html
 Sent from the Solr - User mailing list archive at Nabble.com.
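
The simple moving average Jan describes (mean of the last N request durations) can be sketched as below. This is an illustrative Python class, not code from RequestHandlerBase or from SOLR-4735; the class and method names are invented for the example.

```python
from collections import deque


class SimpleMovingAverage:
    """Mean of the most recent `window` request durations."""

    def __init__(self, window=20):
        if window < 1:
            raise ValueError("window must be at least 1")
        self._samples = deque(maxlen=window)
        self._total = 0.0

    def add(self, duration_ms):
        """Record one request duration and return the updated average."""
        if len(self._samples) == self._samples.maxlen:
            # The deque is full, so appending will evict the oldest sample.
            self._total -= self._samples[0]
        self._samples.append(duration_ms)
        self._total += duration_ms
        return self.value

    @property
    def value(self):
        """Current average, or 0.0 before any sample has been recorded."""
        if not self._samples:
            return 0.0
        return self._total / len(self._samples)
```

Keeping a running total makes each update O(1), which matters if the value is refreshed on every request; a monitoring check such as Nagios would then only read the current value.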



Re: Advice for performance issues with group.facet=true

2013-07-04 Thread Daniel Bryant
Many thanks for your response Otis - I had feared as much, but it's good 
to have it confirmed.


Best wishes,

Daniel


On 03/07/2013 17:05, Otis Gospodnetic wrote:

Hi,

I think nobody in the community is focused on field
collapsing/grouping, so I suspect there won't be a fix until somebody
gets a strong-enough itch, or a business needs it so much that it
decides it pays to invest in the contribution.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Wed, Jul 3, 2013 at 5:54 AM, Daniel Bryant
daniel.bry...@tai-dev.co.uk wrote:

Hi everyone,

I'm seeing very bad performance when grouping (field collapsing) using
group.facet=true with a large result set.

- I have an index with 2 million documents, and I query with five facet
fields (each with 30+ groups)
- If I set group.facet=false the query can take 2000ms on first run, but no
more than 250ms on subsequent execution
- If I set group.facet=true it takes on average 18000ms on the first run,
and the same time on all subsequent runs (suggesting to me that a cache is
not being used)

I've checked the Solr Jira and several others are experiencing the same
issue:

https://issues.apache.org/jira/browse/SOLR-4763

Could anyone offer any advice or suggestions please? This is becoming a
blocking issue for us, and I'm very curious if this will be fixed in the
near future?

Best wishes,

Daniel

--
*Daniel Bryant  |  Software Development Consultant  | www.tai-dev.co.uk
http://www.tai-dev.co.uk/*
daniel.bry...@tai-dev.co.uk mailto:daniel.bry...@tai-dev.co.uk  |  +44 (0)
7799406399  |  Twitter: @taidevcouk https://twitter.com/taidevcouk




Re: Moving from single Solr instance to Solr Cloud

2013-07-04 Thread Furkan KAMACI
Which version of Solr are you using?

2013/7/4 Ali, Saqib docbook@gmail.com

 We have a single Solr instance with a lot of indexed documents. Now we would
 like to move to a SolrCloud implementation.

 Can we move the existing index to SolrCloud? If so, how? Or do we need to
 reindex our data in SolrCloud?

 Thanks,
 Saqib



Re: PropagateServer Implementation for Solr

2013-07-04 Thread Daniel Collins
OK, take the scenario where the calling app uses SolrJ and creates a
CloudSolrServer to send all its requests to. In that case, yes, I can see
the logic that says CloudSolrServer shouldn't load balance that (it's not
that type of request); it should forward it on to all the servers in the
cloud. What will happen to the responses: do you get N (independent)
responses back, or do you plan to do some kind of aggregation?

I confess we don't use SolrJ (our clients are C++), so we just manually
send the request to all the servers in the cloud (will integrate with ZK
when we work out that interface) so it would be nice if HTTP callers could
do the same (maybe something like distrib=true|false on the LukeRequest
as a shot in the dark, caller can request details from 1 server, or from
the cloud as a whole?)

Is there a way to send the Threads (/admin/threads) and stats requests
(/admin/mbeans)?  We also use them for monitoring (we can't deploy the
web-based monitoring tools for various internal reasons which I won't bore
you with!), but I can't see a request in SolrJ that would map to them?





On 3 July 2013 22:08, Furkan KAMACI furkankam...@gmail.com wrote:

 Hi;

 I've written an e-mail at dev list and I want to share same e-mail here.
 I've opened two issues at Jira and I want to get feedback of community.

 First issue is: https://issues.apache.org/jira/browse/SOLR-4995
 Currently Solr servers are interacting with only one Solr node. I think
 that there should be an implementation that propagates requests into
 multiple Solr nodes. For example when Solr is used as SolrCloud sending a
 LukeRequest should be made to one node at each shard. First patch will be
 related to implementing a PropagateServer for Solr.

 Second issue is related to first one:
 https://issues.apache.org/jira/browse/SOLR-4996
 Let's assume that you are using Solr as SolrCloud and you have more than
 one shard. Let's assume that there are 20 docs at shard_1 and 15 docs at
 shard_2. When using CloudSolrServer if you make a LukeRequest it uses
 LBHttpSolrServer internally and it sends request to just one Solr Node (via
 HttpSolrServer) as round robin. So you may get 20 docs as a result at first
 request and if you send same request you may get 15 docs as a result too.
 Using a PropagateServer inside CloudSolrServer will fix that bug.

 I've made initial patches for them and I will change/add code to them after
 getting feedback from the community (i.e. the first patch does not make
 multi-threaded requests in PropagateServer; I just want to get feedback from
 the community first, and after that I will add other features).

 Thanks;
 Furkan KAMACI



Surprising score?

2013-07-04 Thread Lochschmied, Alexander
Hi Solr people!

Querying for series:RCWP returns me the response below. Why does "RCWP
Moisture Resistant" score worse than "D/CRCW-P e3" with the field definition
below? OK, we are ignoring dashes and spaces, but I would have expected that
matches towards the beginning score better. Can I change this behavior (in Solr
4)?

--
<result>
  <doc>
    <str name="series">RCWP</str>
    <float name="score">3.2698402</float>
  </doc>
  <doc>
    <str name="series">D/CRCW-P e3</str>
    <float name="score">1.3624334</float>
  </doc>
  <doc>
    <str name="series">RCWP Moisture Resistant</str>
    <float name="score">0.5449734</float>
  </doc>
</result>
--

<fieldType name="series" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[\-\s]+" replacement=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2"
            maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[\-\s]+" replacement=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Thanks,
Alexander


Re: Surprising score?

2013-07-04 Thread Jeroen Steggink

Hi Alexander,

This is because you have length normalization enabled for that field.
http://ir.dcs.gla.ac.uk/wiki/Length_Normalisation

If you want it disabled set the following:

<fieldType name="series" class="solr.TextField" positionIncrementGap="100"
omitNorms="true">


 Jeroen
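
To see why length normalization has such a large effect with this field type, here is a rough back-of-the-envelope sketch in Python. It assumes the charFilter replacement is the empty string, that every n-gram counts as one term toward the field length, and it uses the classic 1/sqrt(numTerms) length norm without Lucene's lossy one-byte encoding, so the numbers are approximate. The function names are invented for the example.

```python
import math
import re


def analyzed_length(value):
    """Mimic the index analyzer: drop dashes/whitespace, then lowercase."""
    # Assumption: the charFilter replacement is the empty string.
    return len(re.sub(r"[\-\s]+", "", value).lower())


def ngram_count(length, min_gram=2, max_gram=50):
    """Number of n-grams that minGramSize/maxGramSize yield for one token."""
    return sum(length - n + 1 for n in range(min_gram, min(max_gram, length) + 1))


def length_norm(num_terms):
    """Classic TF-IDF length norm, before Lucene's lossy one-byte encoding."""
    return 1.0 / math.sqrt(num_terms)


for series in ["RCWP", "D/CRCW-P e3", "RCWP Moisture Resistant"]:
    terms = ngram_count(analyzed_length(series))
    print(f"{series!r}: {terms} terms, norm ~ {length_norm(terms):.3f}")
```

Under these assumptions the three values index as 6, 36 and 210 n-gram terms respectively, so the longer the series name, the smaller its norm. That ordering (and roughly the ratios) matches the scores in the response above.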

On 4-7-2013 11:10, Lochschmied, Alexander wrote:

Hi Solr people!

Querying for series:RCWP returns me the response below. Why does "RCWP Moisture
Resistant" score worse than "D/CRCW-P e3" with the field definition below? OK, we are ignoring
dashes and spaces, but I would have expected that matches towards the beginning score better. Can I change
this behavior (in Solr 4)?

--
<result>
  <doc>
    <str name="series">RCWP</str>
    <float name="score">3.2698402</float>
  </doc>
  <doc>
    <str name="series">D/CRCW-P e3</str>
    <float name="score">1.3624334</float>
  </doc>
  <doc>
    <str name="series">RCWP Moisture Resistant</str>
    <float name="score">0.5449734</float>
  </doc>
</result>
--

<fieldType name="series" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[\-\s]+" replacement=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2"
            maxGramSize="50"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[\-\s]+" replacement=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Thanks,
Alexander





Re: PropagateServer Implementation for Solr

2013-07-04 Thread Furkan KAMACI
Here is an example how I use PropagateServer inside CloudSolrServer:

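import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.SimpleOrderedMap;

@SuppressWarnings("unchecked")
public static List<CloudStatistics> customListStatistics(CloudSolrServer solrServer) {
  NamedList<Object> namedList = new SimpleOrderedMap<Object>();
  try {
    // With the patch applied, the per-node responses are expected under "all".
    namedList = solrServer.request(new LukeRequest());
  } catch (SolrServerException e) {
    e.printStackTrace();
  } catch (IOException e) {
    e.printStackTrace();
  }
  List<NamedList<Object>> all = (List<NamedList<Object>>) namedList.get("all");
  List<CloudStatistics> cloudStatisticsList = new ArrayList<CloudStatistics>();
  if (all == null) {
    // The request failed or returned no per-node section.
    return cloudStatisticsList;
  }

  for (NamedList<Object> namedListSlice : all) {
    cloudStatisticsList.add(new CloudStatistics((NamedList<Object>) namedListSlice.get("index")));
  }
  return cloudStatisticsList;
}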

PS: CloudStatistics is a class implemented by me and holds statistics
metrics.
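
To make the difference concrete, here is a small simulation in Python of the behaviour described in SOLR-4996. It involves no Solr calls: the shard names and document counts are the ones from the issue description, and summing the counts is only one possible aggregation (the snippet above returns a list of per-node statistics instead).

```python
from itertools import cycle

# Document counts per shard, taken from the example in the SOLR-4996 description.
SHARD_DOCS = {"shard_1": 20, "shard_2": 15}


def round_robin_counts(shard_docs, requests):
    """Each request is answered by a single node, chosen in turn."""
    nodes = cycle(shard_docs)
    return [shard_docs[next(nodes)] for _ in range(requests)]


def propagated_count(shard_docs):
    """The request goes to one node per shard and the counts are combined."""
    return sum(shard_docs.values())


print(round_robin_counts(SHARD_DOCS, 4))  # [20, 15, 20, 15]
print(propagated_count(SHARD_DOCS))       # 35
```

With round robin the same request alternates between 20 and 15, whereas a propagated request sees all 35 documents every time.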


2013/7/4 Daniel Collins danwcoll...@gmail.com

 Ok, in the scenario where the calling app uses SolrJ and creates a
 CloudSolrServer to send all its requests in.  In that case, yes I can see
 the logic that says CloudSolrServer shouldn't load balance that (its not
 that type of request), it should forward it on to all the servers in the
 cloud.  What will happen to the responses, do you get N (independent)
 responses back or do you plan to do some kind of aggregation?

 I confess we don't use SolrJ (our clients are C++), so we just manually
 send the request to all the servers in the cloud (will integrate with ZK
 when we work out that interface) so it would be nice if HTTP callers could
 do the same (maybe something like distrib=true|false on the LukeRequest
 as a shot in the dark, caller can request details from 1 server, or from
 the cloud as a whole?)

 Is there a way to send the Threads (/admin/threads) and stats requests
 (/admin/mbeans)?  We also use them for monitoring (we can't deploy the
 web-based monitoring tools for various internal reasons which I won't bore
 you with!), but I can't see a request in SolrJ that would map to them?





 On 3 July 2013 22:08, Furkan KAMACI furkankam...@gmail.com wrote:

  Hi;
 
  I've written an e-mail at dev list and I want to share same e-mail here.
  I've opened two issues at Jira and I want to get feedback of community.
 
  First issue is: https://issues.apache.org/jira/browse/SOLR-4995
  Currently Solr servers are interacting with only one Solr node. I think
  that there should be an implementation that propagates requests into
  multiple Solr nodes. For example when Solr is used as SolrCloud sending a
  LukeRequest should be made to one node at each shard. First patch will be
  related to implementing a PropagateServer for Solr.
 
  Second issue is related to first one:
  https://issues.apache.org/jira/browse/SOLR-4996
  Let's assume that you are using Solr as SolrCloud and you have more than
  one shard. Let's assume that there are 20 docs at shard_1 and 15 docs at
  shard_2. When using CloudSolrServer if you make a LukeRequest it uses
  LBHttpSolrServer internally and it sends request to just one Solr Node
 (via
  HttpSolrServer) as round robin. So you may get 20 docs as a result at
 first
  request and if you send same request you may get 15 docs as a result too.
  Using a PropagateServer inside CloudSolrServer will fix that bug.
 
  I've made initial patchs for them and I will change/add code to them
 after
  getting feedback from community (i.e. first patch does not make multi
  threaded requests at PropagateServer, I just want to get feedbacks of
  community after that I will add other features)
 
  Thanks;
  Furkan KAMACI
 



SOLR 4.0 frequent admin problem

2013-07-04 Thread David Quarterman
Hi,

About once a week the admin system comes up with SolrCore Initialization 
Failures. There's nothing in the logs and SOLR continues to work in the 
application it's supporting and in the 'direct access' mode (i.e. 
http://123.465.789.100:8080/solr/collection1/select?q=bingo:*).

The cure is to restart Jetty (8.1.7) and then we can use the admin system again 
via pc's. However, a colleague can get into admin on an iPad with no trouble 
when no browser on a pc can!

Anyone any ideas? It's really frustrating!

Best regards,

DQ



ClassNotFoundException regarding SolrInfoMBean under Tomcat 7

2013-07-04 Thread Michael Bakonyi
Hi everyone,

I'm trying to get the CMS TYPO3 connected with Solr 3.6.2.

By now I followed the installation at http://wiki.apache.org/solr/SolrTomcat 
except that I didn't copy the .war-file into the $SOLR_HOME but referencing to 
it at a different location via Tomcat Context fragment file.

Until then the Solr-Server works – I can reach the GUI via URL.

To get Solr connected with the CMS I then created a new core-folder (btw. can 
anybody give me kind of a live example, when to use different cores? Until now 
I still don't really understand the concept of cores ..) by duplicating the 
example-folder in which I overwrote some files (especially solrconfig.xml) with 
files offered by the TYPO3-community. I also moved the file solr.xml one 
level up and edited it (added core-fragment and especially adjusted 
instanceDir)  to get a correct multicore-setup like in the example 
multicore-setup within the downloaded solr-tgz-package.

But now I get the Java-exception 

java.lang.NoClassDefFoundError: org/apache/solr/core/SolrInfoMBean at 
java.lang.ClassLoader.defineClass1(Native Method)

In the Tomcat-log file it is said additionally: Caused by: 
java.lang.ClassNotFoundException: org.apache.solr.core.SolrInfoMBean.

My guess is that the new solrconfig.xml calls classes which aren't included
correctly. Some libs are included at the top of this file, but the paths of
the references should be OK, as I checked them via Bash: at
http://wiki.apache.org/solr/SolrConfigXml it is said that the <lib dir="">
directory is relative to the instanceDir, so this is what I've checked. I
also inserted absolute paths, but this wasn't successful either.

Can anybody give me a hint how to solve this problem? Would be great :)

Cheers,
Michael

Re: Surprising score?

2013-07-04 Thread Upayavira
And be sure to re-index your content.

Upayavira

On Thu, Jul 4, 2013, at 11:28 AM, Jeroen Steggink wrote:
 Hi Alexander,
 
 This is because you have length normalization enabled for that field.
 http://ir.dcs.gla.ac.uk/wiki/Length_Normalisation
 
 If you want it disabled set the following:
 
 <fieldType name="series" class="solr.TextField"
 positionIncrementGap="100" omitNorms="true">
 
 
   Jeroen
 
 On 4-7-2013 11:10, Lochschmied, Alexander wrote:
  Hi Solr people!
 
  Querying for series:RCWP returns me the response below. Why does "RCWP
  Moisture Resistant" score worse than "D/CRCW-P e3" with the field
  definition below? OK, we are ignoring dashes and spaces, but I would have
  expected that matches towards the beginning score better. Can I change this
  behavior (in Solr 4)?
 
  --
  <result>
    <doc>
      <str name="series">RCWP</str>
      <float name="score">3.2698402</float>
    </doc>
    <doc>
      <str name="series">D/CRCW-P e3</str>
      <float name="score">1.3624334</float>
    </doc>
    <doc>
      <str name="series">RCWP Moisture Resistant</str>
      <float name="score">0.5449734</float>
    </doc>
  </result>
  --
 
  <fieldType name="series" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <charFilter class="solr.PatternReplaceCharFilterFactory"
                  pattern="[\-\s]+" replacement=""/>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.NGramFilterFactory" minGramSize="2"
              maxGramSize="50"/>
    </analyzer>
    <analyzer type="query">
      <charFilter class="solr.PatternReplaceCharFilterFactory"
                  pattern="[\-\s]+" replacement=""/>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
 
  Thanks,
  Alexander
 
 


Find related words

2013-07-04 Thread Dotan Cohen
How might one find the top related words for a given word in a Solr index?

For instance, given the following single-field documents:
1: I love chocolate
2: I love Solr
3: I eat chocolate cake
4: You will eat chocolate candy

Thus, given the word "chocolate", Solr might find these top words:
I (2 times matched)
eat (2 times matched)
love, cake, you, will, candy (1 time each)

Thanks!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com
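
Outside of Solr, the counting described above can be sketched in a few lines of Python. This is a client-side illustration only, with whitespace tokenization and lowercasing as simplifying assumptions and a function name invented for the example; it is not a Solr feature.

```python
from collections import Counter

# The four single-field documents from the question.
DOCS = [
    "I love chocolate",
    "I love Solr",
    "I eat chocolate cake",
    "You will eat chocolate candy",
]


def related_words(docs, word):
    """Count, per term, how many documents contain both the term and `word`."""
    target = word.lower()
    counts = Counter()
    for doc in docs:
        terms = set(doc.lower().split())
        if target in terms:
            counts.update(terms - {target})
    return counts


for term, hits in related_words(DOCS, "Chocolate").most_common():
    print(term, hits)
```

Counted this way, "i" co-occurs with "chocolate" in two documents (1 and 3) and "eat" in two (3 and 4); every other term co-occurs once, and "solr" not at all.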


Re: Moving from single Solr instance to Solr Cloud

2013-07-04 Thread Ali, Saqib
Hello Furkan,

We are using Solr 4.3

Thanks


On Thu, Jul 4, 2013 at 1:43 AM, Furkan KAMACI furkankam...@gmail.com wrote:

 Which version of Solr you are using?

 2013/7/4 Ali, Saqib docbook@gmail.com

  We have single Solr instance with lot of indexed document. Now we would
  like to move to SolrCloud implementation.
 
  Can we move the existing index to SolrCloud? If so, how? Or do we need to
  reindex our data in SolrCloud?
 
  Thanks,
  Saqib
 



Re: Joins with SolrCloud

2013-07-04 Thread Yonik Seeley
Yes, joins support distributed search fine,
provided that the individual documents that are joined reside on the same shard.

For example, if you are modeling blogs and posts (one blog object has many posts)

shard1
--
joe!blog_info
joe!post1

shard2
--
mary!blog_info
mary!post1


So now you can search for post bodies and join to the main blog via
{!join from=blog_pointer to=blog_id}post_body:hello

If both mary and joe have a post with "hello", they will both be found
and joined to their main blog info docs with a single distributed
search across the collection.
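
The "joe!" and "mary!" prefixes in the ids above are what keeps each blog and its posts together. A toy Python illustration of that idea follows; the crc32 hash and the modulo step are stand-ins chosen for the sketch, not the hash or range assignment Solr actually uses, and the function name is invented.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard from the part of the id before '!', so related docs co-locate.

    crc32 is only a stand-in hash for illustration; it is not what Solr uses.
    """
    route_key = doc_id.split("!", 1)[0]
    return zlib.crc32(route_key.encode("utf-8")) % num_shards


for doc_id in ["joe!blog_info", "joe!post1", "mary!blog_info", "mary!post1"]:
    print(doc_id, "-> shard", shard_for(doc_id, 2) + 1)
```

Whatever the hash, every id sharing a prefix lands on the same shard, which is the property the join relies on. Two different prefixes may or may not share a shard.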

-Yonik
http://lucidworks.com


On Thu, Jul 4, 2013 at 3:37 AM, slevytam developm...@the10thfloor.com wrote:
 Hi Yonik,

 Thanks for the reply.  It was very helpful.

 This may be a newb question but will this work on a individual rows of a
 query or do all the queries' results need to be on the same shard.

 ex.

 if the main query would return
 - user15 (shard 1)
 - user16 (shard 2)
 - user17 (shard 3)

 is it acceptable to have
 doc1 (shard 1)
 whatever (shard 2)
 yeah (shard 3)

 for a join of
 - user15, doc1
 - user16, whatever
 - user17, yeah

 or do all the results of the main query need to reside on the same shard as
 all the results of join.

 Hopefully that's an understandable question.

 Thanks,

 slevytam





Total Term Frequency per ResultSet in Solr 4.3 ?

2013-07-04 Thread Tony Mullins
Hi ,

I have lots of crawled data indexed in my Solr (4.3.0). Let's say a user
creates a search criterion 'X1' and wants to know the occurrence count of a
specific term in the result set of that 'X1' search.
Then the user creates another search criterion 'X2' and wants to know the
occurrence count of that same term in the result set of the 'X2' search.

At the moment, termfreq(field,term) gives me the term frequency per
document, and totaltermfreq(field,term) gives me the total term frequency
in the entire index, not in the result set of my search.

So what I need is your help with how to get the total number of occurrences
of a term in a query's result set.

If this is my result set

<doc>
  <str name="type">Movies</str>
  <str name="format">dvd</str>
  <str name="product">The Hunger Games</str>
</doc>

<doc>
  <str name="type">Books</str>
  <str name="format">paperback</str>
  <str name="product">The Hunger Book</str>
</doc>

And I am looking for term 'hunger' in product field then I want to get
value = '2' , and if I am searching for term 'games' in product field I
want to get value = '1' .

Thanks,
Tony


Solr Phonetic Search returning documents but not Highlight Information

2013-07-04 Thread snkar
We have a pretty simple Solr Schema:

<fields>
  <field name="DocId" type="long" indexed="true" stored="true" required="true" />
  <field name="DocTitle" type="string" indexed="true" stored="true" required="true" />
  <field name="Content" type="text_general" indexed="false" stored="true" required="true" />

  <field name="ContentSearch" type="text_general" indexed="true" stored="false" multiValued="true"/>
  <field name="ContentSearchStemming" type="text_stem" indexed="true" stored="false" multiValued="true"/>
  <field name="ContentSearchPhonetic" type="text_phonetic" indexed="true" stored="false" multiValued="true"/>
  <field name="ContentSearchSynonym" type="text_synonym" indexed="true" stored="false" multiValued="true"/>
  <field name="_version_" type="long" indexed="true" stored="true"/>
</fields>

<uniqueKey>DocId</uniqueKey>
<copyField source="Content" dest="ContentSearch"/>
<copyField source="Content" dest="ContentSearchStemming"/>
<copyField source="Content" dest="ContentSearchPhonetic"/>
<copyField source="Content" dest="ContentSearchSynonym"/>

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_stem" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_phonetic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="Soundex" inject="false"/>
  </analyzer>
</fieldType>

<fieldType name="text_synonym" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>

We are indexing documents in Solr using Solrnet and have a requirement to
support Phonetic Search based on the Soundex algorithm. Once we have indexed
documents, we can search in the Solr Admin Panel using a Phonetic query and
the relevant document is returned in the Search Results but the highlight
collection is blank.

E.g., use case:
--
We index a text document which contains the word "electromagnetic" (Soundex
code: E423).
We execute a search in the Solr Admin Panel using the following query:
ContentSearchPhonetic:electing (Soundex code: E423).
The search shows one document returned, but the highlight collection is
blank.
Solr is definitely using the phonetic Soundex algorithm to locate the
document, as the word "electing" is not present in the document. But somehow
it is not able to return the highlight data.
The same schema and config can successfully return documents along with
highlight data for other approximate searches like synonym, fuzzy or
stemming. Only for phonetic search are we not getting the highlight data.
The screenshot from the Solr Admin Panel is shown below:
http://lucene.472066.n3.nabble.com/file/n4075492/HighlightIssue.png
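
For readers who want to check the two codes quoted above, here is a textbook American Soundex written out in Python. It is an independent sketch for illustration, not the encoder class that PhoneticFilterFactory uses internally, although for these two words it gives the same E423.

```python
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(word):
    """Return the four-character American Soundex code for `word`."""
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        if ch not in "HW":
            # H and W do not separate equal codes; vowels do.
            previous = digit
    return (code + "000")[:4]


print(soundex("electromagnetic"), soundex("electing"))
```

Because only the first letter and the next three consonant codes survive, both words collapse to the same indexed token, which is why the query matches a document that never contains "electing".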



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Phonetic-Search-returning-documents-but-not-Highlight-Information-tp4075492.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Total Term Frequency per ResultSet in Solr 4.3 ?

2013-07-04 Thread Jack Krupansky
Sorry, but there is no such feature in Solr at this time - you would have to 
do it manually, either by retrieving all of the results or by writing a 
custom value source (function) that does the desired calculation within 
Solr.


Feel free to file a Jira for suggesting such a new feature/improvement.
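
The "do it manually" route can be sketched client-side as below. This Python example assumes the field is stored and that all matching documents have already been fetched (paging through the full result set is left out); it uses a simple lowercase word split rather than the field's real analyzer, and the function name is invented.

```python
import re

# The two documents from the example result set in the question.
RESULTS = [
    {"type": "Movies", "format": "dvd", "product": "The Hunger Games"},
    {"type": "Books", "format": "paperback", "product": "The Hunger Book"},
]


def total_term_freq(docs, field, term):
    """Sum the occurrences of `term` in `field` over the given documents."""
    wanted = term.lower()
    total = 0
    for doc in docs:
        tokens = re.findall(r"\w+", str(doc.get(field, "")).lower())
        total += tokens.count(wanted)
    return total


print(total_term_freq(RESULTS, "product", "hunger"))  # 2
print(total_term_freq(RESULTS, "product", "games"))   # 1
```

For large result sets this means pulling every matching document over the wire, which is the cost a custom value source inside Solr would avoid.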

-- Jack Krupansky

-Original Message- 
From: Tony Mullins

Sent: Thursday, July 04, 2013 9:45 AM
To: solr-user@lucene.apache.org
Subject: Total Term Frequency per ResultSet in Solr 4.3 ?

Hi ,

I have lots of crawled data indexed in my Solr (4.3.0). Let's say a user
creates a search criterion 'X1' and wants to know the number of occurrences
of a specific term in the result set of that 'X1' search.
Then he/she creates another search criterion 'X2' and wants
to know the occurrences of that same term in the result set of that 'X2'
search.

At the moment, termfreq(field,term) gives me the term
frequency per document, and totaltermfreq(field,term) gives me
the total term frequency in the entire index, not in the result set of my
search.

So what I need is your help finding out how to get the total occurrences of a
term in a query's result set.

If this is my result set

<doc>
   <str name="type">Movies</str>
   <str name="format">dvd</str>
   <str name="product">The Hunger Games</str></doc>

<doc>
   <str name="type">Books</str>
   <str name="format">paperback</str>
   <str name="product">The Hunger Book</str></doc>

If I am looking for the term 'hunger' in the product field, I want to get
value = '2', and if I am searching for the term 'games' in the product field, I
want to get value = '1'.

Thanks,
Tony 



Re: SOLR 4.0 frequent admin problem

2013-07-04 Thread Roman Chyla
Yes :-)  see SOLR-118, seems an old issue...
On 4 Jul 2013 06:43, David Quarterman da...@corexe.com wrote:

 Hi,

 About once a week the admin system comes up with SolrCore Initialization
 Failures. There's nothing in the logs and SOLR continues to work in the
 application it's supporting and in the 'direct access' mode (i.e.
 http://123.465.789.100:8080/solr/collection1/select?q=bingo:*).

 The cure is to restart Jetty (8.1.7) and then we can use the admin system
 again via pc's. However, a colleague can get into admin on an iPad with no
 trouble when no browser on a pc can!

 Anyone any ideas? It's really frustrating!

 Best regards,

 DQ




Re: Find related words

2013-07-04 Thread Jack Krupansky
You can take a look at the MoreLikeThis/Find Similar feature. That gives you 
an approximation, but using documents rather than discrete terms. You would 
have to write a custom component of your own based on logic from MLT.


-- Jack Krupansky

-Original Message- 
From: Dotan Cohen

Sent: Thursday, July 04, 2013 8:09 AM
To: solr-user@lucene.apache.org
Subject: Find related words

How might one find the top related words for a given word in a Solr index?

For instance, given the following single-field documents:
1: I love chocolate
2: I love Solr
3: I eat chocolate cake
4: You will eat chocolate candy

Thus, given the word "chocolate", Solr might find these top words:
I (3 times matched)
eat (2 times matched)
love, cake, you, will, candy (1 time each)
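
The kind of count being asked for can be sketched outside Solr as a plain document-level co-occurrence tally. This is only an illustration of the idea, not a Solr feature: the function name `related_words` and the lowercase/whitespace tokenization are assumptions, and each word is counted at most once per document.

```python
from collections import Counter


def related_words(docs, word):
    """Count, per co-occurring word, the documents that also contain `word`."""
    word = word.lower()
    counts = Counter()
    for text in docs:
        tokens = set(text.lower().split())  # each word counted once per document
        if word in tokens:
            counts.update(tokens - {word})
    return counts


docs = [
    "I love chocolate",
    "I love Solr",
    "I eat chocolate cake",
    "You will eat chocolate candy",
]

for w, n in related_words(docs, "chocolate").most_common():
    print(w, n)
```

Counted this way, "i" and "eat" each co-occur with "chocolate" in two of the four documents, and the remaining words in one each.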

Thanks!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com 



Re: Total Term Frequency per ResultSet in Solr 4.3 ?

2013-07-04 Thread Yonik Seeley
If you just want to retrieve those counts, this seems like simple faceting.

q=something
facet=true
facet.query=product:hunger
facet.query=product:games
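
For reference, a minimal sketch of assembling those parameters into a request URL. The host, port, and core name are placeholders, not taken from this thread; the point is that facet.query may be repeated, so the parameters are kept as a list of pairs rather than a dict.

```python
from urllib.parse import urlencode

# Hypothetical host, port, and core name; adjust for a real installation.
base_url = "http://localhost:8983/solr/collection1/select"

# A list of pairs (not a dict) so that facet.query can repeat.
params = [
    ("q", "something"),
    ("facet", "true"),
    ("facet.query", "product:hunger"),
    ("facet.query", "product:games"),
]

url = base_url + "?" + urlencode(params)
print(url)
```

Each facet.query comes back as a count of matching documents within the result set of q.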

-Yonik
http://lucidworks.com

On Thu, Jul 4, 2013 at 9:45 AM, Tony Mullins tonymullins...@gmail.com wrote:
 Hi ,

 I have lots of crawled data, indexed in my Solr (4.3.0) and lets say user
 creates a search criteria 'X1' and he/she wants to know the occurrence of a
 specific term in the result set of that 'X1' search criteria.
 And then again he/she creates another search criteria 'X2' and he/she wants
 to know the occurrence of that same term in the result set of that 'X2'
 search criteria.

 At the moment if I give termfreq(field,term) then it gives me the term
 frequency per document and if I use totaltermfreq(field,term), it gives me
 the total term frequency in entire index not in the result set of my search
 criteria.

 So what I need is your help to find how to how to get total occurrence of a
 term in query's result set.

 If this is my result set

 doc
 str name=typeMovies/str
 str name=formatdvd/str
 str name=productThe Hunger Games/str/doc

   doc
 str name=typeBooks/str
 str name=formatpaperback/str
 str name=productThe Hunger Book/str/doc

 And I am looking for term 'hunger' in product field then I want to get
 value = '2' , and if I am searching for term 'games' in product field I
 want to get value = '1' .

 Thanks,
 Tony


Re: Find related words

2013-07-04 Thread Koji Sekiguchi

You may want collocations of a given word? I've implemented LUCENE-474 for Solr
a while ago and I found it worked pretty well.

https://issues.apache.org/jira/browse/LUCENE-474

Hope this helps.

koji
--
http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html

(13/07/04 21:09), Dotan Cohen wrote:

How might one find the top related words for a given word in a Solr index?

For instance, given the following single-field documents:
1: I love chocolate
2: I love Solr
3: I eat chocolate cake
4: You will eat chocolate candy

Thus, given the word "chocolate", Solr might find these top words:
I (3 times matched)
eat (2 times matched)
love, cake, you, will, candy (1 time each)

Thanks!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com







Auto Soft commit not working !!!

2013-07-04 Thread Rohit Kumar
My solr config has :

 <autoCommit>
   <maxTime>15000</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

 <!-- softAutoCommit is like autoCommit except it causes a
      'soft' commit which only ensures that changes are visible
      but does not ensure that data is synced to disk.  This is
      faster and more near-realtime friendly than a hard commit.
   -->
 <autoSoftCommit>
   <maxTime>1000</maxTime>
 </autoSoftCommit>


The machine is Ubuntu 13 / 4 cores / 16 GB RAM, with 6 GB given to Solr running
on Tomcat.


Still, when I add documents to Solr and search, it returns 0
hits. It takes a long time before a document actually starts showing up.

Can somebody help?

Thanks


Re: Find related words

2013-07-04 Thread Dotan Cohen
Thank you Jack and Koji. I will take a look at MLT and also at the
.zip files from LUCENE-474. Koji, did you have to modify the code for
the latest Solr?

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Total Term Frequency per ResultSet in Solr 4.3 ?

2013-07-04 Thread Tony Mullins
Hi Yonik,

With facet it didn't work.

Please see the result set doc below

http://localhost:8080/solr/collection2/select?fl=*,amazing_freq:termfreq%28product,%27amazing%27%29,spider_freq:termfreq%28product,%27spider%27%29&fq=id%3A27&q=spider&fl=*&df=product&wt=xml&indent=true&facet=true&facet.query=product:spider&facet.query=product:amazing&rows=20

<doc>
  <str name="id">27</str>
  <str name="type">Movies</str>
  <str name="format">dvd</str>
  <str name="product">The amazing spider man is amazing spider the
spider</str>
  <int name="popularity">1</int>
  <long name="_version_">1439641369145507840</long>

  <int name="amazing_freq">2</int>
  <int name="spider_freq">3</int>
</doc>
</result><lst name="facet_counts"><lst name="facet_queries">
  <int name="product:spider">1</int>
  <int name="product:amazing">1</int>
</lst>

As you can see, faceting just returns the number of docs found
for those keywords, not the actual frequency.
The actual frequency is returned by the fields 'amazing_freq' and 'spider_freq'!

So is there any workaround to get the total term frequency in the
result set without any modification to Solr source code?
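
One client-side option, sketched under assumptions: since termfreq() already returns a per-document value through fl, those values can be added up after the response arrives. The example assumes the response was requested with wt=json and that rows is large enough to cover the whole result set; the helper name `sum_pseudo_field` and the second document are invented for illustration.

```python
import json


def sum_pseudo_field(response, field):
    """Add up a per-document numeric field over the docs in a Solr response."""
    return sum(doc.get(field, 0) for doc in response["response"]["docs"])


# Hand-written stand-in for a wt=json response; the second document is
# invented purely to show the summation.
raw = """
{"response": {"numFound": 2, "start": 0, "docs": [
  {"id": "27", "amazing_freq": 2, "spider_freq": 3},
  {"id": "28", "amazing_freq": 0, "spider_freq": 1}
]}}
"""
response = json.loads(raw)

print(sum_pseudo_field(response, "spider_freq"))   # 4
print(sum_pseudo_field(response, "amazing_freq"))  # 2
```

Only the documents actually returned are summed, so this does not scale to very large result sets.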


Thanks,
Tony


On Thu, Jul 4, 2013 at 7:05 PM, Yonik Seeley yo...@lucidworks.com wrote:

 If you just want to retrieve those counts, this seems like simple faceting.

 q=something
 facet=true
 facet.query=product:hunger
 facet.query=product:games

 -Yonik
 http://lucidworks.com

 On Thu, Jul 4, 2013 at 9:45 AM, Tony Mullins tonymullins...@gmail.com
 wrote:
  Hi ,
 
  I have lots of crawled data, indexed in my Solr (4.3.0) and lets say user
  creates a search criteria 'X1' and he/she wants to know the occurrence
 of a
  specific term in the result set of that 'X1' search criteria.
  And then again he/she creates another search criteria 'X2' and he/she
 wants
  to know the occurrence of that same term in the result set of that 'X2'
  search criteria.
 
  At the moment if I give termfreq(field,term) then it gives me the term
  frequency per document and if I use totaltermfreq(field,term), it gives
 me
  the total term frequency in entire index not in the result set of my
 search
  criteria.
 
  So what I need is your help to find how to how to get total occurrence
 of a
  term in query's result set.
 
  If this is my result set
 
  doc
  str name=typeMovies/str
  str name=formatdvd/str
  str name=productThe Hunger Games/str/doc
 
doc
  str name=typeBooks/str
  str name=formatpaperback/str
  str name=productThe Hunger Book/str/doc
 
  And I am looking for term 'hunger' in product field then I want to get
  value = '2' , and if I am searching for term 'games' in product field I
  want to get value = '1' .
 
  Thanks,
  Tony



Re: Total Term Frequency per ResultSet in Solr 4.3 ?

2013-07-04 Thread Yonik Seeley
Ah, sorry - I thought you were after docfreq, not termfreq.
-Yonik
http://lucidworks.com

On Thu, Jul 4, 2013 at 10:57 AM, Tony Mullins tonymullins...@gmail.com wrote:
 Hi Yonik,

 With facet it didn't work.

 Please see the result set doc below

 http://localhost:8080/solr/collection2/select?fl=*,amazing_freq:termfreq%28product,%27amazing%27%29,spider_freq:termfreq%28product,%27spider%27%29fq=id%3A27q=spiderfl=*df=productwt=xmlindent=truefacet=truefacet.query=product:spiderfacet.query=product:amazingrows=20

 doc
  str name=id27/str
  str name=typeMovies/str
   str name=formatdvd/str
   str name=productThe amazing spider man is amazing spider the
 spider/str
   int name=popularity1/int
   long name=_version_1439641369145507840/long

   int name=amazing_freq2/int
   int name=spider_freq3/int
   /doc
   /resultlst name=facet_countslst name=facet_queries
   int name=product:spider1/int
int name=product:amazing1/int
 /lst

 As you can see facet is actually just returning the no. of docs found
 against those keywrods not the actual frequency.
 Actual frequency is returned by the field 'amazing_freq'  'spider_freq' !

 So is there any workaround for this to get the total of term-frequency in
 resultset without any modification to Solr source code ?


 Thanks,
 Tony


 On Thu, Jul 4, 2013 at 7:05 PM, Yonik Seeley yo...@lucidworks.com wrote:

 If you just want to retrieve those counts, this seems like simple faceting.

 q=something
 facet=true
 facet.query=product:hunger
 facet.query=product:games

 -Yonik
 http://lucidworks.com

 On Thu, Jul 4, 2013 at 9:45 AM, Tony Mullins tonymullins...@gmail.com
 wrote:
  Hi ,
 
  I have lots of crawled data, indexed in my Solr (4.3.0) and lets say user
  creates a search criteria 'X1' and he/she wants to know the occurrence
 of a
  specific term in the result set of that 'X1' search criteria.
  And then again he/she creates another search criteria 'X2' and he/she
 wants
  to know the occurrence of that same term in the result set of that 'X2'
  search criteria.
 
  At the moment if I give termfreq(field,term) then it gives me the term
  frequency per document and if I use totaltermfreq(field,term), it gives
 me
  the total term frequency in entire index not in the result set of my
 search
  criteria.
 
  So what I need is your help to find how to how to get total occurrence
 of a
  term in query's result set.
 
  If this is my result set
 
  doc
  str name=typeMovies/str
  str name=formatdvd/str
  str name=productThe Hunger Games/str/doc
 
doc
  str name=typeBooks/str
  str name=formatpaperback/str
  str name=productThe Hunger Book/str/doc
 
  And I am looking for term 'hunger' in product field then I want to get
  value = '2' , and if I am searching for term 'games' in product field I
  want to get value = '1' .
 
  Thanks,
  Tony



RE: SOLR 4.0 frequent admin problem

2013-07-04 Thread David Quarterman
Cheers, Roman! It was a default Jetty set up so now added a 'work' directory 
and that's in use now.

-Original Message-
From: Roman Chyla [mailto:roman.ch...@gmail.com] 
Sent: 04 July 2013 15:00
To: solr-user@lucene.apache.org
Subject: Re: SOLR 4.0 frequent admin problem

Yes :-)  see SOLR-118, seems an old issue...
On 4 Jul 2013 06:43, David Quarterman da...@corexe.com wrote:

 Hi,

 About once a week the admin system comes up with SolrCore 
 Initialization Failures. There's nothing in the logs and SOLR 
 continues to work in the application it's supporting and in the 'direct 
 access' mode (i.e.
 http://123.465.789.100:8080/solr/collection1/select?q=bingo:*).

 The cure is to restart Jetty (8.1.7) and then we can use the admin 
 system again via pc's. However, a colleague can get into admin on an 
 iPad with no trouble when no browser on a pc can!

 Anyone any ideas? It's really frustrating!

 Best regards,

 DQ




Re: Total Term Frequency per ResultSet in Solr 4.3 ?

2013-07-04 Thread Tony Mullins
So what is the workaround for this problem ?
Can it be done without changing any source code ?

Thanks,
Tony


On Thu, Jul 4, 2013 at 8:01 PM, Yonik Seeley yo...@lucidworks.com wrote:

 Ah, sorry - I thought you were after docfreq, not termfreq.
 -Yonik
 http://lucidworks.com

 On Thu, Jul 4, 2013 at 10:57 AM, Tony Mullins tonymullins...@gmail.com
 wrote:
  Hi Yonik,
 
  With facet it didn't work.
 
  Please see the result set doc below
 
 
 http://localhost:8080/solr/collection2/select?fl=*,amazing_freq:termfreq%28product,%27amazing%27%29,spider_freq:termfreq%28product,%27spider%27%29fq=id%3A27q=spiderfl=*df=productwt=xmlindent=truefacet=truefacet.query=product:spiderfacet.query=product:amazingrows=20
 
  doc
   str name=id27/str
   str name=typeMovies/str
str name=formatdvd/str
str name=productThe amazing spider man is amazing spider the
  spider/str
int name=popularity1/int
long name=_version_1439641369145507840/long
 
int name=amazing_freq2/int
int name=spider_freq3/int
/doc
/resultlst name=facet_countslst name=facet_queries
int name=product:spider1/int
 int name=product:amazing1/int
  /lst
 
  As you can see facet is actually just returning the no. of docs found
  against those keywrods not the actual frequency.
  Actual frequency is returned by the field 'amazing_freq'  'spider_freq'
 !
 
  So is there any workaround for this to get the total of term-frequency in
  resultset without any modification to Solr source code ?
 
 
  Thanks,
  Tony
 
 
  On Thu, Jul 4, 2013 at 7:05 PM, Yonik Seeley yo...@lucidworks.com
 wrote:
 
  If you just want to retrieve those counts, this seems like simple
 faceting.
 
  q=something
  facet=true
  facet.query=product:hunger
  facet.query=product:games
 
  -Yonik
  http://lucidworks.com
 
  On Thu, Jul 4, 2013 at 9:45 AM, Tony Mullins tonymullins...@gmail.com
  wrote:
   Hi ,
  
   I have lots of crawled data, indexed in my Solr (4.3.0) and lets say
 user
   creates a search criteria 'X1' and he/she wants to know the occurrence
  of a
   specific term in the result set of that 'X1' search criteria.
   And then again he/she creates another search criteria 'X2' and he/she
  wants
   to know the occurrence of that same term in the result set of that
 'X2'
   search criteria.
  
   At the moment if I give termfreq(field,term) then it gives me the term
   frequency per document and if I use totaltermfreq(field,term), it
 gives
  me
   the total term frequency in entire index not in the result set of my
  search
   criteria.
  
   So what I need is your help to find how to how to get total occurrence
  of a
   term in query's result set.
  
   If this is my result set
  
   doc
   str name=typeMovies/str
   str name=formatdvd/str
   str name=productThe Hunger Games/str/doc
  
 doc
   str name=typeBooks/str
   str name=formatpaperback/str
   str name=productThe Hunger Book/str/doc
  
   And I am looking for term 'hunger' in product field then I want to get
   value = '2' , and if I am searching for term 'games' in product field
 I
   want to get value = '1' .
  
   Thanks,
   Tony
 



Re: Total Term Frequency per ResultSet in Solr 4.3 ?

2013-07-04 Thread Jack Krupansky
These statistics are used for determining document relevance or score for the 
query itself. As such, they are one of two things: 1) per field, per 
document, or 2) for the universe of documents in the collection. That's it, one 
of the two.


You keep referring to ResultSet, but there is no such concept in relevancy 
or scoring, at least in the Lucene model for relevancy and scoring.


If you want more details on Lucene/Solr scoring, see:
http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

Feel free to propose an alternative model to relevancy and scoring, but 
don't expect an implementation of such a model in the near-term.


You might also be able to implement your alternative model for relevance and 
scoring using a custom Similarity (scoring) plug-in, coupled with custom 
Value Sources to expose whatever alternative metrics you wish.


But, before you embark on such a venture, be aware that the performance of 
such an alternative relevance model might not be as appealing as you might 
want. You'll have to do a proof of concept to see how well things actually 
work out.


-- Jack Krupansky

-Original Message- 
From: Tony Mullins

Sent: Thursday, July 04, 2013 12:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Total Term Frequency per ResultSet in Solr 4.3 ?

So what is the workaround for this problem ?
Can it be done without changing any source code ?

Thanks,
Tony


On Thu, Jul 4, 2013 at 8:01 PM, Yonik Seeley yo...@lucidworks.com wrote:


Ah, sorry - I thought you were after docfreq, not termfreq.
-Yonik
http://lucidworks.com

On Thu, Jul 4, 2013 at 10:57 AM, Tony Mullins tonymullins...@gmail.com
wrote:
 Hi Yonik,

 With facet it didn't work.

 Please see the result set doc below


http://localhost:8080/solr/collection2/select?fl=*,amazing_freq:termfreq%28product,%27amazing%27%29,spider_freq:termfreq%28product,%27spider%27%29fq=id%3A27q=spiderfl=*df=productwt=xmlindent=truefacet=truefacet.query=product:spiderfacet.query=product:amazingrows=20

 doc
  str name=id27/str
  str name=typeMovies/str
   str name=formatdvd/str
   str name=productThe amazing spider man is amazing spider the
 spider/str
   int name=popularity1/int
   long name=_version_1439641369145507840/long

   int name=amazing_freq2/int
   int name=spider_freq3/int
   /doc
   /resultlst name=facet_countslst name=facet_queries
   int name=product:spider1/int
int name=product:amazing1/int
 /lst

 As you can see facet is actually just returning the no. of docs found
 against those keywrods not the actual frequency.
 Actual frequency is returned by the field 'amazing_freq'  'spider_freq'
!

 So is there any workaround for this to get the total of term-frequency 
 in

 resultset without any modification to Solr source code ?


 Thanks,
 Tony


 On Thu, Jul 4, 2013 at 7:05 PM, Yonik Seeley yo...@lucidworks.com
wrote:

 If you just want to retrieve those counts, this seems like simple
faceting.

 q=something
 facet=true
 facet.query=product:hunger
 facet.query=product:games

 -Yonik
 http://lucidworks.com

 On Thu, Jul 4, 2013 at 9:45 AM, Tony Mullins tonymullins...@gmail.com
 wrote:
  Hi ,
 
  I have lots of crawled data, indexed in my Solr (4.3.0) and lets say
user
  creates a search criteria 'X1' and he/she wants to know the 
  occurrence

 of a
  specific term in the result set of that 'X1' search criteria.
  And then again he/she creates another search criteria 'X2' and he/she
 wants
  to know the occurrence of that same term in the result set of that
'X2'
  search criteria.
 
  At the moment if I give termfreq(field,term) then it gives me the 
  term

  frequency per document and if I use totaltermfreq(field,term), it
gives
 me
  the total term frequency in entire index not in the result set of my
 search
  criteria.
 
  So what I need is your help to find how to how to get total 
  occurrence

 of a
  term in query's result set.
 
  If this is my result set
 
  doc
  str name=typeMovies/str
  str name=formatdvd/str
  str name=productThe Hunger Games/str/doc
 
doc
  str name=typeBooks/str
  str name=formatpaperback/str
  str name=productThe Hunger Book/str/doc
 
  And I am looking for term 'hunger' in product field then I want to 
  get

  value = '2' , and if I am searching for term 'games' in product field
I
  want to get value = '1' .
 
  Thanks,
  Tony






Re: Auto Soft commit not working !!!

2013-07-04 Thread Daniel Collins
You should see the commit messages in the solr logs, do they come up at the
expected frequency?


On 4 July 2013 15:35, Rohit Kumar rohit.kku...@gmail.com wrote:

 My solr config has :

  autoCommit
maxTime15000/maxTime
openSearcherfalse/openSearcher
  /autoCommit

 !-- softAutoCommit is like autoCommit except it causes a
  'soft' commit which only ensures that changes are visible
  but does not ensure that data is synced to disk.  This is
  faster and more near-realtime friendly than a hard commit.
   --
autoSoftCommit
  maxTime1000/maxTime
/autoSoftCommit


 Machine is ubuntu 13 / 4 cores / 16GB RAM. Given 6gb to Solr running
 over tomcat.


 Still when i am adding documents to solr and searching its returning 0
 hits. Its taking long before the document actually starts showing up.

 Can somebody help.

 Thanks



Re: Total Term Frequency per ResultSet in Solr 4.3 ?

2013-07-04 Thread P Williams
Hi Tony,

Have you seen the
TermVectorComponent (http://wiki.apache.org/solr/TermVectorComponent)?
It will return the term vectors for the documents in your result set (note
that the rows parameter matters if you want results for the whole set; the
default is 10).  Term vectors must also be stored for each field that you
want term frequency returned for.  Suppose you have the query
http://localhost:8983/solr/collection1/tvrh?q=cable&fl=includes&tv.tf=true on
the example that comes packaged with Solr.  Then part of the response is:

<lst name="termVectors">
  <str name="uniqueKeyFieldName">id</str>
  <lst name="IW-02">
    <str name="uniqueKey">IW-02</str>
  </lst>
  <lst name="9885A004">
    <str name="uniqueKey">9885A004</str>
    <lst name="includes">
      <lst name="32mb">
        <int name="tf">1</int>
      </lst>
      <lst name="av">
        <int name="tf">1</int>
      </lst>
      <lst name="battery">
        <int name="tf">1</int>
      </lst>
      <lst name="cable">
        <int name="tf">2</int>
      </lst>
      <lst name="card">
        <int name="tf">1</int>
      </lst>
      <lst name="sd">
        <int name="tf">1</int>
      </lst>
      <lst name="usb">
        <int name="tf">1</int>
      </lst>
    </lst>
  </lst>
  <lst name="3007WFP">
    <str name="uniqueKey">3007WFP</str>
    <lst name="includes">
      <lst name="cable">
        <int name="tf">1</int>
      </lst>
      <lst name="usb">
        <int name="tf">1</int>
      </lst>
    </lst>
  </lst>
  <lst name="MA147LL/A">
    <str name="uniqueKey">MA147LL/A</str>
    <lst name="includes">
      <lst name="cable">
        <int name="tf">1</int>
      </lst>
      <lst name="earbud">
        <int name="tf">1</int>
      </lst>
      <lst name="headphones">
        <int name="tf">1</int>
      </lst>
      <lst name="usb">
        <int name="tf">1</int>
      </lst>
    </lst>
  </lst>
</lst>

Then you can use an XPath query like
sum(//lst[@name='cable']/int[@name='tf']) where 'cable' was the term, to
calculate the term frequency in the 'includes' field for the whole result
set.  You could extend this to get the term frequency across all fields for
your result set with some alterations to the query and schema.xml
configuration.  Alternatively, you could get the response as JSON (wt=json)
and use JavaScript to sum. I know this is not terribly efficient but, if
I'm understanding your request correctly, it's possible.
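
The same summation can be sketched in Python with the standard library's ElementTree instead of a full XPath engine. The function name `total_tf` is made up for the example, and the sample is an abridged copy of the term-vector response above (only the "cable" and "usb" entries are kept).

```python
import xml.etree.ElementTree as ET


def total_tf(xml_text, term):
    """Sum the tf values reported for `term` across all documents."""
    root = ET.fromstring(xml_text)
    total = 0
    for lst in root.iter("lst"):
        if lst.get("name") == term:
            # Only direct <int name="tf"> children belong to this term entry.
            for node in lst.findall("int[@name='tf']"):
                total += int(node.text)
    return total


# Abridged from the response shown above.
sample = """
<lst name="termVectors">
  <str name="uniqueKeyFieldName">id</str>
  <lst name="9885A004">
    <str name="uniqueKey">9885A004</str>
    <lst name="includes">
      <lst name="cable"><int name="tf">2</int></lst>
      <lst name="usb"><int name="tf">1</int></lst>
    </lst>
  </lst>
  <lst name="3007WFP">
    <str name="uniqueKey">3007WFP</str>
    <lst name="includes">
      <lst name="cable"><int name="tf">1</int></lst>
    </lst>
  </lst>
  <lst name="MA147LL/A">
    <str name="uniqueKey">MA147LL/A</str>
    <lst name="includes">
      <lst name="cable"><int name="tf">1</int></lst>
    </lst>
  </lst>
</lst>
"""

print(total_tf(sample, "cable"))  # 4
```

Comparing the name attribute in Python rather than inside the path expression avoids quoting problems when a term contains an apostrophe.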

Cheers,
Tricia


On Thu, Jul 4, 2013 at 10:24 AM, Tony Mullins tonymullins...@gmail.com wrote:

 So what is the workaround for this problem ?
 Can it be done without changing any source code ?

 Thanks,
 Tony


 On Thu, Jul 4, 2013 at 8:01 PM, Yonik Seeley yo...@lucidworks.com wrote:

  Ah, sorry - I thought you were after docfreq, not termfreq.
  -Yonik
  http://lucidworks.com
 
  On Thu, Jul 4, 2013 at 10:57 AM, Tony Mullins tonymullins...@gmail.com
  wrote:
   Hi Yonik,
  
   With facet it didn't work.
  
   Please see the result set doc below
  
  
 
 http://localhost:8080/solr/collection2/select?fl=*,amazing_freq:termfreq%28product,%27amazing%27%29,spider_freq:termfreq%28product,%27spider%27%29fq=id%3A27q=spiderfl=*df=productwt=xmlindent=truefacet=truefacet.query=product:spiderfacet.query=product:amazingrows=20
  
   doc
str name=id27/str
str name=typeMovies/str
 str name=formatdvd/str
 str name=productThe amazing spider man is amazing spider the
   spider/str
 int name=popularity1/int
 long name=_version_1439641369145507840/long
  
 int name=amazing_freq2/int
 int name=spider_freq3/int
 /doc
 /resultlst name=facet_countslst name=facet_queries
 int name=product:spider1/int
  int name=product:amazing1/int
   /lst
  
   As you can see facet is actually just returning the no. of docs found
   against those keywrods not the actual frequency.
   Actual frequency is returned by the field 'amazing_freq' 
 'spider_freq'
  !
  
   So is there any workaround for this to get the total of term-frequency
 in
   resultset without any modification to Solr source code ?
  
  
   Thanks,
   Tony
  
  
   On Thu, Jul 4, 2013 at 7:05 PM, Yonik Seeley yo...@lucidworks.com
  wrote:
  
   If you just want to retrieve those counts, this seems like simple
  faceting.
  
   q=something
   facet=true
   facet.query=product:hunger
   facet.query=product:games
  
   -Yonik
   http://lucidworks.com
  
   On Thu, Jul 4, 2013 at 9:45 AM, Tony Mullins 
 tonymullins...@gmail.com
   wrote:
Hi ,
   
I have lots of crawled data, indexed in my Solr (4.3.0) and lets say
  user
creates a search criteria 'X1' and he/she wants to know the
 occurrence
   of a
specific term in the result set of that 'X1' search criteria.
And then again he/she creates another search criteria 'X2' and
 he/she
   wants
to know the occurrence of that same term in the result set of that
  'X2'
search criteria.
   
At the moment if I give termfreq(field,term) then it gives me the
 term
frequency per document and if I use totaltermfreq(field,term), it
  gives
   me
the total term frequency in entire index not in the result set of my
   search
criteria.
   
So what I need is your help to find how to how to get total
 occurrence
   of a
term in query's result set.
   
If this is my result set
   
doc
str name=typeMovies/str
str name=formatdvd/str
str name=productThe Hunger Games/str/doc
   
  doc
str name=typeBooks/str
str name=formatpaperback/str
str name=productThe 

Re: Find related words

2013-07-04 Thread Koji Sekiguchi

Hi Dotan,

(13/07/04 23:51), Dotan Cohen wrote:

Thank you Jack and Koji. I will take a look at MLT and also at the
.zip files from LUCENE-474. Koji, did you have to modify the code for
the latest Solr?


Yes. As the Lucene APIs for accessing index have been changed,
I had to modify the code.

koji
--
http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html


Re: Auto Soft commit not working !!!

2013-07-04 Thread Rohit Kumar
I checked the Tomcat logs. The config tells Solr to commit every
15000 ms:

 <autoCommit>
   <maxTime>15000</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>


Strangely, there are no commit messages in the logs. Did I miss anything?


-

I am having issues with auto soft commit (near real time). I am using Solr 4.0
on Tomcat. The index size is 10.95 GB. With this configuration it takes more
than 60 seconds for a newly indexed document to be returned. When adding
documents to Solr and searching after the soft commit time, it returns 0 hits.
It takes a long time before the document actually starts showing up, even
longer than the autoCommit interval.

 <autoCommit>
   <maxTime>15000</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

 <autoSoftCommit>
   <maxTime>1000</maxTime>
 </autoSoftCommit>

The machine is Ubuntu 13 / 4 cores / 16 GB RAM, with 6 GB given to Solr running
on Tomcat.








On Fri, Jul 5, 2013 at 12:13 AM, Daniel Collins danwcoll...@gmail.com wrote:

 You should see the commit messages in the solr logs, do they come up at the
 expected frequency?


 On 4 July 2013 15:35, Rohit Kumar rohit.kku...@gmail.com wrote:

  My solr config has :
 
   autoCommit
 maxTime15000/maxTime
 openSearcherfalse/openSearcher
   /autoCommit
 
  !-- softAutoCommit is like autoCommit except it causes a
   'soft' commit which only ensures that changes are visible
   but does not ensure that data is synced to disk.  This is
   faster and more near-realtime friendly than a hard commit.
--
 autoSoftCommit
   maxTime1000/maxTime
 /autoSoftCommit
 
 
  Machine is ubuntu 13 / 4 cores / 16GB RAM. Given 6gb to Solr running
  over tomcat.
 
 
  Still when i am adding documents to solr and searching its returning 0
  hits. Its taking long before the document actually starts showing up.
 
  Can somebody help.
 
  Thanks
 



Re: Auto Soft commit not working !!!

2013-07-04 Thread Jack Krupansky

1. Do you have an update processor chain that doesn't have RunUpdate in it?

2. Is the updateLog solrconfig directive missing?

3. Is _version_ missing from your schema?
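
These three checks could be scripted roughly as follows. This is a sketch under assumptions: it works on the text of solrconfig.xml and schema.xml, the function name `check_nrt_prerequisites` and the tiny sample files are invented for illustration, and it only looks at the element names and classes mentioned in the checklist.

```python
import xml.etree.ElementTree as ET


def check_nrt_prerequisites(solrconfig_xml, schema_xml):
    """Return a dict of the three checks; True means the check passes."""
    config = ET.fromstring(solrconfig_xml)
    schema = ET.fromstring(schema_xml)

    # 1. Every custom update chain should include RunUpdateProcessorFactory.
    chains = config.findall(".//updateRequestProcessorChain")
    chains_ok = all(
        any(p.get("class", "").endswith("RunUpdateProcessorFactory")
            for p in chain.findall("processor"))
        for chain in chains
    )

    # 2. <updateLog> must be present (a commented-out element is not parsed).
    update_log_ok = config.find(".//updateHandler/updateLog") is not None

    # 3. The schema needs a _version_ field.
    version_ok = any(f.get("name") == "_version_" for f in schema.iter("field"))

    return {"run_update": chains_ok,
            "update_log": update_log_ok,
            "version_field": version_ok}


# Minimal stand-ins for the real files; updateLog is commented out here.
solrconfig = """
<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- <updateLog><str name="dir">${solr.ulog.dir:}</str></updateLog> -->
    <autoSoftCommit><maxTime>1000</maxTime></autoSoftCommit>
  </updateHandler>
</config>
"""
schema = """
<schema name="example" version="1.5">
  <fields>
    <field name="_version_" type="long" indexed="true" stored="true"/>
  </fields>
</schema>
"""

print(check_nrt_prerequisites(solrconfig, schema))
```

With the sample above, the updateLog check fails while the other two pass.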

-- Jack Krupansky

-Original Message- 
From: Rohit Kumar

Sent: Thursday, July 04, 2013 9:22 PM
To: solr-user@lucene.apache.org
Subject: Re: Auto Soft commit not working !!!

I checked with the tomcat logs. Although the config says it to commit every
15000ms

autoCommit
  maxTime15000/maxTime
  openSearcherfalse/openSearcher
/autoCommit


Strangely there are no commit logs. Did i miss anything?


-

Having issues in Soft Auto commit (Near Real Time). Am using solr 4.0 on
tomcat . The index size is 10.95 GB. With this configuration it takes more
than 60 seconds to return the indexed document. When adding documents to
solr and searching after soft commit time, its returning 0 hits. Its taking
long before the document actually starts showing up, even more than the
autoCommit interval.

autoCommit
  maxTime15000/maxTime
  openSearcherfalse/openSearcher
/autoCommit

  autoSoftCommit
maxTime1000/maxTime
  /autoSoftCommit

Machine is ubuntu 13 / 4 cores / 16GB RAM. Given 6gb to Solr running over
tomcat.








On Fri, Jul 5, 2013 at 12:13 AM, Daniel Collins 
danwcoll...@gmail.comwrote:


You should see the commit messages in the solr logs, do they come up at 
the

expected frequency?


On 4 July 2013 15:35, Rohit Kumar rohit.kku...@gmail.com wrote:

 My solr config has :

  autoCommit
maxTime15000/maxTime
openSearcherfalse/openSearcher
  /autoCommit

 !-- softAutoCommit is like autoCommit except it causes a
  'soft' commit which only ensures that changes are visible
  but does not ensure that data is synced to disk.  This is
  faster and more near-realtime friendly than a hard commit.
   --
autoSoftCommit
  maxTime1000/maxTime
/autoSoftCommit


 Machine is ubuntu 13 / 4 cores / 16GB RAM. Given 6gb to Solr running
 over tomcat.


 Still when i am adding documents to solr and searching its returning 0
 hits. Its taking long before the document actually starts showing up.

 Can somebody help.

 Thanks






Re: Moving from single Solr instance to Solr Cloud

2013-07-04 Thread Otis Gospodnetic
Hello,

In SolrCloud, collections (logical indices) have shards and
replicas, so you would probably want to create a new collection with
some number of shards and replicas and reindex into it.  That would be
the cleanest.
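
As a sketch of the first step, the Collections API CREATE call can be assembled like this. The host, collection name, and the shard and replica counts are placeholders chosen for illustration, not recommendations from this thread.

```python
from urllib.parse import urlencode

params = {
    "action": "CREATE",
    "name": "mycollection",      # hypothetical collection name
    "numShards": 2,              # illustrative value
    "replicationFactor": 2,      # illustrative value
}

# Hypothetical host and port; adjust for a real SolrCloud node.
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(url)
```

After the collection exists, the documents would be reindexed into it from the original source.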

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Wed, Jul 3, 2013 at 9:10 PM, Ali, Saqib docbook@gmail.com wrote:
 We have a single Solr instance with a lot of indexed documents. Now we would
 like to move to a SolrCloud implementation.

 Can we move the existing index to SolrCloud? If so, how? Or do we need to
 reindex our data in SolrCloud?

 Thanks,
 Saqib


Re: Auto Soft commit not working !!!

2013-07-04 Thread Rohit Kumar
1. Do you have an update processor chain that doesn't have RunUpdate in it? - No

2. Is the updateLog solrconfig directive missing? - Bang on. It was
still commented out!

3. Is _version_ missing from your schema? - Checked it, and it's present.

I will test again and update soon.

Thanks



On Fri, Jul 5, 2013 at 8:30 AM, Jack Krupansky j...@basetechnology.com wrote:

 1. Do you have an update processor chain that doesn't have RunUpdate in it?

 2. Is the updateLog solrconfig directive missing?

 3. Is _version_ missing from your schema?

 -- Jack Krupansky

 -Original Message- From: Rohit Kumar
 Sent: Thursday, July 04, 2013 9:22 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Auto Soft commit not working !!!


 I checked with the tomcat logs. Although the config says it to commit every
 15000ms

 <autoCommit>
   <maxTime>15000</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>


 Strangely there are no commit logs. Did i miss anything?


 --**--**
 -

 I am having issues with auto soft commit (near real time). I am using
 Solr 4.0 on Tomcat. The index size is 10.95 GB. With this configuration it
 takes more than 60 seconds for an indexed document to be returned. When I
 add documents to Solr and search after the soft commit time has elapsed,
 it returns 0 hits. It takes a long time before the document actually
 starts showing up, even longer than the autoCommit interval.

 <autoCommit>
   <maxTime>15000</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

 <autoSoftCommit>
   <maxTime>1000</maxTime>
 </autoSoftCommit>

 Machine is ubuntu 13 / 4 cores / 16GB RAM. Given 6gb to Solr running over
 tomcat.








 On Fri, Jul 5, 2013 at 12:13 AM, Daniel Collins danwcoll...@gmail.com
 wrote:

  You should see the commit messages in the Solr logs. Do they come up at
  the expected frequency?


 On 4 July 2013 15:35, Rohit Kumar rohit.kku...@gmail.com wrote:

  My Solr config has:

  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- softAutoCommit is like autoCommit except it causes a
       'soft' commit which only ensures that changes are visible
       but does not ensure that data is synced to disk.  This is
       faster and more near-realtime friendly than a hard commit.
    -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
 
 
  Machine is ubuntu 13 / 4 cores / 16GB RAM. Given 6gb to Solr running
  over tomcat.
 
 
  Still, when I add documents to Solr and search, it returns 0 hits. It
  takes a long time before the document actually starts showing up.

  Can somebody help?
 
  Thanks
 





Early Access Release #2 for Solr 4.x Deep Dive book is now available for download on Lulu.com

2013-07-04 Thread Jack Krupansky
Okay, it’s hot off the e-presses: Solr 4.x Deep Dive, Early Access Release #2 
is now available for purchase and download as an e-book for $9.99 on Lulu.com 
at:

http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-1/ebook/product-21079719.html


(That link says “1”, but it apparently correctly redirects to EAR #2.)

My recent blog posts over the past two weeks detailed the changes from EAR#1. A
lot of them were formatting and indexing changes, but there are also a couple
more scripting update processor examples and a new “Solr Hot Spots” preface
section to point the reader to interesting sections worth checking out, such as
the grammars for the various query parsers, a complete list of functions, and
complete lists of char filters, tokenizers, token filters, and update processors.

See:
http://basetechnology.blogspot.com/

The next EAR will be in approximately two weeks, contents TBD.

If you have purchased EAR#1, there is no need to rush out and pick up EAR#2. I 
mean, the technical content changes were only modest, and EAR#3 will be out in 
another two weeks anyway. That said, EAR#2 is a significant improvement over 
EAR#1.

-- Jack Krupansky

Re: Concurrent Modification Exception

2013-07-04 Thread Dmitry Kan
Can you repeat the test with, for example, Jetty, in case JBoss (?) has some
issues here?

What type of query was this?
On 2 Jul 2013 19:27, adityab aditya_ba...@yahoo.com wrote:

 Anyone, any suggestions or pointers for this issue?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Concurrent-Modification-Exception-tp4074371p4074829.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Total Term Frequency per ResultSet in Solr 4.3 ?

2013-07-04 Thread Tony Mullins
OK.  Thanks Tricia, Jack & Yonik for your suggestions and time.

Regards,
Tony.


On Fri, Jul 5, 2013 at 1:20 AM, P Williams
williams.tricia.l...@gmail.com wrote:

 Hi Tony,

 Have you seen the TermVectorComponent
 (http://wiki.apache.org/solr/TermVectorComponent)? It will return the
 TermVectors for the documents in your result set (note that the rows
 parameter matters if you want results for the whole set; the default is
 10).  TermVectors must also be stored for each field that you want term
 frequency returned for.  Suppose you run the query
 http://localhost:8983/solr/collection1/tvrh?q=cable&fl=includes&tv.tf=true
 on the example that comes packaged with Solr.  Then part of the response is:

 <lst name="termVectors">
   <str name="uniqueKeyFieldName">id</str>
   <lst name="IW-02">
     <str name="uniqueKey">IW-02</str>
   </lst>
   <lst name="9885A004">
     <str name="uniqueKey">9885A004</str>
     <lst name="includes">
       <lst name="32mb">
         <int name="tf">1</int>
       </lst>
       <lst name="av">
         <int name="tf">1</int>
       </lst>
       <lst name="battery">
         <int name="tf">1</int>
       </lst>
       <lst name="cable">
         <int name="tf">2</int>
       </lst>
       <lst name="card">
         <int name="tf">1</int>
       </lst>
       <lst name="sd">
         <int name="tf">1</int>
       </lst>
       <lst name="usb">
         <int name="tf">1</int>
       </lst>
     </lst>
   </lst>
   <lst name="3007WFP">
     <str name="uniqueKey">3007WFP</str>
     <lst name="includes">
       <lst name="cable">
         <int name="tf">1</int>
       </lst>
       <lst name="usb">
         <int name="tf">1</int>
       </lst>
     </lst>
   </lst>
   <lst name="MA147LL/A">
     <str name="uniqueKey">MA147LL/A</str>
     <lst name="includes">
       <lst name="cable">
         <int name="tf">1</int>
       </lst>
       <lst name="earbud">
         <int name="tf">1</int>
       </lst>
       <lst name="headphones">
         <int name="tf">1</int>
       </lst>
       <lst name="usb">
         <int name="tf">1</int>
       </lst>
     </lst>
   </lst>
 </lst>

 Then you can use an XPath query like
 sum(//lst[@name='cable']/int[@name='tf']) where 'cable' was the term, to
 calculate the term frequency in the 'includes' field for the whole result
 set.  You could extend this to get the term frequency across all fields for
 your result set with some alterations to the query and schema.xml
 configuration.  Alternately you could get the response as json (wt=json)
 and use javascript to sum. I know this is not terribly efficient but, if
 I'm understanding your request correctly, it's possible.
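
 The XPath sum described above can be sketched client-side in Python. This
 is only an illustration, assuming the XML response format (wt=xml) with the
 termVectors structure shown in this message; the embedded sample is a
 trimmed copy of that response, and the function name is made up for this
 example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the termVectors section from the response shown above.
SAMPLE = """<lst name="termVectors">
  <str name="uniqueKeyFieldName">id</str>
  <lst name="9885A004">
    <str name="uniqueKey">9885A004</str>
    <lst name="includes">
      <lst name="cable"><int name="tf">2</int></lst>
      <lst name="usb"><int name="tf">1</int></lst>
    </lst>
  </lst>
  <lst name="3007WFP">
    <str name="uniqueKey">3007WFP</str>
    <lst name="includes">
      <lst name="cable"><int name="tf">1</int></lst>
      <lst name="usb"><int name="tf">1</int></lst>
    </lst>
  </lst>
  <lst name="MA147LL/A">
    <str name="uniqueKey">MA147LL/A</str>
    <lst name="includes">
      <lst name="cable"><int name="tf">1</int></lst>
      <lst name="usb"><int name="tf">1</int></lst>
    </lst>
  </lst>
</lst>"""


def total_term_freq(term_vectors_xml, term):
    """Sum the tf values of `term` over every document in the response."""
    root = ET.fromstring(term_vectors_xml)
    total = 0
    # Equivalent to sum(//lst[@name=term]/int[@name='tf']).
    for lst in root.iter("lst"):
        if lst.get("name") != term:
            continue
        for child in lst.findall("int"):
            if child.get("name") == "tf":
                total += int(child.text)
    return total


print(total_term_freq(SAMPLE, "cable"))  # 2 + 1 + 1 = 4
```

 As with the XPath version, the total only covers the documents actually
 returned, so the rows parameter has to be large enough to include the
 whole result set.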

 Cheers,
 Tricia


 On Thu, Jul 4, 2013 at 10:24 AM, Tony Mullins tonymullins...@gmail.com
 wrote:

  So what is the workaround for this problem?
  Can it be done without changing any source code?
 
  Thanks,
  Tony
 
 
  On Thu, Jul 4, 2013 at 8:01 PM, Yonik Seeley yo...@lucidworks.com
 wrote:
 
   Ah, sorry - I thought you were after docfreq, not termfreq.
   -Yonik
   http://lucidworks.com
  
   On Thu, Jul 4, 2013 at 10:57 AM, Tony Mullins tonymullins...@gmail.com
   wrote:
Hi Yonik,
   
With facet it didn't work.
   
Please see the result set doc below
   
   
  
 
    http://localhost:8080/solr/collection2/select?fl=*,amazing_freq:termfreq%28product,%27amazing%27%29,spider_freq:termfreq%28product,%27spider%27%29&fq=id%3A27&q=spider&fl=*&df=product&wt=xml&indent=true&facet=true&facet.query=product:spider&facet.query=product:amazing&rows=20
   
    <doc>
      <str name="id">27</str>
      <str name="type">Movies</str>
      <str name="format">dvd</str>
      <str name="product">The amazing spider man is amazing spider the spider</str>
      <int name="popularity">1</int>
      <long name="_version_">1439641369145507840</long>

      <int name="amazing_freq">2</int>
      <int name="spider_freq">3</int>
    </doc>
    </result><lst name="facet_counts"><lst name="facet_queries">
      <int name="product:spider">1</int>
      <int name="product:amazing">1</int>
    </lst>
   
    As you can see, faceting just returns the number of docs found for
    those keywords, not the actual frequency. The actual frequency is
    returned by the fields 'amazing_freq' & 'spider_freq'!

    So is there any workaround to get the total term frequency in the
    result set without any modification to the Solr source code?
   
   
Thanks,
Tony
   
   
On Thu, Jul 4, 2013 at 7:05 PM, Yonik Seeley yo...@lucidworks.com
   wrote:
   
    If you just want to retrieve those counts, this seems like simple
    faceting.
   
q=something
facet=true
facet.query=product:hunger
facet.query=product:games
   
-Yonik
http://lucidworks.com
   
    On Thu, Jul 4, 2013 at 9:45 AM, Tony Mullins tonymullins...@gmail.com
    wrote:
 Hi ,

     I have lots of crawled data indexed in my Solr (4.3.0). Let's say a
     user creates a search criteria 'X1' and wants to know the occurrence
     of a specific term in the result set of that 'X1' search criteria.
     Then the user creates another search criteria 'X2' and wants to know
     the occurrence of that same term in the result set of that 'X2'
     search criteria.

     At the moment, if I use termfreq(field,term), it gives me the term
     frequency per document, and if I use totaltermfreq(field,term), it
     gives me the total term frequency in the entire index, not in the
     result set of my search criteria.

 So what I need is your help to