Facets for fields in subdocuments with block join, is it possible?

2014-02-11 Thread Henning Ivan Solberg

Hello,

I'm testing block join in Solr 4.6.1 and wondering: is it possible to 
get facets for fields in subdocuments, with the number of hits based on ROOT 
documents?


See example below:

<doc>
  <documentPart>ROOT</documentPart>
  <text>testing 123</text>
  <title>title</title>
  <group>GRP</group>
  <subdocument>
    <field3>khat</field3>
    <field4>7000</field4>
    <field5>purchase</field5>
  </subdocument>
  <subdocument>
    <field3>cannabis</field3>
    <field4>500</field4>
    <field5>sale</field5>
  </subdocument>
</doc>

My query looks like this:

solrQuery.setQuery("text:testing");
solrQuery.setFilterQueries("{!parent which=\"documentPart:ROOT\"}field3:khat");

solrQuery.setFacet(true);
solrQuery.addFacetField("group", "field5");

This does not give me any facets for the subdocument fields, so I'm 
thinking: could a solution be to execute a second query that gets the 
facets for the subdocuments by joining from parent to child with a {!child 
of=} query, like this:


solrQuery.setQuery("{!child of=\"documentPart:ROOT\"}text:testing");
solrQuery.setFilterQueries("field3:khat");
solrQuery.setFacet(true);
solrQuery.addFacetField("field5", "field4", "field3");

The problem with this method is that the facet counts will be based on 
subdocuments and not ROOT/parent documents...


Is there a silver bullet for this kind of requirement?

Yours faithfully

Henning Solberg



Re: Group.Facet issue in Sharded Solr Setup

2014-02-11 Thread rks_lucene
A quick follow-up on my question below: is anyone using group.facet in a
sharded Solr setup?

Based on further testing, the group.facet counts don't seem reliable at all
for less popular items in the facet list.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Group-Facet-issue-in-Sharded-Solr-Setup-tp4116077p4116635.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facets for fields in subdocuments with block join, is it possible?

2014-02-11 Thread Mikhail Khludnev
Hello Henning,

There is no open-source facet component for the child level of block-join.
There isn't even an open JIRA issue for this.

Don't think it helps.
On 11.02.2014 12:22, Henning Ivan Solberg h...@lovdata.no wrote:

 Hello,

 I'm testing block join in Solr 4.6.1 and wondering: is it possible to get
 facets for fields in subdocuments, with the number of hits based on ROOT
 documents?

 See example below:

 <doc>
   <documentPart>ROOT</documentPart>
   <text>testing 123</text>
   <title>title</title>
   <group>GRP</group>
   <subdocument>
     <field3>khat</field3>
     <field4>7000</field4>
     <field5>purchase</field5>
   </subdocument>
   <subdocument>
     <field3>cannabis</field3>
     <field4>500</field4>
     <field5>sale</field5>
   </subdocument>
 </doc>

 My query looks like this:

 solrQuery.setQuery("text:testing");
 solrQuery.setFilterQueries("{!parent which=\"documentPart:ROOT\"}field3:khat");
 solrQuery.setFacet(true);
 solrQuery.addFacetField("group", "field5");

 This does not give me any facets for the subdocument fields, so I'm
 thinking: could a solution be to execute a second query that gets the facets
 for the subdocuments by joining from parent to child with a {!child of=} query,
 like this:

 solrQuery.setQuery("{!child of=\"documentPart:ROOT\"}text:testing");
 solrQuery.setFilterQueries("field3:khat");
 solrQuery.setFacet(true);
 solrQuery.addFacetField("field5", "field4", "field3");

 The problem with this method is that the facet counts will be based on
 subdocuments and not ROOT/parent documents...

 Is there a silver bullet for this kind of requirement?

 Yours faithfully

 Henning Solberg




Set up embedded Solr container and cores programmatically to read their configs from the classpath

2014-02-11 Thread Robert Krüger
Hi,

I have an application with an embedded Solr instance (and I want to
keep it embedded) and so far I have been setting up my Solr
installation programmatically using folder paths to specify where the
specific container or core configs are.

I have used the CoreContainer methods createAndLoad and create using
File arguments and this works fine. However, now I want to change this
so that all configuration files are loaded from certain locations
using the classloader but I have not been able to get this to work.

E.g. I want to have my solr config located in the classpath at

my/base/package/solr/conf

and the core configs at

my/base/package/solr/cores/core1/conf,
my/base/package/solr/cores/core2/conf

etc.

Is this possible at all? Looking through the source code, it seems that
specifying classpath resources in such a qualified way is not
supported, but I may be wrong.

I could get this to work for the container by supplying my own
implementation of SolrResourceLoader that allows a base path to be
specified for the resources to be loaded (I first thought that would
happen already when specifying instanceDir accordingly, but looking at
the code it does not: for resources loaded through the classloader,
instanceDir is not prepended). However, I am then stuck with the
loading of the cores' resources, as the respective code (see
org.apache.solr.core.CoreContainer#createFromLocal) instantiates a
SolrResourceLoader internally.
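
For illustration, the kind of loader I mean looks roughly like this (a
simplified, untested sketch against the Solr 4.x API; the prefix value is
just an example):

import java.io.InputStream;
import org.apache.solr.core.SolrResourceLoader;

public class ClasspathPrefixResourceLoader extends SolrResourceLoader {
  private final String prefix; // e.g. "my/base/package/solr/conf/"

  public ClasspathPrefixResourceLoader(String instanceDir, String prefix) {
    super(instanceDir);
    this.prefix = prefix.endsWith("/") ? prefix : prefix + "/";
  }

  @Override
  public InputStream openResource(String resource) {
    // Try the qualified classpath location first, then fall back to the
    // default Solr lookup (instanceDir/conf, absolute path, plain classpath).
    InputStream is = getClassLoader().getResourceAsStream(prefix + resource);
    return is != null ? is : super.openResource(resource);
  }
}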

Thanks for any help with this (even if it is just a clarification that it is not possible).

Robert


How to Learn Linked Configuration for SolrCloud at Zookeeper

2014-02-11 Thread Furkan KAMACI
Hi;

I've written code that can upload a file to Zookeeper for SolrCloud.
Currently I have many configurations in Zookeeper for SolrCloud. I want to
update the synonyms.txt file, so I need to know the currently linked
configuration (I will update the synonyms.txt file under the appropriate
configuration folder). How can I find it out?

Thanks;
Furkan KAMACI


Re: How to Learn Linked Configuration for SolrCloud at Zookeeper

2014-02-11 Thread Alan Woodward
For a particular collection or core?  There should be a collection.configName 
property specified for the core or collection which tells you which ZK config 
directory is being used.

Alan Woodward
www.flax.co.uk


On 11 Feb 2014, at 11:49, Furkan KAMACI wrote:

 Hi;
 
 I've written code that can upload a file to Zookeeper for SolrCloud.
 Currently I have many configurations in Zookeeper for SolrCloud. I want to
 update the synonyms.txt file, so I need to know the currently linked
 configuration (I will update the synonyms.txt file under the appropriate
 configuration folder). How can I find it out?
 
 Thanks;
 Furkan KAMACI



Re: How to Learn Linked Configuration for SolrCloud at Zookeeper

2014-02-11 Thread Furkan KAMACI
I am looking for it for a particular collection.


2014-02-11 13:55 GMT+02:00 Alan Woodward a...@flax.co.uk:

 For a particular collection or core?  There should be a
 collection.configName property specified for the core or collection which
 tells you which ZK config directory is being used.

 Alan Woodward
 www.flax.co.uk


 On 11 Feb 2014, at 11:49, Furkan KAMACI wrote:

  Hi;
 
  I've written code that can upload a file to Zookeeper for SolrCloud.
  Currently I have many configurations in Zookeeper for SolrCloud. I want to
  update the synonyms.txt file, so I need to know the currently linked
  configuration (I will update the synonyms.txt file under the appropriate
  configuration folder). How can I find it out?
 
  Thanks;
  Furkan KAMACI




Re: How to Learn Linked Configuration for SolrCloud at Zookeeper

2014-02-11 Thread Furkan KAMACI
Hi;

OK, I've checked the source code and implemented this:

public String readConfigName(SolrZkClient zkClient, String collection)
    throws KeeperException, InterruptedException {

  String configName = null;

  String path = ZkStateReader.COLLECTIONS_ZKNODE + "/" + collection;

  LOGGER.info("Load collection config from: " + path);
  byte[] data = zkClient.getData(path, null, null, true);

  // The collection znode carries its properties, including the config name.
  if (data != null) {
    ZkNodeProps props = ZkNodeProps.load(data);
    configName = props.getStr(CONFIGNAME_PROP);
  }

  // Sanity check: the named config must actually exist under /configs.
  if (configName != null && !zkClient.exists(CONFIGS_ZKNODE + "/" + configName, true)) {
    LOGGER.error("Specified config does not exist in ZooKeeper: " + configName);
    throw new ZooKeeperException(SolrException.ErrorCode.SERVER_ERROR,
        "Specified config does not exist in ZooKeeper: " + configName);
  }
  return configName;
}

So, I can get the linked configuration name.
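
Hypothetical usage (the ZK host string and collection name are placeholders):

SolrZkClient zkClient = new SolrZkClient("zk1:2181,zk2:2181,zk3:2181", 10000);
try {
  String configName = readConfigName(zkClient, "collection1");
  System.out.println("Linked config: " + configName);
} finally {
  zkClient.close();
}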

Thanks;
Furkan KAMACI


2014-02-11 13:57 GMT+02:00 Furkan KAMACI furkankam...@gmail.com:

 I am looking for it for a particular collection.


 2014-02-11 13:55 GMT+02:00 Alan Woodward a...@flax.co.uk:

 For a particular collection or core?  There should be a
 collection.configName property specified for the core or collection which
 tells you which ZK config directory is being used.

 Alan Woodward
 www.flax.co.uk


 On 11 Feb 2014, at 11:49, Furkan KAMACI wrote:

  Hi;
 
  I've written code that can upload a file to Zookeeper for SolrCloud.
  Currently I have many configurations in Zookeeper for SolrCloud. I want to
  update the synonyms.txt file, so I need to know the currently linked
  configuration (I will update the synonyms.txt file under the appropriate
  configuration folder). How can I find it out?
 
  Thanks;
  Furkan KAMACI





Re: Lowering query time

2014-02-11 Thread Joel Cohen
I'd like to thank you for lending a hand on my query time problem with
SolrCloud. By switching to a single-shard-with-replicas setup, I've reduced
my query time to 18 msec. My full ingestion of 300k+ documents went down
from 2 hours 50 minutes to 1 hour 40 minutes. There are some code changes
that are going in that should help a bit as well. Big thanks to everyone
that had suggestions.


On Tue, Feb 4, 2014 at 8:11 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

 I suspect faceting is the issue here. The actual query you have shown
 seems to bring back a single document (or a single set of documents for
 a product):
 fq=id:(320403401)

 On the other hand, you are asking for 4 field facets:
 facet.field=q_virtualCategory_ss
 facet.field=q_brand_s
 facet.field=q_color_s
 facet.field=q_category_ss
 AND 2 range facets, both clustered/grouped:
 facet.range=daysSinceStart_i
 facet.range=activePrice_l (e.g. f.activePrice_l.facet.range.gap=5000)

 And for all facets you have asked to bring back ALL of the results:
 facet.limit=-1

 Plus, you are doing a complex sort:
 sort=popularity_i desc,popularity_i desc

 So, you are probably spending quite a bit of time counting (especially
 in a shared setup) and then quite a bit more sending the response
 back.

 I would check the size of the result document (HTTP result) and see
 how large it is. Maybe you don't need all of the stuff that's coming
 back. I assume you are not actually querying Solr from the client's
 machine (that is I hope it is inside your data centre close to your
 web server), otherwise I would say to look at automatic content
 compression as well to minimize on-wire document size.

 Finally, if your documents have many stored fields (stored="true" in
 schema.xml) but you only return small subsets of them during search,
 you could look into using enableLazyFieldLoading flag in the
 solrconfig.

 Regards,
Alex.
 P.s. As others said, you don't seem to have too many documents.
 Perhaps you want replication instead of sharding for improved
 performance.
 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Wed, Feb 5, 2014 at 6:31 AM, Alexey Kozhemiakin
 alexey_kozhemia...@epam.com wrote:
   Btw timing for distributed requests is broken at the moment; it
  doesn't combine values from requests to shards.  I'm working on a patch.
 
  https://issues.apache.org/jira/browse/SOLR-3644
 
  -Original Message-
  From: Jack Krupansky [mailto:j...@basetechnology.com]
  Sent: Tuesday, February 04, 2014 22:00
  To: solr-user@lucene.apache.org
  Subject: Re: Lowering query time
 
  Add the debug=true parameter to some test queries and look at the
 timing
  section to see which search components are taking the time.
 Traditionally, highlighting for large documents was a top culprit.
 
  Are you returning a lot of data or field values? Sometimes reducing the
 amount of data processed can help. Any multivalued fields with lots of
 values?
 
  -- Jack Krupansky
 
  -Original Message-
  From: Joel Cohen
  Sent: Tuesday, February 4, 2014 1:43 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Lowering query time
 
  1. We are faceting. I'm not a developer so I'm not quite sure how we're
 doing it. How can I measure?
  2. I'm not sure how we'd force this kind of document partitioning. I can
 see how my shards are partitioned by looking at the clusterstate.json from
 Zookeeper, but I don't have a clue on how to get documents into specific
 shards.
 
  Would I be better off with fewer shards given the small size of my
 indexes?
 
 
  On Tue, Feb 4, 2014 at 12:32 PM, Yonik Seeley yo...@heliosearch.com
 wrote:
 
  On Tue, Feb 4, 2014 at 12:12 PM, Joel Cohen joel.co...@bluefly.com
  wrote:
   I'm trying to get the query time down to ~15 msec. Anyone have any
   tuning recommendations?
 
  I guess it depends on what the slowest part of the query currently is.
   If you are faceting, it's often that.
  Also, it's often a big win if you can somehow partition documents such
  that requests can normally be serviced from a single shard.
 
  -Yonik
  http://heliosearch.org - native off-heap filters and fieldcache for
  solr
 
 
 
 
  --
 
  joel cohen, senior system engineer
 
  e joel.co...@bluefly.com p 212.944.8000 x276 bluefly, inc. 42 w. 39th
 st. new york, ny 10018 www.bluefly.com 
 http://www.bluefly.com/?referer=autosig | *fly since
  2013...*
 




-- 

joel cohen, senior system engineer

e joel.co...@bluefly.com p 212.944.8000 x276
bluefly, inc. 42 w. 39th st. new york, ny 10018
www.bluefly.com http://www.bluefly.com/?referer=autosig | *fly since
2013...*


Urgent Help. Best Way to have multiple OR Conditions for same field in SOLR

2014-02-11 Thread rajeev.nadgauda
HI,

I am new to SOLR; we have CRM data for Contacts and Companies which are in
millions, and we have switched to SOLR for fast search results.

PROBLEM: We have large inclusion and exclusion lists with names of companies
or contacts.
Ex: Include or Exclude: Company A && Company B && Company C && ... &&
Company n, where assume n >= 10000.

What would be the best way to do this kind of query using SOLR?

WHAT I HAVE TRIED:
Setting q = field_name:(companyA OR companyB ... OR companyN);
This only works for a list of around 400.

Looking forward to assistance on this.

Thank You,
Rajeev.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Urgent-Help-Best-Way-to-have-multiple-OR-Conditions-for-same-field-in-SOLR-tp4116681.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr-query with NOT and OR operator

2014-02-11 Thread Johannes Siegert

Hi,

my Solr request contains the following filter query:

fq=((-(field1:value1)))+OR+(field2:value2)

I expect Solr to deliver documents matching ((-(field1:value1))) and 
documents matching (field2:value2).


But Solr delivers only documents that result from (field2:value2). 
I do receive several documents if I request only ((-(field1:value1))).


Thanks!

Johannes


Re: solr-query with NOT and OR operator

2014-02-11 Thread Mikhail Khludnev
http://wiki.apache.org/solr/CommonQueryParameters#debugQuery
and
http://wiki.apache.org/solr/CommonQueryParameters#explainOther
usually help so much


On Tue, Feb 11, 2014 at 7:57 PM, Johannes Siegert 
johannes.sieg...@marktjagd.de wrote:

 Hi,

 my Solr request contains the following filter query:

 fq=((-(field1:value1)))+OR+(field2:value2)

 I expect Solr to deliver documents matching ((-(field1:value1))) and
 documents matching (field2:value2).

 But Solr delivers only documents that result from (field2:value2). I do
 receive several documents if I request only ((-(field1:value1))).

 Thanks!

 Johannes




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Tf-Idf for a specific query

2014-02-11 Thread David Miller
Hi Erick,

Slower queries for getting facets can be tolerated, as long as they don't
affect those without facets. The requirement is for a separate query which
can get me both term vectors and facet counts.

One issue I am facing is that, for a search query, I only want the term
vectors and facet counts, but not the results/docs. If I set rows=0,
then term vectors are not returned. Could you suggest some way to achieve
this?

It would also be helpful to have a way to get the aggregate TF of a term
(across all docs matching the query).

Regards,
David






On Sat, Feb 8, 2014 at 10:49 AM, Erick Erickson erickerick...@gmail.com wrote:

 David:

 If you're, say, faceting on fields with lots of unique values, this
 will be quite expensive.
 No idea whether you can tolerate slower queries or not, just sayin'

 Erick

 On Fri, Feb 7, 2014 at 5:35 PM, David Miller davthehac...@gmail.com
 wrote:
  Thanks Mikhail,
 
  It seems that this was what I was looking for. Being new to this, I
 wasn't
  aware of such a use of facets.
 
  Now I can probably combine the term vectors and facets to fit my
 scenario.
 
  Regards,
  Dave
 
 
  On Fri, Feb 7, 2014 at 2:43 PM, Mikhail Khludnev 
 mkhlud...@griddynamics.com
  wrote:
 
  David,
 
  I can imagine that DF for a result set is facets!
 
 
  On Fri, Feb 7, 2014 at 11:26 PM, David Miller davthehac...@gmail.com
  wrote:
 
   Hi Mikhail,
  
    The DF seems to be based on the entire document set. What I require is
    based on the results of a single query.
   
    Suppose my Solr query returns a set of 50K documents from a superset of
    10 million documents; I require the DF to be calculated just on the 50K
    documents. But currently it seems to be calculated on the entire doc set.
  
   So, is there any way to get the DF or IDF just on basis of the docs
   returned by the query?
  
   Regards,
   Dave
  
  
  
  
  
  
  
   On Fri, Feb 7, 2014 at 5:15 AM, Mikhail Khludnev 
   mkhlud...@griddynamics.com
wrote:
  
Hello Dave
 you can get DF from http://wiki.apache.org/solr/TermsComponent (invert
 it yourself); then, for a certain term, you can get the number of
 occurrences per document via
 http://wiki.apache.org/solr/FunctionQuery#tf
   
   
   
On Fri, Feb 7, 2014 at 3:58 AM, David Miller 
 davthehac...@gmail.com
wrote:
   
 Hi Guys..

  I require to obtain Tf-Idf scores from Solr for a certain set of
  documents. But the catch is that I need the IDF (or DF) to be calculated
  on the documents returned by the specific query and not the entire
  corpus.
 
  Please provide me some hint on whether Solr has this feature, or
  whether I can use the Lucene API directly to achieve this.


 Thanks in advance,
 Dave

   
   
   
--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
   
http://www.griddynamics.com
 mkhlud...@griddynamics.com
   
  
 
 
 
  --
  Sincerely yours
  Mikhail Khludnev
  Principal Engineer,
  Grid Dynamics
 
  http://www.griddynamics.com
   mkhlud...@griddynamics.com
 



Re: solr-query with NOT and OR operator

2014-02-11 Thread Jack Krupansky
With so many parentheses in there, I wonder what you are really trying to 
do. Try expressing your query in simple English first so that we can 
understand your goal.


But generally, a purely negative nested query must have a *:* term to apply 
the exclusion against:


fq=((*:* -(field1:value1)))+OR+(field2:value2).

-- Jack Krupansky

-Original Message- 
From: Johannes Siegert

Sent: Tuesday, February 11, 2014 10:57 AM
To: solr-user@lucene.apache.org
Subject: solr-query with NOT and OR operator

Hi,

my Solr request contains the following filter query:

fq=((-(field1:value1)))+OR+(field2:value2)

I expect Solr to deliver documents matching ((-(field1:value1))) and
documents matching (field2:value2).

But Solr delivers only documents that result from (field2:value2).
I do receive several documents if I request only ((-(field1:value1))).

Thanks!

Johannes 



Re: Lowering query time

2014-02-11 Thread Erick Erickson
Hmmm, I'm still a little puzzled BTW. 300K documents, unless they're
huge, shouldn't be taking 100 minutes. I can index 11M documents on
my laptop (a Wikipedia dump) in 45 minutes, for instance. Of course
that's a single core, not cloud and not replicas...

So possibly it's on the data acquisition side? Is your Solr CPU pegged?

YMMV of course.

Erick


On Tue, Feb 11, 2014 at 6:40 AM, Joel Cohen joel.co...@bluefly.com wrote:

 I'd like to thank you for lending a hand on my query time problem with
 SolrCloud. By switching to a single shard with replicas setup, I've reduced
 my query time to 18 msec. My full ingestion of 300k+ documents went down
 from 2 hours 50 minutes to 1 hour 40 minutes. There are some code changes
 that are going in that should help a bit as well. Big thanks to everyone
 that had suggestions.



Re: Urgent Help. Best Way to have multiple OR Conditions for same field in SOLR

2014-02-11 Thread Erick Erickson
Right, 10K Boolean clauses are not very efficient. You actually can
up the limit here, but still...

Consider a post filter, here's a place to start:
http://lucene.apache.org/solr/4_3_1/solr-core/org/apache/solr/search/PostFilter.html
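
Very roughly, the skeleton looks something like the following. This is a
rough, untested sketch against the Solr 4.x APIs; the single-valued string
field "company" and the allow-list semantics are hypothetical, and you would
hook it up via your own QParserPlugin and reference it from an fq:

import java.io.IOException;
import java.util.Set;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.util.BytesRef;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class CompanyListFilter extends ExtendedQueryBase implements PostFilter {
  private final Set<String> allowed; // the big inclusion list

  public CompanyListFilter(Set<String> allowed) {
    this.allowed = allowed;
    setCache(false); // post filters must not be cached
    setCost(100);    // cost >= 100 runs this after the main query and cheaper filters
  }

  @Override
  public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
    return new DelegatingCollector() {
      private SortedDocValues values;

      @Override
      public void setNextReader(AtomicReaderContext context) throws IOException {
        super.setNextReader(context);
        values = FieldCache.DEFAULT.getTermsIndex(context.reader(), "company");
      }

      @Override
      public void collect(int doc) throws IOException {
        int ord = values.getOrd(doc);
        if (ord >= 0) {
          BytesRef term = new BytesRef();
          values.lookupOrd(ord, term);
          if (allowed.contains(term.utf8ToString())) {
            super.collect(doc); // only matching docs flow down the chain
          }
        }
      }
    };
  }
}

For an exclusion list, invert the contains() test.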

Best,
Erick


On Tue, Feb 11, 2014 at 6:47 AM, rajeev.nadgauda 
rajeev.nadga...@leadenrich.com wrote:

 HI,

 I am new to SOLR; we have CRM data for Contacts and Companies which are in
 millions, and we have switched to SOLR for fast search results.

 PROBLEM: We have large inclusion and exclusion lists with names of
 companies or contacts.
 Ex: Include or Exclude: Company A && Company B && Company C && ... &&
 Company n, where assume n >= 10000.

 What would be the best way to do this kind of query using SOLR?

 WHAT I HAVE TRIED:
 Setting q = field_name:(companyA OR companyB ... OR companyN);
 This only works for a list of around 400.

 Looking forward for assistance on this.

 Thank You,
 Rajeev.





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Urgent-Help-Best-Way-to-have-multiple-OR-Conditions-for-same-field-in-SOLR-tp4116681.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr-query with NOT and OR operator

2014-02-11 Thread Erick Erickson
Solr/Lucene is not strictly Boolean logic; this trips up a lot
of people.

Excellent blog on the subject here:
http://searchhub.org/dev/2011/12/28/why-not-and-or-and-not/

Best,
Erick


On Tue, Feb 11, 2014 at 8:22 AM, Jack Krupansky j...@basetechnology.com wrote:

 With so many parentheses in there, I wonder what you are really trying to
 do. Try expressing your query in simple English first so that we can
 understand your goal.

 But generally, a purely negative nested query must have a *:* term to
 apply the exclusion against:

 fq=((*:* -(field1:value1)))+OR+(field2:value2).

 -- Jack Krupansky

 -Original Message- From: Johannes Siegert
 Sent: Tuesday, February 11, 2014 10:57 AM
 To: solr-user@lucene.apache.org
 Subject: solr-query with NOT and OR operator


 Hi,

 my Solr request contains the following filter query:

 fq=((-(field1:value1)))+OR+(field2:value2)

 I expect Solr to deliver documents matching ((-(field1:value1))) and
 documents matching (field2:value2).

 But Solr delivers only documents that result from (field2:value2).
 I do receive several documents if I request only ((-(field1:value1))).

 Thanks!

 Johannes



Re: Is 'optimize' necessary for a 45-segment Solr 4.6 index?

2014-02-11 Thread Shawn Heisey
On 2/11/2014 3:27 AM, Jäkel, Guido wrote:
 Dear Shawn,

 On 2/9/2014 11:41 PM, Arun Rangarajan wrote:
 I have a 28 GB Solr 4.6 index with 45 segments. Optimize failed with an
 'out of memory' error. Is optimize really necessary, since I read that
 lucene is able to handle multiple segments well now?
 It seems I am currently running into the same problem while migrating from Solr 1.4 
 to Solr 4.6.1.

 I run into OOM problems -- after running a full, fresh re-index of our 
 catalogue data -- while optimizing an ~80GB core on a 16GB JVM. After about 
 one hour the heap explodes within a minute while "create compound file 
 _5b2.cfs". How do I deal with this? Might it happen because there are too many 
 small (about 30 @ 1..4GB) segments before the optimize? It seems that they are 
 limited to this size by the defaults of the TieredMergePolicy. And, of 
 course: is optimize deprecated?

 Because it takes about 1h to reach the point of problems, any hints or 
 explanations will be helpful for me to save a lot of time!

Replying to a privately sent email on this thread:

I can't be sure that there are no memory leaks in Solr's program code,
but it is a rare thing, and I'm running 4.6.1 on a large system with a
smaller heap than yours without problems, so a memory leak is unlikely.
My setup DOES do index optimizes.

I have two guesses.  It could be either or both.  They are similar but
not identical.  There might be something else entirely, but these are
the most likely:

One guess is that you don't have enough RAM, leading to a performance
issue that compounds itself.  Adding the optimize pushes the system over
a threshold, everything slows down enough that the system tries to do
too much simultaneously, and it uses all the heap.

Assuming there's nothing else running on the machine, with an 80GB index
and a 16GB heap, a perfectly ideal server for this index would have 96GB
of RAM.  You might be able to get really good performance with 48GB, but
more would be better.  If it were me, I don't think I'd try it with less
than 64GB.

http://wiki.apache.org/solr/SolrPerformanceProblems#RAM

The other guess is that your Solr config and your request/index
characteristics are resulting in a lot of heap usage, so when you add an
optimize on top of it, 16GB is not enough.

http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Thanks,
Shawn



Re: solr-query with NOT and OR operator

2014-02-11 Thread Johannes Siegert

Hi Jack,

thanks!

fq=((*:* -(field1:value1)))+OR+(field2:value2).

This is the solution.

Johannes

Am 11.02.2014 17:22, schrieb Jack Krupansky:
With so many parentheses in there, I wonder what you are really trying 
to do Try expressing your query in simple English first so that we 
can understand your goal.


But generally, a purely negative nested query must have a *:* term to 
apply the exclusion against:


fq=((*:* -(field1:value1)))+OR+(field2:value2).

-- Jack Krupansky

-Original Message- From: Johannes Siegert
Sent: Tuesday, February 11, 2014 10:57 AM
To: solr-user@lucene.apache.org
Subject: solr-query with NOT and OR operator

Hi,

my solr-request contains the following filter-query:

fq=((-(field1:value1)))+OR+(field2:value2).

I expect solr deliver documents matching to ((-(field1:value1))) and
documents matching to (field2:value2).

But solr deliver only documents, that are the result of (field2:value2).
I receive several documents, if I request only for ((-(field1:value1))).

Thanks!

Johannes


--
Johannes Siegert
Softwareentwickler

Telefon:  0351 - 418 894 -73
Fax:  0351 - 418 894 -99
E-Mail:   johannes.sieg...@marktjagd.de
Xing: https://www.xing.com/profile/Johannes_Siegert2

Webseite: http://www.marktjagd.de
Blog: http://blog.marktjagd.de
Facebook: http://www.facebook.com/marktjagd
Twitter:  http://twitter.com/Marktjagd
__

Marktjagd GmbH | Schützenplatz 14 | D - 01067 Dresden

Geschäftsführung: Jan Großmann
Sitz Dresden | Amtsgericht Dresden | HRB 28678



Re: Is 'optimize' necessary for a 45-segment Solr 4.6 index?

2014-02-11 Thread Arun Rangarajan
Dear Shawn,
Thanks for your reply. For now, I did the merges in steps with the maxSegments
param (using HOST:PORT/CORE/update?optimize=true&maxSegments=10). First I
merged the 45 segments down to 10, and then from 10 to 5. (Merging from 5 to 2
again caused an out-of-memory exception.) Now I have a 5-segment index with
all segments of roughly equal size. I will try using that and see if it is
good enough for us.
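
In SolrJ terms, the same stepwise merge looks roughly like this (a minimal
sketch; the core URL is a placeholder):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class StepwiseOptimize {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/core1");
    server.optimize(true, true, 10); // first pass: merge 45 segments down to 10
    server.optimize(true, true, 5);  // second pass: 10 -> 5
    server.shutdown();
  }
}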


On Sun, Feb 9, 2014 at 11:22 PM, Shawn Heisey s...@elyograg.org wrote:

 On 2/9/2014 11:41 PM, Arun Rangarajan wrote:
  I have a 28 GB Solr 4.6 index with 45 segments. Optimize failed with an
  'out of memory' error. Is optimize really necessary, since I read that
  lucene is able to handle multiple segments well now?

 I have had indexes with more than 45 segments, because of the merge
 settings that I use.  My large index shards are about 16GB at the
 moment.  Out of memory errors are very rare because I use a fairly large
 heap, at 6GB for a machine that hosts three of these large shards.  When
 I was still experimenting with my memory settings, I did see occasional
 out of memory errors during normal segment merging.

 Increasing your heap size is pretty much required at this point.  I've
 condensed some very basic information about heap sizing here:

 http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

 As for whether optimizing on 4.x is necessary: I do not have any hard
 numbers for you, but I can tell you that an optimized index does seem
 noticeably faster than one that is freshly built and has a large
 number of relatively large segments.

 I optimize my index shards on an schedule, but it is relatively
 infrequent -- one large shard per night.  Most of the time what I have
 is one really large segment and a bunch of super-small segments, and
 that does not seem to suffer from performance issues compared to a fully
 optimized index.  The situation is different right after a fresh
 rebuild, which produces a handful of very large segments and a bunch of
 smaller segments of varying sizes.

 Interesting but probably irrelevant details:

 Although I don't use mergeFactor any more, the TieredMergePolicy
 settings that I use are equivalent to a mergeFactor of 35.  I chose this
 number back in the 1.4.1 days because it resulted in synchronicity
 between merges and lucene segment names when LogByteSizeMergePolicy was
 still in use.  Segments _0 through _z would be merged into segment _10,
 and so on.

 Thanks,
 Shawn




handleSelect=true with SolrCloud

2014-02-11 Thread Jeff Wartes

I’m working on a port of a Solr service to SolrCloud. (Targeting v4.6.0 at 
present.) The old query style relied on using /solr/select?qt=foo to select the 
proper requestHandler. I know handleSelect=true is deprecated now, but it’d be 
pretty handy for testing to be able to be backwards compatible, at least until 
some time after the initial release.

So in my SolrCloud configuration, I set <requestDispatcher handleSelect="true"> 
and deleted the /select requestHandler as suggested here: 
http://wiki.apache.org/solr/SolrRequestHandler#Old_handleSelect.3Dtrue_Resolution_.28qt_param.29

However, my /solr/collection1/select?qt=foo query throws an “unknown handler: 
null” error with this configuration. Has anyone successfully tried 
handleSelect=true with the collections API?

Thanks.




boost group doclist members

2014-02-11 Thread David Santamauro


Without falling into the x/y problem area, I'll explain what I want to 
do: I would like to group my result set by a field, f1 and within each 
group, I'd like to boost the score of the most appropriate member of 
the group so it appears first in the doc list.


The most appropriate member is defined by the content of other fields 
(e.g., f2, f3). So basically, I'd like to boost based on the values in 
fields f2 and f3.


If there is a better way to achieve this, I'm all ears. But I was 
thinking this could be achieved by using a function query as the 
sortspec to group.sort.


Example content:

<doc>
  <field name="f1">4181770</field> <!-- integer -->
  <field name="f2">x_val</field>   <!-- text -->
  <field name="f3">100</field>     <!-- integer -->
</doc>
<doc>
  <field name="f1">4181770</field>
  <field name="f2">y_val</field>
  <field name="f3">100</field>
</doc>
<doc>
  <field name="f1">4181770</field>
  <field name="f2">z_val</field>
  <field name="f3">100</field>
</doc>

All 3 of the above documents will be grouped into a doclist with 
groupValue=4181770. My question then is: how do I make the document 
with f2=y_val appear first in the doclist? I've been playing with


group.field=f1
group.sort=query({!dismax qf=f2 bq=f2:y_val^100}) asc

... but I'm getting:
org.apache.solr.common.SolrException: Can't determine a Sort Order (asc 
or desc) in sort spec 'query({!dismax qf=f2 bq=f2:y_val^100.0}) asc', 
pos=14.


Can anyone point me to some examples of this?

thanks

David
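
For what it's worth, a hedged, untested sketch of one possible workaround:
dereference the subquery through a separate request parameter, since the sort
parser seems to trip over the spaces inside inline local params. In SolrJ,
with an arbitrary parameter name "gq":

import org.apache.solr.client.solrj.SolrQuery;

public class GroupSortExample {
  public static SolrQuery build() {
    SolrQuery q = new SolrQuery("*:*");
    q.set("group", "true");
    q.set("group.field", "f1");
    // A higher subquery score should sort first within each group, hence desc.
    q.set("group.sort", "query($gq) desc");
    q.set("gq", "{!dismax qf=f2}y_val");
    return q;
  }
}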



Re: handleSelect=true with SolrCloud

2014-02-11 Thread Shawn Heisey

On 2/11/2014 10:21 AM, Jeff Wartes wrote:

I’m working on a port of a Solr service to SolrCloud. (Targeting v4.6.0 at 
present.) The old query style relied on using /solr/select?qt=foo to select the 
proper requestHandler. I know handleSelect=true is deprecated now, but it’d be 
pretty handy for testing to be able to be backwards compatible, at least until 
some time after the initial release.

So in my SolrCloud configuration, I set <requestDispatcher handleSelect="true"> 
and deleted the /select requestHandler as suggested here: 
http://wiki.apache.org/solr/SolrRequestHandler#Old_handleSelect.3Dtrue_Resolution_.28qt_param.29

However, my /solr/collection1/select?qt=foo query throws an “unknown handler: 
null” error with this configuration. Has anyone successfully tried 
handleSelect=true with the collections API?


I'm pretty sure that if you won't have a handler named /select, then you 
need to have default=true as an attribute on one of your other handler 
definitions.


See line 715 of the example solrconfig.xml for Solr 3.5:

http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_3_5/solr/example/solr/conf/solrconfig.xml?view=annotate

Thanks,
Shawn



Re: handleSelect=true with SolrCloud

2014-02-11 Thread Jeff Wartes

Got it in one. Thanks!


On 2/11/14, 9:50 AM, Shawn Heisey s...@elyograg.org wrote:

On 2/11/2014 10:21 AM, Jeff Wartes wrote:
 I'm working on a port of a Solr service to SolrCloud. (Targeting v4.6.0
at present.) The old query style relied on using /solr/select?qt=foo to
select the proper requestHandler. I know handleSelect=true is deprecated
now, but it'd be pretty handy for testing to be able to be backwards
compatible, at least until some time after the initial release.

 So in my SolrCloud configuration, I set <requestDispatcher
handleSelect="true"> and deleted the /select requestHandler as suggested
here: 
http://wiki.apache.org/solr/SolrRequestHandler#Old_handleSelect.3Dtrue_Re
solution_.28qt_param.29

 However, my /solr/collection1/select?qt=foo query throws an "unknown
handler: null" error with this configuration. Has anyone successfully
tried handleSelect=true with the collections API?

I'm pretty sure that if you won't have a handler named /select, then you
need to have default=true as an attribute on one of your other handler
definitions.

See line 715 of the example solrconfig.xml for Solr 3.5:

http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_3_5/solr/exam
ple/solr/conf/solrconfig.xml?view=annotate

Thanks,
Shawn




Re: USER NAME Baruch Labunski

2014-02-11 Thread Baruch
Hello Wiki admin,

 I would like to add some valuable links. Can you please add me? My user name is 
Baruch Labunski


Thank You,
Baruch!



On Thursday, January 16, 2014 2:12:32 PM, Baruch bar...@rogers.com wrote:
 
Hello Wiki admin,

 I would like to add some valuable links. Can you please add me? My user name is 
Baruch Labunski


Thank You,

Baruch!

Re: Lowering query time

2014-02-11 Thread Joel Cohen
It's a custom ingestion process. It does a big DB query and then inserts
stuff in batches. The batch size is tuneable.


On Tue, Feb 11, 2014 at 11:23 AM, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm, I'm still a little puzzled BTW. 300K documents, unless they're
 huge, shouldn't be taking 100 minutes. I can index 11M documents on
 my laptop (a Wikipedia dump) in 45 minutes, for instance. Of course
 that's a single core, not cloud and not replicas...

 So possibly it's on the data acquisition side? Is your Solr CPU pegged?

 YMMV of course.

 Erick



Re: handleSelect=true with SolrCloud

2014-02-11 Thread Joel Bernstein
Jeff,

I believe the shards.qt parameter is what you're looking for. For example,
when using the /elevate handler with SolrCloud, I use the following URL to
tell Solr to use the /elevate handler on the shards:

http://localhost:8983/solr/collection1/elevate?q=ipod&wt=json&indent=true&shards.qt=/elevate







Joel Bernstein
Search Engineer at Heliosearch


On Tue, Feb 11, 2014 at 1:01 PM, Jeff Wartes jwar...@whitepages.com wrote:


 Got it in one. Thanks!


 On 2/11/14, 9:50 AM, Shawn Heisey s...@elyograg.org wrote:

 On 2/11/2014 10:21 AM, Jeff Wartes wrote:
  I'm working on a port of a Solr service to SolrCloud. (Targeting v4.6.0
 at present.) The old query style relied on using /solr/select?qt=foo to
 select the proper requestHandler. I know handleSelect=true is deprecated
 now, but it'd be pretty handy for testing to be able to be backwards
 compatible, at least until some time after the initial release.
 
  So in my SolrCloud configuration, I set <requestDispatcher
 handleSelect="true"> and deleted the /select requestHandler as suggested
 here:
 
 http://wiki.apache.org/solr/SolrRequestHandler#Old_handleSelect.3Dtrue_Re
 solution_.28qt_param.29
 
  However, my /solr/collection1/select?qt=foo query throws an "unknown
 handler: null" error with this configuration. Has anyone successfully
 tried handleSelect=true with the collections API?
 
 I'm pretty sure that if you won't have a handler named /select, then you
 need to have default=true as an attribute on one of your other handler
 definitions.
 
 See line 715 of the example solrconfig.xml for Solr 3.5:
 
 
 http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_3_5/solr/exam
 ple/solr/conf/solrconfig.xml?view=annotate
 
 Thanks,
 Shawn
 




RE: handleSelect=true with SolrCloud

2014-02-11 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Hi Jeff, it is not about elevation; I am asking along the lines of relevancy / 
boost / score.

Select productid from products where SKU = '101'
Select Productid from products where ManufactureSKU = '101'
Select Productid from product where SKU like '101%'
Select Productid from Product where ManufactureSKU like '101%'
Select Productid from product where Name like '101%'
Select Productid from Product where Description like '%101%'

Is there any way Solr can search the exact-match, starts-with, and anywhere 
cases in a single Solr query?
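
One hedged way to approximate this in a single query is tiered, boosted OR
clauses (a sketch only: the boosts are illustrative, and the leading-wildcard
clause for the "anywhere" case is expensive; an ngram or reversed-wildcard
field is the usual index-time alternative):

import org.apache.solr.client.solrj.SolrQuery;

public class TieredMatchQuery {
  public static SolrQuery build() {
    // Exact match scores highest, then prefix ("starts with"), then substring.
    return new SolrQuery("SKU:101^100 OR ManufactureSKU:101^80"
        + " OR SKU:101*^60 OR ManufactureSKU:101*^40 OR Name:101*^20"
        + " OR Description:*101*");
  }
}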

-Original Message-
From: Joel Bernstein [mailto:joels...@gmail.com] 
Sent: Tuesday, February 11, 2014 3:11 PM
To: solr-user@lucene.apache.org
Subject: Re: handleSelect=true with SolrCloud

Jeff,

I believe the shards.qt parameter is what you're looking for. For example when 
using the /elevate handler with SolrCloud I use the following url to tell 
Solr to use the /elevate handler on the shards:

http://localhost:8983/solr/collection1/elevate?q=ipodwt=jsonindent=trueshards.qt=/elevate







Joel Bernstein
Search Engineer at Heliosearch


On Tue, Feb 11, 2014 at 1:01 PM, Jeff Wartes jwar...@whitepages.com wrote:


 Got it in one. Thanks!


 On 2/11/14, 9:50 AM, Shawn Heisey s...@elyograg.org wrote:

 On 2/11/2014 10:21 AM, Jeff Wartes wrote:
  I'm working on a port of a Solr service to SolrCloud. (Targeting 
 v4.6.0 at present.) The old query style relied on using 
 /solr/select?qt=foo to select the proper requestHandler. I know 
 handleSelect=true is deprecated now, but it'd be pretty handy for 
 testing to be able to be backwards compatible, at least until some time 
 after the initial release.
 
  So in my SolrCloud configuration, I set <requestDispatcher 
 handleSelect="true"> and deleted the /select requestHandler as 
 suggested here:
 http://wiki.apache.org/solr/SolrRequestHandler#Old_handleSelect.3Dtrue_Resolution_.28qt_param.29
 
  However, my /solr/collection1/select?qt=foo query throws an "unknown 
 handler: null" error with this configuration. Has anyone 
 successfully tried handleSelect=true with the collections API?
 
 I'm pretty sure that if you won't have a handler named /select, then 
 you need to have default=true as an attribute on one of your other 
 handler definitions.
 
 See line 715 of the example solrconfig.xml for Solr 3.5:
 
 
 http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_3_5/solr/
 exam
 ple/solr/conf/solrconfig.xml?view=annotate
 
 Thanks,
 Shawn
 




Solr Autosuggest - Strange issue with leading numbers in query

2014-02-11 Thread Developer
I have a strange issue with Autosuggest.

Whenever I query for a keyword with leading numbers, it returns the
suggestion corresponding to the alphabetic part (ignoring the numbers). I was
under the assumption that it would return an empty result. I am not sure
what I am doing wrong. Can someone help?

*Query:*
/autocomplete?qt=/lucid&req_type=auto_complete&spellcheck.maxCollations=10&q=12342343243242ga&spellcheck.count=10

*Result:*

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="ga">
        <int name="numFound">1</int>
        <int name="startOffset">15</int>
        <int name="endOffset">17</int>
        <arr name="suggestion">
          <str>galaxy</str>
        </arr>
      </lst>
      <str name="collation">12342343243242galaxy</str>
    </lst>
  </lst>
</response>


*My field configuration is as below:*
<fieldType class="solr.TextField" name="textSpell_word"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" enablePositionIncrements="true"
            ignoreCase="true" words="stopwords_autosuggest.txt"/>
  </analyzer>
</fieldType>

*SolrConfig.xml*

<searchComponent class="solr.SpellCheckComponent" name="autocomplete">
  <lst name="spellchecker">
    <str name="name">autocomplete</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">autocomplete_word</str>
    <str name="storeDir">autocomplete</str>
    <str name="buildOnCommit">true</str>
    <float name="threshold">.005</float>
  </lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler"
                name="/autocomplete">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">autocomplete</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.onlyMorePopular">false</str>
  </lst>
  <arr name="components">
    <str>autocomplete</str>
  </arr>
</requestHandler>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Autosuggest-Strange-issue-with-leading-numbers-in-query-tp4116751.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing question on individual field update

2014-02-11 Thread shamik
Erick,

  Thanks for your reply. I should have given better context. I'm currently
running an incremental crawl daily on this particular source and indexing
the documents. The incremental crawl looks for any change since the last crawl
date, based on the document publish date. But there's no way for me to know if a
document has been deleted. To handle that, I run a full crawl on a weekend,
which basically re-indexes the entire content. After the full index is over, I
call a purge script, which deletes any content that is more than 24 hours
old, based on the indextimestamp field. 

The issue with atomic updates is that they don't alter the indextimestamp
field. So even if I run a full crawl with atomic updates, the timestamp will
stick to its old value. Unfortunately, I can't rely on another date field
coming from the source, as they are not consistent. That means
I can't remove stale content.

Let me know if I'm missing something here.

- Thanks,
Shamik





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-question-on-individual-field-update-tp4116605p4116757.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr server requirements for 100+ million documents

2014-02-11 Thread Susheel Kumar
Hi Otis,

Just to confirm, the 3 servers you mean here are 2 for shards/nodes and 1 for 
Zookeeper. Is that correct?

Thanks,
Susheel

-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Friday, January 24, 2014 5:21 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr server requirements for 100+ million documents

Hi Susheel,

Like Erick said, it's impossible to give precise recommendations, but making a 
few assumptions and combining them with experience (+ a licked finger in the 
air):
* 3 servers
* 32 GB
* 2+ CPU cores
* Linux

Assuming docs are not bigger than a few KB, that they are not being reindexed 
over and over, that you don't have a search rate higher than a few dozen QPS, 
assuming your queries are not a page long, etc. assuming best practices are 
followed, the above should be sufficient.

I hope this helps.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch 
Support * http://sematext.com/


On Fri, Jan 24, 2014 at 1:10 PM, Susheel Kumar  
susheel.ku...@thedigitalgroup.net wrote:

 Hi,

 Currently we are indexing 10 million documents from a database (10 db data
 entities); the index size is around 8 GB on a Windows virtual box. Indexing
 in one shot takes 12+ hours, while indexing in parallel in separate cores
 & merging them together takes 4+ hours.

 We are looking to scale to 100+ million documents and are looking for 
 recommendations on server requirements for a Production environment, on 
 the parameters below. There can be 200+ users performing searches at the same time.

 - No. of physical servers (considering SolrCloud)
 - Memory requirement
 - Processor requirement (# cores)
 - Linux as OS as opposed to Windows

 Thanks in advance.
 Susheel




Re: Solr server requirements for 100+ million documents

2014-02-11 Thread Otis Gospodnetic
Hi Susheel,

No, we wouldn't want to go with just 1 ZK. :)

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Feb 11, 2014 at 5:18 PM, Susheel Kumar 
susheel.ku...@thedigitalgroup.net wrote:

 Hi Otis,

 Just to confirm, the 3 servers you mean here are 2 for shards/nodes and 1
 for Zookeeper. Is that correct?

 Thanks,
 Susheel

 -Original Message-
 From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
 Sent: Friday, January 24, 2014 5:21 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr server requirements for 100+ million documents

 Hi Susheel,

 Like Erick said, it's impossible to give precise recommendations, but
 making a few assumptions and combining them with experience (+ a licked
 finger in the air):
 * 3 servers
 * 32 GB
 * 2+ CPU cores
 * Linux

 Assuming docs are not bigger than a few KB, that they are not being
 reindexed over and over, that you don't have a search rate higher than a
 few dozen QPS, assuming your queries are not a page long, etc. assuming
 best practices are followed, the above should be sufficient.

 I hope this helps.

 Otis
 --
 Performance Monitoring * Log Analytics * Search Analytics Solr &
 Elasticsearch Support * http://sematext.com/


 On Fri, Jan 24, 2014 at 1:10 PM, Susheel Kumar 
 susheel.ku...@thedigitalgroup.net wrote:

  Hi,
 
  Currently we are indexing 10 million documents from a database (10 db data
  entities); the index size is around 8 GB on a Windows virtual box. Indexing
  in one shot takes 12+ hours, while indexing in parallel in separate cores
  & merging them together takes 4+ hours.
 
  We are looking to scale to 100+ million documents and are looking for
  recommendations on server requirements for a Production environment, on
  the parameters below. There can be 200+ users performing searches at the
  same time.

  - No. of physical servers (considering SolrCloud)
  - Memory requirement
  - Processor requirement (# cores)
  - Linux as OS as opposed to Windows
 
  Thanks in advance.
  Susheel
 
 



RE: Solr server requirements for 100+ million documents

2014-02-11 Thread Susheel Kumar
Thanks, Otis, for the quick reply. So for ZK do you recommend separate servers, 
and if so, how many for an initial SolrCloud cluster setup? 

-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Tuesday, February 11, 2014 4:21 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr server requirements for 100+ million documents

Hi Susheel,

No, we wouldn't want to go with just 1 ZK. :)

Otis
--
Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch 
Support * http://sematext.com/


On Tue, Feb 11, 2014 at 5:18 PM, Susheel Kumar  
susheel.ku...@thedigitalgroup.net wrote:

 Hi Otis,

 Just to confirm, the 3 servers you mean here are 2 for shards/nodes 
 and 1 for Zookeeper. Is that correct?

 Thanks,
 Susheel

 -Original Message-
 From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com]
 Sent: Friday, January 24, 2014 5:21 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr server requirements for 100+ million documents

 Hi Susheel,

 Like Erick said, it's impossible to give precise recommendations, but 
 making a few assumptions and combining them with experience (+ a 
 licked finger in the air):
 * 3 servers
 * 32 GB
 * 2+ CPU cores
 * Linux

 Assuming docs are not bigger than a few KB, that they are not being 
 reindexed over and over, that you don't have a search rate higher than 
 a few dozen QPS, that your queries are not a page long, etc., and 
 assuming best practices are followed, the above should be sufficient.

 I hope this helps.

 Otis
 --
 Performance Monitoring * Log Analytics * Search Analytics Solr & 
 Elasticsearch Support * http://sematext.com/


 On Fri, Jan 24, 2014 at 1:10 PM, Susheel Kumar  
 susheel.ku...@thedigitalgroup.net wrote:

  Hi,
 
  Currently we are indexing 10 million documents from a database (10 DB data 
  entities) & the index size is around 8 GB on a Windows virtual box. 
  Indexing in one shot takes 12+ hours, while indexing in parallel in 
  separate cores & merging them together takes 4+ hours.
 
  We are looking to scale to 100+ million documents and are looking for 
  recommendations on server requirements for a production environment, on 
  the parameters below. There can be 200+ users performing searches at the 
  same time.
 
  No. of physical servers (considering SolrCloud), memory requirement, 
  processor requirement (# cores), Linux as OS as opposed to Windows.
 
  Thanks in advance.
  Susheel
 
 



Re: Indexing question on individual field update

2014-02-11 Thread shamik
Ok, I was wrong here. I can always set the indextimestamp field to the current
time (NOW) for every atomic update. On a similar note, is there any
performance constraint with updates compared to adds?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-question-on-individual-field-update-tp4116605p4116772.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr server requirements for 100+ million documents

2014-02-11 Thread svante karlsson
ZK needs a quorum to stay functional, so 3 servers handle one failure and 5
handle 2 node failures. If you run Solr with 1 replica per shard then stick to
3 ZK. If you use 2 replicas, use 5 ZK.
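
In other words, availability requires a strict majority of the ensemble;
a quick sketch of that arithmetic (illustrative only):

    // ZooKeeper stays available while a strict majority (the quorum,
    // n/2 + 1 with integer division) of the ensemble is up, so an
    // ensemble of n nodes tolerates n - quorum failures.
    public class ZkQuorumMath {
        static int toleratedFailures(int ensembleSize) {
            int quorum = ensembleSize / 2 + 1;
            return ensembleSize - quorum; // equals (n - 1) / 2
        }

        public static void main(String[] args) {
            for (int n : new int[] {1, 3, 5}) {
                System.out.println(n + " ZK nodes tolerate "
                        + toleratedFailures(n) + " failure(s)");
            }
            // Prints: 1 ZK nodes tolerate 0 failure(s)
            //         3 ZK nodes tolerate 1 failure(s)
            //         5 ZK nodes tolerate 2 failure(s)
        }
    }

This is also why even-sized ensembles buy nothing: 4 nodes tolerate the same
single failure as 3.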








Replica node down but zookeeper clusterstate not updated

2014-02-11 Thread Gopal Patwa
Solr = 4.6.1, attached solrcloud admin console view
Zookeeper 3.4.5  = 3 node ensemble

In my test setup, I have a 3-node SolrCloud setup with 2 shards. Today we had
a power failure and all nodes went down.

I started the 3-node ZooKeeper ensemble first, followed by the 3-node
SolrCloud. One replica's IP address had changed due to dynamic IP allocation,
but the ZooKeeper clusterstate was not updated with the new IP address; it
was still holding the old IP address for that bad node.

Do I need to manually update the clusterstate in ZooKeeper? What are my
options if this happens in production?

Bad node:
old IP: 10.249.132.35 (still exists in ZooKeeper)
new IP: 10.249.133.10

Log from Node1:

11:26:25,242 INFO  [STDOUT] 49170786 [Thread-2-EventThread] INFO
 org.apache.solr.common.cloud.ZkStateReader  – A cluster state change:
WatchedEvent state:SyncConnected type:NodeDataChanged
path:/clusterstate.json, has occurred - updating... (live nodes size: 3)
11:26:41,072 INFO  [STDOUT] 49186615 [RecoveryThread] INFO
 org.apache.solr.cloud.ZkController  – publishing
core=genre_shard1_replica1 state=recovering
11:26:41,079 INFO  [STDOUT] 49186622 [RecoveryThread] ERROR
org.apache.solr.cloud.RecoveryStrategy  – Error while trying to recover.
core=genre_shard1_replica1:org.apache.solr.client.solrj.SolrServerException:
Server refused connection at: http://10.249.132.35:8080/solr
11:26:41,079 INFO  [STDOUT] at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:496)
11:26:41,079 INFO  [STDOUT] at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
11:26:41,079 INFO  [STDOUT] at
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:221)
11:26:41,079 INFO  [STDOUT] at
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:367)
11:26:41,079 INFO  [STDOUT] at
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:244)
11:26:41,079 INFO  [STDOUT] Caused by:
org.apache.http.conn.HttpHostConnectException: Connection to
http://10.249.132.35:8080 refused


11:27:14,036 INFO  [STDOUT] 49219580 [RecoveryThread] ERROR
org.apache.solr.cloud.RecoveryStrategy  – Recovery failed - trying again...
(9) core=geo_shard1_replica1
11:27:14,037 INFO  [STDOUT] 49219581 [RecoveryThread] INFO
 org.apache.solr.cloud.RecoveryStrategy  – Wait 600.0 seconds before trying
to recover again (10)
11:27:14,958 INFO  [STDOUT] 49220498 [Thread-40] INFO
 org.apache.solr.common.cloud.ZkStateReader  – Updating cloud state from
ZooKeeper...



Log from bad node with new ip address:

11:06:29,551 INFO  [STDOUT] 6234 [coreLoadExecutor-4-thread-10] INFO
 org.apache.solr.cloud.ShardLeaderElectionContext  – Enough replicas found
to continue.
11:06:29,552 INFO  [STDOUT] 6236 [coreLoadExecutor-4-thread-10] INFO
 org.apache.solr.cloud.ShardLeaderElectionContext  – I may be the new
leader - try and sync
11:06:29,554 INFO  [STDOUT] 6237 [coreLoadExecutor-4-thread-10] INFO
 org.apache.solr.cloud.SyncStrategy  – Sync replicas to
http://10.249.132.35:8080/solr/venue_shard2_replica2/
11:06:29,555 INFO  [STDOUT] 6239 [coreLoadExecutor-4-thread-10] INFO
 org.apache.solr.update.PeerSync  – PeerSync: core=venue_shard2_replica2
url=http://10.249.132.35:8080/solr START replicas=[
http://10.249.132.56:8080/solr/venue_shard2_replica1/] nUpdates=100
11:06:29,556 INFO  [STDOUT] 6240 [coreLoadExecutor-4-thread-10] INFO
 org.apache.solr.update.PeerSync  – PeerSync: core=venue_shard2_replica2
url=http://10.249.132.35:8080/solr DONE.  We have no versions.  sync failed.
11:06:29,556 INFO  [STDOUT] 6241 [coreLoadExecutor-4-thread-10] INFO
 org.apache.solr.cloud.SyncStrategy  – Leader's attempt to sync with shard
failed, moving to the next candidate
11:06:29,558 INFO  [STDOUT] 6241 [coreLoadExecutor-4-thread-10] INFO
 org.apache.solr.cloud.ShardLeaderElectionContext  – We failed sync, but we
have no versions - we can't sync in that case - we were active before, so
become leader anyway
11:06:29,559 INFO  [STDOUT] 6243 [coreLoadExecutor-4-thread-10] INFO
 org.apache.solr.cloud.ShardLeaderElectionContext  – I am the new leader:
http://10.249.132.35:8080/solr/venue_shard2_replica2/ shard2
11:06:29,561 INFO  [STDOUT] 6245 [coreLoadExecutor-4-thread-10] INFO
 org.apache.solr.common.cloud.SolrZkClient  – makePath:
/collections/venue/leaders/shard2
11:06:29,577 INFO  [STDOUT] 6261 [Thread-2-EventThread] INFO
 org.apache.solr.update.PeerSync  – PeerSync: core=event_shard2_replica2
url=http://10.249.132.35:8080/solr  Received 18 versions from
10.249.132.56:8080/solr/event_shard2_replica1/
11:06:29,578 INFO  [STDOUT] 6263 [Thread-2-EventThread] INFO
 org.apache.solr.update.PeerSync  – PeerSync: core=event_shard2_replica2
url=http://10.249.132.35:8080/solr Requesting updates from
10.249.132.56:8080/solr/event_shard2_replica1/ n=10 versions=[1457764666067386368,
1456709993140060160, 1456709989863260160,
1456709986075803648, 1456709971758546944, 1456709179685208064,
1456709137524064256, 

Re: Indexing question on individual field update

2014-02-11 Thread Shawn Heisey

On 2/11/2014 2:37 PM, shamik wrote:

Erick,

   Thanks for your reply. I should have given better context. I'm currently
running an incremental crawl daily on this particular source and indexing
the documents. The incremental crawl looks for any change since the last
crawl date, based on the document publish date. But there's no way for me to
know if a document has been deleted. To ensure that, I ran a full crawl on a
weekend, which basically re-indexes the entire content. After the full index
is over, I call a purge script, which deletes any content that is more than
24 hours old, based on the indextimestamp field.

The issue with atomic update is that it doesn't alter the indextimestamp
field. So even if I run a full crawl with atomic updates, the timestamp will
stick to its old value. Unfortunately, I can't rely on another date field
coming from the source as they are not consistent. That translates to the
fact that I can't remove stale content.


One possibility is this: When you send the atomic update to Solr, 
include a new value for the indextimestamp field.
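
A minimal SolrJ sketch of that first option (the document id, field names, 
and core URL here are only illustrative assumptions):

    import java.util.Date;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class AtomicUpdateWithTimestamp {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/collection1");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-42");

            // "set" replaces a field value atomically on the existing document.
            Map<String, Object> setTitle = new HashMap<String, Object>();
            setTitle.put("set", "updated title");
            doc.addField("title", setTitle);

            // Send a fresh timestamp along with every atomic update.
            Map<String, Object> setStamp = new HashMap<String, Object>();
            setStamp.put("set", new Date());
            doc.addField("indextimestamp", setStamp);

            server.add(doc);
            server.commit();
        }
    }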


Another option: You can write a custom update processor plugin for 
Solr.  When the custom code is used, it will be executed on each 
incoming document.  Depending on what it finds in the update request, it 
can make appropriate changes, like updating indextimestamp.  You can do 
pretty much anything.


http://wiki.apache.org/solr/UpdateRequestProcessor

Writing an update processor in Java typically gives the best results in 
terms of flexibility and performance, but there is also a way to use 
other programming languages:


http://wiki.apache.org/solr/ScriptUpdateProcessor
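
A rough sketch of the Java route (skeleton only; the class name is made up, 
and the factory still has to be registered in an update chain in 
solrconfig.xml):

    import java.io.IOException;
    import java.util.Date;

    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.response.SolrQueryResponse;
    import org.apache.solr.update.AddUpdateCommand;
    import org.apache.solr.update.processor.UpdateRequestProcessor;
    import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

    // Stamps every incoming document (full add or atomic update)
    // with the current time in the indextimestamp field.
    public class IndexTimestampUpdateProcessorFactory
            extends UpdateRequestProcessorFactory {
        @Override
        public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                SolrQueryResponse rsp, UpdateRequestProcessor next) {
            return new UpdateRequestProcessor(next) {
                @Override
                public void processAdd(AddUpdateCommand cmd) throws IOException {
                    SolrInputDocument doc = cmd.getSolrInputDocument();
                    doc.setField("indextimestamp", new Date());
                    super.processAdd(cmd);
                }
            };
        }
    }

Note that Solr also ships a built-in TimestampUpdateProcessorFactory that can 
set a date field to NOW when it is missing, which may cover the simple case.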

Thanks,
Shawn



Re: Solr server requirements for 100+ million documents

2014-02-11 Thread Jason Hellman
Whether you use the same machines as Solr or separate machines is a matter 
of taste.

If you are the CTO, then you should make this decision.  If not, inform 
management that risk conditions are greater when you share function and control 
on a single piece of hardware.  A single failure of a replica + zookeeper node 
will be more impactful than a single failure of a replica *or* a zookeeper 
node.  Let them earn the big bucks to make the risk decision.

The good news is, zookeeper hardware can be extremely lightweight for Solr 
Cloud.  Commodity hardware should work just fine…and thus scaling to 5 nodes 
for zookeeper is not that hard at all.

Jason


On Feb 11, 2014, at 3:00 PM, svante karlsson s...@csi.se wrote:

 ZK needs a quorum to stay functional, so 3 servers handle one failure and 5
 handle 2 node failures. If you run Solr with 1 replica per shard then stick to
 3 ZK. If you use 2 replicas, use 5 ZK.
 
 
 
 
 
 



Re: Solr server requirements for 100+ million documents

2014-02-11 Thread Shawn Heisey

On 2/11/2014 3:28 PM, Susheel Kumar wrote:

Thanks, Otis, for the quick reply. So for ZK do you recommend separate servers, 
and if so, how many for an initial SolrCloud cluster setup?


In a minimal 3-server setup, all servers would run ZooKeeper and two of 
them would also run Solr. With this setup, you can survive the failure of 
any of those three machines, even if it dies completely.


If the third machine is only running zookeeper, two fast CPU cores and 
2GB of RAM would be plenty.  For 100 million documents, I would 
personally recommend at least 8 CPU cores on the machines running Solr, 
ideally provided by at least two separate physical CPUs.  Otis 
recommended 32GB of RAM as a starting point.  You would very likely want 
more.


One copy of my 90 million document index uses two servers to run all the 
shards.  Because I have two copies of the index, I have four servers.  
Each server has 64GB of RAM.  This is **NOT** running SolrCloud, but if 
it were, I would have zookeeper running on three of those servers.


Thanks,
Shawn



Re: FuzzyLookupFactory with exactMatchFirst not giving the exact match.

2014-02-11 Thread Hamish Campbell
I've tried the new SuggestComponent; however, it doesn't work quite as
expected. It returns the full field value rather than a list of corrections
for the specific term. I can see how SuggestComponent would be excellent
for phrase suggestions and document lookups, but it doesn't seem to be
suitable for per-word spelling suggestions. Correct me if I'm wrong.

I'm taking another look at solr.SpellCheckComponent. I've switched on
`spellcheck.extendedResults` but the response's `correctlySpelled` is always
false, regardless of other settings. It seems to be an instance of SOLR-4278.
In that ticket James Dyer says:

 You can tell if the user's keywords exist in the index on a term-by-term
basis by specifying spellcheck.extendedResults=true. Then look under each
<lst name="ORIG_KEYWORD"> for <int name="origFreq">0</int>.

This would suit me perfectly - but `origFreq` does not appear in the
response at all. I'm looking at that code, but tracing down how the token
frequency is added is leading me down a deep and dark rabbit hole :). Am I
missing something basic here?
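
For reference, this is the kind of request being made, as a SolrJ sketch 
(the core URL is an assumption; the handler path and dictionary name match 
the config quoted below):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.client.solrj.response.SpellCheckResponse;

    public class ExtendedResultsCheck {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/collection1");

            SolrQuery q = new SolrQuery("test");
            q.setRequestHandler("/suggest");
            q.set("spellcheck", true);
            q.set("spellcheck.dictionary", "suggest");
            q.set("spellcheck.extendedResults", true);
            q.set("spellcheck.count", 10);

            QueryResponse rsp = server.query(q);
            SpellCheckResponse spell = rsp.getSpellCheckResponse();
            // In the situation described above this always prints false.
            System.out.println("correctlySpelled: " + spell.isCorrectlySpelled());
        }
    }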


On Tue, Feb 11, 2014 at 3:59 PM, Areek Zillur areek...@gmail.com wrote:

 Don't worry about the analysis chain; I realized you are using the
 spellcheck component for suggestions. The suggestion gets returned from the
 Lucene layer, but unfortunately the Spellcheck component strips the
 suggestion out as it is mainly built for spell checking (when the query
 token == suggestion; spelling is correct, so why suggest it!). You can try
 out the SuggestComponent (SOLR-5378), it does the right thing in this
 situation.


 On Mon, Feb 10, 2014 at 9:30 PM, Areek Zillur areek...@gmail.com wrote:

  That should not be the case, Maybe the analysis-chain of 'text_spell' is
  doing something before the key hits the suggester (you want to use
  something like KeywordTokenizerFactory)? Also maybe specify the
 queryAnalyzerFieldType
  in the suggest component config? you might want to do something similar
 to
  solr-config: (
 
 https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test-files/solr/collection1/conf/solrconfig-phrasesuggest.xml
 )
  [look at suggest_analyzing component] and schema: (
 
 https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test-files/solr/collection1/conf/schema-phrasesuggest.xml
 )
  [look at phrase_suggest field type].
 
 
  On Mon, Feb 10, 2014 at 8:44 PM, Hamish Campbell 
  hamish.campb...@koordinates.com wrote:
 
  Same issue with AnalyzingLookupFactory - I'll get autocomplete
 suggestions
  but not the original query.
 
 
  On Tue, Feb 11, 2014 at 1:57 PM, Areek Zillur areek...@gmail.com
 wrote:
 
   The FuzzyLookupFactory should accept all the same options as the
   AnalyzingLookupFactory (
  
  
 
 http://lucene.apache.org/solr/4_2_1/solr-core/org/apache/solr/spelling/suggest/fst/AnalyzingLookupFactory.html
   ).
   [FuzzySuggester is a direct subclass of the AnalyzingSuggester in
  lucene].
   Have you tried the exactMatchFirst with the AnalyzingLookupFactory?
 Does
   AnalyzingLookup have the same problem with the exactMatchFirst option?
  
  
   On Mon, Feb 10, 2014 at 6:00 PM, Hamish Campbell 
   hamish.campb...@koordinates.com wrote:
  
Looking at:
   
   
   
  
 
 http://lucene.apache.org/solr/4_2_1/solr-core/org/apache/solr/spelling/suggest/fst/FuzzyLookupFactory.html
   
It seems that exactMatchFirst is not a valid option for
   FuzzyLookupFactory.
Potential workarounds?
   
   
On Mon, Feb 10, 2014 at 5:04 PM, Hamish Campbell 
hamish.campb...@koordinates.com wrote:
   
 Hi all,

 I've got a FuzzyLookupFactory spellchecker with exactMatchFirst
   enabled.
A
 query like tes will return test and testing, but a query for
   test
 will *not* return test even though it is clearly in the
  dictionary.
   Why
 would this be?

 Relevant config follows

 <searchComponent class="solr.SpellCheckComponent" name="suggest">
   <lst name="spellchecker">
     <str name="name">suggest</str>

     <!-- Implementation -->
     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FuzzyLookupFactory</str>

     <!-- Properties -->
     <bool name="preserveSep">false</bool>
     <bool name="exactMatchFirst">true</bool>
     <str name="suggestAnalyzerFieldType">text_spell</str>
     <float name="threshold">0.005</float>

     <!--
     Do not build on each commit, bad for performance. See cron.
     <str name="buildOnCommit">false</str>
     -->

     <!-- Source -->
     <str name="field">suggest</str>
   </lst>
 </searchComponent>

 <requestHandler name="/suggest" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="spellcheck">true</str>
     <str name="spellcheck.dictionary">suggest</str>
     <str name="spellcheck.onlyMorePopular">true</str>
     <str 

Re: FuzzyLookupFactory with exactMatchFirst not giving the exact match.

2014-02-11 Thread Hamish Campbell
Ah, I think the term frequency is only available for the Spellcheckers
rather than the Suggesters - so I tried a DirectSolrSpellChecker. This gave
me good spelling suggestions for misspelt terms, but if the term is spelled
correctly I, again, get no term information and correctlySpelled is false.
Back to square 1.


On Wed, Feb 12, 2014 at 12:37 PM, Hamish Campbell 
hamish.campb...@koordinates.com wrote:

 I've tried the new SuggestComponent; however, it doesn't work quite as
 expected. It returns the full field value rather than a list of corrections
 for the specific term. I can see how SuggestComponent would be excellent
 for phrase suggestions and document lookups, but it doesn't seem to be
 suitable for per-word spelling suggestions. Correct me if I'm wrong.

 I'm taking another look at solr.SpellCheckComponent. I've switched on
 `spellcheck.extendedResults` but the response's `correctlySpelled` is always
 false, regardless of other settings. It seems to be an instance of SOLR-4278.
 In that ticket James Dyer says:

  You can tell if the user's keywords exist in the index on a term-by-term
 basis by specifying spellcheck.extendedResults=true. Then look under each
 <lst name="ORIG_KEYWORD"> for <int name="origFreq">0</int>.

 This would suit me perfectly - but `origFreq` does not appear in the
 response at all. I'm looking at that code, but tracing down how the token
 frequency is added is leading me down a deep and dark rabbit hole :). Am I
 missing something basic here?


 On Tue, Feb 11, 2014 at 3:59 PM, Areek Zillur areek...@gmail.com wrote:

 Don't worry about the analysis chain; I realized you are using the
 spellcheck component for suggestions. The suggestion gets returned from
 the
 Lucene layer, but unfortunately the Spellcheck component strips the
 suggestion out as it is mainly built for spell checking (when the query
 token == suggestion; spelling is correct, so why suggest it!). You can try
 out the SuggestComponent (SOLR-5378), it does the right thing in this
 situation.


 On Mon, Feb 10, 2014 at 9:30 PM, Areek Zillur areek...@gmail.com wrote:

  That should not be the case, Maybe the analysis-chain of 'text_spell' is
  doing something before the key hits the suggester (you want to use
  something like KeywordTokenizerFactory)? Also maybe specify the
 queryAnalyzerFieldType
  in the suggest component config? you might want to do something similar
 to
  solr-config: (
 
 https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test-files/solr/collection1/conf/solrconfig-phrasesuggest.xml
 )
  [look at suggest_analyzing component] and schema: (
 
 https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test-files/solr/collection1/conf/schema-phrasesuggest.xml
 )
  [look at phrase_suggest field type].
 
 
  On Mon, Feb 10, 2014 at 8:44 PM, Hamish Campbell 
  hamish.campb...@koordinates.com wrote:
 
  Same issue with AnalyzingLookupFactory - I'll get autocomplete
 suggestions
  but not the original query.
 
 
  On Tue, Feb 11, 2014 at 1:57 PM, Areek Zillur areek...@gmail.com
 wrote:
 
   The FuzzyLookupFactory should accept all the same options as the
   AnalyzingLookupFactory (
  
  
 
 http://lucene.apache.org/solr/4_2_1/solr-core/org/apache/solr/spelling/suggest/fst/AnalyzingLookupFactory.html
   ).
   [FuzzySuggester is a direct subclass of the AnalyzingSuggester in
  lucene].
   Have you tried the exactMatchFirst with the AnalyzingLookupFactory?
 Does
   AnalyzingLookup have the same problem with the exactMatchFirst
 option?
  
  
   On Mon, Feb 10, 2014 at 6:00 PM, Hamish Campbell 
   hamish.campb...@koordinates.com wrote:
  
Looking at:
   
   
   
  
 
 http://lucene.apache.org/solr/4_2_1/solr-core/org/apache/solr/spelling/suggest/fst/FuzzyLookupFactory.html
   
It seems that exactMatchFirst is not a valid option for
   FuzzyLookupFactory.
Potential workarounds?
   
   
On Mon, Feb 10, 2014 at 5:04 PM, Hamish Campbell 
hamish.campb...@koordinates.com wrote:
   
 Hi all,

 I've got a FuzzyLookupFactory spellchecker with exactMatchFirst
   enabled.
A
 query like tes will return test and testing, but a query
 for
   test
 will *not* return test even though it is clearly in the
  dictionary.
   Why
 would this be?

 Relevant config follows

 <searchComponent class="solr.SpellCheckComponent" name="suggest">
   <lst name="spellchecker">
     <str name="name">suggest</str>

     <!-- Implementation -->
     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FuzzyLookupFactory</str>

     <!-- Properties -->
     <bool name="preserveSep">false</bool>
     <bool name="exactMatchFirst">true</bool>
     <str name="suggestAnalyzerFieldType">text_spell</str>
     <float name="threshold">0.005</float>

     <!--
     Do not build on each commit, bad for performance. See cron.
 

Solr performance on a very huge data set

2014-02-11 Thread neerajp
Hello Dear,
I have 1000 GB of data that I want to index.
Assuming I have enough space for storing the indexes on a single machine,
*I would like to get an idea of Solr performance for searching an item
in a huge data set.
Do I need to use shards to improve Solr search efficiency, or is it OK
to search without sharding?*

I will use SolrCloud for high availability and fault tolerance with the help
of ZooKeeper.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-performance-on-a-very-huge-data-set-tp4116792.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: USER NAME Baruch Labunski

2014-02-11 Thread Erick Erickson
Baruch:

Is that your Wiki ID? We need that. But sure, we'll be happy to add you to
the list...


On Tue, Feb 11, 2014 at 11:03 AM, Baruch bar...@rogers.com wrote:

 Hello Wiki admin,

  I would like to add some value links. Can you please add me? My user name is
 Baruch Labunski


 Thank You,
 Baruch!



 On Thursday, January 16, 2014 2:12:32 PM, Baruch bar...@rogers.com
 wrote:

 Hello Wiki admin,

  I would like to add some value links. Can you please add me? My user name is
 Baruch Labunski


 Thank You,

 Baruch!



Re: Lowering query time

2014-02-11 Thread Erick Erickson
So my guess is you're spending by far the largest portion of your time doing
the DB query(ies), which makes sense


On Tue, Feb 11, 2014 at 11:50 AM, Joel Cohen joel.co...@bluefly.com wrote:

 It's a custom ingestion process. It does a big DB query and then inserts
 stuff in batches. The batch size is tuneable.


 On Tue, Feb 11, 2014 at 11:23 AM, Erick Erickson erickerick...@gmail.com
 wrote:

  Hmmm, I'm still a little puzzled BTW. 300K documents, unless they're
  huge, shouldn't be taking 100 minutes. I can index 11M documents on
  my laptop (Wikipedia dump) in 45 minutes, for instance. Of course
  that's a single core, not cloud and not replicas...
 
  So possibly it's on the data acquisition side? Is your Solr CPU pegged?
 
  YMMV of course.
 
  Erick
 
 
  On Tue, Feb 11, 2014 at 6:40 AM, Joel Cohen joel.co...@bluefly.com
  wrote:
 
   I'd like to thank you for lending a hand on my query time problem with
   SolrCloud. By switching to a single shard with replicas setup, I've
  reduced
   my query time to 18 msec. My full ingestion of 300k+ documents went
 down
   from 2 hours 50 minutes to 1 hour 40 minutes. There are some code
 changes
   that are going in that should help a bit as well. Big thanks to
 everyone
   that had suggestions.
  
  
   On Tue, Feb 4, 2014 at 8:11 PM, Alexandre Rafalovitch 
  arafa...@gmail.com
   wrote:
  
I suspect faceting is the issue here. The actual query you have shown
 seems to bring back a single document (or a single set of documents for
a product):
fq=id:(320403401)
   
On the other hand, you are asking for 4 field facets:
facet.field=q_virtualCategory_ss
facet.field=q_brand_s
facet.field=q_color_s
facet.field=q_category_ss
AND 2 range facets, both clustered/grouped:
facet.range=daysSinceStart_i
facet.range=activePrice_l (e.g. f.activePrice_l.facet.range.gap=5000)
   
And for all facets you have asked to bring back ALL of the results:
facet.limit=-1
   
Plus, you are doing a complex sort:
sort=popularity_i desc,popularity_i desc
   
So, you are probably spending quite a bit of time counting
 (especially
in a shared setup) and then quite a bit more sending the response
back.
   
I would check the size of the result document (HTTP result) and see
how large it is. Maybe you don't need all of the stuff that's coming
back. I assume you are not actually querying Solr from the client's
machine (that is I hope it is inside your data centre close to your
web server), otherwise I would say to look at automatic content
compression as well to minimize on-wire document size.
   
 Finally, if your documents have many stored fields (stored="true" in
 schema.xml) but you only return small subsets of them during search,
 you could look into using the enableLazyFieldLoading flag in the
solrconfig.
   
Regards,
   Alex.
P.s. As others said, you don't seem to have too many documents.
Perhaps you want replication instead of sharding for improved
performance.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via
 GTD
book)
   
   
On Wed, Feb 5, 2014 at 6:31 AM, Alexey Kozhemiakin
alexey_kozhemia...@epam.com wrote:
 Btw, timing for distributed requests is broken at this moment; it
doesn't combine values from requests to shards.  I'm working on a
  patch.

 https://issues.apache.org/jira/browse/SOLR-3644

 -Original Message-
 From: Jack Krupansky [mailto:j...@basetechnology.com]
 Sent: Tuesday, February 04, 2014 22:00
 To: solr-user@lucene.apache.org
 Subject: Re: Lowering query time

 Add the debug=true parameter to some test queries and look at the
timing
 section to see which search components are taking the time.
Traditionally, highlighting for large documents was a top culprit.

 Are you returning a lot of data or field values? Sometimes reducing
  the
amount of data processed can help. Any multivalued fields with lots
 of
values?

 -- Jack Krupansky

 -Original Message-
 From: Joel Cohen
 Sent: Tuesday, February 4, 2014 1:43 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Lowering query time

 1. We are faceting. I'm not a developer so I'm not quite sure how
  we're
doing it. How can I measure?
 2. I'm not sure how we'd force this kind of document partitioning.
 I
   can
see how my shards are partitioned by looking at the clusterstate.json
   from
Zookeeper, but I don't have a clue on how to get documents into
  specific
shards.

 Would I be better off with fewer shards given the small size of my
indexes?


 On Tue, Feb 4, 2014 at 12:32 PM, Yonik Seeley 
 

Re: Solr Autosuggest - Strange issue with leading numbers in query

2014-02-11 Thread Erick Erickson
Hmmm, the example you posted seems correct to me; the returned
suggestion is really close to the term. What are you expecting here?

The example is inconsistent with
"it returns the suggestion corresponding to the alphabets (ignoring the
numbers)"

It looks like it's considering the numbers just fine, which is what makes
the returned suggestion close to the term, I think.

Best,
Erick


On Tue, Feb 11, 2014 at 1:01 PM, Developer bbar...@gmail.com wrote:

 I have a strange issue with Autosuggest.

 Whenever I query for a keyword with leading numbers, it returns the
 suggestion corresponding to the alphabetic part (ignoring the numbers). I was
 under the assumption that it would return an empty result. I am not sure
 what I am doing wrong. Can someone help?

 *Query:*

 /autocomplete?qt=/lucid&req_type=auto_complete&spellcheck.maxCollations=10&q=12342343243242ga&spellcheck.count=10

 *Result:*

 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">1</int>
   </lst>
   <lst name="spellcheck">
     <lst name="suggestions">
       <lst name="ga">
         <int name="numFound">1</int>
         <int name="startOffset">15</int>
         <int name="endOffset">17</int>
         <arr name="suggestion">
           <str>galaxy</str>
         </arr>
       </lst>
       <str name="collation">12342343243242galaxy</str>
     </lst>
   </lst>
 </response>


 *My field configuration is as below:*
 <fieldType class="solr.TextField" name="textSpell_word"
            positionIncrementGap="100">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StopFilterFactory"
             enablePositionIncrements="true"
             ignoreCase="true" words="stopwords_autosuggest.txt"/>
   </analyzer>
 </fieldType>

 *SolrConfig.xml*

 <searchComponent class="solr.SpellCheckComponent"
                  name="autocomplete">
   <lst name="spellchecker">
     <str name="name">autocomplete</str>
     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
     <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
     <str name="field">autocomplete_word</str>
     <str name="storeDir">autocomplete</str>
     <str name="buildOnCommit">true</str>
     <float name="threshold">.005</float>
   </lst>
 </searchComponent>
 <requestHandler class="org.apache.solr.handler.component.SearchHandler"
                 name="/autocomplete">
   <lst name="defaults">
     <str name="spellcheck">true</str>
     <str name="spellcheck.dictionary">autocomplete</str>
     <str name="spellcheck.collate">true</str>
     <str name="spellcheck.count">10</str>
     <str name="spellcheck.onlyMorePopular">false</str>
   </lst>
   <arr name="components">
     <str>autocomplete</str>
   </arr>
 </requestHandler>



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-Autosuggest-Strange-issue-with-leading-numbers-in-query-tp4116751.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Indexing question on individual field update

2014-02-11 Thread Erick Erickson
Update and add are basically the same thing if there's an existing document.
There will be some performance consequence, since the stored fields have to
be fetched on the server, as opposed to getting the full input from the
external source and handing it to Solr. However, I know of at least one
situation where the atomic update rate is sky-high and it works, so I
wouldn't worry about it unless and until I saw a problem.

Best,
Erick


On Tue, Feb 11, 2014 at 3:03 PM, Shawn Heisey s...@elyograg.org wrote:

 On 2/11/2014 2:37 PM, shamik wrote:

 Erick,

    Thanks for your reply. I should have given better context. I'm currently
 running an incremental crawl daily on this particular source and indexing
 the documents. The incremental crawl looks for any change since the last
 crawl date, based on the document publish date. But there's no way for me
 to know if a document has been deleted. To ensure that, I ran a full crawl
 on a weekend, which basically re-indexes the entire content. After the full
 index is over, I call a purge script, which deletes any content that is
 more than 24 hours old, based on the indextimestamp field.

 The issue with atomic update is that it doesn't alter the indextimestamp
 field. So even if I run a full crawl with atomic updates, the timestamp
 will stick to its old value. Unfortunately, I can't rely on another date
 field coming from the source as they are not consistent. That translates
 to the fact that I can't remove stale content.


 One possibility is this: When you send the atomic update to Solr, include
 a new value for the indextimestamp field.

 Another option: You can write a custom update processor plugin for Solr.
  When the custom code is used, it will be executed on each incoming
 document.  Depending on what it finds in the update request, it can make
 appropriate changes, like updating indextimestamp.  You can do pretty much
 anything.

 http://wiki.apache.org/solr/UpdateRequestProcessor

 Writing an update processor in Java typically gives the best results in
 terms of flexibility and performance, but there is also a way to use other
 programming languages:

 http://wiki.apache.org/solr/ScriptUpdateProcessor

 Thanks,
 Shawn




Re: Solr performance on a very huge data set

2014-02-11 Thread Erick Erickson
Can't answer that, there are just too many variables. Here's a helpful
resource:
http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick


On Tue, Feb 11, 2014 at 5:23 PM, neerajp neeraj_star2...@yahoo.com wrote:

 Hello Dear,
 I have 1000 GB of data that I want to index.
 Assuming I have enough space for storing the indexes on a single machine,
 *I would like to get an idea of Solr performance for searching an item
 in a huge data set.
 Do I need to use shards to improve Solr search efficiency, or is it OK
 to search without sharding?*

 I will use SolrCloud for high availability and fault tolerance with the
 help of ZooKeeper.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-performance-on-a-very-huge-data-set-tp4116792.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Need feedback: Browsing and searching solr-user list emails

2014-02-11 Thread Alexandre Rafalovitch
Hi Durgam,

You are asking a hard question. Yes, the idea looks interesting as an
experiment. Possibly even useful in some ways. And I love the fact
that you are eating your own dogfood (running Solr). And the interface
looks nice (I guess this is your hosted Nimeyo offering underneath).

Yet, I am having trouble seeing it stick around long term. Here are my reasons:
*) This offering feels like an inverse of StackExchange. SE is a
primary source of data and they actually get most of the search
traffic from Google. This proposal has the data coming from somewhere
else and is trying to add a search on top of it.
*) Furthermore, the SE voting/participation is heavily gamified and
they spend a lot of time and manpower to keeping the balance of that
gamification vs. abuse. I think it is a lot harder to provide
incentives to vote in your approach
*) There are other dogfood-eating search websites.
http://search-lucene.com/ is one of them.
*) There are also other mailing-list navigational websites with
gateway ability to post message in. They suck, both in interface and
in monetisation around the interface. In fact, they feel like the SPAM
farms similar to those republishing Wikipedia. I am not saying this is
relevant to your effort directly, but it is an issue related to
discovery of good search website in the sea of bad ones. search-lucene
for example is discoverable because it is one of the search engines on
the Apache website. Even then, it took me (at least) a very long time to
discover it.
*) In general, discoverability is a b*tch (try to multiterm this,
Solr! :-) as you need a very significant traction for people to use
your site before it becomes useful to more people. A bit of a
catch-22. Again, SE did it by having a large audience on StackOverflow
and then branching off into topics that people on SO were also
interested in. And even that was an issue (see area51 for how they do
it). You have people (who read the mailing list), but are they the people
who need to search the archives? I think the mailing list is more of
a 'flow' interface to most people.
*) You have Google Analytics - did you get much traction yet? I
suspect not, judging from the lack of replies on the mailing list.

I would step back and evaluate:
*) Who specifically is a target audience? I, for example, do star some
posts on the mailing list because they are just so good that I will
want to refer to them later. But, even then, I would have no incentive
right now to do it in public. Nor would I do the 3-4 steps necessary to go
from an email I like to some alternative interface to find the same email
again just to vote for it. And how do I find my voted emails later?
Requiring an account (to track) is even harder to swallow.
*) Again, who specifically is the target audience? Is it beginners?
Intermediates? Advanced? What are the pain points of those different
groups that you are trying to solve?
*) What can you offer to the first user before the voting actually
works (bootstrap phase). Pure search? Others do that already.
*) How would people find your service (SEO, etc).
*) Why are you doing it? It may not be a lot of effort to set it up,
but actually growing any crowd-sourced resource is a significant task.
What does this build towards that will make it sustainable for you.
And, I really hope it is not page ads.
*) From Nimeyo's home page, you are targeting enterprises; are you
sure the offering maps to a public resource with a dynamic, transient
audience in the same way?

Now, if you do want to help the Solr community, that would be great. I am
trying to do that in my own way and really welcome anybody trying to
assist beyond their own needs. Grow the community, and so on.

Here is an example of how I thought of the above issues myself:
*) I just released the full list of UpdateRequestProcessor Factories (
http://www.solr-start.com/update-request-processor/4.6.1/ ).
*) This is information that anybody can discover for themselves, but
it takes a lot of searching and clicking and getting lost. I
discovered that problem on my own when writing my Solr book and it
stuck with me as a problem to be solved. So, I solved it (in a very
basic way for this version) and I have more similar things on the way.
*) My target audience, just as with my book, are people trying to
skill up from the beginners to the intermediates. My goal is to reduce
the barrier of entry to the more advanced Solr knowledge.
*) My SEO (we'll see if it works) is to provide information that does
not exist anywhere else in one place and to be discoverable when
people search for the particular names of URP.
*) I also have an incentive to keep it going (version 4.7, 4.8, other
resources) because I want people to be on my mailing list for when I
do the next REALLY exciting Solr project (Github-based interactive
Solr training would be a strong hint). So, these resources are my
bootstrapping strategy as well.

Now, there is plenty of other things that can be done to assist Solr
community. Some of them would 

Re: Join Scoring

2014-02-11 Thread David Smiley (@MITRE.org)
Hi Anand.

Solr's JOIN query, {!join}, constant-scores.  It's simpler and faster and
more memory efficient (particularly the worse-case memory use) to implement
the JOIN query without scoring, so that's why.  Of course, you might want it
to score and pay whatever penalty is involved.  For that you'll need to
write a Solr QueryParser that might use Lucene's join module which has
scoring variants.  I've taken this approach before.  You asked a specific
question about the purpose of JoinScorer when it doesn't actually score. 
Lucene's Query produces a Weight which in turn produces a Scorer that
is a DocIdSetIterator plus it returns a score.  So Queries have to have a
Scorer to match any document even if the score is always 1.
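
For reference, a minimal sketch of that scoring variant in Lucene's join 
module (the field names here are illustrative assumptions):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.join.JoinUtil;
    import org.apache.lucene.search.join.ScoreMode;

    public class ScoringJoinSketch {
        // Joins documents matching fromQuery (via their "from_id" field)
        // onto documents whose "to_id" field holds the same values,
        // carrying scores across with ScoreMode.Max.
        public static Query buildJoin(IndexSearcher fromSearcher)
                throws Exception {
            Query fromQuery = new TermQuery(new Term("type", "child"));
            return JoinUtil.createJoinQuery(
                    "from_id",       // field on the "from" side
                    false,           // multipleValuesPerDocument
                    "to_id",         // field on the "to" side
                    fromQuery,
                    fromSearcher,    // searcher over the "from" side
                    ScoreMode.Max);  // vs. ScoreMode.None for constant score
        }
    }

A Solr QParserPlugin wrapping this would pay the scoring cost mentioned 
above, which is the trade-off being discussed.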

Solr does indeed have a lot of caching; that may be in play here when
comparing against a quick attempt at using Lucene directly.  In particular,
the matching documents are likely to end up in Solr's DocumentCache. 
Returning stored fields that come back in search results are one of the more
expensive things Lucene/Solr does.

I also think you noted that the fields on documents from the from side of
the query are not available to be returned in search results, just the to
side.  Yup; that's true.  To remedy this, you might write a Solr
SearchComponent that adds fields from the from side.  That could be tricky
to do; it would probably need to re-run the from-side query but filtered to
the matching top-N documents being returned.

~ David


anand chandak wrote
 Resending, if somebody can please respond.
 
 
 Thanks,
 
 Anand
 
 
 On 2/5/2014 6:26 PM, anand chandak wrote:
 Hi,
 
 Having a question on join scoring: why doesn't the Solr join query return 
 the scores? Looking at the code, I see there's a JoinScorer defined in 
 the JoinQParserPlugin class. If it's not used for scoring, where is it 
 actually used?
 
 Also, to evaluate the performance of the Solr join plugin vs Lucene's 
 JoinUtil, I fired the same join query against the same data-set and schema, 
 and in the results I am always seeing the QTime for Solr much lower 
 than Lucene's. What is the reason behind this? Solr doesn't return 
 scores - could that cause so much difference?
 
 My guess is solr has very sophisticated caching mechanism and that might 
 be coming in play, is that true ? or there's difference in the way JOIN 
 happens in the 2 approach.
 
 If I understand correctly, both implementations use a 2-pass 
 approach - first collecting all the terms from the fromField, and then 
 returning all documents that have matching terms in the toField.
 
 If somebody can throw some light, would highly appreciate.
 
 Thanks,
 Anand





-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Join-Scoring-tp4115539p4116818.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Join Scoring

2014-02-11 Thread anand chandak

Thanks David, really helpful response.


You mentioned that if we have to add scoring support in Solr then a 
possible approach would be to add a custom QueryParser, which might use 
Lucene's JOIN module.



Curious if it is possible instead to enhance the existing Solr 
JoinQParserPlugin and add the scoring support in the same class? Do 
you think it's feasible and recommended? If yes, what would it take in 
terms of code changes - any pointers?


Thanks,

Anand


On 2/12/2014 10:31 AM, David Smiley (@MITRE.org) wrote:

Hi Anand.

Solr's JOIN query, {!join}, constant-scores.  It's simpler and faster and
more memory efficient (particularly the worst-case memory use) to implement
the JOIN query without scoring, so that's why.  Of course, you might want it
to score and pay whatever penalty is involved.  For that you'll need to
write a Solr QueryParser that might use Lucene's join module which has
scoring variants.  I've taken this approach before.  You asked a specific
question about the purpose of JoinScorer when it doesn't actually score.
Lucene's Query produces a Weight which in turn produces a Scorer that
is a DocIdSetIterator plus it returns a score.  So Queries have to have a
Scorer to match any document even if the score is always 1.

Solr does indeed have a lot of caching; that may be in play here when
comparing against a quick attempt at using Lucene directly.  In particular,
the matching documents are likely to end up in Solr's DocumentCache.
Returning stored fields that come back in search results are one of the more
expensive things Lucene/Solr does.

I also think you noted that the fields on documents from the from side of
the query are not available to be returned in search results, just the to
side.  Yup; that's true.  To remedy this, you might write a Solr
SearchComponent that adds fields from the from side.  That could be tricky
to do; it would probably need to re-run the from-side query but filtered to
the matching top-N documents being returned.

~ David


anand chandak wrote

Resending, if somebody can please respond.


Thanks,

Anand


On 2/5/2014 6:26 PM, anand chandak wrote:
Hi,

Having a question on join scoring: why doesn't the Solr join query return
the scores? Looking at the code, I see there's a JoinScorer defined in
the JoinQParserPlugin class. If it's not used for scoring, where is it
actually used?

Also, to evaluate the performance of the Solr join plugin vs Lucene's
JoinUtil, I fired the same join query against the same data-set and schema,
and in the results I am always seeing the QTime for Solr much lower
than Lucene's. What is the reason behind this? Solr doesn't return
scores - could that cause so much difference?

My guess is solr has very sophisticated caching mechanism and that might
be coming in play, is that true ? or there's difference in the way JOIN
happens in the 2 approach.

If I understand correctly, both implementations use a 2-pass
approach - first collecting all the terms from the fromField, and then
returning all documents that have matching terms in the toField.

If somebody can throw some light, would highly appreciate.

Thanks,
Anand




-
  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Join-Scoring-tp4115539p4116818.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Spatial Score by overlap area

2014-02-11 Thread Smiley, David W.
Hi,
BBoxStrategy is still only in “trunk” (not the 4x branch).  And
furthermore… the Solr portion, a FieldType, is over in
Spatial-Solr-Sandbox —
https://github.com/ryantxu/spatial-solr-sandbox/blob/master/LSE/src/main/java/org/apache/solr/spatial/pending/BBoxFieldType.java  It should be quite
easy to port to 4x and put independently into a JAR file plug-in to Solr 4.

It’s lacking better tests, and until your question I haven’t seen interest
from users.  Ryan McKinley ported it from GeoServer.

~ David

On 2/10/14, 12:53 AM, geoport tb.rost...@gmail.com wrote:

Hi,
I am using Solr 4.6 and I've indexed bounding boxes. Now I want to test the
area overlap sorting:
http://de.slideshare.net/lucenerevolution/lucene-solr-4-spatial-extended-deep-dive
(slide 23). Does one of you have an example for me? Thanks for helping me.






--
View this message in context:
http://lucene.472066.n3.nabble.com/Spatial-Score-by-overlap-area-tp4116439
.html
Sent from the Solr - User mailing list archive at Nabble.com.



Unable to index mysql table

2014-02-11 Thread Tarun Sharma
Hi
I downloaded Solr and, without any changes to the directory structure, just
followed the Solr wiki and tried to import a MySQL table, but was unable to
do so. Actually I'm using the directory as-is in the example folder, but
copied the contrib jar files and lib tags here and there where required.

Please help with indexing my MySQL table...

NOTE: I'm using a remote Linux server via ssh and am able to start the
Solr server.

---
Regards
*Tarun Sharma*


Re: Unable to index mysql table

2014-02-11 Thread Alexandre Rafalovitch
What does "unable to do" actually translate to? Are you having trouble
writing a particular config file? Are you getting an error message?
Are you getting only some of the data in?

Tell us exactly where you are stuck. Better yet, google first for exactly
what you are stuck with; maybe it's already been answered.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Feb 12, 2014 at 12:52 PM, Tarun Sharma
tarunsharma1...@gmail.com wrote:
 Hi
 I downloaded Solr and, without any changes to the directory structure, just
 followed the Solr wiki and tried to import a MySQL table, but was unable to
 do so. Actually I'm using the directory as-is in the example folder, but
 copied the contrib jar files and lib tags here and there where required.

 Please help with indexing my MySQL table...

 NOTE: I'm using a remote Linux server via ssh and am able to start the
 Solr server.

 ---
 Regards
 *Tarun Sharma*


Re: Indexing question on individual field update

2014-02-11 Thread shamik
Thanks Erick and Shawn, I appreciate your help.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-question-on-individual-field-update-tp4116605p4116831.html
Sent from the Solr - User mailing list archive at Nabble.com.