How to do a reverse distance search?

2009-07-10 Thread Development Team
Hi everybody,
 Let's say we have 10,000 traveling sales-people spread throughout the
country. Each of them has their own territory, and most of the
territories overlap (e.g. 100 sales-people in a particular city alone). Each
of them also has a maximum distance they can travel. Some can travel
country-wide, others don't have a car and are limited to a 10mi radius.
 Given that we have a client at a particular location, how do we
construct a query in Solr that finds all the sales-people who can reach that
client?

 We think we have a solution for this, but I want to know what you
 think. In SQL this is relatively easy:

  select * from salespeople
  where calc_distance(CLIENT_LAT, CLIENT_LONG, lat, long) < maxTravelDist

But a problem is that calc_distance() is fairly expensive. If it were our
client that specified the distance, it would be easy to include it as part
of the search criteria in the Solr query, but unfortunately it's each
individual sales-person that specifies a distance.

Sincerely,

 Daryl.
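For reference, the expensive calc_distance() assumed by the SQL sketch above is typically a great-circle (haversine) computation, which needs several trigonometric calls per row. A minimal self-contained sketch (class and method names are illustrative, not from any Solr API):

```java
public class Geo {

    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle (haversine) distance between two lat/long points, in km.
    // This is the kind of per-row work that makes calc_distance() expensive.
    public static double haversineKm(double lat1, double lon1,
                                     double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // One degree of longitude at the equator is roughly 111 km.
        System.out.println(haversineKm(0, 0, 0, 1));
    }
}
```

A cheap bounding-box pre-filter on the latitude/longitude fields is the usual way to avoid running this on every candidate row.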


Re: Suggestions needed: Lots of updates for tiny changes

2009-07-06 Thread Development Team
Hi Otis,
 Thanks for your reply and for giving it some thought.
 Actually we have considered using something that lives outside of the
main index... We've looked into using the ExternalFileField, but abandoned
that when it became clear we'd have to use a function to use it, and that
limited how we could use the field in our searches.
 For another more-real-time data problem we're having, we've considered
writing a search handler and search component to handle it as a
filter-query. This is equivalent to the data structure outside of the main
index that you have proposed. The problem with it is that getting it to be
*part of the index* is difficult.
 Well... any more ideas would be appreciated. But thanks for your help
so far.

- Daryl.
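For readers of this thread: the ExternalFileField mentioned above is declared in schema.xml along these lines (field and type names here are illustrative). Its values come from a plain key=value file in the index data directory rather than from indexed documents, and the field can only be used via function queries, which is exactly the limitation described above.

```xml
<!-- schema.xml: values are read from a file named external_<fieldname>
     in the index data directory, not from the documents themselves -->
<fieldType name="externalFloat" keyField="id" defVal="0"
           class="solr.ExternalFileField" valType="pfloat"/>

<field name="lastViewedBoost" type="externalFloat"/>
```

The data file contains one `docId=value` line per document and is re-read on commit, so the volatile values can be refreshed without re-indexing any documents.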


On Fri, Jul 3, 2009 at 9:34 PM, Otis Gospodnetic otis_gospodne...@yahoo.com
 wrote:


 I don't have a very specific suggestion, but I wonder if you could have a
 data structure that lives outside of the main index and keeps only these
 dates.  Presumably this smaller data structure would be simpler/faster to
 update, and you'd just have to remain in sync with the main index
 (document-document mapping).  I think ParallelReader in Lucene is a similar
 approach, as is Solr's ExternalFileField.

  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
  From: Development Team dev.and...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Friday, July 3, 2009 4:46:37 PM
  Subject: Suggestions needed: Lots of updates for tiny changes
 
  Hi everybody,
   Let's say I had an index with 10M large-ish documents, and as people
  logged into a website and viewed them the last viewed date was updated
 to
  the current time. We index a document's last-viewed-date because we allow
  users to a) search on this last-viewed-date alongside all other
 searchable
  criteria, and b) we can order results of any search by the
 last-viewed-date.
   The problem is that in a given 5-minute period, we may have many
  thousands of updated documents (due to this simple last-viewed-date). We
  have a task that looks for changed documents, loads the full documents,
 and
  then feeds them into Solr to update the index, but unfortunately reading
  these changed documents and continually feeding them to Solr is
 generating *
  far* more load on our system (both Solr and the database) than any of the
  searches. In a given day, *we may have more updates to documents than we
  have total documents indexed*. (Databases don't handle this well either,
 the
  contention on rows for updates slows the database down significantly.)
   How should we approach this problem? It seems like such a waste of
  resources to be doing so much work in applications/database/solr only for
  last-viewed-dates.
 
   Solutions we've looked at include:
   1) Update only partial document. --Apparently this isn't supported
 in
  Solr yet (we're using nightly Solr 1.4 builds currently).
   2) Use near-real-time updates. --Not supported yet. Also, the
  freshness of the data isn't as much a concern as the sheer volume of
  changes that we have to make here. For example, we could update Solr
  less-frequently, but then we'd just have many more documents to update.
 The
  data only has to be, say, fresh to within 30 minutes.
   3) Use a separate index for the last-viewed-date. --This won't work
  because we need to search on the last-viewed-date alongside other
 criteria,
  and we use it as scoring criteria for all our searches.
 
   Any suggestions?
 
  Sincerely,
 
   Daryl.




Suggestions needed: Lots of updates for tiny changes

2009-07-03 Thread Development Team
Hi everybody,
 Let's say I had an index with 10M large-ish documents, and as people
logged into a website and viewed them the last viewed date was updated to
the current time. We index a document's last-viewed-date because we allow
users to a) search on this last-viewed-date alongside all other searchable
criteria, and b) we can order results of any search by the last-viewed-date.
 The problem is that in a given 5-minute period, we may have many
thousands of updated documents (due to this simple last-viewed-date). We
have a task that looks for changed documents, loads the full documents, and
then feeds them into Solr to update the index, but unfortunately reading
these changed documents and continually feeding them to Solr is generating *
far* more load on our system (both Solr and the database) than any of the
searches. In a given day, *we may have more updates to documents than we
have total documents indexed*. (Databases don't handle this well either, the
contention on rows for updates slows the database down significantly.)
 How should we approach this problem? It seems like such a waste of
resources to be doing so much work in applications/database/solr only for
last-viewed-dates.

 Solutions we've looked at include:
 1) Update only partial document. --Apparently this isn't supported in
Solr yet (we're using nightly Solr 1.4 builds currently).
 2) Use near-real-time updates. --Not supported yet. Also, the
freshness of the data isn't as much a concern as the sheer volume of
changes that we have to make here. For example, we could update Solr
less-frequently, but then we'd just have many more documents to update. The
data only has to be, say, fresh to within 30 minutes.
 3) Use a separate index for the last-viewed-date. --This won't work
because we need to search on the last-viewed-date alongside other criteria,
and we use it as scoring criteria for all our searches.

 Any suggestions?

Sincerely,

 Daryl.
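One pattern for volatile per-document values like a last-viewed-date is to keep them out of the main index entirely and periodically regenerate a key=value sidecar file in the format Solr's ExternalFileField reads. A hedged sketch of the writer side, with paths and field names hypothetical (modern Java used for brevity):

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

public class ExternalFieldWriter {

    // Writes one "docId=value" line per document -- the line format that
    // ExternalFileField reads from external_<fieldname> in the data dir.
    public static void write(Path file, Map<String, Float> values) throws IOException {
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(file))) {
            for (Map.Entry<String, Float> e : values.entrySet()) {
                out.println(e.getKey() + "=" + e.getValue());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // TreeMap keeps the output deterministically sorted by document id.
        Map<String, Float> lastViewed = new TreeMap<>();
        lastViewed.put("doc1", 42.0f);
        lastViewed.put("doc2", 7.5f);
        Path f = Files.createTempFile("external_lastViewedDate", ".txt");
        write(f, lastViewed);
        System.out.println(Files.readAllLines(f));
    }
}
```

Regenerating this file every few minutes and issuing a commit stays well within the 30-minute freshness requirement above, without re-feeding any full documents.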


Re: Solr Jetty confusion

2009-06-19 Thread Development Team
Hi Brett,
 Well, I'm running Solr in Jetty with JBoss, so I used the JBoss method
of specifying properties (properties-service.xml). However, you can supply
the solr-home to the command-line when you start Jetty by using a parameter
like, -Dsolr.solr.home=C:\solr. You can do it like how they do it for
Tomcat: http://wiki.apache.org/solr/SolrTomcat?highlight=(solr.home)

 You mention your code is not compiling... the code should be able to
compile whether or not you can actually start solr with the right solr-home.
It should also compile regardless of what container you deploy Solr
into. What exactly are you trying to do besides getting Solr to start in
Jetty?

- Daryl.



On Thu, Jun 18, 2009 at 9:58 PM, pof melbournebeerba...@gmail.com wrote:



 Development Team wrote:
 
  To specify the
  solr-home I use a Java system property (instead of the JNDI way) since I
  already have other necessary system properties for my apps.
 

 Could you please give me a concrete example of how you did this? There is
 no
 example code or commandline examples to be found.

 Cheers, Brett.

 --
 View this message in context:
 http://www.nabble.com/Solr-Jetty-confusion-tp24087264p24104378.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Does Solr 1.4 really work nicely on Jboss 4?

2009-06-18 Thread Development Team
Hi Giovanni,

 Solr 1.4 does work fine in JBoss (all of the features, including all of
the admin pages). For example, I am running it in JBoss 4.0.5.GA on JDK
1.5.0_18 without problems. I am also using Jetty instead of Tomcat, however
instructions for getting it to work in JBoss with Tomcat can be found here:
 http://wiki.apache.org/solr/SolrJBoss  It should work fine on JBoss 4.0.1.

- Daryl.


On Thu, Jun 18, 2009 at 8:57 AM, Giovanni De Stefano 
giovanni.destef...@gmail.com wrote:

 Hello all,

 I have a simple question :-)

 In my project it is mandatory to use Jboss 4.0.1 SP3 and Java 1.5.0_06/08.
 The software relies on Solr 1.4.

 Now, I am aware that some JSP Admin pages will not be displayed due to some
 Java5/6 dependency but this is not a problem because rewriting some of the
 JSPs it is possible to have everything up and running.

 The real question is: is anybody aware of any feature that might not work
 when deploying the solr based software in Jboss 4?

 I look forward to hearing your experience.

 Cheers,
 Giovanni



Re: Solr Jetty confusion

2009-06-18 Thread Development Team
Hey,
 So... I'm assuming your problem is that you're having trouble deploying
Solr in Jetty? Or is your problem that it's deploying just fine but your
code throws an exception when you try to run it?
 I am running Solr in Jetty, and I just copied the war into the webapps
directory and it worked. It was accessible under /solr, and it was
accessible under the port that Jetty has as its HTTP listener (which is
probably 8080 by default, but probably won't be 8983). To specify the
solr-home I use a Java system property (instead of the JNDI way) since I
already have other necessary system properties for my apps. So if your
problem turns out to be with the JNDI, sorry I won't be of much help.
 Hope that helps...

- Daryl.


On Thu, Jun 18, 2009 at 2:44 AM, pof melbournebeerba...@gmail.com wrote:


 Hi, I am currently trying to write a Jetty embedded java app that
 implements
 Solr and uses SolrJ by accepting posts telling it to do a batch index, or a
 deletion or what have you. At this point I am completely lost trying to
 follow http://wiki.apache.org/solr/SolrJetty . In my constructor I am
 doing
 the following call:

 Server server = new Server();
 XmlConfiguration configuration = new XmlConfiguration(new
 FileInputStream("solrjetty.xml"));

 My xml has two calls, an addConnector to configure the port etc. and the
 addWebApplication as specified on the solr wiki. When running the app I get
 this:

 Exception in thread "main" java.lang.IllegalStateException: No Method:
 <Call name="addWebApplication"><Arg>/solr/*</Arg><Arg>/webapps/solr.war</Arg>
 <Set name="extractWAR">true</Set>
 <Set name="defaultsDescriptor">org/mortbay/jetty/servlet/webdefault.xml</Set>
 <Call name="addEnvEntry"><Arg>/solr/home</Arg><Arg type="String">/solr/home</Arg>
 </Call></Call> on class org.mortbay.jetty.Server

 Can anyone point me in the right direction? Thanks.
 --
 View this message in context:
 http://www.nabble.com/Solr-Jetty-confusion-tp24087264p24087264.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Problem getting Solr statistics

2009-06-17 Thread Development Team
So for all those wondering what the problem was:
It turns out I can't just initialize my own CoreContainer; that just gives
me a *new* set of cores, and since those are not the cores being used by the
SolrDispatchFilter, they're never accessed and thus the stats remain the
same (such as having 2 queries performed on the core) throughout the life
of the server.

What I had to do was extend the SolrDispatchFilter to gain access to its
protected CoreContainer and use that one.

(Should Solr expose the core to those who are servicing requests that are
not HTTP-based? The SolrDispatchFilter puts the core into the request, but
not all things that need access to the core are servlets or filters. For
example, I'm using an MBean whose actions are called through SNMP.)

- Daryl.


On Tue, Jun 16, 2009 at 2:42 PM, Development Team dev.and...@gmail.com wrote:

 Hi all,
  I am stumped trying to get statistics from the Solr server. It seems
 that every time I get the correct SolrInfoMBean, when I look up the proper
 value (by name) in the NamedList, I get the exact same number back each
 time. For example, upon start-up the server reports that 2 queries have
 been performed, and any time I pull the value out of the MBean after that it
 says 2 even though the stats.jsp reports an increasing number of queries
 over time. What am I doing wrong?
  Here is my sample code:

 public class SolrUtil {

   protected static final CoreContainer coreContainer;
   protected static final String DEFAULT_CORE_NAME = "";

   static {
     CoreContainer.Initializer initializer = new CoreContainer.Initializer();
     try {
       coreContainer = initializer.initialize();
     }
     catch (Exception e) {
       throw new ExceptionInInitializerError("Can't initialize core container: "
           + e.getMessage());
     }
     initialize();
   }

   private static SolrCore getCore() {
     return getCore(DEFAULT_CORE_NAME);
   }

   private static SolrCore getCore(String name) {
     try {
       return coreContainer.getCore(name);
     }
     catch (Exception e) {
       e.printStackTrace();
     }
     return null;
   }

   public static String getSolrInfoMBeanValue(SolrInfoMBean.Category category,
       String entryName, String statName) {
     Map<String, SolrInfoMBean> registry = getCore().getInfoRegistry();
     for (Map.Entry<String, SolrInfoMBean> entry : registry.entrySet()) {
       String key = entry.getKey();
       SolrInfoMBean solrInfoMBean = entry.getValue();
       if ((solrInfoMBean.getCategory() != category) ||
           (!entryName.equals(key.trim()))) {
         continue;
       }
       NamedList<?> nl = solrInfoMBean.getStatistics();
       if ((nl != null) && (nl.size() > 0)) {
         for (int i = 0; i < nl.size(); i++) {
           if (nl.getName(i).equals(statName)) {
             return nl.getVal(i).toString();
           }
         }
       }
     }
     return null;
   }

   [...I have other methods that also get the value as a long, etc.]

 }



  This code is modeled after the SolrDispatchFilter.java, _info.jsp and
 stats.jsp.
  I'd appreciate any help. (And yes, my core is named "".)

 Sincerely,

  Daryl.



Problem getting Solr statistics

2009-06-16 Thread Development Team
Hi all,
 I am stumped trying to get statistics from the Solr server. It seems
that every time I get the correct SolrInfoMBean, when I look up the proper
value (by name) in the NamedList, I get the exact same number back each
time. For example, upon start-up the server reports that 2 queries have
been performed, and any time I pull the value out of the MBean after that it
says 2 even though the stats.jsp reports an increasing number of queries
over time. What am I doing wrong?
 Here is my sample code:

public class SolrUtil {

  protected static final CoreContainer coreContainer;
  protected static final String DEFAULT_CORE_NAME = "";

  static {
    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    try {
      coreContainer = initializer.initialize();
    }
    catch (Exception e) {
      throw new ExceptionInInitializerError("Can't initialize core container: "
          + e.getMessage());
    }
    initialize();
  }

  private static SolrCore getCore() {
    return getCore(DEFAULT_CORE_NAME);
  }

  private static SolrCore getCore(String name) {
    try {
      return coreContainer.getCore(name);
    }
    catch (Exception e) {
      e.printStackTrace();
    }
    return null;
  }

  public static String getSolrInfoMBeanValue(SolrInfoMBean.Category category,
      String entryName, String statName) {
    Map<String, SolrInfoMBean> registry = getCore().getInfoRegistry();
    for (Map.Entry<String, SolrInfoMBean> entry : registry.entrySet()) {
      String key = entry.getKey();
      SolrInfoMBean solrInfoMBean = entry.getValue();
      if ((solrInfoMBean.getCategory() != category) ||
          (!entryName.equals(key.trim()))) {
        continue;
      }
      NamedList<?> nl = solrInfoMBean.getStatistics();
      if ((nl != null) && (nl.size() > 0)) {
        for (int i = 0; i < nl.size(); i++) {
          if (nl.getName(i).equals(statName)) {
            return nl.getVal(i).toString();
          }
        }
      }
    }
    return null;
  }

  [...I have other methods that also get the value as a long, etc.]

}



 This code is modeled after the SolrDispatchFilter.java, _info.jsp and
stats.jsp.
 I'd appreciate any help. (And yes, my core is named "".)

Sincerely,

 Daryl.


Re: Solr query performance issue

2009-05-26 Thread Development Team
Yes, those terms are important in calculating the relevancy scores so they
are not in the filter queries.  I was hoping if I can cache everything about
a field, any combinations on the field values will be read from cache. Then
it does not matter if I query for field1:(02 04 05), or field1:(01 02) or
field1:03 the response time is equally quick.  Is there any way to achieve
that?
Yeah, the range queries are also a bottleneck too, I will give the TrieRange
fields a try.  Thanks for your advice.

Best Regards,
Shi Quan He

On Tue, May 26, 2009 at 3:55 PM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Tue, May 26, 2009 at 3:42 PM, Larry He shiqua...@gmail.com wrote:
  We have about 100 different fields and 1 million documents we indexed
 with
  Solr.  Many of the fields are multi-valued, and some are numbers (for
 range
  search).  We are expecting to perform solr queries contains over 30 terms
  and often the response time is well over a second.  I found that the
 caches
  in Solr such as QueryResultCache and FilterCache does not help us much in
  this case as most of the queries have combinations of terms that are
  unlikely to repeat.  An example of our query would look like:
 
  field1:(02 04 05) field2:(01 02 03) field2:(01 02 03) ...
 
  My question is how can we improve performance of these queries?

 filters are independently cached... but they are currently only AND
 filters, so you could only split it up like so:

 fq=field1:(02 04 05)&fq=field2:(01 02 03)&fq=field2:(01 02 03)
 But that won't help unless any of the individual fq params are
 repeated across different queries.

 Range search can also be sped up a lot via the use of the new
 TrieRange fields, or via the frange (function range query)
 capabilities in Solr 1.4. It's not clear if the range queries or
 the term queries are your current bottleneck.

 If the range queries aren't your bottleneck and separate filters don't
 work, then a query type could be developed that would help your
 situation by caching matches on term queries. Are relevancy scores
 important for the clauses like field1:(02 04 05), or do you sort by
 some other criteria?

 -Yonik
 http://www.lucidimagination.com



How to manage real-time (presence) data in a large index?

2009-04-14 Thread Development Team
Hi everybody,
   I have a relatively large index (it will eventually contain ~4M
documents and be about 3G in size, I think) that indexes user data,
settings, and the like. The documents represent a community of users
whereupon a subset of them may be online at any time. Also, we want to
score our search results across searches that span the whole index by the
online (i.e. presence) status.
   Right now the list of online members is kept in a database table,
however we very often need to search on these users. The problem is, we're
using Solr for our searches and we don't know how to approach setting up a
search system for a large amount of highly volatile data.
   How do people typically go about this? Do they do one of the
following:
 1) Set up a second core and only index the online
members in there? (Then we could not score normal search results by online
status.)
 2) Index the online status in our regular solr index and not
worry about it? (If it's fast to update docs in a large index, then why not
maintain real-time data in the main index?)
 3) Just use a database for the presence data and forget about
using Solr for the presence-related searches?
   Is there anything in Solr that I should be looking into to help with
this problem? I'd appreciate any help.

Sincerely,

   Daryl.


Sort by distance from location?

2009-04-14 Thread Development Team
Hi everybody,
 My index has latitude/longitude values for locations. I am required to
do a search based on a set of criteria, and order the results based on how
far the lat/long location is to the current user's location. Currently we
are emulating such a search by adding criteria of ever-widening bounding
boxes, and the more of those boxes match the document, the higher the score
and thus the closer ones appear at the start of the results. The query looks
something like this (newlines between each search term):

+criteraOne:1
+criteriaTwo:true
+latitude:[-90.0 TO 90.0] +longitude:[-180.0 TO 180.0]
(latitude:[40.52 TO 40.81] longitude:[-74.17 TO -73.79])
(latitude:[40.30 TO 41.02] longitude:[-74.45 TO -73.51])
(latitude:[39.94 TO 41.38] longitude:[-74.93 TO -73.03])
[[...etc...about 10 times...]]

 Naturally this is quite slow (query is approximately 6x slower than
normal), and... I can't help but feel that there's a more elegant way of
sorting by distance.
 Does anybody know how to do this or have any suggestions?

Sincerely,

 Daryl.
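The ever-widening boxes in the query above can be derived from a target radius: one degree of latitude is roughly 111 km everywhere, while a degree of longitude shrinks by cos(latitude). A sketch of computing one such box (pure geometry, no Solr API involved; names illustrative):

```java
public class BoundingBox {

    // 2 * pi * 6371 km / 360 degrees
    static final double KM_PER_DEGREE_LAT = 111.195;

    // Returns {minLat, maxLat, minLon, maxLon} for a radius (km)
    // around a center point. Longitude degrees shrink with latitude.
    public static double[] around(double lat, double lon, double radiusKm) {
        double dLat = radiusKm / KM_PER_DEGREE_LAT;
        double dLon = radiusKm / (KM_PER_DEGREE_LAT * Math.cos(Math.toRadians(lat)));
        return new double[] { lat - dLat, lat + dLat, lon - dLon, lon + dLon };
    }

    public static void main(String[] args) {
        // A box of roughly 16 km around a point near New York,
        // in the form the query terms above use.
        double[] box = around(40.66, -73.98, 16.0);
        System.out.printf("latitude:[%.2f TO %.2f] longitude:[%.2f TO %.2f]%n",
                box[0], box[1], box[2], box[3]);
    }
}
```

Doubling radiusKm for each successive clause reproduces the nested-box pattern shown in the query.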


Re: Sort by distance from location?

2009-04-14 Thread Development Team
Ah, good question:  Yes, we've tried it... and it was slower. To give some
avg times:
Regular non-distance Searches: 100ms
Our expanding-criteria solution:  600ms
LocalSolr:  800ms

(We also had problems with LocalSolr in that the results didn't seem to be
cached in Solr upon doing a search. So each page of results meant another
800ms.)

- Daryl.


On Tue, Apr 14, 2009 at 5:34 PM, Smiley, David W. dsmi...@mitre.org wrote:

  Have you tried LocalSolr?
 http://www.gissearch.com/localsolr
 (I haven’t but looks cool)



How to create a query directly (bypassing the query-parser)?

2009-03-31 Thread Development Team
Hi everybody, after reading the documentation on the Solr site, I have the
following newbie-ish question:

On the Lucene query parser syntax page (
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html) linked to from
the Solr query syntax page, they mention:
"If you are programmatically generating a query string and then parsing it
with the query parser then you should seriously consider building your
queries directly with the query API. In other words, the query parser is
designed for human-entered text, not for program-generated text."

What do they mean by using the API? If I use SolrJ to construct a
SolrQuery, doesn't that get processed by the query parser? How do I bypass
the query parser to set up a query directly?

Especially for token-values (values that fit a defined set, such as Enum
values), it seems silly for me to continually be appending, +tokenField:(1,
2, 3) to my query. Why should I write code to construct the query string,
then send this to the parser to parse the string into an object? Can't I set
these query parameters directly? If so, how?

- Daryl.


Birthday (that's day not date) search query?

2009-03-30 Thread Development Team
Hi everyone,
 I have an index that stores birth-dates, and I would like to search for
anybody whose birth-date is within X days of a certain month/day. For
example, I'd like to know if anybody's birthday is coming up within a
certain number of days, regardless of what year they were born. How would I
do this using Solr?
 As a follow-up, assuming this query is executed very often, should I
maybe be indexing something other than the birth-date? Such as just the
month-day pair? What is the most efficient way to do such a query?

Sincerely,

 Daryl.
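A common approach is to index just the month/day (rather than the full birth-date) and compute the window with year-wraparound at query-build time. A sketch of the wraparound logic, using the modern java.time API for clarity (note that MonthDay.atYear maps Feb 29 to Feb 28 in non-leap years):

```java
import java.time.LocalDate;
import java.time.MonthDay;

public class BirthdayWindow {

    // True if the next occurrence of the birthday (ignoring birth year)
    // falls within the next 'days' days of 'today', handling the
    // December-to-January wraparound.
    public static boolean isWithin(LocalDate today, MonthDay birthday, int days) {
        LocalDate next = birthday.atYear(today.getYear());
        if (next.isBefore(today)) {
            // Already passed this year; the next occurrence is next year.
            next = birthday.atYear(today.getYear() + 1);
        }
        return !next.isAfter(today.plusDays(days));
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2009, 12, 30);
        // A Jan 3 birthday wraps into the following year.
        System.out.println(isWithin(today, MonthDay.of(1, 3), 7));
        System.out.println(isWithin(today, MonthDay.of(1, 3), 3));
    }
}
```

With the month-day indexed as a single sortable number (e.g. MMDD), the same wraparound turns into at most two range clauses per query: one from today's MMDD to year-end, and one from Jan 1 when the window crosses the year boundary.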