bbox query and range queries

2012-03-29 Thread Alexandre Rocco
Hello,

I'm trying to perform some queries on a location field on the index.
The requirement is to search listings inside a pair of coordinates, like a
bounding box.

Taking a look at the wiki, I noticed that there is the option to use the
bbox query, but it does not create a rectangular-shaped box to find the docs.
Also, since the LatLon field is searchable by range, it should be possible to use a
range query to find them.
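
Just to make the comparison concrete, the two forms I'm weighing look roughly
like this (the field name "local" comes from our schema; the pt/d values are made up):
fq={!bbox sfield=local pt=-23.6696,-46.7290 d=5}
fq=local:[lowerLat,lowerLon TO upperLat,upperLon]
The first takes a center point and a distance in km, while the second takes the
explicit corner values as a range.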

I'm trying to search inside a pair of coordinates (the top-left corner and
the bottom-right corner) and no results are found.

The query I'm trying is something like:
http://localhost:8984/solr/select?wt=json&indent=true&fl=local,*&q=*:*&fq=local:[-23.6674,-46.7314 TO -23.6705,-46.7274]

Is there any other way to find docs inside a rectangular bounding box?

Thanks
Alexandre


Re: bbox query and range queries

2012-03-29 Thread Alexandre Rocco
Erick,

My location field is defined as in the example project:
<field name="local" type="location" indexed="true" stored="true"/>

Also, there is the dynamic field that stores the split coordinates:
<dynamicField name="*_coordinate" type="double" indexed="true"
stored="false" multiValued="false"/>

The response XML with debugQuery=on is looking like this:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">*:*</str>
    <str name="querystring">*:*</str>
    <str name="parsedquery">MatchAllDocsQuery(*:*)</str>
    <str name="parsedquery_toString">*:*</str>
    <lst name="explain"/>
    <str name="QParser">LuceneQParser</str>
    <arr name="filter_queries">
      <str>local:[-23.6674,-46.7314 TO -23.6705,-46.7274]</str>
    </arr>
    <arr name="parsed_filter_queries">
      <str>+local_0_coordinate:[-23.6674 TO -23.6705] +local_1_coordinate:[-46.7314 TO -46.7274]</str>
    </arr>
    <lst name="timing">
      <double name="time">1.0</double>
      <lst name="prepare">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.FacetComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.StatsComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.DebugComponent">
          <double name="time">0.0</double>
        </lst>
      </lst>
      <lst name="process">
        <double name="time">1.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent">
          <double name="time">1.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.FacetComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.StatsComponent">
          <double name="time">0.0</double>
        </lst>
        <lst name="org.apache.solr.handler.component.DebugComponent">
          <double name="time">0.0</double>
        </lst>
      </lst>
    </lst>
  </lst>
</response>

I tried to get some docs that contain the coordinates and then created a
rectangle around one of those docs to see whether it is returned within these ranges.
I don't know if this is the best way to test it, but it's quite easy.

Best,
Alexandre

On Thu, Mar 29, 2012 at 2:57 PM, Erick Erickson erickerick...@gmail.comwrote:

 What are your results? Can you show us the field definition for local
 and the results of adding debugQuery=on?

 Because this should work as far as I can tell.

 Best
 Erick

 On Thu, Mar 29, 2012 at 11:04 AM, Alexandre Rocco alel...@gmail.com
 wrote:
  Hello,
 
  I'm trying to perform some queries on a location field on the index.
  The requirement is to search listings inside a pair of coordinates, like
 a
  bounding box.
 
  Taking a look on the wiki, I noticed that there is the option to use the
  bbox query but in does not create a retangular shaped box to find the
 docs.
  Also since the LatLon field is searchable by range, it's possible to use
 a
  range query to find.
 
  I'm trying to search inside a pair of coordinates (the top left corner
 and
  bottom right corner) and no result is found.
 
  The query i'm trying is something like:
 
 http://localhost:8984/solr/select?wt=jsonindent=truefl=local,*q=*:*fq=local:[-23.6674,-46.7314TO
  -23.6705,-46.7274]
 
  Is there any other way to find docs inside a rectangular bounding box?
 
  Thanks
  Alexandre



Re: bbox query and range queries

2012-03-29 Thread Alexandre Rocco
Erick,

Just checked the separate fields and everything looks fine.
One thing I'm not completely sure about is whether the query I tried to perform
is correct.

One sample document looks like this:
<doc>
  <str name="id">200</str>
  <str name="local">-23.6696784,-46.7290193</str>
  <double name="local_0_coordinate">-23.6696784</double>
  <double name="local_1_coordinate">-46.7290193</double>
</doc>

So, to find this document I tried to create a virtual rectangle that
would be queried using the range query I described:
http://localhost:8984/solr/select?q=*:*&fq=local:[-23.6677,-46.7315 TO -23.6709,-46.7261]

Note that for the first coordinate I used a smaller value (taken from the
map) that is at the top-left corner of the area around the doc. The other
coordinate is at the bottom-right corner, and it's larger than the doc's
local field.

When I split the query into two parts, the first part
(local_1_coordinate:[-46.7315 TO -46.7261]) returns results, but the other
part (local_0_coordinate:[-23.6709 TO -23.6677]) doesn't match any docs.

I am guessing that my query is wrong. The typical use case is to take the
bounds of part of a map, represented by these top-left and bottom-right
coordinates, and find the docs inside that area. Does this range query
accomplish this kind of scenario?

Any pointers are appreciated.

Best,
Alexandre

On Thu, Mar 29, 2012 at 3:54 PM, Erick Erickson erickerick...@gmail.comwrote:

 This all looks fine, so the next question is whether or not your
 documents have the value you think.

 +local_0_coordinate:[-23.6674 TO -23.6705] +local_1_coordinate:[-46.7314 TO
 -46.7274]
 is the actual translated filter.

 So I'd check the actual documents in the index to see if you have a single
 document with local_0 and local_1 that fits the above. You should be able
 to
 use the TermsComponent: http://wiki.apache.org/solr/TermsComponent
 to look. Or switch to stored=true and look at search results for
 documents you think should match, just to see the raw value Who knows?
 It could be something as silly as you have your lat/lon backwards somehow,
 I've
 spent _days_ having problems like that G...

 Best
 Erick

 On Thu, Mar 29, 2012 at 2:34 PM, Alexandre Rocco alel...@gmail.com
 wrote:
  Erick,
 
  My location field is defined like in the example project:
  field name=local type=location indexed=true stored=true/
 
  Also, there is the dynamic that stores the splitted coordinates:
  dynamicField name=*_coordinate type=double indexed=true
  stored=false multiValued=false/
 
  The response XML with debugQuery=on is looking like this:
  response
  lst name=responseHeader
  int name=status0/int
  int name=QTime1/int
  /lst
  result name=response numFound=0 start=0/
  lst name=debug
  str name=rawquerystring*:*/str
  str name=querystring*:*/str
  str name=parsedqueryMatchAllDocsQuery(*:*)/str
  str name=parsedquery_toString*:*/str
  lst name=explain/
  str name=QParserLuceneQParser/str
  arr name=filter_queries
  strlocal:[-23.6674,-46.7314 TO -23.6705,-46.7274]/str
  /arr
  arr name=parsed_filter_queries
  str
  +local_0_coordinate:[-23.6674 TO -23.6705] +local_1_coordinate:[-46.7314
 TO
  -46.7274]
  /str
  /arr
  lst name=timing
  double name=time1.0/double
  lst name=prepare
  double name=time0.0/double
  lst name=org.apache.solr.handler.component.QueryComponent
  double name=time0.0/double
  /lst
  lst name=org.apache.solr.handler.component.FacetComponent
  double name=time0.0/double
  /lst
  lst name=org.apache.solr.handler.component.MoreLikeThisComponent
  double name=time0.0/double
  /lst
  lst name=org.apache.solr.handler.component.HighlightComponent
  double name=time0.0/double
  /lst
  lst name=org.apache.solr.handler.component.StatsComponent
  double name=time0.0/double
  /lst
  lst name=org.apache.solr.handler.component.DebugComponent
  double name=time0.0/double
  /lst
  /lst
  lst name=process
  double name=time1.0/double
  lst name=org.apache.solr.handler.component.QueryComponent
  double name=time1.0/double
  /lst
  lst name=org.apache.solr.handler.component.FacetComponent
  double name=time0.0/double
  /lst
  lst name=org.apache.solr.handler.component.MoreLikeThisComponent
  double name=time0.0/double
  /lst
  lst name=org.apache.solr.handler.component.HighlightComponent
  double name=time0.0/double
  /lst
  lst name=org.apache.solr.handler.component.StatsComponent
  double name=time0.0/double
  /lst
  lst name=org.apache.solr.handler.component.DebugComponent
  double name=time0.0/double
  /lst
  /lst
  /lst
  /lst
  /response
 
  I tried to get some docs that contains the coordinates and then created a
  retangle around that doc to see it is returned between these ranges.
  Don't know if this is the best way to test it, but it's quite easy.
 
  Best,
  Alexandre
 
  On Thu, Mar 29, 2012 at 2:57 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  What are your results? Can you show us the field definition for local
  and the results of adding debugQuery=on?
 
  Because this should work as far as I can tell

Re: bbox query and range queries

2012-03-29 Thread Alexandre Rocco
Yonik,

Thanks for the heads-up. That one worked.

Just trying to wrap my head around how this would work in a real case. To test
it I just got the coordinates from Google Maps and searched within the
pair of coordinates as I got them. Should I always check which is the lower
and which is the upper bound when assembling the query?
I know that this one is off-topic, just curious.
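
For reference, what I mean by checking which is lower/upper is just normalizing
the corners before building the fq. A minimal sketch in plain Java (the field
name "local" and the coordinates are simply the ones from this thread):

import java.util.Locale;

// Normalize two arbitrary map corners into the [lower TO upper] form
// that the range query expects for each dimension.
public class BoundingBoxFilter {
    public static void main(String[] args) {
        double lat1 = -23.6677, lon1 = -46.7315;   // corner as picked on the map
        double lat2 = -23.6709, lon2 = -46.7261;   // opposite corner

        double latLo = Math.min(lat1, lat2), latHi = Math.max(lat1, lat2);
        double lonLo = Math.min(lon1, lon2), lonHi = Math.max(lon1, lon2);

        // Numerically smaller value first in each dimension
        String fq = String.format(Locale.US, "local:[%.6f,%.6f TO %.6f,%.6f]",
                latLo, lonLo, latHi, lonHi);
        System.out.println(fq);   // local:[-23.670900,-46.731500 TO -23.667700,-46.726100]
    }
}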

Thanks
Alexandre

On Thu, Mar 29, 2012 at 7:26 PM, Yonik Seeley yo...@lucidimagination.comwrote:

 On Thu, Mar 29, 2012 at 6:20 PM, Alexandre Rocco alel...@gmail.com
 wrote:
  http://localhost:8984/solr/select?q=*:*fq=local:[-23.6677,-46.7315 TO
  -23.6709,-46.7261]

 Range queries always need to be [lower_bound TO upper_bound]
 Try
 http://localhost:8984/solr/select?q=*:*&fq=local:[-23.6709,-46.7315 TO
 -23.6677,-46.7261]

 -Yonik
 lucenerevolution.com - Lucene/Solr Open Source Search Conference.
 Boston May 7-10



Re: Slave index size growing fast

2012-03-26 Thread Alexandre Rocco
Erick,

I haven't changed maxCommitsToKeep yet.
We stopped the slave that had issues and removed the data dir as you
suggested, and after starting it, everything went back to working normally.
I guess that at some point someone committed on the slave or even copied the
master files over and made this mess. We will check the internal docs to
prevent this from happening again.

Thanks for explaining the whole concept; it will be useful for understanding the
whole process.

Best,
Alexandre

On Fri, Mar 23, 2012 at 4:05 PM, Erick Erickson erickerick...@gmail.comwrote:

 Alexandre:

 Have you changed anything like maxCommitsToKeep on your slave?
 And do you have more than one slave? If you do, have you considered
 just blowing away the entire .../data directory on the slave and letting
 it re-start from scratch? I'd take the slave out of service for the
 duration of this operation, or do it when you are OK with some number of
 requests going to an empty index

 Because having an index.timestamp directory indicates that sometime
 someone forced the slave to get out of sync, possibly as you say by
 doing a commit. Or sending docs to it to be indexed or some such. Starting
 the slave over should fix that if it's the root of your problem.

 Note a curious thing about the timestamp. When you start indexing, the
 index version is a timestamp. However, from that point on when the index
 changes, the version number is just incremented (not made the current
 time). This is to avoid problems with masters and slaves having different
 times. But a consequence of that is if your slave somehow gets an index
 that's newer, the replication process does the best it can to not delete
 indexes that are out of sync with the master and saves them away. This
 might be what you're seeing.

 I'm grasping at straws a bit here, but this seems possible.

 Best
 Erick

 On Fri, Mar 23, 2012 at 1:16 PM, Alexandre Rocco alel...@gmail.com
 wrote:
  Tomás,
 
  The 300+GB size is only inside the index.20110926152410 dir. Inside there
  are a lot of files.
  I am almost conviced that something is messed up like someone commited on
  this slave machine.
 
  Thanks
 
  2012/3/23 Tomás Fernández Löbbe tomasflo...@gmail.com
 
  Alexandre, additionally to what Erick said, you may want to check in the
  slave if what's 300+GB is the data directory or the
 index.timestamp
  directory.
 
  On Fri, Mar 23, 2012 at 12:25 PM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
 
   not really, unless perhaps you're issuing commits or optimizes
   on the _slave_ (which you should NOT do).
  
   Replication happens based on the version of the index on the master.
   True, it starts out as a timestamp, but then successive versions
   just have that number incremented. The version number
   in the index on the slave is compared against the one on the master,
   but the actual time (on the slave or master) is irrelevant. This is
   explicitly to avoid problems with time synching across
   machines/timezones/whataver
  
   It would be instructive to look at the admin/info page to see what
   the index version is on the master and slave.
  
   But, if you optimize or commit (I think) on the _slave_, you might
   change the timestamp and mess things up (although I'm reaching
   here, I don't know this for certain).
  
   What's the  index look like on the slave as compared to the master?
   Are there just a bunch of files on the slave? Or a bunch of
 directories?
  
   Instead of re-indexing on the master, you could try to bring down the
   slave, blow away the entire index and start it back up. Since this is
 a
   production system, I'd only try this if I had more than one slave.
  Although
   you could bring up a new slave and attach it to the master and see
   what happens there. You wouldn't affect production if you didn't point
   incoming requests at it...
  
   Best
   Erick
  
   On Fri, Mar 23, 2012 at 11:03 AM, Alexandre Rocco alel...@gmail.com
   wrote:
Erick,
   
We're using Solr 3.3 on Linux (CentOS 5.6).
The /data dir on master is actually 1.2G.
   
I haven't tried to recreate the index yet. Since it's a production
environment,
I guess that I can stop replication and indexing and then recreate
 the
master index to see if it makes any difference.
   
Also just noticed another thread here named Simple Slave
 Replication
Question that tells that it could be a problem if I'm seeing an
/data/index with an timestamp on the slave node.
Is this info relevant to this issue?
   
Thanks,
Alexandre
   
On Fri, Mar 23, 2012 at 11:48 AM, Erick Erickson 
   erickerick...@gmail.comwrote:
   
What version of Solr and what operating system?
   
But regardless, this shouldn't be happening. Indexes can
temporarily double in size, but any extras should be
cleaned up relatively soon.
   
On the master, what's the total size of the solr home/data
  directory?
I'm a little suspicious

Slave index size growing fast

2012-03-23 Thread Alexandre Rocco
Hello,

We have a Solr index that averages 1.19 GB in size.
After configuring replication, the index on the slave machine is growing
exponentially.
Currently we have a slave that is 323.44 GB in size.
Is there anything that could cause this behavior?
The current replication config is below.

Master:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="backupAfter">startup</str>
    <str name="confFiles">elevate.xml,protwords.txt,schema.xml,spellings.txt,stopwords.txt,synonyms.txt</str>
  </lst>
</requestHandler>

Slave:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master:8984/solr/Index/replication</str>
  </lst>
</requestHandler>

Any pointers will be useful.

Thanks,
Alexandre


Re: Slave index size growing fast

2012-03-23 Thread Alexandre Rocco
Erick,

We're using Solr 3.3 on Linux (CentOS 5.6).
The /data dir on master is actually 1.2G.

I haven't tried to recreate the index yet. Since it's a production
environment,
I guess that I can stop replication and indexing and then recreate the
master index to see if it makes any difference.

Also, I just noticed another thread here named "Simple Slave Replication
Question" which says it could be a problem if I'm seeing a
/data/index directory with a timestamp on the slave node.
Is this info relevant to this issue?

Thanks,
Alexandre

On Fri, Mar 23, 2012 at 11:48 AM, Erick Erickson erickerick...@gmail.comwrote:

 What version of Solr and what operating system?

 But regardless, this shouldn't be happening. Indexes can
 temporarily double in size, but any extras should be
 cleaned up relatively soon.

 On the master, what's the total size of the solr home/data directory?
 I'm a little suspicious of the backupAfter on your master, but I
 don't think that's the root of your problem

 Are you recreating the index on the master (by deleting the
 index directory and starting over)?

 This is unusual, and I suspect it's something odd in your configuration,
 but I confess I'm at a loss as to what.

 Best
 Erick

 On Fri, Mar 23, 2012 at 10:28 AM, Alexandre Rocco alel...@gmail.com
 wrote:
  Hello,
 
  We have a Solr index that has an average of 1.19 GB in size.
  After configuring the replication, the slave machine is growing the index
  size expoentially.
  Currently we have an slave with 323.44 GB in size.
  Is there anything that could cause this behavior?
  The current replication config is below.
 
  Master:
  requestHandler name=/replication class=solr.ReplicationHandler
  lst name=master
  str name=replicateAftercommit/str
  str name=replicateAfterstartup/str
  str name=backupAfterstartup/str
  str name=confFiles
 
 elevate.xml,protwords.txt,schema.xml,spellings.txt,stopwords.txt,synonyms.txt
  /str
  /lst
  /requestHandler
 
  Slave:
  requestHandler name=/replication class=solr.ReplicationHandler
  lst name=slave
  str name=masterUrlhttp://master:8984/solr/Index/replication/str
  /lst
  /requestHandler
 
  Any pointers will be useful.
 
  Thanks,
  Alexandre



Re: Slave index size growing fast

2012-03-23 Thread Alexandre Rocco
Erick,

The master /data dir contains only an index dir with a bunch of files.
On the slave, the /data dir contains an index.20110926152410 dir with a lot
more files than the master. That seems quite strange to me.

I guess that the config is right, since we have another slave that is
running fine with the same config.
The best bet would be to clean up this messed-up slave, try to sync it again,
and see what happens.
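
For reference, one quick way to compare what each side thinks its index version
is (besides the admin/info page) seems to be the replication handler itself; a
sketch, assuming the handler paths from our config:
http://master:8984/solr/Index/replication?command=indexversion
http://slave:8984/solr/Index/replication?command=details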

Thanks

On Fri, Mar 23, 2012 at 12:25 PM, Erick Erickson erickerick...@gmail.comwrote:

 not really, unless perhaps you're issuing commits or optimizes
 on the _slave_ (which you should NOT do).

 Replication happens based on the version of the index on the master.
 True, it starts out as a timestamp, but then successive versions
 just have that number incremented. The version number
 in the index on the slave is compared against the one on the master,
 but the actual time (on the slave or master) is irrelevant. This is
 explicitly to avoid problems with time synching across
 machines/timezones/whataver

 It would be instructive to look at the admin/info page to see what
 the index version is on the master and slave.

 But, if you optimize or commit (I think) on the _slave_, you might
 change the timestamp and mess things up (although I'm reaching
 here, I don't know this for certain).

 What's the  index look like on the slave as compared to the master?
 Are there just a bunch of files on the slave? Or a bunch of directories?

 Instead of re-indexing on the master, you could try to bring down the
 slave, blow away the entire index and start it back up. Since this is a
 production system, I'd only try this if I had more than one slave. Although
 you could bring up a new slave and attach it to the master and see
 what happens there. You wouldn't affect production if you didn't point
 incoming requests at it...

 Best
 Erick

 On Fri, Mar 23, 2012 at 11:03 AM, Alexandre Rocco alel...@gmail.com
 wrote:
  Erick,
 
  We're using Solr 3.3 on Linux (CentOS 5.6).
  The /data dir on master is actually 1.2G.
 
  I haven't tried to recreate the index yet. Since it's a production
  environment,
  I guess that I can stop replication and indexing and then recreate the
  master index to see if it makes any difference.
 
  Also just noticed another thread here named Simple Slave Replication
  Question that tells that it could be a problem if I'm seeing an
  /data/index with an timestamp on the slave node.
  Is this info relevant to this issue?
 
  Thanks,
  Alexandre
 
  On Fri, Mar 23, 2012 at 11:48 AM, Erick Erickson 
 erickerick...@gmail.comwrote:
 
  What version of Solr and what operating system?
 
  But regardless, this shouldn't be happening. Indexes can
  temporarily double in size, but any extras should be
  cleaned up relatively soon.
 
  On the master, what's the total size of the solr home/data directory?
  I'm a little suspicious of the backupAfter on your master, but I
  don't think that's the root of your problem
 
  Are you recreating the index on the master (by deleting the
  index directory and starting over)?
 
  This is unusual, and I suspect it's something odd in your configuration,
  but I confess I'm at a loss as to what.
 
  Best
  Erick
 
  On Fri, Mar 23, 2012 at 10:28 AM, Alexandre Rocco alel...@gmail.com
  wrote:
   Hello,
  
   We have a Solr index that has an average of 1.19 GB in size.
   After configuring the replication, the slave machine is growing the
 index
   size expoentially.
   Currently we have an slave with 323.44 GB in size.
   Is there anything that could cause this behavior?
   The current replication config is below.
  
   Master:
   requestHandler name=/replication class=solr.ReplicationHandler
   lst name=master
   str name=replicateAftercommit/str
   str name=replicateAfterstartup/str
   str name=backupAfterstartup/str
   str name=confFiles
  
 
 elevate.xml,protwords.txt,schema.xml,spellings.txt,stopwords.txt,synonyms.txt
   /str
   /lst
   /requestHandler
  
   Slave:
   requestHandler name=/replication class=solr.ReplicationHandler
   lst name=slave
   str name=masterUrlhttp://master:8984/solr/Index/replication/str
   /lst
   /requestHandler
  
   Any pointers will be useful.
  
   Thanks,
   Alexandre
 



Re: Slave index size growing fast

2012-03-23 Thread Alexandre Rocco
Tomás,

The 300+GB size is only inside the index.20110926152410 dir. Inside there
are a lot of files.
I am almost convinced that something is messed up, like someone committed on
this slave machine.

Thanks

2012/3/23 Tomás Fernández Löbbe tomasflo...@gmail.com

 Alexandre, additionally to what Erick said, you may want to check in the
 slave if what's 300+GB is the data directory or the index.timestamp
 directory.

 On Fri, Mar 23, 2012 at 12:25 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  not really, unless perhaps you're issuing commits or optimizes
  on the _slave_ (which you should NOT do).
 
  Replication happens based on the version of the index on the master.
  True, it starts out as a timestamp, but then successive versions
  just have that number incremented. The version number
  in the index on the slave is compared against the one on the master,
  but the actual time (on the slave or master) is irrelevant. This is
  explicitly to avoid problems with time synching across
  machines/timezones/whataver
 
  It would be instructive to look at the admin/info page to see what
  the index version is on the master and slave.
 
  But, if you optimize or commit (I think) on the _slave_, you might
  change the timestamp and mess things up (although I'm reaching
  here, I don't know this for certain).
 
  What's the  index look like on the slave as compared to the master?
  Are there just a bunch of files on the slave? Or a bunch of directories?
 
  Instead of re-indexing on the master, you could try to bring down the
  slave, blow away the entire index and start it back up. Since this is a
  production system, I'd only try this if I had more than one slave.
 Although
  you could bring up a new slave and attach it to the master and see
  what happens there. You wouldn't affect production if you didn't point
  incoming requests at it...
 
  Best
  Erick
 
  On Fri, Mar 23, 2012 at 11:03 AM, Alexandre Rocco alel...@gmail.com
  wrote:
   Erick,
  
   We're using Solr 3.3 on Linux (CentOS 5.6).
   The /data dir on master is actually 1.2G.
  
   I haven't tried to recreate the index yet. Since it's a production
   environment,
   I guess that I can stop replication and indexing and then recreate the
   master index to see if it makes any difference.
  
   Also just noticed another thread here named Simple Slave Replication
   Question that tells that it could be a problem if I'm seeing an
   /data/index with an timestamp on the slave node.
   Is this info relevant to this issue?
  
   Thanks,
   Alexandre
  
   On Fri, Mar 23, 2012 at 11:48 AM, Erick Erickson 
  erickerick...@gmail.comwrote:
  
   What version of Solr and what operating system?
  
   But regardless, this shouldn't be happening. Indexes can
   temporarily double in size, but any extras should be
   cleaned up relatively soon.
  
   On the master, what's the total size of the solr home/data
 directory?
   I'm a little suspicious of the backupAfter on your master, but I
   don't think that's the root of your problem
  
   Are you recreating the index on the master (by deleting the
   index directory and starting over)?
  
   This is unusual, and I suspect it's something odd in your
 configuration,
   but I confess I'm at a loss as to what.
  
   Best
   Erick
  
   On Fri, Mar 23, 2012 at 10:28 AM, Alexandre Rocco alel...@gmail.com
   wrote:
Hello,
   
We have a Solr index that has an average of 1.19 GB in size.
After configuring the replication, the slave machine is growing the
  index
size expoentially.
Currently we have an slave with 323.44 GB in size.
Is there anything that could cause this behavior?
The current replication config is below.
   
Master:
requestHandler name=/replication class=solr.ReplicationHandler
lst name=master
str name=replicateAftercommit/str
str name=replicateAfterstartup/str
str name=backupAfterstartup/str
str name=confFiles
   
  
 
 elevate.xml,protwords.txt,schema.xml,spellings.txt,stopwords.txt,synonyms.txt
/str
/lst
/requestHandler
   
Slave:
requestHandler name=/replication class=solr.ReplicationHandler
lst name=slave
str name=masterUrlhttp://master:8984/solr/Index/replication
 /str
/lst
/requestHandler
   
Any pointers will be useful.
   
Thanks,
Alexandre
  
 



Re: Relevancy and random sorting

2012-01-12 Thread Alexandre Rocco
Erick,

This document already has a field that indicates the source (site).
The issue we are trying to solve is when we list all documents without any
specific criteria. Since we bring the most recent ones and the ones that
contain images, we end up having a lot of listings from a single site,
because the documents are indexed in batches from the same site. At some
point we have several documents from the same site with the same date/time
and with images. I'm trying to give some random aspect to this search so
that other documents can also appear within that big dataset from the same
source.
Does the grouping help to achieve this?
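
If it does, I imagine the request would gain something along these lines (a
sketch, assuming the source field is literally named "site" and that we're on a
Solr version with result grouping):
...&group=true&group.field=site&group.limit=1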

Alexandre

On Thu, Jan 12, 2012 at 12:31 AM, Erick Erickson erickerick...@gmail.comwrote:

 Alexandre:

 Have you thought about grouping? If you can analyze the incoming
 documents and include a field such that similar documents map
 to the same value, than group on that value you'll get output that
 isn't dominated by repeated copies of the similar documents. It
 depends, though, on being able to do a suitable mapping.

 In your case, could the mapping just be the site from which you
 got the data?

 Best
 Erick

 On Wed, Jan 11, 2012 at 1:58 PM, Alexandre Rocco alel...@gmail.com
 wrote:
  Erick,
 
  Probably I really written something silly. You are right on either
 sorting
  by field or ranking.
  I just need to change the ranking to shift things around as you said.
 
  To clarify the use case:
  We have a listing aggregator that gets product listings from a lot of
  different sites and since they are added in batches, sometimes you see a
  lot of pages from the same source (site). We are working on some changes
 to
  shift things around and reduce this blocking effect, so we can present
  mixed sources on the result pages.
 
  I guess I will start with the document random field and later try to
  develop a custom plugin to make things better.
 
  Thanks for the pointers.
 
  Regards,
  Alexandre
 
  On Wed, Jan 11, 2012 at 1:58 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  I really don't understand what this means:
  random sorting for the records but also preserving the ranking
 
  Either you're sorting on rank or you're not. If you mean you're
  trying to shift things around just a little bit, *mostly* respecting
  relevance then I guess you can do what you're thinking.
 
  You could create your own function query to do the boosting, see:
  http://wiki.apache.org/solr/SolrPlugins#ValueSourceParser
 
  which would keep you from having to re-index your data to get
  a different randomness.
 
  You could also consider external file fields, but I think your
  own function query would be cleaner. I don't think math.random
  is a supported function OOB
 
  Best
  Erick
 
 
  On Wed, Jan 11, 2012 at 8:29 AM, Alexandre Rocco alel...@gmail.com
  wrote:
   Hello all,
  
   Recently i've been trying to tweak some aspects of relevancy in one
  listing
   project.
   I need to give a higher score to newer documents and also boost the
   document based on a boolean field that indicates the listing has
  pictures.
   On top of that, in some situations we need a random sorting for the
  records
   but also preserving the ranking.
  
   I tried to combine some techniques described in the Solr Relevancy FAQ
   wiki, but when I add the random sorting, the ranking gets messy (as
   expected).
  
   This works well:
  
 
 http://localhost:18979/solr/select/?start=0rows=15q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%haspicture%22fl=*,score
  
   This does not work, gives a random order on what is already ranked
  
 
 http://localhost:18979/solr/select/?start=0rows=15q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%haspicture%22fl=*,scoresort=random_1+desc
  
   The only way I see is to create another field on the schema
 containing a
   random value and use it to boost the document the same way that was
 tone
  on
   the boolean field.
   Anyone tried something like this before and knows some way to get it
   working?
  
   Thanks,
   Alexandre
 



Re: Relevancy and random sorting

2012-01-12 Thread Alexandre Rocco
Michael,

We are using the random sorting in combination with date and other fields
but I am trying to change this to affect the ranking instead of sorting
directly.
That way we can also use other useful tweaks on the rank itself.

Alexandre

On Thu, Jan 12, 2012 at 11:46 AM, Michael Kuhlmann k...@solarier.de wrote:

 Does the random sort function help you here?

 http://lucene.apache.org/solr/**api/org/apache/solr/schema/**
 RandomSortField.htmlhttp://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html

 However, you will get some very old listings then, if it's okay for you.

 -Kuli

 Am 12.01.2012 14:38, schrieb Alexandre Rocco:

  Erick,

 This document already has a field that indicates the source (site).
 The issue we are trying to solve is when we list all documents without any
 specific criteria. Since we bring the most recent ones and the ones that
 contains images, we end up having a lot of listings from a single site,
 since the documents are indexed in batches from the same site. At some
 point we have several documents from the same site in the same date/time
 and having images. I'm trying to give some random aspect to this search so
 other documents can also appear in between that big dataset from the same
 source.
 Does the grouping help to achieve this?

 Alexandre

 On Thu, Jan 12, 2012 at 12:31 AM, Erick Ericksonerickerickson@gmail.**
 com erickerick...@gmail.comwrote:

  Alexandre:

 Have you thought about grouping? If you can analyze the incoming
 documents and include a field such that similar documents map
 to the same value, than group on that value you'll get output that
 isn't dominated by repeated copies of the similar documents. It
 depends, though, on being able to do a suitable mapping.

 In your case, could the mapping just be the site from which you
 got the data?

 Best
 Erick

 On Wed, Jan 11, 2012 at 1:58 PM, Alexandre Roccoalel...@gmail.com
 wrote:

 Erick,

 Probably I really written something silly. You are right on either

 sorting

 by field or ranking.
 I just need to change the ranking to shift things around as you said.

 To clarify the use case:
 We have a listing aggregator that gets product listings from a lot of
 different sites and since they are added in batches, sometimes you see a
 lot of pages from the same source (site). We are working on some changes

 to

 shift things around and reduce this blocking effect, so we can present
 mixed sources on the result pages.

 I guess I will start with the document random field and later try to
 develop a custom plugin to make things better.

 Thanks for the pointers.

 Regards,
 Alexandre

 On Wed, Jan 11, 2012 at 1:58 PM, Erick Ericksonerickerickson@gmail.**
 com erickerick...@gmail.com
 wrote:

  I really don't understand what this means:
 random sorting for the records but also preserving the ranking

 Either you're sorting on rank or you're not. If you mean you're
 trying to shift things around just a little bit, *mostly* respecting
 relevance then I guess you can do what you're thinking.

 You could create your own function query to do the boosting, see:
 http://wiki.apache.org/solr/**SolrPlugins#ValueSourceParserhttp://wiki.apache.org/solr/SolrPlugins#ValueSourceParser

 which would keep you from having to re-index your data to get
 a different randomness.

 You could also consider external file fields, but I think your
 own function query would be cleaner. I don't think math.random
 is a supported function OOB

 Best
 Erick


 On Wed, Jan 11, 2012 at 8:29 AM, Alexandre Roccoalel...@gmail.com
 wrote:

 Hello all,

 Recently i've been trying to tweak some aspects of relevancy in one

 listing

 project.
 I need to give a higher score to newer documents and also boost the
 document based on a boolean field that indicates the listing has

 pictures.

 On top of that, in some situations we need a random sorting for the

 records

 but also preserving the ranking.

 I tried to combine some techniques described in the Solr Relevancy FAQ
 wiki, but when I add the random sorting, the ranking gets messy (as
 expected).

 This works well:


  http://localhost:18979/solr/**select/?start=0rows=15q={!**
 boost%20b=recip(ms(NOW/HOUR,**date_updated),3.16e-11,1,1)}**
 active%3a%22true%22+AND+**featured%3a%22false%22+_val_:%**
 haspicture%22fl=*,score


 This does not work, gives a random order on what is already ranked


  http://localhost:18979/solr/**select/?start=0rows=15q={!**
 boost%20b=recip(ms(NOW/HOUR,**date_updated),3.16e-11,1,1)}**
 active%3a%22true%22+AND+**featured%3a%22false%22+_val_:%**
 haspicture%22fl=*,scoresort=**random_1+desc


 The only way I see is to create another field on the schema

 containing a

 random value and use it to boost the document the same way that was

 tone

 on

 the boolean field.
 Anyone tried something like this before and knows some way to get it
 working?

 Thanks,
 Alexandre


Relevancy and random sorting

2012-01-11 Thread Alexandre Rocco
Hello all,

Recently I've been trying to tweak some aspects of relevancy in a listing
project.
I need to give a higher score to newer documents and also boost the
document based on a boolean field that indicates the listing has pictures.
On top of that, in some situations we need random sorting for the records
while also preserving the ranking.

I tried to combine some techniques described in the Solr Relevancy FAQ
wiki, but when I add the random sorting, the ranking gets messy (as
expected).

This works well:
http://localhost:18979/solr/select/?start=0&rows=15&q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%22haspicture%22&fl=*,score

This does not work; it gives a random order on what is already ranked:
http://localhost:18979/solr/select/?start=0&rows=15&q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%22haspicture%22&fl=*,score&sort=random_1+desc

The only way I see is to create another field in the schema containing a
random value and use it to boost the document the same way that was done with
the boolean field.
Has anyone tried something like this before and knows a way to get it
working?
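
For the random field itself, what I have in mind is roughly the setup from the
example schema (a sketch; the seed suffix is arbitrary):
<fieldType name="random" class="solr.RandomSortField" indexed="true"/>
<dynamicField name="random_*" type="random"/>
Sorting on e.g. random_1234 then gives a different stable order per seed, though
using it inside the boost function may still require a custom ValueSource.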

Thanks,
Alexandre


Re: Relevancy and random sorting

2012-01-11 Thread Alexandre Rocco
Erick,

Probably I wrote something silly there. You are right: it's either sorting
by field or ranking.
I just need to change the ranking to shift things around, as you said.

To clarify the use case:
We have a listing aggregator that gets product listings from a lot of
different sites and since they are added in batches, sometimes you see a
lot of pages from the same source (site). We are working on some changes to
shift things around and reduce this blocking effect, so we can present
mixed sources on the result pages.

I guess I will start with the document random field and later try to
develop a custom plugin to make things better.

Thanks for the pointers.

Regards,
Alexandre

On Wed, Jan 11, 2012 at 1:58 PM, Erick Erickson erickerick...@gmail.comwrote:

 I really don't understand what this means:
 random sorting for the records but also preserving the ranking

 Either you're sorting on rank or you're not. If you mean you're
 trying to shift things around just a little bit, *mostly* respecting
 relevance then I guess you can do what you're thinking.

 You could create your own function query to do the boosting, see:
 http://wiki.apache.org/solr/SolrPlugins#ValueSourceParser

 which would keep you from having to re-index your data to get
 a different randomness.

 You could also consider external file fields, but I think your
 own function query would be cleaner. I don't think math.random
 is a supported function OOB

 Best
 Erick


 On Wed, Jan 11, 2012 at 8:29 AM, Alexandre Rocco alel...@gmail.com
 wrote:
  Hello all,
 
  Recently i've been trying to tweak some aspects of relevancy in one
 listing
  project.
  I need to give a higher score to newer documents and also boost the
  document based on a boolean field that indicates the listing has
 pictures.
  On top of that, in some situations we need a random sorting for the
 records
  but also preserving the ranking.
 
  I tried to combine some techniques described in the Solr Relevancy FAQ
  wiki, but when I add the random sorting, the ranking gets messy (as
  expected).
 
  This works well:
 
 http://localhost:18979/solr/select/?start=0rows=15q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%haspicture%22fl=*,score
 
  This does not work, gives a random order on what is already ranked
 
 http://localhost:18979/solr/select/?start=0rows=15q={!boost%20b=recip(ms(NOW/HOUR,date_updated),3.16e-11,1,1)}active%3a%22true%22+AND+featured%3a%22false%22+_val_:%haspicture%22fl=*,scoresort=random_1+desc
 
  The only way I see is to create another field on the schema containing a
  random value and use it to boost the document the same way that was tone
 on
  the boolean field.
  Anyone tried something like this before and knows some way to get it
  working?
 
  Thanks,
  Alexandre



Re: DIH import and postImportDeleteQuery

2011-05-25 Thread Alexandre Rocco
Hi Ephraim,

Thank you so much for the input.
I was able to find your thread on the archives and got your solution to
work.

In fact, when using $deleteDocById and $skipDoc it worked like a charm. This
feature is very useful; it's a shame it's not properly documented.
The only downside is the one you mentioned: the stats are not updated,
so if I update 13 documents and delete 2, DIH will tell me that only 13
documents were processed. This is bad in my case because I check the end
result to generate an error e-mail if needed.

You also mentioned that if the query contains only deletion records, a
commit would not be automatically executed and it would be necessary to
commit manually.

How can I commit manually via DIH? I was not able to find any references in
the documentation.
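
In the meantime, the workaround I'm considering is simply posting an explicit
commit to the update handler once the import finishes; a sketch, assuming the
default core URL:
curl http://localhost:8984/solr/update -H 'Content-type:text/xml' --data-binary '<commit/>'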

Thanks!
Alexandre

On Wed, May 25, 2011 at 5:14 AM, Ephraim Ofir ephra...@icq.com wrote:

 Search the list for my post DIH - deleting documents, high performance
 (delta) imports, and passing parameters which shows my solution a
 similar problem.

 Ephraim Ofir

 -Original Message-
 From: Alexandre Rocco [mailto:alel...@gmail.com]
 Sent: Tuesday, May 24, 2011 11:24 PM
 To: solr-user@lucene.apache.org
 Subject: DIH import and postImportDeleteQuery

 Guys,

 I am facing a situation in one of our projects that I need to perform a
 cleanup to remove some documents after we perform an update via DIH.
 The big issue right now comes from the fact that when we call the DIH
 with
 clean=false, the postImportDeleteQuery is not executed.

 My setup is currently arranged like this:
 - A SQL Server stored procedure that receives a parameter (specified in
 the
 URL) and returns the records to be indexed
 - The procedure is able to return all the records (for a full-import) or
 only the updated records (for a delta-import)
 - This procedure returns valid and deleted records, from this point
 comes
 the need to run a postImportDeleteQuery to remove the deleted ones.

 Everything works fine when I run a full-import, I am running always with
 clean=true, and then the whole index is rebuilt.
 When I need to do an incremental update, the records are updated
 correctly,
 but the command to delete the other records is not executed.

 I've tried several combinations, with different results:
 - Running full-import with clean=false: the records are updated but the
 ones
 that needs to be deleted stays on the index
 - Running delta-import with clean=false: the records are updated but the
 ones that needs to be deleted stays on the index
 - Running delta-import with clean=true: all records are deleted from the
 index and then only the records returned by the procedure are on the
 index,
 except the deleted ones.

 I don't see any way to achieve my goal, without changing the process
 that I
 do to obtain the data.
 Since this is a very complex stored procedure, with tons of joins and
 custom
 processing, I am trying everything to avoid messing with it.

 See below a copy of my data-config.xml file. I made it simpler omitting
 all
 the fields, since it's out of scope of the issue:
 ?xml version=1.0 encoding=UTF-8 ?
 dataConfig
 dataSource type=JdbcDataSource
 driver=com.microsoft.sqlserver.jdbc.SQLServerDriver
 url=jdbc:sqlserver://myserver;databaseName=mydb;user=username;password=
 password;responseBuffering=adaptive;

 /
 document
 entity name=entity_one
 pk=entityid
 transformer=RegexTransformer
 query=EXEC some_stored_procedure ${dataimporter.request.someid}
 preImportDeleteQuery=status:1 postImportDeleteQuery=status:1
 
 field column=field1 name=field1 splitBy=; /
 field column=field2 name=field2 splitBy=; /
 field column=field3 name=field3 splitBy=; /
 /entity

 entity name=entity_two
 pk=entityid
 transformer=RegexTransformer
 query=EXEC someother_stored_procedure
 ${dataimporter.request.someotherid}
 preImportDeleteQuery=status:1 postImportDeleteQuery=status:1
 
 field column=field1 name=field1 /
 field column=field2 name=field2 /
 field column=field3 name=field2 /
 /entity
 /document
 /dataConfig

 Any ideas or pointers that might help on this one?

 Many thanks,
 Alexandre



Re: DIH import and postImportDeleteQuery

2011-05-25 Thread Alexandre Rocco
Hi James,

Thanks for the heads up!
I am currently on version 1.4.1, so I can apply this patch and see if it
works.
I just need to assess whether it's better to apply the patch, or to check on the
backend system whether only delete requests were generated and, in that case, not
call DIH at all.

Previously, I found another open issue, created by Ephraim:
https://issues.apache.org/jira/browse/SOLR-2104

It's the same issue, but it hasn't had any updates yet.

Regards,
Alexandre

On Wed, May 25, 2011 at 3:17 PM, Dyer, James james.d...@ingrambook.comwrote:

 The failure to commit bug with $deleteDocById can be fixed by applying
 patch SOLR-2492.  This patch also partially fixes the no updated stats bug
 in that it increments 1 for every call to $deleteDocById and
 $deleteDocByQuery.  Note that this might result in inaccurate counts if the
 id given with $deleteDocById doesn't exist or is duplicated.  Obviously this
 is not a complete fix for stats using $deleteDocByQuery as this command
 would normally be used to delete 1 doc at a time.

 The patch is for Trunk but it might work with 3.1 also.  If not, it likely
 only needs minor tweaking.

 The jira ticket is here:  https://issues.apache.org/jira/browse/SOLR-2492

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Alexandre Rocco [mailto:alel...@gmail.com]
 Sent: Wednesday, May 25, 2011 12:54 PM
 To: solr-user@lucene.apache.org
 Subject: Re: DIH import and postImportDeleteQuery

 Hi Ephraim,

 Thank you so much for the input.
 I was able to find your thread on the archives and got your solution to
 work.

 In fact, when using $deleteDocById and $skipDoc it worked like a charm.
 This
 feature is very useful, it's a shame it's not properly documented.
 The only downside is the one you mentioned that the stats are not updated,
 so if I update 13 documents and delete 2, DIH would tell me that only 13
 documents were processed. This is bad in my case because I check the end
 result to generate an error e-mail if needed.

 You also mentioned that if the query contains only deletion records, a
 commit would not be automatically executed and it would be necessary to
 commit manually.

 How can I commit manually via DIH? I was not able to find any references on
 the documentation.

 Thanks!
 Alexandre

 On Wed, May 25, 2011 at 5:14 AM, Ephraim Ofir ephra...@icq.com wrote:

  Search the list for my post DIH - deleting documents, high performance
  (delta) imports, and passing parameters which shows my solution a
  similar problem.
 
  Ephraim Ofir
 
  -Original Message-
  From: Alexandre Rocco [mailto:alel...@gmail.com]
  Sent: Tuesday, May 24, 2011 11:24 PM
  To: solr-user@lucene.apache.org
  Subject: DIH import and postImportDeleteQuery
 
  Guys,
 
  I am facing a situation in one of our projects that I need to perform a
  cleanup to remove some documents after we perform an update via DIH.
  The big issue right now comes from the fact that when we call the DIH
  with
  clean=false, the postImportDeleteQuery is not executed.
 
  My setup is currently arranged like this:
  - A SQL Server stored procedure that receives a parameter (specified in
  the
  URL) and returns the records to be indexed
  - The procedure is able to return all the records (for a full-import) or
  only the updated records (for a delta-import)
  - This procedure returns valid and deleted records, from this point
  comes
  the need to run a postImportDeleteQuery to remove the deleted ones.
 
  Everything works fine when I run a full-import, I am running always with
  clean=true, and then the whole index is rebuilt.
  When I need to do an incremental update, the records are updated
  correctly,
  but the command to delete the other records is not executed.
 
  I've tried several combinations, with different results:
  - Running full-import with clean=false: the records are updated but the
  ones
  that needs to be deleted stays on the index
  - Running delta-import with clean=false: the records are updated but the
  ones that needs to be deleted stays on the index
  - Running delta-import with clean=true: all records are deleted from the
  index and then only the records returned by the procedure are on the
  index,
  except the deleted ones.
 
  I don't see any way to achieve my goal, without changing the process
  that I
  do to obtain the data.
  Since this is a very complex stored procedure, with tons of joins and
  custom
  processing, I am trying everything to avoid messing with it.
 
  See below a copy of my data-config.xml file. I made it simpler omitting
  all
  the fields, since it's out of scope of the issue:
  ?xml version=1.0 encoding=UTF-8 ?
  dataConfig
  dataSource type=JdbcDataSource
  driver=com.microsoft.sqlserver.jdbc.SQLServerDriver
  url=jdbc:sqlserver://myserver;databaseName=mydb;user=username;password=
  password;responseBuffering=adaptive;
 
  /
  document
  entity name=entity_one
  pk=entityid
  transformer=RegexTransformer

DIH import and postImportDeleteQuery

2011-05-24 Thread Alexandre Rocco
Guys,

I am facing a situation in one of our projects where I need to perform a
cleanup to remove some documents after we perform an update via DIH.
The big issue right now comes from the fact that when we call DIH with
clean=false, the postImportDeleteQuery is not executed.

My setup is currently arranged like this:
- A SQL Server stored procedure that receives a parameter (specified in the
URL) and returns the records to be indexed
- The procedure is able to return all the records (for a full-import) or
only the updated records (for a delta-import)
- This procedure returns valid and deleted records, from this point comes
the need to run a postImportDeleteQuery to remove the deleted ones.

Everything works fine when I run a full-import; I always run it with
clean=true, and then the whole index is rebuilt.
When I need to do an incremental update, the records are updated correctly,
but the command to delete the other records is not executed.

I've tried several combinations, with different results:
- Running full-import with clean=false: the records are updated, but the ones
that need to be deleted stay in the index
- Running delta-import with clean=false: the records are updated, but the
ones that need to be deleted stay in the index
- Running delta-import with clean=true: all records are deleted from the
index and then only the records returned by the procedure are in the index,
except the deleted ones.

I don't see any way to achieve my goal without changing the process that I
use to obtain the data.
Since this is a very complex stored procedure, with tons of joins and custom
processing, I am trying everything to avoid messing with it.

See below a copy of my data-config.xml file. I made it simpler by omitting all
the fields, since they are out of scope for this issue:
<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
  <dataSource type="JdbcDataSource"
    driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url="jdbc:sqlserver://myserver;databaseName=mydb;user=username;password=password;responseBuffering=adaptive;"
  />
  <document>
    <entity name="entity_one"
      pk="entityid"
      transformer="RegexTransformer"
      query="EXEC some_stored_procedure ${dataimporter.request.someid}"
      preImportDeleteQuery="status:1" postImportDeleteQuery="status:1"
    >
      <field column="field1" name="field1" splitBy=";" />
      <field column="field2" name="field2" splitBy=";" />
      <field column="field3" name="field3" splitBy=";" />
    </entity>

    <entity name="entity_two"
      pk="entityid"
      transformer="RegexTransformer"
      query="EXEC someother_stored_procedure ${dataimporter.request.someotherid}"
      preImportDeleteQuery="status:1" postImportDeleteQuery="status:1"
    >
      <field column="field1" name="field1" />
      <field column="field2" name="field2" />
      <field column="field3" name="field2" />
    </entity>
  </document>
</dataConfig>
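
For completeness, the handler is invoked with the parameter passed in the URL,
roughly like this (the someid value here is made up):
http://localhost:8984/solr/dataimport?command=delta-import&clean=false&commit=true&someid=123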

Any ideas or pointers that might help on this one?

Many thanks,
Alexandre


Re: Distances in spatial search (Solr 4.0)

2011-03-01 Thread Alexandre Rocco
Hi Bill,

I was using a different approach to sort by distance, with the dist()
function, since geodist() is not documented on the wiki (
http://wiki.apache.org/solr/FunctionQuery).

Tried something like:
sort=dist(2, 45.15,-93.85, lat, lng) asc

I made some tests with the geodist() function as you pointed out and got
different results.
Is it safe to assume that geodist() is the correct way of doing it?
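
For reference, the sort that seems to behave in my tests looks roughly like this
(the sfield/pt values are just the ones I used while testing):
...&q=*:*&sfield=local&pt=45.15,-93.85&sort=geodist() asc
And from what I gather on the wiki, the distance itself can be pulled back by
making it the score, e.g. q={!func}geodist()&fl=*,score (still confirming this).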

Also, can you clarify how I can see the distance using the _val_ approach you
mentioned?

Thanks!
Alexandre

On Tue, Mar 1, 2011 at 12:03 AM, Bill Bell billnb...@gmail.com wrote:

 Use sort with geodist() to sort by distance.

 Getting the distance returned us documented on the wiki if you are not
 using score. see reference to _Val_

 Bill Bell
 Sent from mobile


 On Feb 28, 2011, at 7:54 AM, Alexandre Rocco alel...@gmail.com wrote:

  Hi guys,
 
  We are implementing a separate index on our website, that will be
 dedicated
  to spatial search.
  I've downloaded a build of Solr 4.0 to try the spatial features and got
 the
  geodist working really fast.
 
  We now have 2 other features that will be needed on this project:
  1. Returning the distance from the reference point to the search hit (in
  kilometers)
  2. Sorting by the distance.
 
  On item 2, the wiki doc points that a distance function can be used but I
  was not able to find good info on how to accomplish it.
  Also, returning the distance (item 1) is noted as currently being in
  development and there is some workaround to get it.
 
  Anyone had experience with the spatial feature and could help with some
  pointers on how to achieve it?
 
  Thanks,
  Alexandre



Distances in spatial search (Solr 4.0)

2011-02-28 Thread Alexandre Rocco
Hi guys,

We are implementing a separate index on our website, that will be dedicated
to spatial search.
I've downloaded a build of Solr 4.0 to try the spatial features and got the
geodist working really fast.

We now have 2 other features that will be needed on this project:
1. Returning the distance from the reference point to the search hit (in
kilometers)
2. Sorting by the distance.

On item 2, the wiki doc points out that a distance function can be used, but I
was not able to find good info on how to accomplish it.
Also, returning the distance (item 1) is noted as currently being in
development, and there is some workaround to get it.

Anyone had experience with the spatial feature and could help with some
pointers on how to achieve it?

Thanks,
Alexandre


DataImportHandler in Solr 4.0

2011-02-23 Thread Alexandre Rocco
Hi guys,

I'm having some issues when trying to use the DataImportHandler on Solr 4.0.
I've downloaded the latest nightly build of Solr 4.0 and configured the
solrconfig.xml file (in the example folder) normally, like this:

<requestHandler name="/dataimport"
  class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

At this point I noticed that the DIH jar was not being loaded correctly,
causing exceptions like:
Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'
and
java.lang.ClassNotFoundException:
org.apache.solr.handler.dataimport.DataImportHandler

Do I need to build it separately to get DIH running on Solr 4.0?

Thanks!
Alexandre


Re: DataImportHandler in Solr 4.0

2011-02-23 Thread Alexandre Rocco
I got it working by building the DIH from the contrib folder and changing
the lib statements to point at the folder that contains the .jar files.

Thanks!
Alexandre
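
For anyone hitting the same problem, this is roughly what the lib statements look like in solrconfig.xml; the dir values here are assumptions based on the stock example layout and should point at wherever your DIH jars actually ended up after the build (paths are resolved relative to the core's instanceDir):

<lib dir="../../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />
<lib dir="../../contrib/dataimporthandler/lib/" regex=".*\.jar" />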

On Wed, Feb 23, 2011 at 8:55 PM, Smiley, David W. dsmi...@mitre.org wrote:

 The DIH is no longer supplied embedded in the Solr war file.  You need to
 get it on the classpath somehow. You could add another lib... statement to
 solrconfig.xml to resolve this.

 ~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

 On Feb 23, 2011, at 4:11 PM, Alexandre Rocco wrote:

  Hi guys,
 
  I'm having some issues when trying to use the DataImportHandler on Solr
 4.0.
 
  I've downloaded the latest nightly build of Solr 4.0 and configured
 normally
  (on the example folder) solrconfig.xml file like this:
 
  <requestHandler name="/dataimport"
  class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
  <str name="config">data-config.xml</str>
  </lst>
  </requestHandler>
 
  At this point I noticed that the DIH jar was not being loaded correctly
  causing exceptions like:
  Error loading class
 'org.apache.solr.handler.dataimport.DataImportHandler'
  and
  java.lang.ClassNotFoundException:
  org.apache.solr.handler.dataimport.DataImportHandler
 
  Do I need to build to get DIH running on Solr 4.0?
 
  Thanks!
  Alexandre


Faceting and first letter of fields

2010-10-14 Thread Alexandre Rocco
Guys,

We have a website running Solr indexing books, and we use a facet to filter
books by author.
After some time, we noticed that this facet has grown very large, and we need
some other feature to help users find the information.

Our product team asked us to create a page that shows all authors by their
initial letter, so we can split this query up more easily.
Is it feasible to create another field containing only the
initial letter of the author? Using this approach we would be able to
filter the authors on this newly created field.
Do you think there will be any performance penalty in creating a couple of
fields with the initial letter of these other fields (author, publisher)?

I guess that this approach is way easier than other solutions we came up
with.
Am I missing other alternatives?

Thanks,
Alexandre


Re: Faceting and first letter of fields

2010-10-14 Thread Alexandre Rocco
Thank you for both responses.

Another question I have is where this first-letter processing is best done.
I am considering updating my data import handler to execute a script that
extracts the first letter from the author field.

I saw another thread where someone mentioned using a field analyzer to extract
the letter with a regex.
Which one is the better option?

Thanks!
Alexandre
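
In case it helps to compare the two options, here is a rough sketch of the analyzer approach; the type and field names are just examples, and the regex keeps only the first (lower-cased) character of the author value:

<fieldType name="first_letter" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.).*$" replacement="$1"/>
  </analyzer>
</fieldType>

<field name="author_letter" type="first_letter" indexed="true" stored="false"/>
<copyField source="author" dest="author_letter"/>

With this in place nothing changes on the import side; the copyField does the work at index time.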

On Thu, Oct 14, 2010 at 4:46 PM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Thu, Oct 14, 2010 at 3:42 PM, Jonathan Rochkind rochk...@jhu.edu
 wrote:
  I believe that should work fine in Solr 1.4.1.  Creating a field with
 just
  first letter of author is definitely the right (possibly only) way to
 allow
  faceting on the first letter of the author's name.
 
  I have very voluminous facets (few facet values, many docs in each value)
  like that in my app too, works fine.
 
  I get confused over the different faceting methods available in 1.4.1,
 and
  exactly when each is called for. If you see initial problems, you could
 try
  switching the facet.method and see what happens.

 Right - for faceting on first letter, you should probably use
 facet.method=enum
 since there will only be 26 values (assuming english/western languages).

 In the future, I'm hoping we can come up with a smarter way to pick
 the facet.method if it's not supplied.  The new flex API in 4.0-dev
 should help out here.

 -Yonik
 http://www.lucidimagination.com
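
For the record, this is the kind of request the first-letter facet ends up as, assuming the extra field is called author_letter as sketched above:

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=author_letter&facet.method=enum&facet.mincount=1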



Re: Jetty returning HTTP error code 413

2010-08-19 Thread Alexandre Rocco
Hi didier,

I have updated my etc/jetty.xml and doubled the headerBufferSize:
<Set name="headerBufferSize">16384</Set>

But the error persists. Do you know if there is any other config that should
be updated so this setting works?
Also, is there any way to check whether Jetty is using this config from the Solr
admin pages? I know that we can check the Java properties, but I haven't
found any way to locate the Jetty config there.

Thanks!
Alexandre
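
For reference, here is roughly where that setting lives in the example etc/jetty.xml; the connector class and the other values mirror the stock Solr example (a Jetty 6 style connector) and may differ in your install, with the headerBufferSize line being the addition, and Jetty has to be restarted for the change to take effect:

<Call name="addConnector">
  <Arg>
    <New class="org.mortbay.jetty.bio.SocketConnector">
      <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
      <Set name="maxIdleTime">50000</Set>
      <Set name="headerBufferSize">65536</Set>
    </New>
  </Arg>
</Call>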

On Wed, Aug 18, 2010 at 4:58 PM, didier deshommes dfdes...@gmail.com wrote:

 Hi Alexandre,
 Have you tried setting a higher headerBufferSize?  Look in
 etc/jetty.xml and search for 'headerBufferSize'; I think it controls
 the size of the url. By default it is 8192.

 didier

 On Wed, Aug 18, 2010 at 2:43 PM, Alexandre Rocco alel...@gmail.com
 wrote:
  Guys,
 
  We are facing an issue executing very large query (~4000 bytes in the
 URL)
  in Solr.
  When we execute the query, Solr (probably Jetty) returns a HTTP 413 error
  (FULL HEAD).
 
  I guess that this is related to the very big query being executed, and
  currently we can't make it short.
  Is there any configuration that need to be tweaked on Jetty or other
  component to make this query work?
 
  Any advice is really appreciated.
 
  Thanks!
  Alexandre Rocco
 



Re: Jetty returning HTTP error code 413

2010-08-19 Thread Alexandre Rocco
Hi didier,

Nevermind.
I figured it out. There was some miscommunication between me and our IT guy.

Thanks for helping. It's fixed now.

Alexandre

On Thu, Aug 19, 2010 at 9:59 AM, Alexandre Rocco alel...@gmail.com wrote:

 Hi didier,

 I have updated my etc/jetty.xml and doubled the headerBufferSize:
 <Set name="headerBufferSize">16384</Set>

 But the error persists. Do you know if there is any other config that
 should be updated so this setting works?
 Also, is there any way to check whether Jetty is using this config from the Solr
 admin pages? I know that we can check the Java properties, but I haven't
 found any way to locate the Jetty config there.

 Thanks!
 Alexandre

 On Wed, Aug 18, 2010 at 4:58 PM, didier deshommes dfdes...@gmail.com wrote:

 Hi Alexandre,
 Have you tried setting a higher headerBufferSize?  Look in
 etc/jetty.xml and search for 'headerBufferSize'; I think it controls
 the size of the url. By default it is 8192.

 didier

 On Wed, Aug 18, 2010 at 2:43 PM, Alexandre Rocco alel...@gmail.com
 wrote:
  Guys,
 
  We are facing an issue executing very large query (~4000 bytes in the
 URL)
  in Solr.
  When we execute the query, Solr (probably Jetty) returns a HTTP 413
 error
  (FULL HEAD).
 
  I guess that this is related to the very big query being executed, and
  currently we can't make it short.
  Is there any configuration that need to be tweaked on Jetty or other
  component to make this query work?
 
  Any advice is really appreciated.
 
  Thanks!
  Alexandre Rocco
 





Jetty returning HTTP error code 413

2010-08-18 Thread Alexandre Rocco
Guys,

We are facing an issue executing a very large query (~4000 bytes in the URL)
in Solr.
When we execute the query, Solr (probably Jetty) returns an HTTP 413 error
(FULL HEAD).

I guess that this is related to the very big query being executed, and
currently we can't make it shorter.
Is there any configuration that needs to be tweaked on Jetty or another
component to make this query work?

Any advice is really appreciated.

Thanks!
Alexandre Rocco


Re: Storing RandomSortField

2010-05-19 Thread Alexandre Rocco
Leonardo,

I was able to use the feature with a dynamic field as described in the
documentation.
I was just curious to take a peek at the values that are generated, even
when the field is not dynamic, so I tried to figure out a way to do so.
Maybe some output when debugQuery is enabled would be useful, but it
seems that is not implemented yet.
I will take a look at the classes and see what I can do about it.

Thanks!
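
For reference, this is the dynamic-field setup I was referring to, plus the kind of sort parameter that goes with it; the names and the seed suffix are just examples:

<fieldType name="random" class="solr.RandomSortField" indexed="true"/>
<dynamicField name="random_*" type="random"/>

http://localhost:8983/solr/select?q=*:*&sort=random_1234 asc

The suffix acts as the seed: repeating the same field name reproduces the ordering, and changing it gives a new one, which is why nothing ever needs to be stored in the index.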

On Wed, May 19, 2010 at 5:34 AM, Leonardo Menezes 
leonardo.menez...@googlemail.com wrote:

 Hey,
   for random sorting, random values are generated at runtime using the seed
 you passed as one of the parameters to generate the value, among other
 things. This way, if the value you use as the seed is the same in different
 requests, the sort order should be the same. You could also, for debugging
 purposes, edit the RandomSortField class and put some traces in there, so
 it prints the id of the document and the generated value, for example.
 But the values won't be stored in the index.

 cheers

 On Wed, May 19, 2010 at 10:00 AM, Marco Martinez 
 mmarti...@paradigmatecnologico.com wrote:

  Hi Alexandre,
 
   I am not totally sure about this, but the random sort field is only used to
   do a random sort on your searches, and you need to pass different values to
   get different orderings. This only applies at search time, so no value is
   indexed. You will find more information here:
 
 
 http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html
 
  Marco Martínez Bautista
  http://www.paradigmatecnologico.com
  Avenida de Europa, 26. Ática 5. 3ª Planta
  28224 Pozuelo de Alarcón
  Tel.: 91 352 59 42
 
 
  2010/5/18 Alexandre Rocco alel...@gmail.com
 
   Hi guys,
  
   Is there any way to mak a RandomSortField be stored?
   I'm trying to do it for debugging purposes,
   My intention is to take a look at the values that are stored there to
   determine the sorting that is being applied to the results.
  
   I tried to make it a stored field as:
   field name=randomorder type=random stored=true /
  
   And also tried to create another text field, copying the result from
 the
   random field like this:
   field name=randomorderdebug type=text indexed=true
 stored=true/
   copyField source=randomorder dest=randomorderdebug/
  
   Neither of the approaches worked.
   Is there any restriction on this kind of field that prevents it from
  being
   displayed in the results?
  
   Thanks,
   Alexandre
  
 



Storing RandomSortField

2010-05-18 Thread Alexandre Rocco
Hi guys,

Is there any way to make a RandomSortField stored?
I'm trying to do it for debugging purposes.
My intention is to take a look at the values that are stored there to
determine the sorting that is being applied to the results.

I tried to make it a stored field as:
<field name="randomorder" type="random" stored="true" />

And also tried to create another text field, copying the result from the
random field like this:
<field name="randomorderdebug" type="text" indexed="true" stored="true"/>
<copyField source="randomorder" dest="randomorderdebug"/>

Neither of the approaches worked.
Is there any restriction on this kind of field that prevents it from being
displayed in the results?

Thanks,
Alexandre