Re: Geographical distance searching

2007-09-27 Thread Guillaume Smet
Hi Patrick,

On 9/27/07, patrick o'leary [EMAIL PROTECTED] wrote:
  p.s after a little tidy up I'll be adding this to both lucene and solr's 
 repositories if folks feel that it's a useful addition.

It's definitely very interesting. Did you compare the performance of
Lucene with a database that lets you perform real GIS queries?
I'm more of a PostgreSQL guy, and I must admit we usually use the cube
contrib module or PostGIS for this sort of thing. With both, we can use
indexes for proximity queries, and they can be pretty fast. The method
you used with MySQL is definitely too slow to be usable once you have
a certain amount of data in your table.

Regards,

-- 
Guillaume


Re: searching for non-empty fields

2007-09-27 Thread Pieter Berkel
While in theory -URL:"" should be valid syntax, the Lucene query parser
doesn't accept it and throws a ParseException.  I've considered raising this
issue on lucene-dev but it didn't seem to affect many users so I decided not
to pursue the matter.



On 27/09/2007, Chris Hostetter [EMAIL PROTECTED] wrote:

 ...and to work around the problem until you reindex...

 q=(URL:[* TO *] -URL:"")

 ...at least: i'm 97% certain that will work.  it won't help if your empty
 values are really " " or "  " or ...




Re: Geographical distance searching

2007-09-27 Thread patrick o'leary




As far as I'm concerned, nothing's going to beat PG's GIS calculations,
but its tsearch was a lot slower than MyISAM.

My goal was a single solution to reduce our complexity, but I'm
interested to know if combining both an RDBMS and Lucene works for you.
Definitely let me know how it goes!

P

Guillaume Smet wrote:

  Hi Patrick,

On 9/27/07, patrick o'leary [EMAIL PROTECTED] wrote:
  
  
 p.s after a little tidy up I'll be adding this to both lucene and solr's repositories if folks feel that it's a useful addition.

  
  
It's definitely very interesting. Did you compare the performance of
Lucene with a database that lets you perform real GIS queries?
I'm more of a PostgreSQL guy, and I must admit we usually use the cube
contrib module or PostGIS for this sort of thing. With both, we can use
indexes for proximity queries, and they can be pretty fast. The method
you used with MySQL is definitely too slow to be usable once you have
a certain amount of data in your table.

Regards,

  


-- 
Patrick O'Leary


You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles.
 Do you understand this? 
And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat.
  - Albert Einstein






Re: searching for non-empty fields

2007-09-27 Thread Brian Whitman

thanks Peter, Hoss and Ryan..


q=(URL:[* TO *] -URL:"")


This gives me 400 Query parsing error: Cannot parse '(URL:[* TO *] -URL:"")':
Lexical error at line 1, column 29. Encountered: "\"" (34), after : "\""




adding something like:
  <filter class="solr.LengthFilterFactory" min="1" max="1" />


I'll do this but the problem here is I have to wait around for all  
these docs to re-index..


Your query will work if you make sure the URL field is omitted from the
document at index time when the field is blank.


The thing is, I thought I was omitting the field if it's blank. It's  
in a solrj instance that takes a lucenedocument, so maybe it's a  
solrj issue?


   if (URL != null && URL.length() > 5)
      doc.add(new Field("URL", URL, Field.Store.YES, Field.Index.UN_TOKENIZED));


And then during indexing:

SimpleSolrDoc solrDoc = new SimpleSolrDoc();
solrDoc.setBoost(null, new Float(doc.getBoost()));
for (Enumeration<Field> e = doc.fields(); e.hasMoreElements();) {
  Field field = e.nextElement();
  if (!ignoreFields.contains(field.name())) {
    solrDoc.addField(field.name(), field.stringValue());
  }
}
try {
  solr.add(solrDoc);
...
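
[Editor's sketch] One way to rule out blank values slipping through is to filter again at the copy step, just before the document is sent to Solr. The snippet below is only an illustration under stated assumptions: it uses a plain Map as a stand-in for the Lucene Document and the SimpleSolrDoc, and the class and method names (BlankFieldFilter, dropBlankFields) are hypothetical, not part of solrj.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of solrj: keeps only fields with real content.
final class BlankFieldFilter {

    // Returns a copy of 'fields' without entries whose value is null or whitespace-only.
    static Map<String, String> dropBlankFields(Map<String, String> fields) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            String value = e.getValue();
            if (value != null && !value.trim().isEmpty()) {
                kept.put(e.getKey(), value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "42");
        doc.put("URL", "   "); // whitespace only: should never reach the index
        System.out.println(dropBlankFields(doc).keySet());
    }
}
```

If a field is dropped this way, a query such as URL:[* TO *] should only match documents that really carry a URL.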







LockObtainFailedException

2007-09-27 Thread Jae Joo
will anyone help me why and how?


org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@/usr/local/searchengine/apache-solr-1.2.0/fr_companies/solr/data/index/write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:70)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:579)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:341)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:65)
    at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:120)
    at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:181)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:259)
    at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:166)
    at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)

Thanks,

Jae Joo


Re: LockObtainFailedException

2007-09-27 Thread matt davies

quick fix

look for a lucene lock file in your tmp directory and delete it, then  
restart solr, should start


I am an idiot though, so be careful, in fact, I'm worse than an  
idiot, I know a little


:-)

you got a lock file somewhere though, deleting that will help you  
out, for me it was in my /tmp directory


On 27 Sep 2007, at 14:10, Jae Joo wrote:


will anyone help me why and how?


org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@/usr/local/searchengine/apache-solr-1.2.0/fr_companies/solr/data/index/write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:70)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:579)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:341)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:65)
    at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:120)
    at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:181)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:259)
    at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:166)
    at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)

Thanks,

Jae Joo




Re: What is facet?

2007-09-27 Thread Erik Hatcher


On Sep 26, 2007, at 7:28 PM, Chris Hostetter wrote:
  cool = (popularity:[100 TO *] (+numFeatures:[10 TO *] +price:[0 TO 10]))
  lame = (+popularity:[* TO 99] +numFeatures:[* TO 9] +price:[11 TO *])


That example is definitely in the cool category.   I couldn't resist
creating a SolrTerminology wiki page linking to your post and
breaking out the definitions we Solr folks want to embrace.  I think
it's a good idea to have some common language definitions we agree upon here.


Erik



Re: searching for non-empty fields

2007-09-27 Thread Yonik Seeley
On 9/27/07, Pieter Berkel [EMAIL PROTECTED] wrote:
 While in theory -URL:"" should be valid syntax, the Lucene query parser
 doesn't accept it and throws a ParseException.

I don't have time to work on that now, but I did just open a bug:
https://issues.apache.org/jira/browse/LUCENE-1006

-Yonik


Request for graphics

2007-09-27 Thread Benjamin Liles
I am trying to make a presentation on SOLR and have been unable to find 
the SOLR graphic in high quality.  Could someone point me in the right 
direction or provide the graphics?


Thanks,

Benjamin Liles

Lead Software Application Developer
Digital Initiatives - Web Services
University Libraries
Texas A&M University
[EMAIL PROTECTED]

3.109E Library Annex | 5000 TAMU | College Station, TX 77843

Tel. 979.862.4948x122

http://library.tamu.edu



Re: moving index

2007-09-27 Thread Yonik Seeley
On 9/27/07, Jae Joo [EMAIL PROTECTED] wrote:
 I need to move the index files, but I'm concerned about potential problems,
 including performance.
 Do I have to keep the original documents for querying?

I assume you posted XML documents in Solr XML format (like <add><doc>...)?
If so, that is just an example way to get the data into Solr.  Those
XML files aren't needed, and any high-speed indexing will avoid
creating files at all - just create the XML doc in memory and send to
solr via HTTP-POST.

-Yonik
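
[Editor's sketch] To make the "create the XML doc in memory" suggestion concrete, here is a minimal Java example that builds a Solr <add><doc>...</doc></add> message as a String, without writing any file. The class and method names are hypothetical. The HTTP POST itself is left out, since the update URL depends on the deployment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: builds a Solr XML update message entirely in memory.
final class SolrAddXml {

    // Escapes the characters that are special in XML text and attribute values.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Renders one document; field order follows the map's iteration order.
    static String addDoc(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "1");
        fields.put("name", "A & B");
        System.out.println(addDoc(fields));
    }
}
```

The resulting string can be sent as the body of an HTTP POST to the update handler, so nothing from the original input needs to stay on disk once the documents are indexed.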


Re: Converting German special characters / umlaute

2007-09-27 Thread Steven Rowe
Chris Hostetter wrote:
 : is there an analyzer which automatically converts all german special
 : characters to their specific dissected form, such as ü to ue and ä to
 : ae, etc.?!
 
 See also the ISOLatin1TokenFilter which does this regardless of language.

Actually, ISOLatin1TokenFilter does NOT convert /ü/ to /ue/, /ä/ to
/ae/, etc.

Instead, it converts /ü/ to /u/, /ä/ to /a/, etc.  It *does* convert /ß/
to /ss/, though I've seen some people write that the correct
substitution for /ß/ in German is /sz/ - I don't speak or read German,
so I don't know.

Maybe there should be an option on ISOLatin1TokenFilter to use German
substitutions, in addition to the current behavior of simply stripping
diacritics?

Does anyone know if there are other (Latin-1-utilizing) languages
besides German with standardized diacritic substitutions that involve
something other than just stripping the diacritics?

Steve



Problem with handle hold deleted files

2007-09-27 Thread Danilo Fantinato
Hi,
I'm using EmbeddedSolrServer. When the snapinstaller process starts, I call
the commit method of the embedded Solr through a servlet, but the JVM keeps
handles to deleted files open at the operating-system level, so disk usage
grows excessively. Here is a sample line from the command lsof | grep deleted:
java  17255 weblogic  419r  REG  104,6  437821226462
/domains/solr-indexes/q/OPNPrecoIndex/datasolr/index_22746_preCommit/_2kb.cfs
(deleted)

After restarting the JVM process, the handles to the deleted files were
released and the disk space was freed.

I need help on this case.


Re: LockObtainFailedException

2007-09-27 Thread Jae Joo
In solrconfig.xml,
<useCompoundFile>false</useCompoundFile>
<mergeFactor>10</mergeFactor>
<maxBufferedDocs>25000</maxBufferedDocs>
<maxMergeDocs>1400</maxMergeDocs>
<maxFieldLength>500</maxFieldLength>
<writeLockTimeout>1000</writeLockTimeout>
<commitLockTimeout>1</commitLockTimeout>

Is writeLockTimeout too small?

Thanks,

Jae
On 9/27/07, matt davies [EMAIL PROTECTED] wrote:

 quick fix

 look for a lucene lock file in your tmp directory and delete it, then
 restart solr, should start

 I am an idiot though, so be careful, in fact, I'm worse than an
 idiot, I know a little

 :-)

 you got a lock file somewhere though, deleting that will help you
 out, for me it was in my /tmp directory

 On 27 Sep 2007, at 14:10, Jae Joo wrote:

  will anyone help me why and how?
 
 
  org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@/usr/local/searchengine/apache-solr-1.2.0/fr_companies/solr/data/index/write.lock
      at org.apache.lucene.store.Lock.obtain(Lock.java:70)
      at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:579)
      at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:341)
      at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:65)
      at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:120)
      at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:181)
      at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:259)
      at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:166)
      at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
 
  Thanks,
 
  Jae Joo




Re: searching for non-empty fields

2007-09-27 Thread Yonik Seeley
On 9/27/07, Yonik Seeley [EMAIL PROTECTED] wrote:
 On 9/27/07, Pieter Berkel [EMAIL PROTECTED] wrote:
  While in theory -URL:"" should be valid syntax, the Lucene query parser
  doesn't accept it and throws a ParseException.

 I don't have time to work on that now,

OK, I lied :-)  It was simple (and a nice diversion).

-Yonik

 but I did just open a bug:
 https://issues.apache.org/jira/browse/LUCENE-1006


Re: Converting German special characters / umlaute

2007-09-27 Thread J.J. Larrea
At 12:13 PM -0400 9/27/07, Steven Rowe wrote:
Chris Hostetter wrote:
 : is there an analyzer which automatically converts all german special
 : characters to their specific dissected form, such as ü to ue and ä to
 : ae, etc.?!

 See also the ISOLatin1TokenFilter which does this regardless of language.

Actually, ISOLatin1TokenFilter does NOT convert /ü/ to /ue/, /ä/ to
/ae/, etc.

Instead, it converts /ü/ to /u/, /ä/ to /a/, etc.  It *does* convert /ß/
to /ss/, though I've seen some people write that the correct
substitution for /ß/ in German is /sz/ - I don't speak or read German,
so I don't know.

You and lots of other people, including myself... Thus while there is indeed a 
specific dissected form -- certainly German speakers clearly understand that 
when an input mechanism doesn't allow for umlauted vowels (e.g. ASCII, 
non-German typewriters) that the /ue/, /ae/, etc. equivalents are to be used -- 
if maximally flexible matching between input texts and queries is desired, an 
information system used by non-German speakers has to account for them simply 
ignoring the umlaut and entering /u/, /e/ etc. while /ß/ needs to be matched as 
itself, /ss/, /sz/ (/ß/ is read as 'ess zed'), and I expect even /b/.

So perhaps it would make sense for translation into a canonical format /ü/ to 
/ue/ and /ß/ to /ss/ at both index and query time, but also to then emit 
synonym (overlapping) tokens with /ue/ -> /u/, /sz/ -> /ss/, and perhaps even 
/b/ -> /ss/.

(This is just thinking aloud and I'd love to be corrected by someone with more 
experience in this realm)

Maybe there should be an option on ISOLatin1TokenFilter to use German
substitutions, in addition to the current behavior of simply stripping
diacritics?

As for implementation, the first part could easily and flexibly be accomplished 
with the current PatternReplaceFilter, and I'm thinking the second could be 
done with an extension to that or better yet a new Filter which allows parsing 
synonymous tokens from a flat to overlaid format, e.g. something on the order 
of:

<filter class="solr.PatternReplaceFilterFactory"
 pattern="(.*)(ü|ue)(.*)"
 replacement="$1ue$3|$1u$3"
 tokensep="|"
 replace="first"/>  <!-- tokensep is not currently implemented -->

or perhaps better,

<filter class="solr.PatternReplaceFilterFactory"
 pattern="(.*)(ü|ue)(.*)"
 replacement="$1ue$3|$1u$3"
 replace="first"/>
<filter class="solr.OverlayTokenFilterFactory"
 tokensep="|"/>   <!-- not currently implemented -->

which in my fantasy implementation would map:

Müller -> Mueller|Muller
Mueller -> Mueller|Muller
Muller -> Muller

and could be run at index-time and/or query-time as appropriate.

Does anyone know if there are other (Latin-1-utilizing) languages
besides German with standardized diacritic substitutions that involve
something other than just stripping the diacritics?

I'm curious about this too.

- J.J.
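
[Editor's sketch] The mapping described in this message (canonical form plus a stripped synonym) can be tried out in plain Java before writing any filter. The sketch below is not a Solr or Lucene API: UmlautVariants and its methods are hypothetical names, and the handling of ä and ö alongside ü is an assumption that extends the ü-only pattern shown above. It also inherits the same weakness as the regex: any "ue" in a word (for example in "aktuell") is treated as a spelled-out umlaut.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: produces the overlapping variants for one token.
final class UmlautVariants {

    // Canonical form: umlauts and sharp s are spelled out (e.g. \u00fc becomes "ue").
    static String canonical(String token) {
        return token
            .replace("\u00e4", "ae").replace("\u00f6", "oe").replace("\u00fc", "ue")
            .replace("\u00c4", "Ae").replace("\u00d6", "Oe").replace("\u00dc", "Ue")
            .replace("\u00df", "ss");
    }

    // Stripped form: spelled-out umlauts reduced to the bare vowel ("ue" becomes "u").
    static String stripped(String canonicalToken) {
        return canonicalToken
            .replace("ae", "a").replace("oe", "o").replace("ue", "u")
            .replace("Ae", "A").replace("Oe", "O").replace("Ue", "U");
    }

    // Returns the canonical form first, then the stripped form if it differs.
    static List<String> variants(String token) {
        Set<String> out = new LinkedHashSet<>();
        String canon = canonical(token);
        out.add(canon);
        out.add(stripped(canon));
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        for (String t : new String[] {"M\u00fcller", "Mueller", "Muller"}) {
            System.out.println(t + " -> " + String.join("|", variants(t)));
        }
    }
}
```

In a real token filter the second variant would be emitted at the same position as the first (position increment 0), which is what makes it behave like a synonym.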


Re: Converting German special characters / umlaute

2007-09-27 Thread Walter Underwood
Accent transforms are language-specific, so an accent filter
should take an ISO language code as an argument.

Some examples:

* In French and English, a diaeresis is a hint to pronounce neighboring
vowels separately, as in coöp, naïve, or Noël.

* In German, ü transforms to ue.

* In Swedish, ö is a different letter than o, and should
not be transformed. The same is true for ø in Danish and
Norwegian.

* Then there is Motörhead and Mötley Crüe, see:
http://en.wikipedia.org/wiki/Heavy_metal_umlaut

* I don't know of an ISO language code for Tolkien's
Elvish, so we're out of luck for Manwë.

Another approach would be to generate the accent-transformed
terms as synonyms at the same token position. Then you could
generate multiple options.

Obviously, we had to do this right for Ultraseek a few years ago.

wunder

On 9/27/07 9:13 AM, Steven Rowe [EMAIL PROTECTED] wrote:

 Maybe there should be an option on ISOLatin1TokenFilter to use German
 substitutions, in addition to the current behavior of simply stripping
 diacritics?
 
 Does anyone know if there are other (Latin-1-utilizing) languages
 besides German with standardized diacritic substitutions that involve
 something other than just stripping the diacritics?
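
[Editor's sketch] A filter keyed on a language code, as suggested in this message, might look roughly like the following. This is a simplified illustration, not an existing Lucene or Solr class: AccentFolder and its methods are hypothetical, only a few language codes are handled, and the Scandinavian branch leaves the whole token untouched, which is cruder than a real implementation would be.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical language-aware accent folding, keyed on an ISO 639-1 code.
final class AccentFolder {

    static String fold(String token, String isoLanguage) {
        // Compose first so that "u" + combining diaeresis is seen as one character.
        String s = Normalizer.normalize(token, Normalizer.Form.NFC);
        switch (isoLanguage.toLowerCase(Locale.ROOT)) {
            case "de":
                // German: spell the umlauts out, then strip any remaining marks.
                return stripMarks(s
                    .replace("\u00e4", "ae").replace("\u00f6", "oe").replace("\u00fc", "ue")
                    .replace("\u00c4", "Ae").replace("\u00d6", "Oe").replace("\u00dc", "Ue")
                    .replace("\u00df", "ss"));
            case "sv":
            case "da":
            case "no":
                // Accented characters are distinct letters here: leave them alone.
                return s;
            default:
                // Elsewhere (e.g. English, French): just drop the diacritics.
                return stripMarks(s);
        }
    }

    // Decomposes the string and removes all combining marks.
    static String stripMarks(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("M\u00fcller", "de"));
        System.out.println(fold("na\u00efve", "en"));
        System.out.println(fold("K\u00f6ping", "sv"));
    }
}
```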



Re: Date facetting and ranges overlapping

2007-09-27 Thread Chris Hostetter
: I'm now using date facetting to browse events. It works really fine
: and is really useful. The only problem so far is that if I have an
: event which is exactly on the boundary of two ranges, it is referenced
: 2 times.

yeah, this is one of the big caveats with date faceting right now ... i 
struggled with this a bit when designing it, and ultimately decided to 
punt on the issue.  the biggest hangup was that even if the facet counting 
code was smart about making sure the ranges don't overlap, the range query 
syntax in the QueryParser doesn't support ranges that exclude one endpoint 
(so there wouldn't be a lot you can do with the ranges once you know the 
counts in them)

one idea i had in SOLR-258 was that we could add an interval option that 
would define how much to add to the end of one range to get the start 
of another range (think of the current implementation as having the interval 
hardcoded to 0) which would solve the problem and work with range 
queries that were inclusive of both endpoints, but would require people to 
use -1MILLI a lot.

a better option (assuming a query parser change) would be a new option 
that says whether each computed range should be inclusive of the low point, 
the high point, both end points, neither end point, or be smart (where 
smart is the same as low except for the last range, where it includes 
both)

(I think there's already a lucene issue to add the query parser support, i 
just haven't had time to look at it)

The simple workaround: if you know all of your data is indexed with 
perfect 0.000 second precision, then put -1MILLI at the end of your start 
and end date faceting params.



-Hoss
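
[Editor's sketch] The double counting and the -1MILLI workaround can be reproduced with a few lines of arithmetic. The Java below is only a model of the behaviour described in this message (ranges inclusive at both ends, timestamps as epoch milliseconds). FacetRanges and countInclusive are hypothetical names, not Solr code.

```java
import java.util.Arrays;

// Models date-facet ranges that are inclusive at both ends, like [low TO high].
final class FacetRanges {

    // Counts how many values fall into each of 'ranges' consecutive windows of width 'gap'.
    static int[] countInclusive(long[] values, long start, long gap, int ranges) {
        int[] counts = new int[ranges];
        for (int i = 0; i < ranges; i++) {
            long low = start + i * gap;
            long high = low + gap; // shared with the next range's low end
            for (long v : values) {
                if (v >= low && v <= high) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        // Three events with whole-second precision; the second sits exactly on a boundary.
        long[] events = {0L, hour, hour + 1000L};
        System.out.println(Arrays.toString(countInclusive(events, 0L, hour, 2)));
        // Shifting the start by one millisecond moves every boundary off the data.
        System.out.println(Arrays.toString(countInclusive(events, -1L, hour, 2)));
    }
}
```

With the unshifted start, the event on the boundary is counted in both ranges. With the start moved back by one millisecond, no whole-second timestamp can coincide with a boundary, so each event is counted once.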



Re: custom sorting

2007-09-27 Thread Erik Hatcher


On Sep 27, 2007, at 2:50 PM, Chris Hostetter wrote:

to answer the broader question of using customized Lucene
SortComparatorSource objects in solr -- it is in fact possible.

In Solr, all decisions about how to sort are driven by FieldTypes.  You
can subclass any of the FieldTypes that come with Solr and override just
the getSortField method to use whatever sort logic you want and then use
your new FieldType as you would any other plugin...

http://wiki.apache.org/solr/SolrPlugins

In the case where you have a custom SortComparatorSource that is not
field specific (or uses data from more than one field) you would need to
make your field type smart enough to let you configure (via the fieldType
declaration in the schema) which fields (if any) to get its data from,
and then create a marker field of that type, which you don't use to index
or store any data, but you use to indicate when to trigger your custom
sort logic, ie...

<fieldType name="distance" class="solr.YourField"
   latFieldName="latitude" lonFieldName="longitude"
   stored="false" indexed="false" />
...
   <field name="latitude" type="sint" indexed="true" stored="true" />
   <field name="longitude" type="sint" indexed="true" stored="true" />
   <field name="distance" type="distance" />

...and then use sort=distance+asc in your query


Using something like this, how would the custom SortComparatorSource  
get a parameter from the request to use in sorting calculations?


I haven't looked under the covers of the local-solr stuff that flew  
by earlier, but looks quite well done.  I think I can speak for many  
that would love to have geo field types / sorting capability built  
into Solr.


Erik



Selecting Distinct values?

2007-09-27 Thread David Whalen
Hi there.

Is there a query I can use to select distinct values in an index?
I thought I could use a facet, but the facets don't seem to return
all the distinct values in the index, only the highest-count ones.

Is there another query I can try?  Or, can I adjust the facets
somehow to make this work?

Thanks,

DW



Re: custom sorting

2007-09-27 Thread Yonik Seeley
On 9/27/07, Erik Hatcher [EMAIL PROTECTED] wrote:
 Using something like this, how would the custom SortComparatorSource
 get a parameter from the request to use in sorting calculations?

perhaps hook in via function query:
  dist(10.4,20.2,geoloc)

And either manipulate the score with that and sort by score,

q=+(foo bar)^0 dist(10.4,20.2,geoloc)
sort=score asc

or extend solr's sorting mechanisms to allow specifying a function to sort by.

sort=dist(10.4,20.2,geoloc) asc

-Yonik
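
[Editor's sketch] Independent of how the parameter reaches the sort code, the core of a distance sort is a comparator that closes over the request's reference point. The Java below illustrates that idea outside of Solr. DistanceSort, Place and byDistanceFrom are hypothetical names, the haversine formula on a spherical Earth (radius 6371 km) is an assumption about what dist(...) would compute, and the coordinates reuse the 10.4, 20.2 example from this message.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch: sorting by great-circle distance from a point supplied per request.
final class DistanceSort {

    static final double EARTH_RADIUS_KM = 6371.0;

    // Minimal stand-in for a document with stored latitude and longitude.
    static final class Place {
        final String id;
        final double lat;
        final double lon;

        Place(String id, double lat, double lon) {
            this.id = id;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Haversine distance in kilometres between two points given in degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // The request parameters (lat, lon) are captured by the comparator.
    static Comparator<Place> byDistanceFrom(double lat, double lon) {
        return Comparator.comparingDouble(p -> haversineKm(lat, lon, p.lat, p.lon));
    }

    public static void main(String[] args) {
        List<Place> places = new ArrayList<>();
        places.add(new Place("far", 11.4, 20.2));
        places.add(new Place("here", 10.4, 20.2));
        places.add(new Place("near", 10.5, 20.2));
        places.sort(byDistanceFrom(10.4, 20.2));
        for (Place p : places) {
            System.out.println(p.id);
        }
    }
}
```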


Re: Date facetting and ranges overlapping

2007-09-27 Thread Guillaume Smet
On 9/27/07, Chris Hostetter [EMAIL PROTECTED] wrote:
 a better option (assuming a query parser change) would be a new option
 that says whether each computed range should be inclusive of the low point,
 the high point, both end points, neither end point, or be smart (where
 smart is the same as low except for the last range, where it includes
 both)

That could be really cool.

 The simple workaround: if you know all of your data is indexed with
 perfect 0.000 second precision, then put -1MILLI at the end of your start
 and end date faceting params.

Good idea. The only problem is that I'll have to modify my client code
to deal with the fact that solr now returns 17:59:59 instead of
18:00:00. Not difficult but less clean than before.

Thanks for the advice. I'll give it a try.

--
Guillaume


Re: Selecting Distinct values?

2007-09-27 Thread Mike Klaas

On 27-Sep-07, at 12:01 PM, David Whalen wrote:


Hi there.

Is there a query I can use to select distinct values in an index?
I thought I could use a facet, but the facets don't seem to return
all the distinct values in the index, only the highest-count ones.

Is there another query I can try?  Or, can I adjust the facets
somehow to make this work?


http://wiki.apache.org/solr/SimpleFacetParameters#head-1b281067d007d3fb66f07a3e90e9b1704cbc59a3


cheers,
-Mike
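
[Editor's sketch] Assuming the linked wiki section is the one describing facet.limit (the anchor itself does not say), a request for every distinct value of a field would set that parameter to a negative value, which the facet documentation describes as "unlimited". The Java below only assembles such a query string. DistinctValuesQuery and the field name "category" are hypothetical, and the snippet needs Java 10 or later for the URLEncoder overload it uses.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: builds the parameters for "all distinct values of one field" via faceting.
final class DistinctValuesQuery {

    static String build(String field) {
        return "q=" + URLEncoder.encode("*:*", StandardCharsets.UTF_8) // match all documents
             + "&rows=0"                                               // no documents, only facets
             + "&facet=true"
             + "&facet.field=" + URLEncoder.encode(field, StandardCharsets.UTF_8)
             + "&facet.limit=-1";                                      // negative: no cap on values
    }

    public static void main(String[] args) {
        System.out.println(build("category"));
    }
}
```

On a field with very many distinct values the response can get large, so this is better suited to low-cardinality fields.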


Re: Date facetting and ranges overlapping

2007-09-27 Thread Guillaume Smet
On 9/27/07, Chris Hostetter [EMAIL PROTECTED] wrote:
 The simple workaround: if you know all of your data is indexed with
 perfect 0.000 second precision, then put -1MILLI at the end of your start
 and end date faceting params.

It fixed my problem. Thanks.

--
Guillaume


RE: What is facet?

2007-09-27 Thread Teruhiko Kurosaka
Thank you Ezra and Chris for explaining this,
and I like your idea, Erik.  This will make the introduction to Solr
easier for newcomers, and make Solr more popular.

-Kuro 


 That example is definitely in the cool category.   I couldn't resist
 creating a SolrTerminology wiki page linking to your post and
 breaking out the definitions we Solr folks want to embrace.  I think
 it's a good idea to have some common language definitions we agree
 upon here.
 
   Erik


RE: Selecting Distinct values?

2007-09-27 Thread David Whalen
<grin>  Silly me.  Thanks!

  

 -----Original Message-----
 From: Mike Klaas [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, September 27, 2007 4:46 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Selecting Distinct values?
 
 On 27-Sep-07, at 12:01 PM, David Whalen wrote:
 
  Hi there.
 
  Is there a query I can use to select distinct values in an index?
  I thought I could use a facet, but the facets don't seem to 
 return all 
  the distinct values in the index, only the highest-count ones.
 
  Is there another query I can try?  Or, can I adjust the 
 facets somehow 
  to make this work?
 
 http://wiki.apache.org/solr/SimpleFacetParameters#head-1b281067d007d3fb66f07a3e90e9b1704cbc59a3
 
 cheers,
 -Mike
 
 


Re: anyone can send me jetty-plus

2007-09-27 Thread Matt Kangas
If you're using Jetty 6, there's no need for a separate Jetty Plus  
download. The plus jarfiles come in the standard distribution.


--matt

On Sep 27, 2007, at 12:10 AM, James liu wrote:

i can't download it from http://jetty.mortbay.org/jetty5/plus/index.html


--
regards
jl


--
Matt Kangas / [EMAIL PROTECTED]