Re: Newbie with Java + typo

2008-01-21 Thread Michael Kimsal
Daniel:

As a fellow 'non-java' person I feel your pain (well, felt it anyway).  A
lot depends on your load and the machine, but I successfully ran the stock
jetty system on a box last summer for work and didn't have performance
problems.  The bigger issue was from the other java people complaining that
I hadn't used the standard jboss setup they had already working.  However, I
didn't have access to that machine, nor would anyone give it to me at the
time, so it was a catch 22.  Performance-wise, the stock jetty will probably
do just fine for you.  Longer term, you may want to learn more about jboss
or tomcat or something else which can give you more application management
options and such.

But don't let those things stop you from running jetty/solr in production -
it's worked fine for me.


On Jan 21, 2008 10:48 AM, Daniel Andersson [EMAIL PROTECTED] wrote:

 Hi people

 First the typo on http://wiki.apache.org/solr/mySolr:
 Production
 Typically it's not recommended do have your front end

 it should probably be ..recommended To have..



 Second, I don't know much about Java, nor about Jetty/Resin/JBoss/
 Tomcat. I went through the tutorial and was impressed with how easy
 it all seemed. Until the tutorial ended..

 As a newbie, should I use Tomcat, JBoss, Resin, Jetty or the thing
 that comes with the example (Jetty, or?)?

 All the installation pages talk about this and that that doesn't make
 much sense to non-Java people like myself :-/

 Would be MUCH appreciated with some after-tutorial page for us
 newbies. Right now I'm just looking for something that can be used
 on a production level machine. It doesn't have to be the fastest, as
 long as it's fairly easy to install.

 Recommendations and pointers are very welcome :)



 Thanks in advance!



 / d




-- 
Michael Kimsal
http://webdevradio.com


Re: Leading WildCard in Query

2007-12-12 Thread Michael Kimsal
Please vote for SOLR-218.  I'm not aware of any other way to accomplish the
leading wildcard functionality that would be convenient.  SOLR-218 is not
asking that it be enabled by default, only that it be functionality that is
exposed to SOLR admins via config.xml.


On Dec 12, 2007 6:29 AM, Eswar K [EMAIL PROTECTED] wrote:

 Hi All,

 I understand that a leading wildcard search is not allowed as it is a
 very costly operation. There is an issue logged for it
 (http://issues.apache.org/jira/browse/SOLR-218). Is there any other way of
 enabling leading wildcards apart from doing it in code by calling
 QueryParser.setAllowLeadingWildcard(true);?

 Regards,
 Eswar




-- 
Michael Kimsal
http://webdevradio.com


Re: can I do *thing* substring searches at all?

2007-12-02 Thread Michael Kimsal
https://issues.apache.org/jira/browse/SOLR-218

Please vote for SOLR 218 and perhaps this setting will make it in to the
next version.  It's explicitly shut off in SOLR, but available in Lucene.

On Dec 2, 2007 9:56 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote:

 Would n- -g gr ra am mi in ng that field work for you?

 foothingbar -> fo oo ot th hi in ng gb ba ar
 *thing* -> th hi in ng -> bingo, a match
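
As a rough illustration of the n-gram idea above, here is a minimal Python
sketch. It is not Solr code, and the function names ngrams and
substring_match are made up for this example. It splits a value into
character bigrams and treats a *thing* search as a match when every bigram
of the search text is among them. A real index would also need gram
positions (as a phrase query provides) to rule out grams that are present
but not adjacent.

```python
def ngrams(text, n=2):
    """Return the overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def substring_match(indexed_value, search_text, n=2):
    """Approximate a *search_text* query: every n-gram of the search text
    must appear among the n-grams of the indexed value.

    This ignores gram order, so it can report false positives; it is only
    meant to show why n-gramming makes infix matching possible.
    """
    indexed = set(ngrams(indexed_value, n))
    wanted = ngrams(search_text, n)
    return bool(wanted) and all(gram in indexed for gram in wanted)


print(ngrams("foothingbar"))
print(substring_match("foothingbar", "thing"))
```

The cost is a larger index, since every value is stored as many short
terms instead of one.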

 Otis

 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

 - Original Message 
 From: Brian Whitman [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Thursday, November 29, 2007 11:51:37 PM
 Subject: can I do *thing* substring searches at all?

 With a fieldtype of string, can I do any sort of *thing* search? I
 can do thing* but not *thing or *thing*. Workarounds?










-- 
Michael Kimsal
http://webdevradio.com


Re: leading wildcards

2007-11-12 Thread Michael Kimsal
Vote for that issue and perhaps it'll gain some more traction.  A former
colleague of mine was the one who contributed the patch in SOLR 218 and it
would be nice to have that configuration option 'standard' (if off by
default) in the next SOLR release.


On Nov 12, 2007 11:18 AM, Traut [EMAIL PROTECTED] wrote:

 Seems like there is no way to enable leading wildcard queries except
 code editing and files repacking. :(

 On 11/12/07, Bill Au [EMAIL PROTECTED] wrote:
  The related bug is still open:
 
  http://issues.apache.org/jira/browse/SOLR-218
 
  Bill
 
  On Nov 12, 2007 10:25 AM, Traut [EMAIL PROTECTED] wrote:
   Hi
   I found the thread about enabling leading wildcards in
   Solr as additional option in config file. I've got nightly Solr build
   and I can't find any options connected with leading wildcards in
   config files.
  
   How can I enable leading wildcard queries in Solr? Thank
   you
  
  
   --
   Best regards,
   Traut
  
 


 --
 Best regards,
 Traut




-- 
Michael Kimsal
http://webdevradio.com


Re: Term extraction

2007-09-20 Thread Michael Kimsal
Not sure if this is in the same league or not, but Yahoo offers a term
extraction web service.

http://developer.yahoo.com/search/content/V1/termExtraction.html



On 9/20/07, Grant Ingersoll [EMAIL PROTECTED] wrote:

 You might investigate some tools like Alias-i's LingPipe or do some
 searches for phrase recognition software, etc.

 -Grant

 On Sep 19, 2007, at 9:58 PM, Pieter Berkel wrote:

  I'm currently looking at methods of term extraction and automatic
  keyword generation from indexed documents.  I've been experimenting with
  MoreLikeThis and values returned by the mlt.interestingTerms parameter,
  and so far this approach has worked well.  However, I'd like to be able
  to analyze documents more intelligently to recognize phrase keywords such
  as "open source", "Microsoft Office", and "Bill Gates" rather than
  splitting each word into separate tokens (the field is never used in
  search queries, so matching is not an issue).  I've been looking at
  SynonymFilterFactory as a possible solution to this problem but haven't
  been able to work out the specifics of how to configure it for phrase
  mappings.
 
  Has anybody else dealt with this problem before, or is anyone able to
  offer any insights into achieving the desired results?
 
  Thanks in advance,
  Pieter

 --
 Grant Ingersoll
 http://lucene.grantingersoll.com

 Lucene Helpful Hints:
 http://wiki.apache.org/lucene-java/BasicsOfPerformance
 http://wiki.apache.org/lucene-java/LuceneFAQ





-- 
Michael Kimsal
http://webdevradio.com


Re: Using Ruby to POST to Solr

2007-09-11 Thread Michael Kimsal
The curl man page states:

  If you start the data with the letter @, the rest should be a file name
  to read the data from, or - if you want curl to read the data from stdin.
  The contents of the file must already be url-encoded. Multiple files can
  also be specified. Posting data from a file named 'foobar' would thus be
  done with --data @foobar.
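
For comparison with the curl command quoted below: it sends the bytes of
the file unmodified, with a text/xml content type. A rough Python sketch of
the same request follows (Python rather than Ruby, purely for illustration;
build_update_request and post_update are made-up names). One visible
difference from the Ruby attempt quoted below is the content type, which is
a plausible cause of the failure but is not confirmed in this thread.

```python
import urllib.request


def build_update_request(url, xml_bytes):
    """Build (but do not send) a POST that mirrors the curl command:
    the body is sent as-is with a text/xml content type."""
    request = urllib.request.Request(url, data=xml_bytes, method="POST")
    request.add_header("Content-type", "text/xml; charset=utf-8")
    return request


def post_update(url, path):
    """Read the file at path and POST its raw bytes to url."""
    with open(path, "rb") as handle:
        request = build_update_request(url, handle.read())
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Example (needs a running server, so it is left commented out):
# print(post_update("http://localhost:8080/update", "./my_data.xml"))
```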




On 9/11/07, Matt Mitchell [EMAIL PROTECTED] wrote:

 Hi, I just posted this to the ruby/google group. It probably belongs
 here! Also, anyone know exactly what the @ symbol in the curl command
 is doing?
 Thanks,
 Matt


 I've got a script that uses curl, and would like (for educational
 purposes mind you) to use ruby instead. This is the curl command that
 works:

 F=./my_data.xml
 curl 'http://localhost:8080/update' --data-binary @$F \
   -H 'Content-type:text/xml; charset=utf-8'

 I've been messing with Net::HTTP using something like below, with
 variations (Base64.encode64) but nothing works yet. Anyone know the
 ruby equivalent to the curl version above?

 Thanks!

 # NOT WORKING:
 require 'net/http'
 require 'uri'

 my_url = 'http://localhost:8080/update'
 data = File.read('my_data.xml')
 url = URI.parse(my_url)
 post = Net::HTTP::Post.new(url.path)
 post.body = data
 post.content_type = 'application/x-www-form-urlencoded; charset=utf-8'
 response = Net::HTTP.start(url.host, url.port) do |http|
   http.request(post)
 end
 puts response.body




-- 
Michael Kimsal
http://webdevradio.com


Indexing HTML

2007-08-27 Thread Michael Kimsal
Hello

I'm trying to index individual lines of an HTML file, and I'm hitting this
error:

TEXT must be immediately followed by END_TAG and not START_TAG

I've got something that looks like

<add>
<doc>
<field name="id">4</field>
<field name="line"><a href="foobar"><b><i>linktext</i></b></a></field>
</doc>
</add>

Actually, that sample code above, as its own data file POSTed to SOLR,
throws

parser must be on START_TAG or TEXT to read text (position: START_TAG seen
...<field name="line"><a href="foobar">... @4:37

as an error.

Any clues as to how I can do this?  I'd like to keep the original copy of
each line intact in the index.
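
A note on the likely cause, offered as an assumption since this thread has
no reply: the XML parser sees the tags inside the field as child elements
rather than as text. One common way around that is to XML-escape the line
before placing it in the field, so that the stored value is the original
markup. A minimal Python sketch (build_add_xml is a made-up helper name):

```python
from xml.sax.saxutils import escape


def build_add_xml(doc_id, line_html):
    """Return a Solr <add> message whose 'line' field carries escaped HTML."""
    return (
        "<add><doc>"
        f'<field name="id">{escape(str(doc_id))}</field>'
        f'<field name="line">{escape(line_html)}</field>'
        "</doc></add>"
    )


print(build_add_xml(4, '<a href="foobar"><b><i>linktext</i></b></a>'))
```

After parsing, the field text is the original line again, angle brackets
included.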

Thanks!

-- 
Michael Kimsal
http://webdevradio.com


Re: I'm using PHP curl post xml command to Solr,Is it the only way to post data?

2007-06-25 Thread Michael Kimsal

Using PHP5 (5.1 or higher I think)
http://us.php.net/manual/en/function.http-post-fields.php
is available.


From the example on that page:


$fields = array(
   'name' => 'mike',
   'pass' => 'passwordt'
);
$response = http_post_fields("http://www.example.com/", $fields);


Looks pretty simple, but I haven't tried it yet.

On 6/25/07, Kijiji Xu, Ping [EMAIL PROTECTED] wrote:


What about fsockopen, Or any other simple method?



Thanks



--

Regards

Xp from china





--
Michael Kimsal
http://webdevradio.com


Re: Date range problem

2007-06-25 Thread Michael Kimsal

I've only been able to get date/time stuff to work when the entire full
date/time format is used

2007-05-30T12:34:56Z

Or is there a + in there too?
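
A small sketch of producing the full format above from a program, in case
it helps. The helper names solr_date and date_range_query are made up, and
the field name timestamp is taken from the query quoted below. It assumes
timezone-aware datetimes; a naive datetime would be interpreted as local
time by astimezone.

```python
from datetime import datetime, timezone


def solr_date(moment):
    """Format a datetime as the full UTC form shown above."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def date_range_query(field, start, end):
    """Build a range query of the form field:[<start> TO <end>]."""
    return f"{field}:[{solr_date(start)} TO {solr_date(end)}]"


start = datetime(2007, 1, 1, tzinfo=timezone.utc)
end = datetime(2008, 1, 1, tzinfo=timezone.utc)
print(date_range_query("timestamp", start, end))
```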

On 6/25/07, Stu Hood [EMAIL PROTECTED] wrote:


Hello,

Searching by date ranges doesn't seem to work in the example Solr install.
A query like `timestamp:[20070101 TO 20080101]` returns:

*message* *Invalid Date String:'20070101'*

*description* *The request sent by the client was syntactically incorrect
(Invalid Date String:'20070101').*
That query should be valid according to
http://lucene.apache.org/java/docs/queryparsersyntax.html#Range%20Searches

Any ideas?

Stu Hood
Webmail.us
You manage your business. We'll manage your email.(r)





--
Michael Kimsal
http://webdevradio.com


Benefit of schema

2007-06-23 Thread Michael Kimsal

Is there any benefit to using a fixed schema as opposed to the 'wildcard'
approach demonstrated in the sample schema.xml file?


--
Michael Kimsal
http://webdevradio.com


Re: Benefit of schema

2007-06-23 Thread Michael Kimsal

I wasn't sure if I was perhaps missing some sort of optimization that may
occur under the hood during querying.  I sort of thought that what you just
wrote may be the case.

Thanks!

On 6/23/07, Erik Hatcher [EMAIL PROTECTED] wrote:



On Jun 23, 2007, at 1:38 PM, Michael Kimsal wrote:
 Is there any benefit to using a fixed schema as opposed to the 'wildcard'
 approach demonstrated in the sample schema.xml file?

It's nice to have known straightforward field names for querying, like:

title:web development AND author:kimsal

With wildcarded fields, you won't end up with such clean field names.

Other than aesthetics and how the client application will interact
with Solr, there really is no difference.

Erik





--
Michael Kimsal
http://webdevradio.com


Re: Question to php to do with multi index

2007-04-27 Thread Michael Kimsal

The curl_multi is probably the most effective way, using straight PHP.
Another option would be to spawn several jobs, assuming unix/linux, and wait
for them to get done.  It doesn't give you very good error handling (well,
none at all actually!) but would let you run multiple indexing jobs at once.

Visit http://us.php.net/shell_exec and look at the 'class exec' contributed
note about halfway down the page.  It'll give you an idea of how to easily
spawn multiple jobs.

If you're using PHP5, the proc_open function may be another way to go.
proc_open was available in 4, but there were a number of extra parameters
and controls made available in 5.
http://us.php.net/manual/en/function.proc-open.php

An adventurous soul could combine the two concepts in to one class to manage
pipes communication between multiple child processes effectively.
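
To make the "spawn several jobs and wait" idea concrete, here is a minimal
sketch in Python rather than PHP, for illustration only. The name run_jobs
and the stand-in jobs are made up; a real job would be whatever command
posts one batch to the index. Unlike the bare shell approach it keeps each
exit code, which gives at least a little error handling.

```python
import subprocess
import sys


def run_jobs(commands):
    """Start every command at once, then wait for all of them.

    Returns the exit codes in the same order as the commands.
    """
    processes = [subprocess.Popen(command) for command in commands]
    return [process.wait() for process in processes]


# Stand-in jobs: each one just runs a tiny Python program.
jobs = [
    [sys.executable, "-c", "print('indexing batch 1')"],
    [sys.executable, "-c", "print('indexing batch 2')"],
]
print(run_jobs(jobs))
```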

On 4/26/07, James liu [EMAIL PROTECTED] wrote:


PHP does not support multithreading, so how can you handle multiple indexes
in parallel?

Right now I use curl_multi.

Maybe there is a more effective way that I don't know of, so if you know
one, tell me. Thanks.


--
regards
jl





--
Michael Kimsal
http://webdevradio.com


Re: case sensitivity

2007-04-27 Thread Michael Kimsal

Can you point me to the process for submitting these small patches?  I'm
looking at the jira site but don't see much of anything there outlining a
process for submitting patches.  Sorry to be so basic about this, but I'm
trying to follow correct procedures on both sides of the aisle, so to speak.


On 4/27/07, Yonik Seeley [EMAIL PROTECTED] wrote:


On 4/26/07, Michael Kimsal [EMAIL PROTECTED] wrote:
 We're (and by 'we' I mean my esteemed colleague!) working on patching a
 few of these items to be in the solrconf.xml file and should likely have
 some patches submitted next week.  It's being done on 'company time' and
 I'm not sure about the exact policy/procedure for this sort of thing here
 (or indeed, if there is one at all).

That's fine, as long as your company has agreed to contribute back the
patch (under the Apache license).  Apache enjoys a lot of business
support (being business friendly) and a *lot* of contributions are done
on company time.

Anything really big would probably need a CLA, but patches only
require clicking the "grant license to ASF" button in JIRA.

-Yonik





--
Michael Kimsal
http://webdevradio.com


case sensitivity

2007-04-26 Thread Michael Kimsal

I've looked through the mailing lists and can't find much of anything
regarding case sensitivity.  It seems SOLR is case sensitive by default -
I'm using the default settings with a very basic schema - just text fields.

Is there any way to tell the query parser to be case insensitive during a
query?  Or do I have to reindex all my data again with lowercase values?



--
Michael Kimsal
http://webdevradio.com


Re: case sensitivity

2007-04-26 Thread Michael Kimsal

I was just writing a followup.

I'm using the default text field type

   <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <!-- in this example, we will only use synonyms at query time
       <filter class="solr.SynonymFilterFactory"
               synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
       -->
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="stopwords.txt"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1" generateNumberParts="1" catenateWords="1"
               catenateNumbers="1" catenateAll="0"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.EnglishPorterFilterFactory"
               protected="protwords.txt"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.SynonymFilterFactory"
               synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="stopwords.txt"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1" generateNumberParts="1" catenateWords="0"
               catenateNumbers="0" catenateAll="0"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.EnglishPorterFilterFactory"
               protected="protwords.txt"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
   </fieldtype>


That looks to me like it's got LowerCaseFilterFactory in the query analyzer
and the index analyzer.

I'm still digging in to this, but are there any other things to look for
anyone can point me to?  (Thanks Erik!)




On 4/26/07, Erik Hatcher [EMAIL PROTECTED] wrote:



On Apr 26, 2007, at 5:43 PM, Michael Kimsal wrote:
 I've looked through the mailing lists and can't find much of anything
 regarding case sensitivity.  It seems SOLR is case sensitive by default -
 I'm using the default settings with a very basic schema - just text fields.

All depends on the analysis you have set up for the fields.  If
you're indexing string-type fields in the default example schema,
there is effectively no analysis so searches must be exact matches
case and all.

 Is there any way to tell the query parser to be case insensitive during a
 query?  Or do I have to reindex all my data again with lowercase values?

Terms are indexed in a case-sensitive manner, so if you need case
insensitivity you need to lowercase on the way in and on querying.

Erik






--
Michael Kimsal
http://webdevradio.com


Re: case sensitivity

2007-04-26 Thread Michael Kimsal

type:changelog AND ( ( (listing:Fox) or (listing:Fox*) or (listing:*Fox) ) )
and
type:changelog AND ( ( (listing:fox) or (listing:fox*) or (listing:*fox) ) )

Is this to do with the wildcards?

Actually, I've just answered my own question.

type:changelog AND ( ( (listing:fox) ) )
and
type:changelog AND ( ( (listing:Fox) ) )

give the same results.

But adding in the or listing:fox* or listing:*fox is always case-sensitive.
However,
http://wiki.apache.org/lucene-java/LuceneFAQ#head-133cf44dd3dff3680c96c1316a663e881eeac35a
seems to say that wildcard searches are not case-sensitive.

Unless someone can point out a way around this, it seems I'll need to
manually reindex and lower-case everything on the way in, then reformat my
search queries to be lower-case as well.
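
For the "reformat my search queries to be lower-case" part, a client-side
sketch in Python (illustration only; lowercase_terms is a made-up helper,
not anything Solr provides). It lowercases the term of each field:term
pair, wildcards included, and leaves field names, parentheses and operators
alone. It is deliberately naive: quoted phrases and range queries would
need extra handling.

```python
import re

# A "field:term" pair; the term stops at whitespace or a parenthesis.
FIELD_TERM = re.compile(r"(\w+):([^\s()]+)")


def lowercase_terms(query):
    """Lowercase the term in every field:term pair, leaving field names,
    parentheses and operators such as AND/OR untouched."""
    return FIELD_TERM.sub(lambda m: f"{m.group(1)}:{m.group(2).lower()}", query)


print(lowercase_terms("type:changelog AND ((listing:Fox) OR (listing:Fox*) OR (listing:*Fox))"))
```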





--
Michael Kimsal
http://webdevradio.com


Re: case sensitivity

2007-04-26 Thread Michael Kimsal

My colleague, after some digging, found in SolrQueryParser

(around line 62)
setLowercaseExpandedTerms(false);

The default for Lucene is true.  Was this intentional?  Or an oversight?

Perhaps it's not related to my problem, but it seems that it might be.

Thanks in advance!



--
Michael Kimsal
http://webdevradio.com


Re: case sensitivity

2007-04-26 Thread Michael Kimsal

We're (and by 'we' I mean my esteemed colleague!) working on patching a few
of these items to be in the solrconf.xml file and should likely have some
patches submitted next week.  It's being done on 'company time' and I'm not
sure about the exact policy/procedure for this sort of thing here (or
indeed, if there is one at all).


On 4/26/07, Erik Hatcher [EMAIL PROTECTED] wrote:



On Apr 26, 2007, at 6:03 PM, Michael Kimsal wrote:
 My colleague, after some digging, found in SolrQueryParser

 (around line 62)
 setLowercaseExpandedTerms(false);

 The default for Lucene is true.  Was this intentional?  Or an
 oversight?

I was just about to respond that this is likely the issue with your
non-totally-lowercased wildcard terms.

I don't consider it an oversight, but rather this whole analysis
business and wildcards are things that vary from project to project
on how they should be handled.  If you have, for example, a string
field and want to do prefixed queries on it (trailing asterisk) you
wouldn't want the term to be lowercased.

I think we should open up as many of the switches as we can to
QueryParser, allowing users to tinker with them if they want, setting
the defaults to the most common reasonable settings we can agree upon.

Erik





--
Michael Kimsal
http://webdevradio.com


expressing this logic

2007-04-25 Thread Michael Kimsal

Hello all:

I'm trying to find a record in my index where the 'type' is changelog and
the 'filename' has 'angel' in it.

Expressing this as
type:changelog filename:+angel or filename:+angel* or filename:+*angel

throws a parse error (probably understandably)

type:changelog (filename:+angel or filename:+angel* or filename:+*angel)
doesn't seem to work either.

I've tried this a number of ways and I either get a parse error or
*everything* is returned - I only want records where the type is
'changelog' and the filename has 'angel' in it.  How would this be
expressed?
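
This thread has no reply, so the following is only a sketch under stated
assumptions. The query shape (a required type clause ANDed with a
parenthesized OR group) mirrors the form used in the later case-sensitivity
thread, with the boolean operators in upper case and no + inside the
field:term pair. The leading-wildcard form (*angel) is left out because,
per the other threads here, Solr did not accept it without a source patch.
The helper name field_contains_query is made up.

```python
def field_contains_query(type_value, field, text):
    """Require the type, plus at least one of the exact or prefix forms of text."""
    alternatives = [f"{field}:{text}", f"{field}:{text}*"]
    return f"type:{type_value} AND ({' OR '.join(alternatives)})"


print(field_contains_query("changelog", "filename", "angel"))
```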


--
Michael Kimsal
http://webdevradio.com


Re: AW: Leading wildcards

2007-04-20 Thread Michael Kimsal

Maarten:

Would you mind sharing your custom query parser?


On 4/20/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:


thanks, this worked like a charm !!

we built a custom QueryParser and we integrated the *foo** in it, so
basically we can now search leading, trailing and both ...

only crappy thing is the max Boolean clauses, but i'm going to look into
that after the weekend

for the next release of Solr :
do not make this default, too many risks
but do make an option in the config to enable it, it's a very nice feature


thanks everybody for the help and have a nice weekend,
maarten





Burkamp, Christian [EMAIL PROTECTED]
19/04/2007 12:37
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
AW: Leading wildcards






Hi there,

Solr does not support leading wildcards, because it uses Lucene's standard
QueryParser class without changing the defaults. You can easily change
this by inserting the line

parser.setAllowLeadingWildcard(true);

in QueryParsing.java line 92. (This is after creating a QueryParser
instance in QueryParsing.parseQuery(...))

and it obviously means that you have to change solr's source code. It
would be nice to have an option in the schema to switch leading wildcards
on or off per field. Leading wildcards really make no sense on richly
populated fields because queries tend to result in "too many clauses"
exceptions most of the time.

This works for leading wildcards. Unfortunately it does not enable
searches with leading AND trailing wildcards. (E.g. searching for *lega*
does not find results even if the term "elegance" is in the index. If you
put a second asterisk at the end, the term "elegance" is found: search
for *lega** to get hits.)
Can anybody explain this, though it seems to be more of a Lucene
QueryParser issue?

-- Christian

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Thursday, 19 April 2007 08:35
To: solr-user@lucene.apache.org
Subject: Leading wildcards


hi,

we have been trying to get the leading wildcards to work.

we have been looking around the Solr website, the Lucene website, wiki's
and the mailing lists etc ...
but we found a lot of contradictory information.

so we have a few questions:
- is the latest version of lucene capable of handling leading wildcards ?
- is the latest version of solr capable of handling leading wildcards ?
- do we need to make adjustments to the solr source code ?
- if we need to adjust the solr source, what do we need to change ?

thanks in advance !
Maarten






--
Michael Kimsal
http://webdevradio.com


Re: Leading wildcards

2007-04-19 Thread Michael Kimsal

I've investigated this recently, and it looks like the latest lucene dev
supposedly supports leading/trailing at the same time.  However, I couldn't
get the latest dev solr to build with the latest dev lucene (as of two weeks
ago).  A Lucene mailing list thread seemed to indicate that Lucene, as of
the last official build, supports both leading/trailing at the same time,
but it then seemed to indicate that it was still in an 'in development
branch only' state.
I can't find that thread, but that's my understanding of the current
situation.  It's bugged us a little bit, because it's something that we need
(to be able to emulate the previous foo LIKE '%bar%' SQL behaviour we're
replacing), but can't offer our users yet.

On 4/19/07, Burkamp, Christian [EMAIL PROTECTED] wrote:


Hi there,

Solr does not support leading wildcards, because it uses Lucene's standard
QueryParser class without changing the defaults. You can easily change this
by inserting the line

parser.setAllowLeadingWildcard(true);

in QueryParsing.java line 92. (This is after creating a QueryParser
instance in QueryParsing.parseQuery(...))

and it obviously means that you have to change solr's source code. It
would be nice to have an option in the schema to switch leading wildcards on
or off per field. Leading wildcards really make no sense on richly populated
fields because queries tend to result in "too many clauses" exceptions most
of the time.

This works for leading wildcards. Unfortunately it does not enable
searches with leading AND trailing wildcards. (E.g. searching for *lega*
does not find results even if the term "elegance" is in the index. If you
put a second asterisk at the end, the term "elegance" is found: search for
*lega** to get hits.)
Can anybody explain this, though it seems to be more of a Lucene
QueryParser issue?

-- Christian

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Thursday, 19 April 2007 08:35
To: solr-user@lucene.apache.org
Subject: Leading wildcards


hi,

we have been trying to get the leading wildcards to work.

we have been looking around the Solr website, the Lucene website, wiki's
and the mailing lists etc ...
but we found a lot of contradictory information.

so we have a few questions:
- is the latest version of lucene capable of handling leading wildcards ?
- is the latest version of solr capable of handling leading wildcards ?
- do we need to make adjustments to the solr source code ?
- if we need to adjust the solr source, what do we need to change ?

thanks in advance !
Maarten





--
Michael Kimsal
http://webdevradio.com


Re: Leading wildcards

2007-04-19 Thread Michael Kimsal

Agreed, but in our tests (100M index) it wasn't a performance hit, and much
better (as in it actually worked) than MSSQL  ;)



On 4/19/07, Erik Hatcher [EMAIL PROTECTED] wrote:



On Apr 19, 2007, at 6:56 AM, Michael Kimsal wrote:
 It's bugged us a little bit, because it's something that we need
 (to be able to emulate the previous foo LIKE '%bar%' SQL behaviour
 we're replacing), but can't offer our users yet.

I have also run into this issue and have intended to fix up Solr to
allow configuring that switch on QueryParser.  I'll eventually get to
this, but someone supplying a patch with a test case would get it done
sooner.

I must, however, caveat discussion of leading wildcards with the
underlying effect you get.  If you use standard analysis and perform
a leading wildcard query, you incur a (possibly) dramatic hit in
terms of performance.  Lucene has to scan *every* term in the
specified field.  In fact, on my 3.7M index, a fuzzy query dies for the
very same reason.  There is also a switch on fuzzy
query that needs to be configurable through Solr, to adjust the
number of leading characters that are fixed to avoid this all-term
scanning.

There are techniques that can be used to improve the performance of
in-string types of queries like this, at the expense of indexing time
and size and clever query creation.   One such technique I've used
successfully is term rotation enumeration (cat = cat$, at$c,
t$ca).   This involves custom analyzers and query creation.
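
To make the rotation idea concrete, here is a minimal sketch in plain Java.
It is not Lucene or Solr code: the class and method names are made up for
illustration, a sorted in-memory set stands in for the term dictionary, and
it assumes indexed terms never contain the '$' marker.  It also stores the
fourth rotation ($cat), which the list above leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration of term rotation; not part of Lucene or Solr.
final class TermRotationSketch {
    // End-of-term marker, as in the "cat$" example.
    static final char END = '$';

    // All rotations of term + '$': cat -> cat$, at$c, t$ca, $cat.
    static List<String> rotations(String term) {
        String marked = term + END;
        List<String> out = new ArrayList<>();
        for (int i = 0; i < marked.length(); i++) {
            out.add(marked.substring(i) + marked.substring(0, i));
        }
        return out;
    }

    // Undo a rotation: everything after '$' was originally the start of the term.
    static String original(String rotation) {
        int i = rotation.indexOf(END);
        return rotation.substring(i + 1) + rotation.substring(0, i);
    }

    // Terms containing `infix`, found with a prefix range lookup over the
    // sorted rotations instead of a scan over every term.
    static SortedSet<String> containing(NavigableSet<String> index, String infix) {
        SortedSet<String> hits = new TreeSet<>();
        // Every string that starts with `infix` sorts inside this range.
        String upper = infix + Character.MAX_VALUE;
        for (String rotation : index.subSet(infix, true, upper, false)) {
            hits.add(original(rotation));
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableSet<String> index = new TreeSet<>();
        for (String term : List.of("cat", "elegance", "123456789")) {
            index.addAll(rotations(term));
        }
        System.out.println(rotations("cat"));
        System.out.println(containing(index, "456"));   // the *456* case
        System.out.println(containing(index, "lega"));  // the *lega* case
    }
}
```

The trade-off is the one described above: every term is stored once per
character (plus one), so the index grows, and the query side has to turn
*456* into a prefix lookup on 456.  In the same scheme a suffix query like
*at would become a prefix lookup on at$.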

Once Solr supports this switch, you may find performance fine with
leading wildcard queries, but at least be forewarned that there are
scalability skeletons in this closet.

Erik





--
Michael Kimsal
http://webdevradio.com


Re: Leading wildcards

2007-04-19 Thread Michael Kimsal

It still seems like it's only something that would be invoked by a user's
query.

If I queried for *foobar and leading wildcards were not on in the server,
I'd get back nothing, which isn't really correct.  I'd think the application
should
tell the user that that syntax isn't supported.

Perhaps I'm simplifying it a bit.  It would certainly help out our comfort
level
to have it either be on or configurable by default, rather than having to
maintain a
'patched' version (yes, the patch is only one line, but it's the principle
of the thing).
I suspect this would be the same for others.



On 4/19/07, Erik Hatcher [EMAIL PROTECTED] wrote:



On Apr 19, 2007, at 10:39 AM, Yonik Seeley wrote:
 On 4/19/07, Erik Hatcher [EMAIL PROTECTED] wrote:
 parser.setAllowLeadingWildcards(true);

 I have also run into this issue and have intended to fix up Solr to
 allow configuring that switch on QueryParser.

 Any reason that parser.setAllowLeadingWildcards(true) shouldn't be
 the default?

That's fine by me.  But...

 Does it really need to be configurable?

It all depends on how bad of a hit it'd take on Solr.   What's the
breaking point where the performance of full-term scanning (and
subsequently faceting, of course) keels over or dies?   FuzzyQuery's
die on my 3.7M index and not-super-beefy hardware and system setup.

Erik





--
Michael Kimsal
http://webdevradio.com


Re: Leading wildcards

2007-04-19 Thread Michael Kimsal

I'm in the middle of looking in to that.  For *you* ;)  it may seem like a
quick
thing to do.  For me, who's not an expert at this stuff, it's a balance
between delving in
deeply enough to figure how to do it and hitting our deadlines.

It's actually on someone else's plate here, but he's backed up with two
other projects here first.

It's not that I don't *want* to contribute, but hardly have enough time to
get the basics
done some days.

On 4/19/07, Erik Hatcher [EMAIL PROTECTED] wrote:



On Apr 19, 2007, at 11:04 AM, Michael Kimsal wrote:
 Perhaps I'm simplifying it a bit.  It would certainly help out our
 comfort
 level
 to have it either be on or configurable by default, rather than
 having to
 maintain a
 'patched' version (yes, the patch is only one line, but it's the
 principle
 of the thing).
 I suspect this would be the same for others.

And here's where your effort could go the extra mile to help
_yourself_ out as well as the community... instead of the one-line
change, make it a few more lines and make it a switch from the
configuration (like the toggle for AND/OR default operator) and even
better round it out with a test case.  Submit it, lobby for it to be
reviewed and applied, and step 3... profit!  :)

Erik





--
Michael Kimsal
http://webdevradio.com


Re: Solr logo poll

2007-04-09 Thread Michael Kimsal

My wife votes for A.  :)

On 4/9/07, Nitin Borwankar [EMAIL PROTECTED] wrote:


B

Yonik Seeley wrote:

 Quick poll...  Solr 2.1 release planning is underway, and a new logo
 may be a part of that.
 What form of logo do you prefer, A or B?  There may be further
 tweaks to these pictures, but I'd like to get a sense of what the user
 community likes.

 A)
 http://issues.apache.org/jira/secure/attachment/12349897/logo-solr-d.jpg

 B)

http://issues.apache.org/jira/secure/attachment/12353535/12353535_solr-nick.gif


 Just respond to this thread with your preference.

 -Yonik



--


Nitin Borwankar

http://walruscarpenter.wordpress.com  Of shoes  and ships  and sealing
wax  of cabbages and kings
http://greener.com  Find, Learn, Act  Greener, the search engine for
the planet
http://tagschema.com  Implementation of tag database applications

[EMAIL PROTECTED]
510-872-7066






--
Michael Kimsal
http://webdevradio.com


Re: SOLR hosting

2007-03-23 Thread Michael Kimsal

Thanks.  Perhaps I should have clarified a bit.  I was looking more for the
first option.  And part of what I was asking for was to gauge some interest.
If there are no companies offering that, is there any demand in a service
like that?

On 3/23/07, Tim Archambault [EMAIL PROTECTED] wrote:


Is your question inherently asking if someone out there provides a service
that manages the indexes, etc for you and pre-installs and configures the
software?

If NOT, I can tell you that I bought a Linux VPS at Hostmysite.com cheaply
and dedicated 1 virtual domain to my SOLR instance and it worked fairly
easily. I'm no tech expert and got it to run.

Hope that helps.

Tim

On 3/21/07, Michael Kimsal [EMAIL PROTECTED] wrote:

 Are there any companies that offer hosted SOLR services?

 If not, is there any interest in the community in a service like this?


 --
 Michael Kimsal
 http://webdevradio.com






--
Michael Kimsal
http://webdevradio.com


Wildcards

2007-03-21 Thread Michael Kimsal

Hello all:

While I realize this goes against the grain of an indexing server, is there
any way to do wildcard searching like the following:

Term indexed is 123456789

Searching for *456* would find 123456789

Is there any mechanism to enable or allow for that scenario?

Thanks!


--
Michael Kimsal
http://webdevradio.com


Re: Wildcards

2007-03-21 Thread Michael Kimsal

This looks like a lucene issue.

http://www.nabble.com/-jira--Created%3A-%28LUCENE-839%29-WildcardQuery-do-not-find-documents-if-leading-and-trailing-*-is-used-tf3435336.html

And it appears to have been fixed recently:

This problem was already fixed since 2.1.0. 

When was 2.1.0 out?  Oh - last month.  Will there be new SOLR package
bundles with the latest lucene?



On 3/21/07, Michael Kimsal [EMAIL PROTECTED] wrote:


I changed the 'leading wildcard' setting in the query parser (well,
actually someone else here did, but it works).

*789 works

but

*456* still doesn't work.

Yeah, I guess I'm seeing the same behaviour as you are.  Does
 this seem like a potential bug?  Like the first * is cancelling out the
logic for the second * ?



On 3/21/07, Erik Hatcher  [EMAIL PROTECTED] wrote:

 Lucene now supports *456* type queries, however it requires setting
 an attribute to allow leading wildcards on the QueryParser.  Solr
 does not set this flag (that I can tell in my quick search) so I
 don't believe you can do this with Solr currently, until/unless an
 option is made to set that flag.  However, I just tried with my
 dataset and I don't get parse errors from a *foo* query, but I don't
 get results either (strange, it seems).

 Erik

 On Mar 21, 2007, at 2:59 PM, Michael Kimsal wrote:

  Hello all:
 
  While I realize this goes against the grain of an indexing server,
  is there
  any way to do wildcard searching like the following:
 
  Term indexed is 123456789
 
  Searching for *456* would find 123456789
 
  Is there any mechanism to enable or allow for that scenario?
 
  Thanks!
 
 
  --
  Michael Kimsal
  http://webdevradio.com




--
Michael Kimsal
http://webdevradio.com





--
Michael Kimsal
http://webdevradio.com


Re: Wildcards

2007-03-21 Thread Michael Kimsal

Well, I recompiled SOLR against the latest lucene release (2.1.0) and it
still doesn't work.  The nabble reference page there indicates that it might
not have worked right in 2.1.0 but someone there is suggesting that it works
in the latest trunk.

Is there perhaps something else that would need to be enabled in SOLR beyond
the leadingWildCard to have this work?

Thanks for everyone's patience.


On 3/21/07, Michael Kimsal [EMAIL PROTECTED] wrote:


This looks like a lucene issue.

http://www.nabble.com/-jira--Created%3A-%28LUCENE-839%29-WildcardQuery-do-not-find-documents-if-leading-and-trailing-*-is-used-tf3435336.html


And it appears to have been fixed recently:

This problem was already fixed since 2.1.0. 

When was 2.1.0 out?  Oh - last month.  Will there be new SOLR package
bundles with the latest lucene?



On 3/21/07, Michael Kimsal [EMAIL PROTECTED] wrote:

 I changed the 'leading wildcard' setting in the query parser (well,
 actually someone else here did, but it works).

 *789 works

 but

 *456* still doesn't work.

 Yeah, I guess I'm seeing the same behaviour as you are.  Does
  this seem like a potential bug?  Like the first * is cancelling out the

 logic for the second * ?



 On 3/21/07, Erik Hatcher  [EMAIL PROTECTED] wrote:
 
  Lucene now supports *456* type queries, however it requires setting
  an attribute to allow leading wildcards on the QueryParser.  Solr
  does not set this flag (that I can tell in my quick search) so I
  don't believe you can do this with Solr currently, until/unless an
  option is made to set that flag.  However, I just tried with my
  dataset and I don't get parse errors from a *foo* query, but I don't
  get results either (strange, it seems).
 
  Erik
 
  On Mar 21, 2007, at 2:59 PM, Michael Kimsal wrote:
 
   Hello all:
  
   While I realize this goes against the grain of an indexing server,
   is there
   any way to do wildcard searching like the following:
  
   Term indexed is 123456789
  
   Searching for *456* would find 123456789
  
   Is there any mechanism to enable or allow for that scenario?
  
   Thanks!
  
  
   --
   Michael Kimsal
   http://webdevradio.com
 
 


 --
 Michael Kimsal
 http://webdevradio.com




--
Michael Kimsal
http://webdevradio.com





--
Michael Kimsal
http://webdevradio.com


Re: Spelling

2007-02-09 Thread Michael Kimsal


Any opinions on commenting out the stemmer in the default text field?
It might be less confusing to have a more intuitive example, while
easily showing the way to the more advanced analysis.




I'm in favor of that.  I imagine there's others like me that want to get
started with the defaults first, and having them be more useful for
'average' use cases would be helpful, with comments on how to do advanced
stuff left in.

Thanks!


--
Michael Kimsal
http://webdevradio.com


Re: Spelling

2007-02-06 Thread Michael Kimsal

This isn't something I use that approach on.  Let me explain.

I work in a call center, and I'm doing a search for specific key words in
customer notes every night.

For example, we might need a report of which customers called up about
apple, banana or pear.
I have a script which generates a report for the required key words, and the
report is mailed to
the appropriate staff for review/action.  The highlighting comes in to help
them quickly locate the problem
words.  But not being able to highlight the misspellings (bannana,
peaar, etc.) means that they
may overlook the particular entries when reviewing the email.

When you say rewrite the query, what specifically do you mean?  I'm googling
(direct and on the solr site)
for query.rewrite, but nothing is jumping out at me as anything that's
useful/pertinent.  It sounds like
you're telling me to do some manipulation on the query first, but I'm
currently
just passing queries as part of the GET string in an HTTP request (this was
my main
attraction to SOLR in the first place)  Is there a way to trigger the
'rewrite' functionality via
another GET parameter?

Thanks all!

On 2/6/07, karl wettin [EMAIL PROTECTED] wrote:



6 feb 2007 kl. 04.19 skrev Michael Kimsal:

 Thanks Erik.  That worked, then threw me for another loop, which I
 sort of
 have fixed I think.

 I'm using the highlighter functionality, but it doesn't seem to
 highlight the
 'matched' word if it's a partial match, although it does in fact
 return that
 record.  Am I missing something obvious here, or is highlighting of
 partial
 matches not supported?

You need to rewrite the query. See Query.rewrite.

(I think that's it.)


But,

fuzzy queries are sort of slow, at least compared to many other things.
Depending on your server load and corpus size, I would perhaps
recommend
using some sort of "did you mean" functionality rather than fuzzy
queries.
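
As a rough sketch of what a "did you mean" step could look like for the
nightly keyword report, the plain Java below compares a word from a note
against the report's keyword list using Levenshtein edit distance, the
measure fuzzy matching is generally built on.  The class name, method names
and the distance threshold are assumptions for illustration; none of this is
Solr or Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Solr or Lucene.
final class DidYouMeanSketch {
    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions and substitutions turning a into b.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Keywords within maxDistance edits of the given word.
    static List<String> suggestions(String word, List<String> keywords, int maxDistance) {
        List<String> out = new ArrayList<>();
        for (String keyword : keywords) {
            if (editDistance(word, keyword) <= maxDistance) {
                out.add(keyword);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keywords = List.of("apple", "banana", "pear");
        System.out.println(suggestions("bannana", keywords, 1));
        System.out.println(suggestions("peaar", keywords, 1));
    }
}
```

Checking a word against a short keyword list like this is cheap.  A fuzzy
query is slow because it has to make the same kind of comparison against the
terms in the index.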


--
karl





--
Michael Kimsal
http://webdevradio.com


Re: Date ranges

2007-02-03 Thread Michael Kimsal

Thanks Hoss - I'll give that a try - intuitively that sounds like it'll work
(I'm still new to this - it's not second nature to me just yet!)

On 2/3/07, Chris Hostetter [EMAIL PROTECTED] wrote:



: However, when I run the following search
: foobar date:[2005-08-01T00:00:00Z TO 2005-08-01T23:59:59Z]

: I get values back that do not have a date value in the 08/01/2005 range.

unless you changed something else to make queries default to all clauses
mandatory (aka: an AND query), that's searching for anything matching
foobar, or anything in that date range.

try this...

+foobar +date:[2005-08-01T00:00:00Z TO 2005-08-01T23:59:59Z]

: Does anyone have any clues/pointers to help me debug this?

adding debugQuery=1 to any URL will help you see exactly what query is
being used, and show you an explanation of why each document matched.
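
For anyone sending the query as part of a GET string, here is a small
sketch of building that URL in plain Java.  The host, port and /select path
are assumptions (adjust them to your install), and the class and method
names are made up; only the query text and the debugQuery=1 parameter come
from this thread.  The detail worth noticing is that a literal + has to be
sent as %2B, because an unencoded + in a query string is decoded as a space
and the "mandatory" marker would be lost.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; base URL and path are assumptions.
final class DateRangeQueryUrl {
    // Builds a select URL with the query URL-encoded, optionally adding debugQuery=1.
    static String selectUrl(String baseUrl, String query, boolean debug) {
        String url = baseUrl + "/select?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
        return debug ? url + "&debugQuery=1" : url;
    }

    public static void main(String[] args) {
        // Both clauses are marked mandatory with a leading '+'.
        String query = "+foobar +date:[2005-08-01T00:00:00Z TO 2005-08-01T23:59:59Z]";
        System.out.println(selectUrl("http://localhost:8983/solr", query, true));
    }
}
```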

:
: Thanks!
:



-Hoss





--
Michael Kimsal
http://webdevradio.com


Date ranges

2007-02-02 Thread Michael Kimsal

I'm having a devil of a time getting date searching to work properly.  I've
created a 'date' field in my schema, and I put values like
2005-08-01T23:59:59Z in it.

However, when I run the following search
foobar date:[2005-08-01T00:00:00Z TO 2005-08-01T23:59:59Z]


I get values back that do not have a date value in the 08/01/2005 range.

Does anyone have any clues/pointers to help me debug this?

Thanks!


possible FAQ - lucene interop

2007-01-17 Thread Michael Kimsal

Hello all:

We've got one java-based project at work using lucene.  I'm looking to use
solr as a search system for some other projects at work.  Once data is
indexed in solr, can we get at it using standard lucene libraries?  I know
how I want to use solr, but if the java devs need to get at the data as
well, I'd rather that 1) they be able to use their existing tech and skills
and 2) I not have to reindex everything in lucene-only indexes.

I've read the FAQs and some of the mailing list and couldn't find this
question addressed.

Thanks.

--
Michael Kimsal
http://webdevradio.com