Solr Admin Schema Browser and field named keywords

2010-08-23 Thread Shawn Heisey
 I have a field named "keywords" in my index.  The schema browser page 
is not able to deal with this, so I have trouble getting statistical 
information on this field.  When I click on the field, Firefox hangs for 
a minute and then gives the "unresponsive script" warning.  I assume 
(without actually checking) that this is due to "keywords" already being 
used for something in the JavaScript code.


Is this already a known problem, or should I create a Jira?

Related to this, would it be difficult to make this feature display 
something like a status bar when it is first grabbing information, 
indicating how many fields there are and which one it's working on at 
the moment?  It takes a few minutes for it to load on my indexes, so 
some indication of how far along it is would be very nice.


Shawn



How to do Spatial Search with Solr?

2010-08-23 Thread Savannah Beckett
Hi,
  I am using Nutch to do the crawling and Solr to do the searching.  The index 
has City and State.  I want to be able to get all nearby cities by entering a city 
name, e.g. when I type New York, I want to get the following as facets:

New York, NY (1905) 
Brooklyn, NY (89) 
Jersey City, NJ (55) 
New York City, NY (34) 
Montclair, NJ (25) 

How do I do that?  More importantly, where do I get all the latitude and 
longitude data for all cities?

Thanks.



  

Re: How to do Spatial Search with Solr?

2010-08-23 Thread Mattmann, Chris A (388J)
Hi Savannah,

Check out the patches I just threw up for SOLR-2073, SOLR-2074, SOLR-2075, 
SOLR-2076 and SOLR-2077.

There's code in there to deal with Geonames.org data. There are more patches 
coming, so hopefully it will become clearer as I add them. Thanks to W. Quach for 
leading the charge on these patches!
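
(For what it's worth: once lat/lon are in the index -- e.g. in a LatLonType-style 
field -- the spatial filtering being worked on for trunk lets you restrict results 
to a radius around a point with something roughly like 
fq={!geofilt sfield=coords pt=40.71,-74.01 d=50}. The field name "coords" is only 
an illustration here, and the exact syntax depends on which branch and patches you 
are running.)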

Cheers,
Chris


On 8/22/10 11:21 PM, Savannah Beckett savannah_becket...@yahoo.com wrote:

Hi,
  I am using Nutch to do the crawling and Solr to do the searching.  The index
has City and State.  I want to be able to get all nearby cities by entering a city
name, e.g. when I type New York, I want to get the following as facets:

New York, NY (1905)
Brooklyn, NY (89)
Jersey City, NJ (55)
New York City, NY (34)
Montclair, NJ (25)

How do I do that?  More importantly, where do I get all the latitude and
longitude data for all cities?

Thanks.







++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: Solr Admin Schema Browser and field named keywords

2010-08-23 Thread Shawn Heisey

 On 8/23/2010 12:07 AM, Shawn Heisey wrote:
 I have a field named keywords in my index.  The schema browser page 
is not able to deal with this, so I have trouble getting statistical 
information on this field.  When I click on the field, Firefox hangs 
for a minute and then gives the unresponsive script warning.  I 
assume (without actually checking) that this is due to keywords 
being already used for something in the javascript code.




I've just looked over the javascript sent to my browser, and do not see 
anything even close to keywords in it.  I also did not see it in the 
old version of jQuery that it loads.  I also looked through the 
branch_3x source code with the following command, and did not find 
anything that actually looked like a problem:


grep -irl keywords * | grep -v svn

This has been a problem for me the entire time I've used Solr.  I 
started with a 1.5-dev version, my production is now completely stock 
1.4.1, and I've been doing all my tests today on a 3.1 build from 
2010-06-29.  The code tree that I grepped is from 2010-08-13.


If there's any troubleshooting that someone needs done on my end, let me 
know.


Thanks,
Shawn



help refactoring from 3.x to 4.x

2010-08-23 Thread Ryan McKinley
I have a function that works well in 3.x, but when I tried to
re-implement it in 4.x it runs very, very slowly (~20ms in 3.x vs ~45s in 4.x
on an index with ~100K items).

Big picture, I am trying to calculate a bounding box for items that
match the query.  To calculate this, I have two fields, bboxNS and
bboxEW, that get filled with the min and max values for that doc.  To
get the bounding box, I just need the first matching term in the index
and the last matching term.

In 3.x the code looked like this:

public class FirstLastMatchingTerm
{
  String first = null;
  String last = null;

  public static FirstLastMatchingTerm read(SolrIndexSearcher searcher,
String field, DocSet docs) throws IOException
  {
FirstLastMatchingTerm firstLast = new FirstLastMatchingTerm();
if( docs.size() > 0 ) {
  IndexReader reader = searcher.getReader();
  TermEnum te = reader.terms(new Term(field, ""));
  do {
Term t = te.term();
if( null == t || !t.field().equals(field) ) {
  break;
}

if( searcher.numDocs(new TermQuery(t), docs) > 0 ) {
  firstLast.last = t.text();
  if( firstLast.first == null ) {
firstLast.first = firstLast.last;
  }
}
  }
  while( te.next() );
}
return firstLast;
  }
}


In 4.x, I tried:

public class FirstLastMatchingTerm
{
  String first = null;
  String last = null;

  public static FirstLastMatchingTerm read(SolrIndexSearcher searcher,
String field, DocSet docs) throws IOException
  {
FirstLastMatchingTerm firstLast = new FirstLastMatchingTerm();
if( docs.size() > 0 ) {
  IndexReader reader = searcher.getReader();

  Terms terms = MultiFields.getTerms(reader, field);
  TermsEnum te = terms.iterator();
  BytesRef term = te.next();
  while( term != null ) {
if( searcher.numDocs(new TermQuery(new Term(field, term)), docs) > 0 ) {
  firstLast.last = term.utf8ToString();
  if( firstLast.first == null ) {
firstLast.first = firstLast.last;
  }
}
term = te.next();
  }
}
return firstLast;
  }
}

but the results are slow (and incorrect).  I tried some variations of
using ReaderUtil.Gather(), but the real hit seems to come from
  if( searcher.numDocs(new TermQuery(new Term(field, term)), docs) > 0 )

Any ideas?  I'm not tied to the approach or indexing strategy, so if
anyone has other suggestions that would be great.  Looking at it
again, it seems crazy that you have to run a query for each term, but
in 3.x

thanks
ryan


possible to have multiple elevation file?

2010-08-23 Thread Chamnap Chhorn
Hi,

I need a separate elevation file for each site (around 200 sites). I think one big
elevation file would be difficult to manage. How can I manage each elevation
file differently?

Thanks
-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/


Re: Autosuggest on PART of cityname

2010-08-23 Thread gwk

 On 8/20/2010 7:04 PM, PeterKerk wrote:

@Markus: thanks, will try to work with that.

@Gijs: I've looked at the site and the search function on your homepage is
EXACTLY what I need! Do you have some Solr code samples for me to study
perhaps? (I just need the relevant fields in the schema.xml and the query
url) It would help me a lot! :)

Thanks to you both!

The fields in our schema are:
<field name="id" type="string" indexed="true" stored="true" required="true" />
- Just an id based on type, depth and a number, not important

<field name="type" type="string" indexed="true" stored="true" required="true" />
- This is either "buy" or "rent", as our sections have separate autocompleters

<field name="depth" type="string" indexed="true" stored="true" />
- Since you can search by country, region or city, this stores the type of this
document (well, since we use geonames.org geographical data we actually have 4 regions)

<field name="name" type="text" indexed="true" stored="true" />
- The canonical name of the country/region/city

<dynamicField name="name_*" type="text" indexed="true" stored="true" />
- The name of the country/region/city in various languages

<field name="parent" type="text" indexed="true" stored="true" />
- The name of the country/region/city with any of its parents, comma separated;
this is used for phrase searches, so if you enter "Amsterdam, Netherlands" the
Dutch Amsterdam will match before any of the Amsterdams in other countries.

<dynamicField name="parent_*" type="text" indexed="true" stored="true" />
- The same as parent but in different languages

<field name="data" type="string" indexed="false" stored="true" />
- This is some internal data used to create the correct filters when this
particular suggestion is selected

<dynamicField name="data_*" type="text" indexed="true" stored="true" />
- The same as data but in different languages, as our filters are on the actual
name of countries/regions/cities

<field name="count" type="tint" indexed="true" stored="true" />
- The number of documents, i.e. the number on the right of the suggestions

<field name="names" type="text" indexed="true" multiValued="true" />
- Multivalued field which is copyfield-ed from name and name_*

<field name="parents" type="text" indexed="true" multiValued="true" />
- Multivalued field which is copyfield-ed from parent and parent_*

Where "text" is:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


Our autocompletion requests are dismax requests where the most important 
parameters are:

- q = the text the user has entered into the search box so far
- fq = type:sale (or rent)
- qf = name_lang^4 name^4 names (where "lang" is the currently selected 
  language on the website)
- pf = name_lang^4 name^4 names parents
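
Put together, a suggestion request ends up looking roughly like this (the 
values are made up for illustration, and name_nl stands in for the 
selected-language field):

http://localhost:8983/solr/select?qt=dismax&q=amst&fq=type:sale&qf=name_nl^4+name^4+names&pf=name_nl^4+name^4+names+parents&rows=10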

Honestly, those parameters are basically just tweaked without quite 
understanding their meaning until I got something that worked 
adequately. Hope this helps.


Regards,

gwk


Re: possible to have multiple elevation file?

2010-08-23 Thread Chamnap Chhorn
Hi,

Here I am talking about the
QueryElevationComponent (http://wiki.apache.org/solr/QueryElevationComponent?action=fullsearch&context=180&value=linkto%3A%22QueryElevationComponent%22).
Does anyone have any ideas?

Thanks

On Mon, Aug 23, 2010 at 3:10 PM, Chamnap Chhorn chamnapchh...@gmail.comwrote:

 Hi,

 I need multiple elevation file for each site (around 200). I think one big
 elevation file is difficult to manage. How could I manage each elevation
 file differently?

 Thanks
 --
 Chhorn Chamnap
 http://chamnapchhorn.blogspot.com/




-- 
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/


Re: Proper Escaping of Ampersands

2010-08-23 Thread Nikolas Tautenhahn
Hi Yonik,

I got it working, but I think the Stopword Filter is not behaving as
expected - (The document could be found when I disabled the stopword
filter, details later in this mail...)

On 20.08.2010 16:57, Yonik Seeley wrote
 On Thu, Aug 19, 2010 at 11:33 AM, Nikolas Tautenhahn
 nik_s...@livinglogic.de wrote:
 But when I search for q=at%26s (= at&s), I get nothing.
 
 That's the correct encoding if you're typing it directly into a
 browser address box.
 http://localhost:8983/solr/select?defType=dismax&qf=text&q=at%26s&debugQuery=true
 
 But you should be able to verify that solr is getting the correct
 query string by checking out params in the response (in the example
 server, by default they are echoed back).  And adding debugQuery=true
 to the request should show you exactly what query is being generated.
 
 But the real issue likely lies with your fieldType definition.  Can
 you show that?

As I (normally) query multiple fields, I changed my request URL to
http://127.0.0.1:8983/solr/select?q=at%26s&fl=titel&qt=dismax&qf=titel&debugQuery=true&fl=*&qt=dismax&qf=titel&debugQuery=true
in order to narrow it down and got this response (cut down to, I think, the
relevant stuff):

 <str name="rawquerystring">at&s</str>
 <str name="querystring">at&s</str>
 <str name="parsedquery">+DisjunctionMaxQuery((titel:"(at&s at) s")~0.1) ()</str>
 <str name="parsedquery_toString">+(titel:"(at&s at) s")~0.1 ()</str>
 <lst name="explain"/>
 <str name="QParser">DisMaxQParser</str>

on my local debugging instance, using standard dismax config (from the
examples directory at solr).

The titel field is configured like this:

   <field name="titel" type="textgen" indexed="true" stored="true"/>

and textgen is configured like this:

 <fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="false"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
             catenateAll="0" splitOnCaseChange="0" preserveOriginal="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
             catenateAll="0" splitOnCaseChange="0" preserveOriginal="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

The document is indexed correctly; a search for "at s" found it and all 
fields looked great ("at&s" and not, for example, "at&amp;s").

As my stopword list does not contain "at", "&" or "amp;", I don't 
quite understand why my result is only found when I disable the 
stopword list. My stopword list can be found here:

http://pastebin.com/RfLuBHqd

Do you happen to see bad things for a string like "at&s" here?

The analysis page in the admin panel shows these steps for the index 
analyzer:

(HTMLStripStandardTokenizer) at&s => at&s
(SynonymFilter) at&s => at&s
(WordDelimiterFilter) at&s => term position 1: at&s, at; term pos 2: s, ats
(LowerCaseFilter) 1: at&s, at; 2: s, ats => 1: at&s, at; 2: s, ats
(StopFilter) 1: at&s, at; 2: s, ats => 1: at&s, at; 2: ats

So, according to this, it should be found even with my stopwords enabled...


best regards and thanks for your response,
Nikolas Tautenhahn


Re: help refactoring from 3.x to 4.x

2010-08-23 Thread Michael McCandless
Spooky that you see incorrect results!  The code looks correct.  What
are the specifics on when it produces an invalid result?

Also spooky that you see it running slower -- how much slower?  Did
you rebuild the index in 4.x (if not, you are using the preflex
codec)?  And is the index otherwise identical?

You could improve perf by not using SolrIndexSearcher.numDocs?  Ie you
don't need the count; you just need to know if it's > 0.  So you could
make your own loop that breaks out on the first docID in common.  You
could also stick w/ BytesRef the whole time (only do .utf8ToString()
in the end on the first/last), though this is presumably net/net a
tiny cost.
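
Something along these lines maybe (a rough, untested sketch; it assumes the
trunk flex APIs (MultiFields.getDeletedDocs, TermsEnum.docs/DocsEnum) and
Solr's DocSet.exists() look the way I remember, so double check against your
checkout):

  Bits delDocs = MultiFields.getDeletedDocs(reader);
  Terms terms = MultiFields.getTerms(reader, field);
  if (terms != null) {
    TermsEnum te = terms.iterator();
    DocsEnum docsEnum = null;
    BytesRef term;
    while ((term = te.next()) != null) {
      docsEnum = te.docs(delDocs, docsEnum);
      int doc;
      while ((doc = docsEnum.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
        if (docs.exists(doc)) {          // first posting that is also in the DocSet
          firstLast.last = term.utf8ToString();
          if (firstLast.first == null) {
            firstLast.first = firstLast.last;
          }
          break;                         // no need to count the rest
        }
      }
    }
  }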

But, we should still dig down on why numDocs is slower in 4.x; that's
unexpected; Yonik any ideas?  I'm not familiar with this part of
Solr...

Mike

On Mon, Aug 23, 2010 at 2:38 AM, Ryan McKinley ryan...@gmail.com wrote:
 I have a function that works well in 3.x, but when I tried to
 re-implement in 4.x it runs very very slow (~20ms vs 45s on an index w
 ~100K items).

 Big picture, I am trying to calculate a bounding box for items that
 match the query.  To calculate this, I have two fields bboxNS, and
 bboxEW that get filled with the min and max values for that doc.  To
 get the bounding box, I just need the first matching term in the index
 and the last matching term.

 In 3.x the code looked like this:

 public class FirstLastMatchingTerm
 {
  String first = null;
  String last = null;

  public static FirstLastMatchingTerm read(SolrIndexSearcher searcher,
 String field, DocSet docs) throws IOException
  {
    FirstLastMatchingTerm firstLast = new FirstLastMatchingTerm();
     if( docs.size() > 0 ) {
      IndexReader reader = searcher.getReader();
       TermEnum te = reader.terms(new Term(field, ""));
      do {
        Term t = te.term();
        if( null == t || !t.field().equals(field) ) {
          break;
        }

         if( searcher.numDocs(new TermQuery(t), docs) > 0 ) {
          firstLast.last = t.text();
          if( firstLast.first == null ) {
            firstLast.first = firstLast.last;
          }
        }
      }
      while( te.next() );
    }
    return firstLast;
  }
 }


 In 4.x, I tried:

 public class FirstLastMatchingTerm
 {
  String first = null;
  String last = null;

  public static FirstLastMatchingTerm read(SolrIndexSearcher searcher,
 String field, DocSet docs) throws IOException
  {
    FirstLastMatchingTerm firstLast = new FirstLastMatchingTerm();
     if( docs.size() > 0 ) {
      IndexReader reader = searcher.getReader();

      Terms terms = MultiFields.getTerms(reader, field);
      TermsEnum te = terms.iterator();
      BytesRef term = te.next();
      while( term != null ) {
         if( searcher.numDocs(new TermQuery(new Term(field, term)), docs) > 0 ) {
          firstLast.last = term.utf8ToString();
          if( firstLast.first == null ) {
            firstLast.first = firstLast.last;
          }
        }
        term = te.next();
      }
    }
    return firstLast;
  }
 }

 but the results are slow (and incorrect).  I tried some variations of
 using ReaderUtil.Gather(), but the real hit seems to come from
  if( searcher.numDocs(new TermQuery(new Term(field, term)), docs) > 0 )

 Any ideas?  I'm not tied to the approach or indexing strategy, so if
 anyone has other suggestions that would be great.  Looking at it
 again, it seems crazy that you have to run a query for each term, but
 in 3.x

 thanks
 ryan



Re: How to Debug Sol-Code in Eclipse ?!

2010-08-23 Thread stockii

Can nobody help me, or does nobody want to? :D
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-Debug-Sol-Code-in-Eclipse-tp1262050p1288705.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to get most indexed keyword from SOLR

2010-08-23 Thread Grijesh.singh

Hi Pawan,

If you are using Solr 1.4 or a later version, you can see term info by using the
terms request handler, like:

http://localhost:8080/solr/terms/?terms.fl=text&terms.sort=count
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-most-indexed-keyword-from-SOLR-tp1240552p1289084.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to Debug Sol-Code in Eclipse ?!

2010-08-23 Thread Bernd Fehling


 
 Can nobody help me, or does nobody want to? :D

As already someone said:
- install Eclipse
- add Jetty Webapp Plugin to Eclipse
- add svn plugin to Eclipse
- download with svn the repository from trunk
- change to lucene dir and run ant package
- change to solr dir and run ant dist
- set up a Jetty webapp for Solr via "Run configurations..."
- start debugging :-)

If you want to debug below the Solr level into the Lucene level, just add the
Lucene src path to the debug source lookup.

Maybe you should also read:
http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse

Regards,
Bernd


Re: How to Debug Sol-Code in Eclipse ?!

2010-08-23 Thread Drew Farris
On Sun, Aug 22, 2010 at 8:29 PM, stockii st...@shopgate.com wrote:

 okay, thx. but it won't work =(

 I checked out solr 1.4.1 as a dynamic web project into Eclipse and started Jetty
 with XDebug. In Eclipse I added WebLogic exactly how the tutorial shows, but
 Eclipse cannot connect =(

 any idea what im doing wrong ?

No idea. Check your arguments and verify that the port is the same on
both the command-line you're using to start jetty and in the eclipse
remote debugger configuration. Make sure you don't have a firewall
running on your machine because that might block the connection from
eclipse to jetty.
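
For reference, the two ends that have to agree look roughly like this (port 8000
here is just an example):

java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8000 -jar start.jar

on the Jetty side, and an Eclipse "Remote Java Application" debug configuration
pointing at localhost:8000 on the other.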


Re: Doing Shingle but also keep special single word

2010-08-23 Thread Ahmet Arslan
 1. We have over ten million news articles to build into a
 Solr index.
 2. We copy several fields, such as title, author, body and the
 captions of attached photos, into a new field for default
 search.
 3. We then want to use the shingle filter on this new field.
 4. We can't predict what new single-word noun our
 users may be interested in, because it's news, you know. For
 example, the word ECFA has only become a very popular word in the news
 here recently, so I wish users could type in 'ECFA' to search
 and have Solr show some relevant news articles.
 5. I wish to keep the index as small as possible.
 6. I also wish to do the same thing described in 5 when I
 search by explicitly specifying the field names of those fields,
 too.

Can I ask why you need/use the shingle filter?


  


Re: SolrException log

2010-08-23 Thread Tommaso Teofili
Hi Bastian,
this seems to be related to IO and file deletion (optimization compacts and
removes index files); are you running Solr on NFS or a distributed file
system?
You could set a proper IndexDeletionPolicy (SolrDeletionPolicy) in
solrconfig.xml to handle this.
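Something along these lines in solrconfig.xml (the values are just an
illustration):

<deletionPolicy class="solr.SolrDeletionPolicy">
  <str name="maxCommitsToKeep">1</str>
  <str name="maxOptimizedCommitsToKeep">0</str>
  <str name="maxCommitAge">30MINUTES</str>
</deletionPolicy>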
My 2 cents,
Tommaso

2010/8/11 Bastian Spitzer bspit...@magix.net

 Hi,

 we are using solr 1.4.1 in a master-slave setup with replication;
 requests are loadbalanced to both instances. This is just working fine,
 but the slave behaves strangely sometimes, with a SolrException log (trace
 below). We have been using 1.4.1 for weeks now, and this has happened only a
 few times so far, and it has only occurred on the slave. The problem seemed
 to be gone when we added a cron job to send a periodic <optimize/> (once a
 day) to the master, but today it happened again. The index contains 55
 files right now; after optimize there are only 10. So it seems it's a
 problem when the index is spread among a lot of files. The slave won't ever
 recover once this exception shows up; the only thing that helps is a restart.

 Is this a known issue? The only workaround would be to track the
 commit counts and send additional <optimize/> requests after a certain
 number of commits, but I'd prefer solving this problem rather than building
 a workaround.

 Any hints/thoughts on this issue are very much appreciated, thanks in
 advance for your help.

 cheers Bastian.

 Aug 11, 2010 4:51:58 PM org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/select
 params={fl=media_id,keyword_1004&sort=priority_1000+desc,+score+desc&indent=off&start=0&q=mandant_id:1000+AND+partner_id:1000+AND+active_1000:true+AND+cat_id_path_1000:7231/7258*+AND+language_id:1004&rows=24&version=2.2} status=500 QTime=2
 Aug 11, 2010 4:51:58 PM org.apache.solr.common.SolrException log
 SEVERE: java.io.IOException: read past EOF
        at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:151)
        at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
        at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:78)
        at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:112)
        at org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:461)
        at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
        at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:430)
        at org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:445)
        at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
        at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:430)
        at org.apache.lucene.search.FieldComparator$IntComparator.setNextReader(FieldComparator.java:332)
        at org.apache.lucene.search.TopFieldCollector$MultiComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:435)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:249)
        at org.apache.lucene.search.Searcher.search(Searcher.java:171)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.mortbay.jetty.servlet.WebApplicationHandler$CachedChain.doFilter(WebApplicationHandler.java:821)
        at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:471)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:568)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1530)
        at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:633)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1482)
        at org.mortbay.http.HttpServer.service(HttpServer.java:909)
        at org.mortbay.http.HttpConnection.service(HttpConnection.java:820)
        at org.mortbay.http.ajp.AJP13Connection.handleNext(AJP13Connection.java:295)
        at org.mortbay.http.HttpConnection.handle(HttpConnection.java:837)
        at org.mortbay.http.ajp.AJP13Listener.handleConnection(AJP13Listener.java:212)
        at ...

Re: Tokenising on Each Letter

2010-08-23 Thread Scottie

Probably a good idea to post the relevant information! I guess I thought it
would be a really obvious answer, but it seems it's a bit more complex ;)

<field name="productsModel" type="textTight" indexed="true" stored="true"
       omitNorms="true"/>

<!-- Less flexible matching, but less false matches.  Probably not ideal
     for product names, but may be good for SKUs.  Can insert dashes in the
     wrong place and still match. -->
<fieldType name="textTight" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="0" catenateWords="1"
            catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
    <!-- this filter can remove any duplicate tokens that appear at the
         same position - sometimes possible with WordDelimiterFilter in
         conjunction with stemming. -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

It seems you may be correct about the catenateAll option, but I'm not sure
that adding a wildcard at the end of every search would be a great idea. This
is meant to be applied to a general search box, but still retain flexibility
for model numbers. Right now we are using MySQL % % wildcards, so it matches
pretty much anything on the model number, whether you cut off the start or the
end etc., and I wanted to retain that.

Could you elaborate on n-grams for me, based on my schema?

The main reason I picked textTight was for model numbers like
EQW-500DBE-1AVER etc.; I thought it would produce better results.

Thanks a lot for the detailed reply.

Scott
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Tokenising-on-Each-Letter-tp1247113p1291984.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to Debug Sol-Code in Eclipse ?!

2010-08-23 Thread stockii

ant package  

BUILD FAILED run program perl ...

Is it necessary to install Perl on my computer?!

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-Debug-Sol-Code-in-Eclipse-tp1262050p1291992.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Tokenising on Each Letter

2010-08-23 Thread Nikolas Tautenhahn
Hi Scottie,

 Could you elaborate about N gram for me, based on my schema?

just a quick reply:

 <fieldType name="textNGram" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- in this example, we will only use synonyms at query time
     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
             ignoreCase="true" expand="false"/> -->
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="0" catenateWords="1" catenateNumbers="0"
             catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0"
             preserveOriginal="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" side="front"
             minGramSize="2" maxGramSize="30"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="0" catenateWords="0" catenateNumbers="0"
             catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0"
             preserveOriginal="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>

This will produce n-grams from 2 up to 30 characters; for info check
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory

Be sure to adjust those sizes (minGramSize/maxGramSize) so that
maxGramSize is big enough to keep the whole original serial number/model
number and minGramSize is not so small that you fill your index with
useless information.
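
As a rough illustration with the model number from earlier: assuming the
hyphenated original token survives the WordDelimiterFilter (preserveOriginal=1),
EQW-500DBE-1AVER is lowercased and then expanded by the EdgeNGramFilter into
eq, eqw, eqw-, eqw-5, ... all the way up to the full eqw-500dbe-1aver (16
characters, so within maxGramSize=30). A query like eqw-500 then matches one of
those prefixes directly, without needing wildcards.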

Best regards,
Nikolas Tautenhahn




Re: SolrException log

2010-08-23 Thread Bastian Spitzer
Hi Tommaso,

Thanks for your reply. The Solr files are on a local disk, on a reiserfs. I'll
try to set a deletion policy and report back whether that solves the problem;
thank you for the hint.

cheers,
Bastian

-----Original Message-----
From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com] 
Sent: Monday, 23 August 2010 15:31
To: solr-user@lucene.apache.org
Subject: Re: SolrException log

Hi Bastian,
this seems to be related to IO and file deletion (optimization compacts and 
removes index files), are you running Solr on NFS or a distributed file system?
You could set a proper IndexDeletionPolicy (SolrDeletionPolicy) in 
solrconfig.xml to handle this.
My 2 cents,
Tommaso

2010/8/11 Bastian Spitzer bspit...@magix.net

 Hi,

 we are using solr 1.4.1 in a master-slave setup with replication, 
 requests are loadbalanced to both instances. this is just working 
 fine, but the slave behaves strange sometimes with a SolrException 
 log (trace below). We are using 1.4.1 for weeks now, and this has 
 happened only a few times so far, and it only occured on the Slave. 
 The Problem seemed to be gone when we added a cron-job to send a 
 periodic optimize/ (once a day) to the master, but today it did 
 happen again. The Index contains 55 files right now, after optimize 
 there are only 10. So it seems its a problem when the index is spread 
 among a lot files. The Slave wont ever recover once this Exception 
 shows up, the only thing that helps is a restart.

 Is this a known issue? Only workaround would be to track the 
 commit-counts and send additional optimize/ requests after a certain 
 amount of commits, but id prefer solving this problem rather than 
 building a workaround..

 Any hints/thoughts on this issue are verry much appreciated, thanks in 
 advance for your help.

 cheers Bastian.

 Aug 11, 2010 4:51:58 PM org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/select
 params={fl=media_id,keyword_1004&sort=priority_1000+desc,+score+desc&indent=off&start=0&q=mandant_id:1000+AND+partner_id:1000+AND+active_1000:true+AND+cat_id_path_1000:7231/7258*+AND+language_id:1004&rows=24&version=2.2} status=500 QTime=2
 Aug 11, 2010 4:51:58 PM org.apache.solr.common.SolrException log
 SEVERE: java.io.IOException: read past EOF
        at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:151)
        at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
        at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:78)
        at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:112)
        at org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:461)
        at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
        at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:430)
        at org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:445)
        at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
        at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:430)
        at org.apache.lucene.search.FieldComparator$IntComparator.setNextReader(FieldComparator.java:332)
        at org.apache.lucene.search.TopFieldCollector$MultiComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:435)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:249)
        at org.apache.lucene.search.Searcher.search(Searcher.java:171)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.mortbay.jetty.servlet.WebApplicationHandler$CachedChain.doFilter(WebApplicationHandler.java:821)
        at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:471)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:568)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1530)
        at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:633)
        at org.mortbay.http.HttpContext.handle(HttpContext.java:1482)

Re: SolrException log

2010-08-23 Thread Bastian Spitzer
 I don't seem to find decent documentation on how those parameters actually
work.

This is the default example block:

<deletionPolicy class="solr.SolrDeletionPolicy">
  <!-- The number of commit points to be kept -->
  <str name="maxCommitsToKeep">1</str>
  <!-- The number of optimized commit points to be kept -->
  <str name="maxOptimizedCommitsToKeep">0</str>
  <!--
      Delete all commit points once they have reached the given age.
      Supports DateMathParser syntax e.g.

      <str name="maxCommitAge">30MINUTES</str>
      <str name="maxCommitAge">1DAY</str>
  -->
</deletionPolicy>

So do I have to increase maxCommitsToKeep to a value of 2 when I add a
maxCommitAge parameter, or will 1 still be enough? Do I have to call optimize
more than once a day when I add maxOptimizedCommitsToKeep with a value of 1?

Can someone please explain how this is supposed to work?

-----Original Message-----
From: Bastian Spitzer [mailto:bspit...@magix.net] 
Sent: Monday, 23 August 2010 16:40
To: solr-user@lucene.apache.org
Subject: Re: SolrException log

Hi Tommaso,

Thanks for your Reply. The Solr Files are on local disk, on a reiserfs. I'll 
try to set a Deletion Policy and report back if that solved the problem, thank 
you for the hint.

cheers,
Bastian

-----Original Message-----
From: Tommaso Teofili [mailto:tommaso.teof...@gmail.com]
Sent: Monday, 23 August 2010 15:31
To: solr-user@lucene.apache.org
Subject: Re: SolrException log

Hi Bastian,
this seems to be related to IO and file deletion (optimization compacts and 
removes index files), are you running Solr on NFS or a distributed file system?
You could set a proper IndexDeletionPolicy (SolrDeletionPolicy) in 
solrconfig.xml to handle this.
My 2 cents,
Tommaso

2010/8/11 Bastian Spitzer bspit...@magix.net

 Hi,

 we are using solr 1.4.1 in a master-slave setup with replication, 
 requests are loadbalanced to both instances. this is just working 
 fine, but the slave behaves strange sometimes with a SolrException 
 log (trace below). We are using 1.4.1 for weeks now, and this has 
 happened only a few times so far, and it only occured on the Slave.
 The Problem seemed to be gone when we added a cron-job to send a 
 periodic optimize/ (once a day) to the master, but today it did 
 happen again. The Index contains 55 files right now, after optimize 
 there are only 10. So it seems its a problem when the index is spread 
 among a lot files. The Slave wont ever recover once this Exception 
 shows up, the only thing that helps is a restart.

 Is this a known issue? Only workaround would be to track the 
 commit-counts and send additional optimize/ requests after a certain 
 amount of commits, but id prefer solving this problem rather than 
 building a workaround..

 Any hints/thoughts on this issue are verry much appreciated, thanks in 
 advance for your help.

 cheers Bastian.

 Aug 11, 2010 4:51:58 PM org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/select
 params={fl=media_id,keyword_1004&sort=priority_1000+desc,+score+desc&indent=off&start=0&q=mandant_id:1000+AND+partner_id:1000+AND+active_1000:true+AND+cat_id_path_1000:7231/7258*+AND+language_id:1004&rows=24&version=2.2} status=500 QTime=2
 Aug 11, 2010 4:51:58 PM org.apache.solr.common.SolrException log
 SEVERE: java.io.IOException: read past EOF
        at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:151)
        at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
        at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:78)
        at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:112)
        at org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:461)
        at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
        at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:430)
        at org.apache.lucene.search.FieldCacheImpl$IntCache.createValue(FieldCacheImpl.java:445)
        at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
        at org.apache.lucene.search.FieldCacheImpl.getInts(FieldCacheImpl.java:430)
        at org.apache.lucene.search.FieldComparator$IntComparator.setNextReader(FieldComparator.java:332)
        at org.apache.lucene.search.TopFieldCollector$MultiComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:435)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:249)
        at org.apache.lucene.search.Searcher.search(Searcher.java:171)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
        at ...

Re: Tokenising on Each Letter

2010-08-23 Thread Scottie

 Nikolas, thanks a lot for that. I've just given it a quick test and it 
definitely seems to work for the examples I gave.

Thanks again,

Scott


From: Nikolas Tautenhahn [via Lucene] 
Sent: Monday, August 23, 2010 3:14 PM
To: Scottie 
Subject: Re: Tokenising on Each Letter


Hi Scottie, 

 Could you elaborate about N gram for me, based on my schema? 

just a quick reply: 


 <fieldType name="textNGram" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- in this example, we will only use synonyms at query time
     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
             ignoreCase="true" expand="false"/> -->
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="0" catenateWords="1" catenateNumbers="0"
             catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0"
             preserveOriginal="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" side="front"
             minGramSize="2" maxGramSize="30"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
             generateNumberParts="0" catenateWords="0" catenateNumbers="0"
             catenateAll="0" splitOnCaseChange="1" splitOnNumerics="0"
             preserveOriginal="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>

Will produce any NGrams from 2 up to 30 Characters, for Info check 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory

Be sure to adjust those sizes (minGramSize/maxGramSize) so that 
maxGramSize is big enough to keep the whole original serial number/model 
number and minGramSize is not so small that you fill your index with 
useless information. 

Best regards, 
Nikolas Tautenhahn 








-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Tokenising-on-Each-Letter-tp1247113p1294586.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to Debug Sol-Code in Eclipse ?!

2010-08-23 Thread stockii

Thx for your help. Now it works fine. It's very simple when you know how :D
haha

I'll try Bernd's suggestion =) 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-Debug-Sol-Code-in-Eclipse-tp1262050p1296175.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Proper Escaping of Ampersands

2010-08-23 Thread Chris Hostetter
: The document is indexed correctly; a search for "at s" found it and all
: fields looked great ("at&s" and not, for example, "at&amp;s").
: 
: As my stopword list does not contain "at", "&" or "amp;", I don't
: quite understand why my result is only found when I disable the
: stopword list. My stopword list can be found here:
: 
: http://pastebin.com/RfLuBHqd
: 
: Do you happen to see bad things for a string like "at&s" here?

s is in your stopwords file, which may be part of the problem (but i 
didn't look hard at your query string to verify that)

: The analysis page in the admin panel tells me, these steps for the Index
: Analyzer:
...
: (StopFilter) 1: at&s, at; 2: s, ats => 1: at&s, at; 2: ats
: 
: So, according to this, it should be found even with my stopwords enabled...

Strange, based on the stopwords file you posted the s should definitely 
be getting removed at index time -- it would also get removed at query 
time, but because you have it *before* WDF at query time that wouldn't 
affect this query (even though it did affect the index)

There was a bug with analysis.jsp and stopwords recently, but that 
shouldn't have affected 1.4 (you are definitely using 1.4, correct?)

https://issues.apache.org/jira/browse/SOLR-2051






-Hoss



Re: Proper Escaping of Ampersands

2010-08-23 Thread Yonik Seeley
I'd recommend going back to the textgen field type as defined in the
example schema.
Your move of the StopFilter is what is causing the problem.
At index time, the "s" gets removed (because the StopFilter is now
after the WDF).
But a query of "at&s" is transformed into "at s" (the "s" isn't removed
because the StopFilter is before the WDF in the query analyzer).  Since "s"
isn't in the index, no docs are found.

Also, I notice you're using preserveOriginal=1 - make sure you really
need that... it's normally only useful if you are doing wildcard
searches (for example at*).

-Yonik
http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8


On Mon, Aug 23, 2010 at 5:43 AM, Nikolas Tautenhahn
nik_s...@livinglogic.de wrote:
 Hi Yonik,

 I got it working, but I think the Stopword Filter is not behaving as
 expected - (The document could be found when I disabled the stopword
 filter, details later in this mail...)

 On 20.08.2010 16:57, Yonik Seeley wrote
 On Thu, Aug 19, 2010 at 11:33 AM, Nikolas Tautenhahn
 nik_s...@livinglogic.de wrote:
  But when I search for q=at%26s (= at&s), I get nothing.

 That's the correct encoding if you're typing it directly into a
 browser address box.
  http://localhost:8983/solr/select?defType=dismax&qf=text&q=at%26s&debugQuery=true

 But you should be able to verify that solr is getting the correct
 query string by checking out params in the response (in the example
 server, by default they are echoed back).  And adding debugQuery=true
 to the request should show you exactly what query is being generated.

 But the real issue likely lies with your fieldType definition.  Can
 you show that?

 As I (normally) query multiple fields, I changed my request URL to
 http://127.0.0.1:8983/solr/select?q=at%26s&fl=titel&qt=dismax&qf=titel&debugQuery=true&fl=*&qt=dismax&qf=titel&debugQuery=true
 in order to narrow it down and got this response (cut to, as I think,
 relevant stuff)

 <str name="rawquerystring">at&s</str>
 <str name="querystring">at&s</str>
 <str name="parsedquery">+DisjunctionMaxQuery((titel:"(at&s at) s")~0.1) ()</str>
 <str name="parsedquery_toString">+(titel:"(at&s at) s")~0.1 ()</str>
 <lst name="explain"/>
 <str name="QParser">DisMaxQParser</str>

 on my local debugging instance, using standard dismax config (from the
 examples directory at solr).

 The titel-Field is configured like this:

    <field name="titel" type="textgen" indexed="true" stored="true"/>

 and textgen is configured like this

     <fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
       <analyzer type="index">
         <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                 ignoreCase="true" expand="false"/>
         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                 generateNumberParts="1" catenateWords="1" catenateNumbers="1"
                 catenateAll="0" splitOnCaseChange="0" preserveOriginal="1"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true"
                 words="stopwords.txt" enablePositionIncrements="true"/>
       </analyzer>
       <analyzer type="query">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                 ignoreCase="true" expand="true"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true"
                 words="stopwords.txt" enablePositionIncrements="true"/>
         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                 generateNumberParts="1" catenateWords="0" catenateNumbers="0"
                 catenateAll="0" splitOnCaseChange="0" preserveOriginal="1"/>
         <filter class="solr.LowerCaseFilterFactory"/>
       </analyzer>
     </fieldType>

  The document is indexed correctly; a search for "at s" found it and all
  fields looked great ("at&s" and not, for example, "at&amp;s").

  As my stopword list does not contain "at", "&" or "amp;", I don't
  quite understand why my result is only found when I disable the
  stopword list. My stopword list can be found here:

 http://pastebin.com/RfLuBHqd

  Do you happen to see bad things for a string like "at&s" here?

 The analysis page in the admin panel tells me, these steps for the Index
 Analyzer:

  (HTMLStripStandardTokenizer) at&s => at&s
  (SynonymFilter) at&s => at&s
  (WordDelimiterFilter) at&s => term position 1: at&s, at; term pos 2: s, ats
  (LowerCaseFilter) 1: at&s, at; 2: s, ats => 1: at&s, at; 2: s, ats
  (StopFilter) 1: at&s, at; 2: s, ats => 1: at&s, at; 2: ats

 So, according to this, it should be found even with my stopwords enabled...


 best regards and thanks for your response,
 Nikolas Tautenhahn



Re: Problem in setting the request writer in SolrJ (wiki page wrong?)

2010-08-23 Thread Ryan McKinley
Note that the 'setRequestWriter' is not part of the SolrServer API, it
is on the CommonsHttpSolrServer:
http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.html#setRequestWriter%28org.apache.solr.client.solrj.request.RequestWriter%29
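
i.e. something along these lines (a sketch, not tested):

  CommonsHttpSolrServer server =
      new CommonsHttpSolrServer("http://localhost:8983/solr");  // may throw MalformedURLException
  server.setRequestWriter(new BinaryRequestWriter());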

If you are using EmbeddedSolrServer, the params are not serialized via
RequestWriter, so you don't have any options there.

ryan


On Mon, Aug 23, 2010 at 9:24 AM, Constantijn Visinescu
baeli...@gmail.com wrote:
 Hello,

 I'm using an embedded solrserver in my Java webapp, but as far as i
 can tell it's defaulting to sending updates in XML, which seems like a
 huge waste compared to sending it in Java binary format.

 According to this page:
 http://wiki.apache.org/solr/Solrj#Setting_the_RequestWriter

 I'm supposed to be able to set the requestwriter like so:
 server.setRequestWriter(new BinaryRequestWriter());

  However this method doesn't seem to exist in the SolrServer class of
  SolrJ 1.4.1?

 How do i set it to process updates in the java binary format?

 Thanks in advance,
 Constantijn Visinescu

 P.S.
 I'm creating my SolrServer instance like this:
        private SolrServer solrServer;
        CoreContainer container = new CoreContainer.Initializer().initialize();
         solrServer = new EmbeddedSolrServer(container, "");

 this solrServer wont let me set a request writer.



ANNOUNCE: Stump Hoss @ Lucene Revolution

2010-08-23 Thread Chris Hostetter


Hey everybody,

As you (hopefully) have heard by now, Lucid Imagination is sponsoring a 
Lucene/Solr conference in Boston about 6 weeks from now.  We've got a lot 
of really great speakers lined up to give some really interesting 
technical talks, so I offered to do something a little bit different.


I'm going to be in the hot seat for a Stump The Chump style session, 
where I'll be answering Solr questions live and unrehearsed...


http://bit.ly/stump-hoss

The goal is to really make me sweat and work hard to think of creative 
solutions to non-trivial problems on the spot -- like when I answer 
questions on the solr-user mailing list, except in a crowded room with 
hundreds of people staring at me and laughing.


But in order to be a success, we need your questions/problems/challenges!

If you had a tough situation with Solr that you managed to solve with a 
creative solution (or haven't solved yet) and are interested to see what 
type of solution I might come up with under pressure, please email a 
description of your problem to st...@lucenerevolution.org -- More details 
online...


http://lucenerevolution.org/Presentation-Abstracts-Day1#stump-hostetter

Even if you won't be able to make it to Boston, please send in any 
challenging problems you would be interested to see me tackle under the 
gun.  The session will be recorded, and the video will be posted online 
shortly after the conference has ended.  And if you can make it to Boston: 
all the more fun to watch live and in person (and maybe answer follow up 
questions)


In any case, it should be a very interesting session: folks will either 
get to learn a lot, or laugh at me a lot, or both.  (win/win/win)



-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Re: Doing Shingle but also keep special single word

2010-08-23 Thread MitchK

No, I mean that you use an additional (indexed) field for searching (i.e.
whitespace-tokenized, so every word - separated by whitespace - becomes a
token). So you have two fields (a shingle-token field and a single-token field)
and you can search across both fields.
This provides several benefits: e.g. you can boost the shingle field at
query time, since a match in the shingle field means that an exact phrase
matched.

Additionally: You can search with single-word-queries as well as
multi-word-queries.
Furthermore you can apply synonyms to your single-token-field. 
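
As a sketch of what I mean (all field names here are made up): copy the source
text into both fields and boost the shingled one at query time, e.g.

<copyField source="content" dest="content_single"/>
<copyField source="content" dest="content_shingles"/>

and then query both with dismax, something like
qf=content_single content_shingles^2.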

If you want to keep your index as small as possible but as large as needed,
try to understand Lucene's Similarity implementation to consider whether
you can set the field options omitNorms=true or
omitTermFreqAndPositions=true.
http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/search/Similarity.html
Keep in mind what happens if you omit one of those options.

A small example of the consequences of setting omitNorms=true:
doc1: this is a short example doc
doc2: this is a longer example doc for presenting the effect of omitNorms

If you are searching for "doc" while omitNorms=false, your response will look
like this:
doc1,
doc2
This is because the length norm for doc1 is larger than the one for doc2,
since doc1 is shorter than doc2 (have a look at the provided link), and a
larger norm boosts the score.

If omitNorms=true, the scores for both docs will be equal.

Kind regards,
- Mitch


scott chu wrote:
 
 I don't quite understand additional-field-way? Do you mean making another 
 field that stores special words particularly but no indexing for that
 field?
 
 Scott
 
 - Original Message - 
 From: MitchK mitc...@web.de
 To: solr-user@lucene.apache.org
 Sent: Sunday, August 22, 2010 11:48 PM
 Subject: Re: Doing Shingle but also keep special single word
 
 

 Hi,

 keepword-filter is no solution for this problem, since this would lead to
 the problematic that one has to manage a word-dictionary. As explained, 
 this
 would lead to too much effort.

 You can easily add outputUnigrams=true and check out the analysis.jsp for
 this field. So you can see how much bigger a single field will become
 with
 this option.
 However, I am quite sure that the difference between using
 outputUnigrams=true and indexing in a seperate field is not noteworthy.

 I would suggest you to do it the additionally-field-way, since this would
 lead to more flexibility in boosting the different fields.

 Unfortunately, I haven't understood your explanation about the use-case. 
 But
 it sounds a little bit like tagging?

 Kind regards,
 - Mitch


 iorixxx wrote:

 Isn't set outputUnigrams=true will
 make index size about twice than when it's set to false?

 Sure index will be bigger. I didn't know that this is problem for you. 
 But
 if you have a list of special single words that you want to keep,
 keepwordfilter can eliminate other tokens. So index size will be okey.


 Scott

 - Original Message - From: Ahmet Arslan iori...@yahoo.com
 To: solr-user@lucene.apache.org
 Sent: Saturday, August 21, 2010 1:15 AM
 Subject: Re: Doing Shingle but also keep special single
 word


  I am building index with Shingle
  filter. We know it's minimum 2-gram but I also
 want keep
  some special single word, e.g. IBM, Microsoft,
 etc. i.e. I
  want to do a minimum 2-gram but also want to have
 these
  single word in my index, Is it possible?
 
  outputUnigrams=true parameter does not work for
 you?
 
  After that you can cast filter
 class=solr.KeepWordFilterFactory words=keepwords.txt
 ignoreCase=true/ with keepwords.txt=IBM, Microsoft.
 
 
 
 







 -- 
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Doing-Shingle-but-also-keep-special-single-word-tp1241204p1276506.html
 Sent from the Solr - User mailing list archive at Nabble.com.

 
 
 
 
 
 
 No virus found in this incoming message.
 Checked by AVG - www.avg.com
 Version: 9.0.851 / Virus Database: 271.1.1/3083 - Release Date: 08/20/10 
 14:35:00
 
 
 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Doing-Shingle-but-also-keep-special-single-word-tp1241204p1300497.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solrj ContentStreamUpdateRequest Slow

2010-08-23 Thread Chris Hostetter

: ContentStreamUpdateRequest req = new
: ContentStreamUpdateRequest(/update/extract);
: 
: System.out.println(setting params...);
: req.setParam(stream.url, fileName);
: req.setParam(literal.content_id, solrId);

ContentStreamUpdateRequest exists so that you can stream content directly 
from the client to the server -- you aren't doing that, you are asking the 
server to go fetch the stream.url itself.

The NullPointerException happens because you've never called 
ContentStreamUpdateRequest.addFile or 
ContentStreamUpdateRequest.addContentStream so it gets into a state where 
it doesn't know what it's doing (admittedly the error message is less than
ideal).

If you just use a plain old regular UpdateRequest (or even a 
QueryRequest) instead, your code works as written.
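
To make that concrete, here is a rough sketch of both working variants,
assuming the SolrJ 1.4 API (the /update/extract path and the stream.url and
literal.content_id parameters come from the thread; the file path, URL and ids
are placeholders):

  import java.io.File;

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
  import org.apache.solr.client.solrj.request.UpdateRequest;

  public class ExtractExamples {
    static void index(SolrServer server, String solrId, String fileUrl) throws Exception {
      // Variant 1: stream a local file from the client; addFile() attaches the
      // content stream that ContentStreamUpdateRequest expects.
      ContentStreamUpdateRequest local = new ContentStreamUpdateRequest("/update/extract");
      local.addFile(new File("/path/to/local.pdf"));
      local.setParam("literal.content_id", solrId);
      server.request(local);

      // Variant 2: ask the server to fetch the URL itself via stream.url; no
      // content stream is attached, so a plain UpdateRequest is the right type.
      UpdateRequest remote = new UpdateRequest("/update/extract");
      remote.setParam("stream.url", fileUrl);
      remote.setParam("literal.content_id", solrId);
      server.request(remote);
    }
  }

Note that variant 2 also requires enableRemoteStreaming="true" on the
requestParsers element in solrconfig.xml.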

-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Re: How to use synonyms on a faceted field with multiple words

2010-08-23 Thread Chris Hostetter
: A quick and dirty workaround using Solr 1.4 is to replace spaces in the synonym file with
: some other character/pattern. I used ## (i.e. video = digital##media). Then add the
: solr.PatternReplaceFilterFactory after the synonym filter to replace the pattern
: with a space.
: This works, but I'd love to know if there is a better way.

A feature was added a little while back (I think by Koji) to let you 
specify a tokenizerFactory attribute when you declare a 
SynonymFilterFactory -- it's then used to parse the synonyms file.

I think it was included in 1.4, but I may be wrong.
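
A sketch of what that looks like in schema.xml -- the tokenizerFactory value
below is just one plausible choice for a facet-style field, not something
stated in this thread:

  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"
          tokenizerFactory="solr.KeywordTokenizerFactory"/>

With KeywordTokenizerFactory parsing the synonyms file, an entry such as
video => digital media is kept as whole strings instead of being split on
spaces, so the ## placeholder trick is no longer needed.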

-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



minMergeDocs supported ?

2010-08-23 Thread stockii

Heya:

Is minMergeDocs supported in Solr?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/minMergeDocs-supported-tp1302856p1302856.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: ANNOUNCE: Stump Hoss @ Lucene Revolution

2010-08-23 Thread Israel Ekpo
Chris,

I have a couple of questions I would like to throw your way.

Is there a place where one can sign up for this?

It sounds very interesting.

On Mon, Aug 23, 2010 at 4:49 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 Hey everybody,

 As you (hopefully) have heard by now, Lucid Imagination is sponsoring a
 Lucene/Solr conference in Boston about 6 weeks from now.  We've got a lot of
 really great speakers lined up to give some really interesting technical
 talks, so I offered to do something a little bit different.

 I'm going to be in the hot seat for a Stump The Chump style session,
 where I'll be answering Solr questions live and unrehearsed...

http://bit.ly/stump-hoss

 The goal is to really make me sweat and work hard to think of creative
 solutions to non-trivial problems on the spot -- like when I answer
 questions on the solr-user mailing list, except in a crowded room with
 hundreds of people staring at me and laughing.

 But in order to be a success, we need your questions/problems/challenges!

 If you had a tough situation with Solr that you managed to solve with a
 creative solution (or haven't solved yet) and are interested to see what
 type of solution I might come up with under pressure, please email a
 description of your problem to st...@lucenerevolution.org -- More details
 online...

 http://lucenerevolution.org/Presentation-Abstracts-Day1#stump-hostetter

 Even if you won't be able to make it to Boston, please send in any
 challenging problems you would be interested to see me tackle under the gun.
  The session will be recorded, and the video will be posted online shortly
 after the conference has ended.  And if you can make it to Boston: all the
 more fun to watch live and in person (and maybe answer follow up questions)

 In any case, it should be a very interesting session: folks will either get
 to learn a lot, or laugh at me a lot, or both.  (win/win/win)


 -Hoss

 --
 http://lucenerevolution.org/  ...  October 7-8, Boston
 http://bit.ly/stump-hoss  ...  Stump The Chump!




-- 
°O°
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/


Solr jam after all my jvm thread pool hang in blocked state

2010-08-23 Thread AlexxelA

Hi,

I've been running Solr 1.3 in production for a year now and never had any
problem with it until 2 weeks ago.  It happens 6-7 times a day: all of my
threads but one are in a blocked state.  All blocked threads are waiting on
the Console monitor owned by the runnable thread.

We did not change anything on the application / server.  I have monitored
the thread count and there's no accumulation of threads during the periods
Solr is OK.

The problem doesn't seem to be related to high query load, since it also
happens during low-load periods.

Anyone got a clue what is going on?


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-jam-after-all-my-jvm-thread-pool-hang-in-blocked-state-tp1303361p1303361.html
Sent from the Solr - User mailing list archive at Nabble.com.


lucene + solr: corrupt index

2010-08-23 Thread ANurag
Hi,
I am using lucene 3.0 jars and built a lucene index with 200
documents. The index files were then copied over to my solr 1.4.1
installation. I get the following error every time I start SOLR:
What could I be doing wrong?

SEVERE: Could not start SOLR. Check solr/home property
java.lang.RuntimeException:
org.apache.lucene.index.CorruptIndexException: Incompatible format
version: 2 expected 1 or lower
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:579)
at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at 
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
at 
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
at 
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
at 
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at 
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
at org.mortbay.jetty.Server.doStart(Server.java:210)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.mortbay.start.Main.invokeMain(Main.java:183)
at org.mortbay.start.Main.start(Main.java:497)
at org.mortbay.start.Main.main(Main.java:115)
Caused by: org.apache.lucene.index.CorruptIndexException: Incompatible
format version: 2 expected 1 or lower
at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:117)
at 
org.apache.lucene.index.SegmentReader$CoreReaders.openDocStores(SegmentReader.java:291)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:654)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:613)
at 
org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:104)
at 
org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27)
at 
org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:74)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:683)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:403)
at 
org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1057)
... 27 more


Re: lucene + solr: corrupt index

2010-08-23 Thread Koji Sekiguchi

 (10/08/24 10:02), ANurag wrote:

Hi,
I am using lucene 3.0 jars and built a lucene index with 200
documents. The index files were then copied over to my solr 1.4.1
installation. I get the following error every time I start SOLR:
What could I be doing wrong?


Solr 1.4 can read Lucene 2.9 index or older.

Koji

--
http://www.rondhuit.com/en/



Re: lucene + solr: corrupt index

2010-08-23 Thread ANurag
Thx Koji, I tried 2.9.3 and it works :-)
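
For anyone hitting the same thing, a minimal sketch of building the index
against the Lucene 2.9.x jars so that Solr 1.4.x can open it (the directory
path and field names are placeholders, and the fields still have to match the
core's schema.xml):

  import java.io.File;

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.util.Version;

  public class BuildWith29 {
    public static void main(String[] args) throws Exception {
      // Compile and run against lucene-core-2.9.3.jar; the resulting segment
      // format is readable by Solr 1.4.1, unlike the Lucene 3.0 format.
      IndexWriter writer = new IndexWriter(
          FSDirectory.open(new File("/path/to/solr/data/index")),
          new StandardAnalyzer(Version.LUCENE_29),
          true, IndexWriter.MaxFieldLength.UNLIMITED);
      Document doc = new Document();
      doc.add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
      doc.add(new Field("text", "example body", Field.Store.YES, Field.Index.ANALYZED));
      writer.addDocument(doc);
      writer.close();
    }
  }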


On Mon, Aug 23, 2010 at 6:15 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:
  (10/08/24 10:02), ANurag wrote:

 Hi,
 I am using lucene 3.0 jars and built a lucene index with 200
 documents. The index files were then copied over to my solr 1.4.1
 installation. I get the following error every time I start SOLR:
 What could I be doing wrong?

 Solr 1.4 can read Lucene 2.9 index or older.

 Koji

 --
 http://www.rondhuit.com/en/




about readercycle script

2010-08-23 Thread Koji Sekiguchi
 I'm working on SOLR-2046 and realized that the readercycle script
might be looking for an old(?) Solr response format; therefore,
today it always fails:

https://issues.apache.org/jira/browse/SOLR-2046

Since I've looked for issues regarding readercycle in Jira and the
mailing list archives and nobody has complained about it so far, I think
there are no users.

Would anybody get into trouble if the readercycle script were deleted?

Thanks,

Koji

-- 
http://www.rondhuit.com/en/



Re: ANNOUNCE: Stump Hoss @ Lucene Revolution

2010-08-23 Thread Chris Hostetter

: I have a couple of questions I would like to throw your way.
: 
: Is there a place where one can sign up for this?

Heh  sure, all the details were in my email...

: http://bit.ly/stump-hoss

...and...

:  type of solution I might come up with under pressure, please email a
:  description of your problem to st...@lucenerevolution.org -- More details
:  online...
: 
:  http://lucenerevolution.org/Presentation-Abstracts-Day1#stump-hostetter

-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Re: Doing Shingle but also keep special single word

2010-08-23 Thread 朱炎詹
The request is from our business team; they wish users of our product could 
type in a partial string of a word that exists in the title or body field. But 
now I also doubt whether this request is really necessary.


Scott

- Original Message - 
From: Ahmet Arslan iori...@yahoo.com

To: solr-user@lucene.apache.org
Sent: Monday, August 23, 2010 8:35 PM
Subject: Re: Doing Shingle but also keep special single word



1. We have over ten million news articles to build into the
Solr index.
2. We copy several fields, such as title, author, body, and
caption of attached photos, into a new field for default
search.
3. We then want to use the shingle filter on this new field.
4. We can't predict what new single-word nouns our
users may be interested in, because it's news, you know. For
example, the word ECFA has only recently become a very popular
word in the news here, so I wish users could type in 'ECFA' to
search and Solr would output some relevant news articles.
5. I wish to keep the index as small as possible.
6. I also wish to do the same thing described in 5 when I
search by explicitly specifying the field names of those fields,
too.


Can I ask why you need/use the shingle filter?















Re: Doing Shingle but also keep special single word

2010-08-23 Thread 朱炎詹
Thanks! I'll put more effort into understanding your suggestion and that norm 
thing.


- Original Message - 
From: MitchK mitc...@web.de

To: solr-user@lucene.apache.org
Sent: Tuesday, August 24, 2010 5:28 AM
Subject: Re: Doing Shingle but also keep special single word



No, I mean that you use an additional field (indexed) for searching (i.e.
whitespace-tokenized, so every word - separated by whitespace - becomes
a token).
So you have got two fields (shingle-token-field and single-token-field).
So you can search accross both fields.
This provides several benefits: e.g. you can boost the shingle field at
query time, since a match in the shingle field means that an exact phrase
matched.

Additionally: You can search with single-word-queries as well as
multi-word-queries.
Furthermore you can apply synonyms to your single-token-field.

If you want to keep your index as small as possible but as large as needed,
try to understand Lucene's similarity implementation to consider whether
you can set the field option omitNorms=true or
omitTermFreqAndPositions=true.
http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/search/Similarity.html
Keep in mind what happens, if you omit one of those options.

A small example of the consequences of setting omitNorms = true;.
doc1: this is a short example doc
doc2: this is a longer example doc for presenting the effect of omitNorms

If you are searching for doc while omitNorms=false, your response will look
like this:
doc1,
doc2
This is because the norm value for doc1 is larger than the norm value for
doc2, since doc1 is shorter than doc2 (have a look at the provided link).

If omitNorms=true, the scores for both docs will be equal.

Kind regards,
- Mitch


scott chu wrote:


I don't quite understand additional-field-way? Do you mean making another
field that stores special words particularly but no indexing for that
field?

Scott

- Original Message - 
From: MitchK mitc...@web.de

To: solr-user@lucene.apache.org
Sent: Sunday, August 22, 2010 11:48 PM
Subject: Re: Doing Shingle but also keep special single word




Hi,

keepword-filter is no solution for this problem, since this would lead to
the problematic that one has to manage a word-dictionary. As explained,
this
would lead to too much effort.

You can easily add outputUnigrams=true and check out the analysis.jsp for
this field. So you can see how much bigger a single field will become
with
this option.
However, I am quite sure that the difference between using
outputUnigrams=true and indexing in a seperate field is not noteworthy.

I would suggest you to do it the additionally-field-way, since this would
lead to more flexibility in boosting the different fields.

Unfortunately, I haven't understood your explanation about the use-case.
But
it sounds a little bit like tagging?

Kind regards,
- Mitch


iorixxx wrote:



Isn't set outputUnigrams=true will
make index size about twice than when it's set to false?


Sure index will be bigger. I didn't know that this is problem for you.
But
if you have a list of special single words that you want to keep,
keepwordfilter can eliminate other tokens. So index size will be okey.



Scott

- Original Message - From: Ahmet Arslan iori...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Saturday, August 21, 2010 1:15 AM
Subject: Re: Doing Shingle but also keep special single
word


 I am building index with Shingle
 filter. We know it's minimum 2-gram but I also
want keep
 some special single word, e.g. IBM, Microsoft,
etc. i.e. I
 want to do a minimum 2-gram but also want to have
these
 single word in my index, Is it possible?

 outputUnigrams=true parameter does not work for
you?

 After that you can add <filter
class="solr.KeepWordFilterFactory" words="keepwords.txt"
ignoreCase="true"/> with keepwords.txt containing IBM, Microsoft.













--
View this message in context:
http://lucene.472066.n3.nabble.com/Doing-Shingle-but-also-keep-special-single-word-tp1241204p1276506.html
Sent from the Solr - User mailing list archive at Nabble.com.












--
View this message in context: 
http://lucene.472066.n3.nabble.com/Doing-Shingle-but-also-keep-special-single-word-tp1241204p1300497.html

Sent from the Solr - User mailing list archive at Nabble.com.











Why it's boosted up?

2010-08-23 Thread 朱炎詹

In Lucene's web page, there's a paragraph:

Indexing time boosts are preprocessed for storage efficiency and written to 
the directory (when writing the document) in a single byte (!) as follows: 
For each field of a document, all boosts of that field (i.e. all boosts 
under the same field name in that doc) are multiplied. The result is 
multiplied by the boost of the document, and also multiplied by a field 
length norm value that represents the length of that field in that doc (so 
shorter fields are automatically boosted up). 


I thought the greater the value, the higher the boost. Then why are short 
fields boosted up? Isn't the norm value for short fields smaller?
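
As a worked example of the quoted passage, using Lucene's DefaultSimilarity,
where the length norm is 1 / sqrt(numTermsInField):

  field with  4 terms: lengthNorm = 1 / sqrt(4)  = 0.5
  field with 16 terms: lengthNorm = 1 / sqrt(16) = 0.25

So the shorter field actually ends up with the larger norm value, and since
that norm is multiplied into the score, the shorter field is the one that
gets boosted up.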




Solr Hangs up after couple of hours

2010-08-23 Thread Manepalli, Kalyan
Hi all,
   I am facing a peculiar problem with Solr querying. During our indexing 
process we analyze the existing index, and for this we query the index. We found 
that the Solr server just hangs on an arbitrary query. If we access 
admin/stats.jsp, it resumes executing the queries. The thread count and 
memory utilization look very normal.

Any clues on what's going on will be very helpful.

Thanks
Kalyan

SolrJ addField with Reader

2010-08-23 Thread Bojan Vukojevic
I am using SolrJ with an embedded Solr server, and some documents have a lot of
text. Solr will be running on a small device with very limited memory. In my
tests I cannot process more than 3MB of text (in a body) with 64MB heap.
According to Java there is about 30MB free memory before I call server.add
and with 5MB of text it runs out of memory.

Is there a way around this?

Is there a plan to enhance SolrJ to allow a reader to be passed in instead
of a string?

thx!

b


Re: Solr jam after all my jvm thread pool hang in blocked state

2010-08-23 Thread Bill Au
It would be helpful if you could attach a thread dump.

Bill

On Mon, Aug 23, 2010 at 6:00 PM, AlexxelA alexandre.boudrea...@canoe.cawrote:


  Hi,

  I've been running Solr 1.3 in production for a year now and never had any
  problem with it until 2 weeks ago.  It happens 6-7 times a day: all of my
  threads but one are in a blocked state.  All blocked threads are waiting on
  the Console monitor owned by the runnable thread.

  We did not change anything on the application / server.  I have monitored
  the thread count and there's no accumulation of threads during the periods
  Solr is OK.

  The problem doesn't seem to be related to high query load, since it also
  happens during low-load periods.

  Anyone got a clue what is going on?


 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-jam-after-all-my-jvm-thread-pool-hang-in-blocked-state-tp1303361p1303361.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr Hangs up after couple of hours

2010-08-23 Thread Bill Au
It would be very useful if you could take a thread dump while Solr is
hanging.  That will give an indication of where/why Solr is hanging.
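
In case it helps, two common ways to grab one from a running Sun JVM
(substitute the Solr process id for <pid>):

  jstack <pid> > threads.txt   # JDK tool, writes the dump to a file
  kill -3 <pid>                # SIGQUIT; the dump goes to the JVM's stdout
                               # (e.g. catalina.out under Tomcat)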

Bill

On Mon, Aug 23, 2010 at 9:32 PM, Manepalli, Kalyan 
kalyan.manepa...@orbitz.com wrote:

 Hi all,
   I am facing a peculiar problem with Solr querying. During our
 indexing process we analyze the existing index. For this we query the index.
 We found that the Solr server just hangs on an arbitrary query. If we access
 admin/stats.jsp, it resumes executing the queries. The thread
 count and memory utilization look very normal.

 Any clues on whats going on will be very helpful.

 Thanks
 Kalyan