Spellchecker index cannot be optimized

2010-06-15 Thread Pumpenmeier, Lutz SZ/HZA-ZSB3
Hello,
when I rebuild the spellchecker index (by optimizing the data index or
by calling cmd=rebuild), the spellchecker index is not optimized. I
cannot even delete the old index files on the filesystem, because they are
locked by the Solr server. I have to stop the Solr server (Resin) to
optimize the spellchecker index with Luke or by deleting the old files.
How can I optimize the index without stopping the Solr server?

Thanks
 Lutz Pumpenmeier



RE: Copyfield multi valued to single value

2010-06-15 Thread Marc Ghorayeb

Thanks for the update, I'll have to find another way then :s.
Marc

 Date: Mon, 14 Jun 2010 13:44:30 -0700
 From: hossman_luc...@fucit.org
 To: solr-user@lucene.apache.org
 Subject: Re: Copyfield multi valued to single value
 
 
 : Is there a way to copy a multivalued field to a single value by taking 
 : for example the first index of the multivalued field?
 
 Unfortunately no.  This would either need to be done with an 
 UpdateProcessor, or on the client constructing the doc (either the remote 
 client, or in your DIH config if that's how you are using Tika)
 
 
 
 -Hoss
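
For reference, a minimal sketch of the UpdateProcessor route Hoss mentions, assuming a hypothetical multivalued source field "authors" and single-valued target "first_author" (the field names are illustrative, and package locations vary slightly across Solr versions):

import java.io.IOException;
import java.util.Collection;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class FirstValueCopyProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        // keep only the first value of the multivalued field
        Collection<Object> vals = doc.getFieldValues("authors");
        if (vals != null && !vals.isEmpty()) {
          doc.setField("first_author", vals.iterator().next());
        }
        super.processAdd(cmd);
      }
    };
  }
}

The factory would then be registered in an updateRequestProcessorChain in solrconfig.xml and selected on update requests (update.processor in 1.4, update.chain in later releases).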
 
  

RE: custom scorer in Solr

2010-06-15 Thread Fornoville, Tom
Hello Hoss,

So far we have been using the default SearchHandler.

I also looked into a solution proposed on this mailing list by Geert-Jan
Brits using extra sort fields and functions to pick out the maximum.
This however proved rather cumbersome to integrate in our SolrJ client
and I also have some concerns about performance. The actual data has
about 2.5 million documents in it, with some popular categories of more
than 200K docs.

I did look into the dismax query, but the problem there was that name and
category are not the only fields we search in. They are only the "what"
fields; we also have a "where" field.

The code that actually came closest to the desired results was this:

  private String makeQuery(String what, String where) {
    StringBuilder sb = new StringBuilder();
    sb.append("category:");
    sb.append(what);
    sb.append("^32 OR ");
    sb.append("name:");
    sb.append(what);
    sb.append("^16 AND (");
    sb.append("locality2:");
    sb.append(where);
    sb.append("^8 OR locality3:");
    sb.append(where);
    sb.append("^4 OR locality1:");
    sb.append(where);
    sb.append("^2 OR locality4:");
    sb.append(where);
    sb.append(")");
    return sb.toString();
  }

  ...

  SolrQuery query = new SolrQuery();
  query.setQuery(makeQuery(what, where));
  QueryResponse rsp;
  query.addSortField("score", ORDER.desc);
  query.addSortField("producttier", ORDER.asc);
  query.addSortField("random_" + System.currentTimeMillis(), ORDER.asc);

So the actual query string was something like: category:restaurant^32 OR
name:restaurant^16 AND (locality2:Antwerp^8 OR locality3:Antwerp^4 OR
locality1:Antwerp^2 OR locality4:Antwerp).

I have no idea how this can be rewritten in SolrJ using a standard
dismax query. 

So in conclusion I think this client will probably need a custom
QParser.
Time to start reading and experimenting I guess.

Regards,
Tom

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: maandag 14 juni 2010 22:29
To: solr-user@lucene.apache.org
Subject: Re: custom scorer in Solr


: Problem is that they want scores that make results fall in buckets:
: 
: * Bucket 1: exact match on category (score = 4)
: * Bucket 2: exact match on name (score = 3)
: * Bucket 3: partial match on category (score = 2)
: * Bucket 4: partial match on name (score = 1)
...
: First thing we did was develop a custom similarity class that would
: return the correct score depending on the field and an exact or
partial
: match.
...
: The only problem now is that when a document matches on both the
: category and name the scores are added together.

what QParser are you using?  what does the resulting Query data
structure 
look like?

I think with your custom Similarity class you might be able to achieve 
your goal using the DisMaxQParser w/o any other custom code -- just set 
your qf="category name" (I'm assuming your Similarity already handles the 
relative weighting) and set tie=0 ... that will ensure that the 
final score only comes from the max scoring field (ie: no tie-breaking 
values from the other fields)

if that doesn't do what you want -- then your best bet is probably to 
write a custom QParser that generates *exactly* the query structure you 
want (likely using a DisjunctionMaxQuery) that will give you the scores 
you want in conjunction with your similarity class.


-Hoss
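
In SolrJ, the dismax setup Hoss describes is mostly a matter of setting request parameters. A minimal sketch using the field names from the thread; as an assumption, the "where" clause is shown here as an unscored filter query (it could instead be folded into a bq boost query if it should contribute to the score):

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery query = new SolrQuery();
query.setQuery(what);                    // e.g. "restaurant"
query.set("defType", "dismax");
query.set("qf", "category^32 name^16");  // per-field weights
query.set("tie", "0");                   // score = max over the qf fields only
// require a location match without letting it affect ranking:
query.addFilterQuery("locality1:" + where + " OR locality2:" + where
    + " OR locality3:" + where + " OR locality4:" + where);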



Indexing HTML files in SOLR

2010-06-15 Thread seesiddharth

Hi,
I am using SOLR with Apache Tomcat. I have some .html
files (containing the articles) stored at XYZ location. How can I index these
.html files in SOLR? 

Regards,
Siddharth 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-HTML-files-in-SOLR-tp896530p896530.html
Sent from the Solr - User mailing list archive at Nabble.com.
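
One common route, assuming the stock ExtractingRequestHandler (Solr Cell) is registered at /update/extract and Solr runs on the default example port, is to post each file with curl; the literal.id value here is illustrative:

curl "http://localhost:8983/solr/update/extract?literal.id=article1&commit=true" \
  -F "myfile=@/path/to/article.html"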


Re: Need help on Solr Cell usage with specific Tika parser

2010-06-15 Thread olivier sallou
Thanks,
moving it to a direct child worked.

Olivier

2010/6/14 Chris Hostetter hossman_luc...@fucit.org


 : In solrconfig, in the update/extract requestHandler I specified <str
 : name="tika.config">./tika-config.xml</str>, where tika-config.xml is in the
 : conf directory (same as solrconfig).

 can you show us the full requestHandler declaration? ... tika.config needs
 to be a direct child of the requestHandler (not in the defaults)

 I also don't know if using a local path like that will work -- depends
 on how that file is loaded (if Solr loads it, then you might want to
 remove the "./"; if Solr just gives the path to Tika, then you probably
 need an absolute path).


 -Hoss
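
For reference, a sketch of the shape Hoss describes -- tika.config as a direct child of the requestHandler, not inside the defaults list (the defaults content here is illustrative):

<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <str name="tika.config">tika-config.xml</str>  <!-- direct child -->
  <lst name="defaults">
    <str name="fmap.content">text</str>
  </lst>
</requestHandler>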




HOWTO get a working copy of SOLR?

2010-06-15 Thread Bernd Fehling
Dear list,

this sounds stupid, but how do I get a full working copy of SOLR?

What I have tried so far:
- started with LucidWorks SOLR. Installs fine, runs fine, but has an old Tika
  version and can only handle some PDFs.

- changed to SOLR trunk. Installs fine, runs fine, but Luke 1.0.1 complains about
  "Unknown format version: -10". I guess because Luke 1.0.1 is compiled against
  lucene-core-3.0.1.jar while trunk has lucene-core-4.0-dev.jar?
  Anyway, no luck with this version.

- changed to SOLR branch_3x. Installs fine, runs fine, Luke works fine, but
  the extraction with /update/extract (ExtractingRequestHandler) only returns
  the metadata, not the content.
  No luck with this version.

Is there any full working recent copy at all?

Or a luke working with SOLR trunk?

Regards,
Bernd


Re: question about the fieldCollapseCache

2010-06-15 Thread Rakhi Khatwani
Hi,
  I tried downloading Solr 1.4.1 from the site, but it shows an empty
directory. Where did you get Solr 1.4.1 from?

Regards,
Raakhi

On Tue, Jun 8, 2010 at 10:35 PM, Jean-Sebastien Vachon 
js.vac...@videotron.ca wrote:

 Hi All,

 I've been running some tests using 6 shards, each one containing about 1
 million documents.
 Each shard is running in its own virtual machine with 7 GB of RAM (5 GB
 allocated to the JVM).
 After about 1100 unique queries the shards start to struggle and run out of
 memory. I've reduced all
 other caches without significant impact.

 When I completely remove the fieldCollapseCache, the server can keep up for
 hours
 and uses only 2 GB of RAM. (I'm even considering returning to a 32-bit JVM.)

 The size of the fieldCollapseCache was set to 5000 items. How can 5000
 items eat 3 GB of RAM?

 Can someone tell me what is put in this cache? Has anyone experienced this
 kind of problem?

 I am running Solr 1.4.1 with the SOLR-236 patch. All requests are collapsing on a
 single field (pint) and
 collapse.maxdocs set to 200,000.

 Thanks for any hints...




Re: how to use q=string in solrconfig.xml `?

2010-06-15 Thread stockii

Okay, thanks. Good idea with mod_rewrite =)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-use-q-string-in-solrconfig-xml-tp861870p896902.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: question about the fieldCollapseCache

2010-06-15 Thread Jean-Sebastien Vachon
They used to be in the branches if I recall correctly, but you're right. They 
aren't there anymore.

Maybe someone else can explain why... it looks like they restructured the 
repository for the Solr/Lucene merge.

On 2010-06-15, at 4:54 AM, Rakhi Khatwani wrote:

 Hi,
  I tried downloading Solr 1.4.1 from the site, but it shows an empty
 directory. Where did you get Solr 1.4.1 from?
 
 Regards,
 Raakhi
 



Re: HOWTO get a working copy of SOLR?

2010-06-15 Thread Sixten Otto
On Tue, Jun 15, 2010 at 12:58 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:
 - changed to SOLR branch_3x. Installs fine, runs fine, Luke works fine, but
  the extraction with /update/extract (ExtractingRequestHandler) only returns
  the metadata, not the content.

Sounds like https://issues.apache.org/jira/browse/SOLR-1902

Sixten


how to get tf-idf values in solr

2010-06-15 Thread sarfaraz masood
I am Sarfaraz, working on a Search Engine
project which is based on Nutch & Solr. I am trying to implement a
new search algorithm for this engine.

Our search engine is crawling the web and storing the documents in the form of 
large strings in the database, indexed by their URLs.

Now, to implement my algorithm I need tf-idf values (0 - 1) for each
document given by the crawler, but I am unable to find any method in
Solr or Lucene which can serve my purpose.

For my algorithm I need to maintain a relevance matrix of the following type:

        term1   term2   term3   term4 ...
url1    0.7     0.8     0.3     0.1
url2    0.4     0.1     0.4     0.5
url3    ...
.
.
.

For this purpose I need a core Java method/function in Solr that
returns the tf-idf values for all terms in all documents in the
available document list.

Please help.

I will be highly grateful to you all.

-Sarfaraz Masood



Re: how to get tf-idf values in solr

2010-06-15 Thread didier deshommes
Have you taken a look at Solr's TermVector component? It's probably
what you want:

http://wiki.apache.org/solr/TermVectorComponent

didier
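
A sketch of what a TermVectorComponent request can look like, assuming the field was indexed with termVectors="true" and the component is wired into the request handler (parameter names are from the wiki page above):

http://localhost:8983/solr/select?q=*:*&fl=id&tv=true&tv.tf=true&tv.df=true&tv.tf_idf=true

tv.tf_idf returns tf(t,d) / df(t), which could then be normalized into the 0-1 range the matrix above calls for.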





Re: how to get tf-idf values in solr

2010-06-15 Thread Erik Hatcher

The TermVectorComponent can return tf/idf:

  http://wiki.apache.org/solr/TermVectorComponent







Re: Multiple location filters per search

2010-06-15 Thread Aaron Chmelik
Hoss,

Thanks for the response.

I was able to get multiple dist queries working, however, I've noticed
another problem.

when using

fq=_query_:"{!frange l=0 u=25 v=$qa}"
qa=dist(2,44.844833,-93.03528,latitude,longitude)

it returns 9,975 documents. When I change the upper limit to 250 it
returns 33,241 documents. So, the filter is doing something.

But, the lat/long on the documents returned puts them well beyond the
limit - for example, the first document returned with an upper limit
of 25 has the following values:

[latitude] = 36.0275
[longitude] = -80.2073

which is 907 miles away from the originating point.

Currently, my lat/lon fields are indexed using

<field name="latitude" type="tdouble" indexed="true" />
<field name="longitude" type="tdouble" indexed="true" />

I have no doubt there is something I am missing, and any help would be
greatly appreciated.

Aaron

On Mon, Jun 14, 2010 at 7:43 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
 : I am currently working with the following:
 :
 : {code}
 : {!frange l=0 u=1 unit=mi}dist(2,32.6126, -86.3950, latitude, longitude)
 : {/code}
        ...
 : {code}
 : {!frange l=0 u=1 unit=mi}dist(2,32.6126, -86.3950, latitude,
 : longitude) OR {!frange l=0 u=1 unit=mi}dist(2,44.1457, -73.8152,
 : latitude, longitude)
 : {/code}
        ...
 : I get an error. Hoping someone has an idea of how to work with
 : multiple locations in a single search.

 I think you are confused about how that query is getting parsed ... when
 Solr sees the {!frange at the beginning of the param, that tells it that
 the *entire* param value should be parsed by the frange parser.  The
 frange parser doesn't know anything about keywords like "OR"

 What you probably want is to utilize the _query_ hack of the
 LuceneQParser so that you can parse some Lucene syntax (ie: "A OR B")
 where the clauses are then generated by using another parser...

 http://wiki.apache.org/solr/SolrQuerySyntax

 fq=_query_:"{!frange l=0 u=1 unit=mi}dist(2,32.6126, -86.3950, latitude, 
 longitude)" OR _query_:"{!frange l=0 u=1 unit=mi}dist(2,44.1457, -73.8152, 
 latitude, longitude)"

   ...or a little more readable...

 fq=_query_:"{!frange l=0 u=1 unit=mi v=$qa}" OR _query_:"{!frange l=0 u=1 
 unit=mi v=$qb}"
 qa=dist(2,32.6126, -86.3950, latitude, longitude)
 qb=dist(2,44.1457, -73.8152, latitude, longitude)




 -Hoss




Re: VelocityResponseWriter in Solr Core ?! configuration

2010-06-15 Thread Jon Baer
Are you using Ubuntu by any chance? 

It's a somewhat common problem ... see
http://stackoverflow.com/questions/2854356/java-classpath-problems-in-ubuntu

I'm unsure if this has been resolved but a similar thing happened to me on a 
recent VMware image in a dev environment.  It worked everywhere else.

- Jon 

On Jun 14, 2010, at 9:12 AM, stockii wrote:

 
 ah okay.
 
 I tried it with 1.4 and put the jars into the lib dir of solr.home but it won't
 work. I get the same error ...
 
 I use 2 cores, and my solr.home is ...path/cores; in this folder I put
 another folder with the name "lib" and put all these jars into it: 
 apache-solr-velocity-1.4-dev.jar 
 velocity-1.6.1.jar 
 velocity-tools-2.0-beta3.jar 
 commons-beanutils-1.7.0.jar 
 commons-collections-3.2.1.jar
 commons-lang-2.1.jar
 
 and then in solrconfig.xml this line: <queryResponseWriter name="velocity"
 class="org.apache.solr.response.VelocityResponseWriter"/> 
 
 solr cannot find the jars =(
 
 -- 
 View this message in context: 
 http://lucene.472066.n3.nabble.com/VelocityResponseWriter-in-Solr-Core-configuration-tp894262p894354.html
 Sent from the Solr - User mailing list archive at Nabble.com.
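
One thing worth checking in a multicore setup (an assumption about the cause, not a confirmed diagnosis): a lib folder next to solr.xml is only picked up if it is declared as sharedLib, roughly like this; otherwise each core looks for its own <instanceDir>/lib:

<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0" />
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>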



Re: Custom faceting question

2010-06-15 Thread Blargy

Got it. Thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-faceting-question-tp868015p897390.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multiple location filters per search

2010-06-15 Thread darren

From what I've seen so far, using separate fields for latitude and
longitude, especially with multiple values of each, does not work correctly
in all situations.

The hole in my understanding is how Solr knows how to pair a latitude and
longitude field _back_ into a POINT.

I can say that it doesn't work with multiple values and ranges, which is
an approach floated on the wiki and various blogs.

I can have qualifying longitude and latitude values that each match a range
(not sure about the distance function, though), but that taken together should
result in a true negative, not a false positive.

I'm hoping to learn how it's done myself. But so far the wiki isn't clear
about it.




DatImportHandler and cron issue

2010-06-15 Thread iboppana

Hi All,

We are trying to implement Solr for our newspaper site's search.
To build out the index with all the articles published so far, we are
running a script which sends requests to the dataimport handler with different
dates.
What we are seeing is that each request is dispatched to the Solr server, but it's
not being processed.
Just wanted to check if it's some kind of threading issue, and what's the
best approach to achieve this.

We are sleeping for 75 secs between requests:


while (($date += 86400) < $now) {
    $curdate = strftime("%D", localtime($date));

    print "Updating index for $curdate\n";

    $curdate = uri_escape($curdate);

    my $url =
'http://test.solr.ddtc.cmgdigital.com:8080/solr/npmetrosearch_statesman/dataimport?command=full-import&entity=initialLoad&clean=false&commit=true&forDate='
. $curdate .
'&numArticles=-1&server=app5&site=statesman&articleTypes=story,slideshow,video,poll,specialArticle,list';

    print "Sending: $url\n";

    #if (system("wget -q -O - '$url' | egrep -q '$regex_pat'")) {
    if (system("curl -s '$url' | egrep -q '$regex_pat'")) {
        print "Failed to match expected regex reply: \"$regex_pat\"\n";
        exit 1;
    }

    sleep 75;
}




This is what we are seeing in the server logs: 
2010-06-14 12:51:01,328 INFO  [org.apache.solr.core.SolrCore]
(http-0.0.0.0-8080-1) [npmetrosearch_statesman] webapp=/solr
path=/dataimport
params={site=statesman&forDate=03/24/10&articleTypes=story,slideshow,video,poll,specialArticle,list&clean=false&commit=true&entity=initialLoad&command=full-import&numArticles=-1&server=app5}
status=0 QTime=0 
2010-06-14 12:51:01,329 INFO 
[org.apache.solr.handler.dataimport.DataImporter] (Thread-378) Starting Full
Import
2010-06-14 12:51:01,332 INFO 
[org.apache.solr.handler.dataimport.SolrWriter] (Thread-378) Read
dataimport.properties
2010-06-14 12:51:01,425 INFO 
[org.apache.solr.handler.dataimport.DocBuilder] (Thread-378) Time taken =
0:0:0.93
2010-06-14 12:51:16,338 INFO  [org.apache.solr.core.SolrCore]
(http-0.0.0.0-8080-1) [npmetrosearch_statesman] webapp=/solr
path=/dataimport
params={site=statesman&forDate=03/25/10&articleTypes=story,slideshow,video,poll,specialArticle,list&clean=false&commit=true&entity=initialLoad&command=full-import&numArticles=-1&server=app5}
status=0 QTime=0 
2010-06-14 12:51:16,338 INFO 
[org.apache.solr.handler.dataimport.DataImporter] (Thread-379) Starting Full
Import
2010-06-14 12:51:16,338 INFO 
[org.apache.solr.handler.dataimport.SolrWriter] (Thread-379) Read
dataimport.properties
2010-06-14 12:51:16,465 INFO 
[org.apache.solr.handler.dataimport.DocBuilder] (Thread-379) Time taken =
0:0:0.126

Appreciate any thoughts on this.

Thanks
  Indrani
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DatImportHandler-and-cron-issue-tp897698p897698.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Master master?

2010-06-15 Thread Wilson Man
Don't think so; you probably want to look into this distributed + sharded
setup:

http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr#d0e410

It will get you high availability plus better scalability.

Wilson Man   |   Principal Consultant   |   Liferay, Inc.   |   Enterprise.  
Open Source.  For Life.




On Jun 14, 2010, at 1:15 PM, Chris Hostetter wrote:

 
 : Does Solr handle having two masters that are also slaves to each other (ie
 : in a cycle)?
 
 no.
 
 
 
 -Hoss
 



Help patching Solr

2010-06-15 Thread Moazzam Khan
Hey guys,

Does anyone know how to patch stuff on Windows? I am trying to patch
Solr with the SOLR-236 patch but it keeps erroring out with this message:



C:\solr\example\webapps>patch solr.war ..\..\SOLR-236-trunk.patch
patching file solr.war
Assertion failed: hunk, file ../patch-2.5.9-src/patch.c, line 354

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

Thanks in advance

Moazzam


RE: Help patching Solr

2010-06-15 Thread Nagelberg, Kallin
I'm pretty sure you need to be running the patch against a checkout of the 
trunk sources, not a generated .war file. Once you've done that you can use the 
build scripts to make a new war.

-Kallin Nagelberg



Re: Help patching Solr

2010-06-15 Thread Moazzam Khan
Thanks. I finally patched it (I think). I got the source from SVN and
applied the patch using a Windows port of the patch utility. A caveat to
those who want to do this on Windows - open the patch file in WordPad and
save it as a different file to replace Unix line breaks with DOS line breaks.
Otherwise, the patch program gives an error:

Assertion failed: hunk, file ../patch-2.5.9-src/patch.c, line 354

Now that I have patched it (as far as I can tell), how do I build the
sources? :D (Sorry, I know it's a basic question but I have no idea how
to do this.)

- Moazzam
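
For building after patching, the steps are roughly the following (assuming a trunk checkout; the layout moved around during the Lucene/Solr merge, so the exact paths may differ):

svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk lucene-trunk
cd lucene-trunk/solr
patch -p0 < SOLR-236-trunk.patch
ant dist    # builds the Solr war under dist/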




Reindexing only occurs after bouncing app

2010-06-15 Thread John Ament
Hi all

I wrote a small app using SolrJ and Solr. The app has a small wrapper that
handles the reindexing, written in Groovy. The Groovy script
generates the Solr docs, and then the Java code deletes and recreates the
data.

In a singleton EJB, we do this in the post-construct phase:

CoreContainer.Initializer initializer = new CoreContainer.Initializer();
coreContainer = initializer.initialize();
solrServer = new EmbeddedSolrServer(coreContainer, "");

A method that does this can be invoked over an HTTP service to force the
reindexing:

gse.run("search_indexer.groovy", b);
logger.info("Solr docs size: " + solrDocs.size());
solrServer.deleteByQuery("*:*");
solrServer.add(solrDocs);
solrServer.commit();

We've noticed that after executing this, we see appropriate log messages
indicating that it ran; however, the search indexes do not repopulate. We're
deployed on GlassFish v3.

Any ideas?

Thanks,

John


Solr / Solrj Wildcard Phrase Searches

2010-06-15 Thread Vladimir Sutskever
Performing wildcard phrase searches can be tricky. I spent some time figuring 
this one out.


1. To perform a wildcard search on a phrase, it is very important to escape the 
SPACE, so that SOLR treats it as a single phrase.
Ex: "Citibank NA" => Citibank\ NA


You can use org.apache.solr.client.solrj.util.ClientUtils (part of the solrj 
library) to perform the escapes.
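
A minimal SolrJ sketch of that escaping (note, as a point to verify for your solrj version: escapeQueryChars escapes Lucene special characters, but whether it also escapes the space has varied across releases):

import org.apache.solr.client.solrj.util.ClientUtils;

String input = "Citibank N";
String escaped = ClientUtils.escapeQueryChars(input);  // escape special characters
String q = "client_name_starts:" + escaped.toLowerCase() + "*";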


--
Example 
--
So a search for: CITIBANK\ N*  

Should produce results: 
CITIBANK NA
CITIBANK NATIONAL
CITIBANK N


2. Also make sure your field (I named it client_name_starts) is of a fieldType 
that is maintained as a single token during indexing. 

--
Example
--


<!-- lowercases the entire field value, keeping it as a single token. -->
<fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>


<field name="client_name_starts" type="lowercase" indexed="true" stored="true" />



3. Make sure to lowercase/uppercase (depending on your setup) the user's search 
input string before sending it to SOLR - since wildcards are NOT analyzed - 
they are sent AS IS.


Good Luck


Kind regards,

Vladimir Sutskever
Investment Bank - Technology
JPMorgan Chase, Inc.
Tel: (212) 552.5097








using DataImport Dev Console: no errors, but no documents

2010-06-15 Thread Peter Wilkins
I'm new to Solr, so I expect I'm making some newbie error.  I run my 
data-config.xml file through the DataImportHandler Development Console and I 
see all the results of the XPath queries scroll past in the debug pane.  It 
processes all the content without reporting an error in the terminal window 
that runs Jetty, or in the Dev Console itself.  This is what appears at the end 
of the debug pane:
---snip---
<str name="status">idle</str>
<str name="importResponse">Configuration Re-loaded sucessfully</str>
<lst name="statusMessages">
  <str name="Total Requests made to DataSource">0</str>
  <str name="Total Rows Fetched">5322</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2010-06-15 21:51:14</str>
  <str name="Total Documents Processed">0</str>
  <str name="Time taken ">0:0:0.71</str>
</lst>
---snip---

It fetches 5322 rows but doesn't process any documents and doesn't populate the 
index.  Any suggestions would be appreciated.

/peter


Here's my data-config.xml file:

<dataConfig>
  <dataSource name="myFileReader" type="FileDataSource" />
  <document>
    <entity name="f"
            processor="FileListEntityProcessor"
            baseDir="/Users/pascal/tools/apache-solr-1.4.0/example/example-DIH/timetext"
            fileName=".*ttml"
            recursive="true"
            rootEntity="false"
            dataSource="null">

      <entity name="transcript"
              pk="tid"
              url="${f.fileAbsolutePath}"
              processor="XPathEntityProcessor"
              forEach="/tt/body/div/p"
              rootEntity="false"
              dataSource="myFileReader"
              onError="continue">

        <field column="begin"  xpath="/tt/body/div/p/@begin" />
        <field column="dur"    xpath="/tt/body/div/p/@dur" />
        <field column="end"    xpath="/tt/body/div/p/@end" />
        <field column="phrase" xpath="/tt/body/div/p" />
        <field column="tid"    xpath="/tt/body/div/p/@xml:id" />
      </entity>
    </entity>
  </document>
</dataConfig>



DIH error documents' list

2010-06-15 Thread Maddy.Jsh

DIH skips documents which have errors, and it also shows which field caused
the error. But which document was skipped and which field caused the error
are only shown in the server console. Is there a way to retrieve that info in
the browser or read the info from the console itself?

Thanks,
Maddy
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-error-documents-list-tp899052p899052.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrCoreAware

2010-06-15 Thread Blargy

Can someone please explain what the inform method should accomplish? Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCoreAware-tp899064p899064.html
Sent from the Solr - User mailing list archive at Nabble.com.
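
For context, a bare sketch of the lifecycle hook (the class here is hypothetical): inform(SolrCore) is invoked exactly once, after init(...) and after the core has finished loading, so it is the safe place to resolve other core resources that may not exist yet at construction time.

import org.apache.solr.core.SolrCore;
import org.apache.solr.util.plugin.SolrCoreAware;

public class MyPlugin implements SolrCoreAware {
  private SolrCore core;

  @Override
  public void inform(SolrCore core) {
    // e.g. look up other request handlers or register close hooks here
    this.core = core;
  }
}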


SolrEventListener

2010-06-15 Thread Blargy

Can someone explain how to register a SolrEventListener? 

I am actually interested in using the SpellCheckerListener, and it appears
that it would build/rebuild a spellchecker index on commit and/or optimize,
but according to the wiki the only events that can be listened for are
"firstSearcher" and "newSearcher"
(http://wiki.apache.org/solr/SolrPlugins#SolrEventListener). Is the wiki
outdated or something?

So how can I register this (or any other event listener) to execute on
commit/optimize? Thanks

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrEventListener-tp899074p899074.html
Sent from the Solr - User mailing list archive at Nabble.com.
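
For what it's worth, commit/optimize listeners are registered in solrconfig.xml inside the updateHandler element rather than under query (where newSearcher/firstSearcher live). A sketch, with a hypothetical listener class:

<updateHandler class="solr.DirectUpdateHandler2">
  <listener event="postCommit" class="my.pkg.MyEventListener" />
  <listener event="postOptimize" class="my.pkg.MyEventListener" />
</updateHandler>

For the spellchecker specifically, the SpellCheckComponent also supports <str name="buildOnCommit">true</str> in its configuration, which may be simpler than a custom listener.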