Re: using DataImportHandler with ExtractRequestHandler ?

2009-10-14 Thread abhay kumar
Thanks, Steven, for the quick reply.

On Wed, Oct 14, 2009 at 1:56 AM, Steven A Rowe sar...@syr.edu wrote:

 See http://issues.apache.org/jira/browse/SOLR-1358

 Steve
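
 (For context: SOLR-1358 tracks integrating Tika extraction into
 DataImportHandler, which is the DIH-plus-rich-documents combination asked
 about below. A hypothetical data-config.xml sketch in that style might look
 like the following; the processor, attribute, and column names follow the
 SOLR-1358 work and are assumptions for illustration, not stock Solr 1.4
 functionality.)

```xml
<!-- Hypothetical sketch only: a SQL root entity supplies resumeTitle, and a
     Tika sub-entity extracts resumeContent from the file path stored in each
     database row. Names here are assumptions based on SOLR-1358. -->
<dataConfig>
  <dataSource name="db" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/hr" user="solr" password="secret"/>
  <dataSource name="files" type="BinFileDataSource"/>
  <document>
    <entity name="resume" dataSource="db"
            query="select id, resume_title, resume_path from resumes">
      <field column="resume_title" name="resumeTitle"/>
      <entity name="resumeFile" processor="TikaEntityProcessor"
              dataSource="files" url="${resume.resume_path}" format="text">
        <field column="text" name="resumeContent"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```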

  -Original Message-
  From: abhay kumar [mailto:abhay...@gmail.com]
  Sent: Tuesday, October 13, 2009 8:59 AM
  To: solr-user@lucene.apache.org
  Subject: using DataImportHandler with ExtractRequestHandler ?
 
  Hi ,
 
  We are using solr-1.4 for our search module.
 
  We have a long schema (35 fields). Some field values come from a
  database, and one field's value comes from different file formats.
 
  We are able to index different file formats using Solr Cell
  (ExtractingRequestHandler).
  Data from database can be indexed using DataImportHandler.
 
  Now, I want to call both request handlers (DataImportHandler and
  ExtractingRequestHandler) at the same time for each document.
  Is it possible? How?
 
  Or can DataImportHandler call ExtractingRequestHandler, or vice versa?
 
  Or can these two request handlers be invoked together for one document?
 
  If yes, how?
 
  For example:
 
  Let's take 2 fields:
 
  resumeContent: its value is stored in a file (pdf, word, doc), so we
  need to use ExtractingRequestHandler to get its value.
 
  resumeTitle: its value is stored in the database, so I need to use
  DataImportHandler to get its value from the database.
 
  These 2 fields make one document.
 
 
  How can DataImportHandler be used with ExtractingRequestHandler (or vice
  versa) for the same document, where some field values come from the
  database and some come from different document formats?
 
  I don't want to extract the different document formats and store their
  content (body) in the database before indexing.
 
  We are in agile development work.
 
  So a quick response will be appreciated.
 
  Regards,
  Abhay



Re: Error when indexing XML files

2009-10-14 Thread Fergus McMenemie
Hi, 

I am trying to index XML files using SolrJ. The original XML file contains 
nested elements. For example, the following is a snippet of the XML file: 

<entry>
  <name>SOMETHING</name>
  <facility>SOME_OTHER_THING</facility>
</entry>

I have added the elements name and facility to the schema.xml file to make 
these elements indexable. I have changed the XML document above to look like: 

<add>
<doc>
 ...
 <field name="name">SOMETHING</field>
 ...
</doc>
</add>

Can you send us the Schema.xml file you created? I suspect that 
one of the fields should be multivalued.
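
A sketch of what the suspected fix might look like in schema.xml (field
names are taken from the snippet above; the type and attribute values are
assumptions for illustration):

```xml
<!-- Allow a document to carry several name/facility values, e.g. when the
     source XML repeats nested <entry> elements. -->
<field name="name"     type="text" indexed="true" stored="true" multiValued="true"/>
<field name="facility" type="text" indexed="true" stored="true" multiValued="true"/>
```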

-- 
Fergus.


One more happy Solr user ...

2009-10-14 Thread Avlesh Singh
I am pleased to announce the latest release of a popular Indian local search
portal called http://www.burrp.com (http://mumbai.burrp.com).
In prior versions of this web application, search was Lucene-driven and we
had to write our own implementation of search facets, amongst other painful
tasks.

I couldn't be happier to inform everyone on this list that the search/suggest
features on the burrp! site are now powered by Solr.
Please use it and let me know if we can make it better.

Very soon, I'll be back to report another usage of Solr (a grand one by
scale).
Thank you Solr developers.

Cheers
Avlesh


Re: One more happy Solr user ...

2009-10-14 Thread Andrew McCombe
Hi
Nice site.  First search I tried was for 'italien' in 'Mumbai' which
returned zero results.   Are you using spellcheck suggestions?

Apart from that it's nice and fast.

Regards
Andrew McCombe
iWebsolutions.co.uk


2009/10/14 Avlesh Singh avl...@gmail.com

 [...]



Re: One more happy Solr user ...

2009-10-14 Thread Avlesh Singh
Ah! I knew that was coming :)
We are planning a spell-checker integration pretty soon.

Thanks for trying out the site Andrew.

Cheers
Avlesh

On Wed, Oct 14, 2009 at 2:53 PM, Andrew McCombe eupe...@gmail.com wrote:

 [...]



Re: One more happy Solr user ...

2009-10-14 Thread Chantal Ackermann

Hi Avlesh,

that is mean, sending something like that
http://mumbai.burrp.com/pack/list/kolkata-on-a-roll
around at lunch time - in Germany(!).

Very, very sadly, there are many places in Mumbai that have mastered the 
art of making authentic Kolkata rolls, but I don't know of any here in 
Munich.


Congratulations for launching successfully!
Chantal

Avlesh Singh schrieb:

I am pleased to announce the latest release of a popular Indian local search
portal called http://www.burrp.com http://mumbai.burrp.com.
In prior versions of this web application, search was Lucene driven and we
had to write our own implementation of search facets amongst other painful
tasks.

I can't be happier to inform everyone on this list that search/suggest
features on the burrp! site are now powered by Solr.
Please use it and let me know if we can make it better.

Very soon, I'll be back to report another usage of Solr (a grand one by
scale).
Thank you Solr developers.

Cheers
Avlesh




Re: One more happy Solr user ...

2009-10-14 Thread Avlesh Singh
If burrp! can keep pace with Solr enhancements, we are not too far from a 
munich.burrp.com ;)
Thanks for checking out the site, Chantal.

Cheers
Avlesh

On Wed, Oct 14, 2009 at 4:47 PM, Chantal Ackermann 
chantal.ackerm...@btelligent.de wrote:

 [...]





Sorting on Multiple fields

2009-10-14 Thread Neil Lunn
We have come up against a situation we are trying to resolve in our Solr
implementation project. It revolves mostly around how to sort results when
the index data is stored in multiple fields and, at runtime, we query on
whichever one is most relevant. A brief example:
We have product catalog information in the index which will have multiple
prices dependent on the user logged in and other scenarios. Simplified,
this will look something like this:

price_id101 = 100.00
price_id102 = 105.00
price_id103 = 110.00
price_id104 = 95.00
(etc)

What we are looking at is that at runtime we want to know which one of
several selected prices is the minimum (or maximum): not all prices, just a
select set of, say, 2 or 3 ids. The purpose is to determine a sort order for
the results. Approaching a SQL repository, we would feed it some query logic
to say "find me the least amount of this set of ids", so the search approach
here raises some questions.

- Do we attempt to raise some sort of function query to find the least
amount of the requested price ids? This would seem to imply some playing
around in the query handler to allow a function of this sort.

- Do we look at this, rather than some internal method to handle the query
and sort actions, as a matter of relevancy on a calculated field? If so, the
methods of determining the fields included in the calculated field are
eluding me at the moment, so pointers are welcome.

- Does this ultimately involve the implementation of some sort of custom
type and handler to do this sort of task?

I am open to any response. If someone has come across a similar problem
before and can suggest an approach, great; otherwise we are willing to open
up a patch branch or similar to do some work on the issue. Though if there
are no suggestions, this will likely move out of our current stream and into
future development.

Neil


hadoop configuarions for SOLR-1301 patch

2009-10-14 Thread Pravin Karne
Hi,
I am using the SOLR-1301 patch. I have built Solr with the given patch,
but I am not able to configure Hadoop for the resulting war.

I want to run Solr (create the index) with a 3-node (1+2) cluster.

How do I do the Hadoop configuration for the above patch?
How do I set master and slave?


Thanks
-Pravin






Re: Boosting of words

2009-10-14 Thread bhaskar chandrasekar
 
Hi Clark,
 
Thanks for your input. I have a query.
 
 
I have my XML which contains the following:
 
<add>
<doc>
  <field name="url">http://www.sun.com</field>
  <field name="title">information</field>
  <field name="description">java plays a important role in computer industry 
for web users</field>
</doc>
<doc>
  <field name="url">http://www.askguru.com</field>
  <field name="title">homepage</field>
  <field name="description">Information about technology is stored in the web 
sites</field>
</doc>
<doc>
  <field name="url">http://www.techie.com</field>
  <field name="title">post queries</field>
  <field name="description">This web site have more java technology related to 
web</field>
</doc>
</add>
 
When I give “java technology” as my input on the Solr admin page, at present 
I get this output: 
 
<doc>
  <field name="url">http://www.techie.com</field>
  <field name="title">post queries</field>
  <field name="description">This web site have more java technology related to 
web</field>
</doc>
 
Now I also need to get the docs which contain “technology”.
 
When I give “java technology”, I need to boost docs which contain
“technology”, so that the results display in the order below. The output 
should come as: 
 
<doc>
  <field name="url">http://www.techie.com</field>
  <field name="title">post queries</field>
  <field name="description">This web site have more java technology related to 
web</field>
</doc>
<doc>
  <field name="url">http://www.askguru.com</field>
  <field name="title">homepage</field>
  <field name="description">Information about technology is stored in the web 
sites</field>
</doc>
<doc>
  <field name="url">http://www.sun.com</field>
  <field name="title">information</field>
  <field name="description">java plays a important role in computer industry 
for web users</field>
</doc>
 
Please let me know how to achieve this.
 
Regards
Bhaskar


--- On Tue, 10/13/09, Nicholas Clark clark...@gmail.com wrote:


From: Nicholas Clark clark...@gmail.com
Subject: Re: Boosting of words
To: solr-user@lucene.apache.org
Date: Tuesday, October 13, 2009, 1:01 PM


Bhaskar,

Read this page, specifically how to query data.

http://lucene.apache.org/solr/tutorial.html#Querying+Data

It sounds like you are very new to Solr, so I would also suggest reading the
wiki.

http://wiki.apache.org/solr/

-Nick


On Mon, Oct 12, 2009 at 10:02 PM, bhaskar chandrasekar bas_s...@yahoo.co.in
 wrote:


 Hi Nicholas,

 Thanks for your input. Where exactly should the query

 q=product:red color:red^10

 be used and defined? Please help me.

 Regards
 Bhaskar

 --- On Mon, 10/12/09, Nicholas Clark clark...@gmail.com wrote:


 From: Nicholas Clark clark...@gmail.com
 Subject: Re: Boosting of words
 To: solr-user@lucene.apache.org
 Date: Monday, October 12, 2009, 2:13 PM


 The easiest way to boost your query is to modify your query string.

 q=product:red color:red^10

 In the above example, I have boosted the color field. If red is found in
 that field, it will get a boost of 10. If it is only found in the product
 field, then there will be no boost.

 Here's more information:

 http://wiki.apache.org/solr/SolrRelevancyCookbook#Boosting_Ranking_Terms

 Once you're comfortable with that, I suggest that you look into using the
 DisMax request handler. It will allow you to easily search across multiple
 fields with custom boost values.

 http://wiki.apache.org/solr/DisMaxRequestHandler
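
 A minimal solrconfig.xml sketch of such a DisMax handler, using the
 product/color fields from the earlier example (the handler name and boost
 values are illustrative assumptions):

```xml
<!-- Search both fields by default; matches in "color" count ten times more. -->
<requestHandler name="/dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">product^1.0 color^10.0</str>
  </lst>
</requestHandler>
```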

 -Nick


 On Sun, Oct 11, 2009 at 12:26 PM, bhaskar chandrasekar 
 bas_s...@yahoo.co.in
  wrote:

  Hi,
 
  I would like to know how I can give boosting to search input in Solr.
  Where exactly should I make the changes?
 
  Regards
  Bhaskar
 
 
 








  

lazy loading error usin Solr Cell

2009-10-14 Thread Stefano Nannetti
Hi, I'm new to Solr and Java in general. I'd like to index rich documents
with Solr Cell for my Intranet, so I downloaded the latest Solr nightly build
(solr-2009-10-14.tgz) and tried to follow the Solr Cell tutorial at
wiki.apache.org/solr/ExtractingRequestHandler.

I started Solr, copied a simple html file (prova.html) into the example
directory, moved to that directory and from there tried:

curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true' \
  -F "myfile=@prova.html"

But I received a lazy loading error. In case someone can help, I copy
the output here. Thanks in advance.

Ste

Output:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 500</title>
</head>
<body><h2>HTTP ERROR: 500</h2><pre>lazy loading error

org.apache.solr.common.SolrException: lazy loading error
   at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
   at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
   at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
   at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
   at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
   at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
   at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
   at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
   at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
   at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
   at org.mortbay.jetty.Server.handle(Server.java:285)
   at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
   at
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
   at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
   at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.lang.IllegalStateException: Unable to create a
XmlRootExtractor
   at org.apache.tika.mime.MimeTypes.<init>(MimeTypes.java:135)
   at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:58)
   at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:75)
   at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
   at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:96)
   at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:85)
   at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:76)
   at
org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:173)
   at
org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:165)
   at
org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:80)
   at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
   ...21 more
Caused by: org.xml.sax.SAXNotSupportedException:
http://javax.xml.XMLConstants/feature/secure-processing
   at gnu.xml.stream.SAXParserFactory.setFeature(libgcj.so.90)
   at
org.apache.tika.detect.XmlRootExtractor.<init>(XmlRootExtractor.java:47)
   at org.apache.tika.mime.MimeTypes.<init>(MimeTypes.java:133)
   ...31 more
</pre>
<p>RequestURI=/solr/update/extract</p>
<p><i><small><a href="http://jetty.mortbay.org/">Powered by 
Jetty://</a></small></i></p>


</body>
</html>


Re: lazy loading error usin Solr Cell

2009-10-14 Thread Yonik Seeley
Hmmm, I just tried the first steps of the Solr Cell tutorial, and it
worked fine for me (well, with the exception that there is no site
directory... I went to docs instead - I'll fix that).

Oh wait - I see your problem:
   at gnu.xml.stream.SAXParserFactory.setFeature(libgcj.so.90)

Your PATH picked up gcj, which is not supported by Solr.  You need
to use a different JVM.  If you didn't have anything else in mind, I'd
recommend just going with what's most widely used - the latest
released Sun JVM (currently 1.6_16) or OpenJDK.
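
A quick way to check which JVM is on the PATH and switch to a supported one
(the install path below is an assumption; adjust for your system):

```sh
# If this prints something mentioning gcj/libgcj, you are on the unsupported
# GNU runtime that caused the SAXNotSupportedException above
java -version

# Point JAVA_HOME and PATH at a Sun JDK or OpenJDK install instead
export JAVA_HOME=/usr/lib/jvm/java-6-sun   # assumed location; adjust
export PATH="$JAVA_HOME/bin:$PATH"
java -version   # should now report a HotSpot/OpenJDK VM
```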

-Yonik
http://www.lucidimagination.com



On Wed, Oct 14, 2009 at 9:09 AM, Stefano Nannetti
stefano.nanne...@gmail.com wrote:
 [...]



Re: Boosting of words

2009-10-14 Thread AHMET ARSLAN

 [... Bhaskar's question quoted in full above: given the three example
 docs, a query for “java technology” should rank docs containing
 “technology” higher ...]

The query  java^1 OR technology^100  will do it. Results will be in this 
order:

1-)This web site have more java technology related to web
2-)Information about technology is stored in the web sites
3-)java plays a important role in computer industry for web users

1-) contains both java and technology
2-) contains only technology
3-) contains only java

Is that what you want? 

Note that there are no quotes in the query above. You can adjust the boost 
factors (1 and 100) according to your needs. Use the OR operator between 
terms; you set an individual term's boost with the ^ operator.

hope this helps.







Re: Sorting on Multiple fields

2009-10-14 Thread Avlesh Singh

 Do we attempt to raise some sort of functional query to find the least
 amount of the requested price id's? This would seem to imply some playing
 around in the query handler to allow a function of this sort.

Unless I am missing something, this information can always be obtained by
post-processing the data returned in the search results, can't it?
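
A minimal client-side sketch of that post-processing idea, using the
price_idNNN fields from Neil's example (doc ids and values are made up for
illustration; this is not Solr code):

```python
# Each search result doc carries several price_idNNN fields; compute the
# minimum over a selected subset of price ids and sort the docs by it.
def effective_price(doc, selected_ids):
    prices = [doc["price_id%d" % i] for i in selected_ids if "price_id%d" % i in doc]
    return min(prices)

docs = [
    {"id": "A", "price_id101": 100.00, "price_id102": 105.00},
    {"id": "B", "price_id103": 110.00, "price_id104": 95.00},
]
selected = [101, 104]   # only this subset of price ids matters for this user
ranked = sorted(docs, key=lambda d: effective_price(d, selected))
```

The obvious drawback is that it only reorders the page of results already
fetched, not the full result set, which is why in-index solutions come up.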

Do we look at this rather than some internal method to handle the query
 and sort actions as a matter of relevancy on a calculated field? If so the
 methods of determining the fields included in the calculated field are
 eluding me at the moment. So pointers are welcome.

I really did not understand the question. Is it related to sorting of
results?

Does this ultimately involve the implementation of some sort of custom
 type and handler to do this sort of task.

If the answer to my previous question is affirmative, then yes, you would
need to implement custom sorting behavior. It can be achieved in multiple
ways depending upon your requirement. From something as simple as
function-queries to using the power of dynamic fields to writing a custom
field-type to writing a custom implementation of Lucene's Similarity .. any
of these can be a potential answer to custom sorting.

Cheers
Avlesh

On Wed, Oct 14, 2009 at 5:53 PM, Neil Lunn neil.l...@trixan.com wrote:

 [...]



Solr 1.4 release candidate

2009-10-14 Thread Yonik Seeley
Folks, we've been in code freeze since Monday, and a test release
candidate was created yesterday; however, it already had to be updated
last night due to a serious bug found in Lucene.

For now you can use the latest nightly build to get any recent changes
like this:
http://people.apache.org/builds/lucene/solr/nightly/

We'll probably release the final bits next week, so in the meantime,
download the latest nightly build and give it a spin!

-Yonik
http://www.lucidimagination.com


Lucene's CachingTokenFilter in index analyzer chain

2009-10-14 Thread Enrico Detoma
Hi all,

I'm trying to add a CachingTokenFilter-derived filter to the index analyzer
chain for the "text" field type.
I need to work with CachingTokenFilter because I need to look ahead in the
token stream (my filter is a stop-phrases filter: I look ahead to see if a
stop phrase is found and then remove it from the token stream).
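
The look-ahead idea itself can be sketched independently of the
Lucene/CachingTokenFilter API: compare the upcoming tokens against the known
stop phrases and skip a whole phrase when it matches (a simplified
illustration, not the actual filter code):

```python
def remove_stop_phrases(tokens, stop_phrases):
    # stop_phrases is a list of token tuples, e.g. [("as", "well", "as")]
    out, i = [], 0
    while i < len(tokens):
        matched = None
        for phrase in stop_phrases:
            # look ahead len(phrase) tokens and compare against the phrase
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                matched = phrase
                break
        if matched:
            i += len(matched)   # drop the whole phrase from the stream
        else:
            out.append(tokens[i])
            i += 1
    return out
```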

When I test the correctness of the chain using this query:

/solr/analysis/field?analysis.fieldname=description&analysis.fieldtype=text&analysis.fieldvalue=...
everything seems ok (I see that the stop phrases are removed from the token
stream).

But when I index documents, the index is totally empty: all searches on
text fields give no results at all!

Here is my index chain, where StopPhrasesFilterFactory is my custom filter
which derives from CachingTokenFilter:

<fieldType name="text" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
            synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
      add enablePositionIncrements=true in both the index and query
      analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="org.apache.solr.analysis.StopPhrasesFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Italian"
            protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Italian"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>

Is it wrong to use CachingTokenFilter in the index chain?

Regards
Enrico
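The look-ahead logic itself is independent of Lucene's TokenStream API. Here is a minimal, language-agnostic sketch of buffering tokens to detect and drop stop phrases; the function name, the greedy longest-match strategy, and the sample phrases are illustrative assumptions, not Enrico's actual filter:

```python
def remove_stop_phrases(tokens, stop_phrases):
    """Drop any run of tokens that exactly matches a stop phrase.

    tokens: list of token strings; stop_phrases: list of token lists.
    Greedy left-to-right scan with look-ahead, longest phrase first.
    """
    phrases = sorted((tuple(p) for p in stop_phrases), key=len, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for p in phrases:
            if tuple(tokens[i:i + len(p)]) == p:  # look ahead len(p) tokens
                i += len(p)                       # skip the whole phrase
                break
        else:
            out.append(tokens[i])
            i += 1
    return out
```

The caching in CachingTokenFilter exists precisely to allow this kind of multi-token look-ahead over a stream that can otherwise only be consumed once.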


Re: FACET_SORT_INDEX descending?

2009-10-14 Thread Gerald Snyder

Thanks for the answer and the alternative idea. --Gerald


Chris Hostetter wrote:

: Reverse alphabetical ordering.   The option index provides alphabetical
: ordering. 

be careful: "index" doesn't mean alphabetical -- it means the natural 
ordering of terms as they exist in the index. For non-ASCII characters 
this is not necessarily something that could be considered alphabetical 
(or sensible in terms of the locale).


The short answer is: no, there is no way to get reverse index order at 
the moment.


: I have a year_facet field, that I would like to display in reverse order (most
: recent years first).  Perhaps there is some other way to accomplish this.

the simplest way is to encode the year in some format that will cause it 
to naturally sort in the order you want - so instead of indexing "1976" 
and "2007" you could index "8024:1976" and "7993:2007" and then only 
display the part that comes after the ":"
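The prefixes in that example are 10000 minus the year, which makes plain lexicographic (index) order equal reverse chronological order. A quick sketch of the encode/display round trip (the helper names are made up for illustration):

```python
def encode_year(year):
    # 10000 - year yields a prefix whose string sort order is reverse year order
    return f"{10000 - year}:{year}"

def display_year(term):
    # show only the part that comes after the ":"
    return term.split(":", 1)[1]

# Sorting the encoded terms ascending yields most-recent-first years
terms = sorted(encode_year(y) for y in [1976, 2007, 1999])
```

This only works as long as all prefixes have the same number of digits (true for four-digit years here).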




-Hoss


  


POST queries to Solr instead of HTTP Gets with query string parameters

2009-10-14 Thread Glock, Thomas

Is there a way to POST queries to Solr instead of supplying query string
parameters ?

Some of our queries may hit up against URL size limits.

If so, can someone provide an example ?

Thanks in advance



Re: hadoop configuarions for SOLR-1301 patch

2009-10-14 Thread Shalin Shekhar Mangar
On Wed, Oct 14, 2009 at 6:15 PM, Pravin Karne pravin_ka...@persistent.co.in
 wrote:

 Hi,
 I am using SOLR-1301 path. I have build the solr with given patch.
 But I am not able to configure Hadoop for above war.

 I want to run solr(create index) with 3 nodes (1+2) cluster.

 How to do the Hadoop configurations for above patch?
 How to set master and slave?


Pravin, questions on specific patches are best asked on the Jira issue.

-- 
Regards,
Shalin Shekhar Mangar.


Re: One more happy Solr user ...

2009-10-14 Thread Shalin Shekhar Mangar
On Wed, Oct 14, 2009 at 2:16 PM, Avlesh Singh avl...@gmail.com wrote:

 I am pleased to announce the latest release of a popular Indian local
 search
 portal called http://www.burrp.com (http://mumbai.burrp.com).
 In prior versions of this web application, search was Lucene driven and we
 had to write our own implementation of search facets amongst other
 painful
 tasks.

 I can't be happier to inform everyone on this list that search/suggest
 features on the burrp! site are now powered by Solr.
 Please use it and let me know if we can make it better.


This is great!

Can you please add burrp to http://wiki.apache.org/solr/PublicServers?

-- 
Regards,
Shalin Shekhar Mangar.


Re: POST queries to Solr instead of HTTP Gets with query string parameters

2009-10-14 Thread Shalin Shekhar Mangar
On Wed, Oct 14, 2009 at 8:06 PM, Glock, Thomas thomas.gl...@pfizer.comwrote:


 Is a way to POST queries to Solr instead of supplying query string
 parameters ?


All Solr requests are normal HTTP requests. Most HTTP client libraries in
various languages have a way to select POST instead of GET. If you are using
Solrj client, then you can use
QueryRequest#setMethod(SolrRequest.METHOD.POST)
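For clients not using SolrJ, the switch amounts to moving the parameters out of the URL and into a form-encoded request body. A sketch with Python's standard library; the URL and query parameters are placeholders, and the request is only constructed here (actually sending it would require a running Solr):

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder query parameters, not taken from the original post
params = {"q": "title:solr", "rows": 10, "wt": "json"}
body = urlencode(params).encode("ascii")

req = Request(
    "http://localhost:8983/solr/select",  # placeholder Solr URL
    data=body,  # supplying a body makes urllib issue a POST
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
# urllib.request.urlopen(req) would send it to a live server
```

Because the parameters travel in the body, the URL itself stays tiny regardless of how many boolean clauses the query contains.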

-- 
Regards,
Shalin Shekhar Mangar.


Re: lazy loading error usin Solr Cell

2009-10-14 Thread Stefano Nannetti
I removed the existing JVM from my Ubuntu 9.04 and installed OpenJDK. 
Now it's working fine. Thanks, now I can go deeper in the use of Solr!!


Ste

Yonik Seeley wrote:

Hmmm, I just tried the first steps of the Solr Cell tutorial, and it
worked fine for me (well, with the exception that there is no site
directory... I went to docs instead - I'll fix that).

Oh wait - I see your problem:

  at gnu.xml.stream.SAXParserFactory.setFeature(libgcj.so.90)


Your path picked up gcj, which is not supported by Solr.  You need
to use a different JVM.  If you didn't have anything else in mind, I'd
recommend just going with what's most widely used - the latest
released Sun JVM (currently 1.6_16) or OpenJDK.

-Yonik
http://www.lucidimagination.com



On Wed, Oct 14, 2009 at 9:09 AM, Stefano Nannetti
stefano.nanne...@gmail.com wrote:

Hi, I'm new to Solr and Java in general. I'd like to index rich documents
with Solr Cell for my Intranet, so I downloaded the last Solr nightly build
(solr-2009-10-14.tgz) and tried to follow the Solr Cell tutorial at
wiki.apache.org/solr/ExtractingRequestHandler.

I started Solr, copied a simple html file (prova.html) into the example
directory, moved to that directory and from there tried:

curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true'
-F myfi...@prova.html

But I received a lazy loading error. If someone could help me I copy
here the output. Thanks in advance.

Ste

Output:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 500 </title>
</head>
<body><h2>HTTP ERROR: 500</h2><pre>lazy loading error

org.apache.solr.common.SolrException: lazy loading error
  at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
  at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
  at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
  at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
  at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
  at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
  at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
  at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
  at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
  at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
  at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
  at org.mortbay.jetty.Server.handle(Server.java:285)
  at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
  at
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
  at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
  at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
  at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
  at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
  at
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.lang.IllegalStateException: Unable to create a
XmlRootExtractor
  at org.apache.tika.mime.MimeTypes.&lt;init&gt;(MimeTypes.java:135)
  at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:58)
  at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:75)
  at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
  at org.apache.tika.config.TikaConfig.&lt;init&gt;(TikaConfig.java:96)
  at org.apache.tika.config.TikaConfig.&lt;init&gt;(TikaConfig.java:85)
  at org.apache.tika.config.TikaConfig.&lt;init&gt;(TikaConfig.java:76)
  at
org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:173)
  at
org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:165)
  at
org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:80)
  at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
  ...21 more
Caused by: org.xml.sax.SAXNotSupportedException:
http://javax.xml.XMLConstants/feature/secure-processing
  at gnu.xml.stream.SAXParserFactory.setFeature(libgcj.so.90)
  at
org.apache.tika.detect.XmlRootExtractor.&lt;init&gt;(XmlRootExtractor.java:47)
  at org.apache.tika.mime.MimeTypes.&lt;init&gt;(MimeTypes.java:133)
  ...31 more
</pre>
<p>RequestURI=/solr/update/extract</p><p><i><small><a href="http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>


</body>
</html>





RE: POST queries to Solr instead of HTTP Gets with query string parameters

2009-10-14 Thread Ankit Bhatnagar


Solrj1.4 supports QueryRequest#setMethod(SolrRequest.METHOD.POST)

but Solrj1.3 does not.

-Ankit

From: Shalin Shekhar Mangar [shalinman...@gmail.com]
Sent: Wednesday, October 14, 2009 11:08 AM
To: solr-user@lucene.apache.org
Subject: Re: POST queries to Solr instead of HTTP Gets with query string
parameters

On Wed, Oct 14, 2009 at 8:06 PM, Glock, Thomas thomas.gl...@pfizer.comwrote:


 Is a way to POST queries to Solr instead of supplying query string
 parameters ?


All Solr requests are normal HTTP requests. Most HTTP client libraries in
various languages have a way to select POST instead of GET. If you are using
Solrj client, then you can use
QueryRequest#setMethod(SolrRequest.METHOD.POST)

--
Regards,
Shalin Shekhar Mangar.


Re: Solr 1.4 release candidate

2009-10-14 Thread Joe Calderon
Maybe I'm just not familiar with the way version numbers work in
trunk, but when I build the latest nightly the jars have names like
*-1.5-dev.jar. Is that normal?

On Wed, Oct 14, 2009 at 7:01 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
 Folks, we've been in code freeze since Monday and a test release
 candidate was created yesterday, however it already had to be updated
 last night due to a serious bug found in Lucene.

 For now you can use the latest nightly build to get any recent changes
 like this:
 http://people.apache.org/builds/lucene/solr/nightly/

 We'll probably release the final bits next week, so in the meantime,
 download the latest nightly build and give it a spin!

 -Yonik
 http://www.lucidimagination.com



Solr/Lucene keeps eating up memory while idling

2009-10-14 Thread nonrenewable

I'm curious why this is occurring and whether i can prevent it. This is my
scenario:

Locally I have an idle running solr 1.3 service using lucene 2.4.1 which has
an index of ~330K documents containing ~10 fields each(total size ~12GB).
Currently I've turned off all caching, lazy field loading, however i do have
facet fields set for some request handlers. 

What i'm seeing is heap space usage increasing by ~1.2MB per 2 sec (by
java.lang.String objects). I'm assuming they're being used by lucene but i
may be wrong about that, since i have no actual data to confirm it. Why
exactly is this happening, considering no requests are being serviced?
Shouldn't the memory usage stabilise with a certain set of information and
only be affected on requests? Additionally there is a full GC every half
hour, which seems very unreasonable on a machine that isn't actually being
used as a service. 

I really hope there's just a certain setting that i've overlooked, or a
concept i'm not understanding because otherwise this behaviour seems very
unreasonable...

Thanks beforehand,
Tony
-- 
View this message in context: 
http://www.nabble.com/Solr-Lucene-keeps-eating-up-memory-while-idling-tp25894357p25894357.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 1.4 release candidate

2009-10-14 Thread Yonik Seeley
On Wed, Oct 14, 2009 at 12:04 PM, Joe Calderon calderon@gmail.com wrote:
 maybe im just not familiar with the way the version numbers works in
 trunk but when i build the latest nightly the jars have names like
 *-1.5-dev.jar,  is that normal?

Looks like Grant switched the version number a little early - nothing
to worry about though.
When we build official releases, we explicitly specify the version
number anyway.

-Yonik
http://www.lucidimagination.com

 On Wed, Oct 14, 2009 at 7:01 AM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 Folks, we've been in code freeze since Monday and a test release
 candidate was created yesterday, however it already had to be updated
 last night due to a serious bug found in Lucene.

 For now you can use the latest nightly build to get any recent changes
 like this:
 http://people.apache.org/builds/lucene/solr/nightly/

 We'll probably release the final bits next week, so in the meantime,
 download the latest nightly build and give it a spin!

 -Yonik
 http://www.lucidimagination.com




Re: Letters with accent in query

2009-10-14 Thread R. Tan
Correct. Apparently, Firefox is the only browser that translates é to
%E9.


On Wed, Oct 14, 2009 at 3:06 AM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : I'm querying with an accented keyword such as café but the debug info
 : shows that it is only searching for caf. I'm using the ISOLatin1Accent
 ...
 : http://localhost:8983/solr/select?q=%E9&debugQuery=true
 :
 : Params return shows this:
 : lst name=params
 : str name=q/

 ...that's a pretty good tip-off that you aren't URL-encoding the character
 the way your servlet container is expecting it.  I suspect what you
 really want is...

   http://localhost:8983/solr/select?q=%C3%A9&debugQuery=true
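The difference between the two URLs is only the character encoding applied before percent-encoding: %E9 is é in Latin-1, while %C3%A9 is é in UTF-8, which is what the servlet container expects here. A quick check with Python's standard library:

```python
from urllib.parse import quote, unquote

# Default percent-encoding uses UTF-8: one accented char becomes two bytes
assert quote("é") == "%C3%A9"
# Encoding via Latin-1 first reproduces what some browsers send for raw é
assert quote("é", encoding="latin-1") == "%E9"
# And decoding the UTF-8 form round-trips cleanly
assert unquote("%C3%A9") == "é"

query = "café"
url = "http://localhost:8983/solr/select?q=" + quote(query)
```

If the container decodes %E9 as UTF-8 it finds an invalid byte sequence, which is consistent with the "caf" truncation reported above.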






 -Hoss



Re: POST queries to Solr instead of HTTP Gets with query string parameters

2009-10-14 Thread Shalin Shekhar Mangar
On Wed, Oct 14, 2009 at 8:54 PM, Ankit Bhatnagar abhatna...@vantage.comwrote:



 Solrj1.4 supports QueryRequest#setMethod(SolrRequest.METHOD.POST)

 but Solrj1.3 does not.


I just checked the 1.3 release. It most definitely exists in 1.3

-- 
Regards,
Shalin Shekhar Mangar.


Adding callback url to data import handler...Is this possible?

2009-10-14 Thread William Pierce
Folks:

I am pretty happy with DIH -- it seems to work very well for my situation.
Thanks!!!

The one issue I see has to do with the fact that I need to keep polling 
url/dataimport to check if the data import completed successfully.   I need 
to know when/if the import is completed (successfully or otherwise) so that I 
can update appropriate structures in our app.  

What I would like is something like what Google Checkout API offers -- a 
callback URL.  That is, I should be able to pass along a URL to DIH.  Once it 
has completed the import, it can invoke the provided URL.  This provides a 
callback mechanism for those of us who don't have the liberty to change SOLR 
source code.  We can then do the needful upon receiving this callback.

If this functionality is already provided in some form/fashion, I'd love to 
know.

All in all, great functionality that has significantly helped me out!

Cheers,

- Bill
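Until something like a callback exists, the polling itself can at least be automated by parsing the response of the DIH status command. A sketch that pulls the status field out of a response; the sample XML below is illustrative, trimmed and re-tagged rather than captured from a real run:

```python
import xml.etree.ElementTree as ET

def dih_status(response_xml):
    """Return the DIH 'status' string (e.g. 'idle' or 'busy')
    from a /dataimport?command=status response."""
    root = ET.fromstring(response_xml)
    for node in root.iter("str"):
        if node.get("name") == "status":
            return node.text
    return None

# Illustrative response, reduced to the fields we care about
SAMPLE = """<response>
  <str name="status">idle</str>
  <lst name="statusMessages">
    <str name="Total Documents Processed">1200</str>
  </lst>
</response>"""
```

A loop around this (poll, sleep, then hit your own app's URL once the status leaves "busy") gives a callback-like behavior without touching Solr source, at the cost of the polling traffic Bill wants to avoid.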

Re: capitalization and delimiters

2009-10-14 Thread Shalin Shekhar Mangar
On Mon, Oct 12, 2009 at 9:09 PM, Audrey Foo au...@hotmail.com wrote:


 In my search docs, I have content such as 'powershot' and 'powerShot'.
 I would expect 'powerShot' would be searched as 'power', 'shot' and
 'powershot', so that results for all these are returned. Instead, only
 results for 'power' and 'shot' are returned.
 Any suggestions?
 In my search docs' schema, the index analyzer has: <filter
 class="solr.WordDelimiterFilterFactory" generateWordParts="0"
 generateNumberParts="0" catenateWords="1" catenateNumbers="1"
 catenateAll="0"/><filter class="solr.LowerCaseFilterFactory"/>
 In the schema, the query analyzer has: <filter
 class="solr.WordDelimiterFilterFactory" generateWordParts="1"
 generateNumberParts="1" catenateWords="0" catenateNumbers="0"
 catenateAll="0" splitOnCaseChange="1"/><filter
 class="solr.LowerCaseFilterFactory"/>


I find your index-time and query-time configuration very strange. Assuming
that you also have a lowercase filter, it seems that a token "powerShot"
will not be split at index time and will be indexed only as "powershot". Then
during query, both "power" and "shot" will match nothing.

I suggest you start with the configuration given in the example schema.
Else, it'd be easier for us if you can help us understand the reasons behind
changing these parameters.
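For intuition about why the index-time and query-time sides disagree here: the query analyzer splits on case change while the index analyzer only catenates. A toy sketch of case-change splitting (this is a simplification for illustration, not the real WordDelimiterFilter, which also handles digits, delimiters, and catenation):

```python
import re

def split_on_case_change(token):
    """Split a token at lower-to-upper case transitions,
    e.g. 'powerShot' -> ['power', 'Shot']."""
    return re.findall(r"[a-z0-9]+|[A-Z][a-z0-9]*", token)
```

So at query time "powerShot" becomes the two tokens "power" and "shot" (after lowercasing), but the index only ever contains the whole token "powershot", and the split terms find nothing.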

-- 
Regards,
Shalin Shekhar Mangar.


Re: Adding callback url to data import handler...Is this possible?

2009-10-14 Thread Avlesh Singh
Have you had a look at EventListeners in DIH?
http://wiki.apache.org/solr/DataImportHandler#EventListeners

Cheers
Avlesh

On Wed, Oct 14, 2009 at 11:21 PM, William Pierce evalsi...@hotmail.comwrote:

 Folks:

 I am pretty happy with DIH -- it seems to work very well for my situation.
Thanks!!!

 The one issue I see has to do with the fact that I need to keep polling
 url/dataimport to check if the data import completed successfully.   I
 need to know when/if the import is completed (successfully or otherwise) so
 that I can update appropriate structures in our app.

 What I would like is something like what Google Checkout API offers -- a
 callback URL.  That is, I should be able to pass along a URL to DIH.  Once
 it has completed the import, it can invoke the provided URL.  This provides
 a callback mechanism for those of us who don't have the liberty to change
 SOLR source code.  We can then do the needful upon receiving this callback.

 If this functionality is already provided in some form/fashion, I'd love to
 know.

 All in all, great functionality that has significantly helped me out!

 Cheers,

 - Bill


Re: http replication transfer speed

2009-10-14 Thread Shalin Shekhar Mangar
Queries on the slave could be one reason. However, I see that the perf
test on the wiki also shows the same transfer speed (with rsync too!). Not
sure what's up.

2009/10/12 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 Did you try w/o firing queries on the slave?

 On Sun, Oct 11, 2009 at 6:05 AM, Mark Miller markrmil...@gmail.com
 wrote:
 
 
   On a drive that can do 40+, one that's getting query load might have its
  writes
   knocked down to that?
 
  - Mark
 
  http://www.lucidimagination.com (mobile)
 
  On Oct 10, 2009, at 6:41 PM, Mark Miller markrmil...@gmail.com wrote:
 
  Anyone know why you would see a transfer speed of just 10-20MB over a
  gigbit network connection?
 
  Even with standard drives, I would expect to at least see around 40MB.
  Has anyone seen over 10-20 using replication?
 
  Any ideas on what the bottleneck should be? I think even a standard
  drive can do writes of a bit of 40MB/s, and certainly reads over that.
 
  Thoughts?
 
  --
  - Mark
 
  http://www.lucidimagination.com
 
 
 
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com




-- 
Regards,
Shalin Shekhar Mangar.


Re: Adding callback url to data import handler...Is this possible?

2009-10-14 Thread William Pierce
Thanks, Avlesh.  Yes, I did take a look at the event listeners.  As I 
mentioned this would require us to write Java code.


Our app(s) are entirely windows/asp.net/C# so while we could add Java in a 
pinch,  we'd prefer to stick to using SOLR using its convenient REST-style 
interfaces which makes no demand on our app environment.


Thanks again for your suggestion!

Cheers,

Bill

--
From: Avlesh Singh avl...@gmail.com
Sent: Wednesday, October 14, 2009 10:59 AM
To: solr-user@lucene.apache.org
Subject: Re: Adding callback url to data import handler...Is this possible?


Had a look at EventListeners in
DIH?http://wiki.apache.org/solr/DataImportHandler#EventListeners

Cheers
Avlesh

On Wed, Oct 14, 2009 at 11:21 PM, William Pierce 
evalsi...@hotmail.comwrote:



Folks:

I am pretty happy with DIH -- it seems to work very well for my situation.
Thanks!!!

The one issue I see has to do with the fact that I need to keep polling
url/dataimport to check if the data import completed successfully. I
need to know when/if the import is completed (successfully or otherwise)
so that I can update appropriate structures in our app.

What I would like is something like what Google Checkout API offers -- a
callback URL.  That is, I should be able to pass along a URL to DIH.  Once
it has completed the import, it can invoke the provided URL.  This provides
a callback mechanism for those of us who don't have the liberty to change
SOLR source code.  We can then do the needful upon receiving this callback.

If this functionality is already provided in some form/fashion, I'd love to
know.

All in all, great functionality that has significantly helped me out!

Cheers,

- Bill




RE: Lucene Merge Threads

2009-10-14 Thread Giovanni Fernandez-Kincade
Does anyone know the correct syntax to specify the maximum number of threads 
for the ConcurrentMergeScheduler?

Also, is there any concrete way to know when the merge is actually complete 
(aside from profiling the machine)?

Thanks,
Gio.

-Original Message-
From: Giovanni Fernandez-Kincade 
Sent: Tuesday, October 13, 2009 7:59 PM
To: Giovanni Fernandez-Kincade; 'solr-user@lucene.apache.org'; 
'noble.p...@gmail.com'
Subject: RE: Lucene Merge Threads

I'm still getting the error after getting the latest from trunk and building 
it. 
This is what I added to the solrconfig.xml:
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxThreadCount">5</int>
</mergeScheduler>


Any other ideas?

Thanks,
Gio.

SEVERE: org.apache.solr.common.SolrException: Error loading class '
5
'
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:310)
at 
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:325)
at org.apache.solr.update.SolrIndexWriter.&lt;init&gt;(SolrIndexWriter.java:81)
at 
org.apache.solr.update.SolrIndexWriter.&lt;init&gt;(SolrIndexWriter.java:178)
at 
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:123)
at 
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:172)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:400)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:168)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
at 
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at 
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at 
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at 
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: 
5

at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.$$YJP$$doPrivileged(Native Method)
at java.security.AccessController.doPrivileged(Unknown Source)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClassInternal(Unknown Source)
at java.lang.Class.$$YJP$$forName0(Native Method)
at java.lang.Class.forName0(Unknown Source)
at java.lang.Class.forName(Unknown Source)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:294)
... 28 more

-Original Message-
From: Giovanni Fernandez-Kincade 
Sent: Tuesday, October 13, 2009 10:50 AM
To: solr-user@lucene.apache.org; 'noble.p...@gmail.com'
Subject: RE: Lucene Merge Threads

Here's the version information from the admin page:

Solr Specification Version: 1.3.0.2009.07.28.18.51.06
Solr Implementation Version: 1.4-dev ${svnversion} - gkincade - 2009-07-28 
18:51:06
Lucene Specification Version: 2.9-dev
Lucene Implementation Version: 2.9-dev 794238 - 2009-07-15 18:05:08




-Original Message-
From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf 

Re: Adding callback url to data import handler...Is this possible?

2009-10-14 Thread Avlesh Singh
Hmmm ... I think this is a valid use case and it might be a good idea to
support it in some way.
I will post this thread on the dev mailing list to seek opinions.

Cheers
Avlesh

On Wed, Oct 14, 2009 at 11:39 PM, William Pierce evalsi...@hotmail.comwrote:

 Thanks, Avlesh.  Yes, I did take a look at the event listeners.  As I
 mentioned this would require us to write Java code.

 Our app(s) are entirely windows/asp.net/C# so while we could add Java in a
 pinch,  we'd prefer to stick to using SOLR using its convenient REST-style
 interfaces which makes no demand on our app environment.

 Thanks again for your suggestion!

 Cheers,

 Bill

 --
 From: Avlesh Singh avl...@gmail.com
 Sent: Wednesday, October 14, 2009 10:59 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Adding callback url to data import handler...Is this possible?


  Had a look at EventListeners in
 DIH?http://wiki.apache.org/solr/DataImportHandler#EventListeners

 Cheers
 Avlesh

 On Wed, Oct 14, 2009 at 11:21 PM, William Pierce evalsi...@hotmail.com
 wrote:

  Folks:

 I am pretty happy with DIH -- it seems to work very well for my
 situation.
   Thanks!!!

 The one issue I see has to do with the fact that I need to keep polling
 url/dataimport to check if the data import completed successfully. I
 need to know when/if the import is completed (successfully or otherwise)
 so
 that I can update appropriate structures in our app.

 What I would like is something like what Google Checkout API offers -- a
 callback URL.  That is, I should be able to pass along a URL to DIH. Once
 it has completed the import, it can invoke the provided URL.  This
 provides
 a callback mechanism for those of us who don't have the liberty to change
 SOLR source code.  We can then do the needful upon receiving this
 callback.

 If this functionality is already provided in some form/fashion, I'd love
 to
 know.

 All in all, great functionality that has significantly helped me out!

 Cheers,

 - Bill





how to get field contents out of Document object

2009-10-14 Thread Joe Calderon
Hello all, sorry if this seems like a dumb question; I'm still fairly new
to working with Lucene/Solr internals.

Given a Document object, what is the proper way to fetch an integer
value for a field called num_in_stock? It is both indexed and stored.

Thanks much

--joe


Opaque replication failures

2009-10-14 Thread Michael
Hi,

I have a multicore Solr 1.4 setup.  core_master is a 3.7G master for
replication, and core_slave is a 500 byte slave pointing to the
master.  I'm using the example replication configuration from
solrconfig.xml, with ${enable.master} and ${enable.slave} properties
so that the master and slave can use the same solrconfig.xml.

When I attempt to replicate (every 60 seconds or by pressing the
button on the slave replication admin page), it doesn't work.
Unfortunately, neither the admin page nor the REST API details
command show anything useful, and the logs show no errors.

How can I get insight into what is causing the failure?  I assume it's
some configuration problem but don't know where to start.

Thanks in advance for any help!  Config files are below.
Michael



Here is my solr.xml:

<?xml version='1.0' encoding='UTF-8'?>
<solr sharedLib="lib" persistent="true">
<cores adminPath="/admin/cores" shareSchema="true">
  <core name="core_master" instanceDir="." dataDir="/home/search/solr/data/5">
    <property name="enable.master" value="true" />
  </core>
  <core name="core_slave" instanceDir="." dataDir="/home/search/solr/data/1">
    <property name="enable.slave" value="true" />
  </core>
</cores>
</solr>

And here's the relevant chunk of my solrconfig.xml:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str 
name="masterUrl">http://localhost:31000/solr/core_master/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>

Here's what the details command on the slave has to say -- nothing
explanatory that I can see.  Is the isReplicating=false worrying?

<lst name="details">

  <str name="indexSize">589 bytes</str>
  <str name="indexPath">/home/search/solr/data/1/index</str>
  <arr name="commits"/>
  <str name="isMaster">false</str>
  <str name="isSlave">true</str>
  <long name="indexVersion">1254772638413</long>
  <long name="generation">2</long>

  <lst name="slave">
    <lst name="masterDetails">
      <str name="indexSize">3.75 GB</str>
      <str name="indexPath">/home/search/solr/data/5/index</str>
      <arr name="commits"/>
      <str name="isMaster">true</str>
      <str name="isSlave">false</str>
      <long name="indexVersion">1254772639291</long>
      <long name="generation">156</long>
    </lst>
    <str 
name="masterUrl">http://localhost:31000/solr/core_master/replication</str>
    <str name="pollInterval">00:00:60</str>
    <str name="indexReplicatedAt">Wed Oct 14 14:25:22 EDT 2009</str>

    <arr name="indexReplicatedAtList">
      <str>Wed Oct 14 14:25:22 EDT 2009</str>
      <str>Wed Oct 14 14:25:22 EDT 2009</str>
      <str>Wed Oct 14 14:25:21 EDT 2009</str>
      <str>Wed Oct 14 14:24:27 EDT 2009</str>
      (etc)
    </arr>
    <arr name="replicationFailedAtList">
      <str>Wed Oct 14 14:25:22 EDT 2009</str>
      <str>Wed Oct 14 14:25:22 EDT 2009</str>
      <str>Wed Oct 14 14:25:21 EDT 2009</str>
      <str>Wed Oct 14 14:24:27 EDT 2009</str>
      (etc)
    </arr>

    <str name="timesIndexReplicated">1481</str>
    <str name="lastCycleBytesDownloaded">0</str>
    <str name="timesFailed">1481</str>
    <str name="replicationFailedAt">Wed Oct 14 14:25:22 EDT 2009</str>
    <str name="previousCycleTimeInSeconds">0</str>
    <str name="isReplicating">false</str>
  </lst>

</lst>
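One way to surface what the admin page hides is to compare the attempt and failure counters from this details output programmatically: if the two match, every replication attempt has failed. A sketch parsing a trimmed, re-tagged sample modeled on the listing above (the field names follow that listing; the XML shape is an assumption about the response format):

```python
import xml.etree.ElementTree as ET

def replication_health(details_xml):
    """Return (times_replicated, times_failed) from a
    /replication?command=details response."""
    root = ET.fromstring(details_xml)
    fields = {n.get("name"): n.text for n in root.iter("str")}
    return (int(fields["timesIndexReplicated"]),
            int(fields["timesFailed"]))

# Trimmed sample based on the listing above
SAMPLE = """<lst name="details">
  <lst name="slave">
    <str name="timesIndexReplicated">1481</str>
    <str name="timesFailed">1481</str>
  </lst>
</lst>"""
```

Here the counters are 1481 and 1481, so every single attempt has failed, which points at a persistent configuration problem (for example, the masterUrl the slave resolves at runtime) rather than an intermittent one.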


(Solr 1.4 dev) Why solr.common.* packages are in solrj-*.jar ?

2009-10-14 Thread Teruhiko Kurosaka
I've downloaded solr-2009-10-12.zip and tried to
compile my TokenizerFactory implementation against this
version of Solr.  Compilation failed. One of the causes
is that the compiler couldn't find 
org.apache.solr.common.ResourceLoader. 

I discovered this class in apache-solr-solrj-nightly.jar.
I didn't add this classpath at the first time because
this jar sounds like the jar for building Java client.
I needed ResourceLoader to write my TokenizerFactory.

I wonder why the common classes are in the solrj JAR?
Is the solrj JAR not just for the clients?

BTW, is there some sort of transition guide for Solr 1.4?
I see there are changes how classes are divided into JARs
like above, and there are some incompatible API changes.
It'll be great if such information can be part of CHANGES.txt.

-kuro 


Re: solr IOException

2009-10-14 Thread Elaine Li
Hi Yonik,

I tried the POST method in my ajax request in javascript. It does not
work. I still get the same error message.

Elaine

On Tue, Oct 13, 2009 at 5:12 PM, Yonik Seeley ysee...@gmail.com wrote:
 Jetty has a maximum request size for HTTP-GET... can you use POST instead?

 -Yonik
 http://www.lucidimagination.com

 On Tue, Oct 13, 2009 at 4:33 PM, Elaine Li elaine.bing...@gmail.com wrote:
 Hi,

 In my query, I have around 80 boolean clauses. I don't know whether the
 problem is that the number of boolean clauses is too big, but that is
 how I got into this situation.
 My solr config file actually says the max number is 1024.

 Can any one help?

 _header=[1515632954,1939520811,m=3653,g=4096,p=4096,c=4096]={sauidp=U601264301252517927557;
 CoreID6=01421694673512525179481ci=90130510,90175093,90175119,90175106;
 DEFAULTFORMAT=specific;
 BUGLIST=5%3A11%3A12%3A36%3A39%3A63%3A77%3A80%3A100%3A106%3A109%3A111%3A114%3A119%3A122%3A125%3A127%3A138%3A142%3A152%3A153%3A154%3A155%3A156%3A157%3A158%3A169%3A178%3A180%3A182%3A183%3A186%3A188%3A190%3A194%3A198%3A199%3A200%3A202%3A206%3A209%3A211%3A212%3A213%3A217%3A219%3A220%3A233%3A236%3A242%3A243%3A249%3A255%3}{}
 _buffer=[1515632954,1939520811,m=3653,g=4096,p=4096,c=4096]={sauidp=U601264301252517927557;
 CoreID6=01421694673512525179481ci=90130510,90175093,90175119,90175106;
 DEFAULTFORMAT=specific;
 BUGLIST=5%3A11%3A12%3A36%3A39%3A63%3A77%3A80%3A100%3A106%3A109%3A111%3A114%3A119%3A122%3A125%3A127%3A138%3A142%3A152%3A153%3A154%3A155%3A156%3A157%3A158%3A169%3A178%3A180%3A182%3A183%3A186%3A188%3A190%3A194%3A198%3A199%3A200%3A202%3A206%3A209%3A211%3A212%3A213%3A217%3A219%3A220%3A233%3A236%3A242%3A243%3A249%3A255%3}{}
 2009-10-13 16:20:28.800::WARN:  handle failed
 java.io.IOException: FULL
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:274)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at 
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at 
 org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
 Thanks.

 Elaine
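
For anyone hitting the same "java.io.IOException: FULL": Jetty raises it when the request line and headers exceed its header buffer, so very long GET query strings overflow it. Besides switching to POST, the buffer can be enlarged in Jetty's configuration. A hedged sketch for the Jetty 6 shipped with the Solr example (etc/jetty.xml); the connector class and the 65536 value are illustrative assumptions:

```xml
<!-- etc/jetty.xml: enlarge the HTTP header buffer so long GET query
     strings fit (65536 bytes here is only an example value) -->
<Call name="addConnector">
  <Arg>
    <New class="org.mortbay.jetty.bio.SocketConnector">
      <Set name="port">8983</Set>
      <Set name="headerBufferSize">65536</Set>
    </New>
  </Arg>
</Call>
```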




RE: Lucene Merge Threads

2009-10-14 Thread Giovanni Fernandez-Kincade
In case anyone is having the same problem, I finally got this working, using 
the nightly build link that Yonik sent around:
http://people.apache.org/builds/lucene/solr/nightly/

Thanks,
Gio.
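
For reference, the mergeScheduler element form that works on the nightly build (the same snippet appears further down in the quoted message); exact parameter support may vary by build:

```xml
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxThreadCount">5</int>
</mergeScheduler>
```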
-Original Message-
From: Giovanni Fernandez-Kincade 
Sent: Wednesday, October 14, 2009 2:10 PM
To: Giovanni Fernandez-Kincade; solr-user@lucene.apache.org; 
noble.p...@gmail.com
Subject: RE: Lucene Merge Threads

Does anyone know the correct syntax to specify the maximum number of threads 
for the ConcurrentMergeScheduler?

Also, is there any concrete way to know when the merge is actually complete 
(aside from profiling the machine)?

Thanks,
Gio.

-Original Message-
From: Giovanni Fernandez-Kincade 
Sent: Tuesday, October 13, 2009 7:59 PM
To: Giovanni Fernandez-Kincade; 'solr-user@lucene.apache.org'; 
'noble.p...@gmail.com'
Subject: RE: Lucene Merge Threads

I'm still getting the error after getting the latest from trunk and building 
it. 
This is what I added to the solrconfig.xml:
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxThreadCount">5</int>
</mergeScheduler>


Any other ideas?

Thanks,
Gio.

SEVERE: org.apache.solr.common.SolrException: Error loading class '
5
'
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:310)
at 
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:325)
at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:81)
at 
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:178)
at 
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:123)
at 
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:172)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:400)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:168)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
at 
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
at 
org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
at 
org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
at 
org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: 
5

at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.$$YJP$$doPrivileged(Native Method)
at java.security.AccessController.doPrivileged(Unknown Source)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClassInternal(Unknown Source)
at java.lang.Class.$$YJP$$forName0(Native Method)
at java.lang.Class.forName0(Unknown Source)
at java.lang.Class.forName(Unknown Source)
at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:294)
... 28 more

-Original Message-
From: Giovanni Fernandez-Kincade 
Sent: Tuesday, October 13, 2009 10:50 AM
To: solr-user@lucene.apache.org; 'noble.p...@gmail.com'
Subject: RE: 

Re: how to get field contents out of Document object

2009-10-14 Thread Yonik Seeley
On Wed, Oct 14, 2009 at 2:24 PM, Joe Calderon calderon@gmail.com wrote:
 hello *, sorry if this seems like a dumb question; I'm still fairly new
 to working with Lucene/Solr internals.

 Given a Document object, what is the proper way to fetch an integer
 value for a field called num_in_stock? It is both indexed and stored.

FieldType controls translation back and forth between Fields and
Strings/Objects.
See FieldType.toObject() or FieldType.storedToReadable()

-Yonik
http://www.lucidimagination.com


Re: DataImportHandler problem: Feeding the XPathEntityProcessor with the FieldReaderDataSource

2009-10-14 Thread Shalin Shekhar Mangar
See SOLR-1511

2009/10/7 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 hi Lance. db.blob is the correct field name, so that is fine.
 You can probably open an issue and provide the test case as a patch.
 That would help us track this better.

 On Wed, Oct 7, 2009 at 12:45 AM, Lance Norskog goks...@gmail.com wrote:
  A side note that might help: if I change the dataField from 'db.blob'
  to 'blob', this DIH stack emits no documents.
 
  On 10/5/09, Lance Norskog goks...@gmail.com wrote:
  I've added a unit test for the problem down below. It feeds document
  field data into the XPathEntityProcessor via the
  FieldReaderDataSource, and the XPath EP does not emit unpacked fields.
 
  Running this under the debugger, I can see the supplied StringReader,
  with the XML string, being piped into the XPath EP. But somehow the
  XPath EP does not pick it apart the right way.
 
  Here is the DIH configuration file separately.
 
  <dataConfig>
    <dataSource type='FieldReaderDataSource' name='fc' />
    <dataSource type='MockDataSource' name='db' />
    <document>
      <entity name='db' query='select * from x' dataSource='db'>
        <field column='dbid' />
        <field column='tag' />
        <field column='blob' />
        <entity name='unpack' dataSource='fc'
                processor='XPathEntityProcessor'
                forEach='/names' dataField='db.blob'>
          <field column='name' xpath='/names/name' />
        </entity>
      </entity>
    </document>
  </dataConfig>
 
  Any ideas?
 
 
 ---
 
  package org.apache.solr.handler.dataimport;
 
  import static
 
 org.apache.solr.handler.dataimport.AbstractDataImportHandlerTest.createMap;
  import junit.framework.TestCase;
 
  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;
 
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.common.SolrInputField;
  import org.apache.solr.handler.dataimport.TestDocBuilder.SolrWriterImpl;
  import org.junit.Test;
 
  /*
   * Demonstrate problem feeding XPathEntity from a FieldReaderDatasource
   */
 
  public class TestFieldReaderXPath extends TestCase {
    static final String KISSINGER =
      "<names><name>Henry</name></names>";

    static final String[][][] DBDOCS = {
      {{"dbid", "1"}, {"blob", KISSINGER}},
    };

    /*
     * Receive a row from SQL and fetch a row from Solr - no value matching
     * stolen from TestDocBuilder
     */

    @Test
    public void testSolrEmbedded() throws Exception {
      try {
        DataImporter di = new DataImporter();
        di.loadDataConfig(dih_config_FR_into_XP);
        DataImporter.RequestParams rp = new DataImporter.RequestParams();
        rp.command = "full-import";
        rp.requestParams = new HashMap<String, Object>();

        DataConfig cfg = di.getConfig();
        DataConfig.Entity entity = cfg.document.entities.get(0);
        List<Map<String, Object>> l = new ArrayList<Map<String, Object>>();
        addDBDocuments(l);
        MockDataSource.setIterator("select * from x", l.iterator());
        entity.dataSrc = new MockDataSource();
        entity.isDocRoot = true;
        SolrWriterImpl swi = new SolrWriterImpl();
        di.runCmd(rp, swi);

        assertEquals(1, swi.docs.size());
        SolrInputDocument doc = swi.docs.get(0);
        SolrInputField field;
        field = doc.getField("dbid");
        assertEquals(field.getValue().toString(), "1");
        field = doc.getField("blob");
        assertEquals(field.getValue().toString(), KISSINGER);
        field = doc.getField("name");
        assertNotNull(field);
        assertEquals(field.getValue().toString(), "Henry");
      } finally {
        MockDataSource.clearCache();
      }
    }

    private void addDBDocuments(List<Map<String, Object>> l) {
      for (String[][] dbdoc : DBDOCS) {
        l.add(createMap(dbdoc[0][0], dbdoc[0][1], dbdoc[1][0], dbdoc[1][1]));
      }
    }

    String dih_config_FR_into_XP = "<dataConfig>\r\n" +
      "<dataSource type='FieldReaderDataSource' name='fc' />\r\n" +
      "<dataSource type='MockDataSource' name='db' />\r\n" +
      "<document>\r\n" +
      "<entity name='db' query='select * from x' dataSource='db'>\r\n" +
      "<field column='dbid' />\r\n" +
      "<field column='tag' />\r\n" +
      "<field column='blob' />\r\n" +
      "<entity name='unpack' dataSource='fc'

Re: Lucene Merge Threads

2009-10-14 Thread Jason Rutherglen
Gio,

 Also, is there any concrete way to know when the merge is actually complete 
 (aside from profiling the machine)?

This would be a great feature to add to the Solr web UI: the ability
to monitor merges in progress and log how much time each one took.

-J


'Down' boosting shorter docs

2009-10-14 Thread Simon Wistow
Our index has some items in it which basically contain a title and a 
single word body.

If the user searches for a word in the title (especially if the title is 
itself only one word) then that doc will get scored quite highly, 
despite the fact that, in this case, it's not really relevant.

I've tried something like

qf=title^2.0 content^0.5
bf=num_pages

but that disproportionately boosts long documents to the detriment of 
relevancy

bf=product(num_pages,0.05)

has no effect but 

bf=product(num_pages,0.06)


returns a bunch of long documents which don't seem to have any highlighted 
fields, plus the short document with only the query in the title, which is 
progress only in that it's almost exactly the opposite of what I want.

Any suggestions? Am I going to need to reindex and add the length in 
bytes or characters of the document?

Simon






Re: advice on failover setup

2009-10-14 Thread Jason Rutherglen
Don,

Sorry, yes, the features are under development, and hopefully the
wikis will catch up as well. :)

As for when they become available: I can say that personally I need
the Katta integration working in the next few months. Jason Venner
got it working over at his company.  It might be good to
describe your use case to see what is a good fit for you.

-J

On Wed, Oct 14, 2009 at 4:20 PM, Don Clore don.cl...@5to1.com wrote:
 I'm sorry, for clarification, is it the *wiki* pages that are under
 development, or the features (I'm guessing the latter)?

 If the latter (ZooKeeperIntegration and KattaIntegration are not available
 yet), is there any sort of guess as to when these features might become
 available?

 thanks,
 Don

 On Wed, Oct 14, 2009 at 2:13 PM, Jason Rutherglen 
 jason.rutherg...@gmail.com wrote:

 Dan,

 For automatic failover there are 2 wiki pages that may be helpful,
 however both are in the development stage.

 http://wiki.apache.org/solr/ZooKeeperIntegration
 http://wiki.apache.org/solr/KattaIntegration

 -J

 On Wed, Oct 14, 2009 at 12:48 PM, Katz, Dan dan.k...@fepoc.com wrote:
  Hi folks,
 
  I'm tasked with designing a failover architecture for our new Solr
  server. I've read the Replication section in the docs
  (http://wiki.apache.org/solr/SolrReplication) and I need some
  clarification/insight. My questions:
 
  1.      Is there such a thing as master/master replication?
  2.      If we have one master and one slave server, and the master goes
  down, does the slave automatically become the master? What's the process
  for bringing the server back up and getting the two back in sync? Is it a
  manual process always?
  3.      We're running Solr inside Tomcat on Windows currently. Any
  suggestions for a load balancer that will automatically switch to the
  alternate server if one goes down?
 
  Thanks in advance,
 
  --
  Dan Katz
  Lead Web Developer
  FEP Operations Center(r)
  202.203.2572 (Direct)
  dan.k...@fepoc.com
 
 
 
 
  Unauthorized interception of this communication could be a
  violation of Federal and State Law. This communication and
  any files transmitted with it are confidential and may contain
  protected health information. This communication is solely
  for the use of the person or entity to whom it was addressed.
  If you are not the intended recipient, any use, distribution,
  printing or acting in reliance on the contents of this message
  is strictly prohibited. If you have received this message
  in error, please notify the sender and destroy any and all copies.
  Thank you.
 
 ***
 
 




Re: Right place to put my Tokenizer jars

2009-10-14 Thread Erik Hatcher
You're better off putting extensions like these in solr-home/lib and  
letting Solr load them rather than putting them in a container  
classpath like Jetty's lib/ext.  As you've seen, conflicts occur  
because of class loader visibility.


Erik
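
Concretely, with the example distribution that layout looks roughly like this (a sketch; directory names follow the default example, adjust for your own solr home):

```
example/
  solr/                <-- solr home
    conf/
      schema.xml
      solrconfig.xml
    lib/               <-- custom Tokenizer jars and their dependencies go here
    data/
```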

On Oct 14, 2009, at 7:28 PM, Teruhiko Kurosaka wrote:


I have my custom Tokenizer and TokenizerFactory in a jar,
and I've been putting it in example/lib/ext, and it's been
working fine with Solr 1.3.

This jar uses SLF4J as a logging API, and I had the SLF4J jars
in the same place, example/lib/ext.

Because Solr 1.4 uses SLF4J too and has it built in,
I thought I wouldn't need another set of the
same jars, so I removed them from example/lib/ext.  Then,
when my TokenizerFactory is run, I get a
NoClassDefFoundError.

This error can be fixed by putting another set of SLF4J jars
in example/lib/ext, but I don't understand why.
After all, my jar can access Lucene and Solr APIs whose
jars reside elsewhere than example/lib/ext.  Why must only the
SLF4J jars be duplicated and exist in example/lib/ext?
Why are the SLF4J jars special? Is this something to do
with the fact that the SLF4J jars are needed at static
initialization time? What is the correct place to put
my Tokenizer(Filter) jars?

-kuro




Re: Right place to put my Tokenizer jars

2009-10-14 Thread Koji Sekiguchi
Hi Kurosaka-san,

I think you got a kind of class loader problem.
I usually put my plugin jars under the lib directory of solr home.

http://wiki.apache.org/solr/SolrPlugins#How_to_Load_Plugins

Koji

Teruhiko Kurosaka wrote:
 I have my custom Tokenizer and TokenizerFactory in a jar,
 and I've been putting it in example/lib/ext. and it's been
 working fine with Solr 1.3. 

 This jar uses SLF4J as a logging API, and I had the SLF4J jars
 in the same place, example/lib/ext.

 Because Solr 1.4 uses SLF4J too and have it builtin,
 I thought I wouldn't need to have another set of the
 same jars, I removed them from example/lib/ext.  Then,
 when my TokenizerFactory is run, I've got a
 NoClassDefFoundError error.

 This error can be fixed by putting another set of SLF4J jars
 in example/lib/ext, but I don't understand why.
 After all, my jar can access Lucene and Solr APIs whose
 jars resides elsewhere than example/lib/ext.  Why only
 SLF4J jars must be duplicated and exist in example/lib/ext?
 Why SLF4J jars are special? Is this somethng to do
 with the fact that SLF4J jars are needed at the static
 initialization time? What is the correct place to put
 my Tokenizer(Filter) jars?

 -kuro 

   


-- 
http://www.rondhuit.com/en/



RE: Right place to put my Tokenizer jars

2009-10-14 Thread Teruhiko Kurosaka
Actually, I meant to say I have my Tokenizer jars in solr/lib.
I have the jars that my Tokenizer jars depend on in lib/ext,
as I wanted them to be loaded only once per container
due to their internal description.  Bad idea?

-kuro

 From: Teruhiko Kurosaka 
 Sent: Wednesday, October 14, 2009 4:28 PM
 To: solr-user@lucene.apache.org
 Subject: Right place to put my Tokenizer jars
 
 I have my custom Tokenizer and TokenizerFactory in a jar, and 
 I've been putting it in example/lib/ext. and it's been 
 working fine with Solr 1.3. 
 
 This jar uses SLF4J as a logging API, and I had the SLF4J 
 jars in the same place, example/lib/ext.
 
 Because Solr 1.4 uses SLF4J too and have it builtin, I 
 thought I wouldn't need to have another set of the same jars, 
 I removed them from example/lib/ext.  Then, when my 
 TokenizerFactory is run, I've got a NoClassDefFoundError error.
 
 This error can be fixed by putting another set of SLF4J jars 
 in example/lib/ext, but I don't understand why.
 After all, my jar can access Lucene and Solr APIs whose jars 
 resides elsewhere than example/lib/ext.  Why only SLF4J jars 
 must be duplicated and exist in example/lib/ext?
 Why SLF4J jars are special? Is this somethng to do with the 
 fact that SLF4J jars are needed at the static initialization 
 time? What is the correct place to put my Tokenizer(Filter) jars?
 
 -kuro 
 

Re: 'Down' boosting shorter docs

2009-10-14 Thread Yonik Seeley
A multiplicative boost may work better than one added in:
http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html

-Yonik
http://www.lucidimagination.com
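
As a hedged sketch of the multiplicative form (num_pages comes from the earlier message; wrapping it in log(sum(...,1)) is an assumption so that one-page documents are not zeroed out, and the query terms are placeholders):

```
q={!boost b=log(sum(num_pages,1))}title:foo content:foo
```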



On Wed, Oct 14, 2009 at 7:21 PM, Simon Wistow si...@thegestalt.org wrote:
 Our index has some items in it which basically contain a title and a
 single word body.

 If the user searches for a word in the title (especially if title is of
 itself only oen word) then that doc will get scored quite highly,
 despite the fact that, in this case, it's not really relevant.

 I've tried something like

 qf=title^2.0 content^0.5
 bf=num_pages

 but that disproportionally boosts long documents to the detriment of
 relevancy

 bf=product(num_pages,0.05)

 has no effect but

 bf=product(num_pages,0.06)


 has a bunch of long documents which don't seem to return any highlighted
 fields plus the short document with only the query in the title which is
 progress in that it's almost exactly the opposite of what I want.

 Any suggestions? Am I going to need to reindex and add the length in
 bytes or characters of the document?

 Simon


Re: Adding callback url to data import handler...Is this possible?

2009-10-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
I can understand the concern that you do not wish to write Java code,
but a callback URL is a very specific requirement. We plan to extend
JavaScript support to the EventListener callback. Would that help?

On Wed, Oct 14, 2009 at 11:47 PM, Avlesh Singh avl...@gmail.com wrote:
 Hmmm ... I think this is a valid use case and it might be a good idea to
 support it in some way.
 I will post this thread on the dev-mailing list to seek opinion.

 Cheers
 Avlesh

 On Wed, Oct 14, 2009 at 11:39 PM, William Pierce evalsi...@hotmail.comwrote:

 Thanks, Avlesh.  Yes, I did take a look at the event listeners.  As I
 mentioned this would require us to write Java code.

 Our app(s) are entirely Windows/ASP.NET/C#, so while we could add Java in a
 pinch, we'd prefer to stick to using Solr through its convenient REST-style
 interfaces, which make no demands on our app environment.

 Thanks again for your suggestion!

 Cheers,

 Bill

 --
 From: Avlesh Singh avl...@gmail.com
 Sent: Wednesday, October 14, 2009 10:59 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Adding callback url to data import handler...Is this possible?


  Had a look at EventListeners in
 DIH?http://wiki.apache.org/solr/DataImportHandler#EventListeners

 Cheers
 Avlesh

 On Wed, Oct 14, 2009 at 11:21 PM, William Pierce evalsi...@hotmail.com
 wrote:

  Folks:

 I am pretty happy with DIH -- it seems to work very well for my
 situation.
   Thanks!!!

 The one issue I see has to do with the fact that I need to keep polling
 <url>/dataimport to check if the data import completed successfully. I
 need to know when/if the import is completed (successfully or otherwise)
 so
 that I can update appropriate structures in our app.

 What I would like is something like what Google Checkout API offers -- a
 callback URL.  That is, I should be able to pass along a URL to DIH. Once
 it has completed the import, it can invoke the provided URL.  This
 provides
 a callback mechanism for those of us who don't have the liberty to change
 SOLR source code.  We can then do the needful upon receiving this
 callback.

 If this functionality is already provided in some form/fashion, I'd love
 to
 know.

 All in all, great functionality that has significantly helped me out!

 Cheers,

 - Bill







-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com
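
For reference, the EventListener hook mentioned above is wired up in the DIH config itself; a hedged sketch (com.example.NotifyCallback is a hypothetical listener class, not part of Solr):

```xml
<dataConfig>
  <!-- onImportStart/onImportEnd name classes implementing the DIH
       EventListener interface; the class below is hypothetical -->
  <document onImportStart="com.example.NotifyCallback"
            onImportEnd="com.example.NotifyCallback">
    <!-- entities as usual -->
  </document>
</dataConfig>
```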


storing multiple type of records (Parent - Child Relationship)

2009-10-14 Thread ashokcz

Hi All,
I have a specific requirement for storing multiple types of records, but I
don't know how to do it.
First let me describe the requirement.
I have a user table, and a user can be mapped to multiple projects.
The user table's details are user name, user ID, address, and other fields.
I have stored them in Solr, but now the mapping between user and project has
to be stored.
The project table has project name, location, business unit, etc.

I can still go ahead and store the user as a single record with project
details as individual fields, like
UserId:user1 
UserAddress: india
ProjectNames: project1,project2
ProjectBU: retail , finance
ProjectLocation:UK,US

Here I will search on fields like UserId, ProjectBU, and ProjectLocation,
and have made UserAddress and ProjectLocation facets.


But is there a way to store user records and project records separately,
and just store the link in Solr, as mentioned below, while still making
them searchable and facetable?

User Details
=
UserId:user1 
UserAddress: india
ProjectId:1,2

Project Details
==
ProjectId:1
ProjectNames: project1
ProjectBU: retail
ProjectLocation:UK

ProjectId:2
ProjectNames: project2
ProjectBU:finance
ProjectLocation:US
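
If the flattened single-record approach is kept, one common refinement is declaring the project fields multiValued instead of packing comma-separated values into one string; a schema.xml sketch (field names follow the example above, the type and attributes are assumptions):

```xml
<field name="ProjectNames"    type="string" indexed="true" stored="true" multiValued="true"/>
<field name="ProjectBU"       type="string" indexed="true" stored="true" multiValued="true"/>
<field name="ProjectLocation" type="string" indexed="true" stored="true" multiValued="true"/>
```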


-- 
View this message in context: 
http://www.nabble.com/storing-multiple-type-of-records-%28Parent---Child-Relationship%29-tp25902894p25902894.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: (Solr 1.4 dev) Why solr.common.* packages are in solrj-*.jar ?

2009-10-14 Thread Ryan McKinley


I wonder why the common classes are in the solrj JAR?
Is the solrj JAR not just for the clients?


the solr server uses solrj for distributed search.  This makes solrj  
the general way to talk to solr (even from within solr)






Re: Error when indexing XML files

2009-10-14 Thread Chaitali Gupta
Hi, 

Please find the schema file attached. Please let me know what I am doing wrong. 

Regards
Chaitali 

--- On Wed, 10/14/09, Fergus McMenemie fer...@twig.me.uk wrote:

From: Fergus McMenemie fer...@twig.me.uk
Subject: Re: Error when indexing XML files
To: solr-user@lucene.apache.org
Date: Wednesday, October 14, 2009, 2:25 AM

Hi, 

I am trying to index XML files using SolrJ. The original XML file contains 
nested elements. For example, the following is the snippet of the XML file. 

<entry>
  <name>SOMETHING</name>
  <facility>SOME_OTHER_THING</facility>
</entry>

I have added the elements name and facility to the schema.xml file to make 
these elements indexable. I have changed the XML document above to look like: 

<add>
<doc>
 ..
 <field name="name">SOMETHING</field> 
 ..
</doc>
</add>

Can you send us the Schema.xml file you created? I suspect that 
one of the fields should be multivalued.

-- 
Fergus.
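
If the source XML repeats an element within one entry, the matching schema field must be declared multiValued; a hedged sketch of such a declaration (the type and attribute values are assumptions):

```xml
<field name="name" type="string" indexed="true" stored="true" multiValued="true"/>
```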



<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<!--
 This is the Solr schema file. This file should be named "schema.xml" and
 should be in the conf directory under the solr home
 (i.e. ./solr/conf/schema.xml by default)
 or located where the classloader for the Solr webapp can find it.

 This example schema is the recommended starting point for users.
 It should be kept correct and concise, usable out-of-the-box.

 For more information, on how to customize this file, please see
 http://wiki.apache.org/solr/SchemaXml
-->

<schema name="example" version="1.1">
  <!-- attribute "name" is the name of this schema and is only used for display purposes.
   Applications should change this to reflect the nature of the search collection.
   version="1.1" is Solr's version number for the schema syntax and semantics.  It should
   not normally be changed by applications.
   1.0: multiValued attribute did not exist, all fields are multiValued by nature
   1.1: multiValued attribute introduced, false by default -->

  <types>
    <!-- field type definitions. The "name" attribute is
       just a label to be used by field definitions.  The "class"
       attribute and any other attributes determine the real
       behavior of the fieldType.
       Class names starting with "solr" refer to java classes in the
       org.apache.solr.analysis package.
    -->

    <!-- The StrField type is not analyzed, but indexed/stored verbatim.
       - StrField and TextField support an optional compressThreshold which
       limits compression (if enabled in the derived fields) to values which
       exceed a certain size (in characters).
    -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

    <!-- boolean type: "true" or "false" -->
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>

    <!-- The optional sortMissingLast and sortMissingFirst attributes are
     currently supported on types that are sorted internally as strings.
       - If sortMissingLast="true", then a sort on this field will cause documents
     without the field to come after documents with the field,
     regardless of the requested sort order (asc or desc).
       - If sortMissingFirst="true", then a sort on this field will cause documents
     without the field to come before documents with the field,
     regardless of the requested sort order.
       - If sortMissingLast="false" and sortMissingFirst="false" (the default),
     then default lucene sorting will be used which places docs without the
     field first in an ascending sort and last in a descending sort.
    -->


    <!-- numeric field types that store and index the text
     value verbatim (and hence don't support range queries, since the
     lexicographic ordering isn't equal to the numeric ordering) -->
    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldType name="long" class="solr.LongField" omitNorms="true"/>
    <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldType name="double" class="solr.DoubleField" omitNorms="true"/>


    <!-- Numeric field types that manipulate the value into
     a string value that isn't human-readable in its internal form,
     but with a lexicographic ordering the same as the numeric