Re: Field names with a period (.)

2011-05-05 Thread Gora Mohanty
On Thu, May 5, 2011 at 5:08 AM, Leonardo Souza leonardo...@gmail.com wrote:
 Hi guys,

 Can I have a field name with a period (.)?
 Like in *file.size*

I cannot find right now where this is documented, but from what I remember it is
recommended to use only the characters A-Z, a-z, 0-9, and underscore (_) in
field names; some special characters are known to cause problems.
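
For example, a safe alternative to *file.size* is an underscore-separated
name. A minimal schema.xml sketch (assuming a numeric fieldType such as
"long" is defined, as in the example schema):

<field name="file_size" type="long" indexed="true" stored="true"/>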

Regards,
Gora


copyField

2011-05-05 Thread deniz
Another question:
if I define different fields with different boosts and then copy them into
another field, and search using this universal field, will the boosting
be applied?



Re: Is it possible to use sub-fields or multivalued fields for boosting?

2011-05-05 Thread findbestopensource
Hello deniz,

You could create a new field, say FullName, which is a copyField target of
firstname and surname. Search on both the new field and location, but boost
up the new field in the query.
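
A minimal schema sketch of that idea (field and type names are illustrative):

<field name="FullName" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="firstname" dest="FullName"/>
<copyField source="surname" dest="FullName"/>

queried dismax-style with the combined field boosted, e.g.
q=London&defType=dismax&qf=FullName^5 location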

Regards
Aditya
www.findbestopensource.com



On Thu, May 5, 2011 at 9:21 AM, deniz denizdurmu...@gmail.com wrote:

 okay... let me make the situation more clear... I am trying to create a
 universal field which includes information about users like firstname,
 surname, gender, location, etc. When I enter something, e.g. London, I would
 like to match any user having 'London' in any of the fields firstname,
 surname or location. But if it matches firstname or surname, I would like to
 give it a higher weight.

 so my question is... is it possible to have sub-fields? like
 <field name="universal">
   <field name="firstname">blabla</field>
   <field name="surname">blabla</field>
   <field name="gender">blabla</field>
   <field name="location">blabla</field>
 </field>

 or any other ideas for implementing such a feature?






How much does Solr enterprise server differ from the non Enterprise server?

2011-05-05 Thread bryan rasmussen
I am asking specifically because I am wondering if it is worth my time
to read the Enterprise server book, or if there is too much divergence
between the two.

If I read the book, are there any parts of it specifically that
won't be relevant?

Thanks,
Bryan Rasmussen


Re: Patch problems solr 1.4 - solr-2010

2011-05-05 Thread roySolr
Hello,

thanks for the answers. I use branch 1.4 and I have successfully patched
SOLR-2010.

Now I want to use collate spellchecking. What should my URL look like? I
tried this, but it's not working (it's the same as Solr without SOLR-2010).

http://localhost:8983/solr/select?q=man unitet&spellcheck.q=man unitet&spellcheck=true&spellcheck.build=true&spellcheck.collate=true&spellcheck.collateExtendedResult=true&spellcheck.maxCollations=10&spellcheck.maxCollationTries=10

I get the collation "man united" as a suggestion. "man" is spelled correctly,
but not in this phrase: it should be "manchester united". I want Solr to
re-query the collation and only give the suggestion if it actually returns
some results. How can I fix this?



Does the Solr enable Lemmatization [not the Stemming]

2011-05-05 Thread rajini maski
Does Solr support lemmatization?

I found documentation indicating that Solr supports lemmatization. Here is
the link:
http://www.basistech.com/knowledge-center/search/2010-09-language-identification-language-support-and-entity-extraction.pdf

Can anyone help me find the jar specified in that document, so that I can
add it as a plugin?
jar: rlp.solr.RLPTokenizerFactory


Thanks and Regards,
Rajani Maski


Re: JsonUpdateRequestHandler

2011-05-05 Thread Jan Høydahl
Justine,

The JSON update request handler was added in Solr 3.1. Please download this 
version and try again.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 3. mai 2011, at 22.34, Justine Mathews wrote:

 Hi,
 
 When I add the JSON request handler for updates in solrconfig.xml as below:
 <requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler"/>
 
 I am getting the following error. Version: apache-solr-1.4.1. Could you please
 help...
 
 Error is shown below,
 
 
 Check your log files for more detailed information on what may be wrong.
 
 If you want solr to continue after configuration errors, change:
 
 <abortOnConfigurationError>false</abortOnConfigurationError>
 
 in solrconfig.xml
 
 -
 org.apache.solr.common.SolrException: Error loading class 'solr.JsonUpdateRequestHandler'
     at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
     at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
     at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449)
     at org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:152)
     at org.apache.solr.core.SolrCore.init(SolrCore.java:556)
     at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
     at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
     at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
     at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
     at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
     at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
     at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
     at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
     at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
     at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
     at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
     at org.mortbay.jetty.Server.doStart(Server.java:210)
     at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
     at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
     at java.lang.reflect.Method.invoke(Unknown Source)
     at org.mortbay.start.Main.invokeMain(Main.java:183)
     at org.mortbay.start.Main.start(Main.java:497)
     at org.mortbay.start.Main.main(Main.java:115)
 Caused by: java.lang.ClassNotFoundException: solr.JsonUpdateRequestHandler
     at java.net.URLClassLoader$1.run(Unknown Source)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.net.URLClassLoader.findClass(Unknown Source)
     at java.lang.ClassLoader.loadClass(Unknown Source)
     at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
     at java.lang.ClassLoader.loadClass(Unknown Source)
     at java.lang.Class.forName0(Native Method)
     at java.lang.Class.forName(Unknown Source)
     at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:359)
     ... 30 more
 RequestURI=/solr/
 
 
 --
 Regards,
 Justine K Mathews, MCSD.NET
 Mob: +44-(0) 7795268546
 http://www.justinemathews.com/
 http://uk.linkedin.com/in/justinemathews
 



Re: copyField

2011-05-05 Thread Ahmet Arslan
 if I define different fields with different boosts and then copy them into
 another field, and search using this universal field, will the boosting
 be applied?

No. copyField just copies raw content.
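
Any index-time boosts on the source fields are therefore not carried over;
boosting has to happen at query time instead, e.g. with dismax per-field
boosts (field names illustrative):

q=london&defType=dismax&qf=firstname^3 surname^3 universal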


Re: How much does Solr enterprise server differ from the non Enterprise server?

2011-05-05 Thread Jan Høydahl
Hi,

Solr IS an enterprise search server. And there is only one edition :)
I'd wait a few more weeks until the Solr 3.1 books are available, and then read 
up on it.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 5. mai 2011, at 09.37, bryan rasmussen wrote:

 I am asking specifically because I am wondering if it is worth my time
 to read the Enterprise server book, or if there is too much divergence
 between the two.
 
 If I read the book, are there any parts of it specifically that
 won't be relevant?
 
 Thanks,
 Bryan Rasmussen



Re: Does the Solr enable Lemmatization [not the Stemming]

2011-05-05 Thread Jan Høydahl
Hi,

Solr does not have lemmatization out of the box.

You'll have to find 3rd-party analyzers; the best known is from
BasisTech. Please contact them to learn more.

I'm not aware of any open source lemmatizers for Solr.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 5. mai 2011, at 10.34, rajini maski wrote:

 Does Solr support lemmatization?
 
 I found documentation indicating that Solr supports lemmatization. Here is
 the link:
 http://www.basistech.com/knowledge-center/search/2010-09-language-identification-language-support-and-entity-extraction.pdf
 
 Can anyone help me find the jar specified in that document, so that I can
 add it as a plugin?
 jar: rlp.solr.RLPTokenizerFactory
 
 
 Thanks and Regards,
 Rajani Maski



Re: How much does Solr enterprise server differ from the non Enterprise server?

2011-05-05 Thread bryan rasmussen
ok, I just saw the thing about syncing the version numbers.

Is there any information on these Solr 3.1 books? Publishers,
publication dates, website on them?

Best regards,
Bryan Rasmussen

On Thu, May 5, 2011 at 10:57 AM, Jan Høydahl jan@cominvent.com wrote:
 Hi,

 Solr IS an enterprise search server. And there is only one edition :)
 I'd wait a few more weeks until the Solr 3.1 books are available, and then 
 read up on it.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com





Why is org.apache.solr.response.XMLWriter final?

2011-05-05 Thread Gabriele Kahlout
Hello,

It's final in trunk, and has been ever since its conception in 2006 at
revision 372455. Why?

-- 
Regards,
K. Gabriele



Format date before indexing it

2011-05-05 Thread Marc SCHNEIDER
Hi,

I have to index records that have fields containing a date.
This date can be: 2011, 2011-05, 2015-05-01. The separators can also
be slashes.
I'd like to convert these values into a valid date for Solr.

So my question is: what is the best way to achieve this?
1) Use solr.DateField and write my own filter so that I get the date in the
right format?
2) Subclass solr.DateField?

Thanks in advance,
Marc.


Is it possible to load all indexed data in search request

2011-05-05 Thread Kannan
Hi 

I can load all indexed data using the /select request with *:* as the query
parameter. I tried the same with the /search request but it didn't work; it
didn't work with * as the query value either. I am using the dismax handler.
Is it possible to load all indexed data in search and suggest requests?



Re: Is it possible to load all indexed data in search request

2011-05-05 Thread Gora Mohanty
On Thu, May 5, 2011 at 3:48 PM, Kannan ramkannan2...@gmail.com wrote:
 Hi

 I can load all indexed data using the /select request with *:* as the query
 parameter. I tried the same with the /search request but it didn't work; it
 didn't work with * as the query value either. I am using the dismax handler.
 Is it possible to load all indexed data in search and suggest requests?

If I understand correctly, you are trying to retrieve all Solr records in one
go: Question 3.8 in the FAQ ( http://wiki.apache.org/solr/FAQ )
addresses this.
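
In short, it suggests paging through the full result set with start and rows
rather than fetching everything in a single request, along the lines of:

http://localhost:8983/solr/select?q=*:*&start=0&rows=100
http://localhost:8983/solr/select?q=*:*&start=100&rows=100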

Regards,
Gora


Re: Is it possible to load all indexed data in search request

2011-05-05 Thread Ahmet Arslan


 I am using the dismax handler. Is it possible to load
 all indexed data in search and suggest requests?

With dismax, you can use the q.alt=*:* parameter. Don't use the q parameter at all.
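
For example, assuming a /search handler configured with defType=dismax:

http://localhost:8983/solr/search?q.alt=*:*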


Re: Format date before indexing it

2011-05-05 Thread Ahmet Arslan


--- On Thu, 5/5/11, Marc SCHNEIDER marc.schneide...@gmail.com wrote:

 From: Marc SCHNEIDER marc.schneide...@gmail.com
 Subject: Format date before indexing it
 To: solr-user solr-user@lucene.apache.org
 Date: Thursday, May 5, 2011, 12:51 PM
 Hi,
 
 I have to index records that have fields containing a date.
 This date can be: 2011, 2011-05, 2015-05-01. The separators can also
 be slashes.
 I'd like to convert these values into a valid date for Solr.
 
 So my question is: what is the best way to achieve this?
 1) Use solr.DateField and write my own filter so that I get the date in the
 right format?
 2) Subclass solr.DateField?

http://wiki.apache.org/solr/UpdateRequestProcessor 
or 
http://wiki.apache.org/solr/DataImportHandler#Transformer if you are using DIH.
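
With DIH, for instance, a DateFormatTransformer can normalize date strings at
import time. A sketch (entity and column names are made up, and a single
pattern only covers one granularity; the others would need extra handling,
e.g. a RegexTransformer or a custom transformer):

<entity name="record" transformer="DateFormatTransformer" query="...">
  <field column="pub_date" dateTimeFormat="yyyy-MM-dd"/>
</entity>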


Re: Does the Solr enable Lemmatization [not the Stemming]

2011-05-05 Thread François Schiettecatte
Rajani

You might also want to look at Balie ( http://balie.sourceforge.net/ ), from 
the web site:

Features:

• language identification
• tokenization
• sentence boundary detection
• named-entity recognition


Can't vouch for it though.




On May 5, 2011, at 4:58 AM, Jan Høydahl wrote:

 Hi,
 
 Solr does not have lemmatization out of the box.
 
 You'll have to find 3rd-party analyzers; the best known is from
 BasisTech. Please contact them to learn more.
 
 I'm not aware of any open source lemmatizers for Solr.
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 
 



[ann] Lily 1.0 is out: Smart Data at Scale, made Easy!

2011-05-05 Thread Steven Noels
Hi all,

We’re really proud to announce the first official major release of Lily
- our flagship repository for scalable data and content management -
after 18 months of intense engineering work. We’re thrilled to launch
the first open source, general-purpose, highly scalable yet flexible
data repository based on NOSQL/BigData technology: read all about it
below.

What

Lily is a data and content repository made for the Age of Data: it
allows you to store and manage vast amounts of data, and in the future
will allow you to monetize user interactions by tracking and analyzing
audience data.

Lily makes Big Data easy with a high-level, developer-friendly data
model with rich types, versioning and schema management. Lily offers
simple Java and REST APIs for creating, reading and managing data. Its
flexible indexing mechanism supports interactive and batch-oriented
index maintenance.

Lily is the foundation for any large-scale data-centric application:
social media, e-commerce, large content management applications,
product catalogs, archiving, media asset management: any data-centric
application with an ambition to scale beyond a single-server setup.

Lily is dead serious about Scale. The Lily repository has been tested
to scale beyond any common content repository technology out there,
due to its inherently distributed architecture, providing economically
affordable, robust, and high-performing data management services for
any kind of enterprise application.

For whom

Lily puts BigData technology within reach of enterprise and corporate
developers, wrapping high-care leading-edge technology in a
developer-and administrator-friendly package. Lily offers the
flexibility and scalability of Apache HBase, the de-facto leading
Google BigTable implementation, and the sophistication and robustness
of Apache SOLR, the market leader of open source enterprise and
internet search. Lily sits on the shoulders of these Big Data
revolution leaders, and provides additional ease of use needed for
corporate adoption.

Thanks

Lily builds further upon the best data and search technology out
there: Apache HBase and SOLR. HBase is in use at some of the largest
data properties out there: Facebook, StumbleUpon and Yahoo! SOLR is
rapidly replacing proprietary enterprise search solutions all over the
place and is one of the most popular open source projects at the
Apache Software Foundation. We're thankful for the developer
communities working hard on these projects, and strive hard to
contribute back where possible. We're also appreciative of the
commercial service suppliers backing these projects: Lucid Imagination
and Cloudera.

Where

Everything Lily can be found at www.lilyproject.org. Enjoy!

Thanks,

The Lily team @ http://outerthought.org/

Outerthought
Scalable Smart Data, made Easy
Makers of Kauri, Daisy CMS and Lily


Programmatic restructuring of a Solr cloud

2011-05-05 Thread Sergey Sazonov

Dear Solr Experts,

First of all, I would like to thank you for your patience when answering 
questions of those who are less experienced.


And now to the main topic: I would like to learn whether it is possible 
to restructure a Solr cloud programmatically.


Let me describe the system we are designing to make the requirements 
clear. The indexed documents are certain log entries. We are planning to 
shard them by month, and only keep the last 12 months in the index. We 
are going to replicate each shard across several servers.


Now, the user is always required to search within a single month (= 
shard). Most importantly, we expect an absolute majority of the requests 
to query the current month, with only a minor load on the previous 
months. In order to utilise the cluster most efficiently, we would like 
a majority of the servers to contain replicas of the current month data, 
and have only one or two servers per older month. To this end, we are 
planning to have a set of slaves that migrate from master to master, 
depending on which master holds the data for the current month. When a 
new month starts, those slaves have to be reconfigured to hold the new 
shard and to replicate from the new master (their old master now holding 
the data for the previous month).


Since this operation has to be done every month, we are naturally 
considering automating it. So my question is whether anyone has faced a 
similar problem before, and what is the best way to solve it. We are not 
committed to any solution, or even architecture, so feel free to propose 
different solutions. The only requirement is that a majority of the 
servers should be able to serve requests to the current month at any 
given moment.


Thank you in advance for your answers.

Best regards,
Sergey Sazonov.


Re: why query chinese character with bracket become phrase query by default?

2011-05-05 Thread Michael McCandless
Unfortunately, the current out-of-the-box defaults (example config)
for Solr are a disaster for non-whitespace languages (CJK, Thai,
etc.), ie, exactly what you've hit.

This is because Lucene's QueryParser can unexpectedly, dangerously,
create PhraseQuery even when the user did not ask for it (auto
phrase).  Not only does this mean no results for non-whitespace
languages, but it also means worse search performance (PhraseQuery is
usually more costly than TermQuerys).

Lucene leaves this auto phrase behavior off by default, but Solr
defaults it to on.

Robert's email gives a good description of how you can turn it off.

The very first thing every non-whitespace language Solr app should do
is turn off autoGeneratePhraseQueries!
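
In Solr 3.1+ this is an attribute on the field type in schema.xml, e.g. (a
sketch; the type name is illustrative and the analyzer is omitted):

<fieldType name="text_cjk" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <!-- tokenizer/filters for the language go here -->
</fieldType>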

Mike

http://blog.mikemccandless.com

On Wed, May 4, 2011 at 8:21 PM, cyang2010 ysxsu...@hotmail.com wrote:
 Hi,

 In the Solr admin "full interface" query page, the following query in English
 becomes term queries according to the debug output:

 title_en_US: (blood red)

 <lst name="debug">
 <str name="rawquerystring">title_en_US: (blood red)</str>
 <str name="querystring">title_en_US: (blood red)</str>
 <str name="parsedquery">title_en_US:blood title_en_US:red</str>
 <str name="parsedquery_toString">title_en_US:blood title_en_US:red</str>


 However, using the same syntax with two Chinese terms, the query results in
 a phrase query:

 title_zh_CN: (我活)

 <lst name="debug">
 <str name="rawquerystring">title_zh_CN: (我活)</str>
 <str name="querystring">title_zh_CN: (我活)</str>
 <str name="parsedquery">PhraseQuery(title_zh_CN:"我 活")</str>
 <str name="parsedquery_toString">title_zh_CN:"我 活"</str>


 I do have different tokenizers/filters for those two fields:
 title_en_US uses the common English-specific tokenizers, while
 title_zh_CN uses solr.ChineseTokenizerFactory.

 I don't think those tokenizers determine whether terms within brackets become
 term queries or phrase queries.

 I really need to pass user-input text to a Solr field blindly, without doing
 any parsing, and have it issue a term query for each term contained in
 the search text.

 How do I achieve that?

 Thanks,


 cy




How do I debug Unable to evaluate expression using this context printed at start?

2011-05-05 Thread Gabriele Kahlout
I've tried to re-install Solr on Tomcat, and now when I launch Tomcat in
debug mode I see the following exception relating to Solr. It's not enough
to understand the problem (and fix it), and I don't know where to look for
more (or what to do). Please help me.

Following the tutorial and discussion here, this is my context descriptor
(solr.xml):

<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/Users/simpatico/SOLR_HOME/dist/solr.war" debug="0"
         crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/Users/simpatico/SOLR_HOME" override="true"/>
</Context>

(the war exists)
$ ls $SOLR_HOME/dist/solr.war
/Users/simpatico/SOLR_HOME//dist/solr.war

$ ls $SOLR_HOME/conf/solrconfig.xml
/Users/simpatico/SOLR_HOME//conf/solrconfig.xml

When Tomcat starts:

INFO: Using JNDI solr.home: /Users/simpatico/SOLR_HOME
May 5, 2011 2:46:50 PM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '/Users/simpatico/SOLR_HOME/'
...
INFO: Adding 'file:/Users/simpatico/SOLR_HOME/lib/wstx-asl-3.2.7.jar' to
classloader
May 5, 2011 2:46:50 PM org.apache.solr.common.SolrException log
SEVERE:
*javax.xml.transform.TransformerException: Unable to evaluate expression using this context*
    at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:363)
    at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:213)
    at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
    at org.apache.solr.core.CoreContainer.readProperties(CoreContainer.java:303)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:242)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372)
    at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:98)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4382)
    at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5040)
    at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5035)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)
Caused by: java.lang.RuntimeException: Unable to evaluate expression using this context
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:212)
    at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:210)
    at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:335)
    ... 18 more
-
java.lang.RuntimeException: Unable to evaluate expression using this context
    at com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:212)
    at com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:210)
    at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:335)
    at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:213)
    at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
    at org.apache.solr.core.CoreContainer.readProperties(CoreContainer.java:303)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:242)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372)
    at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:98)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4382)
    at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5040)
    at org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5035)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)

Re: Programmatic restructuring of a Solr cloud

2011-05-05 Thread Jan Høydahl
Hi,

One approach, if you're using Amazon, is to use Elastic Beanstalk:

* Create one master with 12 cores, named jan, feb, mar etc
* Every month, you clear the current month index and switch indexing to it
  You will only have one master, because you're only indexing to one month at a 
time
* For each of the 12 months, setup an Amazon BeanStalk instance with a Solr 
replica pointing to its master
  This way, Amazon will spin off replicas as needed
  NOTE: Your replica could still be located at /solr/select even if it 
replicates from /solr/may/replication
* You only query the replicas, and the client will control whether to query one 
or more shards
  
shards=jan.elasticbeanstalk.com/solr,feb.elasticbeanstalk.com/solr,mar.elasticbeanstalk.com/solr

After this is setup, you have 0 config to worry about :)
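
The slave side of such a replica is just standard Solr replication
configuration, along these lines (the master URL is illustrative):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master.example.com/solr/may/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>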

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 5. mai 2011, at 14.03, Sergey Sazonov wrote:

 Dear Solr Experts,
 
 First of all, I would like to thank you for your patience when answering 
 questions of those who are less experienced.
 
 And now to the main topic: I would like to learn whether it is possible to 
 restructure a Solr cloud programmatically.
 



Controlling webapp startup

2011-05-05 Thread Benson Margulies
There are two ways to characterize what I'd like to do.

1) use the EmbeddedSolrServer to launch Solr, and subsequently enable
the HTTP GET/json servlet. I can provide the 'servlet' wiring, I just
need to be able to hand an HttpServletRequest to something and
retrieve in return the same json that would come back from the usual
Solr servlet.

2) Use the usual Solr servlet apparatus, but defer its startup until
other code in the webapp makes up its mind about configuration and
calls System.setProperty to locate the solr home and data directories.


fast case-insensitive autocomplete

2011-05-05 Thread Kusenda, Brandyn J
Hi.
I need an autocomplete solution that handles case-insensitive queries but
returns the original text with the case still intact. I've experimented
with both the Suggester and TermsComponent approaches. TermsComponent works
when I use the regex option; however, it is far too slow. I get the speed I
want by using terms.prefix or by using the suggester, but both are case
sensitive.

Here is an example operating on a user directory:

Query: bran
Results: Branden Smith, Brandon Thompson, Brandon Verner, Brandy Finny, Brian 
Smith, ...

A solution that I would expect to work would be to store two fields; one
containing the original text and the other containing the lowercase.  Then
convert the query to lower case and run the query against the lower case
field and return the original (case preserved) field.
Unfortunately, I can't get a TermsComponent query to return additional
fields; it only returns the field it's searching against. Should this work,
or can I only return additional fields for standard queries?
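
A schema sketch of that two-field approach (type and field names are
illustrative):

<fieldType name="prefix_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name" type="string" indexed="true" stored="true"/>
<field name="name_lc" type="prefix_lc" indexed="true" stored="false"/>
<copyField source="name" dest="name_lc"/>

A normal (non-TermsComponent) prefix query against the lowercase field, e.g.
q=name_lc:bran*, can then return the stored, case-preserved name field.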

Thanks in advance,
Brandyn


RE: Is it possible to build Solr as a maven project?

2011-05-05 Thread Steven A Rowe
Hi Gabriele,

The sequence should be

1. svn update
2. ant get-maven-poms
3. mvn -N -Pbootstrap install

I think you left out #2 - there was a very recent change to the POMs that 
affects the noggit jar name.

Steve

 -Original Message-
 From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com]
 Sent: Thursday, May 05, 2011 1:22 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Is it possible to build Solr as a maven project?
 
 Thank you so much for this gem, David!
 
 I still don't manage to build though:
 $ svn update
 At revision 1099684.
 
 $ mvn clean
 
 $ mvn -N -Pbootstrap install
 
 [INFO]
 
 [INFO] BUILD FAILURE
 [INFO]
 
 [INFO] Total time: 8.234s
 [INFO] Finished at: Thu May 05 07:21:34 CEST 2011
 [INFO] Final Memory: 12M/81M
 [INFO]
 
 [ERROR] Failed to execute goal
 org.apache.maven.plugins:maven-install-plugin:2.3.1:install-file
 (install-solr-noggit) on project lucene-solr-grandparent: Error
 installing
 artifact 'org.apache.solr:solr-noggit:jar': Failed to install artifact
 org.apache.solr:solr-noggit:jar:4.0-SNAPSHOT:
 /Users/simpatico/debug/solr4/solr/lib/apache-solr-noggit-r944541.jar (No
 such file or directory) - [Help 1]
 
 
 On Thu, May 5, 2011 at 12:02 AM, Smiley, David W. dsmi...@mitre.org
 wrote:
 
  Hi folks. What you're supposed to do is run:
 
  mvn -N -Pbootstrap install
 
  as the very first one-time only step.  It copies several custom jar
 files
  into your local repository. From then on you can build like normally
 with
  maven.
 
  ~ David Smiley
  Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
 
 
  On May 4, 2011, at 2:36 PM, Gabriele Kahlout wrote:
 
   but it doesn't build.
  
   Now, I've checked out solr4 from the trunk and tried to build the
 maven
   project there, but it fails downloading berkleydb:
  
    BUILD FAILURE
    ------------------------------------------------------------------------
    Total time: 1:07.367s
    Finished at: Wed May 04 20:33:29 CEST 2011
    Final Memory: 24M/81M
    ------------------------------------------------------------------------
    Failed to execute goal on project lucene-bdb: Could not resolve
    dependencies for project org.apache.lucene:lucene-bdb:jar:4.0-SNAPSHOT:
    Failure to find com.sleepycat:berkeleydb:jar:4.7.25 in
    http://download.carrot2.org/maven2/ was cached in the local repository,
    resolution will not be reattempted until the update interval of
    carrot2.org has elapsed or updates are forced - [Help 1]
  
  
    I tried to get the jar on my own, but I didn't find a 4.7.25 version;
    the latest on the Oracle website (Java Edition) is 4.1. Where can I
    download this Maven dependency from?
  
   On Wed, May 4, 2011 at 1:26 PM, Gabriele Kahlout
   gabri...@mysimpatico.comwrote:
  
   It worked after checking out the dev-tools folder. Thank you!
  
  
   On Wed, May 4, 2011 at 1:20 PM, lboutros boutr...@gmail.com wrote:
  
    <property name="version" value="3.1-SNAPSHOT"/>
    <target name="get-maven-poms"
            description="Copy Maven POMs from dev-tools/maven/ to their target locations">
      <copy todir="." overwrite="true">
        <fileset dir="${basedir}/dev-tools/maven"/>
        <filterset begintoken="@" endtoken="@">
          <filter token="version" value="${version}"/>
        </filterset>
        <globmapper from="*.template" to="*"/>
      </copy>
    </target>
  
  
  
  
   --
   Regards,
   K. Gabriele
  

Text Only Extraction Using Solr and Tika

2011-05-05 Thread Emyr James

Hi All,

I have Solr and Tika installed and am happily extracting and indexing
various files.
Unfortunately, on some Word documents it blows up, since Tika tries to
auto-generate a 'title' field but my title field in the schema is
single-valued.


Here is my config for the extract handler...

<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>

Is there a config option to make it only extract text, or ideally to
allow me to specify which metadata fields to accept?

E.g. I'd like to use any author metadata it finds, but not any
title metadata, as I want title to be single-valued and set
explicitly using literal.title in the POST request.

I did look around for docs, but all I can find are very basic
examples; there's no comprehensive configuration documentation out there
as far as I can tell.



ALSO...

I get some other bad responses coming back such as...

[Apache Tomcat/6.0.28 HTML error page, condensed to the essentials]

HTTP Status 500 - org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;

java.lang.NoSuchMethodError: org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:168)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:148)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:636)


For the above, my URL was:

http://localhost:8080/solr/update/extract?literal.id=3922&defaultField=content&fmap.content=content&uprefix=ignored_&stream.contentType=application%2Fvnd.ms-powerpoint&commit=true&literal.title=Reactor+cycle+141&literal.notes=&literal.tag=UCN_production&literal.author=Maurits+van+der+Grinten

I guess there's something special I need to be able to process PowerPoint
files? Maybe I need to get the latest Apache POI? Any suggestions welcome...



Regards,

Emyr


Re: Text Only Extraction Using Solr and Tika

2011-05-05 Thread Jay Luker
Hi Emyr,

You could try using the extractOnly=true parameter [1]. Of course,
you'll need to repost the extracted text manually.

--jay

[1] http://wiki.apache.org/solr/ExtractingRequestHandler#Extract_Only
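
For example (the file name is illustrative):

curl 'http://localhost:8080/solr/update/extract?extractOnly=true' -F 'myfile=@doc.ppt'

This returns the extracted content and metadata without indexing anything.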


On Thu, May 5, 2011 at 9:36 AM, Emyr James emyr.ja...@sussex.ac.uk wrote:
 Hi All,

 I have Solr and Tika installed and am happily extracting and indexing
 various files. Unfortunately, on some Word documents it blows up, since
 Tika tries to auto-generate a 'title' field but my title field in the
 schema is single-valued.

 Here is my config for the extract handler...

 <requestHandler name="/update/extract"
     class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
   <lst name="defaults">
     <str name="uprefix">ignored_</str>
   </lst>
 </requestHandler>

 Is there a config option to make it only extract text, or ideally to allow
 me to specify which metadata fields to accept?

 E.g. I'd like to use any author metadata it finds, but not any title
 metadata, as I want title to be single-valued and set explicitly
 using literal.title in the POST request.

 I did look around for docs, but all I can find are very basic examples;
 there's no comprehensive configuration documentation out there as far as I
 can tell.





Re: Is it possible to build Solr as a maven project?

2011-05-05 Thread Gabriele Kahlout
Okay, that sequence worked, but then shouldn't I be able to do $ mvn install
afterwards? This is what I get:

...
Compiling 478 source files to /Users/simpatico/debug/solr4/solr/build/solr
-
COMPILATION ERROR :
-
org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[7,27]
package com.google.common.io does not exist
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[28,32] package
com.google.common.collect does not exist
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[29,27] package
com.google.common.io does not exist
org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[29,4] cannot
find symbol
symbol  : variable ByteStreams
location: class org.apache.solr.spelling.suggest.fst.InputStreamDataInput
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[128,57] cannot find
symbol
symbol  : variable Lists
location: class org.apache.solr.spelling.suggest.fst.FSTLookup
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[170,26] cannot find
symbol
symbol  : variable Lists
location: class org.apache.solr.spelling.suggest.fst.FSTLookup
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[203,35] cannot find
symbol
symbol  : variable Lists
location: class org.apache.solr.spelling.suggest.fst.FSTLookup
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[529,6] cannot find
symbol
symbol  : variable Closeables
location: class org.apache.solr.spelling.suggest.fst.FSTLookup
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[551,6] cannot find
symbol
symbol  : variable Closeables
location: class org.apache.solr.spelling.suggest.fst.FSTLookup
9 errors
-

Reactor Summary:

Grandparent POM for Apache Lucene Java and Apache Solr  SUCCESS [13.255s]
Lucene parent POM . SUCCESS [0.199s]
Lucene Core ... SUCCESS [15.528s]
Lucene Test Framework . SUCCESS [4.657s]
Lucene Common Analyzers ... SUCCESS [16.770s]
Lucene Contrib Ant  SUCCESS [1.103s]
Lucene Contrib bdb  SUCCESS [0.883s]
Lucene Contrib bdb-je . SUCCESS [0.872s]
Lucene Database aggregator POM  SUCCESS [0.091s]
Lucene Demo ... SUCCESS [0.842s]
Lucene Memory . SUCCESS [0.726s]
Lucene Queries  SUCCESS [1.559s]
Lucene Highlighter  SUCCESS [3.007s]
Lucene InstantiatedIndex .. SUCCESS [1.224s]
Lucene Lucli .. SUCCESS [1.579s]
Lucene Miscellaneous .. SUCCESS [1.163s]
Lucene Query Parser ... SUCCESS [4.274s]
Lucene Spatial  SUCCESS [1.159s]
Lucene Spellchecker ... SUCCESS [0.841s]
Lucene Swing .. SUCCESS [1.177s]
Lucene Wordnet  SUCCESS [0.816s]
Lucene XML Query Parser ... SUCCESS [1.197s]
Lucene Contrib aggregator POM . SUCCESS [0.079s]
Lucene ICU Analysis Components  SUCCESS [1.494s]
Lucene Phonetic Filters ... SUCCESS [0.759s]
Lucene Smart Chinese Analyzer . SUCCESS [3.534s]
Lucene Stempel Analyzer ... SUCCESS [1.537s]
Lucene Analysis Modules aggregator POM  SUCCESS [0.081s]
Lucene Benchmark .. SUCCESS [3.693s]
Lucene Modules aggregator POM . SUCCESS [0.147s]
Apache Solr parent POM  SUCCESS [0.099s]
Apache Solr Solrj . SUCCESS [3.670s]
Apache Solr Core .. FAILURE [7.842s]

On Thu, May 5, 2011 at 3:36 PM, Steven A Rowe sar...@syr.edu wrote:

 Hi Gabriele,

 The sequence should be

 1. svn update
 2. ant get-maven-poms
 3. mvn -N -Pbootstrap install

 I think you left out #2 - there was a very recent change to the POMs that
 affects the noggit jar name.

 Steve


Re: Text Only Extraction Using Solr and Tika

2011-05-05 Thread Emyr James
Thanks for the suggestion, but surely there must be a better way than
that to do it?
I don't want to post the whole file up, have it extracted on the server,
send the extracted text back to the client, and then send it all back up to
the server again as plain text.


On 05/05/11 14:55, Jay Luker wrote:

Hi Emyr,

You could try using the extractOnly=true parameter [1]. Of course,
you'll need to repost the extracted text manually.

--jay

[1] http://wiki.apache.org/solr/ExtractingRequestHandler#Extract_Only



Re: why query chinese character with bracket become phrase query by default?

2011-05-05 Thread Yonik Seeley
2011/5/5 Michael McCandless luc...@mikemccandless.com:
 The very first thing every non-whitespace language Solr app should do
 is turn off autoGeneratePhraseQueries!

Luckily, this is configurable per FieldType... so if it doesn't exist
yet, we should come up with a good
CJK fieldtype to add to the example schema.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco


Re: Text Only Extraction Using Solr and Tika

2011-05-05 Thread Anuj Kumar
Hi Emyr,

You can try the XPath-based approach and see if that works. Also, see if
dynamic fields can help you with the metadata fields.

References-
http://wiki.apache.org/solr/SchemaXml#Dynamic_fields
http://wiki.apache.org/solr/ExtractingRequestHandler#Input_Parameters
http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput
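
For instance, unwanted Tika metadata such as title could be redirected into
an ignored dynamic field via a field mapping (a sketch to verify, not taken
from the references above; it assumes the example schema's "ignored" field
type):

<dynamicField name="ignored_*" type="ignored" multiValued="true"/>

and on the extract request: fmap.title=ignored_title, while still setting
literal.title explicitly.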

Regards,
Anuj

On Thu, May 5, 2011 at 7:28 PM, Emyr James emyr.ja...@sussex.ac.uk wrote:

 Thanks for the suggestion, but surely there must be a better way than that
 to do it?
 I don't want to post the whole file up, have it extracted on the server,
 send the extracted text back to the client, and then send it all back up to
 the server again as plain text.


 On 05/05/11 14:55, Jay Luker wrote:

 Hi Emyr,

 You could try using the extractOnly=true parameter [1]. Of course,
 you'll need to repost the extracted text manually.

 --jay

 [1] http://wiki.apache.org/solr/ExtractingRequestHandler#Extract_Only


 On Thu, May 5, 2011 at 9:36 AM, Emyr Jamesemyr.ja...@sussex.ac.uk
  wrote:

 Hi All,

 I have solr and tika installed and am happily extracting and indexing
 various files.
 Unfortunately on some word documents it blows up since it tries to
 auto-generate a 'title' field but my title field in the schema is single
 valued.

 Here is my config for the extract handler...

 <requestHandler name="/update/extract"
 class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
 <lst name="defaults">
 <str name="uprefix">ignored_</str>
 </lst>
 </requestHandler>

 Is there a config option to make it only extract text, or ideally to
 allow
 me to specify which metadata fields to accept ?

 E.g. I'd like to use any author metadata it finds but to not use any
 title
 metadata it finds as I want title to be single valued and set explicitly
 using a literal.title in the post request.

 I did look around for some docs but all i can find are very basic
 examples.
 there's no comprehensive configuration documentation out there as far as
 I
 can tell.


 ALSO...

 I get some other bad responses coming back such as...

 <html><head><title>Apache Tomcat/6.0.28 - Error report</title></head>
 <body><h1>HTTP Status 500 -
 org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;</h1>

 java.lang.NoSuchMethodError:

 org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;
at

 org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:168)
at
 org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at
 org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at
 org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
at
 org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:148)
at

 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
at

 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at

 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at

 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at

 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at

 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at

 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at

 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at

 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at

 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at

 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at

 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at

 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at

 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at

 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at
 

Re: Field names with a period (.)

2011-05-05 Thread Leonardo Souza
Thanks Gora!

[ ]'s
Leonardo da S. Souza
 °v°   Linux user #375225
 /(_)\   http://counter.li.org/
 ^ ^



On Thu, May 5, 2011 at 3:09 AM, Gora Mohanty g...@mimirtech.com wrote:

 On Thu, May 5, 2011 at 5:08 AM, Leonardo Souza leonardo...@gmail.com
 wrote:
  Hi guys,
 
  Can i have a field name with a period(.) ?
  Like in *file.size*

 Cannot find now where this is documented, but from what I remember it is
 recommended to use only characters A-Z, a-z, 0-9, and underscore (_) in
 field names, and some special characters are known to cause problems.

 Regards,
 Gora



RE: Patch problems solr 1.4 - solr-2010

2011-05-05 Thread Dyer, James
There is still a functionality gap in Solr's spellchecker even with Solr-2010 
applied.  If a user enters a word that is in the dictionary, solr will never 
try to correct it.  The only way around this is to use 
spellcheck.onlyMorePopular.  The problem with this approach is 
onlyMorePopular causes the spellchecker to assume *every* word in the query 
is a misspelling and it won't even consider the original terms in building 
collations.  What is needed is a hybrid option that will try to build 
collations using combinations of original terms, corrected terms and more 
popular terms.  To my knowledge, there is no way to get the spellchecker to do 
that currently.

On the other hand, if you're pretty sure man is not in the dictionary, try 
upping spellcheck.count to something higher than the default (20 maybe?)...
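
For example (untested, just extending the URL you posted):

http://localhost:8983/solr/select?q=man unitet&spellcheck.q=man unitet
&spellcheck=true&spellcheck.count=20&spellcheck.collate=true
&spellcheck.maxCollations=10&spellcheck.maxCollationTries=10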

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: roySolr [mailto:royrutten1...@gmail.com] 
Sent: Thursday, May 05, 2011 3:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Patch problems solr 1.4 - solr-2010

Hello,

thanks for the answers, I use branch 1.4 and I have successfully patched
solr-2010.

Now I want to use the collate spellchecking. What should my url look like?
I tried this but
it's not working (it's the same as solr without solr-2010).

http://localhost:8983/solr/select?q=man unitet&spellcheck.q=man
unitet&spellcheck=true&spellcheck.build=true&spellcheck.collate=true&spellcheck.collateExtendedResult=true&spellcheck.maxCollations=10&spellcheck.maxCollationTries=10

I get the collation man united as a suggestion. Man is spelled correctly,
but not in this phrase. It should
be manchester united, and I want Solr to re-query the collation and only
return the suggestion
if it yields results. How can I fix this?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Patch-problems-solr-1-4-solr-2010-tp2898443p2902546.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Highlighting words with non-ascii chars

2011-05-05 Thread Pavel Kukačka
Thanks for the suggestion, Peter;

the problem was elsewhere though - somewhere in the highlighting
module.
I've fixed it by adding (into the field definition in schema.xml) a
custom Czech charFilter (mappings like "í" => "i"); then it started to
work as expected.
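
In case it helps someone, the relevant schema bits looked roughly like
this (a sketch from memory; the mapping file name is arbitrary):

<fieldType name="text_cz" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-czech.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

with mapping-czech.txt containing lines such as:

"í" => "i"
"č" => "c"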

Cheers,
Pavel


Peter Wolanin wrote on Mon 02. 05. 2011 at 17:38 +0200:
 Does your servlet container have the URI encoding set correctly, e.g.
 URIEncoding=UTF-8 for tomcat6?
 
 http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
 
 Older versions of Jetty use ISO-8859-1 as the default URI encoding,
 but jetty 6 should use UTF-8 as default:
 
 http://docs.codehaus.org/display/JETTY/International+Characters+and+Character+Encodings
 
 -Peter
 
 On Sat, Apr 30, 2011 at 6:31 AM, Pavel Kukačka pavel.kuka...@seznam.cz 
 wrote:
  Hello,
 
 I've hit a (probably trivial) roadblock I don't know how to overcome 
  with Solr 3.1:
  I have a document with common fields (title, keywords, content) and I'm
  trying to use highlighting.
 With queries using ASCII characters there is no problem; it works 
  smoothly. However,
  when I search using a czech word including non-ascii chars (like slovíčko 
  for example - 
  http://localhost:8983/solr/select/?q=slov%C3%AD%C4%8Dko&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*),
   the document is found, but
  the response doesn't contain the highlighted snippet in the highlighting 
  node - there is only an
  empty node - like this:
  **
  .
  .
  .
  <lst name="highlighting">
   <lst name="2009"/>
  </lst>
  
 
 
  When searching for the other keyword ( 
  http://localhost:8983/solr/select/?q=slovo&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*),
   the resulting response is fine - like this:
  
  <lst name="highlighting">
   <lst name="2009">
    <arr name="user_keywords">
     <str>slovíčko <em id="highlighting">slovo</em></str>
    </arr>
   </lst>
  </lst>
 
  
 
  Did anyone come accross this problem?
  Cheers,
  Pavel
 
 
 
 
 
 




Re: Text Only Extraction Using Solr and Tika

2011-05-05 Thread Emyr James

Hi,
I'm not really sure how these can help with my problem. Can you give a 
bit more info on this ?


I think what i'm after is a fairly common request..

http://lucene.472066.n3.nabble.com/Controlling-Tika-s-metadata-td2378677.html
http://lucene.472066.n3.nabble.com/Select-tika-output-for-extract-only-td499059.html#a499062

Did the change that Yonik Seeley mentions to allow more control over the 
output ever make it into 1.4?


Regards,
Emyr

On 05/05/11 15:01, Anuj Kumar wrote:

Hi Emyr,

You can try the XPath based approach and see if that works. Also, see if
dynamic fields can help you for the metadata fields.

References-
http://wiki.apache.org/solr/SchemaXml#Dynamic_fields
http://wiki.apache.org/solr/ExtractingRequestHandler#Input_Parameters
http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput

Regards,
Anuj

On Thu, May 5, 2011 at 7:28 PM, Emyr Jamesemyr.ja...@sussex.ac.uk  wrote:


Thanks for the suggestion but there surely must be a better way than that
to do it ?
I don't want to post the whole file up, get it extracted on the server,
send the extracted text back to the client then send it all back up to the
server again as plain text.


On 05/05/11 14:55, Jay Luker wrote:


Hi Emyr,

You could try using the extractOnly=true parameter [1]. Of course,
you'll need to repost the extracted text manually.

--jay

[1] http://wiki.apache.org/solr/ExtractingRequestHandler#Extract_Only


On Thu, May 5, 2011 at 9:36 AM, Emyr Jamesemyr.ja...@sussex.ac.uk
  wrote:


Hi All,

I have solr and tika installed and am happily extracting and indexing
various files.
Unfortunately on some word documents it blows up since it tries to
auto-generate a 'title' field but my title field in the schema is single
valued.

Here is my config for the extract handler...

<requestHandler name="/update/extract"
class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
<lst name="defaults">
<str name="uprefix">ignored_</str>
</lst>
</requestHandler>

Is there a config option to make it only extract text, or ideally to
allow
me to specify which metadata fields to accept ?

E.g. I'd like to use any author metadata it finds but to not use any
title
metadata it finds as I want title to be single valued and set explicitly
using a literal.title in the post request.

I did look around for some docs but all i can find are very basic
examples.
there's no comprehensive configuration documentation out there as far as
I
can tell.



Re: Text Only Extraction Using Solr and Tika

2011-05-05 Thread Ramirez, Paul M (388J)
Hey Emyr,

Looking at your stack trace below my guess is that you have two conflicting 
Apache POI jars in your classpath. The odd stack trace is indicative of that as 
the class loader is likely loading some other version of  the DirectoryNode 
class that doesn't have the iterator method. 

 java.lang.NoSuchMethodError: 
 org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;

Thanks,
Paul Ramirez


On May 5, 2011, at 6:36 AM, Emyr James wrote:

 Hi All,
 
 I have solr and tika installed and am happily extracting and indexing 
 various files.
 Unfortunately on some word documents it blows up since it tries to 
 auto-generate a 'title' field but my title field in the schema is single 
 valued.
 
 Here is my config for the extract handler...
 
 <requestHandler name="/update/extract"
 class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
 <lst name="defaults">
 <str name="uprefix">ignored_</str>
 </lst>
 </requestHandler>
 
 Is there a config option to make it only extract text, or ideally to 
 allow me to specify which metadata fields to accept ?
 
 E.g. I'd like to use any author metadata it finds but to not use any 
 title metadata it finds as I want title to be single valued and set 
 explicitly using a literal.title in the post request.
 
 I did look around for some docs but all i can find are very basic 
 examples. there's no comprehensive configuration documentation out there 
 as far as I can tell.
 
 
 

Re: UIMA analysisEngine path

2011-05-05 Thread Barry Hathaway

Tommaso,

Thanks. Now Solr finds the descriptor; however, I think this is very bad 
practice.
Descriptors really aren't meant to be jarred up. They often contain 
relative paths.

For example, in my case I have a directory that looks like:
appassemble
|- desc
|- pear

where the AnalysisEngine descriptor contained in desc is an aggregate 
analysis engine and
refers to other analysis engines packaged as installed PEAR files in the 
pear subdirectory.
As such, the descriptor contains relative paths pointing into the pear 
subdirectory.
Grabbing the descriptor from the jar breaks that, since
OverridingParamsAEProvider

uses the XMLInputSource method signature that has no relative-path argument.

Barry

On 5/4/2011 6:16 AM, Tommaso Teofili wrote:

Hello Barry,
the main AnalysisEngine descriptor defined inside the <analysisEngine>
element should be inside one of the jars imported with the <lib> elements.
At the moment it cannot be taken from expanded directories but it should be
easy to do it (and indeed useful) modifying the
OverridingParamsAEProvider class
[1] at line 57.
Hope this helps,
Tommaso

[1] :
http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_3_1/solr/contrib/uima/src/main/java/org/apache/solr/uima/processor/ae/OverridingParamsAEProvider.java?view=markup

2011/5/3 Barry Hathawaybhath...@nycap.rr.com


I'm new to Solr and trying to get it to call a UIMA aggregate analysis engine
and not having much luck.
The null pointer exception indicates that it can't find the xml file
associated with the engine.
I have tried a number of combinations of a path in the <analysisEngine>
element, but nothing
seems to work. In addition, I've put the directory containing the
descriptor in both the classpath
when starting the server and in a <lib> element in solrconfig.xml. So:

What classpath does the <analysisEngine> tag effectively search for to
locate the descriptor?

Do the <lib> entries in solrconfig.xml affect this classpath?

Do the engine descriptors have to be in a jar or can they be in an expanded
directory?

Thanks in advance.

Barry








Re: How do I debug Unable to evaluate expression using this context printed at start?

2011-05-05 Thread Gabriele Kahlout
While the question remains valid, I found the reason for my problem.
While backing up, I had saved Tomcat's context descriptor file in my
$SOLR_HOME, and Solr was trying to read it as described in the SolrCore wiki:
http://wiki.apache.org/solr/CoreAdmin

What saved me was remembering Chris's earlier remark:
http://markmail.org/thread/3y4zqieyjqfi5vl3 . Thank you Chris!


On Thu, May 5, 2011 at 2:58 PM, Gabriele Kahlout
gabri...@mysimpatico.comwrote:

 I've tried to re-install solr on tomcat, and now when I launch tomcat in
 debug mode I see the following exception relating to solr. It's not enough
 to understand the problem (and fix it), but I don't know where to look for
 more (or what to do). Please help me.

 Following the tutorial and discussion here, this is my context descriptor
 (solr.xml):

 <?xml version="1.0" encoding="utf-8"?>
 <Context docBase="/Users/simpatico/SOLR_HOME/dist/solr.war" debug="0"
 crossContext="true">
   <Environment name="solr/home" type="java.lang.String"
 value="/Users/simpatico/SOLR_HOME" override="true"/>
 </Context>

 (the war exists)
 $ ls $SOLR_HOME/dist/solr.war
 /Users/simpatico/SOLR_HOME//dist/solr.war

 $ ls $SOLR_HOME/conf/solrconfig.xml
 /Users/simpatico/SOLR_HOME//conf/solrconfig.xml

 When Tomcat starts:
 
 INFO: Using JNDI solr.home: /Users/simpatico/SOLR_HOME
 May 5, 2011 2:46:50 PM org.apache.solr.core.SolrResourceLoader init
 INFO: Solr home set to '/Users/simpatico/SOLR_HOME/'
 ...
 INFO: Adding 'file:/Users/simpatico/SOLR_HOME/lib/wstx-asl-3.2.7.jar' to
 classloader
 May 5, 2011 2:46:50 PM org.apache.solr.common.SolrException log
 SEVERE:
 *javax.xml.transform.TransformerException: Unable to evaluate expression
 using this context*
 at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:363)
 at
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:213)
 at
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
 at
 org.apache.solr.core.CoreContainer.readProperties(CoreContainer.java:303)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:242)
 at
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
 at
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
 at
 org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273)
 at
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254)
 at
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372)
 at
 org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:98)
 at
 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4382)
 at
 org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5040)
 at
 org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5035)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:680)
 Caused by: java.lang.RuntimeException: Unable to evaluate expression using
 this context
 at
 com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:212)
 at
 com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:210)
 at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:335)
 ... 18 more
 -
 java.lang.RuntimeException: Unable to evaluate expression using this
 context
 at
 com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:212)
 at
 com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:210)
 at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:335)
 at
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:213)
 at
 com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
 at
 org.apache.solr.core.CoreContainer.readProperties(CoreContainer.java:303)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:242)
 at
 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
 at
 org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
 at
 org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273)
 at
 org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254)
 at
 org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372)
 at
 org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:98)
 at
 org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4382)
 at
 

SpellCheckComponent issue

2011-05-05 Thread Siddharth Powar
Hi,

(Sorry, emailing again because the last post was not posted...)

I have been using the Solr SpellCheckComponent. One of my requirements is
that if a user types something like add, Solr would return adidas. To
get something like this, I used EdgeNGramFilterFactory and applied it to
the fields that I am indexing. So for adidas I will have something like a,
ad, adi, adid... Correct me if I'm wrong: shouldn't the distance
algorithm used internally match adidas with this approach?


Thanks,
Sid


Re: fast case-insensitive autocomplete

2011-05-05 Thread Jan Høydahl
Hi,

Try this solution using a Solr core: 
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
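
If I remember right, the core of that approach is an edge-ngram field
that is lowercased at both index and query time, roughly like this
(a sketch; gram sizes are arbitrary):

<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Matching is then case-insensitive, while the stored value keeps its
original case for display.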

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 5 May 2011, at 15.22, Kusenda, Brandyn J wrote:

 Hi.
 I need an autocomplete solution to handle case-insensitive queries but
 return the original text with the case still intact.   I've experimented
 with both the Suggester and TermComponent methods.  TermComponent is working
 when I use the regex option; however, it is far too slow.   I get the speed I
 want by using terms.prefix or by using the suggester, but both are case
 sensitive.
 
 Here is an example operating on a user directory:
 
 Query: bran
 Results: Branden Smith, Brandon Thompson, Brandon Verner, Brandy Finny, Brian 
 Smith, ...
 
 A solution that I would expect to work would be to store two fields; one
 containing the original text and the other containing the lowercase.  Then
 convert the query to lower case and run the query against the lower case
 field and return the original (case preserved) field.
 Unfortunately, I can't get a TermsComponent query to return additional
 fields.  It only returns the field it's searching against.  Should this work,
 or can I only return additional fields for standard queries?
 
 Thanks in advance,
 Brandyn



RE: Is it possible to build Solr as a maven project?

2011-05-05 Thread Steven A Rowe
Hi Gabriele,

On 5/5/2011 at 9:57 AM, Gabriele Kahlout wrote:
 Okay, that sequence worked, but then shouldn't I be able to do $ mvn
 install afterwards? This is what I get:
...
 COMPILATION ERROR :
 -
 org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[7,27]
 package com.google.common.io does not exist
 org/apache/solr/spelling/suggest/fst/FSTLookup.java:[28,32] package
 com.google.common.collect does not exist
...

mvn install should work, but it doesn't - I can reproduce this error on my 
machine.  This is a bug in the Maven build.  

The nightly Lucene/Solr Maven build on Jenkins should have caught this 
compilation failure three weeks ago, when Dawid Weiss committed his work under 
https://issues.apache.org/jira/browse/SOLR-2378.  Unfortunately, the nightly 
builds were using the results of compilation under the Ant build, rather than 
compiling from scratch.  I have committed a fix to the nightly build script so 
this won't happen again.

The Maven build bug is that the Solr-core Google Guava dependency was scoped as 
test-only.  Until SOLR-2378, that was true, but it is no longer.  So the fix is 
simply to remove <scope>test</scope> from the dependency declaration in the 
Solr-core POM.  I've committed this too.
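
In other words, the change in the Solr-core POM amounts to (coordinates
from memory):

<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <!-- <scope>test</scope> removed -->
</dependency>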

If you svn update you will get these two fixes.

Thank you very much for persisting, and reporting the problems you have 
encountered.

Steve



Re: apache-solr-3.1 slow stats component queries

2011-05-05 Thread Johannes Goll
Hi,

I bench-marked the slow stats queries (6 point estimate) using the same
hardware on an index of size 104M. We use a Solr/Lucene 3.1-mod which
returns only the sum and count for statistics component results. Solr/Lucene
is run on jetty.

The relationship between query time and set of found documents is linear
when using the stats component (R^2 0.99). I guess this is expected as the
application needs to scan/sum-up the stat field for all matching documents?

Are there any plans for caching stat results for a certain stat field along
with the documents that match a filter query ? Any other ideas that could
help to improve this (hardware/software configuration) ?  Even for a subset
of 10M entries, the stat search takes on the order of 10 seconds.

Thanks in advance.
Johannes



2011/4/18 Johannes Goll johannes.g...@gmail.com

 any ideas why in this case the stats summaries are so slow  ?  Thank you
 very much in advance for any ideas/suggestions. Johannes


 2011/4/5 Johannes Goll johannes.g...@gmail.com

 Hi,

 thank you for making the new apache-solr-3.1 available.

 I have installed the version from

 http://apache.tradebit.com/pub//lucene/solr/3.1.0/

 and am running into very slow stats component queries (~ 1 minute)
 for fetching the computed sum of the stats field

 url: ?q=*:*&start=0&rows=0&stats=true&stats.field=weight

 <int name="QTime">52825</int>

 #documents: 78,359,699
 total RAM: 256G
 vm arguments:  -server -Xmx40G

 the stats.field specification is as follows:
 <field name="weight" type="pfloat" indexed="true"
 stored="false" required="true" multiValued="false"
 default="1"/>

 filter queries that narrow down the #docs help to reduce it -
 QTime seems to be proportional to the number of docs being returned
 by a filter query.

 Is there any way to improve the performance of such stats queries ?
 Caching only helped to improve the filter query performance but if
 larger subsets are being returned, QTime increases unacceptably.

 Since I only need the sum and not the STD or sumsOfSquares/Min/Max,
 I have created a custom 3.1 version that does only return the sum. But
 this
 only slightly improved the performance. Of course I could somehow cache
 the larger sum queries on the client side but I want to do this only as a
 last resort.

 Thank you very much in advance for any ideas/suggestions.

 Johannes




 --
 Johannes Goll
 211 Curry Ford Lane
 Gaithersburg, Maryland 20878



Re: Is it possible to build Solr as a maven project?

2011-05-05 Thread lboutros
Thanks Steve, this will be really simpler next time :)

Is it documented somewhere? If not, perhaps we could add something to this
page, for example:

http://wiki.apache.org/solr/FrontPage#Solr_Development

or here :

http://wiki.apache.org/solr/NightlyBuilds

Ludovic.

2011/5/5 steve_rowe [via Lucene] 
ml-node+2904178-33932273-383...@n3.nabble.com

 Hi Gabriele,

 On 5/5/2011 at 9:57 AM, Gabriele Kahlout wrote:
  Okay, that sequence worked, but then shouldn't I be able to do $ mvn
  install afterwards? This is what I get:
 ...
  COMPILATION ERROR :
  -
  org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[7,27]
  package com.google.common.io does not exist
  org/apache/solr/spelling/suggest/fst/FSTLookup.java:[28,32] package
  com.google.common.collect does not exist
 ...

 mvn install should work, but it doesn't - I can reproduce this error on
 my machine.  This is a bug in the Maven build.

 The nightly Lucene/Solr Maven build on Jenkins should have caught this
 compilation failure three weeks ago, when Dawid Weiss committed his work
 under https://issues.apache.org/jira/browse/SOLR-2378.  Unfortunately,
 the nightly builds were using the results of compilation under the Ant
 build, rather than compiling from scratch.  I have committed a fix to the
 nightly build script so this won't happen again.

 The Maven build bug is that the Solr-core Google Guava dependency was
 scoped as test-only.  Until SOLR-2378, that was true, but it is no longer.
  So the fix is simply to remove <scope>test</scope> from the dependency
 declaration in the Solr-core POM.  I've committed this too.

 If you svn update you will get these two fixes.

 Thank you very much for persisting, and reporting the problems you have
 encountered.

 Steve







-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-possible-to-build-Solr-as-a-maven-project-tp2898068p2904375.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Is it possible to build Solr as a maven project?

2011-05-05 Thread Gabriele Kahlout
Steven, thank you!

$ mvn -DskipTests=true install
works!

[INFO] Reactor Summary:
[INFO]
[INFO] Grandparent POM for Apache Lucene Java and Apache Solr  SUCCESS
[13.142s]
[INFO] Lucene parent POM . SUCCESS [0.345s]
[INFO] Lucene Core ... SUCCESS [18.448s]
[INFO] Lucene Test Framework . SUCCESS [3.560s]
[INFO] Lucene Common Analyzers ... SUCCESS [7.739s]
[INFO] Lucene Contrib Ant  SUCCESS [1.265s]
[INFO] Lucene Contrib bdb  SUCCESS [1.332s]
[INFO] Lucene Contrib bdb-je . SUCCESS [1.321s]
[INFO] Lucene Database aggregator POM  SUCCESS [0.242s]
[INFO] Lucene Demo ... SUCCESS [1.813s]
[INFO] Lucene Memory . SUCCESS [2.412s]
[INFO] Lucene Queries  SUCCESS [2.275s]
[INFO] Lucene Highlighter  SUCCESS [2.985s]
[INFO] Lucene InstantiatedIndex .. SUCCESS [2.170s]
[INFO] Lucene Lucli .. SUCCESS [1.814s]
[INFO] Lucene Miscellaneous .. SUCCESS [1.998s]
[INFO] Lucene Query Parser ... SUCCESS [2.755s]
[INFO] Lucene Spatial  SUCCESS [1.314s]
[INFO] Lucene Spellchecker ... SUCCESS [1.535s]
[INFO] Lucene Swing .. SUCCESS [1.233s]
[INFO] Lucene Wordnet  SUCCESS [1.309s]
[INFO] Lucene XML Query Parser ... SUCCESS [1.483s]
[INFO] Lucene Contrib aggregator POM . SUCCESS [0.151s]
[INFO] Lucene ICU Analysis Components  SUCCESS [2.728s]
[INFO] Lucene Phonetic Filters ... SUCCESS [1.765s]
[INFO] Lucene Smart Chinese Analyzer . SUCCESS [3.709s]
[INFO] Lucene Stempel Analyzer ... SUCCESS [4.241s]
[INFO] Lucene Analysis Modules aggregator POM  SUCCESS [0.213s]
[INFO] Lucene Benchmark .. SUCCESS [2.926s]
[INFO] Lucene Modules aggregator POM . SUCCESS [0.307s]
[INFO] Apache Solr parent POM  SUCCESS [0.233s]
[INFO] Apache Solr Solrj . SUCCESS [3.780s]
[INFO] Apache Solr Core .. SUCCESS [9.693s]
[INFO] Apache Solr Search Server . SUCCESS [6.739s]
[INFO] Apache Solr Test Framework  SUCCESS [2.699s]
[INFO] Apache Solr Analysis Extras ... SUCCESS [3.868s]
[INFO] Apache Solr Clustering  SUCCESS [6.736s]
[INFO] Apache Solr DataImportHandler . SUCCESS [4.914s]
[INFO] Apache Solr DataImportHandler Extras .. SUCCESS [2.721s]
[INFO] Apache Solr DataImportHandler aggregator POM .. SUCCESS [0.253s]
[INFO] Apache Solr Content Extraction Library  SUCCESS [1.909s]
[INFO] Apache Solr - UIMA integration  SUCCESS [1.922s]
[INFO] Apache Solr Contrib aggregator POM  SUCCESS [0.211s]
[INFO]

[INFO] BUILD SUCCESS
[INFO]

[INFO] Total time: 2:18.040s
[INFO] Finished at: Thu May 05 20:39:09 CEST 2011
[INFO] Final Memory: 38M/90M
[INFO]


On Thu, May 5, 2011 at 6:53 PM, Steven A Rowe sar...@syr.edu wrote:

 Hi Gabriele,

 On 5/5/2011 at 9:57 AM, Gabriele Kahlout wrote:
  Okay, that sequence worked, but then shouldn't I be able to do $ mvn
  install afterwards? This is what I get:
 ...
  COMPILATION ERROR :
  -
  org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[7,27]
  package com.google.common.io does not exist
  org/apache/solr/spelling/suggest/fst/FSTLookup.java:[28,32] package
  com.google.common.collect does not exist
 ...

 mvn install should work, but it doesn't - I can reproduce this error on
 my machine.  This is a bug in the Maven build.

 The nightly Lucene/Solr Maven build on Jenkins should have caught this
 compilation failure three weeks ago, when Dawid Weiss committed his work
 under https://issues.apache.org/jira/browse/SOLR-2378.  Unfortunately,
 the nightly builds were using the results of compilation under the Ant
 build, rather than compiling from scratch.  I have committed a fix to the
 nightly build script so this won't happen again.

 The Maven build bug is that the Solr-core Google Guava dependency was
 scoped as test-only.  Until SOLR-2378, that was true, but it is no longer.
  So 

OverlappingFileLockException when concurrent commits in solr

2011-05-05 Thread nitesh nandy
Hello,

I'm using solr version 1.4.0 with tomcat 6. I've 2 solr instances running as
2 different web apps with separate data folders. My application requires
frequent commits from multiple clients. I've noticed that when more than one
client tries to commit at the same time, these OverlappingFileLockExceptions
start to appear. Can anything be done to rectify this problem? Please find
the error log below. Thanks

---
HTTP Status 500 - null

java.nio.channels.OverlappingFileLockException
at
sun.nio.ch.FileChannelImpl$SharedFileLockTable.checkList(FileChannelImpl.java:1215)
 at
sun.nio.ch.FileChannelImpl$SharedFileLockTable.add(FileChannelImpl.java:1117)
at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:923)
 at java.nio.channels.FileChannel.tryLock(FileChannel.java:978)
at org.apache.lucene.store.NativeFSLock.obtain(NativeFSLockFactory.java:233)
 at org.apache.lucene.store.Lock.obtain(Lock.java:73)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1550)
 at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1407)
at
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:190)
 at
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
at
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
 at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
 at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
 at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1317)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
 at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
 at java.lang.Thread.run(Thread.java:636)

DIH for e-mails

2011-05-05 Thread 方振鹏



I'm using the Data Import Handler to index emails.

The problem is that I want to add my own field, such as security_number.

Does anyone have any idea?

Regards,

Jame Bond Fang



Re: DIH for e-mails

2011-05-05 Thread Peter Sturge
The best way to add your own fields is to create a custom Transformer sub-class.
See:
http://www.lucidimagination.com/search/out?u=http%3A%2F%2Fwiki.apache.org%2Fsolr%2FDataImportHandler

This will guide you through the steps.
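
A minimal skeleton (untested; class and field names are just examples)
would be something like:

package com.example.dih;

import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class SecurityNumberTransformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
        // derive the custom field from whatever the entity already provides
        row.put("security_number", computeSecurityNumber(row));
        return row;
    }

    private String computeSecurityNumber(Map<String, Object> row) {
        return "todo"; // your own logic here
    }
}

You would then reference it from the entity in data-config.xml, e.g.
<entity ... transformer="com.example.dih.SecurityNumberTransformer">.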

Peter


2011/5/5 方振鹏 michong900...@xmu.edu.cn:



 I’m using Data Import Handler for index emails.

 The problem is that I wanna add my own field such as security_number.

 Someone have any idea?

 Regards,

 Jame Bond Fang




Re: How do i I modify XMLWriter to write foobar?

2011-05-05 Thread Chris Hostetter

: $ xmlstarlet sel -t -c /config/queryResponseWriter conf/solrconfig.xml
: <queryResponseWriter name="xml" class="org.apache.solr.request.XMLResponseWriter"
: default="true"/>
: 
: Now I comment the line in Solrconfix.xml, and there's no more writer.
: $ xmlstarlet sel -t -c /config/queryResponseWriter conf/solrconfig.xml
: 
: I make a query, and the XMLResponseWriter is still in charge.
: *$ curl -L http://localhost:8080/solr/select?q=apache*
: ?xml version=1.0 encoding=UTF-8?

...

Your example request is not specifying a wt param.

in addition to the response writers declared in your solrconfig.xml, there 
are response writers that exist implicitly unless you define your own 
instances that override those names (xml, json, python, etc...)

the real question is: what writer do you *want* to have used when no wt is 
specified?

whatever the answer is: declare an instance of that writer with 
default="true" in your solrconfig.xml
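
For example, to keep the stock XML behavior, re-declare the instance you
commented out:

<queryResponseWriter name="xml" class="org.apache.solr.request.XMLResponseWriter" default="true"/>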


-Hoss


Re: Indexing 20M documents from MySQL with DIH

2011-05-05 Thread Alexey Serba
{quote}
...
Caused by: java.io.EOFException: Can not read response from server.
Expected to read 4 bytes, read 0 bytes before connection was
unexpectedly lost.
   at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2539)
   at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2989)
   ... 22 more
Apr 21, 2011 3:53:28 AM
org.apache.solr.handler.dataimport.EntityProcessorBase getNext
SEVERE: getNext() failed for query 'REDACTED'
org.apache.solr.handler.dataimport.DataImportHandlerException:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
Communications link failure

The last packet successfully received from the server was 128
milliseconds ago.  The last packet sent successfully to the server was
25,273,484 milliseconds ago.
...
{quote}

It could probably be because of autocommit / segment merging. You
could try to disable autocommit / increase mergeFactor
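
For example (an untested sketch of the relevant solrconfig.xml bits; the
value is arbitrary):

<!-- comment out or remove any <autoCommit> block in <updateHandler> -->
<mergeFactor>30</mergeFactor>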

{quote}
I've used sphinx in the past, which uses multiple queries to pull out
a subset of records ranged based on PrimaryKey, does Solr offer
functionality similar to this? It seems that once a Solr index gets to
a certain size, the indexing of a batch takes longer than MySQL's
net_write_timeout, so it kills the connection.
{quote}

I was thinking about some hackish solution to paginate results
<entity name="pages" query="SELECT id FROM generate_series( (SELECT
count(*) FROM source_table) / 1000 ) ...">
  <entity name="records" query="SELECT * FROM source_table LIMIT 1000
OFFSET ${pages.id}*1000">
  </entity>
</entity>
Or something along those lines ( you'd need to calculate the offset in the
pages query )

But unfortunately MySQL does not provide a generate_series function
(it's a Postgres function, and there are similar solutions for Oracle and
MSSQL).
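
A session-variable hack can fake it on MySQL, something like (untested):

SELECT @page := @page + 1 AS id
FROM source_table, (SELECT @page := -1) init
LIMIT 1000

where the LIMIT would have to be hand-set to at least count(*)/1000,
since MySQL won't accept an expression there.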


On Mon, Apr 25, 2011 at 3:59 AM, Scott Bigelow eph...@gmail.com wrote:
 Thank you everyone for your help. I ended up getting the index to work
 using the exact same config file on a (substantially) larger instance.

 On Fri, Apr 22, 2011 at 5:46 AM, Erick Erickson erickerick...@gmail.com 
 wrote:
 {{{A custom indexer, so that's a fairly common practice? So when you are
 dealing with these large indexes, do you try not to fully rebuild them
 when you can? It's not a nightly thing, but something to do in case of
 a disaster? Is there a difference in the performance of an index that
 was built all at once vs. one that has had delta inserts and updates
 applied over a period of months?}}}

 Is it a common practice? Like all of this, it depends. It's certainly
 easier to let DIH do the work. Sometimes DIH doesn't have all the
 capabilities necessary. Or as Chris said, in the case where you already
 have a system built up and it's easier to just grab the output from
 that and send it to Solr, perhaps with SolrJ and not use DIH. Some people
 are just more comfortable with their own code...

 Do you try not to fully rebuild. It depends on how painful a full rebuild
 is. Some people just like the simplicity of starting over every 
 day/week/month.
 But you *have* to be able to rebuild your index in case of disaster, and
 a periodic full rebuild certainly keeps that process up to date.

 Is there a difference...delta inserts...updates...applied over months. Not
 if you do an optimize. When a document is deleted (or updated), it's only
 marked as deleted. The associated data is still in the index. Optimize will
 reclaim that space and compact the segments, perhaps down to one.
 But there's no real operational difference between a newly-rebuilt index
 and one that's been optimized. If you don't delete/update, there's not
 much reason to optimize either

 I'll leave the DIH to others..

 Best
 Erick

 On Thu, Apr 21, 2011 at 8:09 PM, Scott Bigelow eph...@gmail.com wrote:
 Thanks for the e-mail. I probably should have provided more details,
 but I was more interested in making sure I was approaching the problem
 correctly (using DIH, with one big SELECT statement for millions of
 rows) instead of solving this specific problem. Here's a partial
 stacktrace from this specific problem:

 ...
 Caused by: java.io.EOFException: Can not read response from server.
 Expected to read 4 bytes, read 0 bytes before connection was
 unexpectedly lost.
        at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2539)
        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2989)
        ... 22 more
 Apr 21, 2011 3:53:28 AM
 org.apache.solr.handler.dataimport.EntityProcessorBase getNext
 SEVERE: getNext() failed for query 'REDACTED'
 org.apache.solr.handler.dataimport.DataImportHandlerException:
 com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
 Communications link failure

 The last packet successfully received from the server was 128
 milliseconds ago.  The last packet sent successfully to the server was
 25,273,484 milliseconds ago.
 ...


 A custom indexer, so that's a fairly common practice? So when you are
 dealing with these large indexes, do you try not to fully rebuild them
 when you can? It's not a 

RE: Is it possible to build Solr as a maven project?

2011-05-05 Thread Steven A Rowe
You're welcome, I'm glad you got it to work. - Steve

 -Original Message-
 From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com]
 Sent: Thursday, May 05, 2011 2:41 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Is it possible to build Solr as a maven project?
 
 Steven, thank you!
 
 $ mvn -DskipTests=true install
 works!
 
 
 On Thu, May 5, 2011 at 6:53 PM, Steven A Rowe sar...@syr.edu wrote:
 
  Hi Gabriele,
 
  On 5/5/2011 at 9:57 AM, Gabriele Kahlout wrote:
   Okay, that sequence worked, but then shouldn't I be able to do $ mvn
   install afterwards? This is what I get:
  ...
   COMPILATION ERROR :
   -
   org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[7,27]
   package com.google.common.io does not exist
   org/apache/solr/spelling/suggest/fst/FSTLookup.java:[28,32] package
   com.google.common.collect does not exist
  ...
 
  mvn install should work, but it doesn't - I can reproduce this error
 on
  my machine.  This is a bug in the Maven build.
 
  The nightly Lucene/Solr Maven build on Jenkins should have caught this
  compilation failure three weeks ago, when Dawid Weiss committed his
 work
  under 

Re: SpellCheckComponent issue

2011-05-05 Thread Em
Hi Sid,

unfortunately not; as far as I know it is not possible to realize your
requirements with Solr's spellcheck packages (I am talking about v1.4, since
there are some changes in 3.1).

Regards,
Em

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SpellCheckComponent-issue-tp2903926p2904839.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr: org.apache.solr.common.SolrException: Invalid Date String:

2011-05-05 Thread Rohit
Hi,

I am new to Solr and this is my first attempt at indexing Solr data. I am
getting the following exception while indexing:

org.apache.solr.common.SolrException: Invalid Date String:'2011-01-07' at
org.apache.solr.schema.DateField.parseMath(DateField.java:165) at
org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:169) at
org.apache.solr.schema.SchemaField.createField(SchemaField.java:98) at
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204) at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)

I understand from reading some articles that Solr stores time only in UTC;
this is the query I am trying to index:

Select id,text,'language',links,tweetType,source,location,
bio,url,utcOffset,timeZone,frenCnt,createdAt,createdOnGMT,createdOnServerTim
e,follCnt,favCnt,totStatusCnt,usrCrtDate,humanSentiment,replied,replyMsg,cla
ssified,locationDetail,
geonameid,country,continent,placeLongitude,placeLatitude,listedCnt,hashtag,m
entions,senderInfScr,
createdOnGMTDate,DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+05:30'),
'%Y-%m-%d') as
IST,DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+01:00'),'%Y-%m-%d')
as
ECT,DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+02:00'),'%Y-%m-%d')
as
EET,DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+03:30'),'%Y-%m-%d')
as MET,sign(classified) as sentiment from

The reason I am doing this timezone conversion is that I need to group results
by the user's timezone. How can I achieve this?
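
From the stack trace I am guessing Solr's DateField wants the full
ISO-8601 form (e.g. 2011-01-07T00:00:00Z), so presumably each converted
column needs a pattern along these lines (untested):

DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+05:30'),
'%Y-%m-%dT%H:%i:%sZ') as IST

but I am not sure whether that is also the right way to handle the grouping.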

Regards, Rohit

 



Re: How do i I modify XMLWriter to write foobar?

2011-05-05 Thread Gabriele Kahlout
I've now tried to write my own QueryResponseWriter plugin[1], as a maven
project depending on Solr Core 3.1, which is the same version of Solr I've
installed. It seems I'm not able to get rid of some cache.


$ xmlstarlet sel -t -c /config/queryResponseWriter conf/solrconfig.xml
<queryResponseWriter name="xml"
class="org.apache.solr.request.XMLResponseWriter"/>
<queryResponseWriter name="Test"
class="com.mysimpatico.me.indexplugins.TestQueryResponseWriter" default="true"/>

Restarted tomcat after changing solrconfig.xml and placing indexplugins.jar
in $SOLR_HOME/
At tomcat boot:
INFO: Adding 'file:/Users/simpatico/SOLR_HOME/lib/IndexPlugins.jar' to
classloader

I get the legacy behavior of the plugin for both URLs, and I don't understand
why. At least the xml output should be different. Why could this be? How to find out?
http://localhost:8080/solr/select?q=apache&wt=Test and
http://localhost:8080/solr/select?q=apache&wt=xml
XML Parsing Error: syntax error
Location: http://localhost:8080/solr/select?q=apache&wt=xml (//Test
Line Number 1, Column 1:
foobarresponseHeaderstatusQTimeparamsqapachewtxmlresponse00foobar
^

It seems the new code for TestQueryResponseWriter [1] is never executed,
since I added a SEVERE log statement that doesn't appear in the Tomcat
logs. Where are those caches?

Thank you in advance.

[1]
package com.mysimpatico.me.indexplugins;

import java.io.IOException;
import java.io.Writer;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.XMLResponseWriter;
import org.apache.solr.response.SolrQueryResponse;

/**
 * Minimal subclass that logs a marker before delegating to the stock writer.
 */
public class TestQueryResponseWriter extends XMLResponseWriter {

    @Override
    public void write(Writer writer, SolrQueryRequest request,
            SolrQueryResponse response) throws IOException {
        // Logged at SEVERE so it is impossible to miss in the Tomcat logs.
        Logger.getLogger(TestQueryResponseWriter.class.getName())
              .log(Level.SEVERE, "Hello from TestQueryResponseWriter");
        super.write(writer, request, response);
    }
}


On Thu, May 5, 2011 at 9:01 PM, Chris Hostetter hossman_luc...@fucit.orgwrote:


 : $ xmlstarlet sel -t -c /config/queryResponseWriter conf/solrconfig.xml
 : <queryResponseWriter name="xml" class="org.apache.solr.request.XMLResponseWriter" default="true"/>
 :
 : Now I comment the line in solrconfig.xml, and there's no more writer.
 : $ xmlstarlet sel -t -c /config/queryResponseWriter conf/solrconfig.xml
 :
 : I make a query, and the XMLResponseWriter is still in charge.
 : $ curl -L http://localhost:8080/solr/select?q=apache
 : <?xml version="1.0" encoding="UTF-8"?>

 ...

 Your example request is not specifying a wt param.

 in addition to the response writers declared in your solrconfig.xml, there
 are response writers that exist implicitly unless you define your own
 instances that override those names (xml, json, python, etc...)

 the real question is: what writer do you *want* to have used when no wt is
 specified?

 whatever the answer is: declare an instance of that writer with
 default="true" in your solrconfig.xml


 -Hoss




-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
 Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with X.
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: Is it possible to build Solr as a maven project?

2011-05-05 Thread Gabriele Kahlout
Just for reference.

$ svn update
At revision 1099940.

On Thu, May 5, 2011 at 9:14 PM, Steven A Rowe sar...@syr.edu wrote:

 You're welcome, I'm glad you got it to work. - Steve

  -Original Message-
  From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com]
  Sent: Thursday, May 05, 2011 2:41 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Is it possible to build Solr as a maven project?
 
  Steven, thank you!
 
  $ mvn -DskipTests=true install
  works!
 
  [INFO] Reactor Summary:
  [INFO]
  [INFO] Grandparent POM for Apache Lucene Java and Apache Solr  SUCCESS
  [13.142s]
  [INFO] Lucene parent POM . SUCCESS
  [0.345s]
  [INFO] Lucene Core ... SUCCESS
  [18.448s]
  [INFO] Lucene Test Framework . SUCCESS
  [3.560s]
  [INFO] Lucene Common Analyzers ... SUCCESS
  [7.739s]
  [INFO] Lucene Contrib Ant  SUCCESS
  [1.265s]
  [INFO] Lucene Contrib bdb  SUCCESS
  [1.332s]
  [INFO] Lucene Contrib bdb-je . SUCCESS
  [1.321s]
  [INFO] Lucene Database aggregator POM  SUCCESS
  [0.242s]
  [INFO] Lucene Demo ... SUCCESS
  [1.813s]
  [INFO] Lucene Memory . SUCCESS
  [2.412s]
  [INFO] Lucene Queries  SUCCESS
  [2.275s]
  [INFO] Lucene Highlighter  SUCCESS
  [2.985s]
  [INFO] Lucene InstantiatedIndex .. SUCCESS
  [2.170s]
  [INFO] Lucene Lucli .. SUCCESS
  [1.814s]
  [INFO] Lucene Miscellaneous .. SUCCESS
  [1.998s]
  [INFO] Lucene Query Parser ... SUCCESS
  [2.755s]
  [INFO] Lucene Spatial  SUCCESS
  [1.314s]
  [INFO] Lucene Spellchecker ... SUCCESS
  [1.535s]
  [INFO] Lucene Swing .. SUCCESS
  [1.233s]
  [INFO] Lucene Wordnet  SUCCESS
  [1.309s]
  [INFO] Lucene XML Query Parser ... SUCCESS
  [1.483s]
  [INFO] Lucene Contrib aggregator POM . SUCCESS
  [0.151s]
  [INFO] Lucene ICU Analysis Components  SUCCESS
  [2.728s]
  [INFO] Lucene Phonetic Filters ... SUCCESS
  [1.765s]
  [INFO] Lucene Smart Chinese Analyzer . SUCCESS
  [3.709s]
  [INFO] Lucene Stempel Analyzer ... SUCCESS
  [4.241s]
  [INFO] Lucene Analysis Modules aggregator POM  SUCCESS
  [0.213s]
  [INFO] Lucene Benchmark .. SUCCESS
  [2.926s]
  [INFO] Lucene Modules aggregator POM . SUCCESS
  [0.307s]
  [INFO] Apache Solr parent POM  SUCCESS
  [0.233s]
  [INFO] Apache Solr Solrj . SUCCESS
  [3.780s]
  [INFO] Apache Solr Core .. SUCCESS
  [9.693s]
  [INFO] Apache Solr Search Server . SUCCESS
  [6.739s]
  [INFO] Apache Solr Test Framework  SUCCESS
  [2.699s]
  [INFO] Apache Solr Analysis Extras ... SUCCESS
  [3.868s]
  [INFO] Apache Solr Clustering  SUCCESS
  [6.736s]
  [INFO] Apache Solr DataImportHandler . SUCCESS
  [4.914s]
  [INFO] Apache Solr DataImportHandler Extras .. SUCCESS
  [2.721s]
  [INFO] Apache Solr DataImportHandler aggregator POM .. SUCCESS
  [0.253s]
  [INFO] Apache Solr Content Extraction Library  SUCCESS
  [1.909s]
  [INFO] Apache Solr - UIMA integration  SUCCESS
  [1.922s]
  [INFO] Apache Solr Contrib aggregator POM  SUCCESS
  [0.211s]
  [INFO]
  
  [INFO] BUILD SUCCESS
  [INFO]
  
  [INFO] Total time: 2:18.040s
  [INFO] Finished at: Thu May 05 20:39:09 CEST 2011
  [INFO] Final Memory: 38M/90M
  [INFO]
  
 
  On Thu, May 5, 2011 at 6:53 PM, Steven A Rowe sar...@syr.edu wrote:
 
   Hi Gabriele,
  
   On 5/5/2011 at 9:57 AM, Gabriele Kahlout wrote:
Okay, that sequence worked, but then shouldn't I be able to do $ mvn
install afterwards? This is what I get:
   ...
COMPILATION ERROR :
-
org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[7,27]
package com.google.common.io does not exist
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[28,32] package
com.google.common.collect does not exist
   ...
  
   mvn install should work, but it doesn't 

Custom sorting based on external (database) data

2011-05-05 Thread Sujit Pal
Hi,

Sorry for the possible double post, I wrote this up but had the
incorrect sender address, so I am guessing that my previous one is going
to be rejected by the list moderation daemon.

I am trying to figure out options for the following problem. I am on
Solr 1.4.1 (Lucene 2.9.1).

I have search results which are going to be ranked by the user (using a
thumbs up/down) and would translate to a score between -1 and +1. 

This data is stored in a database table (unique_id, thumbs_up,
thumbs_down, num_calls) that gets updated as the thumbs up/down
component is clicked.

We want to be able to sort the results by the following score =
(thumbs_up - thumbs_down) / (num_calls). The unique_id field refers to
the one referenced as uniqueId in the schema.xml.

Based on the following conversation:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg06322.html 

...my understanding is that I need to:

1) subclass FieldType to create my own RankFieldType. 
2) In this class I override the getSortField() method to return my
custom FieldSortComparatorSource object.
3) Build the custom FieldSortComparatorSource object which returns a
custom FieldSortComparator object in newComparator().
4) Configure the field type of class RankFieldType (rank_t), and a field
(called rank) of field type rank_t in schema.xml of type RankFieldType.
5) use sort=rank+desc to do the sort.

My question is: is there a simpler/more performant way? The number of
database lookups seems like its going to be pretty high with this
approach. And its hard to believe that my problem is new, so I am
guessing this is either part of some Solr configuration I am missing, or
there is some other (possibly simpler) approach I am overlooking.

Pointers to documentation or code (or even keywords I could google)
would be much appreciated.

TIA for all your help,

Sujit




Re: Custom sorting based on external (database) data

2011-05-05 Thread Ahmet Arslan


--- On Thu, 5/5/11, Sujit Pal sujit@comcast.net wrote:

 From: Sujit Pal sujit@comcast.net
 Subject: Custom sorting based on external (database) data
 To: solr-user solr-user@lucene.apache.org
 Date: Thursday, May 5, 2011, 11:03 PM
 Hi,
 
 Sorry for the possible double post, I wrote this up but had
 the
 incorrect sender address, so I am guessing that my previous
 one is going
 to be rejected by the list moderation daemon.
 
 I am trying to figure out options for the following
 problem. I am on
 Solr 1.4.1 (Lucene 2.9.1).
 
 I have search results which are going to be ranked by the
 user (using a
 thumbs up/down) and would translate to a score between -1
 and +1. 
 
 This data is stored in a database table (unique_id, thumbs_up,
 thumbs_down, num_calls) that gets updated as the thumbs up/down
 component is clicked.
 
 We want to be able to sort the results by the following
 score =
 (thumbs_up - thumbs_down) / (num_calls). The unique_id
 field refers to
 the one referenced as uniqueId in the schema.xml.
 
 Based on the following conversation:
 http://www.mail-archive.com/solr-user@lucene.apache.org/msg06322.html
 
 
 ...my understanding is that I need to:
 
 1) subclass FieldType to create my own RankFieldType. 
 2) In this class I override the getSortField() method to
 return my
 custom FieldSortComparatorSource object.
 3) Build the custom FieldSortComparatorSource object which
 returns a
 custom FieldSortComparator object in newComparator().
 4) Configure the field type of class RankFieldType
 (rank_t), and a field
 (called rank) of field type rank_t in schema.xml of type
 RankFieldType.
 5) use sort=rank+desc to do the sort.
 
 My question is: is there a simpler/more performant way? The
 number of
 database lookups seems like its going to be pretty high
 with this
 approach. And its hard to believe that my problem is new,
 so I am
 guessing this is either part of some Solr configuration I
 am missing, or
 there is some other (possibly simpler) approach I am
 overlooking.
 
 Pointers to documentation or code (or even keywords I could
 google)
 would be much appreciated.

Looks like it can be done with 
http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html 
and 
http://wiki.apache.org/solr/FunctionQuery

You can dump your table into three text files. Issue a commit to load these 
changes.

Sort by function query is available in Solr3.1 though.
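
For reference, ExternalFileField reads a plain-text file of key=value lines
named external_<fieldname>, placed in the index data directory. A minimal
sketch of the periodic dump step in Java, assuming the field is called "rank",
a JDBC source, and that unique_id holds the uniqueKey values (all names here
are illustrative):

    import java.io.File;
    import java.io.FileWriter;
    import java.io.PrintWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ExternalRankDumper {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection details; adjust to your environment.
            Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/ratings", "user", "pass");
            Statement stmt = conn.createStatement();
            // Compute score = (thumbs_up - thumbs_down) / num_calls in SQL.
            ResultSet rs = stmt.executeQuery(
                "SELECT unique_id, (thumbs_up - thumbs_down) / num_calls AS score"
                + " FROM rankings WHERE num_calls > 0");
            // One key=value line per document; Solr reloads the file on commit.
            PrintWriter out = new PrintWriter(new FileWriter(
                new File("/path/to/solr/data", "external_rank")));
            try {
                while (rs.next()) {
                    out.println(rs.getString("unique_id") + "=" + rs.getFloat("score"));
                }
            } finally {
                out.close();
                conn.close();
            }
        }
    }

A commit (or core reload) afterwards makes Solr pick up the new file, as
noted above.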


Re: Custom sorting based on external (database) data

2011-05-05 Thread Sujit Pal
Thank you Ahmet, looks like we could use this. Basically we would do
periodic dumps of the (unique_id|computed_score) sorted by score and
write it out to this file followed by a commit.

Found some more info here, for the benefit of others looking for
something similar:
http://dev.tailsweep.com/solr-external-scoring/ 

On Thu, 2011-05-05 at 13:12 -0700, Ahmet Arslan wrote:
 
 --- On Thu, 5/5/11, Sujit Pal sujit@comcast.net wrote:
 
  From: Sujit Pal sujit@comcast.net
  Subject: Custom sorting based on external (database) data
  To: solr-user solr-user@lucene.apache.org
  Date: Thursday, May 5, 2011, 11:03 PM
  Hi,
  
  Sorry for the possible double post, I wrote this up but had
  the
  incorrect sender address, so I am guessing that my previous
  one is going
  to be rejected by the list moderation daemon.
  
  I am trying to figure out options for the following
  problem. I am on
  Solr 1.4.1 (Lucene 2.9.1).
  
  I have search results which are going to be ranked by the
  user (using a
  thumbs up/down) and would translate to a score between -1
  and +1. 
  
  This data is stored in a database table (unique_id, thumbs_up,
  thumbs_down, num_calls) that gets updated as the thumbs up/down
  component is clicked.
  
  We want to be able to sort the results by the following
  score =
  (thumbs_up - thumbs_down) / (num_calls). The unique_id
  field refers to
  the one referenced as uniqueId in the schema.xml.
  
  Based on the following conversation:
  http://www.mail-archive.com/solr-user@lucene.apache.org/msg06322.html
  
  
  ...my understanding is that I need to:
  
  1) subclass FieldType to create my own RankFieldType. 
  2) In this class I override the getSortField() method to
  return my
  custom FieldSortComparatorSource object.
  3) Build the custom FieldSortComparatorSource object which
  returns a
  custom FieldSortComparator object in newComparator().
  4) Configure the field type of class RankFieldType
  (rank_t), and a field
  (called rank) of field type rank_t in schema.xml of type
  RankFieldType.
  5) use sort=rank+desc to do the sort.
  
  My question is: is there a simpler/more performant way? The
  number of
  database lookups seems like its going to be pretty high
  with this
  approach. And its hard to believe that my problem is new,
  so I am
  guessing this is either part of some Solr configuration I
  am missing, or
  there is some other (possibly simpler) approach I am
  overlooking.
  
  Pointers to documentation or code (or even keywords I could
  google)
  would be much appreciated.
 
 Looks like it can be done with 
 http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html
  
 and 
 http://wiki.apache.org/solr/FunctionQuery
 
 You can dump your table into three text files. Issue a commit to load these 
 changes.
 
 Sort by function query is available in Solr3.1 though.



force 0 results from within a search component?

2011-05-05 Thread Frederik Kraus
Hi guys,

another question on custom search components:

Is there any way to force the response to be 0 results from within a search 
component (and break out of the component chain)?

I'm doing some checks in my first-component and in some cases would like to 
stop processing the request and just pretend, that there are 0 results ...

Thanks,

Fred. 

Re: fast case-insensitive autocomplete

2011-05-05 Thread Otis Gospodnetic
Hi,

I haven't used Suggester yet, but couldn't you feed it all lowercase content
and then lowercase whatever the user is typing before sending it to Suggester
to avoid case mismatch?
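
Client-side, the second half of that is a one-liner. A sketch with SolrJ,
assuming a suggest handler registered at /suggest (the handler name is
illustrative):

    import java.util.Locale;
    import org.apache.solr.client.solrj.SolrQuery;

    SolrQuery q = new SolrQuery();
    q.setQueryType("/suggest");                      // hypothetical handler name
    q.setQuery(userInput.toLowerCase(Locale.ROOT));  // match the lowercased dictionary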

Autocomplete on http://search-lucene.com/ uses 
http://sematext.com/products/autocomplete/index.html if you want a shortcut.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Kusenda, Brandyn J brandyn-kuse...@uiowa.edu
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Thu, May 5, 2011 9:22:03 AM
 Subject: fast case-insensitive autocomplete
 
 Hi.
 I need an autocomplete solution to handle case-insensitive queries but
 return the original text with the case still intact.  I've experimented
 with both the Suggester and TermComponent methods.  TermComponent is working
 when I use the regex option; however, it is far too slow.  I get the speed I
 want by using terms.prefix or by using the suggester, but it's case
 sensitive.
 
 Here is an example operating on a  user directory:
 
 Query: bran
 Results: Branden Smith, Brandon Thompson,  Brandon Verner, Brandy Finny, 
 Brian 
Smith, ...
 
 A solution that I would  expect to work would be to store two fields; one
 containing the original text  and the other containing the lowercase.  Then
 convert the query to lower  case and run the query against the lower case
 field and return the original  (case preserved) field.
 Unfortunately, I can't get a TermComponent query to  return additional
 fields.  It only returns the field it's searching  against.  Should this work
 or can I only return additional fields for  standard queries.
 
 Thanks in advance,
 Brandyn
 


Re: force 0 results from within a search component?

2011-05-05 Thread Ahmet Arslan
 Is there any way to force the response to be 0 results
 from within a search component (and break out of the
 component chain)?
 
 I'm doing some checks in my first-component and in some
 cases would like to stop processing the request and just
 pretend, that there are 0 results ...

Yes. You can disable all underlying components by their parameters.

setParam("query", false);
setParam("facet", false);
setParam("hl", false);

etc..
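
There's no supported way to break out of the component chain (short of
throwing an exception), but a first-component can rewrite the request so the
rest of the chain does no work and reports 0 results. A sketch along the lines
of the parameter trick above — the guard condition is a placeholder, and the
-*:* filter relies on Solr's pure-negative query support:

    import java.io.IOException;
    import org.apache.solr.common.params.CommonParams;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.handler.component.ResponseBuilder;
    import org.apache.solr.handler.component.SearchComponent;

    public class GateComponent extends SearchComponent {

        @Override
        public void prepare(ResponseBuilder rb) throws IOException {
            if (!shouldBlock(rb)) return;
            ModifiableSolrParams params = new ModifiableSolrParams(rb.req.getParams());
            // A filter query matching nothing, so later components see 0 hits.
            params.set(CommonParams.FQ, "-*:*");
            params.set(CommonParams.ROWS, 0);
            // Switch the optional components off via their parameters.
            params.set("facet", false);
            params.set("hl", false);
            params.set("spellcheck", false);
            rb.req.setParams(params);
        }

        @Override
        public void process(ResponseBuilder rb) throws IOException {
            // Nothing to do; prepare() already neutered the request.
        }

        private boolean shouldBlock(ResponseBuilder rb) {
            return false; // placeholder for the real check
        }

        // SolrInfoMBean boilerplate.
        public String getDescription() { return "gate component"; }
        public String getSource() { return null; }
        public String getSourceId() { return null; }
        public String getVersion() { return null; }
    }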


Re: why query chinese character with bracket become phrase query by default?

2011-05-05 Thread cyang2010
Nice, it works like a charm.

I am using Solr 1.4.1.  Here is my configuration for the Chinese field:

  <fieldType name="text_ch" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.ChineseTokenizerFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.ChineseTokenizerFactory"/>
      <filter class="solr.PositionFilterFactory"/>
    </analyzer>
  </fieldType>



Now I get the expected hassle-free parsing on the Solr side:

<lst name="debug">
  <str name="rawquerystring">title_zh_CN:(我活)</str>
  <str name="querystring">title_zh_CN:(我活)</str>
  <str name="parsedquery">title_zh_CN:我 title_zh_CN:活</str>
  <str name="parsedquery_toString">title_zh_CN:我 title_zh_CN:活</str>
</lst>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/why-query-chinese-character-with-bracket-become-phrase-query-by-default-tp2901542p2905784.html
Sent from the Solr - User mailing list archive at Nabble.com.


Thoughts on Search Analytics?

2011-05-05 Thread Otis Gospodnetic
Hi,

I'd like to solicit your thoughts about Search Analytics if you are doing any
sort of analysis/reporting of search logs or click stream or anything related.

* Which information or reports do you find the most useful and why?
* Which reports would you like to have, but don't have for whatever reason
(don't have the needed data, or it's too hard to produce such reports, or ...)
* Which tool(s) or service(s) do you use and find the most useful?

I'm preparing a presentation on the topic of Search Analytics, so I'm trying
to solicit opinions, practices, desires, etc. on this topic.

Your thoughts would be greatly appreciated.  If you could reply directly,
that would be great, since this may be a bit OT for the list.

Thanks!
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


Testing the limits of non-Java Solr

2011-05-05 Thread Jack Repenning
What's the probability that I can build a non-trivial Solr app without writing 
any Java?

I've been planning to use Solr, Lucene, and existing plug-ins, and sort of 
hoping not to write any Java (the app itself is Ruby / Rails). The dox (such as 
http://wiki.apache.org/solr/FAQ) seem encouraging. [I *can* write Java, but my 
planning's all been no Java.]

I'm just beginning the design work in earnest, and I suddenly notice that it 
seems every mail thread, blog, or example starts out Java-free, but somehow 
ends up involving Java code. I'm not sure I yet understand all these snippets; 
conceivably some of the Java I see could just as easily be written in another 
language, but it makes me wonder. Is it realistic to plan a sizable Solr 
application without some Java programming?

I know, I know, I know: everything depends on the details. I'd be interested 
even in anecdotes: has anyone ever achieved this before? Also, what are the 
clues I should look for that I need to step into the Java realm? I understand, 
for example, that it's possible to write filters and tokenizers to do stuff not 
available in any standard one; in this case, the clue would be I can't find 
what I want in the standard list, I guess. Are there other things I should 
look for?

-==-
Jack Repenning
Technologist
Codesion Business Unit
CollabNet, Inc.
8000 Marina Boulevard, Suite 600
Brisbane, California 94005
office: +1 650.228.2562
twitter: http://twitter.com/jrep













Re: Solr: org.apache.solr.common.SolrException: Invalid Date String:

2011-05-05 Thread Ahmet Arslan
 org.apache.solr.common.SolrException: Invalid Date
 String:'2011-01-07' at
 org.apache.solr.schema.DateField.parseMath(DateField.java:165)

Solr accepts date in the following format: 2011-01-07T00:00:00Z

 I understand from reading some articles that Solr stores
 time only in UTC,
 this is the query i am trying to index,

It seems that you are fetching data from a Relational Database. You may 
consider using http://wiki.apache.org/solr/DataImportHandler

 Why i am doing this timezone conversion is because i need
 to group results
 by the user timezone. How can i achieve this?

Save timezone info in a field and facet on that field?
http://wiki.apache.org/solr/SimpleFacetParameters
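
At index time the conversion is the mirror image: render the timestamp in UTC
with the literal 'T' and 'Z'. A minimal sketch:

    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;

    // Format any java.util.Date the way Solr's date fields expect it.
    SimpleDateFormat solrFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
    solrFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
    String solrDate = solrFormat.format(new Date());  // e.g. 2011-01-07T17:00:30Z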


Re: Testing the limits of non-Java Solr

2011-05-05 Thread Otis Gospodnetic
Short answer: Yes, you can deploy a Solr cluster and write an application that
talks to it without writing any Java (but it may be PHP or Python or ...,
unless that application is you typing "telnet my-solr-server 8983").

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: Jack Repenning jrepenn...@collab.net
 To: solr-user@lucene.apache.org
 Sent: Thu, May 5, 2011 6:28:31 PM
 Subject: Testing the limits of non-Java Solr
 
 What's the probability that I can build a non-trivial Solr app without 
 writing  
any Java?
 
 I've been planning to use Solr, Lucene, and existing plug-ins,  and sort of 
hoping not to write any Java (the app itself is Ruby / Rails). The  dox (such 
as 
http://wiki.apache.org/solr/FAQ) seem encouraging. [I *can* write Java, but  
my 
planning's all been no Java.]
 
 I'm just beginning the design work in  earnest, and I suddenly notice that it 
seems every mail thread, blog, or example  starts out Java-free, but somehow 
ends up involving Java code. I'm not sure I  yet understand all these 
snippets; 
conceivably some of the Java I see could just  as easily be written in another 
language, but it makes me wonder. Is it  realistic to plan a sizable Solr 
application without some Java  programming?
 
 I know, I know, I know: everything depends on the details.  I'd be interested 
even in anecdotes: has anyone ever achieved this before? Also,  what are the 
clues I should look for that I need to step into the Java realm? I  
understand, 
for example, that it's possible to write filters and tokenizers to  do stuff 
not 
available in any standard one; in this case, the clue would be I  can't find 
what I want in the standard list, I guess. Are there other things I  should 
look for?
 
 -==-
 Jack Repenning
 Technologist
 Codesion  Business Unit
 CollabNet, Inc.
 8000 Marina Boulevard, Suite  600
 Brisbane, California 94005
 office: +1 650.228.2562
 twitter: http://twitter.com/jrep
 
 
 
 
 
 
 
 
 
 


Re: Thoughts on Search Analytics?

2011-05-05 Thread François Schiettecatte
When I ran the search engine at Feedster, I wrote a perl script that ran 
nightly and gave me:

total number of searches
total number of searches per hour
N most frequent searches
max time for a search
min time for a search
mean time for searches
median time for searches
N slowest searches
warnings
errors

all the above per index (core in SOLR)

The script generated a text file (for me) and an Excel spreadsheet (for the
management).

François


On May 5, 2011, at 6:25 PM, Otis Gospodnetic wrote:

 Hi,
 
 I'd like to solicit your thoughts about Search Analytics if you are  doing 
 any 
 sort of analysis/reporting of search logs or click stream or  anything 
 related.
 
 * Which information or reports do you find the most useful and why?
 * Which reports would you like to have, but don't have for whatever  reason 
 (don't have the needed data, or it's too hard to produce such  reports, or 
 ...)
 * Which tool(s) or service(s) do you use and find the most useful?
 
 I'm preparing a presentation on the topic of Search Analytics, so I'm  trying 
 to 
 
 solicit opinions, practices, desires, etc. on this topic.
 
 Your thoughts would be greatly appreciated.  If you could reply  directly, 
 that 
 would be great, since this may be a bit OT for the list.
 
 Thanks!
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



RE: Solr: org.apache.solr.common.SolrException: Invalid Date String:

2011-05-05 Thread Craig Stires

Rohit,

The solr server using TrieDateField must receive values in the format
2011-01-07T17:00:30Z

This should be a UTC-based datetime.  The offset can be applied once you get
your results back from Solr:
   // e.g. format = "yyyy-MM-dd'T'HH:mm:ss'Z'"
   SimpleDateFormat df = new SimpleDateFormat(format);
   df.setTimeZone(TimeZone.getTimeZone("IST"));
   java.util.Date dateunix = df.parse(datetime);


-Craig


-Original Message-
From: Rohit [mailto:ro...@in-rev.com] 
Sent: Friday, 6 May 2011 2:31 AM
To: solr-user@lucene.apache.org
Subject: Solr: org.apache.solr.common.SolrException: Invalid Date String:

Hi,

I am new to Solr and this is my first attempt at indexing Solr data. I am
getting the following exception while indexing:

org.apache.solr.common.SolrException: Invalid Date String:'2011-01-07'
	at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
	at org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:169)
	at org.apache.solr.schema.SchemaField.createField(SchemaField.java:98)
	at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
	at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)

I understand from reading some articles that Solr stores time only in UTC;
this is the query I am trying to index:

Select id, text, 'language', links, tweetType, source, location,
  bio, url, utcOffset, timeZone, frenCnt, createdAt, createdOnGMT,
  createdOnServerTime, follCnt, favCnt, totStatusCnt, usrCrtDate,
  humanSentiment, replied, replyMsg, classified, locationDetail,
  geonameid, country, continent, placeLongitude, placeLatitude,
  listedCnt, hashtag, mentions, senderInfScr, createdOnGMTDate,
  DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+05:30'),'%Y-%m-%d') as IST,
  DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+01:00'),'%Y-%m-%d') as ECT,
  DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+02:00'),'%Y-%m-%d') as EET,
  DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+03:30'),'%Y-%m-%d') as MET,
  sign(classified) as sentiment from 

Why i am doing this timezone conversion is because i need to group results
by the user timezone. How can i achieve this?

Regards, Rohit

 




Re: Solr Terms and Date field issues

2011-05-05 Thread Erick Erickson
H, this is puzzling. If you could come up with a couple of xml
files and a schema
that illustrate this, I'll see what I can see...

Thanks,
Erick

On Wed, May 4, 2011 at 7:05 PM, Viswa S svis...@hotmail.com wrote:

 Erik,

  I suspected the same, and set up a test instance to reproduce this. The date
  field I used is set up to capture indexing time; in other words the schema has
  a default value of NOW. However, I have reproduced this issue with fields
  which do not have defaults too.

  On the second one, I did a delete-commit (with expungeDeletes=true) and then
  an optimize. All other fields show updated terms except the date fields. I
  have also double checked to see if the Luke handler has any different terms,
  and it did not.


 Thanks
 Viswa


 Date: Wed, 4 May 2011 08:17:39 -0400
 Subject: Re: Solr Terms and Date field issues
 From: erickerick...@gmail.com
 To: solr-user@lucene.apache.org

 Hmmm, this *looks* like you've changed your schema without
 re-indexing all your data so you're getting old (string?) values in
 that field, but that's just a guess. If this is really happening on a
 clean index it's a problem.

 I'm also going to guess that you're not really deleting the documents
 you think. Are you committing after the deletes?

 Best
 Erick

 On Wed, May 4, 2011 at 2:18 AM, Viswa S svis...@hotmail.com wrote:
 
  Hello,
 
  The terms query for a date field seems to get populated with some weird
  dates; many of these dates (1970, 2009, 2011-04-23) are not present in the
  indexed data.  Please see the sample data below.
 
  I also notice that a delete and optimize does not remove the relevant
  terms for date fields; the string fields seem to work fine.
 
  Thanks
  Viswa
 
  Results from Terms component:
 
 
  <int name="2011-05-04T02:01:32.928Z">3479</int>
  <int name="2011-05-04T02:00:19.2Z">3479</int>
  <int name="2011-05-03T22:34:58.432Z">3479</int>
  <int name="2011-04-23T01:36:14.336Z">3479</int>
  <int name="2009-03-13T13:23:01.248Z">3479</int>
  <int name="1970-01-01T00:00:00Z">3479</int>
  <int name="1970-01-01T00:00:00Z">3479</int>
  <int name="1970-01-01T00:00:00Z">3479</int>
  <int name="1970-01-01T00:00:00Z">3479</int>
  <int name="2011-05-04T02:01:34.592Z">265</int>
 
 
  Result from facet component, rounded by seconds:

  <lst name="InsertTime">
    <int name="2011-05-04T02:01:32Z">1</int>
    <int name="2011-05-04T02:01:33Z">1148</int>
    <int name="2011-05-04T02:01:34Z">2333</int>
    <str name="gap">+1SECOND</str>
    <date name="start">2011-05-03T06:14:14Z</date>
    <date name="end">2011-05-04T06:14:14Z</date>
  </lst>
 



Re: Is it possible to use sub-fields or multivalued fields for boosting?

2011-05-05 Thread Erick Erickson
For a truly universal field, I'm not at all sure how you'd proceed. But if you
know what your sub-fields are in advance, have you considered just making
them regular fields and then throwing (d)dismax at it?

Best
Erick
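
A sketch of that approach with SolrJ — the field names are the ones from the
example above, and the boost values are arbitrary:

    import org.apache.solr.client.solrj.SolrQuery;

    SolrQuery q = new SolrQuery("London");
    q.set("defType", "dismax");
    // Hits on firstname/surname outrank the same hit on location.
    q.set("qf", "firstname^5 surname^5 location^1");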

On Wed, May 4, 2011 at 11:51 PM, deniz denizdurmu...@gmail.com wrote:
 okay... let me make the situation more clear... I am trying to create an
 universal field which includes information about users like firstname,
 surname, gender, location etc. When I enter something e.g London, I would
 like to match any users having 'London' in any field firstname, surname or
 location. But if it matches name or surname, I would like to give a higher
 weight.

 so my question is... is it possible to have sub-fields? like
 field name=universal
   field name=firstnameblabla/field
   field name=surnameblabla/field
   field name=genderblabla/field
   field name=locationblabla/field
 /field

 or any other ideas for implementing such feature?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Is-it-possible-to-use-sub-fields-or-multivalued-fields-for-boosting-tp2901992p2901992.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Field names with a period (.)

2011-05-05 Thread Erick Erickson
I remember the same, except I think I've seen the recommendation that you
make all the letters lower-case. As I remember, there are some interesting
edge cases that you might run into later with upper case.

But I can't remember the specifics either

Erick

On Thu, May 5, 2011 at 10:08 AM, Leonardo Souza leonardo...@gmail.com wrote:
 Thanks Gora!

 [ ]'s
 Leonardo da S. Souza
  °v°   Linux user #375225
  /(_)\   http://counter.li.org/
  ^ ^



 On Thu, May 5, 2011 at 3:09 AM, Gora Mohanty g...@mimirtech.com wrote:

 On Thu, May 5, 2011 at 5:08 AM, Leonardo Souza leonardo...@gmail.com
 wrote:
  Hi guys,
 
  Can i have a field name with a period(.) ?
  Like in *file.size*

 Cannot find now where this is documented, but from what I remember it is
 recommended to use only characters A-Z, a-z, 0-9, and underscore (_) in
 field names, and some special characters are known to cause problems.

 Regards,
 Gora




Solr 3.1 returning entire highlighted field

2011-05-05 Thread Jake Brownell
Hi,

After upgrading from Solr 1.4.0 to 3.1, our highlighting has gone from
highlighting short pieces of text to displaying what appears to be the entire
contents of the highlighted field.

The request using solrj is setting the following:

params.setHighlight(true);
params.setHighlightSnippets(3);
params.set("hl.fl", "content_highlight");

From solrconfig


  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <!-- Use the regex highlight fragmenter because it seems to return better results. -->
      <str name="f.text.hl.fragmenter">regex</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

  <highlighting>
    <!-- Configure the standard fragmenter -->
    <!-- This could most likely be commented out in the default case -->
    <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
      <lst name="defaults">
        <int name="hl.fragsize">100</int>
      </lst>
    </fragmenter>

    <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
    <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
      <lst name="defaults">
        <!-- slightly smaller fragsizes work better because of slop -->
        <int name="hl.fragsize">70</int>
        <!-- allow 50% slop on fragment sizes -->
        <float name="hl.regex.slop">0.5</float>
        <!-- a basic sentence pattern -->
        <str name="hl.regex.pattern">[-\w ,/\n\']{20,200}</str>
      </lst>
    </fragmenter>

    <!-- Configure the standard formatter -->
    <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
      <lst name="defaults">
        <str name="hl.simple.pre"><![CDATA[<strong>]]></str>
        <str name="hl.simple.post"><![CDATA[</strong>]]></str>
      </lst>
    </formatter>
  </highlighting>


From schema

<field name="content_highlight" type="text_highlight" indexed="true"
       stored="true" required="false" compressed="true" termVectors="true"
       termPositions="true"/>

<fieldType name="text_highlight" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


Any pointers anybody can provide would be greatly appreciated.

Jake
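
One thing worth double-checking: the per-field override in the dismax defaults
targets a field called text (f.text.hl.fragmenter), while the field actually
being highlighted is content_highlight, so the regex fragmenter and its 70-char
fragsize may never be applied. A hedged sketch of pinning both on the SolrJ
request (hl.fragsize=0 would mean "return the whole field"):

    // Per-field override must name the field listed in hl.fl.
    params.set("f.content_highlight.hl.fragmenter", "regex");
    // Pin an explicit fragment size.
    params.set("hl.fragsize", 70);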


RE: Solr Terms and Date field issues

2011-05-05 Thread Viswa S

Please find attached the schema and some test data (test.xml).

Thanks for looking into this.
Viswa


 Date: Thu, 5 May 2011 19:08:31 -0400
 Subject: Re: Solr Terms and Date field issues
 From: erickerick...@gmail.com
 To: solr-user@lucene.apache.org
 
 H, this is puzzling. If you could come up with a couple of xml
 files and a schema
 that illustrate this, I'll see what I can see...
 
 Thanks,
 Erick
 
 On Wed, May 4, 2011 at 7:05 PM, Viswa S svis...@hotmail.com wrote:
 
  Erik,
 
  I suspected the same, and set up a test instance to reproduce this. The date
  field I used is set up to capture indexing time; in other words the schema
  has a default value of NOW. However, I have reproduced this issue with
  fields which do not have defaults too.
 
  On the second one, I did a delete-commit (with expungeDeletes=true) and
  then an optimize. All other fields show updated terms except the date
  fields. I have also double checked to see if the Luke handler has any
  different terms, and it did not.
 
 
  Thanks
  Viswa
 
 
  Date: Wed, 4 May 2011 08:17:39 -0400
  Subject: Re: Solr Terms and Date field issues
  From: erickerick...@gmail.com
  To: solr-user@lucene.apache.org
 
  Hmmm, this *looks* like you've changed your schema without
  re-indexing all your data so you're getting old (string?) values in
  that field, but that's just a guess. If this is really happening on a
  clean index it's a problem.
 
  I'm also going to guess that you're not really deleting the documents
  you think. Are you committing after the deletes?
 
  Best
  Erick
 
  On Wed, May 4, 2011 at 2:18 AM, Viswa S svis...@hotmail.com wrote:
  
   Hello,
  
   The terms query for a date field seems to get populated with some weird
   dates; many of these dates (1970, 2009, 2011-04-23) are not present in the
   indexed data.  Please see the sample data below.
  
   I also notice that a delete and optimize does not remove the relevant
   terms for date fields; the string fields seem to work fine.
  
   Thanks
   Viswa
  
   Results from Terms component:
  
  
   <int name="2011-05-04T02:01:32.928Z">3479</int>
   <int name="2011-05-04T02:00:19.2Z">3479</int>
   <int name="2011-05-03T22:34:58.432Z">3479</int>
   <int name="2011-04-23T01:36:14.336Z">3479</int>
   <int name="2009-03-13T13:23:01.248Z">3479</int>
   <int name="1970-01-01T00:00:00Z">3479</int>
   <int name="1970-01-01T00:00:00Z">3479</int>
   <int name="1970-01-01T00:00:00Z">3479</int>
   <int name="1970-01-01T00:00:00Z">3479</int>
   <int name="2011-05-04T02:01:34.592Z">265</int>
  
  
   Result from facet component, rounded by seconds:

   <lst name="InsertTime">
     <int name="2011-05-04T02:01:32Z">1</int>
     <int name="2011-05-04T02:01:33Z">1148</int>
     <int name="2011-05-04T02:01:34Z">2333</int>
     <str name="gap">+1SECOND</str>
     <date name="start">2011-05-03T06:14:14Z</date>
     <date name="end">2011-05-04T06:14:14Z</date>
   </lst>
  
 
<add>
  <doc>
    <field name="fullTextLog">I suspected the same, and setup a test instance to reproduce this</field>
  </doc>
  <doc>
    <field name="fullTextLog">The date field I used is setup to capture indexing time, in other words the schema has a default value of NOW</field>
  </doc>
  <doc>
    <field name="fullTextLog">However, I have reproduced this issue with fields which do not have defaults too.</field>
  </doc>
  <doc>
    <field name="fullTextLog">Lorem Ipsum is simply dummy text of the printing and typesetting industry</field>
  </doc>
  <doc>
    <field name="fullTextLog">Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old.</field>
  </doc>
</add>

<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the License); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an AS IS BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<!--
 This is the Solr schema file. This file should be named schema.xml and
 should be in the conf directory under the solr home
 (i.e. ./solr/conf/schema.xml by default) 
 or located where the classloader for the Solr webapp can find it.

 This example schema is the recommended starting point for users.
 It should be kept correct and concise, usable out-of-the-box.

 For more information, on how to customize this file, please see
 http://wiki.apache.org/solr/SchemaXml

 PERFORMANCE NOTE: this schema includes many optional features and should not
 be used for benchmarking.  To improve 

RE: Solr Terms and Date field issues

2011-05-05 Thread Ahmet Arslan


It is okay to see weird things in admin/schema.jsp or the terms component with
trie-based types. Please see http://search-lucene.com/m/WEfSI1Yi4562/

If you really need the terms component, consider using copyField (tdate to a
string type).



 
Please find attached the schema and some test data (test.xml).

Thanks for looking into this.
Viswa


 Date: Thu, 5 May 2011 19:08:31 -0400
 Subject: Re: Solr Terms and Date field issues
 From: erickerick...@gmail.com
 To: solr-user@lucene.apache.org
 
 H, this is puzzling. If you could come up with a couple of xml
 files and a schema
 that illustrate this, I'll see what I can see...
 
 Thanks,
 Erick
 
 On Wed, May 4, 2011 at 7:05 PM, Viswa S svis...@hotmail.com wrote:
 
  Erik,
 
  I suspected the same, and set up a test instance to reproduce this. The date
  field I used is set up to capture indexing time; in other words the schema
  has a default value of NOW. However, I have reproduced this issue with
  fields which do not have defaults too.
 
  On the second one, I did a delete-commit (with expungeDeletes=true) and
  then an optimize. All other fields show updated terms except the date
  fields. I have also double checked to see if the Luke handler has any
  different terms, and it did not.
 
 
  Thanks
  Viswa
 
 
  Date: Wed, 4 May 2011 08:17:39 -0400
  Subject: Re: Solr Terms and Date field issues
  From: erickerick...@gmail.com
  To: solr-user@lucene.apache.org
 
  Hmmm, this *looks* like you've changed your schema without
  re-indexing all your data so you're getting old (string?) values in
  that field, but that's just a guess. If this is really happening on a
  clean index it's a problem.
 
  I'm also going to guess that you're not really deleting the documents
  you think. Are you committing after the deletes?
 
  Best
  Erick
 
  On Wed, May 4, 2011 at 2:18 AM, Viswa S svis...@hotmail.com wrote:
  
   Hello,
  
   The terms query for a date field seems to get populated with some weird
   dates; many of these dates (1970, 2009, 2011-04-23) are not present in the
   indexed data.  Please see the sample data below.
  
   I also notice that a delete and optimize does not remove the relevant
   terms for date fields; the string fields seem to work fine.
  
   Thanks
   Viswa
  
   Results from Terms component:
  
  
   <int name="2011-05-04T02:01:32.928Z">3479</int>
   <int name="2011-05-04T02:00:19.2Z">3479</int>
   <int name="2011-05-03T22:34:58.432Z">3479</int>
   <int name="2011-04-23T01:36:14.336Z">3479</int>
   <int name="2009-03-13T13:23:01.248Z">3479</int>
   <int name="1970-01-01T00:00:00Z">3479</int>
   <int name="1970-01-01T00:00:00Z">3479</int>
   <int name="1970-01-01T00:00:00Z">3479</int>
   <int name="1970-01-01T00:00:00Z">3479</int>
   <int name="2011-05-04T02:01:34.592Z">265</int>
  
  
   Result from facet component, rounded by seconds:

   <lst name="InsertTime">
     <int name="2011-05-04T02:01:32Z">1</int>
     <int name="2011-05-04T02:01:33Z">1148</int>
     <int name="2011-05-04T02:01:34Z">2333</int>
     <str name="gap">+1SECOND</str>
     <date name="start">2011-05-03T06:14:14Z</date>
     <date name="end">2011-05-04T06:14:14Z</date>
   </lst>
  
 
  



Re: Indexing 20M documents from MySQL with DIH

2011-05-05 Thread Shawn Heisey
I am running into this problem as well, but only sporadically, and only 
in my 3.1 test environment, not 1.4.1 production.  I may have narrowed 
things down; I am now interested in learning whether this is a problem 
with the MySQL connector or DIH.



On 4/21/2011 6:09 PM, Scott Bigelow wrote:

Thanks for the e-mail. I probably should have provided more details,
but I was more interested in making sure I was approaching the problem
correctly (using DIH, with one big SELECT statement for millions of
rows) instead of solving this specific problem. Here's a partial
stacktrace from this specific problem:

...
Caused by: java.io.EOFException: Can not read response from server.
Expected to read 4 bytes, read 0 bytes before connection was
unexpectedly lost.
 at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2539)
 at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2989)
 ... 22 more
Apr 21, 2011 3:53:28 AM
org.apache.solr.handler.dataimport.EntityProcessorBase getNext
SEVERE: getNext() failed for query 'REDACTED'
org.apache.solr.handler.dataimport.DataImportHandlerException:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
Communications link failure

The last packet successfully received from the server was 128
milliseconds ago.  The last packet sent successfully to the server was
25,273,484 milliseconds ago.
...




Re: Indexing 20M documents from MySQL with DIH

2011-05-05 Thread Scott Bigelow
Alex, thanks for your response. I suspect you're right about
autoCommit; I ended up solving the problem by merely moving the entire
Solr install, untouched, to a significantly larger instance (EC2
m1.small to m1.large). I think it is appropriately sized now for the
quantity and intensity of queries that will be thrown at it when it
enters production, so I never bothered to get it working on the
smaller instance.

Your entity examples are interesting; I wonder if you could create
some count table to make up for MySQL's lack of row generator. Either
way, it seems like paging through results would be a must-have for any
enterprise-level indexer, and I'm surprised to find it missing in
Solr.

When relying on the delta import mechanism for updates, it's not like
one would need the consistency of pulling the entire record set as a
single, isolated query, since the delta import is designed to fetch
new documents and merge them in to a slightly out-of-date/inconsistent
index.


On Thu, May 5, 2011 at 12:10 PM, Alexey Serba ase...@gmail.com wrote:
 {quote}
 ...
 Caused by: java.io.EOFException: Can not read response from server.
 Expected to read 4 bytes, read 0 bytes before connection was
 unexpectedly lost.
       at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2539)
       at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2989)
       ... 22 more
 Apr 21, 2011 3:53:28 AM
 org.apache.solr.handler.dataimport.EntityProcessorBase getNext
 SEVERE: getNext() failed for query 'REDACTED'
 org.apache.solr.handler.dataimport.DataImportHandlerException:
 com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
 Communications link failure

 The last packet successfully received from the server was 128
 milliseconds ago.  The last packet sent successfully to the server was
 25,273,484 milliseconds ago.
 ...
 {quote}

 It could probably be because of autocommit / segment merging. You
 could try to disable autocommit / increase mergeFactor

 {quote}
 I've used sphinx in the past, which uses multiple queries to pull out
 a subset of records ranged based on PrimaryKey, does Solr offer
 functionality similar to this? It seems that once a Solr index gets to
 a certain size, the indexing of a batch takes longer than MySQL's
 net_write_timeout, so it kills the connection.
 {quote}

 I was thinking about some hackish solution to paginate results
 <entity name="pages" query="SELECT id FROM generate_series( (SELECT
 count(*) from source_table) / 1000 ) ...">
   <entity name="records" query="SELECT * from source_table LIMIT 1000
 OFFSET ${pages.id}*1000">
   </entity>
 </entity>
 Or something along those lines ( you'd need to to calculate offset in
 pages query )

 But unfortunately MySQL does not provide generate_series function
 (it's postgres function and there'r similar solutions for oracle and
 mssql).


 On Mon, Apr 25, 2011 at 3:59 AM, Scott Bigelow eph...@gmail.com wrote:
 Thank you everyone for your help. I ended up getting the index to work
 using the exact same config file on a (substantially) larger instance.

 On Fri, Apr 22, 2011 at 5:46 AM, Erick Erickson erickerick...@gmail.com 
 wrote:
 {{{A custom indexer, so that's a fairly common practice? So when you are
 dealing with these large indexes, do you try not to fully rebuild them
 when you can? It's not a nightly thing, but something to do in case of
 a disaster? Is there a difference in the performance of an index that
 was built all at once vs. one that has had delta inserts and updates
 applied over a period of months?}}}

 Is it a common practice? Like all of this, it depends. It's certainly
 easier to let DIH do the work. Sometimes DIH doesn't have all the
 capabilities necessary. Or as Chris said, in the case where you already
 have a system built up and it's easier to just grab the output from
 that and send it to Solr, perhaps with SolrJ and not use DIH. Some people
 are just more comfortable with their own code...

 Do you try not to fully rebuild. It depends on how painful a full rebuild
 is. Some people just like the simplicity of starting over every 
 day/week/month.
 But you *have* to be able to rebuild your index in case of disaster, and
 a periodic full rebuild certainly keeps that process up to date.

 Is there a difference...delta inserts...updates...applied over months. Not
 if you do an optimize. When a document is deleted (or updated), it's only
 marked as deleted. The associated data is still in the index. Optimize will
 reclaim that space and compact the segments, perhaps down to one.
 But there's no real operational difference between a newly-rebuilt index
 and one that's been optimized. If you don't delete/update, there's not
 much reason to optimize either

 I'll leave the DIH to others..

 Best
 Erick

 On Thu, Apr 21, 2011 at 8:09 PM, Scott Bigelow eph...@gmail.com wrote:
 Thanks for the e-mail. I probably should have provided more details,
 but I was more interested in making sure I was approaching the problem
 correctly (using DIH, with 

Re: Testing the limits of non-Java Solr

2011-05-05 Thread William Bell
Yeah you don't need Java to use Solr. PHP, Curl, Python, HTTP Request
APIs all work fine.

The purpose of Solr is to wrap Lucene into a REST-like API that anyone
can call using HTTP.



On Thu, May 5, 2011 at 4:35 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Short answer: Yes, you can deploy a Solr cluster and write an application that
 talks to it without writing any Java (but it may be PHP or Python or 
 unless
 that application is you typing telnet my-solr-server 8983 )

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



 - Original Message 
 From: Jack Repenning jrepenn...@collab.net
 To: solr-user@lucene.apache.org
 Sent: Thu, May 5, 2011 6:28:31 PM
 Subject: Testing the limits of non-Java Solr

 What's the probability that I can build a non-trivial Solr app without 
 writing
any Java?

 I've been planning to use Solr, Lucene, and existing plug-ins,  and sort of
hoping not to write any Java (the app itself is Ruby / Rails). The  dox (such 
as
http://wiki.apache.org/solr/FAQ) seem encouraging. [I *can* write Java, but  
my
planning's all been no Java.]

 I'm just beginning the design work in  earnest, and I suddenly notice that it
seems every mail thread, blog, or example  starts out Java-free, but somehow
ends up involving Java code. I'm not sure I  yet understand all these 
snippets;
conceivably some of the Java I see could just  as easily be written in another
language, but it makes me wonder. Is it  realistic to plan a sizable Solr
application without some Java  programming?

 I know, I know, I know: everything depends on the details.  I'd be interested
even in anecdotes: has anyone ever achieved this before? Also,  what are the
clues I should look for that I need to step into the Java realm? I  
understand,
for example, that it's possible to write filters and tokenizers to  do stuff 
not
available in any standard one; in this case, the clue would be I  can't find
what I want in the standard list, I guess. Are there other things I  should
look for?

 -==-
 Jack Repenning
 Technologist
 Codesion  Business Unit
 CollabNet, Inc.
 8000 Marina Boulevard, Suite  600
 Brisbane, California 94005
 office: +1 650.228.2562
 twitter: http://twitter.com/jrep













Re: fast case-insensitive autocomplete

2011-05-05 Thread William Bell
Are you giving that solution away? What are the costs? etc.!!



On Thu, May 5, 2011 at 2:58 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Hi,

 I haven't used Suggester yet, but couldn't you feed it all lowercase content 
 and
 then lowercase whatever the user is typing before sending it to Suggester to
 avoid case mismatch?

 Autocomplete on http://search-lucene.com/ uses
 http://sematext.com/products/autocomplete/index.html if you want a shortcut.

 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/



 - Original Message 
 From: Kusenda, Brandyn J brandyn-kuse...@uiowa.edu
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Thu, May 5, 2011 9:22:03 AM
 Subject: fast case-insensitive autocomplete

 Hi.
 I need an autocomplete solution to handle case-insensitive queries but
 return the original text with the case still intact.  I've experimented
 with both the Suggester and TermComponent methods.  TermComponent is working
 when I use the regex option; however, it is far too slow.  I get the speed I
 want by using terms.prefix or by using the suggester, but it's case
 sensitive.

 Here is an example operating on a  user directory:

 Query: bran
 Results: Branden Smith, Brandon Thompson,  Brandon Verner, Brandy Finny, 
 Brian
Smith, ...

 A solution that I would  expect to work would be to store two fields; one
 containing the original text  and the other containing the lowercase.  Then
 convert the query to lower  case and run the query against the lower case
 field and return the original  (case preserved) field.
 Unfortunately, I can't get a TermComponent query to  return additional
 fields.  It only returns the field it's searching  against.  Should this work
 or can I only return additional fields for  standard queries.

 Thanks in advance,
 Brandyn




Re: Does the Solr enable Lemmatization [not the Stemming]

2011-05-05 Thread William Bell
Is there a parser that can take a string and tell you what part is an
address, and what is not?

Split the field into 2 fields?

Search: Dr. Bell in Denver, CO
Search: Dr. Smith near 10722 Main St, Denver, CO
Search: Denver, CO for Cardiologist

Thoughts?

2011/5/5 François Schiettecatte fschietteca...@gmail.com:
 Rajani

 You might also want to look at Balie ( http://balie.sourceforge.net/ ), from 
 the web site:

 Features:

        • language identification
        • tokenization
        • sentence boundary detection
        • named-entity recognition


 Can't vouch for it though.




 On May 5, 2011, at 4:58 AM, Jan Høydahl wrote:

 Hi,

 Solr does not have lemmatization out of the box.

 You'll have to find 3rd party analyzers, and the most known such is from 
 BasisTech. Please contact them to learn more.

 I'm not aware of any open source lemmatizers for Solr.

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com

 On 5. mai 2011, at 10.34, rajini maski wrote:

 Does the solr enable lemmatization concept?



  I found a documentation that gives an information as solr enables
 lemmatization concept. Here is the link :
 http://www.basistech.com/knowledge-center/search/2010-09-language-identification-language-support-and-entity-extraction.pdf

 Can anyone help me finding the jar specified in that document so that i can
 add it as plugin.
 jar :rlp.solr.RLPTokenizerFactory


 Thanks and Regards,
 Rajani Maski





DIH disconnecting long-lived MySQL connections

2011-05-05 Thread Shawn Heisey
I am using DIH with the MySQL connector to import data into my index.  
When doing a full import in my 3.1 test environment, it sometimes loses 
connection with the database and ends up rolling back the import.  My 
import configuration uses a single query, so there's no possibility of a 
reconnect fixing this.  Visit http://pastebin.com/Ya9DBMEP for the error 
log.  I'm using mysql-connector-java-5.1.15-bin.jar.


It seems that this occurs when Solr is busy doing multiple segment 
merges, when there are two merges partially complete and it's working on 
a third, causing ongoing index activity to cease for several minutes.  
Indexing activity seems to be fine up until there are three merges in 
progress.


This is a virtual environment using Xen on CentOS5, two VMs.  The host 
has SATA RAID1, so there's not a lot of I/O capacity.  When both virtual 
machines are busy indexing, it can't keep up with the load, and one 
segment merge doesn't have time to complete before it's built up enough 
segments to start another one, which puts the first one on hold.  If I 
build one virtual machine at a time, it doesn't do this, but then it 
takes twice as long.  My 1.4.1 production systems builds all six shards 
at the same time when it's doing a full rebuild, but that's using RAID10.


I grabbed a sniffer trace of the MySQL connection from the database 
server.  After the last actual data packet in the capture, there is a 
173 second pause followed by a Request Quit packet from the VM, then 
the connection is torn down normally.


My best guess right now is that the idle-timeout-minutes setting in 
JDBC is coming into play here during my single query, and that it's set 
to 3 minutes.  The Internet cannot seem to tell me what the default 
value is for this setting, and I do not see it mentioned anywhere in the 
MySQL/J source code.  I tried adding idle-timeout-minutes="30" to the 
datasource definition in my DIH config; it didn't seem to do anything.


Am I on the right track?  Is there any way to configure DIH so that it 
won't do this?


Thanks,
Shawn
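
One knob that looks more relevant here than idle-timeout-minutes (which
appears to be an app-server pool setting rather than a Connector/J one) is
Connector/J's netTimeoutForStreamingResults property: it controls what the
driver sets MySQL's net_write_timeout to while a streaming result set
(batchSize="-1") is open. A sketch of the JDBC URL, with an illustrative
value:

    // Hypothetical DIH dataSource URL: raise net_write_timeout (in seconds)
    // for the duration of the single streaming SELECT.
    String url = "jdbc:mysql://dbhost:3306/mydb"
               + "?netTimeoutForStreamingResults=7200";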



RE: Solr: org.apache.solr.common.SolrException: Invalid Date String:

2011-05-05 Thread Rohit
Hi Craig,

Thanks for the response. Actually, what we need to achieve is to see group-by
results based on dates, like:

2011-01-01  23
2011-01-02  14
2011-01-03  40
2011-01-04  10

Now the records in my table run into millions; grouping the results based on
UTC date would not produce the right result, since they should be grouped by
the user's timezone.  Is there any way we can achieve this in Solr?

Regards,
Rohit
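
Since the per-timezone day strings (IST, ECT, EET, MET) are already computed
in the SELECT, one workable pattern is to index each of them as a plain string
field and facet on whichever one matches the requesting user's timezone. A
sketch with SolrJ, where the field name is hypothetical:

    import org.apache.solr.client.solrj.SolrQuery;

    SolrQuery q = new SolrQuery("*:*");
    q.setFacet(true);
    q.addFacetField("createdOnIST");  // string field holding yyyy-MM-dd in IST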



-Original Message-
From: Craig Stires [mailto:craig.sti...@gmail.com] 
Sent: 06 May 2011 04:30
To: solr-user@lucene.apache.org
Subject: RE: Solr: org.apache.solr.common.SolrException: Invalid Date
String:


Rohit,

The solr server using TrieDateField must receive values in the format
2011-01-07T17:00:30Z

This should be a UTC-based datetime.  The offset can be applied once you get
your results back from Solr:
   // e.g. format = "yyyy-MM-dd'T'HH:mm:ss'Z'"
   SimpleDateFormat df = new SimpleDateFormat(format);
   df.setTimeZone(TimeZone.getTimeZone("IST"));
   java.util.Date dateunix = df.parse(datetime);


-Craig


-Original Message-
From: Rohit [mailto:ro...@in-rev.com] 
Sent: Friday, 6 May 2011 2:31 AM
To: solr-user@lucene.apache.org
Subject: Solr: org.apache.solr.common.SolrException: Invalid Date String:

Hi,

I am new to Solr and this is my first attempt at indexing Solr data. I am
getting the following exception while indexing:

org.apache.solr.common.SolrException: Invalid Date String:'2011-01-07'
	at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
	at org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:169)
	at org.apache.solr.schema.SchemaField.createField(SchemaField.java:98)
	at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
	at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)

I understand from reading some articles that Solr stores time only in UTC;
this is the query I am trying to index:

Select id, text, 'language', links, tweetType, source, location,
  bio, url, utcOffset, timeZone, frenCnt, createdAt, createdOnGMT,
  createdOnServerTime, follCnt, favCnt, totStatusCnt, usrCrtDate,
  humanSentiment, replied, replyMsg, classified, locationDetail,
  geonameid, country, continent, placeLongitude, placeLatitude,
  listedCnt, hashtag, mentions, senderInfScr, createdOnGMTDate,
  DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+05:30'),'%Y-%m-%d') as IST,
  DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+01:00'),'%Y-%m-%d') as ECT,
  DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+02:00'),'%Y-%m-%d') as EET,
  DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+03:30'),'%Y-%m-%d') as MET,
  sign(classified) as sentiment from 

Why i am doing this timezone conversion is because i need to group results
by the user timezone. How can i achieve this?

Regards, Rohit