How much does the Solr Enterprise server differ from the non-Enterprise server?

2011-05-05 Thread bryan rasmussen
I am asking specifically because I am wondering if it is worth my time
to read the Enterprise Server book, or if there is too much of a
divergence between the two?

If I read the book are there any parts of the book specifically that
won't be relevant?

Thanks,
Bryan Rasmussen


Re: Patch problems solr 1.4 - solr-2010

2011-05-05 Thread roySolr
Hello,

Thanks for the answers. I use branch 1.4 and have successfully applied the
SOLR-2010 patch.

Now I want to use collated spellchecking. What should my URL look like? I
tried this, but it's not working (it behaves the same as Solr without
SOLR-2010).

http://localhost:8983/solr/select?q=man%20unitet&spellcheck.q=man%20unitet&spellcheck=true&spellcheck.build=true&spellcheck.collate=true&spellcheck.collateExtendedResult=true&spellcheck.maxCollations=10&spellcheck.maxCollationTries=10

I get the collation "man united" as a suggestion. "Man" is correctly
spelled on its own, but not in this phrase: it should be "manchester
united". I want Solr to re-query with the collation and only return the
suggestion if it actually produces results. How can I fix this?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Patch-problems-solr-1-4-solr-2010-tp2898443p2902546.html
Sent from the Solr - User mailing list archive at Nabble.com.


Does Solr support lemmatization [not stemming]?

2011-05-05 Thread rajini maski
Does Solr support the lemmatization concept?



   I found a document indicating that Solr supports lemmatization. Here is
the link:
http://www.basistech.com/knowledge-center/search/2010-09-language-identification-language-support-and-entity-extraction.pdf

Can anyone help me find the jar specified in that document (the class is
rlp.solr.RLPTokenizerFactory) so that I can add it as a plugin?


Thanks and Regards,
Rajani Maski


Re: JsonUpdateRequestHandler

2011-05-05 Thread Jan Høydahl
Justine,

The JSON update request handler was added in Solr 3.1. Please download this 
version and try again.
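For reference, this is roughly how the handler is registered in the Solr 3.1 example solrconfig.xml (a sketch based on the 3.1 example config; adjust the handler name to your setup):

```xml
<!-- JSON update handler, available from Solr 3.1 onward. Registering
     this class on 1.4.x fails with the ClassNotFoundException below. -->
<requestHandler name="/update/json"
                class="solr.JsonUpdateRequestHandler"
                startup="lazy" />
```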

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 3. mai 2011, at 22.34, Justine Mathews wrote:

> Hi,
> 
> When I add the JSON update request handler in solrconfig.xml as below:
> 
> 
> I am getting the following error. Version: apache-solr-1.4.1. Could you
> please help...
> 
> Error is shown below,
> 
> 
> Check your log files for more detailed information on what may be wrong.
> 
> If you want solr to continue after configuration errors, change:
> 
> <abortOnConfigurationError>false</abortOnConfigurationError>
> 
> in solrconfig.xml
> 
> -
> org.apache.solr.common.SolrException: Error loading class 
> 'solr.JsonUpdateRequestHandler'
>at 
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
>at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
>at 
> org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449)
>at 
> org.apache.solr.core.RequestHandlers.initHandlersFromConfig(RequestHandlers.java:152)
>at org.apache.solr.core.SolrCore.(SolrCore.java:556)
>at 
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
>at 
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
>at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
>at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>at 
> org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
>at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
>at 
> org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
>at 
> org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
>at 
> org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
>at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>at 
> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
>at 
> org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
>at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>at 
> org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
>at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>at 
> org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
>at org.mortbay.jetty.Server.doStart(Server.java:210)
>at 
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>at java.lang.reflect.Method.invoke(Unknown Source)
>at org.mortbay.start.Main.invokeMain(Main.java:183)
>at org.mortbay.start.Main.start(Main.java:497)
>at org.mortbay.start.Main.main(Main.java:115)
> Caused by: java.lang.ClassNotFoundException: solr.JsonUpdateRequestHandler
>at java.net.URLClassLoader$1.run(Unknown Source)
>at java.security.AccessController.doPrivileged(Native Method)
>at java.net.URLClassLoader.findClass(Unknown Source)
>at java.lang.ClassLoader.loadClass(Unknown Source)
>at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
>at java.lang.ClassLoader.loadClass(Unknown Source)
>at java.lang.Class.forName0(Native Method)
>at java.lang.Class.forName(Unknown Source)
>at 
> org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:359)
>... 30 more
> RequestURI=/solr/
> 
> 
> --
> Regards,
> Justine K Mathews, MCSD.NET
> Mob: +44-(0) 7795268546
> http://www.justinemathews.com
> http://uk.linkedin.com/in/justinemathews
> 



Re: copyField

2011-05-05 Thread Ahmet Arslan
> If I define different fields with different boosts and then copy them
> into another field and search using this universal field, will the
> boosting still apply?

No. copyField just copies the raw source content; index-time boosts on the
source fields are not carried over to the destination field.
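A minimal schema.xml sketch of what copyField does (field names here are invented for illustration):

```xml
<!-- Source fields; any index-time boosts applied to these do NOT
     travel with the copied content. -->
<field name="title" type="text" indexed="true" stored="true"/>
<field name="body"  type="text" indexed="true" stored="true"/>

<!-- Catch-all destination; it re-analyzes the raw copied input with
     its own fieldType, independently of the source fields. -->
<field name="all" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="all"/>
<copyField source="body"  dest="all"/>
```

If per-field boosts matter, one alternative is to search the individual fields with dismax's qf parameter (e.g. qf=title^5 body) instead of a catch-all field.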


Re: How much does the Solr Enterprise server differ from the non-Enterprise server?

2011-05-05 Thread Jan Høydahl
Hi,

Solr IS an enterprise search server. And there is only one edition :)
I'd wait a few more weeks until the Solr 3.1 books are available, and then read 
up on it.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 5. mai 2011, at 09.37, bryan rasmussen wrote:

> I am asking specifically because I am wondering if it is worth my time
> to read the Enterprise Server book, or if there is too much of a
> divergence between the two?
> 
> If I read the book are there any parts of the book specifically that
> won't be relevant?
> 
> Thanks,
> Bryan Rasmussen



Re: Does Solr support lemmatization [not stemming]?

2011-05-05 Thread Jan Høydahl
Hi,

Solr does not have lemmatization out of the box.

You'll have to find third-party analyzers; the best known is from
BasisTech. Please contact them to learn more.

I'm not aware of any open source lemmatizers for Solr.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 5. mai 2011, at 10.34, rajini maski wrote:

> Does Solr support the lemmatization concept?
> 
> 
> 
>   I found a document indicating that Solr supports lemmatization. Here is
> the link:
> http://www.basistech.com/knowledge-center/search/2010-09-language-identification-language-support-and-entity-extraction.pdf
> 
> Can anyone help me find the jar specified in that document (the class is
> rlp.solr.RLPTokenizerFactory) so that I can add it as a plugin?
> 
> 
> Thanks and Regards,
> Rajani Maski



Re: How much does the Solr Enterprise server differ from the non-Enterprise server?

2011-05-05 Thread bryan rasmussen
OK, I just saw the note about the version numbers being synced.

Is there any information on these Solr 3.1 books? Publishers,
publication dates, website on them?

Mvh,
Bryan Rasmussen

On Thu, May 5, 2011 at 10:57 AM, Jan Høydahl  wrote:
> Hi,
>
> Solr IS an enterprise search server. And there is only one edition :)
> I'd wait a few more weeks until the Solr 3.1 books are available, and then 
> read up on it.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> On 5. mai 2011, at 09.37, bryan rasmussen wrote:
>
>> I am asking specifically because I am wondering if it is worth my time
>> to read the Enterprise Server book, or if there is too much of a
>> divergence between the two?
>>
>> If I read the book are there any parts of the book specifically that
>> won't be relevant?
>>
>> Thanks,
>> Bryan Rasmussen
>
>


Why is org.apache.solr.response.XMLWriter final?

2011-05-05 Thread Gabriele Kahlout
Hello,

It's final in the trunk, and has always been since conception in 2006 at
revision 372455. Why?

-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Format date before indexing it

2011-05-05 Thread Marc SCHNEIDER
Hi,

I have to index records that have fields containing dates.
A date can be "2011", "2011-05", or "2015-05-01", and the separators may
also be slashes.
I'd like to convert these values into a valid date for Solr.

So my question is: what is the best way to achieve this?
1) Use solr.DateField and write my own filter so that the date ends up in
the right format?
2) Subclass solr.DateField?

Thanks in advance,
Marc.


Is it possible to load all indexed data in search request

2011-05-05 Thread Kannan
Hi 

 I can load all indexed data using the /select request with "*:*" as the
query parameter. I tried the same with the /search request, but it didn't
work; it also didn't work with "*" as the query value. I am using the
"dismax" handler. Is it possible to load all indexed data in a search or
suggest request?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-possible-to-load-all-indexed-data-in-search-request-tp2902808p2902808.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is it possible to load all indexed data in search request

2011-05-05 Thread Gora Mohanty
On Thu, May 5, 2011 at 3:48 PM, Kannan  wrote:
> Hi
>
>  I can load all indexed data using the /select request with "*:*" as the
> query parameter. I tried the same with the /search request, but it didn't
> work; it also didn't work with "*" as the query value. I am using the
> "dismax" handler. Is it possible to load all indexed data in a search or
> suggest request?

If I understand correctly, you are trying to retrieve all Solr records in one
go: Question 3.8 in the FAQ ( http://wiki.apache.org/solr/FAQ )
addresses this.
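In short, the FAQ's advice amounts to paging through the full result set with start/rows rather than requesting everything in one response (illustrative URLs; adjust host and handler to your setup):

```
# Get the total count first (rows=0 returns no documents, just numFound)
http://localhost:8983/solr/select?q=*:*&rows=0

# Then fetch the data page by page
http://localhost:8983/solr/select?q=*:*&start=0&rows=100
http://localhost:8983/solr/select?q=*:*&start=100&rows=100
```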

Regards,
Gora


Re: Is it possible to load all indexed data in search request

2011-05-05 Thread Ahmet Arslan


> I am using the "dismax" handler. Is it possible to load all indexed data
> in search and suggest requests?

With dismax, you can use the q.alt=*:* parameter. Don't pass the q parameter at all.
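For example (an illustrative URL; this assumes the /select handler is configured with the dismax query parser):

```
http://localhost:8983/solr/select?defType=dismax&q.alt=*:*&rows=10
```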


Re: Format date before indexing it

2011-05-05 Thread Ahmet Arslan


--- On Thu, 5/5/11, Marc SCHNEIDER  wrote:

> From: Marc SCHNEIDER 
> Subject: Format date before indexing it
> To: "solr-user" 
> Date: Thursday, May 5, 2011, 12:51 PM
> Hi,
> 
> I have to index records that have fields containing dates.
> A date can be "2011", "2011-05", or "2015-05-01", and the separators may
> also be slashes.
> I'd like to convert these values into a valid date for Solr.
> 
> So my question is: what is the best way to achieve this?
> 1) Use solr.DateField and write my own filter so that the date ends up in
> the right format?
> 2) Subclass solr.DateField?

http://wiki.apache.org/solr/UpdateRequestProcessor 
or 
http://wiki.apache.org/solr/DataImportHandler#Transformer if you are using DIH.
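Whichever hook you use, the normalization itself is simple. Here is a rough standalone sketch in plain Java (not tied to Solr's APIs; padding a missing month/day with 01 is an assumption about the desired behaviour):

```java
// Normalizes partial dates like "2011", "2011-05" or "2011/05/01"
// into Solr's date format (ISO 8601 UTC, e.g. "2011-05-01T00:00:00Z").
public class DateNormalizer {
    public static String normalize(String raw) {
        // Accept '-' or '/' as separators.
        String[] parts = raw.trim().replace('/', '-').split("-");
        int year  = Integer.parseInt(parts[0]);
        int month = parts.length > 1 ? Integer.parseInt(parts[1]) : 1; // default 01
        int day   = parts.length > 2 ? Integer.parseInt(parts[2]) : 1; // default 01
        return String.format("%04d-%02d-%02dT00:00:00Z", year, month, day);
    }
}
```

Inside a custom UpdateRequestProcessor, this logic would run in processAdd(), rewriting the field value before the document is indexed.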


Re: Does Solr support lemmatization [not stemming]?

2011-05-05 Thread François Schiettecatte
Rajani

You might also want to look at Balie ( http://balie.sourceforge.net/ ), from 
the web site:

Features:

• language identification
• tokenization
• sentence boundary detection
• named-entity recognition


Can't vouch for it though.




On May 5, 2011, at 4:58 AM, Jan Høydahl wrote:

> Hi,
> 
> Solr does not have lemmatization out of the box.
> 
> You'll have to find third-party analyzers; the best known is from
> BasisTech. Please contact them to learn more.
> 
> I'm not aware of any open source lemmatizers for Solr.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
> On 5. mai 2011, at 10.34, rajini maski wrote:
> 
>> Does Solr support the lemmatization concept?
>> 
>> 
>> 
>>  I found a document indicating that Solr supports lemmatization. Here is
>> the link:
>> http://www.basistech.com/knowledge-center/search/2010-09-language-identification-language-support-and-entity-extraction.pdf
>> 
>> Can anyone help me find the jar specified in that document (the class is
>> rlp.solr.RLPTokenizerFactory) so that I can add it as a plugin?
>> 
>> 
>> Thanks and Regards,
>> Rajani Maski
> 



[ann] Lily 1.0 is out: Smart Data at Scale, made Easy!

2011-05-05 Thread Steven Noels
Hi all,

We’re really proud to announce the first official major release of Lily,
our flagship repository for scalable data and content management, after
18 months of intense engineering work. We’re thrilled to be the first to
launch an open source, general-purpose, highly scalable yet flexible data
repository based on NoSQL/Big Data technology: read all about it below.

>What

Lily is a data and content repository made for the Age of Data: it
allows you to store and manage vast amounts of data, and in the future
will allow you to monetize user interactions by tracking and analyzing
audience data.

Lily makes Big Data easy with a high-level, developer-friendly data
model with rich types, versioning and schema management. Lily offers
simple Java and REST APIs for creating, reading and managing data. Its
flexible indexing mechanism supports interactive and batch-oriented
index maintenance.

Lily is the foundation for any large-scale data-centric application:
social media, e-commerce, large content management applications,
product catalogs, archiving, media asset management: any data-centric
application with an ambition to scale beyond a single-server setup.

Lily is dead serious about Scale. The Lily repository has been tested
to scale beyond any common content repository technology out there,
due to its inherently distributed architecture, providing economically
affordable, robust, and high-performing data management services for
any kind of enterprise application.

>For whom

Lily puts BigData technology within reach of enterprise and corporate
developers, wrapping leading-edge technology in a developer- and
administrator-friendly package. Lily offers the
flexibility and scalability of Apache HBase, the de-facto leading
Google BigTable implementation, and the sophistication and robustness
of Apache SOLR, the market leader of open source enterprise and
internet search. Lily sits on the shoulders of these Big Data
revolution leaders, and provides additional ease of use needed for
corporate adoption.

>Thanks

Lily builds further upon the best data and search technology out
there: Apache HBase and SOLR. HBase is in use at some of the largest
data properties out there: Facebook, StumbleUpon and Yahoo!. SOLR is
rapidly replacing proprietary enterprise search solutions all over the
place and is one of the most popular open source projects at the
Apache Software Foundation. We're thankful for the developer
communities working hard on these projects, and strive hard to
contribute back where possible. We're also appreciative of the
commercial service suppliers backing these projects: Lucid Imagination
and Cloudera.

>Where

Everything Lily can be found at www.lilyproject.org. Enjoy!

Thanks,

The Lily team @ http://outerthought.org/

Outerthought
Scalable Smart Data, made Easy
Makers of Kauri, Daisy CMS and Lily


Programmatic restructuring of a Solr cloud

2011-05-05 Thread Sergey Sazonov

Dear Solr Experts,

First of all, I would like to thank you for your patience when answering 
questions of those who are less experienced.


And now to the main topic: I would like to learn whether it is possible 
to restructure a Solr cloud programmatically.


Let me describe the system we are designing to make the requirements 
clear. The indexed documents are certain log entries. We are planning to 
shard them by month, and only keep the last 12 months in the index. We 
are going to replicate each shard across several servers.


Now, the user is always required to search within a single month (= 
shard). Most importantly, we expect an absolute majority of the requests 
to query the current month, with only a minor load on the previous 
months. In order to utilise the cluster most efficiently, we would like 
a majority of the servers to contain replicas of the current month data, 
and have only one or two servers per older month. To this end, we are 
planning to have a set of slaves that "migrate" from master to master, 
depending on which master holds the data for the current month. When a 
new month starts, those slaves have to be reconfigured to hold the new 
shard and to replicate from the new master (their old master now holding 
the data for the previous month).


Since this operation has to be done every month, we are naturally 
considering automating it. So my question is whether anyone has faced a 
similar problem before, and what is the best way to solve it. We are not 
committed to any solution, or even architecture, so feel free to propose 
different solutions. The only requirement is that a majority of the 
servers should be able to serve requests to the current month at any 
given moment.


Thank you in advance for your answers.

Best regards,
Sergey Sazonov.


Re: why does a query of Chinese characters with brackets become a phrase query by default?

2011-05-05 Thread Michael McCandless
Unfortunately, the current out-of-the-box defaults (example config)
for Solr are a disaster for non-whitespace languages (CJK, Thai,
etc.), ie, exactly what you've hit.

This is because Lucene's QueryParser can unexpectedly, dangerously,
create PhraseQuery even when the user did not ask for it ("auto
phrase").  Not only does this mean no results for non-whitespace
languages, but it also means worse search performance (PhraseQuery is
usually more costly than TermQuerys).

Lucene leaves this "auto phrase" behavior off by default, but Solr
defaults it to on.

Robert's email gives a good description of how you can turn it off.

The very first thing every non-whitespace-language Solr app should do is
turn off autoGeneratePhraseQueries!
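Assuming Solr 3.1 or later (where the attribute was introduced), this is a per-fieldType switch in schema.xml; the tokenizer shown below is just a placeholder:

```xml
<!-- autoGeneratePhraseQueries="false" makes multi-token CJK input
     parse as individual term queries instead of an implicit PhraseQuery. -->
<fieldType name="text_cjk" class="solr.TextField"
           autoGeneratePhraseQueries="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
```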

Mike

http://blog.mikemccandless.com

On Wed, May 4, 2011 at 8:21 PM, cyang2010  wrote:
> Hi,
>
> In the Solr admin full query interface page, the following query in
> English becomes term queries, according to the debug output:
>
> title_en_US: (blood red)
>
> 
> title_en_US: (blood red)
> title_en_US: (blood red)
> title_en_US:blood title_en_US:red
> title_en_US:blood title_en_US:red
>
>
> However, using the same syntax with two Chinese terms, the query results
> in a phrase query:
>
> title_zh_CN: (我活)
>
> 
> title_zh_CN: (我活)
> title_zh_CN: (我活)
> PhraseQuery(title_zh_CN:"我 活")
> title_zh_CN:"我 活"
>
>
> I do have different tokenizers/filters for those two fields:
> title_en_US uses the common English-specific tokenizer chain, while
> title_zh_CN uses solr.ChineseTokenizerFactory.
>
> I don't think those tokenizers determine whether things within brackets
> become term queries or phrase queries.
>
> I really need to blindly pass user-input text to a Solr field without
> doing any parsing, and hope it produces a term query for each term
> contained in the search text.
>
> How do I achieve that?
>
> Thanks,
>
>
> cy
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/why-query-chinese-character-with-bracket-become-phrase-query-by-default-tp2901542p2901542.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


How do I debug "Unable to evaluate expression using this context" printed at start?

2011-05-05 Thread Gabriele Kahlout
I've tried to reinstall Solr on Tomcat, and now when I launch Tomcat in
debug mode I see the following exception relating to Solr. It's not enough
to understand the problem (and fix it), and I don't know where to look for
more information (or what to do). Please help me.

Following the tutorial and discussion here, this is my context descriptor
(solr.xml):



  


(the war exists)
$ ls $SOLR_HOME/dist/solr.war
/Users/simpatico/SOLR_HOME//dist/solr.war

$ ls $SOLR_HOME/conf/solrconfig.xml
/Users/simpatico/SOLR_HOME//conf/solrconfig.xml

When Tomcat starts:

INFO: Using JNDI solr.home: /Users/simpatico/SOLR_HOME
May 5, 2011 2:46:50 PM org.apache.solr.core.SolrResourceLoader 
INFO: Solr home set to '/Users/simpatico/SOLR_HOME/'
...
INFO: Adding 'file:/Users/simpatico/SOLR_HOME/lib/wstx-asl-3.2.7.jar' to
classloader
May 5, 2011 2:46:50 PM org.apache.solr.common.SolrException log
SEVERE:
*javax.xml.transform.TransformerException: Unable to evaluate expression
using this context*
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:363)
at
com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:213)
at
com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at
org.apache.solr.core.CoreContainer.readProperties(CoreContainer.java:303)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:242)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
at
org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273)
at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254)
at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372)
at
org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:98)
at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4382)
at
org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5040)
at
org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5035)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
Caused by: java.lang.RuntimeException: Unable to evaluate expression using
this context
at
com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:212)
at
com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:210)
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:335)
... 18 more
-
java.lang.RuntimeException: Unable to evaluate expression using this context
at
com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:212)
at
com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:210)
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:335)
at
com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:213)
at
com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at
org.apache.solr.core.CoreContainer.readProperties(CoreContainer.java:303)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:242)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
at
org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273)
at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254)
at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372)
at
org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:98)
at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4382)
at
org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5040)
at
org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5035)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
--- linked to --
javax.xml.xpath.XPathExpressionException
at
com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:289)
at
org.apache.solr.core.CoreContainer.readProperties(CoreContainer.java:30

Re: Programmatic restructuring of a Solr cloud

2011-05-05 Thread Jan Høydahl
Hi,

One approach, if you're using Amazon, is Elastic Beanstalk:

* Create one master with 12 cores, named "jan", "feb", "mar" etc
* Every month, you clear the current month index and switch indexing to it
  You will only have one master, because you're only indexing to one month at a 
time
* For each of the 12 months, set up an Amazon Elastic Beanstalk instance with a
Solr replica pointing to its master
  This way, Amazon will spin off replicas as needed
  NOTE: Your replica could still be located at /solr/select even if it 
replicates from /solr/may/replication
* You only query the replicas, and the client will control whether to query one 
or more shards
  
&shards=jan.elasticbeanstalk.com/solr,feb.elasticbeanstalk.com/solr,mar.elasticbeanstalk.com/solr

After this is setup, you have 0 config to worry about :)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 5. mai 2011, at 14.03, Sergey Sazonov wrote:

> Dear Solr Experts,
> 
> First of all, I would like to thank you for your patience when answering 
> questions of those who are less experienced.
> 
> And now to the main topic: I would like to learn whether it is possible to 
> restructure a Solr cloud programmatically.
> 
> Let me describe the system we are designing to make the requirements clear. 
> The indexed documents are certain log entries. We are planning to shard them 
> by month, and only keep the last 12 months in the index. We are going to 
> replicate each shard across several servers.
> 
> Now, the user is always required to search within a single month (= shard). 
> Most importantly, we expect an absolute majority of the requests to query the 
> current month, with only a minor load on the previous months. In order to 
> utilise the cluster most efficiently, we would like a majority of the servers 
> to contain replicas of the current month data, and have only one or two 
> servers per older month. To this end, we are planning to have a set of slaves 
> that "migrate" from master to master, depending on which master holds the 
> data for the current month. When a new month starts, those slaves have to be 
> reconfigured to hold the new shard and to replicate from the new master 
> (their old master now holding the data for the previous month).
> 
> Since this operation has to be done every month, we are naturally considering 
> automating it. So my question is whether anyone has faced a similar problem 
> before, and what is the best way to solve it. We are not committed to any 
> solution, or even architecture, so feel free to propose different solutions. 
> The only requirement is that a majority of the servers should be able to 
> serve requests to the current month at any given moment.
> 
> Thank you in advance for your answers.
> 
> Best regards,
> Sergey Sazonov.



Controlling webapp startup

2011-05-05 Thread Benson Margulies
There are two ways to characterize what I'd like to do.

1) use the EmbeddedSolrServer to launch Solr, and subsequently enable
the HTTP GET/json servlet. I can provide the 'servlet' wiring, I just
need to be able to hand an HttpServletRequest to something and
retrieve in return the same json that would come back from the usual
Solr servlet.

2) Use the usual Solr servlet apparatus, but defer its startup until
other code in the webapp makes up its mind about configuration and
calls System.setProperty to locate the solr home and data directories.


fast case-insensitive autocomplete

2011-05-05 Thread Kusenda, Brandyn J
Hi.
I need an autocomplete solution that handles case-insensitive queries but
returns the original text with its case intact. I've experimented with
both the Suggester and TermsComponent approaches. TermsComponent works
when I use the regex option, but it is far too slow. I get the speed I
want by using terms.prefix, or by using the Suggester, but both are
case-sensitive.

Here is an example operating on a user directory:

Query: bran
Results: Branden Smith, Brandon Thompson, Brandon Verner, Brandy Finny, Brian 
Smith, ...

A solution that I would expect to work would be to store two fields; one
containing the original text and the other containing the lowercase.  Then
convert the query to lower case and run the query against the lower case
field and return the original (case preserved) field.
Unfortunately, I can't get a TermsComponent query to return additional
fields; it only returns the field it's searching against. Should this
work, or can I only return additional fields for standard queries?
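For what it's worth, the two-field idea sketched above could be wired in schema.xml roughly like this (field and type names are invented):

```xml
<!-- "name" keeps original case for display; "name_lc" is the
     lowercased copy actually searched for autocomplete. -->
<field name="name"    type="string"  indexed="true" stored="true"/>
<field name="name_lc" type="text_lc" indexed="true" stored="false"/>
<copyField source="name" dest="name_lc"/>

<fieldType name="text_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a standard query, a lowercased prefix query such as name_lc:bran* then matches case-insensitively, while the stored "name" field returns the original casing.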

Thanks in advance,
Brandyn


RE: Is it possible to build Solr as a maven project?

2011-05-05 Thread Steven A Rowe
Hi Gabriele,

The sequence should be

1. svn update
2. ant get-maven-poms
3. mvn -N -Pbootstrap install

I think you left out #2 - there was a very recent change to the POMs that 
affects the noggit jar name.

Steve

> -Original Message-
> From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com]
> Sent: Thursday, May 05, 2011 1:22 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Is it possible to build Solr as a maven project?
> 
> Thank you so much for this gem, David!
> 
> I still don't manage to build though:
> $ svn update
> At revision 1099684.
> 
> $ mvn clean
> 
> $ mvn -N -Pbootstrap install
> 
> [INFO]
> 
> [INFO] BUILD FAILURE
> [INFO]
> 
> [INFO] Total time: 8.234s
> [INFO] Finished at: Thu May 05 07:21:34 CEST 2011
> [INFO] Final Memory: 12M/81M
> [INFO]
> 
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-install-plugin:2.3.1:install-file
> (install-solr-noggit) on project lucene-solr-grandparent: Error
> installing
> artifact 'org.apache.solr:solr-noggit:jar': Failed to install artifact
> org.apache.solr:solr-noggit:jar:4.0-SNAPSHOT:
> /Users/simpatico/debug/solr4/solr/lib/apache-solr-noggit-r944541.jar (No
> such file or directory) -> [Help 1]
> 
> 
> On Thu, May 5, 2011 at 12:02 AM, Smiley, David W. 
> wrote:
> 
> > Hi folks. What you're supposed to do is run:
> >
> > mvn -N -Pbootstrap install
> >
> > as the very first one-time only step.  It copies several custom jar
> files
> > into your local repository. From then on you can build like normally
> with
> > maven.
> >
> > ~ David Smiley
> > Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
> >
> >
> > On May 4, 2011, at 2:36 PM, Gabriele Kahlout wrote:
> >
> > > but it doesn't build.
> > >
> > > Now, I've checked out solr4 from the trunk and tried to build the
> maven
> > > project there, but it fails downloading berkleydb:
> > >
> > > BUILD FAILURE
> > > -
> ---
> > > Total time: 1:07.367s
> > > Finished at: Wed May 04 20:33:29 CEST 2011
> > > Final Memory: 24M/81M
> > > -
> ---
> > > Failed to execute goal on project lucene-bdb: Could not resolve
> > dependencies
> > > for project org.apache.lucene:lucene-bdb:jar:4.0-SNAPSHOT: Failure to
> > find
> > > com.sleepycat:berkeleydb:jar:4.7.25 in
> > > http://download.carrot2.org/maven2/was cached in the local
> repository,
> > > resolution will not be reattempted until
> > > the update interval of carrot2.org has elapsed or updates are forced
> ->
> > > [Help 1]
> > >
> > >
> > > I looked up to get the jar on my own but I didn't find a 4.7.25
> version,
> > the
> > > latest on oracle website (java edition) is 4.1. Where can i download
> this
> > > maven dependency from?
> > >
> > > On Wed, May 4, 2011 at 1:26 PM, Gabriele Kahlout
> > > wrote:
> > >
> > >> It worked after checking out the dev-tools folder. Thank you!
> > >>
> > >>
> > >> On Wed, May 4, 2011 at 1:20 PM, lboutros  wrote:
> > >>
> > >>> <target name="get-maven-poms"
> > >>>         description="Copy Maven POMs from dev-tools/maven/ to their
> > >>> target locations">
> > >>>   <copy todir="." overwrite="true">
> > >>>     <fileset dir="${basedir}/dev-tools/maven"/>
> > >>>   </copy>
> > >>> </target>
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Regards,
> > >> K. Gabriele
> > >>
> > >> --- unchanged since 20/9/10 ---
> > >> P.S. If the subject contains "[LON]" or the addressee acknowledges
> the
> > >> receipt within 48 hours then I don't resend the email.
> > >> subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
> > >> time(x) < Now + 48h) ⇒ ¬resend(I, this).
> > >>
> > >> If an email is sent by a sender that is not a trusted contact or the
> > email
> > >> does not contain a valid code then the email is not received. A
> valid
> > code
> > >> starts with a hyphen and ends with "X".
> > >> ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧
> y ∈
> > >> L(-[a-z]+[0-9]X)).
> > >>
> > >>
> > >
> > >
> > > --
> > > Regards,
> > > K. Gabriele
> > >
> >
> >
> >
> >
> >
> >
> 
> 
> --
> Regards,
> K. Gabriele
> 
> --- unchanged since 20/9/10

Text Only Extraction Using Solr and Tika

2011-05-05 Thread Emyr James

Hi All,

I have solr and tika installed and am happily extracting and indexing 
various files.
Unfortunately on some word documents it blows up since it tries to 
auto-generate a 'title' field but my title field in the schema is single 
valued.


Here is my config for the extract handler...

<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>

Is there a config option to make it only extract text, or ideally to 
allow me to specify which metadata fields to accept?


E.g. I'd like to use any author metadata it finds but to not use any 
title metadata it finds as I want title to be single valued and set 
explicitly using a literal.title in the post request.


I did look around for some docs, but all I can find are very basic 
examples. There's no comprehensive configuration documentation out there 
as far as I can tell.



ALSO...

I get some other bad responses coming back such as...

Apache Tomcat/6.0.28 - Error report
HTTP Status 500 - org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;

java.lang.NoSuchMethodError: 
org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;
at 
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:168)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:148)
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)

at java.lang.Thread.run(Thread.java:636)
type: Status report
message: org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;


For the above my url was...

 
http://localhost:8080/solr/update/extract?literal.id=3922&defaultField=content&fmap.content=content&uprefix=ignored_&stream.contentType=application%2Fvnd.ms-powerpoint&commit=true&literal.title=Reactor+cycle+141&literal.not
es=&literal.tag=UCN_production&literal.author=Maurits+van+der+Grinten

I guess there's something special I need to be able to process PowerPoint 
files? Maybe I need to get the latest Apache POI? Any suggestions 
welcome...



Regards,

Emyr


Re: Text Only Extraction Using Solr and Tika

2011-05-05 Thread Jay Luker
Hi Emyr,

You could try using the "extractOnly=true" parameter [1]. Of course,
you'll need to repost the extracted text manually.

--jay

[1] http://wiki.apache.org/solr/ExtractingRequestHandler#Extract_Only




Re: Is it possible to build Solr as a maven project?

2011-05-05 Thread Gabriele Kahlout
Okay, that sequence worked, but then shouldn't I be able to do $ mvn install
afterwards? This is what I get:

...
Compiling 478 source files to /Users/simpatico/debug/solr4/solr/build/solr
-
COMPILATION ERROR :
-
org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[7,27]
package com.google.common.io does not exist
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[28,32] package
com.google.common.collect does not exist
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[29,27] package
com.google.common.io does not exist
org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[29,4] cannot
find symbol
symbol  : variable ByteStreams
location: class org.apache.solr.spelling.suggest.fst.InputStreamDataInput
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[128,57] cannot find
symbol
symbol  : variable Lists
location: class org.apache.solr.spelling.suggest.fst.FSTLookup
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[170,26] cannot find
symbol
symbol  : variable Lists
location: class org.apache.solr.spelling.suggest.fst.FSTLookup
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[203,35] cannot find
symbol
symbol  : variable Lists
location: class org.apache.solr.spelling.suggest.fst.FSTLookup
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[529,6] cannot find
symbol
symbol  : variable Closeables
location: class org.apache.solr.spelling.suggest.fst.FSTLookup
org/apache/solr/spelling/suggest/fst/FSTLookup.java:[551,6] cannot find
symbol
symbol  : variable Closeables
location: class org.apache.solr.spelling.suggest.fst.FSTLookup
9 errors
-

Reactor Summary:

Grandparent POM for Apache Lucene Java and Apache Solr  SUCCESS [13.255s]
Lucene parent POM . SUCCESS [0.199s]
Lucene Core ... SUCCESS [15.528s]
Lucene Test Framework . SUCCESS [4.657s]
Lucene Common Analyzers ... SUCCESS [16.770s]
Lucene Contrib Ant  SUCCESS [1.103s]
Lucene Contrib bdb  SUCCESS [0.883s]
Lucene Contrib bdb-je . SUCCESS [0.872s]
Lucene Database aggregator POM  SUCCESS [0.091s]
Lucene Demo ... SUCCESS [0.842s]
Lucene Memory . SUCCESS [0.726s]
Lucene Queries  SUCCESS [1.559s]
Lucene Highlighter  SUCCESS [3.007s]
Lucene InstantiatedIndex .. SUCCESS [1.224s]
Lucene Lucli .. SUCCESS [1.579s]
Lucene Miscellaneous .. SUCCESS [1.163s]
Lucene Query Parser ... SUCCESS [4.274s]
Lucene Spatial  SUCCESS [1.159s]
Lucene Spellchecker ... SUCCESS [0.841s]
Lucene Swing .. SUCCESS [1.177s]
Lucene Wordnet  SUCCESS [0.816s]
Lucene XML Query Parser ... SUCCESS [1.197s]
Lucene Contrib aggregator POM . SUCCESS [0.079s]
Lucene ICU Analysis Components  SUCCESS [1.494s]
Lucene Phonetic Filters ... SUCCESS [0.759s]
Lucene Smart Chinese Analyzer . SUCCESS [3.534s]
Lucene Stempel Analyzer ... SUCCESS [1.537s]
Lucene Analysis Modules aggregator POM  SUCCESS [0.081s]
Lucene Benchmark .. SUCCESS [3.693s]
Lucene Modules aggregator POM . SUCCESS [0.147s]
Apache Solr parent POM  SUCCESS [0.099s]
Apache Solr Solrj . SUCCESS [3.670s]
Apache Solr Core .. FAILURE [7.842s]

On Thu, May 5, 2011 at 3:36 PM, Steven A Rowe  wrote:

> Hi Gabriele,
>
> The sequence should be
>
> 1. svn update
> 2. ant get-maven-poms
> 3. mvn -N -Pbootstrap install
>
> I think you left out #2 - there was a very recent change to the POMs that
> affects the noggit jar name.
>
> Steve
>
> > -Original Message-
> > From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com]
> > Sent: Thursday, May 05, 2011 1:22 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Is it possible to build Solr as a maven project?
> >
> > Thank you so much for this gem, David!
> >
> > I still don't manage to build though:
> > $ svn update
> > At revision 1099684.
> >
> > $ mvn clean
> >
> > $ mvn -N -Pbootstrap install
> >
> > [INFO]
> > 
> > [INFO] BUILD FAILURE
> > [INFO]
> >

Re: Text Only Extraction Using Solr and Tika

2011-05-05 Thread Emyr James
Thanks for the suggestion, but surely there must be a better way to do 
it? I don't want to post the whole file up, get it extracted on the 
server, send the extracted text back to the client, then send it all back 
up to the server again as plain text.


On 05/05/11 14:55, Jay Luker wrote:

Hi Emyr,

You could try using the "extractOnly=true" parameter [1]. Of course,
you'll need to repost the extracted text manually.

--jay

[1] http://wiki.apache.org/solr/ExtractingRequestHandler#Extract_Only







Re: why query chinese character with bracket become phrase query by default?

2011-05-05 Thread Yonik Seeley
2011/5/5 Michael McCandless :
> The very first thing every non-whitespace language Solr app should do
> is turn  off autoGeneratePhraseQueries!

Luckily, this is configurable per FieldType... so if it doesn't exist
yet, we should come up with a good
CJK fieldtype to add to the example schema.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco
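For reference, such a fieldType might look like the sketch below; the name and analyzer choice here are illustrative rather than an official example (solr.CJKTokenizerFactory ships with Solr, and autoGeneratePhraseQueries is the per-fieldType switch in question):

```xml
<!-- Illustrative CJK field type: phrase queries are not auto-generated,
     so adjacent CJK tokens are searched as independent terms. -->
<fieldType name="text_cjk" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer>
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>
```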


Re: Text Only Extraction Using Solr and Tika

2011-05-05 Thread Anuj Kumar
Hi Emyr,

You can try the XPath based approach and see if that works. Also, see if
dynamic fields can help you for the meta data fields.

References-
http://wiki.apache.org/solr/SchemaXml#Dynamic_fields
http://wiki.apache.org/solr/ExtractingRequestHandler#Input_Parameters
http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput

Regards,
Anuj

On Thu, May 5, 2011 at 7:28 PM, Emyr James  wrote:

> Thanks for the suggestion but there surely must be a better way than that
> to do it ?
> I don't want to post the whole file up, get it extracted on the server,
> send the extracted text back to the client then send it all back up to the
> server again as plain text.
>
>
> On 05/05/11 14:55, Jay Luker wrote:
>
>> Hi Emyr,
>>
>> You could try using the "extractOnly=true" parameter [1]. Of course,
>> you'll need to repost the extracted text manually.
>>
>> --jay
>>
>> [1] http://wiki.apache.org/solr/ExtractingRequestHandler#Extract_Only
>>
>>

Re: Field names with a period (.)

2011-05-05 Thread Leonardo Souza
Thanks Gora!

[ ]'s
Leonardo da S. Souza
 °v°   Linux user #375225
 /(_)\   http://counter.li.org/
 ^ ^



On Thu, May 5, 2011 at 3:09 AM, Gora Mohanty  wrote:

> On Thu, May 5, 2011 at 5:08 AM, Leonardo Souza 
> wrote:
> > Hi guys,
> >
> > Can i have a field name with a period(.) ?
> > Like in *file.size*
>
> Cannot find now where this is documented, but from what I remember it is
> recommended to use only characters A-Z, a-z, 0-9, and underscore (_) in
> field names, and some special characters are known to cause problems.
>
> Regards,
> Gora
>
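Gora's character recommendation is easy to enforce up front on the client side; a hypothetical helper (not part of Solr, and the leading-character restriction here is a conservative extra assumption):

```python
import re

# Conservative field-name pattern following the advice above: ASCII
# letters, digits, and underscore only, starting with a letter or
# underscore.
SAFE_FIELD_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")

def is_safe_field_name(name: str) -> bool:
    """Return True if `name` sticks to characters Solr handles everywhere."""
    return SAFE_FIELD_NAME.match(name) is not None
```

With this check, `file_size` passes while `file.size` is rejected before it ever reaches the schema.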


RE: Patch problems solr 1.4 - solr-2010

2011-05-05 Thread Dyer, James
There is still a functionality gap in Solr's spellchecker even with Solr-2010 
applied.  If a user enters a word that is in the dictionary, solr will never 
try to correct it.  The only way around this is to use 
spellcheck.onlyMorePopular.  The problem with this approach is 
"onlyMorePopular" causes the spellchecker to assume *every* word in the query 
is a misspelling and it won't even consider the original terms in building 
collations.  What is needed is a hybrid option that will try to build 
collations using combinations of original terms, corrected terms and "more 
popular" terms.  To my knowledge, there is no way to get the spellchecker to do 
that currently.

On the other hand, if you're pretty sure "man" is not in the dictionary, try 
upping spellcheck.count to something higher than the default (20 maybe?)...
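As a side note, the spaces in the URLs in this thread should be percent-encoded, and the documented parameter name is spellcheck.collateExtendedResults (plural). A sketch of building the collate request programmatically (host and core are placeholders); with spellcheck.maxCollationTries greater than zero, the SOLR-2010 spellchecker verifies each collation against the index, which is the "only suggest if it returns results" behavior being asked for:

```python
from urllib.parse import urlencode

def spellcheck_url(base, query):
    # Build a collate-enabled spellcheck request; urlencode escapes the
    # spaces that the raw URL in the thread leaves bare.
    params = {
        "q": query,
        "spellcheck": "true",
        "spellcheck.q": query,
        "spellcheck.build": "true",
        "spellcheck.collate": "true",
        "spellcheck.collateExtendedResults": "true",
        "spellcheck.maxCollations": 10,
        "spellcheck.maxCollationTries": 10,
    }
    return base + "/select?" + urlencode(params)

url = spellcheck_url("http://localhost:8983/solr", "man unitet")
```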

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: roySolr [mailto:royrutten1...@gmail.com] 
Sent: Thursday, May 05, 2011 3:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Patch problems solr 1.4 - solr-2010

Hello,

Thanks for the answers. I use branch 1.4 and I have successfully patched
SOLR-2010.

Now I want to use collate spellchecking. What should my URL look like? I
tried this, but it's not working (it's the same as Solr without SOLR-2010):

http://localhost:8983/solr/select?q=man unitet&spellcheck.q=man
unitet&spellcheck=true&spellcheck.build=true&spellcheck.collate=true&spellcheck.collateExtendedResult=true&spellcheck.maxCollations=10&spellcheck.maxCollationTries=10

I get the collation "man united" as a suggestion. "Man" is spelled
correctly, but not in this phrase. It should be "manchester united", and I
want Solr to re-query the collation and only return the suggestion if it
yields results. How can I fix this?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Patch-problems-solr-1-4-solr-2010-tp2898443p2902546.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Highlighting words with non-ascii chars

2011-05-05 Thread Pavel Kukačka
Thanks for the suggestion, Peter;

the problem was elsewhere though - somewhere in the highlighting
module.
I've fixed it by adding (into the field definition in schema.xml) a
custom Czech charFilter (mappings from "í" => "i") - then it started to
work as expected.

Cheers,
Pavel
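For reference, the effect of such a mapping charFilter (folding accented characters to their ASCII base letters at both index and query time) can be sketched with Unicode decomposition; this is a rough stand-in for a full Czech mapping file, not Solr's actual implementation:

```python
import unicodedata

def fold_diacritics(text: str) -> str:
    """Decompose to NFD and drop combining marks: 'slovíčko' -> 'slovicko'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

After this folding, a highlighter matching on "slovicko" no longer trips over the accented original.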


Peter Wolanin wrote on Mon 02. 05. 2011 at 17:38 +0200:
> Does your servlet container have the URI encoding set correctly, e.g.
> URIEncoding="UTF-8" for tomcat6?
> 
> http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
> 
> Older versions of Jetty use ISO-8859-1 as the default URI encoding,
> but jetty 6 should use UTF-8 as default:
> 
> http://docs.codehaus.org/display/JETTY/International+Characters+and+Character+Encodings
> 
> -Peter
> 
> On Sat, Apr 30, 2011 at 6:31 AM, Pavel Kukačka  
> wrote:
> > Hello,
> >
> >I've hit a (probably trivial) roadblock I don't know how to overcome 
> > with Solr 3.1:
> > I have a document with common fields (title, keywords, content) and I'm
> > trying to use highlighting.
> >With queries using ASCII characters there is no problem; it works 
> > smoothly. However,
> > when I search using a czech word including non-ascii chars (like "slovíčko" 
> > for example - 
> > http://localhost:8983/solr/select/?q=slov%C3%AD%C4%8Dko&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*),
> >  the document is found, but
> > the response doesn't contain the highlighted snippet in the highlighting 
> > node - there is only an
> > empty node - like this:
> > **
> > .
> > .
> > .
> > 
> >  
> > 
> > 
> >
> >
> > When searching for the other keyword ( 
> > http://localhost:8983/solr/select/?q=slovo&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=*),
> >  the resulting response is fine - like this:
> > 
> > 
> >  
> > 
> >  slovíčko  > id="highlighting">slovo
> >
> >  
> > 
> >
> > 
> >
> > Did anyone come accross this problem?
> > Cheers,
> > Pavel
> >
> >
> >
> 
> 
> 




Re: Text Only Extraction Using Solr and Tika

2011-05-05 Thread Emyr James

Hi,
I'm not really sure how these can help with my problem. Can you give a 
bit more info on this?


I think what i'm after is a fairly common request..

http://lucene.472066.n3.nabble.com/Controlling-Tika-s-metadata-td2378677.html
http://lucene.472066.n3.nabble.com/Select-tika-output-for-extract-only-td499059.html#a499062

Did the change that Yonik Seeley mentions to allow more control over the 
output ever make it into 1.4?


Regards,
Emyr

On 05/05/11 15:01, Anuj Kumar wrote:

Hi Emyr,

You can try the XPath based approach and see if that works. Also, see if
dynamic fields can help you for the meta data fields.

References-
http://wiki.apache.org/solr/SchemaXml#Dynamic_fields
http://wiki.apache.org/solr/ExtractingRequestHandler#Input_Parameters
http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput

Regards,
Anuj


Re: Text Only Extraction Using Solr and Tika

2011-05-05 Thread Ramirez, Paul M (388J)
Hey Emyr,

Looking at your stack trace below my guess is that you have two conflicting 
Apache POI jars in your classpath. The odd stack trace is indicative of that as 
the class loader is likely loading some other version of  the DirectoryNode 
class that doesn't have the iterator method. 

> java.lang.NoSuchMethodError: 
> org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;

Thanks,
Paul Ramirez


On May 5, 2011, at 6:36 AM, Emyr James wrote:

> Hi All,
> 
> I have solr and tika installed and am happily extracting and indexing 
> various files.
> Unfortunately on some word documents it blows up since it tries to 
> auto-generate a 'title' field but my title field in the schema is single 
> valued.
> 
> Here is my config for the extract handler...
> 
> <requestHandler name="/update/extract"
>   class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
>   <lst name="defaults">
>     <str name="uprefix">ignored_</str>
>   </lst>
> </requestHandler>
> 
> Is there a config option to make it only extract text, or ideally to 
> allow me to specify which metadata fields to accept ?
> 
> E.g. I'd like to use any author metadata it finds but to not use any 
> title metadata it finds as I want title to be single valued and set 
> explicitly using a literal.title in the post request.
> 
> I did look around for some docs but all i can find are very basic 
> examples. there's no comprehensive configuration documentation out there 
> as far as I can tell.
> 
> 
> ALSO...
> 
> I get some other bad responses coming back such as...
> 
> Apache Tomcat/6.0.28 - Error report
> HTTP Status 500 - org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;
> 
> java.lang.NoSuchMethodError: 
> org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;
> at 
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:168)
> at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
> at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
> at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
> at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:148)
> at 
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
> at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> at 
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
> at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
> at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
> at java.lang.Thread.run(Thread.java:636)
> type Status 
> reportmessage 
> org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;
> 
> For the above my url was...
> 
>  
> http://localhost:8080/solr/update/extract?literal.id=3922&defaultField=content&fmap.content=content&uprefix=ignored_&stream.contentType=application%2Fvnd.ms-powerpoint&commit=true&literal.title=Reactor+cycle+141&literal.not
> es=&literal.tag=UCN_production&literal.author=Maurits+van+der+Gri

Re: UIMA analysisEngine path

2011-05-05 Thread Barry Hathaway

Tommaso,

Thanks. Now Solr finds the descriptor; however, I think this is very bad 
practice.
Descriptors really aren't meant to be jarred up. They often contain 
relative paths.

For example, in my case I have a directory that looks like:
appassemble
|- desc
|- pear

where the AnalysisEngine descriptor contained in desc is an aggregate 
analysis engine and
refers to other analysis engines packaged as installed PEAR files in the 
pear subdirectory.
As such, the descriptor contains relative paths pointing into the pear 
subdirectory.
Grabbing the descriptor from the jar breaks that, since
OverridingParamsAEProvider
uses the XMLInputSource overload that has no relative-path argument.

Barry

On 5/4/2011 6:16 AM, Tommaso Teofili wrote:

Hello Barry,
the main AnalysisEngine descriptor defined inside the <analysisEngine>
element should be inside one of the jars imported with the <lib> elements.
At the moment it cannot be taken from expanded directories but it should be
easy to do it (and indeed useful) modifying the
OverridingParamsAEProvider class
[1] at line 57.
Hope this helps,
Tommaso

[1] :
http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_3_1/solr/contrib/uima/src/main/java/org/apache/solr/uima/processor/ae/OverridingParamsAEProvider.java?view=markup

2011/5/3 Barry Hathaway


I'm new to Solr and trying to get it call a UIMA aggregate analysis engine
and not having much luck.
The null pointer exception indicates that it can't find the xml file
associated with the engine.
I have tried a number of combinations of a path in the
  element, but nothing
seems to work. In addition, I've put the directory containing the
descriptor in both the classpath
when starting the server and in a  element in solrconfig.xml. So:

What "classpath" does the  tag effectively search for to
locate the descriptor?

Do the  entries in solrconfig.xml affect this classpath?

Do the engine descriptors have to be in a jar or can they be in an expanded
directory?

Thanks in advance.

Barry








Re: How do I debug "Unable to evaluate expression using this context" printed at start?

2011-05-05 Thread Gabriele Kahlout
While the question remains valid, I found the reason for my problem.
While backing up, I had saved Tomcat's context descriptor file in my $SOLR_HOME, and Solr
was trying to read it as described in the SolrCore
wiki.

What saved me was remembering Chris's earlier
remark. Thank you Chris!


On Thu, May 5, 2011 at 2:58 PM, Gabriele Kahlout
wrote:

> I've tried to re-install solr on tomcat, and now when I launch tomcat in
> debug mode I see the following exception relating to solr. It's not enough
> to understand the problem (and fix it), but I don't know where to look for
> more (or what to do). Please help me.
>
> Following the tutorial and discussion here, this is my context descriptor
> (solr.xml):
>
> <Context docBase="/Users/simpatico/SOLR_HOME/dist/solr.war"
>  crossContext="true">
>   <Environment name="solr/home" type="java.lang.String"
> value="/Users/simpatico/SOLR_HOME" override="true"/>
> </Context>
>
> (the war exists)
> $ ls $SOLR_HOME/dist/solr.war
> /Users/simpatico/SOLR_HOME//dist/solr.war
>
> $ ls $SOLR_HOME/conf/solrconfig.xml
> /Users/simpatico/SOLR_HOME//conf/solrconfig.xml
>
> When Tomcat starts:
> 
> INFO: Using JNDI solr.home: /Users/simpatico/SOLR_HOME
> May 5, 2011 2:46:50 PM org.apache.solr.core.SolrResourceLoader <init>
> INFO: Solr home set to '/Users/simpatico/SOLR_HOME/'
> ...
> INFO: Adding 'file:/Users/simpatico/SOLR_HOME/lib/wstx-asl-3.2.7.jar' to
> classloader
> May 5, 2011 2:46:50 PM org.apache.solr.common.SolrException log
> SEVERE:
> *javax.xml.transform.TransformerException: Unable to evaluate expression
> using this context*
> at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:363)
> at
> com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:213)
> at
> com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
> at
> org.apache.solr.core.CoreContainer.readProperties(CoreContainer.java:303)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:242)
> at
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
> at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
> at
> org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273)
> at
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254)
> at
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372)
> at
> org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:98)
> at
> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4382)
> at
> org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5040)
> at
> org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5035)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:680)
> Caused by: java.lang.RuntimeException: Unable to evaluate expression using
> this context
> at
> com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:212)
> at
> com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:210)
> at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:335)
> ... 18 more
> -
> java.lang.RuntimeException: Unable to evaluate expression using this
> context
> at
> com.sun.org.apache.xpath.internal.axes.NodeSequence.setRoot(NodeSequence.java:212)
> at
> com.sun.org.apache.xpath.internal.axes.LocPathIterator.execute(LocPathIterator.java:210)
> at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:335)
> at
> com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:213)
> at
> com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
> at
> org.apache.solr.core.CoreContainer.readProperties(CoreContainer.java:303)
> at org.apache.solr.core.CoreContainer.load(CoreContainer.java:242)
> at
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
> at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
> at
> org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:273)
> at
> org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:254)
> at
> org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:372)
> at
> org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:98)
> at
> org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4382)
> at
> org.apache.catalina.core.StandardContext$2.call(StandardContext.java:5040)
> 

SpellCheckComponent issue

2011-05-05 Thread Siddharth Powar
Hi,

(Sorry, emailing again because the last post was not posted...)

I have been using the Solr SpellCheckComponent. One of my requirements is
that if a user types something like "add", Solr would return "adidas". To
get something like this, I used EdgeNGramFilterFactory and applied it to
the fields that I am indexing. So for adidas I will have something like "a",
"ad", "adi", "adid"... Correct me if I'm wrong: shouldn't the distance
algorithm used internally match adidas with this approach?


Thanks,
Sid
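For reference, the edge n-gram analysis the question describes corresponds to a field type along these lines in schema.xml (the type name and gram sizes here are assumptions, not the poster's actual config):

```xml
<fieldType name="autocomplete_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- for "adidas" this emits: a, ad, adi, adid, adida, adidas -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note the asymmetric index/query analysis: only indexed terms are expanded into prefixes, so the query "add" can match the stored gram "add" directly.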


Re: fast case-insensitive autocomplete

2011-05-05 Thread Jan Høydahl
Hi,

Try this solution using a Solr core: 
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 5. mai 2011, at 15.22, Kusenda, Brandyn J wrote:

> Hi.
> I need an autocomplete solution to handle case-insensitive queries but
> return the original text with the case still intact.   I've experimented
> with both the Suggester and TermsComponent methods.  TermsComponent works
> when I use the regex option; however, it is far too slow.   I get the speed I
> want by using terms.prefix or by using the Suggester, but both are case
> sensitive.
> 
> Here is an example operating on a user directory:
> 
> Query: bran
> Results: Branden Smith, Brandon Thompson, Brandon Verner, Brandy Finny, Brian 
> Smith, ...
> 
> A solution that I would expect to work would be to store two fields; one
> containing the original text and the other containing the lowercase.  Then
> convert the query to lower case and run the query against the lower case
> field and return the original (case preserved) field.
> Unfortunately, I can't get a TermsComponent query to return additional
> fields.  It only returns the field it's searching against.  Should this work,
> or can I only return additional fields for standard queries?
> 
> Thanks in advance,
> Brandyn
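The two-field approach described in the question can be sketched in schema.xml roughly as follows (field and type names are assumptions; the client lowercases the query, searches name_lc, and displays the stored name):

```xml
<fieldType name="name_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- keep the whole name as a single lowercased token,
         so prefix matching is case-insensitive -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name"    type="string"      indexed="true" stored="true"/>
<field name="name_lc" type="name_prefix" indexed="true" stored="false"/>
<copyField source="name" dest="name_lc"/>
```

Note that this works with an ordinary prefix query (e.g. q=name_lc:bran*) that returns stored fields, rather than with terms.prefix, since TermsComponent only returns terms from the field being queried.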



RE: Is it possible to build Solr as a maven project?

2011-05-05 Thread Steven A Rowe
Hi Gabriele,

On 5/5/2011 at 9:57 AM, Gabriele Kahlout wrote:
> Okay, that sequence worked, but then shouldn't I be able to do $ mvn
> install afterwards? This is what I get:
...
> COMPILATION ERROR :
> -
> org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[7,27]
> package com.google.common.io does not exist
> org/apache/solr/spelling/suggest/fst/FSTLookup.java:[28,32] package
> com.google.common.collect does not exist
...

"mvn install" should work, but it doesn't - I can reproduce this error on my 
machine.  This is a bug in the Maven build.  

The nightly Lucene/Solr Maven build on Jenkins should have caught this 
compilation failure three weeks ago, when Dawid Weiss committed his work under 
SOLR-2378.  Unfortunately, the nightly 
builds were using the results of compilation under the Ant build, rather than 
compiling from scratch.  I have committed a fix to the nightly build script so 
this won't happen again.

The Maven build bug is that the Solr-core Google Guava dependency was scoped as 
test-only.  Until SOLR-2378, that was true, but it is no longer.  So the fix is 
simply to remove the <scope>test</scope> element from the dependency declaration in the 
Solr-core POM.  I've committed this too.

If you "svn update" you will get these two fixes.

Thank you very much for persisting, and reporting the problems you have 
encountered.

Steve



Re: apache-solr-3.1 slow stats component queries

2011-05-05 Thread Johannes Goll
Hi,

I bench-marked the slow stats queries (6 point estimate) using the same
hardware on an index of size 104M. We use a Solr/Lucene 3.1-mod which
returns only the sum and count for statistics component results. Solr/Lucene
is run on jetty.

The relationship between query time and set of found documents is linear
when using the stats component (R^2 0.99). I guess this is expected as the
application needs to scan/sum-up the stat field for all matching documents?

Are there any plans for caching stat results for a certain stat field along
with the documents that match a filter query ? Any other ideas that could
help to improve this (hardware/software configuration) ?  Even for a subset
of 10M entries, the stat search takes on the order of 10 seconds.

Thanks in advance.
Johannes



2011/4/18 Johannes Goll 

> any ideas why in this case the stats summaries are so slow  ?  Thank you
> very much in advance for any ideas/suggestions. Johannes
>
>
> 2011/4/5 Johannes Goll 
>
>> Hi,
>>
>> thank you for making the new apache-solr-3.1 available.
>>
>> I have installed the version from
>>
>> http://apache.tradebit.com/pub//lucene/solr/3.1.0/
>>
>> and am running into very slow stats component queries (~ 1 minute)
>> for fetching the computed sum of the stats field
>>
>> url: ?q=*:*&start=0&rows=0&stats=true&stats.field=weight
>>
>> 52825
>>
>> #documents: 78,359,699
>> total RAM: 256G
>> vm arguments:  -server -Xmx40G
>>
>> the stats.field specification is as follows:
>> <field name="weight" … stored="false" required="true" multiValued="false"
>> default="1"/>
>>
>> filter queries that narrow down the #docs help to reduce it -
>> QTime seems to be proportional to the number of docs being returned
>> by a filter query.
>>
>> Is there any way to improve the performance of such stats queries ?
>> Caching only helped to improve the filter query performance but if
>> larger subsets are being returned, QTime increases unacceptably.
>>
>> Since I only need the sum and not the STD or sumsOfSquares/Min/Max,
>> I have created a custom 3.1 version that does only return the sum. But
>> this
>> only slightly improved the performance. Of course I could somehow cache
>> the larger sum queries on the client side but I want to do this only as a
>> last resort.
>>
>> Thank you very much in advance for any ideas/suggestions.
>>
>> Johannes
>>
>>
>
>
> --
> Johannes Goll
> 211 Curry Ford Lane
> Gaithersburg, Maryland 20878
>


Re: Is it possible to build Solr as a maven project?

2011-05-05 Thread lboutros
Thanks Steve, this will be much simpler next time :)

Is it documented somewhere? If not, perhaps we could add something, in this
page for example:

http://wiki.apache.org/solr/FrontPage#Solr_Development

or here :

http://wiki.apache.org/solr/NightlyBuilds

Ludovic.

2011/5/5 steve_rowe [via Lucene] <
ml-node+2904178-33932273-383...@n3.nabble.com>

> Hi Gabriele,
>
> On 5/5/2011 at 9:57 AM, Gabriele Kahlout wrote:
> > Okay, that sequence worked, but then shouldn't I be able to do $ mvn
> > install afterwards? This is what I get:
> ...
> > COMPILATION ERROR :
> > -
> > org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[7,27]
> > package com.google.common.io does not exist
> > org/apache/solr/spelling/suggest/fst/FSTLookup.java:[28,32] package
> > com.google.common.collect does not exist
> ...
>
> "mvn install" should work, but it doesn't - I can reproduce this error on
> my machine.  This is a bug in the Maven build.
>
> The nightly Lucene/Solr Maven build on Jenkins should have caught this
> compilation failure three weeks ago, when Dawid Weiss committed his work
> under SOLR-2378.  Unfortunately,
> the nightly builds were using the results of compilation under the Ant
> build, rather than compiling from scratch.  I have committed a fix to the
> nightly build script so this won't happen again.
>
> The Maven build bug is that the Solr-core Google Guava dependency was
> scoped as test-only.  Until SOLR-2378, that was true, but it is no longer.
>  So the fix is simply to remove the <scope>test</scope> element from the dependency
> declaration in the Solr-core POM.  I've committed this too.
>
> If you "svn update" you will get these two fixes.
>
> Thank you very much for persisting, and reporting the problems you have
> encountered.
>
> Steve
>
>
>


-
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-possible-to-build-Solr-as-a-maven-project-tp2898068p2904375.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Is it possible to build Solr as a maven project?

2011-05-05 Thread Gabriele Kahlout
Steven, thank you!

$ mvn -DskipTests=true install
works!

[INFO] Reactor Summary:
[INFO]
[INFO] Grandparent POM for Apache Lucene Java and Apache Solr  SUCCESS
[13.142s]
[INFO] Lucene parent POM . SUCCESS [0.345s]
[INFO] Lucene Core ... SUCCESS [18.448s]
[INFO] Lucene Test Framework . SUCCESS [3.560s]
[INFO] Lucene Common Analyzers ... SUCCESS [7.739s]
[INFO] Lucene Contrib Ant  SUCCESS [1.265s]
[INFO] Lucene Contrib bdb  SUCCESS [1.332s]
[INFO] Lucene Contrib bdb-je . SUCCESS [1.321s]
[INFO] Lucene Database aggregator POM  SUCCESS [0.242s]
[INFO] Lucene Demo ... SUCCESS [1.813s]
[INFO] Lucene Memory . SUCCESS [2.412s]
[INFO] Lucene Queries  SUCCESS [2.275s]
[INFO] Lucene Highlighter  SUCCESS [2.985s]
[INFO] Lucene InstantiatedIndex .. SUCCESS [2.170s]
[INFO] Lucene Lucli .. SUCCESS [1.814s]
[INFO] Lucene Miscellaneous .. SUCCESS [1.998s]
[INFO] Lucene Query Parser ... SUCCESS [2.755s]
[INFO] Lucene Spatial  SUCCESS [1.314s]
[INFO] Lucene Spellchecker ... SUCCESS [1.535s]
[INFO] Lucene Swing .. SUCCESS [1.233s]
[INFO] Lucene Wordnet  SUCCESS [1.309s]
[INFO] Lucene XML Query Parser ... SUCCESS [1.483s]
[INFO] Lucene Contrib aggregator POM . SUCCESS [0.151s]
[INFO] Lucene ICU Analysis Components  SUCCESS [2.728s]
[INFO] Lucene Phonetic Filters ... SUCCESS [1.765s]
[INFO] Lucene Smart Chinese Analyzer . SUCCESS [3.709s]
[INFO] Lucene Stempel Analyzer ... SUCCESS [4.241s]
[INFO] Lucene Analysis Modules aggregator POM  SUCCESS [0.213s]
[INFO] Lucene Benchmark .. SUCCESS [2.926s]
[INFO] Lucene Modules aggregator POM . SUCCESS [0.307s]
[INFO] Apache Solr parent POM  SUCCESS [0.233s]
[INFO] Apache Solr Solrj . SUCCESS [3.780s]
[INFO] Apache Solr Core .. SUCCESS [9.693s]
[INFO] Apache Solr Search Server . SUCCESS [6.739s]
[INFO] Apache Solr Test Framework  SUCCESS [2.699s]
[INFO] Apache Solr Analysis Extras ... SUCCESS [3.868s]
[INFO] Apache Solr Clustering  SUCCESS [6.736s]
[INFO] Apache Solr DataImportHandler . SUCCESS [4.914s]
[INFO] Apache Solr DataImportHandler Extras .. SUCCESS [2.721s]
[INFO] Apache Solr DataImportHandler aggregator POM .. SUCCESS [0.253s]
[INFO] Apache Solr Content Extraction Library  SUCCESS [1.909s]
[INFO] Apache Solr - UIMA integration  SUCCESS [1.922s]
[INFO] Apache Solr Contrib aggregator POM  SUCCESS [0.211s]
[INFO]

[INFO] BUILD SUCCESS
[INFO]

[INFO] Total time: 2:18.040s
[INFO] Finished at: Thu May 05 20:39:09 CEST 2011
[INFO] Final Memory: 38M/90M
[INFO]


On Thu, May 5, 2011 at 6:53 PM, Steven A Rowe  wrote:

> Hi Gabriele,
>
> On 5/5/2011 at 9:57 AM, Gabriele Kahlout wrote:
> > Okay, that sequence worked, but then shouldn't I be able to do $ mvn
> > install afterwards? This is what I get:
> ...
> > COMPILATION ERROR :
> > -
> > org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[7,27]
> > package com.google.common.io does not exist
> > org/apache/solr/spelling/suggest/fst/FSTLookup.java:[28,32] package
> > com.google.common.collect does not exist
> ...
>
> "mvn install" should work, but it doesn't - I can reproduce this error on
> my machine.  This is a bug in the Maven build.
>
> The nightly Lucene/Solr Maven build on Jenkins should have caught this
> compilation failure three weeks ago, when Dawid Weiss committed his work
> under SOLR-2378.  Unfortunately,
> the nightly builds were using the results of compilation under the Ant
> build, rather than compiling from scratch.  I have committed a fix to the
> nightly build script so this won't happen again.
>
> The Maven build bug is that the Solr-core Google Guava dependency was
> scoped as test-only.  Until SOLR-2378, that was true, but

OverlappingFileLockException when concurrent commits in solr

2011-05-05 Thread nitesh nandy
Hello,

I'm using Solr version 1.4.0 with Tomcat 6. I have 2 Solr instances running as
2 different web apps with separate data folders. My application requires
frequent commits from multiple clients. I've noticed that when more than one
client tries to commit at the same time, these OverlappingFileLockExceptions
start to appear. Can anything be done to rectify this problem? Please find
the error log below. Thanks

---
HTTP Status 500 - null

java.nio.channels.OverlappingFileLockException
at
sun.nio.ch.FileChannelImpl$SharedFileLockTable.checkList(FileChannelImpl.java:1215)
 at
sun.nio.ch.FileChannelImpl$SharedFileLockTable.add(FileChannelImpl.java:1117)
at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:923)
 at java.nio.channels.FileChannel.tryLock(FileChannel.java:978)
at org.apache.lucene.store.NativeFSLock.obtain(NativeFSLockFactory.java:233)
 at org.apache.lucene.store.Lock.obtain(Lock.java:73)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1550)
 at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1407)
at
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:190)
 at
org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:98)
at
org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173)
 at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:220)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
 at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
 at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1317)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
 at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
 at java.lang.Thread.run(Thread.java:636)

DIH for e-mails

2011-05-05 Thread m _ 米蟲ы~
I'm using Data Import Handler to index emails.


The problem is that I want to add my own field, such as security_number.


Does anyone have any ideas?


Regards,


--
  
 James  Bond Fang


DIH for e-mails

2011-05-05 Thread 方振鹏



I'm using Data Import Handler to index emails.

The problem is that I want to add my own field, such as security_number.

Does anyone have any ideas?

Regards,

Jame Bond Fang



Re: DIH for e-mails

2011-05-05 Thread Peter Sturge
The best way to add your own fields is to create a custom Transformer subclass.
See:
http://www.lucidimagination.com/search/out?u=http%3A%2F%2Fwiki.apache.org%2Fsolr%2FDataImportHandler

This will guide you through the steps.

Peter


2011/5/5 方振鹏 :
>
>
>
> I’m using Data Import Handler for index emails.
>
> The problem is that I wanna add my own field such as security_number.
>
> Someone have any idea?
>
> Regards,
>
> Jame Bond Fang
>
>
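As a lighter-weight variant of the Transformer suggestion, a custom field can also be injected without compiling Java by using DIH's built-in ScriptTransformer inside data-config.xml. A hedged sketch only: the connection attributes are placeholders, and the rule deriving security_number is purely hypothetical:

```xml
<dataConfig>
  <script><![CDATA[
    function addSecurityNumber(row) {
      /* hypothetical rule: derive security_number from the message id */
      row.put('security_number', 'SN-' + row.get('messageId'));
      return row;
    }
  ]]></script>
  <document>
    <entity name="mails"
            processor="MailEntityProcessor"
            transformer="script:addSecurityNumber"
            user="someone@example.com" password="..."
            host="imap.example.com" protocol="imaps"/>
  </document>
</dataConfig>
```

The script function receives each row as a map, so any field added to it ends up in the indexed document, provided the schema has a matching (or dynamic) field.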


Re: How do I modify XMLWriter to write foobar?

2011-05-05 Thread Chris Hostetter

: $ xmlstarlet sel -t -c "/config/queryResponseWriter" conf/solrconfig.xml
: 
: 
: Now I comment the line in Solrconfix.xml, and there's no more writer.
: $ xmlstarlet sel -t -c "/config/queryResponseWriter" conf/solrconfig.xml
: 
: I make a query, and the XMLResponseWriter is still in charge.
: *$ curl -L http://localhost:8080/solr/select?q=apache*
: 

...

Your example request is not specifying a "wt" param.

in addition to the response writers declared in your solrconfig.xml, there 
are response writers that exist implicitly unless you define your own 
instances that override those names (xml, json, python, etc...)

the real question is: what writer do you *want* to have used when no wt is 
specified?

whatever the answer is: declare an instance of that writer with 
default="true" in your solrconfig.xml


-Hoss
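Concretely, the declaration described above looks like this in solrconfig.xml (JSON is picked here only as an example; any registered writer class works):

```xml
<!-- with default="true", requests that omit the wt parameter use this writer -->
<queryResponseWriter name="json"
                     class="solr.JSONResponseWriter"
                     default="true"/>
```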


Re: Indexing 20M documents from MySQL with DIH

2011-05-05 Thread Alexey Serba
{quote}
...
Caused by: java.io.EOFException: Can not read response from server.
Expected to read 4 bytes, read 0 bytes before connection was
unexpectedly lost.
   at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2539)
   at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2989)
   ... 22 more
Apr 21, 2011 3:53:28 AM
org.apache.solr.handler.dataimport.EntityProcessorBase getNext
SEVERE: getNext() failed for query 'REDACTED'
org.apache.solr.handler.dataimport.DataImportHandlerException:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
Communications link failure

The last packet successfully received from the server was 128
milliseconds ago.  The last packet sent successfully to the server was
25,273,484 milliseconds ago.
...
{quote}

It could probably be because of autocommit / segment merging. You
could try to disable autocommit / increase mergeFactor
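For reference, both knobs live in solrconfig.xml; a sketch with made-up values (illustrative, not tuning advice):

```xml
<indexDefaults>
  <!-- Higher mergeFactor means fewer, later segment merges during a bulk import -->
  <mergeFactor>25</mergeFactor>
</indexDefaults>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- autoCommit block omitted entirely, so nothing commits mid-import -->
</updateHandler>
```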

{quote}
I've used sphinx in the past, which uses multiple queries to pull out
a subset of records ranged based on PrimaryKey, does Solr offer
functionality similar to this? It seems that once a Solr index gets to
a certain size, the indexing of a batch takes longer than MySQL's
net_write_timeout, so it kills the connection.
{quote}

I was thinking about some hackish solution to paginate results

  
  

Or something along those lines (you'd need to calculate the offset in the
pages query).

But unfortunately MySQL does not provide a generate_series function
(it's a Postgres function, and there are similar solutions for Oracle and
MSSQL).
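The kind of nested-entity pagination hack being alluded to might look like this in data-config.xml (table, columns, and page size are invented; the outer query fakes generate_series with a UNION):

```xml
<!-- Sketch only: the outer entity emits page offsets, the inner entity fetches one page -->
<entity name="pages" query="SELECT 0 AS off UNION SELECT 10000 UNION SELECT 20000">
  <entity name="item"
          query="SELECT id, title FROM items ORDER BY id LIMIT 10000 OFFSET ${pages.off}">
    <field column="id" name="id"/>
    <field column="title" name="title"/>
  </entity>
</entity>
```

Each inner query then stays short enough that it should not trip MySQL's net_write_timeout.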


On Mon, Apr 25, 2011 at 3:59 AM, Scott Bigelow  wrote:
> Thank you everyone for your help. I ended up getting the index to work
> using the exact same config file on a (substantially) larger instance.
>
> On Fri, Apr 22, 2011 at 5:46 AM, Erick Erickson  
> wrote:
>> {{{A custom indexer, so that's a fairly common practice? So when you are
>> dealing with these large indexes, do you try not to fully rebuild them
>> when you can? It's not a nightly thing, but something to do in case of
>> a disaster? Is there a difference in the performance of an index that
>> was built all at once vs. one that has had delta inserts and updates
>> applied over a period of months?}}}
>>
>> Is it a common practice? Like all of this, "it depends". It's certainly
>> easier to let DIH do the work. Sometimes DIH doesn't have all the
>> capabilities necessary. Or as Chris said, in the case where you already
>> have a system built up and it's easier to just grab the output from
>> that and send it to Solr, perhaps with SolrJ and not use DIH. Some people
>> are just more comfortable with their own code...
>>
>> "Do you try not to fully rebuild". It depends on how painful a full rebuild
>> is. Some people just like the simplicity of starting over every 
>> day/week/month.
>> But you *have* to be able to rebuild your index in case of disaster, and
>> a periodic full rebuild certainly keeps that process up to date.
>>
>> "Is there a difference...delta inserts...updates...applied over months". Not
>> if you do an optimize. When a document is deleted (or updated), it's only
>> marked as deleted. The associated data is still in the index. Optimize will
>> reclaim that space and compact the segments, perhaps down to one.
>> But there's no real operational difference between a newly-rebuilt index
>> and one that's been optimized. If you don't delete/update, there's not
>> much reason to optimize either
>>
>> I'll leave the DIH to others..
>>
>> Best
>> Erick
>>
>> On Thu, Apr 21, 2011 at 8:09 PM, Scott Bigelow  wrote:
>>> Thanks for the e-mail. I probably should have provided more details,
>>> but I was more interested in making sure I was approaching the problem
>>> correctly (using DIH, with one big SELECT statement for millions of
>>> rows) instead of solving this specific problem. Here's a partial
>>> stacktrace from this specific problem:
>>>
>>> ...
>>> Caused by: java.io.EOFException: Can not read response from server.
>>> Expected to read 4 bytes, read 0 bytes before connection was
>>> unexpectedly lost.
>>>        at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2539)
>>>        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2989)
>>>        ... 22 more
>>> Apr 21, 2011 3:53:28 AM
>>> org.apache.solr.handler.dataimport.EntityProcessorBase getNext
>>> SEVERE: getNext() failed for query 'REDACTED'
>>> org.apache.solr.handler.dataimport.DataImportHandlerException:
>>> com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
>>> Communications link failure
>>>
>>> The last packet successfully received from the server was 128
>>> milliseconds ago.  The last packet sent successfully to the server was
>>> 25,273,484 milliseconds ago.
>>> ...
>>>
>>>
>>> A custom indexer, so that's a fairly common practice? So when you are
>>> dealing with these large indexes, do you try not to fully rebuild them
>>> when you can? It's not a nightly thing, but something to do in case of
>>> a disaster? Is there a difference in the performance

RE: Is it possible to build Solr as a maven project?

2011-05-05 Thread Steven A Rowe
You're welcome, I'm glad you got it to work. - Steve

> -Original Message-
> From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com]
> Sent: Thursday, May 05, 2011 2:41 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Is it possible to build Solr as a maven project?
> 
> Steven, thank you!
> 
> $ mvn -DskipTests=true install
> works!
> 
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Grandparent POM for Apache Lucene Java and Apache Solr  SUCCESS
> [13.142s]
> [INFO] Lucene parent POM . SUCCESS
> [0.345s]
> [INFO] Lucene Core ... SUCCESS
> [18.448s]
> [INFO] Lucene Test Framework . SUCCESS
> [3.560s]
> [INFO] Lucene Common Analyzers ... SUCCESS
> [7.739s]
> [INFO] Lucene Contrib Ant  SUCCESS
> [1.265s]
> [INFO] Lucene Contrib bdb  SUCCESS
> [1.332s]
> [INFO] Lucene Contrib bdb-je . SUCCESS
> [1.321s]
> [INFO] Lucene Database aggregator POM  SUCCESS
> [0.242s]
> [INFO] Lucene Demo ... SUCCESS
> [1.813s]
> [INFO] Lucene Memory . SUCCESS
> [2.412s]
> [INFO] Lucene Queries  SUCCESS
> [2.275s]
> [INFO] Lucene Highlighter  SUCCESS
> [2.985s]
> [INFO] Lucene InstantiatedIndex .. SUCCESS
> [2.170s]
> [INFO] Lucene Lucli .. SUCCESS
> [1.814s]
> [INFO] Lucene Miscellaneous .. SUCCESS
> [1.998s]
> [INFO] Lucene Query Parser ... SUCCESS
> [2.755s]
> [INFO] Lucene Spatial  SUCCESS
> [1.314s]
> [INFO] Lucene Spellchecker ... SUCCESS
> [1.535s]
> [INFO] Lucene Swing .. SUCCESS
> [1.233s]
> [INFO] Lucene Wordnet  SUCCESS
> [1.309s]
> [INFO] Lucene XML Query Parser ... SUCCESS
> [1.483s]
> [INFO] Lucene Contrib aggregator POM . SUCCESS
> [0.151s]
> [INFO] Lucene ICU Analysis Components  SUCCESS
> [2.728s]
> [INFO] Lucene Phonetic Filters ... SUCCESS
> [1.765s]
> [INFO] Lucene Smart Chinese Analyzer . SUCCESS
> [3.709s]
> [INFO] Lucene Stempel Analyzer ... SUCCESS
> [4.241s]
> [INFO] Lucene Analysis Modules aggregator POM  SUCCESS
> [0.213s]
> [INFO] Lucene Benchmark .. SUCCESS
> [2.926s]
> [INFO] Lucene Modules aggregator POM . SUCCESS
> [0.307s]
> [INFO] Apache Solr parent POM  SUCCESS
> [0.233s]
> [INFO] Apache Solr Solrj . SUCCESS
> [3.780s]
> [INFO] Apache Solr Core .. SUCCESS
> [9.693s]
> [INFO] Apache Solr Search Server . SUCCESS
> [6.739s]
> [INFO] Apache Solr Test Framework  SUCCESS
> [2.699s]
> [INFO] Apache Solr Analysis Extras ... SUCCESS
> [3.868s]
> [INFO] Apache Solr Clustering  SUCCESS
> [6.736s]
> [INFO] Apache Solr DataImportHandler . SUCCESS
> [4.914s]
> [INFO] Apache Solr DataImportHandler Extras .. SUCCESS
> [2.721s]
> [INFO] Apache Solr DataImportHandler aggregator POM .. SUCCESS
> [0.253s]
> [INFO] Apache Solr Content Extraction Library  SUCCESS
> [1.909s]
> [INFO] Apache Solr - UIMA integration  SUCCESS
> [1.922s]
> [INFO] Apache Solr Contrib aggregator POM  SUCCESS
> [0.211s]
> [INFO]
> 
> [INFO] BUILD SUCCESS
> [INFO]
> 
> [INFO] Total time: 2:18.040s
> [INFO] Finished at: Thu May 05 20:39:09 CEST 2011
> [INFO] Final Memory: 38M/90M
> [INFO]
> 
> 
> On Thu, May 5, 2011 at 6:53 PM, Steven A Rowe  wrote:
> 
> > Hi Gabriele,
> >
> > On 5/5/2011 at 9:57 AM, Gabriele Kahlout wrote:
> > > Okay, that sequence worked, but then shouldn't I be able to do $ mvn
> > > install afterwards? This is what I get:
> > ...
> > > COMPILATION ERROR :
> > > -
> > > org/apache/solr/spelling/suggest/fst/InputStreamDataInput.java:[7,27]
> > > package com.google.common.io does not exist
> > > org/apache/solr/spelling/suggest/fst/FSTLookup.java:[28,32] package
> > > com.google.common.collect does not exist
> > ...
> >
> > "mvn install" should work, but it doesn't - I can reproduce this error
> on
> > my machine.  This is a bug in the Maven build.
> >
> > The nightly Lucene/Sol

Re: SpellCheckComponent issue

2011-05-05 Thread Em
Hi Sid,

unfortunately not, and as far as I know it is not possible to realize your
requirements with Solr's spellcheck packages (I'm talking about v1.4, since
there are some changes in 3.1).

Regards,
Em

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SpellCheckComponent-issue-tp2903926p2904839.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr: org.apache.solr.common.SolrException: Invalid Date String:

2011-05-05 Thread Rohit
Hi,

I am new to Solr and this is my first attempt at indexing Solr data. I am
getting the following exception while indexing:

org.apache.solr.common.SolrException: Invalid Date String:'2011-01-07' at
org.apache.solr.schema.DateField.parseMath(DateField.java:165) at
org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:169) at
org.apache.solr.schema.SchemaField.createField(SchemaField.java:98) at
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204) at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)

I understand from reading some articles that Solr stores time only in UTC.
This is the query I am trying to index:

Select id, text, 'language', links, tweetType, source, location,
bio, url, utcOffset, timeZone, frenCnt, createdAt, createdOnGMT, createdOnServerTime,
follCnt, favCnt, totStatusCnt, usrCrtDate, humanSentiment, replied, replyMsg,
classified, locationDetail,
geonameid, country, continent, placeLongitude, placeLatitude, listedCnt, hashtag,
mentions, senderInfScr, createdOnGMTDate,
DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+05:30'),'%Y-%m-%d') as IST,
DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+01:00'),'%Y-%m-%d') as ECT,
DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+02:00'),'%Y-%m-%d') as EET,
DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+03:30'),'%Y-%m-%d') as MET,
sign(classified) as sentiment from

Why I am doing this timezone conversion is because I need to group results
by the user's timezone. How can I achieve this?

Regards, Rohit

 



Re: How do i I modify XMLWriter to write foobar?

2011-05-05 Thread Gabriele Kahlout
I've now tried to write my own QueryResponseWriter plugin[1], as a maven
project depending on Solr Core 3.1, which is the same version of Solr I've
installed. It seems I'm not able to get rid of some cache.


$ xmlstarlet sel -t -c "/config/queryResponseWriter" conf/solrconfig.xml



Restarted tomcat after changing solrconfig.xml and placing indexplugins.jar
in $SOLR_HOME/
At tomcat boot:
INFO: Adding 'file:/Users/simpatico/SOLR_HOME/lib/IndexPlugins.jar' to
classloader

I get legacy code of the plugin for both, and I don't understand why. At
least the xml should be different. Why could this be? How to find out?
http://localhost:8080/solr/select?q=apache&wt=Test and
http://localhost:8080/solr/select?q=apache&wt=xml
XML Parsing Error: syntax error
Location: http://localhost:8080/solr/select?q=apache&wt=xml (//Test
Line Number 1, Column 1:
foobarresponseHeaderstatusQTimeparamsqapachewtxmlresponse00foobar
^

It seems the new code for TestQueryResponseWriter[1] seems to never be
executed since i added a severe log statement that doesn't appear in tomcat
logs. Where are those caches?

Thank you in advance.

[1]
package com.mysimpatico.me.indexplugins;

import java.io.*;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.solr.request.XMLResponseWriter;


/**
 * Hello world!
 *
 */
public class TestQueryResponseWriter extends XMLResponseWriter {

    @Override
    public void write(Writer writer,
                      org.apache.solr.request.SolrQueryRequest request,
                      org.apache.solr.response.SolrQueryResponse response)
            throws IOException {
        Logger.getLogger(TestQueryResponseWriter.class.getName())
              .log(Level.SEVERE, "Hello from TestQueryResponseWriter");
        super.write(writer, request, response);
    }
}


On Thu, May 5, 2011 at 9:01 PM, Chris Hostetter wrote:

>
> : $ xmlstarlet sel -t -c "/config/queryResponseWriter" conf/solrconfig.xml
> : 
> :
> : Now I comment the line in solrconfig.xml, and there's no more writer.
> : $ xmlstarlet sel -t -c "/config/queryResponseWriter" conf/solrconfig.xml
> :
> : I make a query, and the XMLResponseWriter is still in charge.
> : *$ curl -L http://localhost:8080/solr/select?q=apache*
> : 
>
> ...
>
> Your example request is not specifying a "wt" param.
>
> in addition to the response writers declared in your solrconfig.xml, there
> are response writers that exist implicitly unless you define your own
> instances that override those names (xml, json, python, etc...)
>
> the real question is: what writer do you *want* to have used when no wt is
> specified?
>
> whatever the answer is: declare an instance of that writer with
> default="true" in your solrconfig.xml
>
>
> -Hoss
>



-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: Is it possible to build Solr as a maven project?

2011-05-05 Thread Gabriele Kahlout
Just for the reference.

$ svn update
At revision 1099940.

On Thu, May 5, 2011 at 9:14 PM, Steven A Rowe  wrote:

> You're welcome, I'm glad you got it to work. - Steve
>
> > -Original Message-
> > From: Gabriele Kahlout [mailto:gabri...@mysimpatico.com]
> > Sent: Thursday, May 05, 2011 2:41 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Is it possible to build Solr as a maven project?
> >
> > Steven, thank you!
> >
> > $ mvn -DskipTests=true install
> > works!
> >
> > [INFO] Reactor Summary:
> > [INFO]
> > [INFO] Grandparent POM for Apache Lucene Java and Apache Solr  SUCCESS
> > [13.142s]
> > [INFO] Lucene parent POM . SUCCESS
> > [0.345s]
> > [INFO] Lucene Core ... SUCCESS
> > [18.448s]
> > [INFO] Lucene Test Framework . SUCCESS
> > [3.560s]
> > [INFO] Lucene Common Analyzers ... SUCCESS
> > [7.739s]
> > [INFO] Lucene Contrib Ant  SUCCESS
> > [1.265s]
> > [INFO] Lucene Contrib bdb  SUCCESS
> > [1.332s]
> > [INFO] Lucene Contrib bdb-je . SUCCESS
> > [1.321s]
> > [INFO] Lucene Database aggregator POM  SUCCESS
> > [0.242s]
> > [INFO] Lucene Demo ... SUCCESS
> > [1.813s]
> > [INFO] Lucene Memory . SUCCESS
> > [2.412s]
> > [INFO] Lucene Queries  SUCCESS
> > [2.275s]
> > [INFO] Lucene Highlighter  SUCCESS
> > [2.985s]
> > [INFO] Lucene InstantiatedIndex .. SUCCESS
> > [2.170s]
> > [INFO] Lucene Lucli .. SUCCESS
> > [1.814s]
> > [INFO] Lucene Miscellaneous .. SUCCESS
> > [1.998s]
> > [INFO] Lucene Query Parser ... SUCCESS
> > [2.755s]
> > [INFO] Lucene Spatial  SUCCESS
> > [1.314s]
> > [INFO] Lucene Spellchecker ... SUCCESS
> > [1.535s]
> > [INFO] Lucene Swing .. SUCCESS
> > [1.233s]
> > [INFO] Lucene Wordnet  SUCCESS
> > [1.309s]
> > [INFO] Lucene XML Query Parser ... SUCCESS
> > [1.483s]
> > [INFO] Lucene Contrib aggregator POM . SUCCESS
> > [0.151s]
> > [INFO] Lucene ICU Analysis Components  SUCCESS
> > [2.728s]
> > [INFO] Lucene Phonetic Filters ... SUCCESS
> > [1.765s]
> > [INFO] Lucene Smart Chinese Analyzer . SUCCESS
> > [3.709s]
> > [INFO] Lucene Stempel Analyzer ... SUCCESS
> > [4.241s]
> > [INFO] Lucene Analysis Modules aggregator POM  SUCCESS
> > [0.213s]
> > [INFO] Lucene Benchmark .. SUCCESS
> > [2.926s]
> > [INFO] Lucene Modules aggregator POM . SUCCESS
> > [0.307s]
> > [INFO] Apache Solr parent POM  SUCCESS
> > [0.233s]
> > [INFO] Apache Solr Solrj . SUCCESS
> > [3.780s]
> > [INFO] Apache Solr Core .. SUCCESS
> > [9.693s]
> > [INFO] Apache Solr Search Server . SUCCESS
> > [6.739s]
> > [INFO] Apache Solr Test Framework  SUCCESS
> > [2.699s]
> > [INFO] Apache Solr Analysis Extras ... SUCCESS
> > [3.868s]
> > [INFO] Apache Solr Clustering  SUCCESS
> > [6.736s]
> > [INFO] Apache Solr DataImportHandler . SUCCESS
> > [4.914s]
> > [INFO] Apache Solr DataImportHandler Extras .. SUCCESS
> > [2.721s]
> > [INFO] Apache Solr DataImportHandler aggregator POM .. SUCCESS
> > [0.253s]
> > [INFO] Apache Solr Content Extraction Library  SUCCESS
> > [1.909s]
> > [INFO] Apache Solr - UIMA integration  SUCCESS
> > [1.922s]
> > [INFO] Apache Solr Contrib aggregator POM  SUCCESS
> > [0.211s]
> > [INFO]
> > 
> > [INFO] BUILD SUCCESS
> > [INFO]
> > 
> > [INFO] Total time: 2:18.040s
> > [INFO] Finished at: Thu May 05 20:39:09 CEST 2011
> > [INFO] Final Memory: 38M/90M
> > [INFO]
> > 
> >
> > On Thu, May 5, 2011 at 6:53 PM, Steven A Rowe  wrote:
> >
> > > Hi Gabriele,
> > >
> > > On 5/5/2011 at 9:57 AM, Gabriele Kahlout wrote:
> > > > Okay, that sequence worked, but then shouldn't I be able to do $ mvn
> > > > install afterwards? This is what I get:
> > > ...
> > > > COMPILATION ERROR :
> > > > -
> > > > org/apache/solr/spelling/suggest/fst/InputStreamDataInput.j

Custom sorting based on external (database) data

2011-05-05 Thread Sujit Pal
Hi,

Sorry for the possible double post, I wrote this up but had the
incorrect sender address, so I am guessing that my previous one is going
to be rejected by the list moderation daemon.

I am trying to figure out options for the following problem. I am on
Solr 1.4.1 (Lucene 2.9.1).

I have search results which are going to be ranked by the user (using a
thumbs up/down) and would translate to a score between -1 and +1. 

This data is stored in a database table with the columns:
unique_id
thumbs_up
thumbs_down
num_calls

which are updated as the thumbs up/down component is clicked.

We want to be able to sort the results by the following score =
(thumbs_up - thumbs_down) / (num_calls). The unique_id field refers to
the one referenced as <uniqueKey> in the schema.xml.

Based on the following conversation:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg06322.html 

...my understanding is that I need to:

1) subclass FieldType to create my own RankFieldType. 
2) In this class I override the getSortField() method to return my
custom FieldSortComparatorSource object.
3) Build the custom FieldSortComparatorSource object which returns a
custom FieldSortComparator object in newComparator().
4) Configure the field type of class RankFieldType (rank_t), and a field
(called rank) of field type rank_t in schema.xml of type RankFieldType.
5) use sort=rank+desc to do the sort.

My question is: is there a simpler/more performant way? The number of
database lookups seems like its going to be pretty high with this
approach. And its hard to believe that my problem is new, so I am
guessing this is either part of some Solr configuration I am missing, or
there is some other (possibly simpler) approach I am overlooking.

Pointers to documentation or code (or even keywords I could google)
would be much appreciated.

TIA for all your help,

Sujit
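For clarity, the ranking formula being sorted on can be written out as below. Returning 0 for documents nobody has rated yet is an assumption, not something the poster specified:

```java
public class RankScore {
    // score = (thumbs_up - thumbs_down) / num_calls, which stays within [-1, +1]
    // when each call casts at most one vote; unvoted docs score 0 (assumption).
    public static double score(int thumbsUp, int thumbsDown, int numCalls) {
        if (numCalls == 0) {
            return 0.0; // avoid division by zero for never-rated documents
        }
        return (thumbsUp - thumbsDown) / (double) numCalls;
    }

    public static void main(String[] args) {
        System.out.println(score(3, 1, 4)); // (3 - 1) / 4.0 -> 0.5
    }
}
```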




Re: Custom sorting based on external (database) data

2011-05-05 Thread Ahmet Arslan


--- On Thu, 5/5/11, Sujit Pal  wrote:

> From: Sujit Pal 
> Subject: Custom sorting based on external (database) data
> To: "solr-user" 
> Date: Thursday, May 5, 2011, 11:03 PM
> Hi,
> 
> Sorry for the possible double post, I wrote this up but had
> the
> incorrect sender address, so I am guessing that my previous
> one is going
> to be rejected by the list moderation daemon.
> 
> I am trying to figure out options for the following
> problem. I am on
> Solr 1.4.1 (Lucene 2.9.1).
> 
> I have search results which are going to be ranked by the
> user (using a
> thumbs up/down) and would translate to a score between -1
> and +1. 
> 
> This data is stored in a database table (
> unique_id
> thumbs_up
> thumbs_down
> num_calls
> 
> as the thumbs up/down component is clicked.
> 
> We want to be able to sort the results by the following
> score =
> (thumbs_up - thumbs_down) / (num_calls). The unique_id
> field refers to
> the one referenced as  in the schema.xml.
> 
> Based on the following conversation:
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg06322.html
> 
> 
> ...my understanding is that I need to:
> 
> 1) subclass FieldType to create my own RankFieldType. 
> 2) In this class I override the getSortField() method to
> return my
> custom FieldSortComparatorSource object.
> 3) Build the custom FieldSortComparatorSource object which
> returns a
> custom FieldSortComparator object in newComparator().
> 4) Configure the field type of class RankFieldType
> (rank_t), and a field
> (called rank) of field type rank_t in schema.xml of type
> RankFieldType.
> 5) use sort=rank+desc to do the sort.
> 
> My question is: is there a simpler/more performant way? The
> number of
> database lookups seems like its going to be pretty high
> with this
> approach. And its hard to believe that my problem is new,
> so I am
> guessing this is either part of some Solr configuration I
> am missing, or
> there is some other (possibly simpler) approach I am
> overlooking.
> 
> Pointers to documentation or code (or even keywords I could
> google)
> would be much appreciated.

Looks like it can be done with 
http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html 
and 
http://wiki.apache.org/solr/FunctionQuery

You can dump your table into three text files. Issue a commit to load these 
changes.

Sort by function query is available in Solr3.1 though.
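A rough sketch of how ExternalFileField is wired up in schema.xml (the field name "rank" is invented; attributes follow the example schema shipped with Solr):

```xml
<!-- schema.xml sketch: keyField should be the schema's unique key field -->
<fieldType name="rankFile" keyField="unique_id" defVal="0"
           stored="false" indexed="false"
           class="solr.ExternalFileField" valType="float"/>
<field name="rank" type="rankFile"/>
```

The values then live in a file named external_rank (external_<fieldname>) in the index data directory, one unique_id=score pair per line, and are reloaded when a commit is issued.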


Re: Custom sorting based on external (database) data

2011-05-05 Thread Sujit Pal
Thank you Ahmet, looks like we could use this. Basically we would do
periodic dumps of the (unique_id|computed_score) sorted by score and
write it out to this file followed by a commit.

Found some more info here, for the benefit of others looking for
something similar:
http://dev.tailsweep.com/solr-external-scoring/ 

On Thu, 2011-05-05 at 13:12 -0700, Ahmet Arslan wrote:
> 
> --- On Thu, 5/5/11, Sujit Pal  wrote:
> 
> > From: Sujit Pal 
> > Subject: Custom sorting based on external (database) data
> > To: "solr-user" 
> > Date: Thursday, May 5, 2011, 11:03 PM
> > Hi,
> > 
> > Sorry for the possible double post, I wrote this up but had
> > the
> > incorrect sender address, so I am guessing that my previous
> > one is going
> > to be rejected by the list moderation daemon.
> > 
> > I am trying to figure out options for the following
> > problem. I am on
> > Solr 1.4.1 (Lucene 2.9.1).
> > 
> > I have search results which are going to be ranked by the
> > user (using a
> > thumbs up/down) and would translate to a score between -1
> > and +1. 
> > 
> > This data is stored in a database table (
> > unique_id
> > thumbs_up
> > thumbs_down
> > num_calls
> > 
> > as the thumbs up/down component is clicked.
> > 
> > We want to be able to sort the results by the following
> > score =
> > (thumbs_up - thumbs_down) / (num_calls). The unique_id
> > field refers to
> > the one referenced as  in the schema.xml.
> > 
> > Based on the following conversation:
> > http://www.mail-archive.com/solr-user@lucene.apache.org/msg06322.html
> > 
> > 
> > ...my understanding is that I need to:
> > 
> > 1) subclass FieldType to create my own RankFieldType. 
> > 2) In this class I override the getSortField() method to
> > return my
> > custom FieldSortComparatorSource object.
> > 3) Build the custom FieldSortComparatorSource object which
> > returns a
> > custom FieldSortComparator object in newComparator().
> > 4) Configure the field type of class RankFieldType
> > (rank_t), and a field
> > (called rank) of field type rank_t in schema.xml of type
> > RankFieldType.
> > 5) use sort=rank+desc to do the sort.
> > 
> > My question is: is there a simpler/more performant way? The
> > number of
> > database lookups seems like its going to be pretty high
> > with this
> > approach. And its hard to believe that my problem is new,
> > so I am
> > guessing this is either part of some Solr configuration I
> > am missing, or
> > there is some other (possibly simpler) approach I am
> > overlooking.
> > 
> > Pointers to documentation or code (or even keywords I could
> > google)
> > would be much appreciated.
> 
> Looks like it can be done with 
> http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html
>  
> and 
> http://wiki.apache.org/solr/FunctionQuery
> 
> You can dump your table into three text files. Issue a commit to load these 
> changes.
> 
> Sort by function query is available in Solr3.1 though.



force "0" results from within a search component?

2011-05-05 Thread Frederik Kraus
Hi guys,

another question on custom search components:

Is there any way to force the response to be "0 results" from within a search 
component (and break out of the component chain)?

I'm doing some checks in my first-component and in some cases would like to 
stop processing the request and just pretend, that there are 0 results ...

Thanks,

Fred. 

Re: fast case-insensitive autocomplete

2011-05-05 Thread Otis Gospodnetic
Hi,

I haven't used Suggester yet, but couldn't you feed it all lowercase content
and then lowercase whatever the user is typing before sending it to Suggester
to avoid case mismatch?

Autocomplete on http://search-lucene.com/ uses 
http://sematext.com/products/autocomplete/index.html if you want a shortcut.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: "Kusenda, Brandyn J" 
> To: "solr-user@lucene.apache.org" 
> Sent: Thu, May 5, 2011 9:22:03 AM
> Subject: fast case-insensitive autocomplete
> 
> Hi.
> I need an autocomplete solution to handle case-insensitive queries but
> return the original text with the case still intact. I've experimented
> with both the Suggester and TermComponent methods. TermComponent works
> when I use the regex option; however, it is far too slow. I get the speed I
> want by using term.prefix or by using the Suggester, but it's case
> sensitive.
> 
> Here is an example operating on a  user directory:
> 
> Query: bran
> Results: Branden Smith, Brandon Thompson,  Brandon Verner, Brandy Finny, 
> Brian 
>Smith, ...
> 
> A solution that I would  expect to work would be to store two fields; one
> containing the original text  and the other containing the lowercase.  Then
> convert the query to lower  case and run the query against the lower case
> field and return the original  (case preserved) field.
> Unfortunately, I can't get a TermComponent query to  return additional
> fields.  It only returns the field it's searching  against.  Should this work
> or can I only return additional fields for  standard queries.
> 
> Thanks in advance,
> Brandyn
> 
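The lowercasing idea sketched above (match against a lowercased copy, return the original text) reduces to something like this outside Solr; a toy illustration, not Solr code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrefixSuggest {
    // Case-insensitive prefix match that preserves the original casing of hits.
    public static List<String> suggest(List<String> names, String prefix) {
        String p = prefix.toLowerCase();
        List<String> hits = new ArrayList<String>();
        for (String name : names) {
            if (name.toLowerCase().startsWith(p)) {
                hits.add(name); // return the original, case intact
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> dir = Arrays.asList("Branden Smith", "Brian Smith", "brandy finny");
        System.out.println(suggest(dir, "Bran")); // [Branden Smith, brandy finny]
    }
}
```

In Solr this maps to two fields (raw plus a lowercased copyField), querying the lowercased one and returning the raw one.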


Re: force "0" results from within a search component?

2011-05-05 Thread Ahmet Arslan
> Is there any way to force the response to be "0 results"
> from within a search component (and break out of the
> component chain)?
> 
> I'm doing some checks in my first-component and in some
> cases would like to stop processing the request and just
> pretend, that there are 0 results ...

Yes. You can disable all underlying components by their parameters.

setParam("query","false");
setParam("facet","false");
setParam("hl","false");

etc..


Re: why query chinese character with bracket become phrase query by default?

2011-05-05 Thread cyang2010
Nice, it works like a charm.

I am using solr 1.4.1.  Here is my configuration for the chinese field:

   

   
 


   
  

   



Now I get the expected hassle-free parsing on the Solr side:


title_zh_CN:(我活)
title_zh_CN:(我活)
title_zh_CN:我 title_zh_CN:活
title_zh_CN:我 title_zh_CN:活



--
View this message in context: 
http://lucene.472066.n3.nabble.com/why-query-chinese-character-with-bracket-become-phrase-query-by-default-tp2901542p2905784.html
Sent from the Solr - User mailing list archive at Nabble.com.


Thoughts on Search Analytics?

2011-05-05 Thread Otis Gospodnetic
Hi,

I'd like to solicit your thoughts about Search Analytics if you are  doing any 
sort of analysis/reporting of search logs or click stream or  anything related.

* Which information or reports do you find the most useful and why?
* Which reports would you like to have, but don't have for whatever  reason 
(don't have the needed data, or it's too hard to produce such  reports, or ...)
* Which tool(s) or service(s) do you use and find the most useful?

I'm preparing a presentation on the topic of Search Analytics, so I'm trying
to solicit opinions, practices, desires, etc. on this topic.

Your thoughts would be greatly appreciated.  If you could reply  directly, that 
would be great, since this may be a bit OT for the list.

Thanks!
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


Testing the limits of non-Java Solr

2011-05-05 Thread Jack Repenning
What's the probability that I can build a non-trivial Solr app without writing 
any Java?

I've been planning to use Solr, Lucene, and existing plug-ins, and sort of 
hoping not to write any Java (the app itself is Ruby / Rails). The dox (such as 
http://wiki.apache.org/solr/FAQ) seem encouraging. [I *can* write Java, but my 
planning's all been "no Java."]

I'm just beginning the design work in earnest, and I suddenly notice that it 
seems every mail thread, blog, or example starts out Java-free, but somehow 
ends up involving Java code. I'm not sure I yet understand all these snippets; 
conceivably some of the Java I see could just as easily be written in another 
language, but it makes me wonder. Is it realistic to plan a sizable Solr 
application without some Java programming?

I know, I know, I know: everything depends on the details. I'd be interested 
even in anecdotes: has anyone ever achieved this before? Also, what are the 
clues I should look for that I need to step into the Java realm? I understand, 
for example, that it's possible to write filters and tokenizers to do stuff not 
available in any standard one; in this case, the clue would be "I can't find 
what I want in the standard list," I guess. Are there other things I should 
look for?

-==-
Jack Repenning
Technologist
Codesion Business Unit
CollabNet, Inc.
8000 Marina Boulevard, Suite 600
Brisbane, California 94005
office: +1 650.228.2562
twitter: http://twitter.com/jrep













Re: Solr: org.apache.solr.common.SolrException: Invalid Date String:

2011-05-05 Thread Ahmet Arslan
> org.apache.solr.common.SolrException: Invalid Date
> String:'2011-01-07' at
> org.apache.solr.schema.DateField.parseMath(DateField.java:165)

Solr accepts date in the following format: 2011-01-07T00:00:00Z

> I understand from reading some articles that Solr stores
> time only in UTC,
> this is the query i am trying to index,

It seems that you are fetching data from a Relational Database. You may 
consider using http://wiki.apache.org/solr/DataImportHandler

> Why i am doing this timezone conversion is because i need
> to group results
> by the user timezone. How can i achieve this?

Save timezone info in a field and facet on that field?
http://wiki.apache.org/solr/SimpleFacetParameters
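If the conversion is done client-side instead of in SQL, the format Solr's DateField expects can be produced with a SimpleDateFormat pinned to UTC; a self-contained sketch:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDate {
    // Formats a Date into the ISO-8601 UTC form Solr's DateField expects,
    // e.g. 2011-01-07T00:00:00Z (note the trailing literal 'Z').
    public static String toSolrDate(Date d) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(d);
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate(new Date(0L))); // epoch -> 1970-01-01T00:00:00Z
    }
}
```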


Re: Testing the limits of non-Java Solr

2011-05-05 Thread Otis Gospodnetic
Short answer: Yes, you can deploy a Solr cluster and write an application that
talks to it without writing any Java (though it may be in PHP or Python, unless
that application is you typing "telnet my-solr-server 8983").

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Jack Repenning 
> To: solr-user@lucene.apache.org
> Sent: Thu, May 5, 2011 6:28:31 PM
> Subject: Testing the limits of non-Java Solr
> 
> What's the probability that I can build a non-trivial Solr app without
> writing any Java?
>
> I've been planning to use Solr, Lucene, and existing plug-ins, and sort of
> hoping not to write any Java (the app itself is Ruby / Rails). The dox (such
> as http://wiki.apache.org/solr/FAQ) seem encouraging. [I *can* write Java,
> but my planning's all been "no Java."]
>
> I'm just beginning the design work in earnest, and I suddenly notice that it
> seems every mail thread, blog, or example starts out Java-free, but somehow
> ends up involving Java code. I'm not sure I yet understand all these
> snippets; conceivably some of the Java I see could just as easily be written
> in another language, but it makes me wonder. Is it realistic to plan a
> sizable Solr application without some Java programming?
>
> I know, I know, I know: everything depends on the details. I'd be interested
> even in anecdotes: has anyone ever achieved this before? Also, what are the
> clues I should look for that I need to step into the Java realm? I
> understand, for example, that it's possible to write filters and tokenizers
> to do stuff not available in any standard one; in this case, the clue would
> be "I can't find what I want in the standard list," I guess. Are there other
> things I should look for?
> 
> -==-
> Jack Repenning
> Technologist
> Codesion  Business Unit
> CollabNet, Inc.
> 8000 Marina Boulevard, Suite  600
> Brisbane, California 94005
> office: +1 650.228.2562
> twitter: http://twitter.com/jrep
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 


Re: Thoughts on Search Analytics?

2011-05-05 Thread François Schiettecatte
When I ran the search engine at Feedster, I wrote a perl script that ran 
nightly and gave me:

total number of searches
total number of searches per hour
N most frequent searches
max time for a search
min time for a search
mean time for searches
median time for searches
N slowest searches
warnings
errors

all the above per index (core in SOLR)

The script generated a text file (for me) and an Excel spreadsheet (for the 
management)

François


On May 5, 2011, at 6:25 PM, Otis Gospodnetic wrote:

> Hi,
> 
> I'd like to solicit your thoughts about Search Analytics if you are  doing 
> any 
> sort of analysis/reporting of search logs or click stream or  anything 
> related.
> 
> * Which information or reports do you find the most useful and why?
> * Which reports would you like to have, but don't have for whatever  reason 
> (don't have the needed data, or it's too hard to produce such  reports, or 
> ...)
> * Which tool(s) or service(s) do you use and find the most useful?
> 
> I'm preparing a presentation on the topic of Search Analytics, so I'm  trying 
> to 
> 
> solicit opinions, practices, desires, etc. on this topic.
> 
> Your thoughts would be greatly appreciated.  If you could reply  directly, 
> that 
> would be great, since this may be a bit OT for the list.
> 
> Thanks!
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/



RE: Solr: org.apache.solr.common.SolrException: Invalid Date String:

2011-05-05 Thread Craig Stires

Rohit,

The solr server using TrieDateField must receive values in the format
2011-01-07T17:00:30Z

This should be a UTC-based datetime.  The offset can be applied once you get
your results back from solr
   SimpleDateFormat df =   new SimpleDateFormat(format);
   df.setTimeZone(TimeZone.getTimeZone("IST"));
   java.util.Date dateunix = df.parse(datetime);


-Craig
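Fleshing the fragment above out into a self-contained sketch (the sample value and zone id are illustrative; note that the parse pattern treats the trailing 'Z' as a literal, so the parser's time zone must be set to UTC explicitly):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDateOffset {
    // Parse the UTC datetime Solr returns, then render it in the user's zone.
    static String toLocal(String solrDate, String zoneId) throws ParseException {
        SimpleDateFormat utc = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        utc.setTimeZone(TimeZone.getTimeZone("UTC")); // 'Z' is a literal; pin parsing to UTC
        Date d = utc.parse(solrDate);

        SimpleDateFormat local = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        local.setTimeZone(TimeZone.getTimeZone(zoneId));
        return local.format(d);
    }

    public static void main(String[] args) throws ParseException {
        // 17:00:30 UTC is 22:30:30 in India Standard Time (UTC+05:30)
        System.out.println(toLocal("2011-01-07T17:00:30Z", "Asia/Kolkata"));
    }
}
```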


-Original Message-
From: Rohit [mailto:ro...@in-rev.com] 
Sent: Friday, 6 May 2011 2:31 AM
To: solr-user@lucene.apache.org
Subject: Solr: org.apache.solr.common.SolrException: Invalid Date String:

Hi,

I am new to solr and this is my first attempt at indexing solr data, I am
getting the following exception while indexing,

org.apache.solr.common.SolrException: Invalid Date String:'2011-01-07' at
org.apache.solr.schema.DateField.parseMath(DateField.java:165) at
org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:169) at
org.apache.solr.schema.SchemaField.createField(SchemaField.java:98) at
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204) at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)

I understand from reading some articles that Solr stores time only in UTC,
this is the query i am trying to index,

Select id, text, 'language', links, tweetType, source, location,
  bio, url, utcOffset, timeZone, frenCnt, createdAt, createdOnGMT, createdOnServerTime,
  follCnt, favCnt, totStatusCnt, usrCrtDate, humanSentiment, replied, replyMsg, classified,
  locationDetail, geonameid, country, continent, placeLongitude, placeLatitude, listedCnt,
  hashtag, mentions, senderInfScr, createdOnGMTDate,
  DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+05:30'),'%Y-%m-%d') as IST,
  DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+01:00'),'%Y-%m-%d') as ECT,
  DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+02:00'),'%Y-%m-%d') as EET,
  DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+03:30'),'%Y-%m-%d') as MET,
  sign(classified) as sentiment
from

Why i am doing this timezone conversion is because i need to group results
by the user timezone. How can i achieve this?

Regards, Rohit

 




Re: Solr Terms and Date field issues

2011-05-05 Thread Erick Erickson
Hmmm, this is puzzling. If you could come up with a couple of xml
files and a schema
that illustrate this, I'll see what I can see...

Thanks,
Erick

On Wed, May 4, 2011 at 7:05 PM, Viswa S  wrote:
>
> Erik,
>
> I suspected the same, and setup a test instance to reproduce this. The date 
> field I used is setup to capture indexing time, in other words the schema has 
> a default value of "NOW". However, I have reproduced this issue with fields 
> which do not have defaults too.
>
> On the second one, I did a delete->commit (with expungeDeletes=true) and then 
> a optimize. All other fields show updated terms except the date fields. I 
> have also double checked to see if the Luke handler has any different terms, 
> and it did not.
>
>
> Thanks
> Viswa
>
>
>> Date: Wed, 4 May 2011 08:17:39 -0400
>> Subject: Re: Solr Terms and Date field issues
>> From: erickerick...@gmail.com
>> To: solr-user@lucene.apache.org
>>
>> Hmmm, this *looks* like you've changed your schema without
>> re-indexing all your data so you're getting old (string?) values in
>> that field, but that's just a guess. If this is really happening on a
>> clean index it's a problem.
>>
>> I'm also going to guess that you're not really deleting the documents
>> you think. Are you committing after the deletes?
>>
>> Best
>> Erick
>>
>> On Wed, May 4, 2011 at 2:18 AM, Viswa S  wrote:
>> >
>> > Hello,
>> >
>> > The terms query for a date field seems to get populated with some weird 
>> > dates, many of these dates (1970,2009,2011-04-23) are not present in the 
>> > indexed data.  Please see sample data below
>> >
>> > I also notice that a delete and optimize does not remove the relevant 
>> > terms for date fields, the string fields seems work fine.
>> >
>> > Thanks
>> > Viswa
>> >
>> > Results from Terms component:
>> >
>> > [XML stripped by the mailing list archive: nine terms each with count 3479, one with count 265]
>> >
>> > Result from facet component, rounded by seconds:
>> >
>> > [XML stripped: counts 1, 1148, 2333; gap +1SECOND; start 2011-05-03T06:14:14Z; end 2011-05-04T06:14:14Z]
>> >
>


Re: Is it possible to use sub-fields or multivalued fields for boosting?

2011-05-05 Thread Erick Erickson
For a truly universal field, I'm not at all sure how you'd proceed. But if you
know what your sub-fields are in advance, have you considered just making
them regular fields and them throwing (d)dismax at it?

Best
Erick
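For instance (hypothetical field names and boosts), with firstname, surname and location kept as regular fields, a dismax request that weights name matches higher might look like:

```text
http://localhost:8983/solr/select?defType=dismax&q=London&qf=firstname^3+surname^3+location
```

Documents matching "London" in firstname or surname then score above pure location matches, without any sub-field machinery.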

On Wed, May 4, 2011 at 11:51 PM, deniz  wrote:
> okay... let me make the situation more clear... I am trying to create an
> universal field which includes information about users like firstname,
> surname, gender, location etc. When I enter something e.g London, I would
> like to match any users having 'London' in any field firstname, surname or
> location. But if it matches name or surname, I would like to give a higher
> weight.
>
> so my question is... is it possible to have sub-fields? like
> [XML example stripped by the mailing list archive: a document element with several sub-field elements, each containing "blabla"]
>
>
> or any other ideas for implementing such feature?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Is-it-possible-to-use-sub-fields-or-multivalued-fields-for-boosting-tp2901992p2901992.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Field names with a period (.)

2011-05-05 Thread Erick Erickson
I remember the same, except I think I've seen the recommendation that you
make all the letters lower-case. As I remember, there are some interesting
edge cases that you might run into later with upper case.

But I can't remember the specifics either

Erick

On Thu, May 5, 2011 at 10:08 AM, Leonardo Souza  wrote:
> Thanks Gora!
>
> [ ]'s
> Leonardo da S. Souza
>  °v°   Linux user #375225
>  /(_)\   http://counter.li.org/
>  ^ ^
>
>
>
> On Thu, May 5, 2011 at 3:09 AM, Gora Mohanty  wrote:
>
>> On Thu, May 5, 2011 at 5:08 AM, Leonardo Souza 
>> wrote:
>> > Hi guys,
>> >
>> > Can i have a field name with a period(.) ?
>> > Like in *file.size*
>>
>> Cannot find now where this is documented, but from what I remember it is
>> recommended to use only characters A-Z, a-z, 0-9, and underscore (_) in
>> field names, and some special characters are known to cause problems.
>>
>> Regards,
>> Gora
>>
>


Solr 3.1 returning entire highlighted field

2011-05-05 Thread Jake Brownell
Hi,

After upgrading from Solr 1.4.0 to 3.1, our highlighting has gone from 
returning short snippets to displaying what appears to be the entire 
contents of the highlighted field. 

The request using solrj is setting the following:

params.setHighlight(true);
params.setHighlightSnippets(3);
params.set("hl.fl", "content_highlight");

From solrconfig

[solrconfig excerpt: XML markup stripped by the mailing list archive; legible fragments include "dismax", "regex", "spellcheck", the values 100, 70 and 0.5, and the pattern [-\w ,/\n\"']{20,200}]

From schema

[schema excerpt: XML markup stripped by the archive]

Any pointers anybody can provide would be greatly appreciated.

Jake


RE: Solr Terms and Date field issues

2011-05-05 Thread Viswa S

Please find attached the schema and some test data (test.xml).

Thanks for looking into this.
Viswa


> Date: Thu, 5 May 2011 19:08:31 -0400
> Subject: Re: Solr Terms and Date field issues
> From: erickerick...@gmail.com
> To: solr-user@lucene.apache.org
> 
> H, this is puzzling. If you could come up with a couple of xml
> files and a schema
> that illustrate this, I'll see what I can see...
> 
> Thanks,
> Erick
> 
> On Wed, May 4, 2011 at 7:05 PM, Viswa S  wrote:
> >
> > Erik,
> >
> > I suspected the same, and setup a test instance to reproduce this. The date 
> > field I used is setup to capture indexing time, in other words the schema 
> > has a default value of "NOW". However, I have reproduced this issue with 
> > fields which do no have defaults too.
> >
> > On the second one, I did a delete->commit (with expungeDeletes=true) and 
> > then a optimize. All other fields show updated terms except the date 
> > fields. I have also double checked to see if the Luke handler has any 
> > different terms, and it did not.
> >
> >
> > Thanks
> > Viswa
> >
> >
> >> Date: Wed, 4 May 2011 08:17:39 -0400
> >> Subject: Re: Solr Terms and Date field issues
> >> From: erickerick...@gmail.com
> >> To: solr-user@lucene.apache.org
> >>
> >> Hmmm, this *looks* like you've changed your schema without
> >> re-indexing all your data so you're getting old (string?) values in
> >> that field, but that's just a guess. If this is really happening on a
> >> clean index it's a problem.
> >>
> >> I'm also going to guess that you're not really deleting the documents
> >> you think. Are you committing after the deletes?
> >>
> >> Best
> >> Erick
> >>
> >> On Wed, May 4, 2011 at 2:18 AM, Viswa S  wrote:
> >> >
> >> > Hello,
> >> >
> >> > The terms query for a date field seems to get populated with some weird 
> >> > dates, many of these dates (1970,2009,2011-04-23) are not present in the 
> >> > indexed data.  Please see sample data below
> >> >
> >> > I also notice that a delete and optimize does not remove the relevant 
> >> > terms for date fields, the string fields seems work fine.
> >> >
> >> > Thanks
> >> > Viswa
> >> >
> >> > Results from Terms component:
> >> >
> >> >
> >> > 3479
> >> >
> >> > 3479
> >> >
> >> > 3479
> >> >
> >> > 3479
> >> >
> >> > 3479
> >> >
> >> > 3479
> >> >
> >> > 3479
> >> >
> >> > 3479
> >> >
> >> > 3479
> >> >
> >> > 265
> >> >
> >> >
> >> > Result from facet component, rounded by seconds.:
> >> >
> >> > 
> >> > 1
> >> >
> >> > 1148
> >> >
> >> > 2333
> >> >
> >> > +1SECOND
> >> >
> >> > 2011-05-03T06:14:14Z
> >> >
> >> > 2011-05-04T06:14:14Z
> >> >
> >
  


[Attachments test.xml and schema.xml: XML markup stripped by the mailing list archive. The test documents carried short text values ("I suspected the same, and setup a test instance to reproduce this", "Lorem Ipsum is simply dummy text of the printing and typesetting industry", ...); of the schema only stray values survive, among them "id" and "fullTextLog".]

RE: Solr Terms and Date field issues

2011-05-05 Thread Ahmet Arslan


It is okay to see weird things in admin/schema.jsp or the terms component with trie-based 
types. Please see http://search-lucene.com/m/WEfSI1Yi4562/

If you really need the terms component, consider using copyField (tdate to a string 
type)



 
Please find attached the schema and some test data (test.xml).

Thanks for looking this.
Viswa


> Date: Thu, 5 May 2011 19:08:31 -0400
> Subject: Re: Solr Terms and Date field issues
> From: erickerick...@gmail.com
> To: solr-user@lucene.apache.org
> 
> H, this is puzzling. If you could come up with a couple of xml
> files and a schema
> that illustrate this, I'll see what I can see...
> 
> Thanks,
> Erick
> 
> On Wed, May 4, 2011 at 7:05 PM, Viswa S  wrote:
> >
> > Erik,
> >
> > I suspected the same, and setup a test instance to reproduce this. The date 
> > field I used is setup to capture indexing time, in other words the schema 
> > has a default value of "NOW". However, I have reproduced this issue with 
> > fields which do no have defaults too.
> >
> > On the second one, I did a delete->commit (with expungeDeletes=true) and 
> > then a optimize. All other fields show updated terms except the date 
> > fields. I have also double checked to see if the Luke handler has any 
> > different terms, and it did not.
> >
> >
> > Thanks
> > Viswa
> >
> >
> >> Date: Wed, 4 May 2011 08:17:39 -0400
> >> Subject: Re: Solr Terms and Date field issues
> >> From: erickerick...@gmail.com
> >> To: solr-user@lucene.apache.org
> >>
> >> Hmmm, this *looks* like you've changed your schema without
> >> re-indexing all your data so you're getting old (string?) values in
> >> that field, but that's just a guess. If this is really happening on a
> >> clean index it's a problem.
> >>
> >> I'm also going to guess that you're not really deleting the documents
> >> you think. Are you committing after the deletes?
> >>
> >> Best
> >> Erick
> >>
> >> On Wed, May 4, 2011 at 2:18 AM, Viswa S  wrote:
> >> >
> >> > Hello,
> >> >
> >> > The terms query for a date field seems to get populated with some weird 
> >> > dates, many of these dates (1970,2009,2011-04-23) are not present in the 
> >> > indexed data.  Please see sample data below
> >> >
> >> > I also notice that a delete and optimize does not remove the relevant 
> >> > terms for date fields, the string fields seems work fine.
> >> >
> >> > Thanks
> >> > Viswa
> >> >
> >> > Results from Terms component:
> >> >
> >> >
> >> > 3479
> >> >
> >> > 3479
> >> >
> >> > 3479
> >> >
> >> > 3479
> >> >
> >> > 3479
> >> >
> >> > 3479
> >> >
> >> > 3479
> >> >
> >> > 3479
> >> >
> >> > 3479
> >> >
> >> > 265
> >> >
> >> >
> >> > Result from facet component, rounded by seconds.:
> >> >
> >> > 
> >> > 1
> >> >
> >> > 1148
> >> >
> >> > 2333
> >> >
> >> > +1SECOND
> >> >
> >> > 2011-05-03T06:14:14Z
> >> >
> >> > 2011-05-04T06:14:14Z
> >> >
> >
  



Re: Indexing 20M documents from MySQL with DIH

2011-05-05 Thread Shawn Heisey
I am running into this problem as well, but only sporadically, and only 
in my 3.1 test environment, not 1.4.1 production.  I may have narrowed 
things down; I am now interested in learning whether this is a problem 
with the MySQL connector or with DIH.



On 4/21/2011 6:09 PM, Scott Bigelow wrote:

Thanks for the e-mail. I probably should have provided more details,
but I was more interested in making sure I was approaching the problem
correctly (using DIH, with one big SELECT statement for millions of
rows) instead of solving this specific problem. Here's a partial
stacktrace from this specific problem:

...
Caused by: java.io.EOFException: Can not read response from server.
Expected to read 4 bytes, read 0 bytes before connection was
unexpectedly lost.
 at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2539)
 at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2989)
 ... 22 more
Apr 21, 2011 3:53:28 AM
org.apache.solr.handler.dataimport.EntityProcessorBase getNext
SEVERE: getNext() failed for query 'REDACTED'
org.apache.solr.handler.dataimport.DataImportHandlerException:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
Communications link failure

The last packet successfully received from the server was 128
milliseconds ago.  The last packet sent successfully to the server was
25,273,484 milliseconds ago.
...




Re: Indexing 20M documents from MySQL with DIH

2011-05-05 Thread Scott Bigelow
Alex, thanks for your response. I suspect you're right about
autoCommit; i ended up solving the problem by merely moving the entire
Solr install, untouched, to a significantly larger instance (EC2
m1.small to m1.large). I think it is appropriately sized now for the
quantity and intensity of queries that will be thrown at it when it
enters production, so I never bothered to get it working on the
smaller instance.

Your  examples are interesting, I wonder if you could create
some count table to make up for MySQL's lack of row generator. Either
way, it seems like paging through results would be a must-have for any
enterprise-level indexer, and I'm surprised to find it missing in
Solr.

When relying on the delta import mechanism for updates, it's not like
one would need the consistency of pulling the entire record set as a
single, isolated query, since the delta import is designed to fetch
new documents and merge them in to a slightly out-of-date/inconsistent
index.


On Thu, May 5, 2011 at 12:10 PM, Alexey Serba  wrote:
> {quote}
> ...
> Caused by: java.io.EOFException: Can not read response from server.
> Expected to read 4 bytes, read 0 bytes before connection was
> unexpectedly lost.
>       at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:2539)
>       at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:2989)
>       ... 22 more
> Apr 21, 2011 3:53:28 AM
> org.apache.solr.handler.dataimport.EntityProcessorBase getNext
> SEVERE: getNext() failed for query 'REDACTED'
> org.apache.solr.handler.dataimport.DataImportHandlerException:
> com.mysql.jdbc.exceptions.jdbc4.CommunicationsException:
> Communications link failure
>
> The last packet successfully received from the server was 128
> milliseconds ago.  The last packet sent successfully to the server was
> 25,273,484 milliseconds ago.
> ...
> {quote}
>
> It could probably be because of autocommit / segment merging. You
> could try to disable autocommit / increase mergeFactor
>
> {quote}
> I've used sphinx in the past, which uses multiple queries to pull out
> a subset of records ranged based on PrimaryKey, does Solr offer
> functionality similar to this? It seems that once a Solr index gets to
> a certain size, the indexing of a batch takes longer than MySQL's
> net_write_timeout, so it kills the connection.
> {quote}
>
> I was thinking about some hackish solution to paginate results
> [DIH data-config XML example stripped by the mailing list archive]
> Or something along those lines ( you'd need to to calculate offset in
> pages query )
>
> But unfortunately MySQL does not provide generate_series function
> (it's postgres function and there'r similar solutions for oracle and
> mssql).
>
>
> On Mon, Apr 25, 2011 at 3:59 AM, Scott Bigelow  wrote:
>> Thank you everyone for your help. I ended up getting the index to work
>> using the exact same config file on a (substantially) larger instance.
>>
>> On Fri, Apr 22, 2011 at 5:46 AM, Erick Erickson  
>> wrote:
>>> {{{A custom indexer, so that's a fairly common practice? So when you are
>>> dealing with these large indexes, do you try not to fully rebuild them
>>> when you can? It's not a nightly thing, but something to do in case of
>>> a disaster? Is there a difference in the performance of an index that
>>> was built all at once vs. one that has had delta inserts and updates
>>> applied over a period of months?}}}
>>>
>>> Is it a common practice? Like all of this, "it depends". It's certainly
>>> easier to let DIH do the work. Sometimes DIH doesn't have all the
>>> capabilities necessary. Or as Chris said, in the case where you already
>>> have a system built up and it's easier to just grab the output from
>>> that and send it to Solr, perhaps with SolrJ and not use DIH. Some people
>>> are just more comfortable with their own code...
>>>
>>> "Do you try not to fully rebuild". It depends on how painful a full rebuild
>>> is. Some people just like the simplicity of starting over every 
>>> day/week/month.
>>> But you *have* to be able to rebuild your index in case of disaster, and
>>> a periodic full rebuild certainly keeps that process up to date.
>>>
>>> "Is there a difference...delta inserts...updates...applied over months". Not
>>> if you do an optimize. When a document is deleted (or updated), it's only
>>> marked as deleted. The associated data is still in the index. Optimize will
>>> reclaim that space and compact the segments, perhaps down to one.
>>> But there's no real operational difference between a newly-rebuilt index
>>> and one that's been optimized. If you don't delete/update, there's not
>>> much reason to optimize either
>>>
>>> I'll leave the DIH to others..
>>>
>>> Best
>>> Erick
>>>
>>> On Thu, Apr 21, 2011 at 8:09 PM, Scott Bigelow  wrote:
 Thanks for the e-mail. I probably should have provided more details,
 but I was more interested in making sure I was approaching the problem
 correctly (using DIH, with one big SELECT statement for millions of
 rows) instead of solving this specific problem. Here's a partial
>>

Re: Testing the limits of non-Java Solr

2011-05-05 Thread William Bell
Yeah you don't need Java to use Solr. PHP, Curl, Python, HTTP Request
APIs all work fine.

The purpose of Solr is to wrap Lucene into a REST-like API that anyone
can call using HTTP.
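For example, from a shell (hypothetical host and field name):

```text
curl 'http://localhost:8983/solr/select?q=title:solr&wt=json&rows=10'
```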



On Thu, May 5, 2011 at 4:35 PM, Otis Gospodnetic
 wrote:
> Short answer: Yes, you can deploy a Solr cluster and write an application that
> talks to it without writing any Java (but it may be PHP or Python or 
> unless
> that application is you typing telnet my-solr-server 8983 )
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
>> From: Jack Repenning 
>> To: solr-user@lucene.apache.org
>> Sent: Thu, May 5, 2011 6:28:31 PM
>> Subject: Testing the limits of non-Java Solr
>>
>> What's the probability that I can build a non-trivial Solr app without 
>> writing
>>any Java?
>>
>> I've been planning to use Solr, Lucene, and existing plug-ins,  and sort of
>>hoping not to write any Java (the app itself is Ruby / Rails). The  dox (such 
>>as
>>http://wiki.apache.org/solr/FAQ) seem encouraging. [I *can* write Java, but  
>>my
>>planning's all been "no Java."]
>>
>> I'm just beginning the design work in  earnest, and I suddenly notice that it
>>seems every mail thread, blog, or example  starts out Java-free, but somehow
>>ends up involving Java code. I'm not sure I  yet understand all these 
>>snippets;
>>conceivably some of the Java I see could just  as easily be written in another
>>language, but it makes me wonder. Is it  realistic to plan a sizable Solr
>>application without some Java  programming?
>>
>> I know, I know, I know: everything depends on the details.  I'd be interested
>>even in anecdotes: has anyone ever achieved this before? Also,  what are the
>>clues I should look for that I need to step into the Java realm? I  
>>understand,
>>for example, that it's possible to write filters and tokenizers to  do stuff 
>>not
>>available in any standard one; in this case, the clue would be "I  can't find
>>what I want in the standard list," I guess. Are there other things I  should
>>look for?
>>
>> -==-
>> Jack Repenning
>> Technologist
>> Codesion  Business Unit
>> CollabNet, Inc.
>> 8000 Marina Boulevard, Suite  600
>> Brisbane, California 94005
>> office: +1 650.228.2562
>> twitter: http://twitter.com/jrep
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>


Re: fast case-insensitive autocomplete

2011-05-05 Thread William Bell
Are you giving that solution away? What are the costs? etc!!



On Thu, May 5, 2011 at 2:58 PM, Otis Gospodnetic
 wrote:
> Hi,
>
> I haven't used Suggester yet, but couldn't you feed it all lowercase content 
> and
> then lowercase whatever the user is typing before sending it to Suggester to
> avoid case mismatch?
>
> Autocomplete on http://search-lucene.com/ uses
> http://sematext.com/products/autocomplete/index.html if you want a shortcut.
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
>> From: "Kusenda, Brandyn J" 
>> To: "solr-user@lucene.apache.org" 
>> Sent: Thu, May 5, 2011 9:22:03 AM
>> Subject: fast case-insensitive autocomplete
>>
>> Hi.
>> I need an autocomplete solution to handle case-insensitive queries  but
>> return the original text with the case still intact.   I've  experimented
>> with both the Suggester and TermComponent methods.   TermComponent is working
>> when I use the regex option; however, it is far too slow. I get the speed I
>> want by using terms.prefix or by using the Suggester, but it's case
>> sensitive.
>>
>> Here is an example operating on a  user directory:
>>
>> Query: bran
>> Results: Branden Smith, Brandon Thompson,  Brandon Verner, Brandy Finny, 
>> Brian
>>Smith, ...
>>
>> A solution that I would  expect to work would be to store two fields; one
>> containing the original text  and the other containing the lowercase.  Then
>> convert the query to lower  case and run the query against the lower case
>> field and return the original  (case preserved) field.
>> Unfortunately, I can't get a TermComponent query to  return additional
>> fields.  It only returns the field it's searching  against.  Should this work
>> or can I only return additional fields for  standard queries.
>>
>> Thanks in advance,
>> Brandyn
>>
>
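The two-field approach quoted above (query the lowercased copy, return the original) can be sketched in schema.xml — field and type names here are hypothetical:

```xml
<!-- original, case preserved, returned to the client -->
<field name="name" type="string" indexed="true" stored="true"/>
<!-- lowercased copy, queried for matching -->
<field name="name_lc" type="string_lc" indexed="true" stored="false"/>
<copyField source="name" dest="name_lc"/>

<!-- keeps each value as a single token but lowercases it -->
<fieldType name="string_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Lowercase the user's input client-side, run the prefix query against name_lc, and display the stored name field.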


Re: Does the Solr enable Lemmatization [not the Stemming]

2011-05-05 Thread William Bell
Is there a parser that can take a string and tell you what part is an
address, and what is not?

Split the field into 2 fields?

Search: Dr. Bell in Denver, CO
Search: Dr. Smith near 10722 Main St, Denver, CO
Search: Denver, CO for Cardiologist

Thoughts?

2011/5/5 François Schiettecatte :
> Rajani
>
> You might also want to look at Balie ( http://balie.sourceforge.net/ ), from 
> the web site:
>
> Features:
>
>        • language identification
>        • tokenization
>        • sentence boundary detection
>        • named-entity recognition
>
>
> Can't vouch for it though.
>
>
>
>
> On May 5, 2011, at 4:58 AM, Jan Høydahl wrote:
>
>> Hi,
>>
>> Solr does not have lemmatization out of the box.
>>
>> You'll have to find 3rd party analyzers, and the most known such is from 
>> BasisTech. Please contact them to learn more.
>>
>> I'm not aware of any open source lemmatizers for Solr.
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>> On 5. mai 2011, at 10.34, rajini maski wrote:
>>
>>> Does the solr enable lemmatization concept?
>>>
>>>
>>>
>>>  I found a documentation that gives an information as solr enables
>>> lemmatization concept. Here is the link :
>>> http://www.basistech.com/knowledge-center/search/2010-09-language-identification-language-support-and-entity-extraction.pdf
>>>
>>> Can anyone help me finding the jar specified in that document so that i can
>>> add it as plugin.
>>> jar :rlp.solr.RLPTokenizerFactory
>>>
>>>
>>> Thanks and Regards,
>>> Rajani Maski
>>
>
>


DIH disconnecting long-lived MySQL connections

2011-05-05 Thread Shawn Heisey
I am using DIH with the MySQL connector to import data into my index.  
When doing a full import in my 3.1 test environment, it sometimes loses 
connection with the database and ends up rolling back the import.  My 
import configuration uses a single query, so there's no possibility of a 
reconnect fixing this.  Visit http://pastebin.com/Ya9DBMEP for the error 
log.  I'm using mysql-connector-java-5.1.15-bin.jar.


It seems that this occurs when Solr is busy doing multiple segment 
merges, when there are two merges partially complete and it's working on 
a third, causing ongoing index activity to cease for several minutes.  
Indexing activity seems to be fine up until there are three merges in 
progress.


This is a virtual environment using Xen on CentOS5, two VMs.  The host 
has SATA RAID1, so there's not a lot of I/O capacity.  When both virtual 
machines are busy indexing, it can't keep up with the load, and one 
segment merge doesn't have time to complete before it's built up enough 
segments to start another one, which puts the first one on hold.  If I 
build one virtual machine at a time, it doesn't do this, but then it 
takes twice as long.  My 1.4.1 production systems builds all six shards 
at the same time when it's doing a full rebuild, but that's using RAID10.


I grabbed a sniffer trace of the MySQL connection from the database 
server.  After the last actual data packet in the capture, there is a 
173 second pause followed by a "Request Quit" packet from the VM, then 
the connection is torn down normally.


My best guess right now is that the "idle-timeout-minutes" setting in 
JDBC is coming into play here during my single query, and that it's set 
to 3 minutes.  The Internet cannot seem to tell me what the default 
value is for this setting, and I do not see it mentioned anywhere in the 
MySQL/J source code.  I tried adding  idle-timeout-minutes="30" to the 
datasource definition in my DIH config, it didn't seem to do anything.


Am I on the right track?  Is there any way to configure DIH so that it 
won't do this?


Thanks,
Shawn
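If the stall really is a server-side net_write_timeout kicking in during the long streaming read, one thing to try (an assumption, not a confirmed fix) is Connector/J's netTimeoutForStreamingResults parameter on the JDBC URL in the DIH dataSource:

```xml
<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://dbhost/mydb?netTimeoutForStreamingResults=3600"
            batchSize="-1"
            user="solr" password="..."/>
```

batchSize="-1" is the usual DIH setting to make Connector/J stream rows; netTimeoutForStreamingResults (seconds) asks the driver to raise the server's net_write_timeout for the streaming session.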



RE: Solr: org.apache.solr.common.SolrException: Invalid Date String:

2011-05-05 Thread Rohit
Hi Craig,

Thanks for the response; what we actually need to achieve is to see group-by
results based on dates, like:

2011-01-01  23
2011-01-02  14
2011-01-03  40
2011-01-04  10

Now the records in my table run into millions; grouping the results on the UTC
date would not produce the right result, since they should be grouped by the
user's timezone. Is there any way we can achieve this in Solr?

Regards,
Rohit
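One sketch (field name taken from the query above; dates and offsets illustrative): keep the single UTC date field and let Solr's date faceting do the per-day grouping, shifting the bucket boundaries by the user's offset at query time. For IST (UTC+05:30), a local day starts at 18:30 UTC of the previous day:

```text
http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.date=createdOnGMTDate&facet.date.start=2010-12-31T18:30:00Z&facet.date.end=2011-01-31T18:30:00Z&facet.date.gap=%2B1DAY
```

Each returned bucket then corresponds to one IST calendar day, so no per-timezone columns are needed in the index; other timezones just use a different facet.date.start.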



-Original Message-
From: Craig Stires [mailto:craig.sti...@gmail.com] 
Sent: 06 May 2011 04:30
To: solr-user@lucene.apache.org
Subject: RE: Solr: org.apache.solr.common.SolrException: Invalid Date
String:


Rohit,

A Solr server using TrieDateField must receive values in the format
2011-01-07T17:00:30Z

This should be a UTC-based datetime.  The offset can be applied once you get
your results back from Solr:

   import java.text.SimpleDateFormat;
   import java.util.TimeZone;

   SimpleDateFormat df = new SimpleDateFormat(format);
   // three-letter zone IDs like "IST" are ambiguous in Java; "Asia/Kolkata" is safer
   df.setTimeZone(TimeZone.getTimeZone("IST"));
   java.util.Date dateunix = df.parse(datetime);


-Craig


-Original Message-
From: Rohit [mailto:ro...@in-rev.com] 
Sent: Friday, 6 May 2011 2:31 AM
To: solr-user@lucene.apache.org
Subject: Solr: org.apache.solr.common.SolrException: Invalid Date String:

Hi,

I am new to Solr and this is my first attempt at indexing Solr data. I am
getting the following exception while indexing:

org.apache.solr.common.SolrException: Invalid Date String:'2011-01-07'
    at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
    at org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:169)
    at org.apache.solr.schema.SchemaField.createField(SchemaField.java:98)
    at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)

I understand from reading some articles that Solr stores times only in UTC.
This is the query I am trying to index:

Select id, text, 'language', links, tweetType, source, location,
       bio, url, utcOffset, timeZone, frenCnt, createdAt, createdOnGMT,
       createdOnServerTime, follCnt, favCnt, totStatusCnt, usrCrtDate,
       humanSentiment, replied, replyMsg, classified, locationDetail,
       geonameid, country, continent, placeLongitude, placeLatitude,
       listedCnt, hashtag, mentions, senderInfScr, createdOnGMTDate,
       DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+05:30'),'%Y-%m-%d') as IST,
       DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+01:00'),'%Y-%m-%d') as ECT,
       DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+02:00'),'%Y-%m-%d') as EET,
       DATE_FORMAT(CONVERT_TZ(createdOnGMTDate,'+00:00','+03:30'),'%Y-%m-%d') as MET,
       sign(classified) as sentiment
from

The reason I am doing this timezone conversion is that I need to group results
by the user's timezone. How can I achieve this?
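For what it's worth, the Invalid Date String error comes from TrieDateField, which only accepts full ISO-8601 UTC timestamps (e.g. 2011-01-07T17:00:30Z); a day-only value like 2011-01-07 has to be expanded before it is sent to Solr. A minimal Java sketch of that conversion (the class and method names are just for illustration):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SolrDateFormat {

    // Render a Date in the format TrieDateField accepts: full ISO-8601, UTC.
    public static String toSolrDate(Date d) {
        SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        iso.setTimeZone(TimeZone.getTimeZone("UTC"));
        return iso.format(d);
    }

    public static void main(String[] args) throws Exception {
        // Expand a day-only value (as produced by DATE_FORMAT above) to midnight UTC.
        SimpleDateFormat day = new SimpleDateFormat("yyyy-MM-dd");
        day.setTimeZone(TimeZone.getTimeZone("UTC"));
        System.out.println(toSolrDate(day.parse("2011-01-07")));
        // prints 2011-01-07T00:00:00Z
    }
}
```

Note that this keeps the stored timestamp in UTC; any per-timezone bucketing would still happen at query or display time.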

Regards, Rohit

 




Solr Search fails

2011-05-05 Thread deniz
Hi all. I have been trying to implement a universal search on a field, but
somehow it fails...

When I do a full import everything is OK and I can see the indexed field. But
when I make a query like

universal:Male

it shows no match

any ideas?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Search-fails-tp2907093p2907093.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is it possible to use sub-fields or multivalued fields for boosting?

2011-05-05 Thread deniz
It seems like I will use dismax... I have tried some other ways, but dismax
seems the best :)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-it-possible-to-use-sub-fields-or-multivalued-fields-for-boosting-tp2901992p2907094.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Search fails

2011-05-05 Thread Grijesh
What is your field type and analysis chain?

-
Thanx: 
Grijesh 
www.gettinhahead.co.in 
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Search-fails-tp2907093p2907097.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Search fails

2011-05-05 Thread deniz
The type is string and I use the standard analyzer (I am not sure what you
mean by the word "chain").
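A note worth adding here: a string field in Solr is not analyzed at all, so universal:Male only matches documents whose field value is exactly "Male" (whole value, case-sensitive); no analyzer runs on it. If tokenized, case-insensitive matching is wanted, a text-style field type roughly like this could be used in schema.xml (the type name text_general is illustrative):

```xml
<!-- schema.xml sketch: an analyzed, lower-cased text type -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="universal" type="text_general" indexed="true" stored="true"/>
```

A full reindex is required after a schema change like this before queries will match.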

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Search-fails-tp2907093p2907104.html
Sent from the Solr - User mailing list archive at Nabble.com.