Re: Order of words in proximity search
the key phrase was this one :) : A sloppy phrase query specifies a maximum slop, or the number of positions tokens need to be moved to get a match. So you could search for "foo bar"~101 in your example. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Order-of-words-in-proximity-search-tp2938427p2946620.html Sent from the Solr - User mailing list archive at Nabble.com.
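As a quick illustration (the field name is only a placeholder), such a proximity query is passed directly in the q parameter:

q=text:"foo bar"~101

The number after the tilde is the maximum slop; with a slop of 2 or more the terms may also match in reversed order.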
Re: Order of words in proximity search
I would prefer to put a higher slop number instead of a boolean clause : 200 perhaps in your specific case. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Order-of-words-in-proximity-search-tp2938427p2946645.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: K-Stemmer for Solr 3.1
I don't know if it is allowed to modify Lucid code and add it to jira. If someone from Lucid would give me the permission and the Solr developers have nothing against it I won't mind adding the Lucid KStemmer to jira for Solr 3.x and 4.x. There are several Lucid KStemmer users which I can see from the many requests which I got. Also the Lucid KStemmer is faster than the standard KStemmer. Bernd Am 16.05.2011 06:33, schrieb Bill Bell: Did you upload the code to Jira? On 5/13/11 12:28 AM, Bernd Fehlingbernd.fehl...@uni-bielefeld.de wrote: I backported a Lucid KStemmer version from solr 4.0 which I found somewhere. Just changed from import org.apache.lucene.analysis.util.CharArraySet; // solr4.0 to import org.apache.lucene.analysis.CharArraySet; // solr3.1 Bernd Am 12.05.2011 16:32, schrieb Mark: java.lang.AbstractMethodError: org.apache.lucene.analysis.TokenStream.incrementToken()Z Would you mind explaining your modifications? Thanks On 5/11/11 11:14 PM, Bernd Fehling wrote: Am 12.05.2011 02:05, schrieb Mark: It appears that the older version of the Lucid Works KStemmer is incompatible with Solr 3.1. Has anyone been able to get this to work? If not, what are you using as an alternative? Thanks Lucid KStemmer works nice with Solr3.1 after some minor mods to KStemFilter.java and KStemFilterFactory.java. What problems do you have? Bernd -- * Bernd FehlingUniversitätsbibliothek Bielefeld Dipl.-Inform. (FH)Universitätsstr. 25 Tel. +49 521 106-4060 Fax. +49 521 106-4052 bernd.fehl...@uni-bielefeld.de33615 Bielefeld BASE - Bielefeld Academic Search Engine - www.base-search.net *
Opening a file at a page where I encounter a hit
Hi, I am using ASP.Net MVC and solrnet for my search tool. The files I index include pdf files, word docs, excel etc... I am able to search and retrieve all the docs with a hit. Now the problem lies in opening the files with a hit. When I open the file, it should open at the location where the hit is encountered. How do i manage this? It will be even more helpful if I can highlight the hit inside the opened document? Please help me in this regard. Regards Vignesh
Re: Opening a file at a page where I encounter a hit
On Mon, May 16, 2011 at 12:00 PM, Vignesh Raj vignesh...@greatminds.co.in wrote: Hi, I am using ASP.Net MVC and solrnet for my search tool. The files I index include pdf files, word docs, excel etc... I am able to search and retrieve all the docs with a hit. Now the problem lies in opening the files with a hit. When I open the file, it should open at the location where the hit is encountered. How do i manage this? It will be even more helpful if I can highlight the hit inside the opened document? One way to display the document text is to also store it in Solr. There are two issues with this: * The Solr index will grow considerably. However, the performance limits are still acceptable to us, with a ~60GB index size. * You will probably lose formatting from the documents. One can manage to retain much of the original formatting by pre- processing the text to format it before indexing into Solr. However, this is not perfect. The other way is to retain in Solr a path to the original document that you can then serve from the filesystem: * How to do this depends on how you are indexing into Solr. * Highlighting query terms, and opening the document at the right place has to be done by external programs (note that one document can have multiple matches, so that there is no a priori right place to open the document). Regards, Gora
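As a rough sketch of the two options in schema.xml (field names and types here are only illustrative, not from the original setup):

<!-- option 1: store the extracted text in Solr so it can be returned and highlighted -->
<field name="content" type="text" indexed="true" stored="true"/>
<!-- option 2: store only a pointer to the original file, and serve the file from the filesystem -->
<field name="filepath" type="string" indexed="false" stored="true"/>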
Re: Order of words in proximity search
Hi, The strange part is that I have actually tried a slop of 1000 (1K), and the results are still different. This is even when the test data has a limit of 10K for each sentence. (This means that a sloppy phrase should only give hits where the complete sentence is found, yet it is not the result...) Hope that explains the issue a bit better :) Regards Tor On Mon, May 16, 2011 at 8:08 AM, lboutros boutr...@gmail.com wrote: the key phrase was this one :) : A sloppy phrase query specifies a maximum slop, or the number of positions tokens need to be moved to get a match. So you could search for "foo bar"~101 in your example. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Order-of-words-in-proximity-search-tp2938427p2946620.html Sent from the Solr - User mailing list archive at Nabble.com. -- Mvh Tor Henning Ueland
Re: Order of words in proximity search
The analyzer of the field you are using could impact the Phrase Query Slop. Could you copy/paste the part of the schema ? Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Order-of-words-in-proximity-search-tp2938427p2946764.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Want to Delete Existing Index create fresh index
It is by default commented in solrconfig.xml On Sat, May 14, 2011 at 10:49 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: I guess you are having issues with the datadir. Did you set the datadir in solrconfig.xml? On Sat, May 14, 2011 at 4:10 PM, Pawan Darira pawan.dar...@gmail.com wrote: Hi I am using Solr 1.4. had changed schema already. When i created the index for first time, the directory was automatically created index made perfectly fine. Now, i want to create the index from scratch, so I deleted the whole data/index directory ran the script. Now it is only creating empty directories NO index files inside that. Thanks Pawan On Sat, May 14, 2011 at 6:54 PM, Dmitry Kan dmitry@gmail.com wrote: Hi Pawan, Which SOLR version do you have installed? It should be absolutely normal for the data/ sub directory to create when starting up SOLR. So just go ahead and post your data into SOLR, if you have changed the schema already. -- Regards, Dmitry Kan On Sat, May 14, 2011 at 4:01 PM, Pawan Darira pawan.dar...@gmail.com wrote: I did that. Index directory is created but not contents in that 2011/5/14 François Schiettecatte fschietteca...@gmail.com You can also shut down solr/lucene, do: rm -rf /YourIndexName/data/index and restart, the index directory will be automatically recreated. François On May 14, 2011, at 1:53 AM, Gabriele Kahlout wrote: curl --fail $solrIndex/update?commit=true -d 'deletequery*:*/query/delete' #empty index [1 http://wiki.apache.org/nutch/Whole-Web%20Crawling%20incremental%20script ] did u try? On Sat, May 14, 2011 at 7:26 AM, Pawan Darira pawan.dar...@gmail.com wrote: Hi I had an existing index created months back. now my database schema has changed. i wanted to delete the current data/index directory re-create the fresh index but it is saying that segments file not found just create blank data/index directory. Please help -- Thanks, Pawan Darira -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)). -- Thanks, Pawan Darira -- Thanks, Pawan Darira -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)). -- Thanks, Pawan Darira
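For reference, the delete-all command quoted above is normally written with its XML body spelled out; assuming $solrIndex points at the core's base URL, it looks like:

curl --fail $solrIndex/update?commit=true -d '<delete><query>*:*</query></delete>'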
Can't seem to get External Field scoring running
I want to be able to dynamically change scores without having to update the entire document. For this, I started using the External File Field. I set a fieldType called idRankFile and a field called idRank in schema.xml : <fieldType name="idRankFile" keyField="id" defVal="0" stored="false" indexed="false" class="solr.ExternalFileField" valType="pfloat"/> <field name="idRank" type="idRankFile"/> Now I set the idRank for various id's in a file called external_idRank.txt in dataDir : F8V7067-APL-KIT = 1.0 IW-02 = 10.0 9885A004 = 100.0 Originally, the scores for these 3 id's (for my query) were in reverse order. Now, I query using the following : http://localhost:8983/solr/select?indent=on&q=car%20power%20adaptor&fl=id,name_val_:idRank However, the order of the results remains the same. It seems it hasn't taken the external field into account. Any ideas how to do this? Is my query correct?
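For comparison, values from an ExternalFileField normally only affect ranking when they are pulled in through a function query; two hedged sketches of how that is commonly done (the dismax variant and its parameters are assumptions, not from this thread):

q=car power adaptor _val_:"idRank"
q=car power adaptor&defType=dismax&bf=idRank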
Re: UIMA analysisEngine path
Hello, if you want to take the descriptor from a jar, provided that you configured the jar inside a lib element in solrconfig, then you just need to write the correct classpath in the analysisEngine element. For example if your descriptor resides in com/something/desc/ path inside the jar then you should set the analysisEngine element as /com/something/desc/descriptorname.xml If you instead need to get the descriptor from filesystem try the patch in SOLR-2501 [1]. Hope this helps, Tommaso [1] : https://issues.apache.org/jira/browse/SOLR-2501 2011/5/13 chamara chama...@gmail.com Hi, Is this code line 57 needs to be changed to the location where the jar files(library files) resides? URL url = this.getClass().getResource(location of the jar files); I did change it but no luck so far. Let me know what i am doing wrong? -- View this message in context: http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2935541.html Sent from the Solr - User mailing list archive at Nabble.com.
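For instance, keeping the placeholder path from above, the element in the Solr 3.x uimaConfig section of solrconfig.xml would read roughly:

<analysisEngine>/com/something/desc/descriptorname.xml</analysisEngine>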
Re: Order of words in proximity search
http://pastebin.com/svyefmM6 Pretty standard :) /Tor On Mon, May 16, 2011 at 9:18 AM, lboutros boutr...@gmail.com wrote: The analyzer of the field you are using could impact the Phrase Query Slop. Could you copy/paste the part of the schema ? Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Order-of-words-in-proximity-search-tp2938427p2946764.html Sent from the Solr - User mailing list archive at Nabble.com. -- Mvh Tor Henning Ueland
[POLL] How do you (like to) do logging with Solr
Hi, This poll is to investigate how you currently do or would like to do logging with Solr when deploying solr.war to a SEPARATE java application server (such as Tomcat, Resin etc) outside of the bundled solr/example. For background on how things work in Solr now, see http://wiki.apache.org/solr/SolrLogging and for more info on the SLF4J framework, see http://www.slf4j.org/manual.html Please tick one of the options below with an [X]: [ ] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! Note that NOT bundling a logger binding with solr.war means defaulting to the NOP logger after outputting these lines to stderr: SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder. SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
Re: [POLL] How do you (like to) do logging with Solr
[X] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool!
Re: [POLL] How do you (like to) do logging with Solr
[ ] I always use the JDK logging as bundled in solr.war, that's perfect [X ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! Setting up log4j is easy but encountered issues with versions when switching to 3.1.
Re: why query chinese character with bracket become phrase query by default?
On Sun, May 15, 2011 at 7:44 PM, Mark Miller markrmil...@gmail.com wrote: Could you please revert your commit, until we've reached some consensus on this discussion first? Let's reach some consensus, but why revert? This has been the behavior - shouldn't the consensus onus be on changing it to begin with? That's how I see it. To be clear, I'm asking that Yonik revert his commit from yesterday (rev 1103444), where he added text_nwd fieldType and dynamic fields *_nwd to the example schema.xml. I agree we should reach consensus before changing what's already committed, that's exactly why I'm asking Yonik to revert -- we were in the middle of discussing this, and I had posted a patch on SOLR-2519, when he suddenly committed the text_nwd change, yesterday. Does anyone disagree that Yonik's commit was inappropriate? This is not how we work at Apache. I'm going to need to get back up to speed on this issue before I can comment more helpfully. Better out of the box support for other languages is important - I think it makes sense to discuss this issue again myself. +1 Solr, out of box, is just awful for non-whitespace languages (eg CJK, and others). And for every user who comes to the list asking for help (thank you cyang2010!), I imagine there are many others who simply gave up and walked away (from Solr) when they tried it on CJK content. Lucene has made awesome strides in having natural defaults that work well across many languages, thanks to the hard work of Robert and others (StandardAnalyzer now actually follows a standard (UAX #29 -- text segmentation), autophrase off in QP, etc.), and I think we should take advantage of this in Solr, just like ElasticSearch does. Really, the best solution (I think) would be to have language-specific fieldTypes (text_en, text_zh, etc.), but I suspect there's a good amount of work to reach that so in the meantime I think we should fix the defaults for the text fieldType to work well across many languages. Mike http://blog.mikemccandless.com
Re: [POLL] How do you (like to) do logging with Solr
On Mon, May 16, 2011 at 2:13 PM, Jan Høydahl jan@cominvent.com wrote: [...] Please tick one of the options below with an [X]: [ X] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! Regards, Gora
Re: [POLL] How do you (like to) do logging with Solr
[ ] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [X] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! On 16 May 2011 11:32, Gora Mohanty g...@mimirtech.com wrote: On Mon, May 16, 2011 at 2:13 PM, Jan Høydahl jan@cominvent.com wrote: [...] Please tick one of the options below with an [X]: [ X] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! Regards, Gora -- Met vriendelijke groet, Martijn van Groningen
Re: [POLL] How do you (like to) do logging with Solr
Please tick one of the options below with an [X]: [ ] I always use the JDK logging as bundled in solr.war, that's perfect [X] I sometimes use log4j or another framework and am happy with re-packaging solr.war actually : not so happy because our operations team has to repackage it. But there is no option for [X] add the logger configuration to the server's classpath, no repackaging! [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool!
Re: [POLL] How do you (like to) do logging with Solr
[X] I sometimes use log4j or another framework and am happy with re-packaging solr.war actually : not so happy because our operations team has to repackage it. But there is no option for [X] add the logger configuration to the server's classpath, no repackaging! That's what happens if we ship solr.war without any pre-set logger binding - it's the binding provided in your app-server's classpath which will be used. And now my vote: [ ] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [X] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool!
LockObtainedFailedException on solr update
My solr index is updated simultaneously by multiple clients via REST. I use commitWithing attribute in the add/add command to direct auto commits. I start getting this error after a couple of days of usage. How do i fix this ? Please find the error log below. Using solr 3.1 with tomcat Thanks -- HTTP Status 500 - Lock obtain timed out: NativeFSLock@ /var/lib/solr/data/index/write.lock org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/var/lib/solr/data/index/write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:84) at org.apache.lucene.index.IndexWriter.lt;initgt;(IndexWriter.java:1097) at org.apache.solr.update.SolrIndexWriter.lt;initgt;(SolrIndexWriter.java:83) at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102) at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:662) -- Regards, Nitesh Nandy
Re: Set Full-Import Clean=False
On Mon, May 16, 2011 at 5:29 PM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.com wrote: Hi Where do I set the default value of clean = false when a full-import is done. Append it to the URL, e.g., dataimport?command=full-import&clean=false Regards, Gora
Re: Set Full-Import Clean=False
I have been doing that, but I want to set it as False by default, so that even if the admin forgets to set clean=false in the URL, it doesn't do it on its own. On 16-05-2011 17:38, Gora Mohanty wrote: On Mon, May 16, 2011 at 5:29 PM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.com wrote: Hi Where do I set the default value of clean = false when a full-import is done. Append it to the URL, e.g., dataimport?command=full-importclean=false Regards, Gora -- Regards Jasneet Sabharwal Software Developer NextGen Invent Corporation +91-9871228582
Solr Cell and operations on metadata extracted
Hi, I have a question about Solr Cell please. I index some files. For example, if I want to extract the filename, apply a hash function like MD5 to it and then store the result in Solr: is the correct way to use Tika « manually » to extract the metadata I want, do the transformations on it and then send it to Solr? I can't use Solr Cell directly in this case because I can't modify the extracted metadata, right? Thanks, Olivier
Re: Set Full-Import Clean=False
Jasneet, what about defining the value as a default in the dataimport request-handler? like the sample at http://wiki.apache.org/solr/SolrRequestHandler does? Regards Stefan On Mon, May 16, 2011 at 2:10 PM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.com wrote: I have been doing that, but I want to set it as False by default, so that even if the admin forgets to set clean=false in the URL, it doesn't do it on its own. On 16-05-2011 17:38, Gora Mohanty wrote: On Mon, May 16, 2011 at 5:29 PM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.com wrote: Hi Where do I set the default value of clean = false when a full-import is done. Append it to the URL, e.g., dataimport?command=full-importclean=false Regards, Gora -- Regards Jasneet Sabharwal Software Developer NextGen Invent Corporation +91-9871228582
Re: why query chinese character with bracket become phrase query by default?
On May 16, 2011, at 5:30 AM, Michael McCandless wrote: Does anyone disagree that Yonik's commit was inappropriate? This is not how we work at Apache. Ah - dunno yet - I obviously missed part of the conversation here. I thought you were talking about reversing 'autophrase off' as the default, not these 'quick' new field types. Excuse me for a moment while I read... Yeah - seems a little hasty. Not a fan of 'text_nwd' as a field name either. Didn't seem malicious to me, but it does seem we should probably work together in JIRA/discussion before just shotgunning changes... Don't know that I care if it's reverted (if we fall back another 10 steps into that BS I quit everything and I'm moving to South America), but we should push on here either way. - Mark Miller lucidimagination.com Lucene/Solr User Conference May 25-26, San Francisco www.lucenerevolution.org
Re: Set Full-Import Clean=False
Stefan, I have added the DIH request handler in the solrconfig.xml. Do I have to add the clean=false in that or somewhere else ? Regards Jasneet On 16-05-2011 18:03, Stefan Matheis wrote: Jasneet, what about defining the value as a default in the dataimport request-handler? like the sample at http://wiki.apache.org/solr/SolrRequestHandler does? Regards Stefan On Mon, May 16, 2011 at 2:10 PM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.com wrote: I have been doing that, but I want to set it as False by default, so that even if the admin forgets to set clean=false in the URL, it doesn't do it on its own. On 16-05-2011 17:38, Gora Mohanty wrote: On Mon, May 16, 2011 at 5:29 PM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.comwrote: Hi Where do I set the default value of clean = false when a full-import is done. Append it to the URL, e.g., dataimport?command=full-importclean=false Regards, Gora -- Regards Jasneet Sabharwal Software Developer NextGen Invent Corporation +91-9871228582 -- Regards Jasneet Sabharwal Software Developer NextGen Invent Corporation +91-9871228582
Re: Set Full-Import Clean=False
Jasneet On Mon, May 16, 2011 at 3:10 PM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.com wrote: I have added the DIH request handler in the solrconfig.xml. Exactly there :) Regards Stefan
Getting Null pointer exception While doing a full import
Hi, I am doing a full import in one of the cores. But I am getting Null poniter exception and the import is failing again and again. I also tried clearing the indexes and started the full import, but still indexing failed. The full import request is prefect and I verified it with other full import requests too. Any Suggestion/Solution will be of great help. Thanks in advance. The exception is as follows: May 14, 2011 5:06:56 AM org.apache.solr.core.SolrCore execute INFO: [core6] webapp=/solr path=/dataimport params={wt=javabinversion=1} status=0 QTime=0 May 14, 2011 9:03:55 PM org.apache.solr.common.SolrException log SEVERE: java.lang.NullPointerException at java.io.StringReader.init(StringReader.java:33) at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:197) at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78) at org.apache.solr.search.QParser.getQuery(QParser.java:137) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:85) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) Thanks Regards, Sivaganesh Email id: sivaganesh_sel...@infosys.com -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-Null-pointer-exception-While-doing-a-full-import-tp2947854p2947854.html Sent from the Solr - User mailing list archive at Nabble.com.
boolean versus non-boolean search
Dear list, Might have missed it from the literature and the list, sorry if so, but: SOLR 1.4.1 <solrQueryParser defaultOperator="AND"/> Consider the query: term1 term2 OR term1 term2 OR term1 term3 Problem: The query produces a hit containing only term1. Solution: Modified query, grouping with parenthesis (term1 term2) OR term1 term2 OR term1 term3 produces hits with both term1 and term2 present and other hits that are hit by OR'ed clauses. Problem 1. Another modified query, AND instead of parenthesis: term1 AND term2 OR term1 term2 OR term1 term3 produces same results as the original query and same debug output. Why is that? -- Regards, Dmitry Kan
Re: Set Full-Import Clean=False
Stefan

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/home/jasneet/apache-solr-3.1.0/example/solr/conf/data-config.xml</str>
    <str name="clean">false</str>
  </lst>
</requestHandler>

Should it be like this ? On 16-05-2011 18:48, Stefan Matheis wrote: Jasneet On Mon, May 16, 2011 at 3:10 PM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.com wrote: I have added the DIH request handler in the solrconfig.xml. Exactly there :) Regards Stefan -- Regards Jasneet Sabharwal Software Developer NextGen Invent Corporation +91-9871228582
RE: SolrDispatchFilter
Yep that fixed my problem ...many thanks ! -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Friday, May 13, 2011 6:37 PM To: solr-user@lucene.apache.org Subject: RE: SolrDispatchFilter : This problem is only occurring when using IE8 ( Chrome & FireFox fine ) if it only happens when using the form on the admin screen (and not when hitting the URL directly, via shift-reload for example), it may just be a different manifestation of this silly javascript bug... https://issues.apache.org/jira/browse/SOLR-2455 -Hoss
Re: why query chinese character with bracket become phrase query by default?
On Sun, May 15, 2011 at 1:48 PM, Michael McCandless luc...@mikemccandless.com wrote: Could you please revert your commit, until we've reached some consensus on this discussion first? Huh? I thought everyone was in agreement that we needed more field types for different languages? I added my best guess about what a generic type for non-whitespace-delimited might look like. Since it's a new field type, it doesn't affect anything. Hopefully it only improves the situation for someone trying to use one of these languages. The only negative would seem to be if it's worse than nothing (i.e. a very bad example because it actually doesn't work for non-whitespace-delimited languages). The issue about changing defaults on TextField and changing what text does in the example schema by default is not dependent on this. They are only related by the fact that if another field is added/changed then _nwd may become redundant and can be removed. For now, it only seems like an improvement? Anyway... the whole language of revert seems unnecessarily confrontational. Feel free to improve what's there (or delete *_nwd if people really feel it adds no/negative value) -Yonik
How to index and query C# as whole term?
Hi, I'm using Apache Solr v3.1. How do I configure/allow Solr to both index and query the term c# as a whole word/term? From Analysis page, I could see that the term c# is being reduced/converted into just c by solr.WordDelimiterFilterFactory. Regards, Gnanam
Re: [POLL] How do you (like to) do logging with Solr
[ ] I always use the JDK logging as bundled in solr.war, that's perfect [x] I sometimes use log4j or another framework and am happy with re-packaging solr.war [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! Péter
Re: Set Full-Import Clean=False
On Mon, May 16, 2011 at 3:27 PM, Jasneet Sabharwal jasneet.sabhar...@ngicorporation.com wrote: Should it be like this ? Never tried it myself, but what i guess from the Wiki ... Yes. doesn't work for you, or just asked to be sure, before integrating it?
Re: why query chinese character with bracket become phrase query by default?
On Mon, May 16, 2011 at 5:30 AM, Michael McCandless luc...@mikemccandless.com wrote: To be clear, I'm asking that Yonik revert his commit from yesterday (rev 1103444), where he added text_nwd fieldType and dynamic fields *_nwd to the example schema.xml. So... your position is that until the text fieldType is changed to support non-whitespace-delimited languages better, that no other fieldType should be changed/added to better support non-whitespace-delimited languages? Man, that seems political, not technical. Whatever... I'll revert. -Yonik
Re: why query chinese character with bracket become phrase query by default?
On Mon, May 16, 2011 at 3:51 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Mon, May 16, 2011 at 5:30 AM, Michael McCandless luc...@mikemccandless.com wrote: To be clear, I'm asking that Yonik revert his commit from yesterday (rev 1103444), where he added text_nwd fieldType and dynamic fields *_nwd to the example schema.xml. So... your position is that until the text fieldType is changed to support non-whitespace-delimited languages better, that no other fieldType should be changed/added to better support non-whitespace-delimited languages? Man, that seems political, not technical. To me it seems like neither. It's rather the process of improving, aligned with outstanding issues. It shouldn't feel wrong. Simon Whatever... I'll revert. -Yonik
Re: why query chinese character with bracket become phrase query by default?
On Mon, May 16, 2011 at 9:51 AM, Yonik Seeley yo...@lucidimagination.com wrote: To be clear, I'm asking that Yonik revert his commit from yesterday (rev 1103444), where he added text_nwd fieldType and dynamic fields *_nwd to the example schema.xml. So... your position is that until the text fieldType is changed to support non-whitespace-delimited languages better, that no other fieldType should be changed/added to better support non-whitespace-delimited languages? No, that's not my position at all. My position is: please don't suddenly commit changes, with your way, while we're still discussing how to solve the issue. That's not the Apache way. This applies in general, not just this case (fixing Solr's out-of-the-box behavior with non-whitespace languages). So, it could very well be, after we iterate on SOLR-2519, that we all agree your baby step is great, in which case let's go forward with that. But we should all come to some consensus about that before you suddenly commit. Man, that seems political, not technical. I'm sorry you feel that way, but it's important to me that we all follow the Apache way here. I feel this will only make our community stronger. It's also important that any time another committer is uncomfortable with what just got committed, and asks for a revert, that it *not* be a big deal. It's not political, it was just a mistake and the revert is quick and painless. We are commit-then-review here, and if someone is uncomfortable, they should say so and whoever committed should simply revert it and re-iterate. This should be a simple free tool for all of us to use. Whatever... I'll revert. Thank you. Mike
Re: Show filename in search result using a FileListEntityProcessor
Hi, thanks for the reply. I tried a couple of things both in the tika-test entity and in the entity named 'f'. In the tika-test entity I tried: <field column="fileName" name="${f.fileName}" /> <field column="fileName" name="${f.file}" /> even <field column="fileName" name="${f.fileAbsolutePath}" /> I also tried doing things in the entity 'f' like: <field column="fileName" name="fileName"/> <field column="fileName" name="file"/> None of it works. I also added fileName to the schema like: <field name="fileName" type="string" indexed="true" stored="true" /> In <fields>. Doesn't help. Can anyone provide me with a working example? I'm pretty stuck here on something that seems really trivial and simple :-( On Sat, May 14, 2011 at 22:56, kbootz kbo...@caci.com wrote: There is a JIRA item (can't recall it atm) that addresses the issue with the docs. I'm running 3.1 and per your example you should be able to get it using ${f.file}. I think* it should also be in the entity desc. but I'm also new and that's just how I access it. GL -- View this message in context: http://lucene.472066.n3.nabble.com/Show-filename-in-search-result-using-a-FileListEntityProcessor-tp2939193p2941305.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: why query chinese character with bracket become phrase query by default?
On Mon, May 16, 2011 at 10:06 AM, Michael McCandless luc...@mikemccandless.com wrote: On Mon, May 16, 2011 at 9:51 AM, Yonik Seeley yo...@lucidimagination.com wrote: To be clear, I'm asking that Yonik revert his commit from yesterday (rev 1103444), where he added text_nwd fieldType and dynamic fields *_nwd to the example schema.xml. So... your position is that until the text fieldType is changed to support non-whitespace-delimited languages better, that no other fieldType should be changed/added to better support non-whitespace-delimited languages? No, that's not my position at all. My position is: please don't suddenly commit changes, with your way, while we're still discussing how to solve the issue. That's not the Apache way. Dude... everyone has always agreed we need more fieldtypes to support different languages (as you did earlier in this thread too). There's been a history of just adding stuff like that (half of the commits to the example schema have no associated JIRA issue). What happens to the default text field will have no bearing on that. We will still need more field types to support more languages. Would you be against me adding a text_cjk fieldtype too? My position: it's silly for a lack of consensus on the text field to block progress on any other fieldtype. -Yonik
Re: How to index and query C# as whole term?
I don't think you'd want to use the string type here. String type is almost never appropriate for a field you want to actually search on (it is appropriate for fields to facet on). But you may want to use Text type with different analyzers selected. You probably want Text type so the value is still split into different tokens on word boundaries; you just don't want an analyzer set that removes punctuation. On 5/16/2011 10:46 AM, Gora Mohanty wrote: On Mon, May 16, 2011 at 7:05 PM, Gnanakumargna...@zoniac.com wrote: Hi, I'm using Apache Solr v3.1. How do I configure/allow Solr to both index and query the term c# as a whole word/term? From Analysis page, I could see that the term c# is being reduced/converted into just c by solr.WordDelimiterFilterFactory. [...] Yes, as you have discovered the analyzers for the field type in question will affect the values indexed. To index c# exactly as is, you can use the string type, instead of the text type. However, what you probably want some filters to be applied, e.g., LowerCaseFilterFactory. Take a look at the definition of the fieldType text in schema.xml, define a new field type that has only the tokenizers and analyzers that you need, and use that type for your field. This Wiki page should be helpful: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Regards, Gora
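A minimal sketch of such a field type (the name and exact filter chain are only an example; the point is that there is no WordDelimiterFilterFactory to strip the '#'):

<fieldType name="text_code" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- whitespace tokenizing keeps "c#" as one token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With this chain, "c#" survives as a single token and can be matched exactly.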
Re: boolean versus non-boolean search
Why? Because of how the solr/lucene query parser parses? It parses into separate tokens/phrases, and then marks each unit as mandatory or optional. The operators joining the tokens/phrases are used to determine if a unit is mandatory or optional. Since your defaultOperator=AND, term1 term2 OR X is the same as: term1 AND term2 OR X because it used the defaultOperator in between term1 and term2, since no explicit operator was provided. Then we get to the one you specifically did add the AND in. I guess that it basically groups left-to-right. So: term1 AND term2 OR X OR Y is the same as: term1 AND (term2 OR (X OR Y)) But I guess you already figured this all out, yeah? On 5/16/2011 9:24 AM, Dmitry Kan wrote: Dear list, Might have missed it from the literature and the list, sorry if so, but: SOLR 1.4.1 <solrQueryParser defaultOperator="AND"/> Consider the query: term1 term2 OR term1 term2 OR term1 term3 Problem: The query produces a hit containing only term1. Solution: Modified query, grouping with parenthesis (term1 term2) OR term1 term2 OR term1 term3 produces hits with both term1 and term2 present and other hits that are hit by OR'ed clauses. Problem 1. Another modified query, AND instead of parenthesis: term1 AND term2 OR term1 term2 OR term1 term3 produces same results as the original query and same debug output. Why is that?
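One way to check how any of these variants is actually being interpreted is to add debugQuery=on to the request and compare the parsedquery section of the response; for example (parameter values are only illustrative):

...&q=term1 term2 OR term3&debugQuery=on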
RE: How to index and query C# as whole term?
I have always just converted terms like 'C#' or 'C++' into 'csharp' and 'cplusplus' before indexing them and similarly converted those terms if someone searched on them. That always has worked just fine for me... :) -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Monday, May 16, 2011 8:28 AM To: solr-user@lucene.apache.org Subject: Re: How to index and query C# as whole term? I don't think you'd want to use the string type here. String type is almost never appropriate for a field you want to actually search on (it is appropriate for fields to facet on). But you may want to use Text type with different analyzers selected. You probably want Text type so the value is still split into different tokens on word boundaries; you just don't want an analyzer set that removes punctuation. On 5/16/2011 10:46 AM, Gora Mohanty wrote: On Mon, May 16, 2011 at 7:05 PM, Gnanakumargna...@zoniac.com wrote: Hi, I'm using Apache Solr v3.1. How do I configure/allow Solr to both index and query the term c# as a whole word/term? From Analysis page, I could see that the term c# is being reduced/converted into just c by solr.WordDelimiterFilterFactory. [...] Yes, as you have discovered the analyzers for the field type in question will affect the values indexed. To index c# exactly as is, you can use the string type, instead of the text type. However, what you probably want some filters to be applied, e.g., LowerCaseFilterFactory. Take a look at the definition of the fieldType text in schema.xml, define a new field type that has only the tokenizers and analyzers that you need, and use that type for your field. This Wiki page should be helpful: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Regards, Gora
Re: How to index and query C# as whole term?
Before indexing so outside Solr? Using the SynonymFilter would be easier i guess. On Monday 16 May 2011 17:44:24 Robert Petersen wrote: I have always just converted terms like 'C#' or 'C++' into 'csharp' and 'cplusplus' before indexing them and similarly converted those terms if someone searched on them. That always has worked just fine for me... :) -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Monday, May 16, 2011 8:28 AM To: solr-user@lucene.apache.org Subject: Re: How to index and query C# as whole term? I don't think you'd want to use the string type here. String type is almost never appropriate for a field you want to actually search on (it is appropriate for fields to facet on). But you may want to use Text type with different analyzers selected. You probably want Text type so the value is still split into different tokens on word boundaries; you just don't want an analyzer set that removes punctuation. On 5/16/2011 10:46 AM, Gora Mohanty wrote: On Mon, May 16, 2011 at 7:05 PM, Gnanakumargna...@zoniac.com wrote: Hi, I'm using Apache Solr v3.1. How do I configure/allow Solr to both index and query the term c# as a whole word/term? From Analysis page, I could see that the term c# is being reduced/converted into just c by solr.WordDelimiterFilterFactory. [...] Yes, as you have discovered the analyzers for the field type in question will affect the values indexed. To index c# exactly as is, you can use the string type, instead of the text type. However, what you probably want some filters to be applied, e.g., LowerCaseFilterFactory. Take a look at the definition of the fieldType text in schema.xml, define a new field type that has only the tokenizers and analyzers that you need, and use that type for your field. This Wiki page should be helpful: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Regards, Gora -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
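If the synonym route is taken, a minimal sketch would be an entry in synonyms.txt plus a SynonymFilterFactory in the field's analyzer; note the assumption that the tokenizer ahead of it keeps c# as a single token (e.g. a whitespace tokenizer), otherwise the mapping never sees the # character:

c# => csharp
c++ => cplusplus

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>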
Re: boolean versus non-boolean search
Hi Jonathan, Well, I clearly understand, why 'term1 term2 OR ...' gives exactly same results as 'term1 AND term2 OR ...', but what I do not get is, why grouping with parentheses is required to have both term1 and term2 in the same hit even though AND is the default operator and space between terms is expected to be treated as AND. Dmitry On Mon, May 16, 2011 at 6:33 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Why? Becuase of how the solr/lucene query parser parses? It parses into seperate tokens/phrases, and then marks each unit as mandatory or optional. The operator's joining the tokens/phrases are used to determine if a unit is mandatory or optional. Since your defaultOperator=AND term1 term2 OR X is the same as: term1 AND term2 OR X because it used the defaultOperator in between term1 and term2, since no explicit operator was provided. Then we get to the one you specifically did add the AND in. I guess that it basically groups left-to-right. So: term1 AND term2 OR X OR Y is the same as: term1 AND (term2 OR (X OR Y)) But I guess you already figured this all out, yeah? On 5/16/2011 9:24 AM, Dmitry Kan wrote: Dear list, Might have missed it from the literature and the list, sorry if so, but: SOLR 1.4.1 solrQueryParser defaultOperator=AND/ Consider the query: term1 term2 OR term1 term2 OR term1 term3 Problem: The query produces a hit containing only term1. Solution: Modified query, grouping with parenthesis (term1 term2) OR term1 term2 OR term1 term3 produces hits with both term1 and term2 present and other hits that are hit by OR'ed clauses. Problem 1. Another modified query, AND instead of parenthesis: term1 AND term2 OR term1 term2 OR term1 term3 produces same results as the original query and same debug output. Why is that? -- Regards, Dmitry Kan
Re: document storage
On 05/15/2011 11:48 AM, Erick Erickson wrote: Where are the documents coming from? Because storing them ONLY in Solr risks losing them if your index is somehow hosed. In our case, we generally have source documents and can reproduce the index if need be, but that's a good point. Storing them externally only has the advantage that your index will be much smaller, which helps when replicating as you scale. The downside here is that highlighting will be more resource-intensive since you're re-analyzing text in order to highlight. I had been imagining that the Highlighter could use stored term positions so as to avoid re-analysis. Is this incompatible with external storage? We might conceivably need to replicate the documents anyway, even if they are stored externally, in order to make them available to a farm of servers, although a SAN is another possibility here. My main concern about storing internally was the cost of merging (optimizing) the index. Presumably that would be increased if the docs are stored in it. So, as usual, it depends (tm). What is the scale you need? What is the QPS you're thinking of supporting? Things are working well at a small scale, and in that environment I think all of these solutions work more or less equally well. We're worrying about 10's of millions of documents and QPS around 50, so I expect we will have some significant challenges in coordinating a cluster of servers, and we're trying to plan as well as we can for that. We expect updates to be performed in a batch mode - they don't have to be real-time, but they might need to be daily. -Mike
Problem with custom Similarity class
Hi, I'm new to Solr and I'm trying to use my custom Similarity class but I've not succeeded on that. I added some debug information and my class is loaded, but it is not used when queries are made. Does someone could help me? If any further information is relevant, I can provide it. Thanks in advance -- Alex Bredariol Grilo Developer - umamao.com
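For reference, in Solr 3.x a custom Similarity is registered globally near the end of schema.xml; a minimal sketch, with a placeholder class name standing in for the actual implementation:

<similarity class="com.example.MyCustomSimilarity"/>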
Re: why query chinese character with bracket become phrase query by default?
On Mon, May 16, 2011 at 10:22 AM, Yonik Seeley yo...@lucidimagination.com wrote: My position is: please don't suddenly commit changes, with your way, while we're still discussing how to solve the issue. That's not the Apache way. Dude... everyone has always agreed we need more fieldtypes to support different languages (as you did earlier in this thread too). +1, and I still agree that'd be best. In that ideal future we would have no more text fieldType, only text_zh, text_en, etc. There's been a history of just adding stuff like that (half of the commits to the example schema have no associated JIRA issue). I wasn't objecting to the lack of a referenced JIRA issue; I was objecting to you suddenly committing 'your way while we were still discussing what to do. What happens to the default text field will have no bearing on that. That's not really true? I think any changes we make to any default text* fieldTypes are strongly related. For example, if we fix the text fieldType to have good all-around defaults for all languages (ie, the patch on SOLR-2519) then we don't need separate text_nwd/*_nwd field types. Instead, maybe we could add text_autophrase fieldTypes? Or maybe text_en_autophrase? We will still need more field types to support more languages. Right. Would you be against me adding a text_cjk fieldtype too? text_cjk would be *awesome*, but text_zh, text_ja, text_ko would be even better! If we fix text fieldType to be generic for all languages (use StandardAnalyzer, turn off autophrase), but then go and add in specific languages over time (say text_en, text_cjk, etc.), I think that's a great way to iterate towards the ideal future where we have text_XX coverage for many languages. My position: it's silly for a lack of consensus on the text field to block progesss on any other fieldtype. I disagree; I think changes to text fieldType are very much tied up to what other text_* fieldTypes we want to introduce. This is a *really* important configuration file in Solr and we should present good defaults with it. People who first use Solr start with the schema.xml as their starting point. People who first start with ElasticSearch today get StandardAnalyzer and no autophrase as the default, which is the best overall default Lucene has to offer right now. I think Solr should do the same. So to sum up, I think we should: 1) Fix text fieldType to stop destroying non-whitespace languages, and use the best general defaults we have to offer today (switch from WhitespaceTokenizer - StandardTokenizer, and turn off autophrase); this is the patch on SOLR-2519. 2) Add in text_XX specific language field types for as many as we can now, iterating over time to add more as we can / people get the itch. We now have a fabulous analysis module (thank you Robert!), so we should take advantage of that and at least make text_XX for all the matching analyzers in there. Let's continue this on the issue... Mike http://blog.mikemccandless.com
Re: assit with the Clustering component in Solr/Lucene
Both of the clustering algorithms that ship with Solr (Lingo and STC) are designed to allow one document to appear in more than one cluster, which actually does make sense in many scenarios. There's no easy way to force them to produce hard clusterings because this would require a complete change in the way the algorithms work. If you need each document to belong to exactly one cluster, you'd have to post-process the clusters to remove the redundant document assignments. On the second thought, I have a simple implementation of k-means clustering that could do hard clustering for you. It's not available yet, it will most probably be part of the next major release of Carrot2 (the package that does the clustering). Please watch this issue http://issues.carrot2.org/browse/CARROT-791 to get updates on this. Just to let you know: Carrot2 3.5.0 has landed in Solr trunk and branch_3x, so you can use the bisecting k-means clustering algorithm (org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm) which will produce non-overlapping clusters for you. The downside of this simple implementation of k-means is that, for the time being, it produces one-word cluster labels rather than phrases as Lingo and STC. Cheers, S.
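For reference, the algorithm is chosen where the clustering engine is configured in solrconfig.xml; a rough sketch, in which the engine name and surrounding searchComponent definition are assumptions and only the carrot.algorithm value comes from the message above:

<lst name="engine">
  <str name="name">default</str>
  <str name="carrot.algorithm">org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm</str>
</lst>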
Re: Debugging same SOLR installation on 2 different servers
Thanks Erick ! As I re-checked the configuration files, it turns out someone had modified the /solr/conf/*stopwords.txt* on the production server, and now we know what problem we're dealing with, which seems to be related to: - http://lucene.472066.n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html#a493488 - http://stackoverflow.com/questions/3635096/dismax-feat-stopwords-synonyms-etc Now I've tried to get around that issue by changing <str name="mm">2<-35%</str> to <str name="mm">1</str> in *solrconfig.xml*, as suggested on http://drupal.org/node/1102646#comment-4249774 which actually gets us results for the incriminated queries, but it adds way too much *noise*... So I tried to make sure all my field types were using our StopFilterFactory (even <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>), with no luck. I'll keep on looking for clues, meanwhile if there's a known way around that issue, I'd be really grateful to hear about it :) Cheers ! Paul Le 15/05/2011 16:48, Erick Erickson a écrit : What happens if you copy the index from one machine to the other? Probably from prod to test. If your results stay the same, that'd eliminate index differences as the culprit. What do you get by attaching debugQuery=on to the queries that differ? Is the parsed query any different? I'm wondering here if you somehow have a difference in the configuration, perhaps dismax? Anyway, if the parsed queries are identical, that eliminates that possibility. Next, what about synonym files? Stopwords? Are you absolutely sure they're identical? If you're using dismax, is it possible that the mm (minimum should match) is different? Perhaps this is all stuff you've done already, but this would at least narrow down where the problem might lie... Best Erick On Wed, May 11, 2011 at 12:10 PM, Paul Michalet p...@pix-l.fr wrote: Thanks for the hint :) We ruled that out after having tested special characters, and if it was an applicative bug, it wouldn't work consistently like it currently does for the majority of queries. The only difference we noticed was in the HTTP headers in the SOLR response: occasionally, the Content-length is present, but I've been told it was probably not causing our bug: = dev: headers = Array ( [0] => HTTP/1.1 200 OK [1] => Last-Modified: Fri, 29 Apr 2011 13:36:21 GMT [2] => ETag: MTFjZjU2MTgxNDgwMDAwMFNvbHI= [3] => Content-Type: text/plain; charset=utf-8 [4] => Server: Jetty(6.1.3) ) = production: headers = Array ( [0] => HTTP/1.1 200 OK [1] => Last-Modified: Fri, 06 May 2011 14:18:36 GMT [2] => ETag: OGI3ZWYyZDUxNDgwMDAwMFNvbHI= [3] => Content-Type: text/plain; charset=utf-8 [4] => Content-Length: 2558 [5] => Server: Jetty(6.1.3) ) Paul Michalet Le 11/05/2011 17:47, Paul Libbrecht a écrit : Could it be something in the transmission of the query? Or is it also identical? paul Le 11 mai 2011 à 17:19, Paul Michalet a écrit : Hello everyone We have successfully installed SOLR on 2 servers (development and production), using the same configuration files and paths. Both SOLR instances have indexed the same contents and most queries give identical results, but there's a few exceptions where the production instance returns 0 results (the development instance returns perfectly valid results for the same query). We checked the logs in both environments without finding anything suspicious (the queries are rigorously identical, and the index is built in the exact same way) and we've run out of options as to where to look for debugging these cases.
Our developpement server is Debian and the production is CentOS; the SOLR version installed in both environments is 1.4.0. The weird thing is that the few queries failing in the production instance contain very common terms (without quotes) which, when queried individually, return valid results... Any pointers would be greatly appreciated; thanks in advance ! Paul
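For anyone hitting the same dismax "mm vs. stopwords" mismatch, the two pieces of configuration this thread is juggling look roughly like the sketch below. It is only illustrative: the handler name, qf fields, and analyzer chain are placeholders, not Paul's actual setup, and the usual advice is to keep stopwords.txt identical on every machine and for every field listed in qf so that dismax counts the same number of query clauses everywhere.

  <!-- solrconfig.xml: dismax handler with a relaxed minimum-should-match -->
  <requestHandler name="/search" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="qf">title^2 body</str>
      <str name="mm">1</str>  <!-- the stricter 2&lt;-35% rule was the previous value -->
    </lst>
  </requestHandler>

  <!-- schema.xml: make sure the searched field types share one stopword list -->
  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>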
Re: boolean versus non-boolean search
On 05/16/2011 09:24 AM, Dmitry Kan wrote: Dear list, Might have missed it from the literature and the list, sorry if so, but: SOLR 1.4.1 <solrQueryParser defaultOperator="AND"/> Consider the query: term1 term2 OR term1 term2 OR term1 term3 I think what's happening is that your query gets rewritten into something like: +term1 + (term2? term1 term2? term3?) where in my notation term? means term is optional, and + means required. So any document would match the second clause -Mike
Re: assit with the Clustering component in Solr/Lucene
Thanks much Stan, Ramdev On May 16, 2011, at 11:38 AM, Stanislaw Osinski wrote: Both of the clustering algorithms that ship with Solr (Lingo and STC) are designed to allow one document to appear in more than one cluster, which actually does make sense in many scenarios. There's no easy way to force them to produce hard clusterings because this would require a complete change in the way the algorithms work. If you need each document to belong to exactly one cluster, you'd have to post-process the clusters to remove the redundant document assignments. On second thought, I have a simple implementation of k-means clustering that could do hard clustering for you. It's not available yet; it will most probably be part of the next major release of Carrot2 (the package that does the clustering). Please watch this issue http://issues.carrot2.org/browse/CARROT-791 to get updates on this. Just to let you know: Carrot2 3.5.0 has landed in Solr trunk and branch_3x, so you can use the bisecting k-means clustering algorithm (org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm), which will produce non-overlapping clusters for you. The downside of this simple implementation of k-means is that, for the time being, it produces one-word cluster labels rather than phrases as Lingo and STC do. Cheers, S.
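For reference, switching the clustering engine to the bisecting k-means algorithm is a solrconfig.xml change along these lines. This is a minimal sketch assuming the stock clustering component from the Solr example; the handler name and the title/snippet field names are illustrative:

  <searchComponent name="clustering" class="solr.clustering.ClusteringComponent">
    <lst name="engine">
      <str name="name">kmeans</str>
      <str name="carrot.algorithm">org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm</str>
    </lst>
  </searchComponent>

  <requestHandler name="/clustering" class="solr.SearchHandler">
    <lst name="defaults">
      <bool name="clustering">true</bool>
      <str name="clustering.engine">kmeans</str>
      <str name="carrot.title">title</str>
      <str name="carrot.snippet">body</str>
    </lst>
    <arr name="last-components">
      <str>clustering</str>
    </arr>
  </requestHandler>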
Re: [POLL] How do you (like to) do logging with Solr
We use log4j explicitly and find it irritating to deal with the built-in JDK logging default. We also have conflicts with other packages that have their own ideas about how to bind slf4j, so the less of this the better, IMO. The 1.6.1 no-op default behavior seems a bit unfortunate as out-of-the-box behavior to me though. Not sure if there's anything to be done about that. Can you log to stderr when there's no logger available? -Mike On 05/16/2011 04:43 AM, Jan Høydahl wrote: Hi, This poll is to investigate how you currently do or would like to do logging with Solr when deploying solr.war to a SEPARATE java application server (such as Tomcat, Resin etc) outside of the bundled solr/example. For background on how things work in Solr now, see http://wiki.apache.org/solr/SolrLogging and for more info on the SLF4J framework, see http://www.slf4j.org/manual.html Please tick one of the options below with an [X]: [ ] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [X] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! Note that NOT bundling a logger binding with solr.war means defaulting to the NOP logger after outputting these lines to stderr: SLF4J: Failed to load class org.slf4j.impl.StaticLoggerBinder. SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
Re: UIMA analysisEngine path
Hi Tommaso, Thanks for the quick reply. I had copied the lib files and followed instructions on http://wiki.apache.org/solr/SolrUIMA#Installation. However i get this error. The AnalysisEngine has the default class path which is /org/apache/uima/desc/. SEVERE: org.apache.solr.common.SolrException: Error Instantiating UpdateRequestP rocessorFactory, org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactor y is not a org.apache.solr.update.processor.UpdateRequestProcessorFactory Regards, Chamara On Mon, May 16, 2011 at 9:17 AM, Tommaso Teofili [via Lucene] ml-node+2946920-843126873-399...@n3.nabble.com wrote: Hello, if you want to take the descriptor from a jar, provided that you configured the jar inside a lib element in solrconfig, then you just need to write the correct classpath in the analysisEngine element. For example if your descriptor resides in com/something/desc/ path inside the jar then you should set the analysisEngine element as /com/something/desc/descriptorname.xml If you instead need to get the descriptor from filesystem try the patch in SOLR-2501 [1]. Hope this helps, Tommaso [1] : https://issues.apache.org/jira/browse/SOLR-2501 2011/5/13 chamara [hidden email]http://user/SendEmail.jtp?type=nodenode=2946920i=0 Hi, Is this code line 57 needs to be changed to the location where the jar files(library files) resides? URL url = this.getClass().getResource(location of the jar files); I did change it but no luck so far. Let me know what i am doing wrong? -- View this message in context: http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2935541.html Sent from the Solr - User mailing list archive at Nabble.com. -- If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2946920.html To unsubscribe from UIMA analysisEngine path, click herehttp://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=2895284code=Y2hhbWFyYXdAZ21haWwuY29tfDI4OTUyODR8MjY5ODM2NTMx. -- --- Chamara -- View this message in context: http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2948760.html Sent from the Solr - User mailing list archive at Nabble.com.
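A rough sketch of the wiring Tommaso describes, for readers following along. Exact element names differ between the 3.1 and trunk versions of the Solr-UIMA module, and the jar directory, descriptor path, and chain name below are only placeholders; the real configuration also needs the module's runtime parameters and field mappings:

  <!-- solrconfig.xml -->
  <lib dir="/path/to/uima/jars" regex=".*\.jar"/>

  <updateRequestProcessorChain name="uima">
    <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
      <!-- classpath-style path to the descriptor inside one of the jars above -->
      <!-- e.g. /com/something/desc/descriptorname.xml -->
    </processor>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>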
Re: Problem with custom Similarity class
On Mon, May 16, 2011 at 10:04 PM, Alex Grilo a...@umamao.com wrote: Hi, I'm new to Solr and I'm trying to use my custom Similarity class but I've not succeeded at that. I added some debug information and my class is loaded, but it is not used when queries are made. Could someone help me? If any further information is relevant, I can provide it. [...] Have you overridden the default similarity class in schema.xml? Though, if your class is getting loaded, that should be the case. The code for the class should be pretty small, right? Please post it here, or better yet at pastebin.com, and send a link to this list. Regards, Gora
Re: solr velocity.log setting
I solved the problem of velocity.log by following this tutorial: http://kris-itproblems.blogspot.com/2010/11/velocitylog-permission-denied.html On Thu, May 12, 2011 at 6:36 PM, Yuhan Zhang yzh...@onescreen.com wrote: hi all, I'm new to solr, and trying to install it on tomcat. however, an exception was thrown when the page http://localhost/solr/browse was visited: *FileNotFoundException: velocity.log (Permission denied)* looks like solr is trying to create a velocity.log file in the tomcat root. so, how should I set the configuration file on solr to change the location that velocity.log is logging to? Thank you. Y
Re: UIMA analysisEngine path
The error you pasted doesn't seem to be related to a (class)path issue, but more likely to a mismatch between a Solr instance at 1.4.1/3.1.0 and a Solr-UIMA module at 3.1.0/4.0-SNAPSHOT (trunk); it seems that the error arises from the UpdateRequestProcessorFactory API having changed. Hope this helps, Tommaso On 16 May 2011, at 18:54, chamara wrote: Hi Tommaso, Thanks for the quick reply. I had copied the lib files and followed instructions on http://wiki.apache.org/solr/SolrUIMA#Installation. However i get this error. The AnalysisEngine has the default class path which is /org/apache/uima/desc/. SEVERE: org.apache.solr.common.SolrException: Error Instantiating UpdateRequestProcessorFactory, org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory is not a org.apache.solr.update.processor.UpdateRequestProcessorFactory Regards, Chamara On Mon, May 16, 2011 at 9:17 AM, Tommaso Teofili [via Lucene] ml-node+2946920-843126873-399...@n3.nabble.com wrote: Hello, if you want to take the descriptor from a jar, provided that you configured the jar inside a lib element in solrconfig, then you just need to write the correct classpath in the analysisEngine element. For example if your descriptor resides in the com/something/desc/ path inside the jar then you should set the analysisEngine element as /com/something/desc/descriptorname.xml If you instead need to get the descriptor from the filesystem try the patch in SOLR-2501 [1]. Hope this helps, Tommaso [1] : https://issues.apache.org/jira/browse/SOLR-2501 2011/5/13 chamara [hidden email] Hi, Does this code at line 57 need to be changed to the location where the jar files (library files) reside? URL url = this.getClass().getResource("location of the jar files"); I did change it but no luck so far. Let me know what i am doing wrong? -- View this message in context: http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2935541.html Sent from the Solr - User mailing list archive at Nabble.com. -- If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2946920.html -- --- Chamara -- View this message in context: http://lucene.472066.n3.nabble.com/UIMA-analysisEngine-path-tp2895284p2948760.html Sent from the Solr - User mailing list archive at Nabble.com.
Problem indexing CSV
I’m pretty new to Solr and I have a question about indexing data using CSV. I have a Blacklight-application running on my Mac 10.6.7 and I configured the schema.xml and solrconfig.xml in the separate Apache-Solr-directory according to the guidelines on the Blacklight-website. I have added the RequestHandler to solrconfig.xml as well. But when I try to index the exemplary document books.csv (with Solr and the Blacklight script running in the background), I get an error saying that it came across an undefined field, cat. I assume it’s not just cat that isn’t recognised as a field. What should I do to make the indexing via CSV possible, both for the exemplary document as for further documents to follow? Kind regards
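For the record, the "undefined field cat" error means the CSV header names a column that the schema does not declare. A minimal sketch of both halves, assuming the stock example layout; the curl URL, file path, and the field definition shown are illustrative and should be adapted to the Blacklight schema actually in use (if you simply want to ignore a column instead of indexing it, the CSV handler's skip parameter, e.g. &skip=cat, does that too):

  # post the example file to the CSV update handler
  curl 'http://localhost:8983/solr/update/csv?commit=true' \
       --data-binary @example/exampledocs/books.csv \
       -H 'Content-type: text/plain; charset=utf-8'

  <!-- schema.xml: one way to satisfy the unknown "cat" column -->
  <field name="cat" type="text" indexed="true" stored="true" multiValued="true"/>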
Re: Problem with custom Similarity class
The code is here: http://pastebin.com/50ugqRfA and my schema.xml configuration entry for similarity is: <similarity class="com.umamao.solr.ShortFieldNormSimilarity"/> Thanks Alex On Mon, May 16, 2011 at 2:01 PM, Gora Mohanty g...@mimirtech.com wrote: On Mon, May 16, 2011 at 10:04 PM, Alex Grilo a...@umamao.com wrote: Hi, I'm new to Solr and I'm trying to use my custom Similarity class but I've not succeeded at that. I added some debug information and my class is loaded, but it is not used when queries are made. Could someone help me? If any further information is relevant, I can provide it. [...] Have you overridden the default similarity class in schema.xml? Though, if your class is getting loaded, that should be the case. The code for the class should be pretty small, right? Please post it here, or better yet at pastebin.com, and send a link to this list. Regards, Gora
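Since the pastebin link may not survive, here is what such a class typically looks like on Lucene/Solr 3.x. This is a guess at the intent based on the class name, not Alex's actual code; note also that length norms are computed at index time, so a Similarity change like this only becomes visible after re-indexing, which is a common reason a custom Similarity appears "unused" at query time.

  package com.umamao.solr;

  import org.apache.lucene.search.DefaultSimilarity;

  // Flattens length normalization so short fields are not over-rewarded
  // relative to longer ones.
  public class ShortFieldNormSimilarity extends DefaultSimilarity {
      @Override
      public float lengthNorm(String fieldName, int numTerms) {
          // Default is 1/sqrt(numTerms); a constant disables the
          // length-based component of the field norm.
          return 1.0f;
      }
  }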
Re: why query chinese character with bracket become phrase query by default?
: Does anyone disagree that Yonik's commit was inappropriate? This is : not how we work at Apache. FWIW: I don't see how Yonik's commit was inappropriate at all. He added some new example configuration to trunk that was unused, and in no way un-did or blocked any other attempts at improving the configs. It had no impact on any existing usage, and only served as an example (which could be iterated forward). I seriously don't see the problem here. -Hoss
RE: How to index and query C# as whole term?
Sorry I am also using a synonyms.txt for this in the analysis stack. I was not clear, sorry for any confusion. I am not doing it outside of Solr but on the way into the index it is converted... :) -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Monday, May 16, 2011 8:51 AM To: solr-user@lucene.apache.org Subject: Re: How to index and query C# as whole term? Before indexing so outside Solr? Using the SynonymFilter would be easier i guess. On Monday 16 May 2011 17:44:24 Robert Petersen wrote: I have always just converted terms like 'C#' or 'C++' into 'csharp' and 'cplusplus' before indexing them and similarly converted those terms if someone searched on them. That always has worked just fine for me... :) -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Monday, May 16, 2011 8:28 AM To: solr-user@lucene.apache.org Subject: Re: How to index and query C# as whole term? I don't think you'd want to use the string type here. String type is almost never appropriate for a field you want to actually search on (it is appropriate for fields to facet on). But you may want to use Text type with different analyzers selected. You probably want Text type so the value is still split into different tokens on word boundaries; you just don't want an analyzer set that removes punctuation. On 5/16/2011 10:46 AM, Gora Mohanty wrote: On Mon, May 16, 2011 at 7:05 PM, Gnanakumargna...@zoniac.com wrote: Hi, I'm using Apache Solr v3.1. How do I configure/allow Solr to both index and query the term c# as a whole word/term? From Analysis page, I could see that the term c# is being reduced/converted into just c by solr.WordDelimiterFilterFactory. [...] Yes, as you have discovered the analyzers for the field type in question will affect the values indexed. To index c# exactly as is, you can use the string type, instead of the text type. However, what you probably want some filters to be applied, e.g., LowerCaseFilterFactory. Take a look at the definition of the fieldType text in schema.xml, define a new field type that has only the tokenizers and analyzers that you need, and use that type for your field. This Wiki page should be helpful: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Regards, Gora -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Highlighting issue with Solr 3.1
All, I have just installed Solr 3.1 running on Tomcat 7. I am noticing a possible issue with Highlighting. I have a field in my index called story. The solr document that I am testing with has data in the story field that starts with the following snippet (remaining data in the field is not shown to keep things simple): <p><a idref="0" /></p><p>EN AMÉRICA LATINA, When I search for america with highlighting enabled on the 'story' field, here is what I get in the highlighting section of the response. I am using the ASCIIFoldingFilterFactory to make my searches accent insensitive. <lst name="highlighting"><lst name="2011_May_13_ _1c77033a"><arr name="story"><str>&lt;p&gt;&lt;a idref=&quot;0&quot; /&gt;&lt;/p&gt;&lt;p&gt;EN <em>AM&#201;RICA</em> LATINA, SE HAN PRODUCIDO AVANCES, CON RESPECTO A LA PROTECCI&#211;N</str></arr></lst>. The problem is the encoded HTML tags before the <em> showing up as raw HTML tags (because of the encoding) on my search results page. Just to make sure, I do want the HTML to be interpreted as HTML, not as text. In this particular situation I am not worried about the dangers of allowing such behavior. The same test performed on the same data running on a 1.4.1 index does not exhibit this behavior. Any help is appreciated. Please let me know if I need to post my field type definitions (index and query) from the SolrConfig.xml for the story field. Thanks in advance Raj
indexing directed graph
Hello, is it possible to index graph - named vertices and named edges? My target is, with text search to find whether particular node is connected(direct or indirect) with another. Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2949556.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: indexing directed graph
Dani, I'm actually playing with Neo4j ... and they have Lucene indexing and plan to have Solr integration (no idea what the current state is). http://lists.neo4j.org/pipermail/user/2010-January/002372.html Regards Stefan On 16.05.2011 21:50, dani.b.angelov wrote: Hello, is it possible to index graph - named vertices and named edges? My target is, with text search to find whether particular node is connected(direct or indirect) with another. Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2949556.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [POLL] How do you (like to) do logging with Solr
[ ] I always use the JDK logging as bundled in solr.war, that's perfect [ ] I sometimes use log4j or another framework and am happy with re-packaging solr.war [X] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time [ ] Let me choose whether to bundle a binding or not at build time, using an ANT option [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! [ ] What? Solr can do logging? How cool! Actually, more specifically, the build distribution could build a war done either way, but I'd most like to see the war file WITHOUT a binding be deployed to Maven. As it stands, I've done both 1) deploy solr without logging to Maven and use it, and 2) deploy solr with jdk logging to Maven, then have a Maven build repackage to remove jdk and use my preferred implementation (logback). I've only done 2) at the preference of others who don't want me to deploy a modified war to our Maven repo. Stephen Duncan Jr www.stephenduncanjr.com
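For anyone repackaging the same way, the Maven side of option 2) usually looks something like the sketch below: a war overlay that drops the bundled JDK-logging binding and a dependency on the binding you prefer. The artifact coordinates, jar name, and version are illustrative and have to match the Solr and logback versions you actually pull in.

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-war-plugin</artifactId>
        <configuration>
          <overlays>
            <overlay>
              <groupId>org.apache.solr</groupId>
              <artifactId>solr</artifactId>
              <!-- drop the bundled slf4j-to-JDK-logging binding -->
              <excludes>
                <exclude>WEB-INF/lib/slf4j-jdk14-*.jar</exclude>
              </excludes>
            </overlay>
          </overlays>
        </configuration>
      </plugin>
    </plugins>
  </build>

  <!-- then add the binding you prefer, e.g. logback -->
  <dependency>
    <groupId>ch.qos.logback</groupId>
    <artifactId>logback-classic</artifactId>
    <version>0.9.28</version>
  </dependency>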
Re: indexing directed graph
Thank you Gora, 1. Could you confirm that the context of IMHO is 'In My Humble Opinion'? 2. Could you point me to an example of a graph database? -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2949734.html Sent from the Solr - User mailing list archive at Nabble.com.
indexing directed graph
Hello, is it possible to index graph - named vertices and named edges? My target is, with text search to find whether particular node is connected(direct or indirect) with another. Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949553p2949553.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to index and query C# as whole term?
The other advantage to the synonyms approach is it will be much less of a headache down the road. For instance, imagine you've defined whitespacetokenizer and lowercasefilter. That'll fix your example just fine. It'll also cause all punctuation to be included in the tokens, so if you indexed try to find me. (note the period) and searched for me (without the period) you'd not get a hit. Then, let's say you get clever and do a regex manipulation via PatternReplaceCharFilterFactory to leave in '#' but remove other punctuation. Then any miscellaneous stream that contains a # will give surprising results. Consider 15# (for 15 pounds). Won't match 15 in a search now. So whatever solution you choose, think about it pretty carefully before you jump G.. Best Erick On Mon, May 16, 2011 at 2:10 PM, Robert Petersen rober...@buy.com wrote: Sorry I am also using a synonyms.txt for this in the analysis stack. I was not clear, sorry for any confusion. I am not doing it outside of Solr but on the way into the index it is converted... :) -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Monday, May 16, 2011 8:51 AM To: solr-user@lucene.apache.org Subject: Re: How to index and query C# as whole term? Before indexing so outside Solr? Using the SynonymFilter would be easier i guess. On Monday 16 May 2011 17:44:24 Robert Petersen wrote: I have always just converted terms like 'C#' or 'C++' into 'csharp' and 'cplusplus' before indexing them and similarly converted those terms if someone searched on them. That always has worked just fine for me... :) -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Monday, May 16, 2011 8:28 AM To: solr-user@lucene.apache.org Subject: Re: How to index and query C# as whole term? I don't think you'd want to use the string type here. String type is almost never appropriate for a field you want to actually search on (it is appropriate for fields to facet on). But you may want to use Text type with different analyzers selected. You probably want Text type so the value is still split into different tokens on word boundaries; you just don't want an analyzer set that removes punctuation. On 5/16/2011 10:46 AM, Gora Mohanty wrote: On Mon, May 16, 2011 at 7:05 PM, Gnanakumargna...@zoniac.com wrote: Hi, I'm using Apache Solr v3.1. How do I configure/allow Solr to both index and query the term c# as a whole word/term? From Analysis page, I could see that the term c# is being reduced/converted into just c by solr.WordDelimiterFilterFactory. [...] Yes, as you have discovered the analyzers for the field type in question will affect the values indexed. To index c# exactly as is, you can use the string type, instead of the text type. However, what you probably want some filters to be applied, e.g., LowerCaseFilterFactory. Take a look at the definition of the fieldType text in schema.xml, define a new field type that has only the tokenizers and analyzers that you need, and use that type for your field. This Wiki page should be helpful: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Regards, Gora -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
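To make the synonym approach concrete, a minimal sketch of what Robert describes is shown below. The entries and field names are illustrative, and per Erick's warning the tokenizer choice decides which punctuation survives long enough for the synonym filter to see it; the same chain has to run at query time too so that a query for "C#" is rewritten the same way.

  # synonyms.txt
  c# => csharp
  c++ => cplusplus

  <!-- schema.xml -->
  <fieldType name="text_prog" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="false"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              catenateWords="1" splitOnCaseChange="1"/>
    </analyzer>
  </fieldType>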
Re: indexing directed graph
I am wondering whether the following idea is worthwhile. We can describe the graph with a series of triples. So can we create some bean with fields, for example: ... @Field String[] subjects; @Field String[] predicates; @Field String[] objects; @Field int[] level; ... or some other combination of metadata? We can index/search this bean. Based on the content of the found bean, we can infer interconnections between graph participants. What do you think? -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2949845.html Sent from the Solr - User mailing list archive at Nabble.com.
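A slightly fuller version of that bean, for the SolrJ annotation approach, is sketched below. It is illustrative only: the field names need matching declarations in schema.xml, and, as the replies below point out, transitive connectivity still has to be computed outside Solr, e.g. by repeatedly querying subject:<node> and following the objects.

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.beans.Field;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class Triple {
      @Field String id;         // e.g. "a->likes->b"
      @Field String subject;    // source vertex name
      @Field String predicate;  // edge name
      @Field String object;     // target vertex name
      @Field int level;         // optional: precomputed depth, if paths are expanded at index time

      public static void main(String[] args) throws Exception {
          SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
          Triple t = new Triple();
          t.id = "a->likes->b";
          t.subject = "a";
          t.predicate = "likes";
          t.object = "b";
          t.level = 1;
          solr.addBean(t);   // one Solr document per edge/triple
          solr.commit();
      }
  }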
Re: indexing directed graph
You can certainly index it, the problem will be being able to make the kinds of queries you want to make on it once indexed. Indexing it in a way that will let you do such queries. The kind of typical queries I'd imagine you wanting to run on such a graph -- I can't think of any way to index in Solr to support. But if you give examples of the sorts of queries you want to run, maybe someone else has an idea, or can give a definitive 'no'. On 5/16/2011 3:49 PM, dani.b.angelov wrote: Hello, is it possible to index graph - named vertices and named edges? My target is, with text search to find whether particular node is connected(direct or indirect) with another. Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949553p2949553.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: indexing directed graph
Absolutely you can index each point or line of the graph with it's own document in Solr, perhaps as a triple. (Sounds like you are specifically talking about RDF-type data, huh? Asking about that specifically might get you more useful ideas than asking graphs in general). But if you want to then figure out if two points are connected, or get the list of all points within X distance from a known point, or do other things you are likely to want to do it with it... Solr's not going to give you the tools to do that, indexed like that. On 5/16/2011 4:52 PM, dani.b.angelov wrote: I am wandering, whether the following idea is worth. We can describe the graph with series of triples. So can we create some bean with fields, for example: ... @Field String[] sybjects; @Field String[] predicates; @Field String[] objects; @Field int[] level; ... or other combination of metadata. We can index/search this bean. Based on the content of the found bean, we can conclude for interconnections between graph participants. What do you thing? -- View this message in context: http://lucene.472066.n3.nabble.com/indexing-directed-graph-tp2949556p2949845.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How many UpdateHandlers can a Solr config have?
: just a very basic question, but I haven't been able to find the answer in : the Solr wiki: how many updateHandlers can one Solr config have? Just one? : Or many? There can only be one <updateHandler /> declaration in solrconfig.xml; it's responsible for owning updates to the index. But there can be any number of <requestHandler /> declarations to configure request handlers that do updates, as well as any number of <updateRequestProcessorChain /> declarations that identify the processors used for dealing with updates (which can be referred to by name from the request handlers). -Hoss
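In configuration terms the arrangement looks roughly like this. The handler and chain names are made up for illustration, and the parameter that selects a chain has been renamed across releases (update.processor in older ones, update.chain in newer ones):

  <!-- exactly one of these -->
  <updateHandler class="solr.DirectUpdateHandler2"/>

  <!-- any number of these -->
  <updateRequestProcessorChain name="dedupe">
    <processor class="solr.processor.SignatureUpdateProcessorFactory"/>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <!-- any number of these, each able to point at a chain by name -->
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler"/>
  <requestHandler name="/update/dedupe" class="solr.XmlUpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">dedupe</str>
    </lst>
  </requestHandler>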
Re: K-Stemmer for Solr 3.1
Lucid's KStemmer is LGPL and the Solr committers have shown that they don't want LGPL libraries shipping with Solr. If you are intent on releasing your changes, I suggest attaching both the modified source and the compiled jar onto Solr's k-stemmer wiki page; and of course say that it's LGPL licensed. ~ David Smiley On May 16, 2011, at 2:24 AM, Bernd Fehling wrote: I don't know if it is allowed to modify Lucid code and add it to jira. If someone from Lucid would give me the permission and the Solr developers have nothing against it I won't mind adding the Lucid KStemmer to jira for Solr 3.x and 4.x. There are several Lucid KStemmer users which I can see from the many requests which I got. Also the Lucid KStemmer is faster than the standard KStemmer. Bernd Am 16.05.2011 06:33, schrieb Bill Bell: Did you upload the code to Jira? On 5/13/11 12:28 AM, Bernd Fehlingbernd.fehl...@uni-bielefeld.de wrote: I backported a Lucid KStemmer version from solr 4.0 which I found somewhere. Just changed from import org.apache.lucene.analysis.util.CharArraySet; // solr4.0 to import org.apache.lucene.analysis.CharArraySet; // solr3.1 Bernd Am 12.05.2011 16:32, schrieb Mark: java.lang.AbstractMethodError: org.apache.lucene.analysis.TokenStream.incrementToken()Z Would you mind explaining your modifications? Thanks On 5/11/11 11:14 PM, Bernd Fehling wrote: Am 12.05.2011 02:05, schrieb Mark: It appears that the older version of the Lucid Works KStemmer is incompatible with Solr 3.1. Has anyone been able to get this to work? If not, what are you using as an alternative? Thanks Lucid KStemmer works nice with Solr3.1 after some minor mods to KStemFilter.java and KStemFilterFactory.java. What problems do you have? Bernd -- * Bernd FehlingUniversitätsbibliothek Bielefeld Dipl.-Inform. (FH)Universitätsstr. 25 Tel. +49 521 106-4060 Fax. +49 521 106-4052 bernd.fehl...@uni-bielefeld.de33615 Bielefeld BASE - Bielefeld Academic Search Engine - www.base-search.net *
Re: Problem with custom Similarity class
: The code is here: http://pastebin.com/50ugqRfA : : and my schema.xml configuration entry for : similarity is: : <similarity class="com.umamao.solr.ShortFieldNormSimilarity"/> exactly what version of Solr are you using? what does the full field/fieldType declaration look like in your schema.xml for the field you are testing with? what does your exact query request look like? The trunk branch of lucene/solr has made some changes to how similarity works (it's now very much per field) and how you declare your similarity in schema.xml ... if I remember correctly, the syntax from 3.1 to declare a global similarity *should* still work in 4.x as a way to declare the default used by fields that don't define a similarity, but there may be a bug (or I may be remembering incorrectly ... if the syntax really is no longer used at all then we should make sure it logs a nice fat error on startup) : I added some debug information and my class is loaded, but it is not used : when queries are made. Please clarify exactly how you are testing this and what you mean by is not used when queries are made ... it's important to rule out the possibility that you are just misunderstanding how the similarity methods are used. -Hoss
RE: K-Stemmer for Solr 3.1
On 5/16/2011 at 5:33 PM, David W. Smiley wrote: Lucid's KStemmer is LGPL and the Solr committers have shown that they don't want LGPL libraries shipping with Solr. If you are intent on releasing your changes, I suggest attaching both the modified source and the compiled jar onto Solr's k-stemmer wiki page; and of course say that it's LGPL licensed. AFAICT, all Apache MoinMoin wikis (at least Lucene's and Solr's) have disabled attachments - you can't retrieve existing attachments, and you can't create new ones. (Spam, apparently, was the impetus for this change.) Steve
Re: K-Stemmer for Solr 3.1
On Mon, May 16, 2011 at 5:33 PM, Smiley, David W. dsmi...@mitre.org wrote: Lucid's KStemmer is LGPL and the Solr committers have shown that they don't want LGPL libraries shipping with Solr. If you are intent on releasing your changes, I suggest attaching both the modified source and the compiled jar onto Solr's k-stemmer wiki page; and of course say that it's LGPL licensed. ~ David Smiley Hi David, I don't know much about this stemmer but the original implementation is BSD-licensed (http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi)
Re: [POLL] How do you (like to) do logging with Solr
: This poll is to investigate how you currently do or would like to do : logging with Solr when deploying solr.war to a SEPARATE java application : server (such as Tomcat, Resin etc) outside of the bundled FWIW... a) the context of this poll is SOLR-2487 b) this poll seems flawed to me, as it completely sidesteps what i consider the major crux of the issue: If: You are someone who does not like (or has conflicts with) the JDK logging binding currently included in the solr.war that is built by default and included in the binary releases; Then: Do you consider building solr.war from source difficult? -Hoss
Re: [POLL] How do you (like to) do logging with Solr
My answers... : [X] I always use the JDK logging as bundled in solr.war, that's perfect : [X] I sometimes use log4j or another framework and am happy with re-packaging solr.war : [ ] Give me solr.war WITHOUT an slf4j logger binding, so I can choose at deploy time : [X] Let me choose whether to bundle a binding or not at build time, using an ANT option : [ ] What's wrong with the solr/example Jetty? I never run Solr elsewhere! : [ ] What? Solr can do logging? How cool! -Hoss
Re: Boost newer documents only if date is different from timestamp
The map function lets you replace an arbitrary range of values with a new value, so you could map any value greater than the ms that today started on to any other point in history... http://wiki.apache.org/solr/FunctionQuery#map An easier approach would probably be to apply some logic at index time: you can still index the Last-Modified date you are getting, but if you believe that date is artificial, you can index an alternate date (possibly based on some rules you know about the site, or reuse the first last-modified date you ever got for that URL, etc...) in a distinct field and use that value for date boosting. : I am trying to boost newer documents in Solr queries. The ms function : http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents : seems to be the right way to go, but I need to add an additional : condition: : I am using the last-Modified-Date from crawled web pages as the date : to consider, and that does not always provide a meaningful date. : Therefore I would like the function to only boost documents where the : date (not time) found in the last-Modified-Date is different from the : timestamp, eliminating results that just return the current date as : the last-Modified-Date. Suggestions are appreciated! : -Hoss
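A rough sketch of the flavor of both options, written as dismax bf parameters. The field names and millisecond constants are illustrative, and the exact cutoff depends on what counts as an "artificial" date in your crawl:

  # Query-time: treat anything whose age is under one day (86400000 ms) as if it
  # were thirty days old (2592000000 ms) before applying the usual recency boost.
  # map(x, min, max, target) replaces values of x falling in [min, max] with target.
  bf=recip(map(ms(NOW,last_modified), 0, 86400000, 2592000000), 3.16e-11, 1, 1)

  # Index-time alternative: write a trusted date into a separate field while
  # indexing and boost on that instead.
  bf=recip(ms(NOW,effective_date), 3.16e-11, 1, 1)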
Re: Embedded Solr Optimize under Windows
: http://code.google.com/p/solr-geonames/wiki/DeveloperInstall : It's worth noting that the build has also been run on Mac and Solaris now, : and the Solr index is about half the size. We suspect the optimize() call in : Embedded Solr is not working correctly under Windows. : : We've observed that Windows leaves lots of segments on disk and takes up : twice the volume as the other OSs. Perhaps file locking or something The problem isn't that optimize doesn't work on windows, the problem is that windows file semantics won't let files be deleted while there are open file handles -- so Lucene's Directory behavior is to leave the files on disk, and try to clean them up later. (on the next write, or next optimize call) -Hoss
Re: Embedded Solr Optimize under Windows
Thanks for the reply. I'm at home right now, or I'd try this myself, but is the suggestion that two optimize() calls in a row would resolve the issue? The process in question is a JVM devoted entirely to harvesting, calls optimize() then shuts down. The least processor intensive way of triggering this behaviour is desirable... perhaps a commit()? But I wouldn't have expected that to trigger a write. On 17 May 2011 10:20, Chris Hostetter hossman_luc...@fucit.org wrote: : http://code.google.com/p/solr-geonames/wiki/DeveloperInstall : It's worth noting that the build has also been run on Mac and Solaris now, : and the Solr index is about half the size. We suspect the optimize() call in : Embedded Solr is not working correctly under Windows. : : We've observed that Windows leaves lots of segments on disk and takes up : twice the volume as the other OSs. Perhaps file locking or something The problem isn't that optimize doesn't work on windows, the problem is that windows file semantics won't let files be deleted while there are open file handles -- so Lucene's Directory behavior is to leave the files on disk, and try to clean them up later. (on the next write, or next optimize call) -Hoss
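Untested, but the shape of what is being proposed looks like the sketch below: an extra optimize before shutdown so Lucene gets another chance to delete the segment files Windows would not release earlier. Whether the second call actually helps depends on the open file handles at that moment; the core name and setup are illustrative.

  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.core.CoreContainer;

  public class HarvestAndOptimize {
      public static void main(String[] args) throws Exception {
          CoreContainer container = new CoreContainer.Initializer().initialize();
          EmbeddedSolrServer solr = new EmbeddedSolrServer(container, "geonames");
          try {
              // ... add documents here ...
              solr.commit();
              solr.optimize();
              // On Windows the first optimize can leave old segment files behind
              // because open handles block deletion; a second call (or the next
              // commit) gives Lucene another chance to clean them up.
              solr.optimize();
          } finally {
              container.shutdown();
          }
      }
  }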
Re: Highlighting issue with Solr 3.1
(11/05/17 3:27), Nemani, Raj wrote: All, I have just installed Solr 3.1 running on Tomcat 7. I am noticing a possible issue with Highlighting. I have a field in my index called story. The solr document that I am testing with has data in the story field that starts with the following snippet (remaining data in the field is not shown to keep things simple): <p><a idref="0" /></p><p>EN AMÉRICA LATINA, When I search for america with highlighting enabled on the 'story' field, here is what I get in the highlighting section of the response. I am using the ASCIIFoldingFilterFactory to make my searches accent insensitive. <lst name="highlighting"><lst name="2011_May_13_ _1c77033a"><arr name="story"><str>&lt;p&gt;&lt;a idref=&quot;0&quot; /&gt;&lt;/p&gt;&lt;p&gt;EN <em>AM&#201;RICA</em> LATINA, SE HAN PRODUCIDO AVANCES, CON RESPECTO A LA PROTECCI&#211;N</str></arr></lst>. The problem is the encoded HTML tags before the <em> showing up as raw HTML tags (because of the encoding) on my search results page. Just to make sure, I do want the HTML to be interpreted as HTML, not as text. In this particular situation I am not worried about the dangers of allowing such behavior. The same test performed on the same data running on a 1.4.1 index does not exhibit this behavior. Any help is appreciated. Please let me know if I need to post my field type definitions (index and query) from the SolrConfig.xml for the story field. Thanks in advance Raj I bet you have an encoder setting in your solrconfig.xml: <encoder name="html" default="true" class="solr.highlight.HtmlEncoder"/> If so, try to comment it out. Koji -- http://www.rondhuit.com/en/
Structured fields and termVectors
How does MoreLikeThis use termVectors? My documents (full sample at the bottom) frequently include lines more or less like this M /trunk/home/.Aquamacs/Preferences.el I want to MoreLikeThis based on the full path, but not the M. But what I actually display as a search result should include M (should look pretty much like the sample, below). If I define a field to include that whole line, I can certainly search in ways that skip the M, but how do I control the termVector and MoreLikeThis? I think the answer is not to termVector the line as shown, but rather to index these lines twice, once whole (which is also copyFielded into the display text), and a second time with just the path (and termVectors=true). Which is OK, but since these lines will represent most of my data, double-indexing seems to double my storage, which is ... oh, well ... not entirely optimal. So is there some way I can index the full line, once, with M and path, and tell the termVector to include the whole path and nothing but the path? -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep r3580 | jack | 2011-04-26 13:55:46 -0700 (Tue, 26 Apr 2011) | 1 line Changed paths: M /trunk/home/.Aquamacs M /trunk/home/.Aquamacs/Preferences.el M /trunk/www/wynton-start-page.html simplify the hijack of Aquamacs prefs storage
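One way to avoid storing the text twice: keep a single stored field for display, and copyField it into a second indexed-only (stored="false") field whose analyzer throws away the leading status letter and which carries the term vectors MoreLikeThis needs. Since the destination is not stored, the copyField adds postings and term vectors but not a second copy of the text. A sketch with illustrative names, assuming a 3.1-era schema (on older versions, a WhitespaceTokenizer plus the same charFilter works too, and the charFilter attribute names are worth double-checking against your release):

  <fieldType name="svn_path" class="solr.TextField">
    <analyzer>
      <!-- strip the leading "M ", "A ", "D " status column -->
      <charFilter class="solr.PatternReplaceCharFilterFactory"
                  pattern="^[A-Z]\s+" replacement=""/>
      <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
    </analyzer>
  </fieldType>

  <!-- stored once, for display -->
  <field name="changed_path_display" type="string" indexed="false" stored="true" multiValued="true"/>
  <!-- indexed only, with term vectors for MoreLikeThis -->
  <field name="changed_path" type="svn_path" indexed="true" stored="false"
         multiValued="true" termVectors="true"/>
  <copyField source="changed_path_display" dest="changed_path"/>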
Re: How to set a common field to several values types ?
I want to create a field by extracting a value from another field with some Java code (using regular expressions). How do I do this? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-set-a-common-field-to-several-values-types-tp2922192p2951036.html Sent from the Solr - User mailing list archive at Nabble.com.
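If it really has to be Java (rather than, say, a copyField plus a PatternReplaceFilterFactory), the usual hook is a custom UpdateRequestProcessor that runs on each document before it is indexed. A sketch only: the source field, target field, regex, and chain name are illustrative, and note that SolrQueryResponse lives in org.apache.solr.request on 1.4 but org.apache.solr.response on 3.x.

  import java.io.IOException;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.response.SolrQueryResponse;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;
  import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

  public class RegexExtractProcessorFactory extends UpdateRequestProcessorFactory {
      @Override
      public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                                UpdateRequestProcessor next) {
          return new RegexExtractProcessor(next);
      }

      static class RegexExtractProcessor extends UpdateRequestProcessor {
          // illustrative: pull a four-digit year out of the "title" field
          private static final Pattern YEAR = Pattern.compile("\\b(19|20)\\d{2}\\b");

          RegexExtractProcessor(UpdateRequestProcessor next) {
              super(next);
          }

          @Override
          public void processAdd(AddUpdateCommand cmd) throws IOException {
              SolrInputDocument doc = cmd.getSolrInputDocument();
              Object source = doc.getFieldValue("title");
              if (source != null) {
                  Matcher m = YEAR.matcher(source.toString());
                  if (m.find()) {
                      doc.addField("year_extracted", m.group());
                  }
              }
              super.processAdd(cmd); // hand the document on to the rest of the chain
          }
      }
  }

  /* solrconfig.xml:
     <updateRequestProcessorChain name="extract">
       <processor class="com.example.RegexExtractProcessorFactory"/>
       <processor class="solr.RunUpdateProcessorFactory"/>
     </updateRequestProcessorChain>
  */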
Re: Want to Delete Existing Index create fresh index
I set the datadir in solrconfig.xml. Actually I'm using core-based structures. Is that creating any problem? On Sat, May 14, 2011 at 10:49 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: I guess you are having issues with the datadir. Did you set the datadir in solrconfig.xml? On Sat, May 14, 2011 at 4:10 PM, Pawan Darira pawan.dar...@gmail.com wrote: Hi I am using Solr 1.4. had changed schema already. When i created the index for the first time, the directory was automatically created and the index made perfectly fine. Now, i want to create the index from scratch, so I deleted the whole data/index directory and ran the script. Now it is only creating empty directories and NO index files inside that. Thanks Pawan On Sat, May 14, 2011 at 6:54 PM, Dmitry Kan dmitry@gmail.com wrote: Hi Pawan, Which SOLR version do you have installed? It should be absolutely normal for the data/ sub directory to create when starting up SOLR. So just go ahead and post your data into SOLR, if you have changed the schema already. -- Regards, Dmitry Kan On Sat, May 14, 2011 at 4:01 PM, Pawan Darira pawan.dar...@gmail.com wrote: I did that. Index directory is created but no contents in that 2011/5/14 François Schiettecatte fschietteca...@gmail.com You can also shut down solr/lucene, do: rm -rf /YourIndexName/data/index and restart, the index directory will be automatically recreated. François On May 14, 2011, at 1:53 AM, Gabriele Kahlout wrote: curl --fail $solrIndex/update?commit=true -d '<delete><query>*:*</query></delete>' #empty index [1 http://wiki.apache.org/nutch/Whole-Web%20Crawling%20incremental%20script ] did you try? On Sat, May 14, 2011 at 7:26 AM, Pawan Darira pawan.dar...@gmail.com wrote: Hi I had an existing index created months back. now my database schema has changed. i wanted to delete the current data/index directory and re-create the fresh index, but it is saying that the segments file is not found and it just creates a blank data/index directory. Please help -- Thanks, Pawan Darira -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)). -- Thanks, Pawan Darira -- Thanks, Pawan Darira -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)). -- Thanks, Pawan Darira
error while doing full import
org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:http://xxx.xxx.xxx/frontend_dev.php/xxx/xxx/xxx rows processed:0 Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:292) at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187) at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370) Caused by: java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: Undeclared general entity nbsp at [row,col {unknown-source}]: [170,29] at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:181) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282) ... 10 more Caused by: com.ctc.wstx.exc.WstxParsingException: Undeclared general entity nbsp at [row,col {unknown-source}]: [170,29] at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:630) at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:467) at com.ctc.wstx.sr.BasicStreamReader.handleUndeclaredEntity(BasicStreamReader.java:5431) at com.ctc.wstx.sr.StreamScanner.expandUnresolvedEntity(StreamScanner.java:1661) at com.ctc.wstx.sr.StreamScanner.expandEntity(StreamScanner.java:1555) at com.ctc.wstx.sr.StreamScanner.fullyResolveEntity(StreamScanner.java:1523) at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2757) at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019) at org.apache.solr.handler.dataimport.XPathRecordReader$Node.handleStartElement(XPathRecordReader.java:370) at org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:304) at org.apache.solr.handler.dataimport.XPathRecordReader$Node.access$200(XPathRecordReader.java:196) at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:178) ... 
11 more May 17, 2011 10:51:51 AM org.apache.solr.handler.dataimport.DataImporter doFullImport SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: Parsing failed for xml, url:http://xxx.xxx.xxx/frontend_dev.php/xxx/xxx/xxx rows processed:0 Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:292) at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:187) at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:164) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370) Caused by: java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: Undeclared general entity nbsp at [row,col {unknown-source}]: [170,29] at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:181) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:282) ... 10 more Caused by: com.ctc.wstx.exc.WstxParsingException: Undeclared general entity nbsp at [row,col {unknown-source}]: [170,29] at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:630) at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:467) at com.ctc.wstx.sr.BasicStreamReader.handleUndeclaredEntity(BasicStreamReader.java:5431) at com.ctc.wstx.sr.StreamScanner.expandUnresolvedEntity(StreamScanner.java:1661) at