locks in solr

2009-11-25 Thread Rakhi Khatwani
Hi,
  Is there any article which explains the locks in Solr?
There is some info in solrconfig.xml which says that you can set the lock
type to none (NoLockFactory), single (SingleInstanceLockFactory),
native (NativeFSLockFactory), or simple (SimpleFSLockFactory), which locks every
time we create a new file.
suppose my index dir has the following files:
_2s.fdt, _2t.fnm, _2u.nrm, _2v.tii, _2x.fdt, _2y.fnm, _2z.nrm, _30.tii,
_2s.fdx, _2t.frq, _2u.prx, _2v.tis _2x.fdx, _2y.frq _2z.prx, _30.tis,
_2s.fnm, _2t.nrm, _2u.tii, _2w.fdt _2x.fnm, _2y.nrm _2z.tii, segments_2s,
_2s.frq, _2t.prx, _2u.tis, _2w.fdx _2x.frq, _2y.prx _2z.tis, segments.gen

1.) I assume for each of these files there is a lock. Please correct me if
I am wrong.
2.) What are the different lock types in terms of reads/writes/updates?
3.) Can we have a document-level locking scheme?
4.) We would like to know the best way to handle multiple simultaneous
writes to the index.
Thanks a ton,
Raakhi


Multicore - Post xml to core0, core1 or core2

2009-11-25 Thread Jörg Agatz
Hello, at the moment I am trying to create a Solr instance with more than one
core.

I use Solr 1.4 and multicore runs :-)
But I don't know how to post an XML file to one of my cores. At the moment I use

java -jar post.jar *.xml

Now I want to fill the core0 index with core0*.xml, and core1 with core1*.xml.
But how?
In the wiki I can't find anything about that.

King


Re: Multicore - Post xml to core0, core1 or core2

2009-11-25 Thread Noble Paul നോബിള്‍ नोब्ळ्
try this
java -Durl=http://localhost:8983/solr/core0/update -jar post.jar *.xml
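and likewise for the other cores (the host/port are the stock example defaults; adjust to your setup):

java -Durl=http://localhost:8983/solr/core1/update -jar post.jar core1*.xml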


On Wed, Nov 25, 2009 at 3:23 PM, Jörg Agatz joerg.ag...@googlemail.com wrote:
 Hallo, at the moment i tryed to create a Solr instance wite more then one
 Cores

 I use solr 1.4 and multicore Runs :-)
 But i dont know how i post a XML in one of my cores. At the Moment i use

 java -jar post.jar *.xml

 now i will fill the core0 index with core0*.xml , and core1 with core1*.xml
 But how?
 in the wiki i cant find anythink about that.

 King




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Multicore - Post xml to core0, core1 or core2

2009-11-25 Thread Jörg Agatz
Thanks, it works really fine.

Maybe you have an idea how to search in core0 and core1?

I want to search in all cores, or only in 2 of 3 cores.


Sending Tika parse result to Solr

2009-11-25 Thread Daniel Knapp
Hello,


I want to send the Tika parse results of my data to my Solr server.
My file server is not my Solr server, so Solr Cell is not an option for me.

In Lucene I can pass my Reader object (as a result of the parsing) to a Lucene
Document for indexing.

Is this also possible with Solr? Or is there another or better way to do this?
I'm using SolrJ for the connection.


Regards,
Daniel 



Buggy search Solr1.4 Multicore

2009-11-25 Thread Jörg Agatz
Hi...

I have a problem with Solr. I try it with 3 cores, and it starts. I can
search, but I only get results when I search for exactly the whole field.

I mean, the field contains: Dell Widescreen Ultra

When I search for
name:Widescreen I get nothing
name:"Dell Widescreen Ultra" I get the document
name:Dell* I get the document

Now I created copyFields and searched only for Dell* and got it, but
Widescreen still returns nothing.

What is wrong with the index?

I want to search for each word in each field!

Please help me


Re: Buggy search Solr1.4 Multicore

2009-11-25 Thread Rafał Kuć
Hello!


I assume your name field type is string, right? If so, change it to text;
it should work as you would like.
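For example, a minimal sketch of the change in schema.xml (assuming the field is called name and that the stock text field type from the example schema is available):

<!-- before: string is not tokenized, so only exact whole-field matches work -->
<field name="name" type="string" indexed="true" stored="true"/>

<!-- after: text is tokenized, so individual words like Widescreen match -->
<field name="name" type="text" indexed="true" stored="true"/>

Note that the field needs to be re-indexed after the type change.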

-- 
Regards,
 Rafał Kuć



Help on this parsed query

2009-11-25 Thread revas
I have the text analyzer defined as follows


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>



When I search on a field named "simple" (of the above field type) for the term
peRsonal,

I expect it to be searched as: simple:personal simple:pe simple:rsonal

instead the parsed query output says:

<str name="rawquerystring">simple:peRsonal</str>
<str name="querystring">simple:peRsonal</str>
<str name="parsedquery">MultiPhraseQuery(simple:"(person pe) rsonal")</str>
<str name="parsedquery_toString">simple:"(person pe) rsonal"</str>

What is this MultiPhraseQuery, and why is this a phrase query instead of a simple
query?


Regards
Revas


Solr 1.4 search in more the one Core

2009-11-25 Thread Jörg Agatz
Hello,

I am trying to search in more than one core.

I searched in the wiki, but I can't find any way to search in 2 of the 3 cores, or
a way to search in all cores.

Maybe someone of you has tried the same and can help me?


Re: Sending Tika parse result to Solr

2009-11-25 Thread Grant Ingersoll


You can't pass your Reader object, but I have opened
https://issues.apache.org/jira/browse/SOLR-1526 to provide a SolrJ client-side
equivalent of Solr Cell.  If you'd like to contribute a patch, that would be
great.  Basically, you just need to have your handler create
SolrInputDocuments (in batches) and then send them to Solr.  Using the
streaming server may also fit well with this model.
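To make that concrete, here is a minimal sketch of the idea, assuming Tika's AutoDetectParser on the client side and SolrJ's CommonsHttpSolrServer; the server URL and the "id"/"content" field names are illustrative and must match your own schema:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class TikaToSolr {
  public static void main(String[] args) throws Exception {
    // Parse the file locally with Tika; BodyContentHandler collects plain text.
    InputStream in = new FileInputStream(new File("some.doc"));
    BodyContentHandler text = new BodyContentHandler();
    new AutoDetectParser().parse(in, text, new Metadata());
    in.close();

    // Build a SolrInputDocument from the extracted text and send it via SolrJ.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");              // illustrative unique key
    doc.addField("content", text.toString()); // illustrative text field
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    server.add(doc);
    server.commit();
  }
}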



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
Solr/Lucene:
http://www.lucidimagination.com/search



Re: how to do partial word searches?

2009-11-25 Thread Joel Nylund

Hi Erick,

thanks for the links. I read both of them and I still have no idea
what to do; lots of back and forth, but I didn't see any solution in them.

One person talked about indexing the field in reverse and doing an OR
on it; this might work I guess.


thanks
Joel


On Nov 24, 2009, at 9:12 PM, Erick Erickson wrote:


copying from Eric Hatcher:

See http://issues.apache.org/jira/browse/SOLR-218 - Solr currently
does not have leading wildcard support enabled.

There's a pretty extensive recent exchange on this, see the
thread on the user's list titled

leading and trailing wildcard query

Best
Erick

On Tue, Nov 24, 2009 at 7:51 PM, Joel Nylund jnyl...@yahoo.com  
wrote:



Hi, I saw some older postings on this, but didn't see a resolution.

I have a field called title; I would like to be able to find partial word
matches within the title.

For example:

http://localhost:8983/solr/select?q=textTitle:%22*sulli*%22

I would expect it to find:
<str name="textTitle">the daily dish | by andrew sullivan</str>

but it doesn't. It does find sully (which is fine with me also as a bonus),
but doesn't seem to get any of the partial word stuff. Oddly enough, before I
lowercased the title, the wildcard matching seemed to work a bit better; it
just didn't deal with the case-sensitive query.

At first I had mixed-case titles and I read that wildcards don't work
with mixed case, so I created another field that is a lowered version of the
title, called textTitle; it is of type text.

Is it possible with Solr to achieve what I am trying to do, and if so, how? If
not, anything closer than what I have?

thanks
Joel






Re: Implementing phrase autopop up

2009-11-25 Thread Shalin Shekhar Mangar
On Tue, Nov 24, 2009 at 11:58 PM, darniz rnizamud...@edmunds.com wrote:



 I created a field exactly as the Lucid blog says:

 <field name="autocomp" type="edgytext" indexed="true" stored="true"
   omitNorms="true" omitTermFreqAndPositions="true"/>

 with the following field type configuration:

 <fieldType name="edgytext" class="solr.TextField"
   positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
       maxGramSize="25"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 Now when I query I get the correct phrases; for example if I search for
 autocomp:"how to" I get all the correct phrases, like

 How to find a car
 How to find a mechanic
 How to choose the right insurance company

 etc... which is good.

 Now I have two questions.
 1) Is it necessary to give the query in quotes? My gut feeling is yes, since
 if you don't give quotes I get phrases beginning with How followed by some
 other words, like How can etc...


Yes, since we want to do phrase searches on n-grams.



 2) If I search for a single word, for example choose, it gives me nothing.
 I was expecting to see a result, considering there is the word choose in the
 phrase
 How to choose the right insurance company

 I might look more at the documentation, but do you have any advice?


EdgeNGram creates n-grams from the starting or ending edge only, therefore you
can't match words in the middle of a phrase. Try using NGramFilterFactory
instead.
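For example, a minimal sketch of the index-time analyzer with NGramFilterFactory swapped in (the gram sizes are illustrative; pick them for your data):

<analyzer type="index">
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
</analyzer>

With inner n-grams indexed, a query like autocomp:"choose" can match in the middle of How to choose the right insurance company, at the cost of a considerably larger index.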

-- 
Regards,
Shalin Shekhar Mangar.


Re: Buggy search Solr1.4 Multicore

2009-11-25 Thread Erick Erickson
If Rafal's response doesn't help (though it's surely where I'd
look first; it sounds like you're using a field type that's
not tokenized), then could you post the relevant parts
of your config file that define the field and the analyzers
used at *both* query and index time?

Best
Erick




Re: Help on this parsed query

2009-11-25 Thread Erick Erickson
I think it's because if it weren't a phrase query you'd be matching on
the broken-up parts of the word *wherever* they were in your field,
e.g. pe and rsonal could be separated by any number of other
tokens and you'd get a match.

HTH
Erick

P.S. I was a bit confused by your asterisks; it took me a while to figure
out that you'd added them by hand for emphasis and weren't sending
wildcards through.




Re: SolrPlugin Guidance

2009-11-25 Thread Shalin Shekhar Mangar
On Tue, Nov 24, 2009 at 11:04 PM, Vauthrin, Laurent 
laurent.vauth...@disney.com wrote:


 Our team is trying to make a Solr plugin that needs to parse/decompose a
 given query into potentially multiple queries.  The idea is that we're
 trying to abstract a complex schema (with different document types) from
 the users so that their queries can be simpler.



 So basically, we're trying to do the following:



 1.   Decompose query A into query B and query C

 2.   Send query B to all shards and plug query B's results into
 query C

 3.   Send Query C to all shards and pass the results back to the
 client



 I started trying to implement this by subclassing the SearchHandler but
 realized that I would not have access to HttpCommComponent.  Then I
 tried to replicate the SearchHandler class but realized that I might not
 have access to fields I would need in ShardResponse.  So I figured I
 should step back and get advice from the mailing list now :).  What is
 the best plugin point for decomposing a query into multiple queries so
 that all resultant queries can be sent to each shard?



All queries are sent to all shards? If yes, it sounds like a job for a
custom QParser.
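For reference, a bare-bones sketch of a custom QParser plugin (the class name, the registration snippet, and the stubbed-out decomposition are illustrative; only QParserPlugin/QParser themselves are actual Solr extension points):

import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

// Registered in solrconfig.xml with something like:
// <queryParser name="decompose" class="com.example.DecomposingQParserPlugin"/>
public class DecomposingQParserPlugin extends QParserPlugin {
  public void init(NamedList args) {}

  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      public Query parse() throws ParseException {
        // Decompose qstr into query B / query C here and return the Lucene
        // Query to execute; this sketch just delegates to the default parser.
        return getParser(qstr, "lucene", req).parse();
      }
    };
  }
}

It would then be selected per request with defType=decompose (or a {!decompose} local-params prefix).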

-- 
Regards,
Shalin Shekhar Mangar.


Re: locks in solr

2009-11-25 Thread Shalin Shekhar Mangar
On Wed, Nov 25, 2009 at 3:05 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:

 Hi,
  Is there any article which explains the locks in solr??
  there is some info in solrconfig.xml which says that you can set the lock
 type to none(NoLockFactory), single(SingleInstanceLockFactory),
 NativeFSLockFactory and simple(SimpleFSLockFactory) which locks everytime
 we
 create a new file.
 suppose my index dir has the following files:
 _2s.fdt, _2t.fnm, _2u.nrm, _2v.tii, _2x.fdt, _2y.fnm, _2z.nrm, _30.tii,
 _2s.fdx, _2t.frq, _2u.prx, _2v.tis _2x.fdx, _2y.frq _2z.prx, _30.tis,
 _2s.fnm, _2t.nrm, _2u.tii, _2w.fdt _2x.fnm, _2y.nrm _2z.tii, segments_2s,
 _2s.frq, _2t.prx, _2u.tis, _2w.fdx _2x.frq, _2y.prx _2z.tis, segments.gen

 1.)   I assume for each of these files there is a lock. please correct me
 if
 i am wrong.


No. The index directory has one lock. Individual files are not locked
separately.


 2.) what are the different lock types in terms of read/write/updates?


Locks are only used to prevent more than one IndexWriter (or Solr
instance/core) from writing to the same index. They do not prevent reads. They
also do not prevent multiple writes from the same Solr core (there is some
synchronization, but it has nothing to do with these locks).


 3.) Can we have a document level locking scheme?


No. I think you have grossly misunderstood the purpose of locks in Solr.


 4.) we would like to know the best way to handle multiple simultaneous
 writes to the index


With one Solr instance, you can do writes concurrently without a problem.
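For reference, the lock type is configured in solrconfig.xml; a minimal sketch (the stock 1.4 config puts it in the <indexDefaults> section):

<indexDefaults>
  <!-- ... -->
  <!-- lockType: one of single, native, simple, none -->
  <lockType>native</lockType>
</indexDefaults>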

-- 
Regards,
Shalin Shekhar Mangar.


Re: why is XMLWriter declared as final?

2009-11-25 Thread Shalin Shekhar Mangar
On Wed, Nov 25, 2009 at 3:33 AM, Matt Mitchell goodie...@gmail.com wrote:

 Is there any reason the XMLWriter is declared as final? I'd like to extend
 it for a special case but can't. The other writers (ruby, php, json) are
 not
 final.


I don't think it needs to be final. Maybe it is final because it wasn't
designed to be extensible. Please open a jira issue.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr 1.4 search in more the one Core

2009-11-25 Thread Shalin Shekhar Mangar
On Wed, Nov 25, 2009 at 5:39 PM, Jörg Agatz joerg.ag...@googlemail.comwrote:

 Hollo,

 I try to search in more than one Core.

 I search in Wiki, but i dont find any way to search in 2 of the 3 cores and
 a way to seacht in all cores.

 maby Someone of you have tryed the same an can help me?


You need to provide the URLs of the cores in a distributed search request. It
will make HTTP calls to the specified cores, but there is no way around that
right now.

http://wiki.apache.org/solr/DistributedSearch
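For example (host, port and core names here are hypothetical, following the wiki's shards syntax):

http://localhost:8983/solr/core0/select?q=foo&shards=localhost:8983/solr/core0,localhost:8983/solr/core1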

Why do you want to search across cores on the same Solr?

-- 
Regards,
Shalin Shekhar Mangar.


Re: ExternalRequestHandler and ContentStreamUpdateRequest usage

2009-11-25 Thread javaxmlsoapdev

Grant, can you assist? I am clueless as to why it's not indexing the content
of the file. I have provided schema and code info below/in previous threads. Do I
need to explicitly add a content param to the ContentStreamUpdateRequest?
(which I don't think is the right thing to do). Please advise.

let me know if you need anything else. Appreciate your help.

Thanks,

javaxmlsoapdev wrote:
 
 Following is the Luke response. <lst name="fields"/> is empty. Can someone
 help find out why the file content isn't being indexed?
 
 <?xml version="1.0" encoding="UTF-8"?>
 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">0</int>
   </lst>
   <lst name="index">
     <int name="numDocs">0</int>
     <int name="maxDoc">0</int>
     <int name="numTerms">0</int>
     <long name="version">1259085661332</long>
     <bool name="optimized">false</bool>
     <bool name="current">true</bool>
     <bool name="hasDeletions">false</bool>
     <str name="directory">org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index</str>
     <date name="lastModified">2009-11-24T18:01:01Z</date>
   </lst>
   <lst name="fields"/>
   <lst name="info">
     <lst name="key">
       <str name="I">Indexed</str>
       <str name="T">Tokenized</str>
       <str name="S">Stored</str>
       <str name="M">Multivalued</str>
       <str name="V">TermVector Stored</str>
       <str name="o">Store Offset With TermVector</str>
       <str name="p">Store Position With TermVector</str>
       <str name="O">Omit Norms</str>
       <str name="L">Lazy</str>
       <str name="B">Binary</str>
       <str name="C">Compressed</str>
       <str name="f">Sort Missing First</str>
       <str name="l">Sort Missing Last</str>
     </lst>
     <str name="NOTE">Document Frequency (df) is not updated when a document
 is marked for deletion. df values include deleted documents.</str>
   </lst>
 </response>
 
 javaxmlsoapdev wrote:
 
 I was able to configure the /docs index separately from my db data index.

 Still I am seeing the same behavior, where it only puts the docName and its
 size in the content field (I have renamed the field to content in this new
 schema).
 
 Below are the only two fields I have in schema.xml:

 <field name="key" type="slong" indexed="true" stored="true" required="true"/>
 <field name="content" type="text" indexed="true" stored="true" multiValued="true"/>
 
 Following is the updated code from the test case:

 File fileToIndex = new File("file.txt");

 ContentStreamUpdateRequest up = new
     ContentStreamUpdateRequest("/update/extract");
 up.addFile(fileToIndex);
 up.setParam("literal.key", "8978");
 up.setParam("literal.docName", "doc123.txt");
 up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
 NamedList list = server.request(up);
 assertNotNull("Couldn't upload .txt", list);

 QueryResponse rsp = server.query(new SolrQuery("*:*"));
 assertEquals(1, rsp.getResults().getNumFound());
 System.out.println(rsp.getResults().get(0).getFieldValue("content"));
 
 Also from the Solr admin UI, when I search for doc123.txt it only
 returns the following response; not sure why it's not indexing the file's
 content into the content attribute.

 <result name="response" numFound="1" start="0">
   <doc>
     <arr name="content">
       <str>702</str>
       <str>text/plain</str>
       <str>doc123.txt</str>
       <str/>
     </arr>
     <long name="key">8978</long>
   </doc>
 </result>
 
 Any idea?
 
 Thanks,
 
 
 javaxmlsoapdev wrote:
 
 http://machinename:port/solr/admin/luke gives me a 404 error, so it seems
 like it's not able to find Luke.

 I am reusing a schema which is used for indexing another entity from a
 database, which has no relevance to documents. That was my next question:
 what do I put in a schema if my documents don't need any column
 mappings or anything? Plus I want to keep the file-document index
 separate from the database entity index. What's the best way to achieve
 this?
 
 thanks,
 
 
 
 Grant Ingersoll-6 wrote:
 
 
 On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote:
 
 
 *:* returns me 1 count, but when I search for a specific word (which was
 part of the .txt file I indexed before) it doesn't return anything. I don't
 have Luke set up on my end.
 
 http://localhost:8983/solr/admin/luke should give yo some info.
 
 
 let me see if I can set that up quickly but otherwise do
 you see anything I am missing in solrconfig mapping or something?
 
 What's your schema look like and how are you querying?
 
 which maps
 document content to wrong attribute?
 
 thanks,
 
 Grant Ingersoll-6 wrote:
 
 
 On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote:
 
 
 Following code is from my test case where it tries to index a file
 (of
 type
 .txt)
 ContentStreamUpdateRequest up = new
     ContentStreamUpdateRequest("/update/extract");
 up.addFile(fileToIndex);
 up.setParam("literal.key", "8978"); // key is the uniqueId
 up.setParam("ext.literal.docName", "doc123.txt");
 up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
 server.request(up);
 
 test case doesn't give me any error and I think its indexing the
 file?
 but
 when I search for a 

Re: Solr 1.4 search in more the one Core

2009-11-25 Thread Jörg Agatz
 Why do you want to search across cores on the same Solr?

 --
 Regards,
 Shalin Shekhar Mangar.


I only need multi-index search, but I found no other way to import other indexes.

I have some old indexes from another project and want to use them in Solr. If I
use one index it works, but I have a lot of indexes, so I need to find a way to
search in more than one index, and therefore more than one core.


Re: how to do partial word searches?

2009-11-25 Thread Erick Erickson
Confession: I haven't had occasion to use the n-gram stuff, but here's the
theory. And note that Solr has n-gram tokenizers available.

Using a 2-gram example for sullivan, the n-gram tokenizer would index these
tokens: su, ul, ll, li, iv, va, an. Then at query time in your example, sulli
would be broken up into su, ul, ll and li. Which, when searched as a phrase,
would then match your field.

The expense, of course, is that your index is larger (but surprisingly not as
much as you'd think). But your queries are much faster.

That's the theory anyway; the practice is left as an exercise for the
reader <g>

But the folks generously provided quite an explanation of what wildcards are
all about on the *lucene* user's list; look for a thread titled
I just don't get wildcards at all from around 2006. It's a nice background for
what the underlying problem is; some of the Solr tokenizers are realizing
some of this, I think. And the state of the art has progressed considerably
since then, but the underlying issues are still there...

Sorry I can't be more help here..
Erick



Re: how to do partial word searches?

2009-11-25 Thread Robert Muir
Hi, if you are using Solr 1.4, I think you might want to try the type text_rev
(look in the example schema.xml).

Unless I am mistaken:

this will enable leading wildcard support for that field.
It doesn't do any stemming, which I think might be making your wildcards
behave weird.
It also enables reversed wildcard support, so some of your substring matches
will be faster.





-- 
Robert Muir
rcm...@gmail.com


Re: why is XMLWriter declared as final?

2009-11-25 Thread Matt Mitchell
OK thanks Shalin.

Matt

On Wed, Nov 25, 2009 at 8:48 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:

 On Wed, Nov 25, 2009 at 3:33 AM, Matt Mitchell goodie...@gmail.com
 wrote:

  Is there any reason the XMLWriter is declared as final? I'd like to
 extend
  it for a special case but can't. The other writers (ruby, php, json) are
  not
  final.
 
 
 I don't think it needs to be final. Maybe it is final because it wasn't
 designed to be extensible. Please open a jira issue.

 --
 Regards,
 Shalin Shekhar Mangar.



RE: Index Splitter

2009-11-25 Thread Giovanni Fernandez-Kincade
You can't really use this if you have an optimized index, right?

-Original Message-
From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] 
Sent: Tuesday, November 24, 2009 6:57 PM
To: solr-user@lucene.apache.org
Subject: Re: Index Splitter

Giovanni Fernandez-Kincade wrote:
 Hi,
 I've heard about a tool that can be used to split Lucene indexes, for cases 
 where you want to break up a large index into shards. Do you know where I can 
 find it? Any observations/recommendations about its use?

 This seems promising but I'm not sure if there is anything more mature out 
 there:
 http://blog.foofactory.fi/2008/01/regenerating-equally-sized-shards-from.html

 Thanks,
 Gio.

   
There are IndexSplitter and MultiPassIndexSplitter tools in 3.0.

https://issues.apache.org/jira/browse/LUCENE-1959

I'd written an article about them before:

http://lucene.jugem.jp/?eid=344

It is in Japanese, but I think you can work out how to use them from the
command lines...

Koji

-- 
http://www.rondhuit.com/en/



Re: Index Splitter

2009-11-25 Thread Koji Sekiguchi

Giovanni Fernandez-Kincade wrote:

You can't really use this if you have an optimized index, right?

  

For optimized index, I think you can use MultiPassIndexSplitter.

Koji

--
http://www.rondhuit.com/en/



Re: how is score computed with hsin functionquery?

2009-11-25 Thread gdeconto


Grant Ingersoll-6 wrote:
 
 ...
 Yep.  Also note that I added deg() and rad() functions, but for the most
 part is probably better to do the conversion during indexing.
 ...
 

Thanks Grant. I hadn't seen the deg and rad functions. Conversion would be
difficult since I typically work with degrees. Once I get a bit more
experienced with the Solr code, maybe I can contribute a degree version of
hsin :-)



Where to put ExternalRequestHandler and Tika jars

2009-11-25 Thread javaxmlsoapdev

My SOLR_HOME = /home/solr_1_4_0/apache-solr-1.4.0/example/solr/conf in
tomcat.sh

POI, PDFBox, Tika and related jars are under
/home/solr_1_4_0/apache-solr-1.4.0/lib

When I try to index files using the SolrJ API as follows, I don't see the
content of the file being indexed. It only indexes the file size (bytes) and
file type into the content field. See the schema definition below as well.

ContentStreamUpdateRequest up = new
    ContentStreamUpdateRequest("/update/extract");
up.addFile(file);
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
server.request(up);

schema.xml has the following:

<field name="issueKey" type="slong" indexed="true" stored="true" required="true"/>
<field name="content" type="text" indexed="true" stored="true" multiValued="true"/>

<defaultSearchField>content</defaultSearchField>

And solrconfig.xml has:

<requestHandler name="/update/extract"
    class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="map.content">content</str>
    <str name="defaultField">content</str>
  </lst>
</requestHandler>

The Luke response is as below, which displays the correct count (7) of indexed
documents but no content in the index. In the Tomcat logs I don't see any
errors or anything. Unless I am going blind with something, I don't see
anything missing in setting things up. Can anyone advise? Do I need to
include the Tika jars in Tomcat's deployed solr/lib or under /example/lib in
SOLR_HOME?

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">28</int>
  </lst>
  <lst name="index">
    <int name="numDocs">7</int>
    <int name="maxDoc">7</int>
    <int name="numTerms">25</int>
    <long name="version">1259164190261</long>
    <bool name="optimized">false</bool>
    <bool name="current">true</bool>
    <bool name="hasDeletions">false</bool>
    <str name="directory">org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index</str>
    <date name="lastModified">2009-11-25T15:50:03Z</date>
  </lst>
  <lst name="fields">
    <lst name="content">
      <str name="type">text</str>
      <str name="schema">ITSM--</str>
      <str name="index">ITS--</str>
      <int name="docs">7</int>
      <int name="distinct">18</int>
      <lst name="topTerms">
        <int name="text">3</int>
        <int name="applic">3</int>
        <int name="msword">3</int>
        <int name="applicationmsword">3</int>
        <int name="plain">2</int>
        <int name="textplain">2</int>
        <int name="70144">1</int>
        <int name="453">1</int>
        <int name="2370">1</int>
        <int name="html">1</int>
      </lst>
      <lst name="histogram">
        <int name="1">12</int>
        <int name="2">2</int>
        <int name="4">4</int>
      </lst>
    </lst>
    <lst name="issueKey">
      <str name="type">slong</str>
      <str name="schema">I-SO-l</str>
      <str name="index">I-SO-</str>
      <int name="docs">7</int>
      <int name="distinct">7</int>
      <lst name="topTerms">
        <int name="1">1</int>
        <int name="2">1</int>
        <int name="3">1</int>
        <int name="4">1</int>
        <int name="5">1</int>
        <int name="6">1</int>
        <int name="0">1</int>
      </lst>
      <lst name="histogram">
        <int name="1">7</int>
      </lst>
    </lst>
  </lst>
  <lst name="info">
    <lst name="key">
      <str name="I">Indexed</str>
      <str name="T">Tokenized</str>
      <str name="S">Stored</str>
      <str name="M">Multivalued</str>
      <str name="V">TermVector Stored</str>
      <str name="o">Store Offset With TermVector</str>
      <str name="p">Store Position With TermVector</str>
      <str name="O">Omit Norms</str>
      <str name="L">Lazy</str>
      <str name="B">Binary</str>
      <str name="C">Compressed</str>
      <str name="f">Sort Missing First</str>
      <str name="l">Sort Missing Last</str>
    </lst>
    <str name="NOTE">Document Frequency (df) is not updated when a document is
marked for deletion. df values include deleted documents.</str>
  </lst>
</response>



Looking for Best Practices: Analyzers vs. UpdateRequestProcessors?

2009-11-25 Thread Andreas Kahl
Hello, 

are there any general criteria for when to use Analyzers to implement an indexing
function and when it is better to use UpdateRequestProcessors?

The main difference I found in the documentation was that
UpdateRequestProcessors are able to manipulate several fields at once (create,
read, update, delete), while Analyzers operate on the contents of a single
field at a time.

Is that correct so far? Are there other experiences that help decide which type
of module to use when implementing indexing logic? Are there differences in
processing performance? Is one of the two APIs easier to learn/debug etc.?

If you have any Best Practices with that I would be very interested to hear 
about those. 

Andreas

P.S. My experience with Search Engines is mainly with FAST where one uses 
Stages in a Pipeline no matter which feature to implement. 
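For reference, a minimal sketch of the UpdateRequestProcessor side of the comparison (the factory/processor classes are the real Solr 1.4 extension points; the field names and the concatenation logic are illustrative):

import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

// Registered in solrconfig.xml inside an <updateRequestProcessorChain>.
public class ConcatFieldsProcessorFactory extends UpdateRequestProcessorFactory {
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        // Unlike an Analyzer, this sees the whole document and can read
        // one field to write another (field names are illustrative).
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Object first = doc.getFieldValue("firstname");
        Object last = doc.getFieldValue("lastname");
        if (first != null && last != null) {
          doc.addField("fullname", first + " " + last);
        }
        super.processAdd(cmd); // hand off to the rest of the chain
      }
    };
  }
}

An Analyzer, by contrast, only ever sees the token stream of the single field it is attached to.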


Re: Index Splitter

2009-11-25 Thread Andrzej Bialecki

Koji Sekiguchi wrote:

Giovanni Fernandez-Kincade wrote:

You can't really use this if you have an optimized index, right?

  

For optimized index, I think you can use MultiPassIndexSplitter.


Correct - MultiPassIndexSplitter can handle any index - optimized or 
not, with or without deletions, etc. The cost for this flexibility is 
that it needs to read index files multiple times (hence multi-pass).




--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Solr 1.4 search in more the one Core

2009-11-25 Thread Jörg Agatz
I think no, because there is a crawler for fulltext indexing that
permanently updates the indexes.

When there is a crawler for documents, Office files etc., then I can switch to
Solr completely.

Re: Where to put ExternalRequestHandler and Tika jars

2009-11-25 Thread javaxmlsoapdev

Ugh. I had to include the Tika and related parsing jars in
tomcat/webapps/solr/WEB-INF/lib. This was an embarrassing mistake;
apologies for all the noise.

Thanks,



Batch file upload using solrJ API

2009-11-25 Thread javaxmlsoapdev

Is there an API to upload files over one connection, versus looping through
all the files and creating a new ContentStreamUpdateRequest for each file?
This, as expected, doesn't work if there are a large number of files; you
quickly run into memory problems. Please advise.

Thanks,
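One option (a sketch, not a confirmed answer for the extract-handler case): SolrJ 1.4's StreamingUpdateSolrServer queues documents and streams them to Solr over a small pool of background connections, which avoids building one request per file in memory. The URL, queue size, thread count and field names below are illustrative:

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchUpload {
  public static void main(String[] args) throws Exception {
    // Buffer up to 20 docs and drain the queue with 4 background threads.
    StreamingUpdateSolrServer server =
        new StreamingUpdateSolrServer("http://localhost:8983/solr", 20, 4);
    for (int i = 0; i < 1000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "doc-" + i);
      doc.addField("content", "body of document " + i); // illustrative fields
      server.add(doc); // returns quickly; docs stream in the background
    }
    server.commit();
  }
}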





Re: converting over from sphinx

2009-11-25 Thread Chris Hostetter

: way.  In particular, I'm doing phrase searching into a corpus of
: descriptions, such as I need help with a foo where I have a bunch of foo:
: a foo is a subset of a bar often used to create briznatzes, etc.
: 
: With Sphinx, I could convert I need help with a foo into *need* *help*
: *with* *foo* and get pretty nice matches. With Solr, my understanding is
: that you can only do wildcard matches on the suffix. In addition, stemming
: only happens on non-wildcard terms. So, my first thought would be to convert
: I need help with a foo into need need* help help* with with* foo foo*.

First off, we need to make sure we have all our terminology in sync -- I'm
not very familiar with Sphinx, so I'm not sure what types of vernacular
are used there to describe various things, but in Solr/Lucene you have
options regarding how you want text to be analyzed when it's indexed --
this analysis is what converts an arbitrary stream of characters into
Terms that get indexed.  At query time, it's very easy to match on
terms, or boolean combinations of terms, and sequential phrases of terms
-- you only need wildcard-type functionality if you want to provide a
wildcard expression that could match more than one individual term.

In your specific example, if you just configured a basic whitespace
tokenizer when you indexed your documents (ie: foo: a foo is a subset of
a bar often used to create briznatzes) then at query time any of the
individual words (foo, bar, etc...) would match that document.
Likewise, a phrase query like need help with foo would match that text if
you defined some stop words (like need and with) and specified a small
amount of slop on your phrase queries.
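For example (an illustrative sketch of the slop syntax, assuming "need" and "with" are configured as stop words): q="need help with a foo"~2 would match the description once the stop words drop out and up to two position moves are allowed.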


The point is: there are a lot of different ways to use Solr, and the
terminology you are used to with Sphinx may not map exactly to some of the
terminology you'll see in the Solr docs/configs -- so please feel free to
ask.

-Hoss



Re: error with multicore CREATE action

2009-11-25 Thread Chris Hostetter
: 
:  Are there any use cases for CREATE where the instance directory
:  *doesn't* yet exist? I ask because I've noticed that Solr will create
:  an instance directory for me sometimes with the CREATE command. In
...
: I guess when you try to add documents and an IndexWriter is opened, the data
: directory is created if it does not exist. Since it calls File#mkdirs, all
: parent directories are also created. I don't think Solr creates those
: directories by itself.

Shalin: I'm confused, wasn't this one of the original use cases for the
CREATE command as part of the LotsOfCores work you and Noble have been
pushing forward?  I thought one of the goals was that a user could have a
single solrconfig.xml+schema.xml on disk somewhere, and then at run time
use the CREATE command to cause many, many new cores to be created (each
with a new/unique instanceDir).

If that isn't intended (and therefore: not handled well) then we should
probably make the CREATE command test for the existence of the specified
instanceDir and error out if it doesn't already exist -- otherwise a typo in
an instanceDir file path could lead to some really unexpected behavior.



-Hoss



Re: why is XMLWriter declared as final?

2009-11-25 Thread Chris Hostetter

: I don't think it needs to be final. Maybe it is final because it wasn't
: designed to be extensible. Please open a jira issue.

it really wasn't, and it probably shouldn't be ... there is another thread
currently in progress (in response to SOLR-1592) about this.

Given how kludgy the entire API is, I'd really prefer it not be made
un-final .. it would need some serious overhaul/review to make it possible
to subclass in a sensible way, and coming up with a new API is likely to
make a lot more sense than trying to retrofit that one.

-Hoss



Re: why is XMLWriter declared as final?

2009-11-25 Thread Mattmann, Chris A (388J)
Hey Hoss,

+1. I think we need to overhaul the whole API, even in light of the incremental 
progress I've been proposing and patching, etc., lately.

I think it's good to do that incrementally, though, rather than all at once, 
especially considering SOLR is in 1.5-dev trunk stage atm.

Cheers,
Chris




++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++




Re: [SolrResourceLoader] Unable to load cached class-name

2009-11-25 Thread Chris Hostetter
: 
: I've deployed the contents of dist/ into JBoss's lib directory for the
: server I'm running and I've also copied the contents of lib/ into

Please be specific ... what is "dist/", what is "lib/"? ... if you are
talking about the top-level dist and lib directories in a Solr release,
then those should *not* be copied into any directory for JBoss.
Everything you need to access core Solr features is available in the
solr.war -- that is all you need to run the Solr application.

The only reason to ever copy any jars around when dealing with Solr is to
load plugins (ie: your own, or things contained in the contrib directory of a
Solr release) and even then they should go in the special lib directory
inside your Solr Home directory so they are loaded by the appropriate
classloader -- not in the top-level class loader of your servlet
container.

: [SolrResourceLoader] Unable to load cached class-name :
: org.apache.solr.search.FastLRUCache for shortname :
: solr.FastLRUCachejava.lang.ClassNotFoundException:
: org.apache.solr.search.FastLRUCache

this is most likely because you have duplicate copies of (all of) the solr 
classes at various classloader levels -- the copies in the solr.war, and 
the copies you've put into the JBoss lib dir.  having both can cause 
problems like this because of the rules involved with 
hierarchical classloaders.



-Hoss



Re: why is XMLWriter declared as final?

2009-11-25 Thread Matt Mitchell
Interesting. Well just to clarify my intentions a bit, I'll quickly explain
what I was trying to do.

I'm using the MLT component, but because some of my stored fields are really
big, I don't need (or want) all of the fields for my MLT docs in the
response. I want my MLT docs to have only 2 fields, but I need my main docs'
fl to have all fields.

So a simple override of the XMLWriter writeNamedList method would do the
trick. All you have to do is check if the name equals "moreLikeThis". If so,
process the docs and specify a different field list. If not, just call
super(). Worked like a charm, but oh well. I really only need the Ruby
response anyway, so I'll move on to that. I'm glad this spurred some
interest though.

-- It'd be great to let components have control over their fl value instead
of having a global fl value for all doc lists within a writer.

Matt





Re: locks in solr

2009-11-25 Thread Chris Hostetter
:   Is there any article which explains the locks in solr??
: there is some info in solrconfig.xml which says that you can set the lock
: type to none(NoLockFactory), single(SingleInstanceLockFactory),
: NativeFSLockFactory and simple(SimpleFSLockFactory) which locks everytime we
: create a new file.

FYI: That's not at all what the SimpleFSLockFactory does.

Index locking is a pretty low-level Lucene concept -- there isn't really
anything Solr specific about it.  90% of all Solr users shouldn't need to
worry about it, ever.  The only time it becomes an issue is if you are
planning on doing something extremely advanced dealing with the Lucene
index files directly.

if that's the case: your best bet is to read the Locking code and APIs in 
Lucene, and ask your questions on the java-us...@lucene mailing list.



-Hoss



Re: error with multicore CREATE action

2009-11-25 Thread Shalin Shekhar Mangar


Yes, that is correct but those changes are not in trunk right now. We're
planning to spend some time in the next few weeks in splitting that big
patch into smaller ones, adding tests and pushing them into trunk.
LotsOfCores still needs LotsOfWork :)

-- 
Regards,
Shalin Shekhar Mangar.


Re: PatternTokenizer question

2009-11-25 Thread Chris Hostetter

: I think the answer to my question is contained in the wiki when discussing
: the SynonymFilter, The Lucene QueryParser tokenizes on white space before
: giving any text to the Analyzer.  This would indeed explain what I am
: getting.  Next question - can I avoid that behavior?

it's the nature of the lucene query parser -- whitespace is a meta
character that provides instructions to the parser, just like '+', '\',
'"', etc...

you could always use a quoted string (so the parser treats all of your
input as one phrase) or try the field QParser (which is essentially the
same thing as using a quoted phrase, but doesn't require the quotes or
respect any of the other escape characters)
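For example, with local-params syntax (the field name is hypothetical): q={!field f=myfield}Foo Bar treats the whole string Foo Bar as a single value to match against myfield.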



-Hoss



param version and diferences in /admin/ping response

2009-11-25 Thread Nestor Oviedo
Hi everyone!
Can anyone tell me the meaning of the version param? There
isn't anything about it in the Solr documentation.

When I invoke the /admin/ping url, if the version value is between 0
and 2.1, the response looks like this:

<response>
  <responseHeader>
    <status>0</status>
    <QTime>5</QTime>
    <lst name="params">
      <str name="echoParams">all</str>
      <str name="rows">10</str>
      <str name="echoParams">all</str>
      <str name="q">solrpingquery</str>
      <str name="qt">standard</str>
      <str name="version">2.1</str>
    </lst>
  </responseHeader>
  <str name="status">OK</str>
</response>

And when the version value is anything different from that range, the
response looks like this:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>
    <lst name="params">
      <str name="echoParams">all</str>
      <str name="rows">10</str>
      <str name="echoParams">all</str>
      <str name="q">solrpingquery</str>
      <str name="qt">standard</str>
    </lst>
  </lst>
  <str name="status">OK</str>
</response>

Thanks.
Regards
Nestor Oviedo


Re: param version and diferences in /admin/ping response

2009-11-25 Thread Chris Hostetter
: Hi everyone!
: Can anyone tell me what's the meaning of the param version ?? There
: isn't anything about it in the Solr documentation.

http://wiki.apache.org/solr/XMLResponseFormat#A.27version.27

-Hoss



Re: solr/jetty not working for anything other than localhost

2009-11-25 Thread simon
first, check what port 8983 is bound to - should be listening on all
interfaces

netstat -an |grep 8983

You should see

tcp0  0 0.0.0.0:8983  0.0.0.0:*   LISTEN

-Simon

On Wed, Nov 25, 2009 at 3:55 PM, Joel Nylund jnyl...@yahoo.com wrote:

 Hi, if I try to use any other hostname jetty doesn't work: it gives a blank
 page, and if I telnet to the server/port it just disconnects.

 I tried editing the scripts.conf to change the hostname; that didn't seem to
 help.

 For example I tried editing my /etc/hosts file and added:

 127.0.0.1 solriscool

 then:
 ping solriscool
 PING solriscool (127.0.0.1): 56 data bytes
 64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.055 ms
 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.095 ms


 sh-3.2# telnet solriscool 8983
 Trying 127.0.0.1...
 Connected to solriscool.
 Escape character is '^]'.
 GET / HTTP/1.1
 Connection closed by foreign host.


 telnet localhost 8983
 Trying ::1...
 Connected to localhost.
 Escape character is '^]'.
 GET /solr HTTP/1.1
 Host: localhost

 HTTP/1.1 302 Found
 Location: http://localhost/solr/
 Content-Length: 0
 Server: Jetty(6.1.3)


 any ideas?

 thanks
 Joel




Re: solr/jetty not working for anything other than localhost

2009-11-25 Thread Joel Nylund

I see:

tcp46      0      0  *.8983                 *.*                    LISTEN
tcp4       0      0  127.0.0.1.8983         *.*                    LISTEN


thanks
Joel







Re: solr/jetty not working for anything other than localhost

2009-11-25 Thread simon
On Wed, Nov 25, 2009 at 5:27 PM, Joel Nylund jnyl...@yahoo.com wrote:

 I see:

 tcp46  0  0  *.8983 *.*LISTEN
 tcp4   0  0  127.0.0.1.8983 *.*LISTEN


Not the same version of linux/netstat as mine, but I'd guess that the
second line is the key to the problem -- looks as though TCP over IPv4 is only
listening on the localhost interface, which is a network configuration
issue.

what does the Solr log say after it's started - should be a line

 INFO:  Started SelectChannelConnector @ 0.0.0.0:8983


-Simon




Re: solr/jetty not working for anything other than localhost

2009-11-25 Thread Joel Nylund

yes says:

2009-11-25 18:08:59.967::INFO:  Started SocketConnector @ 0.0.0.0:8983

running on osx

thanks
Joel







Re: Deduplication in 1.4

2009-11-25 Thread KaktuChakarabati

Hey Otis,
Yep, I realized this myself after playing with the dedupe feature
yesterday.
So it does look like field collapsing is what I need, pretty much.
Any idea how close it is to being production-ready?

Thanks,
-Chak

Otis Gospodnetic wrote:
 
 Hi,
 
 As far as I know, the point of deduplication in Solr (
 http://wiki.apache.org/solr/Deduplication ) is to detect a duplicate
 document before indexing it in order to avoid duplicates in the index in
 the first place.
 
 What you are describing is closer to field collapsing patch in SOLR-236.
 
  Otis
 --
 Sematext is hiring -- http://sematext.com/about/jobs.html?mls
 Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
 
 
 
 - Original Message 
 From: KaktuChakarabati jimmoe...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tue, November 24, 2009 5:29:00 PM
 Subject: Deduplication in 1.4
 
 
 Hey,
 I've been trying to find some documentation on using this feature in 1.4
 but
 Wiki page is alittle sparse..
 In specific, here's what i'm trying to do:
 
 I have a field, say 'duplicate_group_id' that i'll populate based on some
 offline documents deduplication process I have.
 
 All I want is for solr to compute a 'duplicate_signature' field based on
 this one at update time, so that when i search for documents later, all
 documents with same original 'duplicate_group_id' value will be rolled up
 (e.g i'll just get the first one that came back  according to relevancy).
 
 I enabled the deduplication processor and put it into updater, but i'm
 not
 seeing any difference in returned results (i.e results with same
 duplicate_id are returned separately..)
 
 is there anything i need to supply in query-time for this to take effect?
 what should be the behaviour? is there any working example of this?
 
 Anything will be helpful..
 
 Thanks,
 Chak
 
 
 




Date ranges for indexes constructed outside Solr

2009-11-25 Thread Phil Hagelberg

I'm working on an application that will build indexes directly using the
Lucene API, but will expose them to clients using Solr. I'm seeing
plenty of documentation on how to support date range fields in Solr,
but they all assume that you are inserting documents through Solr rather
than merging already-generated indexes.

Where can I find details about the Lucene-level field operations that
can be used to generate date fields that Solr will work with? In
particular date resolution settings are unclear.

On a similar note: how much of schema.xml is relevant in cases where
Solr is not performing insertions? Obviously defaultSearchField is as
well as the solrQueryParser defaultOperator attribute, but it seems like
most of the field declarations might not matter.

thanks,
Phil


Re: Where to put ExternalRequestHandler and Tika jars

2009-11-25 Thread Juan Pedro Danculovic
Hi! Does your example finally work? I index the data with SolrJ and I have
the same problem and could not retrieve the file data.






Re: Trouble Configuring WordDelimiterFilterFactory

2009-11-25 Thread Rahul R
Hello,
Would really appreciate any inputs/suggestions on this. Thank you.



On Tue, Nov 24, 2009 at 10:59 PM, Rahul R rahul.s...@gmail.com wrote:

 Hello,
 In our application we have a catch-all field (the 'text' field) which is
 configured as the default search field. Now this field will have a
 combination of numbers, alphabets, special characters etc. I have a
 requirement wherein the WordDelimiterFilterFactory does not act on numbers,
 especially those with decimal points. Accuracy of results with relevance to
 numerical data is quite important, so if the text field of a document has
 data like Bridge-Diode 3.55 Volts, I want to make sure that a search for
 355 or 35.5 does not retrieve this document. I found the following
 setting for the WordDelimiterFilterFactory to work for me (for the most part):

 <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
   generateNumberParts="0" catenateWords="1" catenateNumbers="0"
   catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0"
   preserveOriginal="1"/>

 I am using the same setting for both index and query.

 Now the only problem is: if I have data like ".355", the above
 setting makes the analysis page show that WordDelimiterFilterFactory is
 creating term texts for both ".355" and "355". So a search for ".355"
 retrieves documents containing both ".355" and "355". A search for "355"
 also has the same effect. I noticed that when the entry for the
 WordDelimiterFilterFactory was completely removed (both index and query),
 the above problem was resolved. But this seems too harsh a measure.

 Is there a way by which I can prevent the WordDelimiterFilterFactory from
 totally acting on numerical data ?

 Regards
 Rahul