Re: Newbie Question, can I store structured sub elements?

2011-08-25 Thread Zac Tolley
I know I can make them multi-valued, but that doesn't let me see that
a showing happens at a particular time on a particular
channel, just that it shows on a range of channels at a range of times.

Starting to think I will have to either store a formatted string that
combines them, or keep it flat just for indexing, retrieve ids and use
them to get the data out of the RDBMS.
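If it helps, the "formatted string" approach mentioned above can work with a single multiValued string field: pack each showing as one value, then split it again on retrieval. A rough illustration in Python (the delimiter and function names are my own invention, not anything Solr provides):

```python
# Pack each (channelname, starttime) pair into one multiValued string value,
# so a single Solr doc per movie still records which time goes with which channel.
def pack_showings(showings):
    # showings: list of (channelname, starttime) tuples
    return ["%s|%s" % (channel, start) for channel, start in showings]

def unpack_showings(values):
    # Reverse the packing when reading the doc back out of Solr.
    return [tuple(v.split("|", 1)) for v in values]

packed = pack_showings([("BBC1", "2011-08-25T20:00:00Z"),
                        ("ITV", "2011-08-26T21:30:00Z")])
# packed -> ['BBC1|2011-08-25T20:00:00Z', 'ITV|2011-08-26T21:30:00Z']
assert unpack_showings(packed)[0] == ("BBC1", "2011-08-25T20:00:00Z")
```

The downside is that Solr only sees opaque strings, so you can't do range queries on the start times this way; it mainly helps with display.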


On 24 Aug 2011, at 23:09, dan whelan d...@adicio.com wrote:

 You could change starttime and channelname to multiValued=true and use these 
 fields to store all the values for those fields.

 showing.movie_id and showing.id probably isn't needed in a solr record.



 On 8/24/11 7:53 AM, Zac Tolley wrote:
 I have a scenario in which I have a film and showings; each film has
 multiple showings at set times on set channels, so I have:

 Movie
 -
 id
 title
 description
 duration


 Showing
 -
 id
 movie_id
 starttime
 channelname



 I want to know: can I store this in Solr so that I keep this structure?

 I did try to do an initial import with the DIH using this config:

 <entity name="movie" query="SELECT * from movies">
   <field column="ID" name="id"/>
   <field column="TITLE" name="title"/>
   <field column="DESCRIPTION" name="description"/>

   <entity name="showing" query="SELECT * from showings WHERE movie_id = ${movie.id}">
     <field column="ID" name="id"/>
     <field column="STARTTIME" name="starttime"/>
     <field column="CHANNELNAME" name="channelname"/>
   </entity>
 </entity>

 I was hoping, for each movie, to get a sub-entity with the showing, like:

 <doc>
   <str name="title">...</str>
   <showing>
     <str name="channelname">...


 but instead all the fields are flattened down to the top level.

 I know this must be easy, what am I missing... ?




Re: can i create filters of score range

2011-08-25 Thread jame vaalet
well, when i said "client" i meant querying through solr.NET (in a way this
can be seen as posting it through a web browser url).
so coming back to the issue .. even if i am sorting by _docid_ i need to do
paging (2 million docs in the result).
how is it doing that internally?
when sorted by docid, don't we have the deep paging issue? (getting all the
previous pages into memory to get the next page)
so what's the main difference we gain by sorting on lucene docids rather than
normal fields?
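For illustration, the difference between the two sorts can be sketched roughly like this (my own toy model, not Lucene's actual code): a score sort has to track the best start+rows entries in a priority queue to serve page N, while _docid_ order needs no scoring or sorting at all and can simply skip ahead and stop early:

```python
import heapq
from itertools import islice

# Toy index: (doc_id, fake score) pairs in docid order.
docs = [(doc_id, (doc_id * 37) % 100) for doc_id in range(10000)]

def page_by_score(start, rows):
    # Score sort: must keep the top start+rows scoring docs to serve page N,
    # which is where the deep-paging memory cost comes from.
    top = heapq.nlargest(start + rows, docs, key=lambda d: d[1])
    return top[start:start + rows]

def page_by_docid(start, rows):
    # Index order: no scoring, no sorting; skip `start` docs and stop early.
    return list(islice(docs, start, start + rows))

assert len(page_by_score(9000, 10)) == 10
assert page_by_docid(9000, 10)[0][0] == 9000
```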

On 23 August 2011 22:41, Erick Erickson erickerick...@gmail.com wrote:

 Did you try exactly what Chris suggested? Appending
 sort=_docid_ asc to the query? When you say
 client I assume you're talking SolrJ, and I'm pretty
 sure that SolrQuery.setSortField is what you want.

 I suppose you could also set this as the default in your
 query handler.

 Best
 Erick

 On Tue, Aug 23, 2011 at 4:43 AM, jame vaalet jamevaa...@gmail.com wrote:
  okey, so this is something i was looking for .. the default order of result
  docs in lucene\solr ..
  and you are right, since i don't care about the order in which i get the docs,
  ideally i shouldn't ask solr to do any sorting on its raw result list ...
  though i understand your point, how do i do it as a solr client? by default,
  if i am not mentioning the sort parameter in the query URL to solr, solr will
  try to sort it with respect to the score it calculated .. how do i prevent even
  this sorting .. do we have any setting as such in solr for this?
 
 
  On 23 August 2011 03:29, Chris Hostetter hossman_luc...@fucit.org
 wrote:
 
 
  : before going into lucene doc id, i have got a creationDate datetime field in
  : my index which i can use as page definition using a filter query..
  : i have learned exposing lucene docid won't be a clever idea, as it's again
  : relative to the index instance.. whereas my index date field will be unique
  : ..and i can definitely create ranges with that..
 
  i think you misunderstood me: i'm *not* suggesting you do any filtering
  on the internal lucene doc id.  I am suggesting that you forget all about
  trying to filter to work around the issues with deep paging, and simply
  *sort* on _docid_ asc, which should make all inherent issues with deep
  paging go away (as far as i know).  At no point will the internal lucene
  doc ids be exposed to your client code; it's just an instruction to
  Solr/Lucene that it doesn't really need to do any sorting, it can just
  return the Nth-Mth docs as collected.
 
  : i have got one more doubt .. if i use a filter query each time, will it result
  : in memory problems like those we see in deep paging issues..
 
  it could, i'm not sure. that's why i said...
 
  :  I'm not sure if this would really gain you much though -- yes this would
  :  work around some of the memory issues inherent in deep paging but it
  :  would still require a lot of rescoring of documents again and again.
 
 
  -Hoss
 
 
 
 
  --
 
  -JAME
 




-- 

-JAME


Re: Preserve XML hierarchy

2011-08-25 Thread _snake_
Hi Michael, Thanks for your help!

I am using Apache Solr 3.2 on windows.

I am trying to apply the 2 patches (
https://issues.apache.org/jira/browse/SOLR-2597?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#issue-tabs
XMLCharFilter patch ), but I have no idea how to do that.

What do I need to open the Solr project? Or which .jar file do I need to
open?
I saw that there are 2 files (woodstox and stax2); what do I have to do
with those files?

Thanks!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Preserve-XML-hierarchy-tp3166690p3283275.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spatial Search problems

2011-08-25 Thread Javier Heras
Thanx David,

Just one more question: am I able to do spatial search with Solr 1.4? And
with Lucene 2.9? What's your recommendation?

Javier

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spatial-Search-problems-tp3277945p3283285.html
Sent from the Solr - User mailing list archive at Nabble.com.


Grouping and performing statistics per group

2011-08-25 Thread Omri Cohen
Hi All,

I want to group by a certain field and perform statistics per group on a
certain field of my choice.
For example, if I have the following documents in my collection:

<doc>
  <child-id>12353</child-id>
  <weight>65</weight>
  <gender>male</gender>
</doc>
<doc>
  <child-id>12353</child-id>
  <weight>63</weight>
  <gender>male</gender>
</doc>
<doc>
  <child-id>12353</child-id>
  <weight>49</weight>
  <gender>male</gender>
</doc>

now I want to group by gender and, let's say for the sake of the example,
get an average statistic on the weight.

Is that possible?  I would appreciate it if anyone could also elaborate on
performing such actions using SolrJ ...

Thanks.
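For clarity, the computation I'm after, done client-side over the example docs above (Python, purely illustrative):

```python
from collections import defaultdict
from statistics import mean

# The three example documents from the message above.
docs = [
    {"child-id": 12353, "weight": 65, "gender": "male"},
    {"child-id": 12353, "weight": 63, "gender": "male"},
    {"child-id": 12353, "weight": 49, "gender": "male"},
]

# Group by gender, then average the weight within each group.
groups = defaultdict(list)
for doc in docs:
    groups[doc["gender"]].append(doc["weight"])

averages = {gender: mean(weights) for gender, weights in groups.items()}
print(averages)  # {'male': 59}
```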


Re: Grouping and performing statistics per group

2011-08-25 Thread Sowmya V.B.
Hi

Is it possible that the Luke request handler can be used for this?

I used something like:
http://localhost:8080/solr/admin/luke?fl=fieldName&numTerms=1
to get an estimate of a range of values a field can have.

Hope you find this information useful.

Sowmya.

On Thu, Aug 25, 2011 at 10:58 AM, Omri Cohen omri...@gmail.com wrote:

 Hi All,

 I want to group-by certain field and perform statistics per group on a
 certain field of my choice.

 For example, if I have the next documents in my collection:

 <doc>
   <child-id>12353</child-id>
   <weight>65</weight>
   <gender>male</gender>
 </doc>
 <doc>
   <child-id>12353</child-id>
   <weight>63</weight>
   <gender>male</gender>
 </doc>
 <doc>
   <child-id>12353</child-id>
   <weight>49</weight>
   <gender>male</gender>
 </doc>

 now I want to group by gender, and let say for the sake of the example,
 that
 I want to average statistic on the weight.

 Is that possible?  I would appreciate if anyone can also elaborate on
 performing such actions using SolrJ ...

 Thanks.




-- 
Sowmya V.B.

Losing optimism is blasphemy!
http://vbsowmya.wordpress.com



Re: Grouping and performing statistics per group

2011-08-25 Thread Omri Cohen
Thanks, but I actually need something more deterministic and more
accurate.

Does anyone know if there is an existing feature for that?

thanks again


Upload doc and pdf in Solr 3.3.0

2011-08-25 Thread Moinsn
Good Morning,

I have to set up a Solr system to search in documents like PDF and DOC. My
Solr system is running in the meantime, but I can't find a tutorial that
tells me what I have to do to put the files into the system.
I hope you can help me a bit to bring that off in a simple way.
And please excuse my bad English.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upload-doc-and-pdf-in-Solr-3-3-0-tp3283224p3283224.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr in a windows shared hosting environment

2011-08-25 Thread Devora
Hi,

 

Is it possible to install Solr in a Windows (IIS 7 or IIS 7.5) shared
hosting environment?

If yes, where can I find instructions on how to do that?

 

Thank you!



Re: Upload doc and pdf in Solr 3.3.0

2011-08-25 Thread Jayendra Patil
http://wiki.apache.org/solr/ExtractingRequestHandler may help.

Regards,
Jayendra
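
Concretely, once the extracting handler is enabled in solrconfig.xml, a file can be posted like this (URL, document ID, and file path are placeholders to adapt to your setup):

```shell
# Post a PDF to Solr's ExtractingRequestHandler (Solr Cell / Tika).
# literal.id sets the unique key; commit=true makes it searchable right away.
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" \
     -F "myfile=@/path/to/document.pdf"
```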

On Thu, Aug 25, 2011 at 3:24 AM, Moinsn felix.wieg...@googlemail.com wrote:
 Good Morning,

 I have to set up a Solr System to seek in documents like pdf and doc. My
 Solr System is running in the meantime, but i cant find a tutorial that
 tells me what i have to do to put the files in the system.
 I hope you can help me a bit to bring that off on a simple way.
 And please excuse my bad english.

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Upload-doc-and-pdf-in-Solr-3-3-0-tp3283224p3283224.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Best way to anchor solr searches?

2011-08-25 Thread lee carroll
I don't think Solr conforms to ACID-type behaviours for its queries.
This is not to say your use-case is not important,
just that it's not Solr's focus. I think it's an interesting question, but
the solution is probably going to involve rolling your own.

Something like returning 1 user docs and caching these in an
application cache; pagination occurs in this cache rather than as a
solr query with the start param incremented. Maybe offer a "refresh
data" link which repopulates the cache from Solr.

cheers lee c
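The cache-then-paginate idea could be sketched roughly like this (Python, all names invented for illustration):

```python
class AnchoredSearch:
    """Fetch one snapshot of result ids from Solr, then page within that
    snapshot so index updates can't shuffle users between pages."""

    def __init__(self, fetch_ids, page_size=10):
        self.fetch_ids = fetch_ids      # callable that queries Solr once
        self.page_size = page_size
        self.snapshot = None

    def refresh(self):
        # Re-query Solr; e.g. wired to a "refresh data" link in the UI.
        self.snapshot = list(self.fetch_ids())

    def page(self, n):
        if self.snapshot is None:
            self.refresh()
        start = n * self.page_size
        return self.snapshot[start:start + self.page_size]

search = AnchoredSearch(lambda: range(25), page_size=10)
assert search.page(0) == list(range(10))
assert search.page(2) == [20, 21, 22, 23, 24]  # stable even if the index changes
```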

On 25 August 2011 01:01, arian487 akarb...@tagged.com wrote:
 If I'm searching for users based on last login time, and I search once, then
 go to the second page with a new offset, I could potentially see the same
 users on page 2 if the index has changed.  What is the best way to anchor it
 so I avoid this?

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Best-way-to-anchor-solr-searches-tp3282576p3282576.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Upload doc and pdf in Solr 3.3.0

2011-08-25 Thread Moinsn
Thanks.
Now it works.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Upload-doc-and-pdf-in-Solr-3-3-0-tp3283224p3283760.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Grouping and performing statistics per group

2011-08-25 Thread Martijn v Groningen
Hi Omri,

I think you can achieve that with grouping and the Solr StatsComponent (
http://wiki.apache.org/solr/StatsComponent).
In order to compute statistics on groups you must set the option
group.truncate=true
An example query:
q=*:*&group=true&group.field=gender&group.truncate=true&stats=true&stats.field=weight

Let me know if this fits your use case.

Martijn

On 25 August 2011 11:17, Omri Cohen omri...@gmail.com wrote:

 Thanks, but I actually need something more deterministic and more
 accurate..

 Anyone knows if there is an already existing feature for that?

 thanks again




-- 
Met vriendelijke groet,

Martijn van Groningen


Re: can you help on this?

2011-08-25 Thread Shalin Shekhar Mangar
Can you please specify:

   1. Solr version
   2. Platform
   3. JDK version
   4. Under what conditions does this error happen? Is it reproducible?


On Wed, Aug 24, 2011 at 6:53 PM, abhijit bashetti abhijitbashe...@gmail.com
 wrote:

 SEVERE: java.lang.InternalError: a fault occurred in a recent unsafe memory
 access operation in compiled Java code
 at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
 at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:64)
 at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:131)
 at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:166)
 at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:273)
 at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:209)
 at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:503)
 at org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:309)
 at org.apache.lucene.search.TermQuery$TermWeight$1.add(TermQuery.java:56)
 at org.apache.lucene.util.ReaderUtil$Gather.run(ReaderUtil.java:77)
 at org.apache.lucene.util.ReaderUtil$Gather.run(ReaderUtil.java:82)
 at org.apache.lucene.util.ReaderUtil$Gather.run(ReaderUtil.java:66)
 at org.apache.lucene.search.TermQuery$TermWeight.init(TermQuery.java:53)
 at org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:198)
 at

 org.apache.lucene.search.BooleanQuery$BooleanWeight.init(BooleanQuery.java:176)
 at
 org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:354)
 at
 org.apache.lucene.search.Searcher.createNormalizedWeight(Searcher.java:168)
 at

 org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:661)
 at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:320)
 at

 org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1178)
 at

 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1066)
 at
 org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:358)
 at

 org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:258)
 at

 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
 at

 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
 at

 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
 at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 at

 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
 at

 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
 at

 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
 at

 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
 at

 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at

 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
 at

 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
 at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
 at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:879)
 at

 org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
 at

 org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
 at

 org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
 at

 org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
 at java.lang.Thread.run(Thread.java:619)




-- 
Regards,
Shalin Shekhar Mangar.


Re: Grouping and performing statistics per group

2011-08-25 Thread Martijn v Groningen
Or if you don't care about grouped results you can also add the following
option:
stats.facet=gender

On 25 August 2011 14:40, Martijn v Groningen
martijn.v.gronin...@gmail.comwrote:

 Hi Omri,

 I think you can achieve that with grouping and the Solr StatsComponent (
 http://wiki.apache.org/solr/StatsComponent).
 In order to compute statistics on groups you must set the option
 group.truncate=true
 An example query:

 q=*:*&group=true&group.field=gender&group.truncate=true&stats=true&stats.field=weight

 Let me know if this fits your use case.

 Martijn

 On 25 August 2011 11:17, Omri Cohen omri...@gmail.com wrote:

 Thanks, but I actually need something more deterministic and more
 accurate..

 Anyone knows if there is an already existing feature for that?

 thanks again




 --
 Met vriendelijke groet,

 Martijn van Groningen




-- 
Met vriendelijke groet,

Martijn van Groningen


Re: Grouping and performing statistics per group

2011-08-25 Thread Omri Cohen
Hi, thanks for your reply..

it doesn't work.  I am getting the plain stats results at the end of the
response, but no statistics per group.

thanks anyway


Re: How to copy and extract information from a multi-line text before the tokenizer

2011-08-25 Thread Erick Erickson
You could consider writing your own UpdateHandler. It allows you to
get access to the underlying SolrInputDocument, and freely modify
the fields before it even gets to the analysis chain defined in your
schema. So you can get your AllData out of the doc, split it apart as
many ways as you want, and put fields back in the SolrInputDocument.
You can even remove the AllData field if you want and not even define
it in your schema

A programmer had a problem. He tried to solve it with regular expressions.
Now he has two problems :).

Best
Erick
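
One side note on the regex in the message below: `.` does not match newlines by default, so a pattern like `^.*\nAuthor: (.*?)\n.*$` needs DOTALL mode to behave as intended on multi-line input. A quick check in Python (sample data made up):

```python
import re

# Sample multi-line field value, as in the original message.
all_data = "Title: exampleTitle of the book\nAuthor: Example Author\nDate: 01.01.1980"

# Without (?s), the leading .* stops at the first newline and the pattern fails.
pattern = r"(?s)^.*?\nAuthor: (.*?)\n.*$"
match = re.match(pattern, all_data)
assert match and match.group(1) == "Example Author"
```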

On Tue, Aug 23, 2011 at 6:28 AM, Michael Kliewe m...@mail.de wrote:
 Hello all,

 I have a custom schema which has a few fields, and I would like to create a
 new field in the schema that only has one specific line of another field
 indexed. Let's use this example:

 field AllData (TextField) has for example this data:
 Title: exampleTitle of the book
 Author: Example Author
 Date: 01.01.1980

 Each line is separated by a line break.
 I now need a new field named OnlyAuthor which only has the Author information
 in it, so I can search and facet on specific Author information. I added
 this to my schema:

 <fieldType name="authorField" class="solr.TextField">
   <analyzer type="index">
     <charFilter class="solr.PatternReplaceCharFilterFactory"
                 pattern="^.*\nAuthor: (.*?)\n.*$" replacement="$1" replace="all"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.TrimFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <charFilter class="solr.PatternReplaceCharFilterFactory"
                 pattern="^.*\nAuthor: (.*?)\n.*$" replacement="$1" replace="all"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.TrimFilterFactory"/>
   </analyzer>
 </fieldType>

 <field name="OnlyAuthor" type="authorField" indexed="true" stored="true"/>

 <copyField source="AllData" dest="OnlyAuthor"/>


 But this is not working; the new OnlyAuthor field contains all the data, because
 the regex didn't match. But I need "Example Author" in that field (I think)
 to be able to search and facet only on author information.

 I don't know where the problem is, perhaps someone of you can give me a hint, 
 or a totally different method to achieve my goal to extract a single line 
 from this multi-line-text.

 Kind regards and thanks for any help
 Michael





Re: Query parameter changes from solr 1.4 to 3.3

2011-08-25 Thread Erick Erickson
Have you looked at the CHANGES.txt file? It is supposed to be
updated with all these kinds of notes...

Best
Erick

On Tue, Aug 23, 2011 at 7:11 AM, Samarendra Pratap samarz...@gmail.com wrote:
 Hi,
  We are upgrading solr 1.4 (with collapsing patch solr-236) to solr 3.3. I
 was looking for the required changes in query parameters (or parameter
 names) if any.
  One thing I know for sure is that collapse and its sub-options are now
 known by group, but didn't find anything else.

  Can someone point me to some document or webpage for this?
  Or if there aren't any other changes can someone confirm that?

 --
 Regards,
 Samar



Re: Spellcheck Phrases

2011-08-25 Thread Erick Erickson
Please start a new thread for this question, see:

http://people.apache.org/~hossman/#threadhijack

When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email.  Even if you change the
subject line of your email, other mail headers still track which thread
you replied to and your question is hidden in that thread and gets less
attention.   It makes following discussions in the mailing list archives
particularly difficult.


Best
Erick

On Tue, Aug 23, 2011 at 11:47 AM, Herman Kiefus herm...@angieslist.com wrote:
 The angle that I am trying here is to create a dictionary from indexed terms 
 that contain only correctly spelled words.  We are doing this by having the 
 field from which the dictionary is created utilize a type that employs 
 solr.KeepWordFilterFactory, which in turn utilizes a text file of known 
 correctly spelled words (including their respective derivations example: 
 lead, leads, leading, etc.).

 This is working great for us with the exception being those fields in our 
 schema that contain proper names.  I can't seem to get (unfiltered) terms 
 from those fields along with (correctly spelled) terms from other fields into 
 the single field upon which the dictionary is built.

 -Original Message-
 From: Dyer, James [mailto:james.d...@ingrambook.com]
 Sent: Thursday, June 02, 2011 11:40 AM
 To: solr-user@lucene.apache.org
 Subject: RE: Spellcheck Phrases

 Actually, someone just pointed out to me that a patch like this is 
 unnecessary.  The code works as-is if configured like this:

 <float name="thresholdTokenFrequency">.01</float>  (correct)

 instead of this:

 <str name="thresholdTokenFrequency">.01</str> (incorrect)

 I tested this and it seems to work.  I'm still trying to figure out if
 using this parameter actually improves the quality of our spell suggestions,
 now that I know how to use it properly.

 Sorry about the mis-information earlier.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Dyer, James
 Sent: Wednesday, June 01, 2011 3:02 PM
 To: solr-user@lucene.apache.org
 Subject: RE: Spellcheck Phrases

 Tanner,

 I just entered SOLR-2571 to fix the float-parsing-bug that breaks 
 thresholdTokenFrequency.  Its just a 1-line code fix so I also included a 
 patch that should cleanly apply to solr 3.1.  See 
 https://issues.apache.org/jira/browse/SOLR-2571 for info and patches.

 This parameter appears absent from the wiki.  And as it has always been
 broken for me, I haven't tested it.  However, my understanding is that it should
 be set as the minimum percentage of documents in which a term has to occur in
 order for it to appear in the spelling dictionary.  For instance in the
 config below, a term would have to occur in at least 1% of the documents for
 it to be part of the spelling dictionary.  This might be a good setting for
 long fields, but for the short fields in my application I was thinking of
 setting this to something like 1/1000 of 1% ...

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text</str>
  <lst name="spellchecker">
   <str name="name">spellchecker</str>
   <str name="field">Spelling_Dictionary</str>
   <str name="fieldType">text</str>
   <str name="spellcheckIndexDir">./spellchecker</str>
   <str name="thresholdTokenFrequency">.01</str>
  </lst>
 </searchComponent>

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Tanner Postert [mailto:tanner.post...@gmail.com]
 Sent: Friday, May 27, 2011 6:04 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Spellcheck Phrases

 are there any updates on this? any third party apps that can make this work 
 as expected?

 On Wed, Feb 23, 2011 at 12:38 PM, Dyer, James 
 james.d...@ingrambook.comwrote:

 Tanner,

 Currently Solr will only make suggestions for words that are not in
 the dictionary, unless you specify spellcheck.onlyMorePopular=true.
 However, if you do that, then it will try to improve every word in
 your query, even the ones that are spelled correctly (so while it
 might change brake to break, it might also change leg to log.)

 You might be able to alleviate some of the pain by setting the
 thresholdTokenFrequency so as to remove misspelled and rarely-used
 words from your dictionary, although I personally haven't been able to
 get this parameter to work.  It also doesn't seem to be documented on
 the wiki, but it is in the 1.4.1 source code, in class
 IndexBasedSpellChecker.  It's also mentioned in Smiley and Pugh's book.  I
 tried setting it like this, but got a ClassCastException on the float value:

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_spelling</str>
  <lst name="spellchecker">
   <str name="name">spellchecker</str>
   <str name="field">Spelling_Dictionary</str>
   <str name="fieldType">text_spelling</str>
   <str name="buildOnOptimize">true</str>
   <str name="thresholdTokenFrequency">.001</str>
   

Re: Grouping and performing statistics per group

2011-08-25 Thread Martijn v Groningen
If you take this query from the wiki:
http://localhost:8983/solr/select?q=*:*&stats=true&stats.field=price&stats.field=popularity&stats.twopass=true&rows=0&indent=true&stats.facet=inStock
In this case you get stats about the popularity per inStock value (true /
false). Replacing these values with weight and gender respectively and
running the query on your index should give the result you want.

On 25 August 2011 14:47, Omri Cohen omri...@gmail.com wrote:

 Hi, thanks for your reply..

 it doesn't work.  I am getting the plain stats results at the end of the
 response, but no statistics per group..

 thanks anyway




-- 
Met vriendelijke groet,

Martijn van Groningen


RE: Best way to anchor solr searches?

2011-08-25 Thread Jaeger, Jay - DOT
I don't think it has to be quite so bleak as that, depending upon the number of
queries done over a given timeframe and the size of the result sets.  Solr
does cache the identifiers of documents returned by search results.   See
http://wiki.apache.org/solr/SolrCaching paying particular attention to
queryResultCache and queryResultWindowSize.

Not a guarantee, of course, but if the number of people searching is limited
and the result set sizes are manageable, the cache can probably accommodate
their business need.

JRJ

-Original Message-
From: lee carroll [mailto:lee.a.carr...@googlemail.com] 
Sent: Thursday, August 25, 2011 6:33 AM
To: solr-user@lucene.apache.org
Subject: Re: Best way to anchor solr searches?

I don't think Solr conforms to ACID-type behaviours for its queries.
This is not to say your use-case is not important,
just that it's not Solr's focus. I think it's an interesting question, but
the solution is probably going to involve rolling your own.

Something like returning 1 user docs and caching these in an
application cache; pagination occurs in this cache rather than as a
solr query with the start param incremented. Maybe offer a "refresh
data" link which repopulates the cache from Solr.

cheers lee c

On 25 August 2011 01:01, arian487 akarb...@tagged.com wrote:
 If I'm searching for users based on last login time, and I search once, then
 go to the second page with a new offset, I could potentially see the same
 users on page 2 if the index has changed.  What is the best way to anchor it
 so I avoid this?

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Best-way-to-anchor-solr-searches-tp3282576p3282576.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: Newbie question, ant target for packaging source files from local copy?

2011-08-25 Thread Steven A Rowe
Hi sid,

The current source packaging scheme aims to *avoid* including local changes :),
so yes, there is currently no support for what you want to do.

Prior to https://issues.apache.org/jira/browse/LUCENE-2973, the source
packaging scheme used the current sources rather than pulling from Subversion.
If you check out trunk revision 1083212 or earlier, or branch_3x revision
1083234 or earlier, you can see how it used to be done.

If you want to resurrect the previous source packaging scheme as a new Ant
target (maybe named package-local-src-tgz?), make a new JIRA issue
and post a patch, and I'll help you get it committed (assuming nobody objects).
If you haven't seen the Solr Wiki HowToContribute page
http://wiki.apache.org/solr/HowToContribute, it may be of use to you for this.

Steve

 -Original Message-
 From: syyang [mailto:syyan...@gmail.com]
 Sent: Wednesday, August 24, 2011 10:07 PM
 To: solr-user@lucene.apache.org
 Subject: Newbie question, ant target for packaging source files from
 local copy?
 
 Hi all,
 
 I am trying to package source files containing local changes. While
 running
 ant dist creates a war file containing the local changes, running ant
 package-src-tgz exports files straight from svn repository, and does not
 pick up any of the local changes.
 
 Is there an ant target that I can use to package local copy of the source
 files? Or are are we expected to just write our own?
 
 Thanks,
 -Sid
 
 --
 View this message in context: http://lucene.472066.n3.nabble.com/Newbie-
 question-ant-target-for-packaging-source-files-from-local-copy-
 tp3282787p3282787.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr indexing process: keep a persistent Mysql connection throu all the indexing process

2011-08-25 Thread Erick Erickson
Yes, but you can always employ a singleton to open and maintain a DB connection.

Best
Erick

On Tue, Aug 23, 2011 at 9:16 PM, samuele.mattiuzzo samum...@gmail.com wrote:
 those documents are unrelated to the database. the db i have is just storing
 countries - region - cities, and it's used to do a refinement on a specific
 solr field

 example:

 solrField thetext with content Mary comes from London

 updateHandler polls the database for europe - great britain - london and
 updates those values to the correct fields

 isn't an update handler relative to a single document? at least, that's what
 i understood...

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-indexing-process-keep-a-persistent-Mysql-connection-throu-all-the-indexing-process-tp3278608p3279765.html
 Sent from the Solr - User mailing list archive at Nabble.com.



SolrServer instances

2011-08-25 Thread Jonty Rhods
Hi All,

I am using SolrJ (3.1) and Tomcat 6.x. I want to open the Solr server once (20
concurrent connections) and reuse it across the whole site, or use something like a
connection pool like we use for the DB (i.e. Apache DBCP). There is a way to
use a static method, but I want a better solution from you people.



I read one thread where Ahmet suggested using something like this:

String serverPath = "http://localhost:8983/solr";
HttpClient client = new HttpClient(new
MultiThreadedHttpConnectionManager());
URL url = new URL(serverPath);
CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer(url, client);

But how do I use an instance of this across all classes?

Please suggest.

regards
Jonty
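
A common answer is a lazily initialized singleton holder: build the server once and hand out the same instance everywhere. Sketched in Python for brevity with a stand-in object (in Java the equivalent is a static field or an initialization-on-demand holder class wrapping the CommonsHttpSolrServer built above):

```python
import threading

class SolrClientHolder:
    """Create the shared client once and hand out the same instance everywhere.
    CommonsHttpSolrServer with a MultiThreadedHttpConnectionManager is
    thread-safe, so one instance can serve a whole webapp."""

    _instance = None
    _lock = threading.Lock()

    @classmethod
    def get(cls):
        if cls._instance is None:
            with cls._lock:  # double-checked creation under a lock
                if cls._instance is None:
                    cls._instance = cls._build()
        return cls._instance

    @staticmethod
    def _build():
        # Stand-in for: new CommonsHttpSolrServer(url, pooledHttpClient)
        return object()

assert SolrClientHolder.get() is SolrClientHolder.get()
```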


Re: Problem using stop words

2011-08-25 Thread Erick Erickson
Hmmm, I'd expect you to have an error in your log file if you haven't
removed the default field type named string. If you have removed it
from your schema, this should work...

But I'd change the name anyway, it'll cause endless confusion.

And don't forget to re-index *everything*, preferably after removing
the data/index directory (the whole directory, not just the contents of index).

Take a look at the admin/analysis page after you've made your changes, and
click the verbose checkbox to see exactly what happens at each step
of the analysis chain. That'll tell you whether you're on the right track.

The definition you're using should do what you've indicated you want.

Best
Erick

On Wed, Aug 24, 2011 at 3:43 AM, _snake_ lucas.mig...@gmail.com wrote:
 Thanks everybody for your help!!

 I changed the stopwords file: I now use only one word per line, without leading
 or trailing spaces, and without comments.
 I changed it to UTF-8.
 I am using the TermsComponent to suggest words to the user (jQuery UI
 Autocomplete), so the stopwords are still shown there...
 Do I have to change the name of the fieldtype string?
 I think the problem is that TermsComponent doesn't use the stopwords file.
 Is there another way to suggest words to the user?
 Thanks!


 The Solr Analysis page shows the following when I search the word 'a' (which is a
 stopword) in a field that has all the content.
 Query Analyzer
 a

 The content of schema file is:
 <fieldtype name="string" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
 words="spanish_stop2.txt" enablePositionIncrements="true"/>
     <filter class="solr.WordDelimiterFilterFactory"
 generateWordParts="1" generateNumberParts="1" catenateWords="1"
 catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory"
       ignoreCase="true"
       words="spanish_stop2.txt"
       enablePositionIncrements="true"
     />
     <filter class="solr.WordDelimiterFilterFactory"
 generateWordParts="1" generateNumberParts="1" catenateWords="0"
 catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldtype>

 The solrconfig.xml file:

  <requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
     <lst name="defaults">
      <bool name="terms">true</bool>
    </lst>
    <arr name="components">
      <str>terms</str>
    </arr>
  </requestHandler>



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Problem-using-stop-words-tp3274598p3280291.html
 Sent from the Solr - User mailing list archive at Nabble.com.



RE: Solr in a windows shared hosting environment

2011-08-25 Thread Jaeger, Jay - DOT
Yes, but since Solr is written in Java to run in a JEE container, you would 
host Solr in a web application server, either Jetty (which comes packaged), or 
something else (say, Tomcat or WebSphere or something like that).

As a result, you aren't going to find anything that says how to run Solr under 
IIS because it doesn't run under IIS.

It doesn't need IIS, though it can certainly coexist alongside IIS.  If you 
want the requests to go thru IIS you might need a plug-in in IIS to handle that 
(IBM's WebSphere has such a plugin).  If you don't need the requests to go thru 
IIS, then that isn't an issue.

Hope that helps.

JRJ

-Original Message-
From: Devora [mailto:devora...@gmail.com] 
Sent: Thursday, August 25, 2011 5:15 AM
To: solr-user@lucene.apache.org
Subject: Solr in a windows shared hosting environment

Hi,

 

Is it possible to install Solr in  a windows (IIS 7 or IIS 7.5)  shared
hosting environment?

If yes, where can I find instructions how to do that?

 

Thank you!



RE: How to copy and extract information from a multi-line text before the tokenizer

2011-08-25 Thread Jaeger, Jay - DOT
A programmer had a problem. He tried to solve it with regular expressions.
Now he has two problems :).

A.  That just isn't fair...  8^)

(I can't think of very many things that have allowed me to perform more magic 
over my career than regular expressions, starting with SNOBOL.  Uh oh:  I just 
dated myself.  8^) ).

JRJ

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, August 25, 2011 7:54 AM
To: solr-user@lucene.apache.org
Subject: Re: How to copy and extract information from a multi-line text before 
the tokenizer

You could consider writing your own UpdateHandler. It allows you to
get access to the underlying SolrInputDocument, and freely modify
the fields before it even gets to the analysis chain in defined in your
schema. So you can get your AllData out of the doc, split it apart as
many ways as you want and put fields back in the SolrInputDocument.
You can even remove the AllData field if you want and not even define
it in your schema

A programmer had a problem. He tried to solve it with regular expressions.
Now he has two problems :).

Best
Erick

On Tue, Aug 23, 2011 at 6:28 AM, Michael Kliewe m...@mail.de wrote:
 Hello all,

 I have a custom schema which has a few fields, and I would like to create a 
 new field in the schema that only has one special line of another field 
  indexed. Let's use this example:

 field AllData (TextField) has for example this data:
 Title: exampleTitle of the book
 Author: Example Author
 Date: 01.01.1980

 Each line is separated by a line break.
 I now need a new field named OnlyAuthor which only has the Author information 
 in it, so I can search and facet for specific Author information. I added 
 this to my schema:

 <fieldType name="authorField" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
 pattern="^.*\nAuthor: (.*?)\n.*$" replacement="$1" replace="all"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
 pattern="^.*\nAuthor: (.*?)\n.*$" replacement="$1" replace="all"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
 </fieldType>

 <field name="OnlyAuthor" type="authorField" indexed="true" stored="true"/>

 <copyField source="AllData" dest="OnlyAuthor"/>


  But this is not working: the new OnlyAuthor field contains all the data, because
  the regex didn't match. But I need "Example Author" in that field (I think)
 to be able to search and facet only author information.

 I don't know where the problem is, perhaps someone of you can give me a hint, 
 or a totally different method to achieve my goal to extract a single line 
 from this multi-line-text.

 Kind regards and thanks for any help
 Michael
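One likely culprit (an assumption, since the failing input isn't shown): in Java regexes `.` does not match `\n` by default, so the trailing `.*$` cannot consume the remaining lines once more than one line follows the Author line, and the whole pattern fails to match. A quick demonstration with the inline DOTALL flag — shown in Python, but `(?s)` works the same way inside the Java pattern string:

```python
import re

text = ("Title: exampleTitle of the book\n"
        "Author: Example Author\n"
        "Date: 01.01.1980\n"
        "Publisher: Example Press")  # extra line makes the plain pattern fail

# Without DOTALL, '.' stops at newlines, so the final .*$ cannot reach the
# end of the input: no match, and the input comes back unchanged.
plain = re.sub(r"^.*\nAuthor: (.*?)\n.*$", r"\1", text)

# (?s) turns on DOTALL so '.' also matches newlines; the lazy leading .*?
# keeps the match anchored to the first "Author:" line.
dotall = re.sub(r"(?s)^.*?\nAuthor: (.*?)\n.*$", r"\1", text)
```

With the flag, the whole multi-line value collapses to just the captured author, which is what the charFilter needs to emit before tokenization.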





Re: Spatial Search problems

2011-08-25 Thread Smiley, David W.
Um, you might try googling LocalSolr or LocalLucene -- dead projects, but relevant
since you insist on using an old Solr/Lucene.

Of course, if all you need is a bounding box filter, then a pair of lat & lon
range queries is sufficient.

~ David Smiley

On Aug 25, 2011, at 4:01 AM, Javier Heras wrote:

 Thanx David,
 
 Just one more question. Am I able to do spatial search with Solr 1.4? And
 with Lucene 2.9? What's your recommendation?
 
 Javier
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Spatial-Search-problems-tp3277945p3283285.html
 Sent from the Solr - User mailing list archive at Nabble.com.
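For the bounding-box-only case on an old Solr, the filter is just the pair of numeric range queries David mentions. A sketch that builds the filter strings, assuming the index stores separate `lat` and `lon` numeric fields (those field names are illustrative, not from the thread):

```python
def bbox_filters(lat_min, lat_max, lon_min, lon_max):
    """Return fq clauses restricting results to a lat/lon bounding box."""
    return [
        f"lat:[{lat_min} TO {lat_max}]",
        f"lon:[{lon_min} TO {lon_max}]",
    ]

# Roughly the New York City area, purely as example coordinates.
filters = bbox_filters(40.0, 41.0, -74.5, -73.5)
```

Each string goes into its own fq parameter; the intersection of the two range filters is the box.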



Re: Return only the fields where there are results

2011-08-25 Thread Erick Erickson
"So, there are 2 documents with results, but Solr shows me fields without the
word alquileres, why?"

Because that's the way Solr is built <g>. It wouldn't be very useful for Solr to
only return the fields with matches. Consider indexing articles with a title and
contents. If you search matched on contents but not the title, would you really
want the returned document from Solr to NOT have the title?

You can control what fields are returned for the result set by specifying the
fl parameter, either in the defaults for your config or on the URL if you only
want specific fields back.

And a general comment. It is highly unusual to have all those string
fields.  String
fields are unanalyzed, meaning the entire field is a single term, no
matter how many
words are in it. This is rarely useful for anything except, say, part
numbers or SKUs
or IDs. For general text that you want to search on, this is almost
never correct. You
might want to spend some time on the admin/analysis page to get a better sense
of this...

Best
Erick

On Wed, Aug 24, 2011 at 9:35 AM, _snake_ lucas.mig...@gmail.com wrote:
 Hi,

 I am using Apache Solr 3.2 and I want to know if it's possible to return
 only the fields that have the term that I am searching.

 In the Solr Admin page, if I put in the word "alquileres" and then click the
 Search button, the following response is shown:
 <response>
 <lst name="responseHeader">
   <int name="status">0</int>
   <int name="QTime">438</int>
 </lst>
 <result name="response" numFound="2" start="0">
 <doc>
   <arr name="Accion">
     <str/><str/><str/><str/><str/><str/><str/><str/><str/>
   </arr>
   <arr name="Acciones">
     <str/><str/><str/><str/><str/><str/>
   </arr>
   <arr name="Actividad">
     <str/><str/><str/><str/><str/><str/><str/><str/>
   </arr>
   <arr name="Actividades">
     <str/><str/><str/>
   </arr>
   <arr name="Beneficiarios">
     <str/><str/><str/><str/>
   </arr>
   <arr name="Condicion">
     <str/><str/>
   </arr>
   <arr name="Creacion">
     <str/>
   </arr>
   <arr name="Cuantia">
     <str/><str/><str/>
   </arr>
   <arr name="Excepcion">
   ...
 
 The HTTP request is:
 ...select/?q=alquileres&version=2.2&start=0&rows=10&indent=on
 So, there are 2 documents with results, but Solr shows me fields without the
 word "alquileres", why?
 My schema.xml file is:
 <?xml version="1.0" ?>
 <schema name="example core zero" version="1.1">
  <types>
   <fieldtype name="string" class="solr.TextField"
 positionIncrementGap="100">
    <analyzer type="index">
      <filter class="solr.StopFilterFactory" ignoreCase="true"
 words="spanish_stop2.txt" enablePositionIncrements="true"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
 generateWordParts="1" generateNumberParts="1" catenateWords="1"
 catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <filter class="solr.StopFilterFactory"
        ignoreCase="true"
        words="spanish_stop2.txt"
        enablePositionIncrements="true"
      />
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory"
 generateWordParts="1" generateNumberParts="1" catenateWords="0"
 catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
   </fieldtype>

   <fieldtype name="string2" class="solr.TextField"
 positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
   </fieldtype>
  </types>

  <fields>

  <field name="RutaFichero" type="string2" indexed="true"
 stored="true" multiValued="false" required="true"/>

  <field name="TitlePV" type="string" indexed="true" stored="true"
 multiValued="true" termVectors="true" termPositions="true"
 termOffsets="true"/>

  <field name="TextPV" type="string" indexed="true" stored="true"
 multiValued="true" termVectors="true" termPositions="true"
 termOffsets="true"/>

  <field name="TitlePart" type="string" indexed="true" stored="true"
 multiValued="true" termVectors="true" termPositions="true"
 termOffsets="true"/>

  <field name="TextPart" type="string" indexed="true" stored="true"
 multiValued="true" termVectors="true" termPositions="true"
 termOffsets="true"/>

  <field name="TextSubPart" type="string" indexed="true" stored="true"
 multiValued="true" termVectors="true" termPositions="true"
 termOffsets="true"/>

  <field name="Accion" type="string" indexed="true" stored="true"
 multiValued="true"/>

  <field name="Actividad" type="string" indexed="true" stored="true"
 multiValued="true"/>

  <field name="Objeto" type="string" indexed="true" stored="true"
 multiValued="true"/>

  <field name="Objetivo" type="string" indexed="true" stored="true"
 multiValued="true"/>

  <field name="LineasDe" type="string" indexed="true" stored="true"
 multiValued="true"/>

  <field name="Linea" type="string" indexed="true" stored="true"
 multiValued="true"/>

  <field

Re: Solr indexing process: keep a persistent Mysql connection throu all the indexing process

2011-08-25 Thread samuele.mattiuzzo
Since I'm fairly new to Solr, can you please give some guidelines or provide
an example I can look at for starters?

I already thought about a singleton implementation, but I'm not sure where I
have to put it or how I should start coding it.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-indexing-process-keep-a-persistent-Mysql-connection-throu-all-the-indexing-process-tp3278608p3283901.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Where the heck do you put maxAnalyzedChars?

2011-08-25 Thread Daniel Skiles
Thanks.  That seemed to do it.  I was thrown by the section of documentation
that said "This parameter makes sense for Highlighter only" and tried to put
it in the various highlighter elements.

On Wed, Aug 24, 2011 at 6:52 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:

 (11/08/25 5:29), Daniel Skiles wrote:

 I have a very large field in my index that I need to highlight.  Where in
 the config file do I set the maxAnalyzedChars in order to make this work?
 Has anyone successfully done this?


 Placing it in your requestHandler should work. For example:

  <requestHandler name="search" class="solr.SearchHandler" default="true">
   <lst name="defaults">
     <int name="hl.maxAnalyzedChars">1000</int>
   </lst>
  </requestHandler>

 koji
 --
 Check out Query Log Visualizer for Apache Solr
  http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
 http://www.rondhuit.com/en/



Re: not equals query in solr

2011-08-25 Thread simon
http://wiki.apache.org/solr/SolrQuerySyntax has answers for you.

-Simon

On Thu, Aug 25, 2011 at 1:04 AM, Ranveer Kumar ranveer.s...@gmail.comwrote:

 any help...

 On Wed, Aug 24, 2011 at 12:58 PM, Ranveer Kumar ranveer.s...@gmail.com
 wrote:

  Hi,
 
  is it right way to do :
  q=(state:[* TO *] AND city:[* TO *])
 
  regards
  Ranveer
 
 
  On Wed, Aug 24, 2011 at 12:54 PM, Ranveer Kumar ranveer.s...@gmail.com
 wrote:
 
  Hi All,
 
   How do I do a negative query in Solr? Following are the criteria:
   I have state and city fields where I want to filter only those state and
   city values which are not blank, something like: state NOT "" AND city NOT "".
   I tried -state:"" but it's not working.
 
  Or suggest  me to do this in better way..
 
  regards
  Ranveer
 
 
 
 
 



Re: csv responsewriter and numfound

2011-08-25 Thread Jon Hoffman
I've added a case with patch here:
https://issues.apache.org/jira/browse/SOLR-2731

On Wed, Aug 24, 2011 at 11:59 AM, Jon Hoffman j...@foursquare.com wrote:

 I took a look at the source and agree that it would be a bit hairy to
 bubble up header settings from the response writers.

 Alternatively, and I'll admit that this is a somewhat hacky proposal, an
 optional parameter csv.numfound=true could be added to the request which
 would cause the first line of the response to be the numfound.  It would
 have no impact on existing behavior, and those who are interested in that
 value can simply read off the first line before sending to their usual csv
 parser.

 It's a trivial change to the code and I can create a JIRA ticket and submit
 the patch.

 This is my first interaction with this forum, so let me know if the dev
 list is a more appropriate place to propose changes.

 - Jon


 On Wed, Aug 24, 2011 at 10:47 AM, Erik Hatcher erik.hatc...@gmail.comwrote:

 Good idea.  However response writers can't control HTTP response headers
 currently... Only the content type returned.

Erik

 On Aug 24, 2011, at 8:52, Jon Hoffman j...@foursquare.com wrote:

  What about the HTTP response header?
 
 
  Great question.  But how would that get returned in the response?
 
  It is a drag that the header is lost when results are written in CSV,
 but
  there really isn't an obvious spot for that information to be returned.
 
  I guess a comment would be one option.
 
  -Yonik
  http://www.lucidimagination.com
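For reference, this is how a client would consume the proposed (and at this point hypothetical) `csv.numfound=true` response: peel off the first line as the count, then hand the remainder to an ordinary CSV parser.

```python
import csv
import io

# Simulated response body: first line is numFound, the rest is normal CSV.
raw = "2\nid,title\n1,First doc\n2,Second doc\n"

# Split off only the first line; everything after it is untouched CSV.
first_line, rest = raw.split("\n", 1)
num_found = int(first_line)
rows = list(csv.DictReader(io.StringIO(rest)))
```

Existing clients that don't send the parameter see no change, which is the appeal of the hack.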
 
 
 





Paging over mutlivalued field results?

2011-08-25 Thread Darren Govoni

Hi,
  Is it possible to construct a query in Solr where the paged results are
matching multivalued fields and not documents?

thanks,
Darren


Highlight on alternateField

2011-08-25 Thread Val Minyaylo

Hi there,
I am trying to utilize the highlighting alternateField and can't get
highlights on the results from the targeted fields. Is this expected
behavior, or am I misunderstanding alternateField?



schema.xml:
<field name="description" type="text" indexed="true" stored="false"
multiValued="true" termVectors="true" termPositions="true"
termOffsets="true"/>
<field name="description_highlighting" type="string" indexed="true"
stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
<copyField source="description" dest="description_highlighting"
maxChars="2500"/>


solrconfig.xml:
<str name="f.description.hl.alternateField">description_highlighting</str>
<str name="f.description.hl.alternateFieldLen">100</str>


RE: Text Analysis and copyField

2011-08-25 Thread Herman Kiefus
It had crossed my mind, but for now we have a 'DictionarySource' field whose
type utilizes the KeepWordFilterFactory with a text file containing all
correctly spelled words (thanks to Scrabble), location/last/first names
(courtesy of the US Census Bureau) and a few other additions (month and day
names).  A file this large does not seem to have a material impact on indexing.

What we're seeing now (we also have a field 'TermsMisspelled' that utilizes the 
same text file with StopFilterFactory) is almost pure misspellings and some 
contractions (can't, won't, don't, etc.).

Thank you everyone for your help here, this is a truly fine community.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, August 24, 2011 1:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Text Analysis and copyField

Have you considered having two dictionaries and using ajax to query them both 
and intermingling the results in your suggestions? It'd be some work, but I 
think it might accomplish what you want.

Best
Erick

On Tue, Aug 23, 2011 at 1:48 PM, Herman Kiefus herm...@angieslist.com wrote:
 To close, I found this article from Hoss: 
 http://lucene.472066.n3.nabble.com/CopyField-into-another-CopyField-td
 3122408.html

 Since I cannot use one copyField directive to copy from another copyField's 
 dest[ination], I cannot achieve what I desire: some terms that are subject to 
 KeepWordFilterFactory and some that are not.

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Monday, August 22, 2011 1:16 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Text Analysis and copyField

 I suspect that the things going into TermsDictionary are from fields other 
 than CorrectlySpelledTerms.

 In other words I don't think that anything is getting into TermsDictionary 
 from CorrectlySpelledTerms...

 Be careful to remove the index between schema changes, just to be sure that 
 you're not seeing old data.

 Best
 Erick

 On Mon, Aug 22, 2011 at 11:41 AM, Herman Kiefus herm...@angieslist.com 
 wrote:
 That's what I thought, but my experiments show differently.  In actuality:

 I have a number of fields that are of type text (the default as it is 
 packaged).

 I have a type 'textCorrectlySpelled' that utilizes KeepWordFilterFactory in 
 index-time analysis, using a file of terms which are known to be correctly 
 spelled.

 I have a type 'textDictionary' that has no index-time analysis.

 I have the fields:
 <field name="CorrectlySpelledTerms" type="textCorrectlySpelled"
 indexed="false" stored="false" multiValued="true"/>
 <field name="TermsDictionary" type="textDictionary" indexed="true"
 stored="false" multiValued="true"/>

 I want 'TermsDictionary' to contain only those terms from some fields that 
 are correctly spelled plus those terms from a couple other fields 
 (CompanyName and ContactName) as is.  I use several copyField directives as 
 follows:

 <copyField source="Field1" dest="CorrectlySpelledTerms"/>
 <copyField source="Field2" dest="CorrectlySpelledTerms"/>
 <copyField source="Field3" dest="CorrectlySpelledTerms"/>

 <copyField source="Name" dest="TermsDictionary"/>
 <copyField source="Contact" dest="TermsDictionary"/>
 <copyField source="CorrectlySpelledTerms" dest="TermsDictionary"/>

 If I query 'Field1' for a term that I know is misspelled (electical) it 
 yields results.
 If I query 'TermsDictionary' for the same term it yields no results.

 It would seem by these results that 'TermsDictionary' only contains those 
 terms with misspellings stripped as a results of the text analysis on the 
 field 'CorrectlySpelledTerms'.

 Asked another way, I think you can see what I'm getting at: a source for the 
 spellchecker that only contains correct spelled terms plus proper names; 
 should I have gone about this in a different way?

 -Original Message-
 From: Stephen Duncan Jr [mailto:stephen.dun...@gmail.com]
 Sent: Monday, August 22, 2011 9:30 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Text Analysis and copyField

 On Mon, Aug 22, 2011 at 9:25 AM, Herman Kiefus herm...@angieslist.com 
 wrote:
 Is my thinking correct?

 I have a field 'F1' of type 'T1' whose index time analysis employs the 
 StopFilterFactory.

 I also have a field 'F2' of type 'T2' whose index time analysis does NOT 
 employ the StopFilterFactory.

  There is a copyField directive <copyField source="F1" dest="F2"/>

 F2 will not contain any stop words because they were filtered out as F1 was 
 populated.


 No, F2 will contain stop words.  Copy fields does not process input through 
 a chain, it sends the original content to each field and therefore analysis 
 is totally independent.

 --
 Stephen Duncan Jr
 www.stephenduncanjr.com
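Since copyField always feeds the *raw* source text to the destination (analyzed output is never copied, and copyField chains don't compose), a dictionary field like TermsDictionary can only be pre-filtered outside Solr. A client-side sketch of the keep-word idea, using a tiny illustrative word list in place of the real scrabble/census file:

```python
# Illustrative stand-in for the full keep-words file.
keep_words = {"electrical", "plumbing", "company", "smith"}

def correctly_spelled(terms):
    """Keep only terms found in the known-good word list."""
    return [t for t in terms if t.lower() in keep_words]

# "electical" (misspelled) is dropped; the rest pass through unchanged.
dictionary_terms = correctly_spelled(["Electrical", "electical", "Smith"])
```

The filtered list is then what gets sent to the TermsDictionary field at index time, so no KeepWordFilterFactory chaining is needed.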




RE: Best way to anchor solr searches?

2011-08-25 Thread arian487
Thanks for the replies.  I did look at caching, but our commit time is 90
seconds.  It's definitely possible for someone to make a search, change the
page, and have wonky results.  How about getting it to autowarm the x most
recent searches in the queryResultCache and that can hopefully reduce the
issues?  Though even that can result in issues with the search being out of
date.  

Application cache per user would for sure solve such issues but I'd like to
avoid this if possible.  Definitely an interesting problem...

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-way-to-anchor-solr-searches-tp3282576p3284674.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field type change / copy field

2011-08-25 Thread Erick Erickson
What version of Solr are you using? Because 3.2 (and I believe 3.1) and later
have faceting and range on numeric values, so there would be no need to use
two fields.

And you could then avoid the format thing entirely.

Best
Erick

On Wed, Aug 24, 2011 at 5:53 AM, Oliver Schihin
oliver.schi...@unibas.ch wrote:
 Hello list

  My documents come with a field holding a date, always a year:
  <year>2008</year>

  In the schema, this content goes into a field "year" as an integer, and
  it will be searchable.

  Through a copyField instruction I move the year to a "facet_year" field,
  as you might guess, to use it for faceting and make range queries possible. Its field
  type is of the class 'solr.TrieDateField', which requires a canonical date
  representation. Is there a way in Solr to expand the simple year to
  <facet_year>2008-01-01T00:00:00Z</facet_year>? Or do I have to solve the
  problem in preprocessing, before posting?

 Thanks
 Oliver
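If the conversion does end up in preprocessing, it is a one-line transform from the bare year to the canonical instant a TrieDateField expects:

```python
def year_to_solr_date(year):
    """Expand a bare year to the canonical Solr/ISO-8601 instant."""
    return f"{int(year):04d}-01-01T00:00:00Z"

facet_year = year_to_solr_date("2008")
```

Run over each document before posting, this fills facet_year while leaving the integer year field untouched for ordinary searching.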



Re: Query vs Filter Query Usage

2011-08-25 Thread Erick Erickson
The pitfalls of filter queries is also their strength. The results will be
cached and re-used if possible. This will take some memory,
of course. Depending upon how big your index is, this could
be quite a lot.

Yet another time/space tradeoff. But yeah, use filter queries
until you have OOMs, then get more memory <g>...

Best
Erick

On Wed, Aug 24, 2011 at 8:07 PM, Joshua Harness jkharnes...@gmail.com wrote:
 Shawn -

     Thanks for your reply. Given that my application is mainly used as
 faceted search, would the following types of queries make sense or are there
 other pitfalls to consider?

 q=*:*&fq=someField:someValue&fq=anotherField:anotherValue

 Thanks!

 Josh

 On Wed, Aug 24, 2011 at 4:48 PM, Shawn Heisey s...@elyograg.org wrote:

 On 8/24/2011 2:02 PM, Joshua Harness wrote:

      I've done some basic query performance testing on my SOLR instance,
 which allows users to search via a faceted search interface. As such,
 document relevancy is less important to me since I am performing exact
 match
 searching. Comparing using filter queries with a plain query has yielded
 remarkable performance.  However, I'm suspicious of statements like
 'always
 use filter queries since they are so much faster'. In my experience,
 things
 are never so straightforward. Can anybody provide any further guidance?
 What
 are the pitfalls of relying heavily on filter queries? When would one want
 to use plain vanilla SOLR queries as opposed to filter queries?


 Completely separate from any performance consideration, the key to their
 usage lies in their name:  They are filters.  They are particularly useful
 in a faceted situation, because you can have more than one of them, and the
 overall result is the intersection (AND) of them all.

 When someone tells the interface to restrict their search by a facet, you
 can simply add a filter query with the field:value relating to that facet
 and reissue the query.  If they decide to remove that restriction, you just
 have to remove the filter query.  You don't have to try and combine the
 various pieces in the query, which means you'll have much less hassle with
 parentheses.

 If you need a union (OR) operation with your filters, you'll have to use
 more complex construction within a single filter query, or not use them at
 all.

 Thanks,
 Shawn
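That query shape is easy to assemble programmatically: each active facet restriction is one more `fq` pair, and repeated `fq` parameters are just repeated keys in the query string (the field names here are the ones from the example above):

```python
from urllib.parse import urlencode

# One (key, value) pair per parameter; fq may appear any number of times.
params = [
    ("q", "*:*"),
    ("fq", "someField:someValue"),
    ("fq", "anotherField:anotherValue"),
    ("wt", "json"),
]
query_string = urlencode(params)
url = "http://localhost:8983/solr/select?" + query_string
```

Removing a facet restriction is then simply dropping its tuple from the list and re-issuing the query, which is the hassle-free behavior Shawn describes.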





Re: Newbie Question, can I store structured sub elements?

2011-08-25 Thread Erick Erickson
nope, it's not easy. Solr docs are flat, flat, flat, with the tiny
exception that multiValued fields are returned as lists.

However, you can count on multi-valued fields being returned
in the order they were added, so it might work out for you to
treat these as parallel arrays in Solr documents.

Best
Erick

On Thu, Aug 25, 2011 at 3:10 AM, Zac Tolley z...@thetolleys.com wrote:
 I know I can have multi value on them but that doesn't let me see that
 a showing instance happens at a particular time on a particular
 channel, just that it shows on a range of channels at a range of times

 Starting to think I will have to either store a formatted string that
 combines them or keep it flat just for indexing, retrieve ids and use
 them to get data out of the RDBMS


 On 24 Aug 2011, at 23:09, dan whelan d...@adicio.com wrote:

 You could change starttime and channelname to multiValued=true and use these 
 fields to store all the values for those fields.

 showing.movie_id and showing.id probably isn't needed in a solr record.



 On 8/24/11 7:53 AM, Zac Tolley wrote:
  I have a scenario in which I have a film and showings; each film has
 multiple showings at set times on set channels, so I have:

 Movie
 -
 id
 title
 description
 duration


 Showing
 -
 id
 movie_id
 starttime
 channelname



  I want to know: can I store this in Solr so that I keep this structure?

 I did try to do an initial import with the DIH using this config:

 <entity name="movie" query="SELECT * from movies">
    <field column="ID" name="id"/>
    <field column="TITLE" name="title"/>
    <field column="DESCRIPTION" name="description"/>

    <entity name="showing" query="SELECT * from showings WHERE movie_id = ${movie.id}">
      <field column="ID" name="id"/>
      <field column="STARTTIME" name="starttime"/>
      <field column="CHANNELNAME" name="channelname"/>
    </entity>
 </entity>

 I was hoping, for each movie to get a sub entity with the showing like:

 <doc>
    <str name="title">.</str>
    <showing>
      <str name="channelname">.



 but instead all the fields are flattened down to the top level.

 I know this must be easy, what am I missing... ?
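Erick's parallel-array suggestion works because Solr preserves insertion order within multiValued fields, so on the client the two lists can simply be zipped back together. A sketch with an illustrative document (values are made up):

```python
# A movie doc as the client might receive it, with showings flattened
# into two order-preserving multiValued fields.
doc = {
    "title": "Example Movie",
    "starttime": ["2011-08-25T20:00:00Z", "2011-08-26T18:30:00Z"],
    "channelname": ["Channel A", "Channel B"],
}

# Positions line up, so index i of each list describes one showing.
showings = list(zip(doc["starttime"], doc["channelname"]))
```

This keeps the (time, channel) pairing recoverable even though the index itself stays flat.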





RE: Solr in a windows shared hosting environment

2011-08-25 Thread Devora
Thank you!

Since it's shared hosting, how do I install java?

-Original Message-
From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] 
Sent: Thursday, August 25, 2011 4:34 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr in a windows shared hosting environment

Yes, but since Solr is written in Java to run in a JEE container, you would
host Solr in a web application server, either Jetty (which comes packaged),
or something else (say, Tomcat or WebSphere or something like that).

As a result, you aren't going to find anything that says how to run Solr
under IIS because it doesn't run under IIS.

It doesn't need IIS, though it can certainly coexist alongside IIS.  If you
want the requests to go thru IIS you might need a plug-in in IIS to handle
that (IBM's WebSphere has such a plugin).  If you don't need the requests to
go thru IIS, then that isn't an issue.

Hope that helps.

JRJ

-Original Message-
From: Devora [mailto:devora...@gmail.com] 
Sent: Thursday, August 25, 2011 5:15 AM
To: solr-user@lucene.apache.org
Subject: Solr in a windows shared hosting environment

Hi,

 

Is it possible to install Solr in  a windows (IIS 7 or IIS 7.5)  shared
hosting environment?

If yes, where can I find instructions how to do that?

 

Thank you!



Re: Preserve XML hierarchy

2011-08-25 Thread Erick Erickson
Jars aren't where it's at. You apply patches to *source* code,
then compile.

Here's a good place to start understanding this process:

http://wiki.apache.org/solr/HowToContribute

See getting the code and working with patches

I *strongly* advise you to get the code and compile
it and run it first before applying the patch, just to
eliminate an extra variable...

Best
Erick

On Thu, Aug 25, 2011 at 3:56 AM, _snake_ lucas.mig...@gmail.com wrote:
 Hi Michael, thanks for your help!

 I am using Apache Solr 3.2 on Windows.

 I am trying to apply the 2 patches (
 https://issues.apache.org/jira/browse/SOLR-2597?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#issue-tabs
 XMLCharFilter Patch ), but I have no idea how to do that.

 What do I need to open the Solr project? Or which .jar file do I
 need to open?
 I saw that there are 2 files (woodstox and stax2); what do I have to do with
 those files?

 Thanks!

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Preserve-XML-hierarchy-tp3166690p3283275.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Newbie Question, can I store structured sub elements?

2011-08-25 Thread Zac Tolley
I have come to that conclusion, so I had to choose between multiple fields with
multiple values or a field with delimited text; I've gone for the former.

On Thu, Aug 25, 2011 at 7:58 PM, Erick Erickson erickerick...@gmail.comwrote:

 nope, it's not easy. Solr docs are flat, flat, flat with the tiny
  exception that multiValued fields are returned as lists.

 However, you can count on multi-valued fields being returned
 in the order they were added, so it might work out for you to
 treat these as parallel arrays in Solr documents.

 Best
 Erick

 On Thu, Aug 25, 2011 at 3:10 AM, Zac Tolley z...@thetolleys.com wrote:
  I know I can have multi value on them but that doesn't let me see that
  a showing instance happens at a particular time on a particular
  channel, just that it shows on a range of channels at a range of times
 
  Starting to think I will have to either store a formatted string that
  combines them or keep it flat just for indexing, retrieve ids and use
  them to get data out of the RDBMS
 
 
  On 24 Aug 2011, at 23:09, dan whelan d...@adicio.com wrote:
 
  You could change starttime and channelname to multiValued=true and use
 these fields to store all the values for those fields.
 
  showing.movie_id and showing.id probably isn't needed in a solr record.
 
 
 
  On 8/24/11 7:53 AM, Zac Tolley wrote:
   I have a scenario in which I have a film and showings, each film
 has
  multiple showings at set times on set channels, so I have:
 
  Movie
  -
  id
  title
  description
  duration
 
 
  Showing
  -
  id
  movie_id
  starttime
  channelname
 
 
 
   I want to know: can I store this in Solr so that I keep this structure?
 
  I did try to do an initial import with the DIH using this config:
 
  <entity name="movie" query="SELECT * from movies">
    <field column="ID" name="id"/>
    <field column="TITLE" name="title"/>
    <field column="DESCRIPTION" name="description"/>

    <entity name="showing" query="SELECT * from showings WHERE movie_id = ${movie.id}">
      <field column="ID" name="id"/>
      <field column="STARTTIME" name="starttime"/>
      <field column="CHANNELNAME" name="channelname"/>
    </entity>
  </entity>
 
  I was hoping, for each movie to get a sub entity with the showing like:
 
  <doc>
    <str name="title">...</str>
    <showing>
      <str name="channelname">...
 
 
 
  but instead all the fields are flattened down to the top level.
 
  I know this must be easy, what am I missing... ?
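Erick's parallel-arrays suggestion above can be sketched in plain Java. Everything here is illustrative: the Showing POJO and the fromParallelLists helper are made-up names; the only Solr behavior relied on is that multiValued fields come back in the order they were added.

```java
import java.util.ArrayList;
import java.util.List;

public class Showing {
    public final String starttime;
    public final String channelname;

    public Showing(String starttime, String channelname) {
        this.starttime = starttime;
        this.channelname = channelname;
    }

    /** Rebuild one object per showing from the two parallel multiValued fields. */
    public static List<Showing> fromParallelLists(List<String> starttimes,
                                                  List<String> channelnames) {
        if (starttimes.size() != channelnames.size()) {
            throw new IllegalArgumentException("parallel multiValued fields out of sync");
        }
        List<Showing> showings = new ArrayList<>();
        for (int i = 0; i < starttimes.size(); i++) {
            showings.add(new Showing(starttimes.get(i), channelnames.get(i)));
        }
        return showings;
    }
}
```

The fragile part is keeping the two lists the same length at index time, hence the explicit size check.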
 
 
 



Re: Newbie Question, can I store structured sub elements?

2011-08-25 Thread Paul Libbrecht
Delimited text is the baby form of lists.
Text can be made very very structured (think XML, ontologies...).
I think the crux is your search needs.

For example, with Lucene, I made a search for formulæ (including sub-terms) by 
converting the OpenMath-encoded terms into rows of tokens and querying with 
SpanQueries. Quite structured to my taste.

What you don't have is the freedom of joins which brings a very flexible query 
mechanism almost independent of the schema... but this often can be 
circumvented by the flat solr and lucene storage whose performance is really 
amazing.

paul


Le 25 août 2011 à 21:07, Zac Tolley a écrit :

 have come to that conclusion, so had to choose between multiple fields with
 multiple values or a field with delimited text; gone for the former
 
 On Thu, Aug 25, 2011 at 7:58 PM, Erick Erickson 
 erickerick...@gmail.comwrote:
 
 nope, it's not easy. Solr docs are flat, flat, flat with the tiny
  exception that multiValued fields are returned as lists.
 
 However, you can count on multi-valued fields being returned
 in the order they were added, so it might work out for you to
 treat these as parallel arrays in Solr documents.
 
 Best
 Erick
 
 On Thu, Aug 25, 2011 at 3:10 AM, Zac Tolley z...@thetolleys.com wrote:
 I know I can have multi value on them but that doesn't let me see that
 a showing instance happens at a particular time on a particular
 channel, just that it shows on a range of channels at a range of times
 
 Starting to think I will have to either store a formatted string that
 combines them or keep it flat just for indexing, retrieve ids and use
 them to get data out of the RDBMS
 
 
 On 24 Aug 2011, at 23:09, dan whelan d...@adicio.com wrote:
 
 You could change starttime and channelname to multiValued=true and use
 these fields to store all the values for those fields.
 
 showing.movie_id and showing.id probably isn't needed in a solr record.
 
 
 
 On 8/24/11 7:53 AM, Zac Tolley wrote:
  I have a scenario in which I have a film and showings, each film
 has
 multiple showings at set times on set channels, so I have:
 
 Movie
 -
 id
 title
 description
 duration
 
 
 Showing
 -
 id
 movie_id
 starttime
 channelname
 
 
 
  I want to know: can I store this in solr so that I keep this structure?
 
 I did try to do an initial import with the DIH using this config:
 
 <entity name="movie" query="SELECT * from movies">
   <field column="ID" name="id"/>
   <field column="TITLE" name="title"/>
   <field column="DESCRIPTION" name="description"/>

   <entity name="showing" query="SELECT * from showings WHERE movie_id = ${movie.id}">
     <field column="ID" name="id"/>
     <field column="STARTTIME" name="starttime"/>
     <field column="CHANNELNAME" name="channelname"/>
   </entity>
 </entity>
 
 I was hoping, for each movie to get a sub entity with the showing like:
 
 <doc>
   <str name="title">...</str>
   <showing>
     <str name="channelname">...
 
 
 
 but instead all the fields are flattened down to the top level.
 
 I know this must be easy, what am I missing... ?
 
 
 
 



Re: Solr in a windows shared hosting environment

2011-08-25 Thread simon
That's not a question we can answer in this group - you need to take it up
with your hosting provider - they may already have it available.

On Thu, Aug 25, 2011 at 2:59 PM, Devora devora...@gmail.com wrote:

 Thank you!

 Since it's shared hosting, how do I install java?

 -Original Message-
 From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov]
 Sent: Thursday, August 25, 2011 4:34 PM
 To: solr-user@lucene.apache.org
 Subject: RE: Solr in a windows shared hosting environment

 Yes, but since Solr is written in Java to run in a JEE container, you would
 host Solr in a web application server, either Jetty (which comes packaged),
 or something else (say, Tomcat or WebSphere or something like that).

 As a result, you aren't going to find anything that says how to run Solr
 under IIS because it doesn't run under IIS.

 It doesn't need IIS, though it can certainly coexist alongside IIS.  If you
 want the requests to go thru IIS you might need a plug-in in IIS to handle
 that (IBM's WebSphere has such a plugin).  If you don't need the requests
 to
 go thru IIS, then that isn't an issue.

 Hope that helps.

 JRJ

 -Original Message-
 From: Devora [mailto:devora...@gmail.com]
 Sent: Thursday, August 25, 2011 5:15 AM
 To: solr-user@lucene.apache.org
 Subject: Solr in a windows shared hosting environment

 Hi,



 Is it possible to install Solr in  a windows (IIS 7 or IIS 7.5)  shared
 hosting environment?

 If yes, where can I find instructions how to do that?



 Thank you!




Re: Newbie Question, can I store structured sub elements?

2011-08-25 Thread Zac Tolley
My search is very simple, mainly on titles, actors, show times and channels.
Having multiple lists of values is probably better for that, and as the
order is kept the same its relatively simple to map the response back onto
pojos for my presentation layer.

On Thu, Aug 25, 2011 at 8:18 PM, Paul Libbrecht p...@hoplahup.net wrote:

 Delimited text is the baby form of lists.
 Text can be made very very structured (think XML, ontologies...).
 I think the crux is your search needs.

 For example, with Lucene, I made a search for formulæ (including sub-terms)
 by converting the OpenMath-encoded terms into rows of tokens and querying
 with SpanQueries. Quite structured to my taste.

 What you don't have is the freedom of joins which brings a very flexible
 query mechanism almost independent of the schema... but this often can be
 circumvented by the flat solr and lucene storage whose performance is really
 amazing.

 paul


 Le 25 août 2011 à 21:07, Zac Tolley a écrit :

  have come to that conclusion, so had to choose between multiple fields with
  multiple values or a field with delimited text; gone for the former
 
  On Thu, Aug 25, 2011 at 7:58 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  nope, it's not easy. Solr docs are flat, flat, flat with the tiny
  exception that multiValued fields are returned as lists.
 
  However, you can count on multi-valued fields being returned
  in the order they were added, so it might work out for you to
  treat these as parallel arrays in Solr documents.
 
  Best
  Erick
 
  On Thu, Aug 25, 2011 at 3:10 AM, Zac Tolley z...@thetolleys.com wrote:
  I know I can have multi value on them but that doesn't let me see that
  a showing instance happens at a particular time on a particular
  channel, just that it shows on a range of channels at a range of times
 
  Starting to think I will have to either store a formatted string that
  combines them or keep it flat just for indexing, retrieve ids and use
  them to get data out of the RDBMS
 
 
  On 24 Aug 2011, at 23:09, dan whelan d...@adicio.com wrote:
 
  You could change starttime and channelname to multiValued=true and use
  these fields to store all the values for those fields.
 
  showing.movie_id and showing.id probably isn't needed in a solr
 record.
 
 
 
  On 8/24/11 7:53 AM, Zac Tolley wrote:
  I have a scenario in which I have a film and showings, each film
  has
  multiple showings at set times on set channels, so I have:
 
  Movie
  -
  id
  title
  description
  duration
 
 
  Showing
  -
  id
  movie_id
  starttime
  channelname
 
 
 
  I want to know: can I store this in solr so that I keep this structure?
 
  I did try to do an initial import with the DIH using this config:
 
  <entity name="movie" query="SELECT * from movies">
    <field column="ID" name="id"/>
    <field column="TITLE" name="title"/>
    <field column="DESCRIPTION" name="description"/>

    <entity name="showing" query="SELECT * from showings WHERE movie_id = ${movie.id}">
      <field column="ID" name="id"/>
      <field column="STARTTIME" name="starttime"/>
      <field column="CHANNELNAME" name="channelname"/>
    </entity>
  </entity>
 
  I was hoping, for each movie to get a sub entity with the showing
 like:
 
  <doc>
    <str name="title">...</str>
    <showing>
      <str name="channelname">...
 
 
 
  but instead all the fields are flattened down to the top level.
 
  I know this must be easy, what am I missing... ?
 
 
 
 




Re: Query vs Filter Query Usage

2011-08-25 Thread Joshua Harness
Erick -

Thanks for the insight. Does the filter cache just cache the internal
document ids of the result set (as opposed to the documents)? If
so, am I correct in the following math:

10,000,000 document index
Internal Document id is 32 bit unsigned int
Max Memory Used by a single cache slot in the filter cache = 32 bits x
10,000,000 docs = 320,000,000 bits or 38 MB

Of course, I realize there some additional overhead if we're dealing with
Integer objects as opposed to primitives -- and I'm way off if the internal
document id is implemented as a long.

Also, does SOLR fail gracefully when an OOM occurs (e.g. the cache fails but
the query still succeeds)?

Thanks!

Josh

On Thu, Aug 25, 2011 at 2:55 PM, Erick Erickson erickerick...@gmail.comwrote:

 The pitfalls of filter queries is also their strength. The results will be
 cached and re-used if possible. This will take some memory,
 of course. Depending upon how big your index is, this could
 be quite a lot.

 Yet another time/space tradeoff But yeah, use filter queries
 until you have OOMs, then get more memory G...

 Best
 Erick

 On Wed, Aug 24, 2011 at 8:07 PM, Joshua Harness jkharnes...@gmail.com
 wrote:
  Shawn -
 
   Thanks for your reply. Given that my application is mainly used for
   faceted search, would the following types of queries make sense, or are
 there
  other pitfalls to consider?
 
   q=*:*&fq=someField:someValue&fq=anotherField:anotherValue
 
  Thanks!
 
  Josh
 
  On Wed, Aug 24, 2011 at 4:48 PM, Shawn Heisey s...@elyograg.org wrote:
 
  On 8/24/2011 2:02 PM, Joshua Harness wrote:
 
   I've done some basic query performance testing on my SOLR
 instance,
  which allows users to search via a faceted search interface. As such,
  document relevancy is less important to me since I am performing exact
  match
  searching. Comparing using filter queries with a plain query has
 yielded
  remarkable performance.  However, I'm suspicious of statements like
  'always
  use filter queries since they are so much faster'. In my experience,
  things
  are never so straightforward. Can anybody provide any further guidance?
  What
  are the pitfalls of relying heavily on filter queries? When would one
 want
  to use plain vanilla SOLR queries as opposed to filter queries?
 
 
  Completely separate from any performance consideration, the key to their
  usage lies in their name:  They are filters.  They are particularly
 useful
  in a faceted situation, because you can have more than one of them, and
 the
  overall result is the intersection (AND) of them all.
 
  When someone tells the interface to restrict their search by a facet,
 you
  can simply add a filter query with the field:value relating to that
 facet
  and reissue the query.  If they decide to remove that restriction, you
 just
  have to remove the filter query.  You don't have to try and combine the
  various pieces in the query, which means you'll have much less hassle
 with
  parentheses.
 
  If you need a union (OR) operation with your filters, you'll have to use
  more complex construction within a single filter query, or not use them
 at
  all.
 
  Thanks,
  Shawn
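Shawn's facet-filter pattern above amounts to appending one fq per selected facet to an otherwise match-all query. A dependency-free sketch (the class and the field/value names are made up; a real client would typically go through SolrJ rather than building raw query strings):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class FacetQueryBuilder {
    /** q=*:* plus one fq per selected facet; removing a facet just drops its fq. */
    public static String build(Map<String, String> selectedFacets) {
        StringBuilder sb = new StringBuilder("q=").append(enc("*:*"));
        for (Map.Entry<String, String> e : selectedFacets.entrySet()) {
            sb.append("&fq=").append(enc(e.getKey() + ":" + e.getValue()));
        }
        return sb.toString();
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

Each fq stays an independent clause, so there is no parenthesis juggling when restrictions come and go.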
 
 
 



Re: Newbie Question, can I store structured sub elements?

2011-08-25 Thread Paul Libbrecht
Whether multi-valued or token-streams, the question is search, not 
(de)serialization: that's opaque to Solr which will take and give it to you as 
needed.

paul


Le 25 août 2011 à 21:24, Zac Tolley a écrit :

 My search is very simple, mainly on titles, actors, show times and channels.
 Having multiple lists of values is probably better for that, and as the
 order is kept the same its relatively simple to map the response back onto
 pojos for my presentation layer.



Solr Implementations

2011-08-25 Thread zarni aung
First, I would like to apologize if this is a repeat question but can't seem
to get the right answer anywhere.

   - What happens to pending documents when the server dies abruptly?  I
   understand that when the server shuts down gracefully, it will commit the
   pending documents and close the IndexWriter.  For the case where the server
   just crashes,  I am assuming that the pending documents are lost but would
   it also corrupt the index files?  If so, when the server comes back online
   what is the state?  I would think that a full re-indexing is in order.


   - What are the dangers of having n-number of ReadOnly Solr instances
   pointing to the same data directory?  (Shared by a SAN)?  Will there be
   issues with locking?  This is a scenario with replication.  The Read-Only
   instances are pointing to the same data directory on a SAN.

Thank you very much.

Z


RE: Solr in a windows shared hosting environment

2011-08-25 Thread Jaeger, Jay - DOT
You visit the Sun (oops, I mean Oracle -- old habits die hard) web site, 
download it and install it, or, being shared, you ask your provider to do it 
for you.  They might decline, of course, in which case you host another web 
 server using a hosting provider who does support Java (or, at least another
machine, if they don't want to have a Java app. container alongside IIS.)


http://java.com/en/download/index.jsp


-Original Message-
From: Devora [mailto:devora...@gmail.com] 
Sent: Thursday, August 25, 2011 2:00 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr in a windows shared hosting environment

Thank you!

Since it's shared hosting, how do I install java?

-Original Message-
From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] 
Sent: Thursday, August 25, 2011 4:34 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr in a windows shared hosting environment

Yes, but since Solr is written in Java to run in a JEE container, you would
host Solr in a web application server, either Jetty (which comes packaged),
or something else (say, Tomcat or WebSphere or something like that).

As a result, you aren't going to find anything that says how to run Solr
under IIS because it doesn't run under IIS.

It doesn't need IIS, though it can certainly coexist alongside IIS.  If you
want the requests to go thru IIS you might need a plug-in in IIS to handle
that (IBM's WebSphere has such a plugin).  If you don't need the requests to
go thru IIS, then that isn't an issue.

Hope that helps.

JRJ

-Original Message-
From: Devora [mailto:devora...@gmail.com] 
Sent: Thursday, August 25, 2011 5:15 AM
To: solr-user@lucene.apache.org
Subject: Solr in a windows shared hosting environment

Hi,

 

Is it possible to install Solr in  a windows (IIS 7 or IIS 7.5)  shared
hosting environment?

If yes, where can I find instructions how to do that?

 

Thank you!



RE: Query vs Filter Query Usage

2011-08-25 Thread Michael Ryan
 10,000,000 document index
 Internal Document id is 32 bit unsigned int
 Max Memory Used by a single cache slot in the filter cache = 32 bits x
 10,000,000 docs = 320,000,000 bits or 38 MB

I think it depends on where exactly the result set was generated. I believe the 
result set will usually be represented by a BitDocSet, which requires 1 bit per 
doc in your index (result set size doesn't matter), so in your case it would be 
about 1.2MB.

-Michael


Re: Query vs Filter Query Usage

2011-08-25 Thread Yonik Seeley
On Thu, Aug 25, 2011 at 5:19 PM, Michael Ryan mr...@moreover.com wrote:
 10,000,000 document index
 Internal Document id is 32 bit unsigned int
 Max Memory Used by a single cache slot in the filter cache = 32 bits x
 10,000,000 docs = 320,000,000 bits or 38 MB

 I think it depends on where exactly the result set was generated. I believe 
 the result set will usually be represented by a BitDocSet, which requires 1 
 bit per doc in your index (result set size doesn't matter), so in your case 
 it would be about 1.2MB.

Right - and Solr switches between implementations depending on set
size... so if the number of documents in the set were 100, then it
would only take up 400 bytes.

-Yonik
http://www.lucidimagination.com
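The arithmetic in this exchange can be written down directly. This is only a back-of-envelope model of the two set representations discussed above (it ignores object overhead and the cache's own bookkeeping), with the method names invented for the sketch:

```java
public class FilterCacheMath {
    /** A bit-set representation needs one bit per document in the whole index. */
    public static long bitSetBytes(long maxDoc) {
        return maxDoc / 8;
    }

    /** A sorted-int representation needs one 32-bit int per document in the set. */
    public static long sortedIntSetBytes(long setSize) {
        return setSize * 4;
    }

    /** Model of Solr switching representations: take whichever is smaller. */
    public static long cacheEntryBytes(long maxDoc, long setSize) {
        return Math.min(bitSetBytes(maxDoc), sortedIntSetBytes(setSize));
    }
}
```

For the 10M-doc index in the thread this gives about 1.25 MB for a bit set, and 400 bytes for a 100-doc set, matching Michael's and Yonik's numbers.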


solr UIMA exception

2011-08-25 Thread chanhangfai
Hi,

I have followed this solr UIMA config, using AlchemyAPIAnnotator and
OpenCalaisAnnotator. 
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/uima/README.txt
http://wiki.apache.org/solr/SolrUIMA


so, I got the AlchemyAPI key and OpenCalais key.

and I can successfully hit
http://access.alchemyapi.com/calls/url/URLGetRankedNamedEntities?apikey=my_alchemy_key&url=www.cnn.com


but somehow, I got the following exception when I run
java -jar post.jar entity.xml

-
this is entity.xml

<add>
<doc>
<field name="id">12345</field>
<field name="text">Senator Dick Durbin (D-IL)  Chicago , March 3,2007.</field>
<field name="title">Entity Extraction</field>
</doc>
</add>
-

any suggestion, really appreciated..please..



Aug 25, 2011 5:11:50 PM WhitespaceTokenizer process
INFO: Whitespace tokenizer starts processing
Aug 25, 2011 5:11:50 PM WhitespaceTokenizer process
INFO: Whitespace tokenizer finished processing
Aug 25, 2011 5:11:54 PM
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl
callAnalysisComponentProcess(405)
SEVERE: Exception occurred
org.apache.uima.analysis_engine.AnalysisEngineProcessException
at
org.apache.uima.alchemy.annotator.AbstractAlchemyAnnotator.process(AbstractAlchemyAnnotator.java:138)
at
org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
at
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
at
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.init(ASB_impl.java:409)
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
at
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
at
org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
at
org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:280)
at
org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processText(UIMAUpdateRequestProcessor.java:151)
at
org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:77)
at
org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by:
org.apache.uima.alchemy.digester.exception.ResultDigestingException:
org.apache.uima.alchemy.annotator.exception.AlchemyCallFailedException:
ERROR
at
org.apache.uima.alchemy.annotator.AbstractAlchemyAnnotator.process(AbstractAlchemyAnnotator.java:133)
... 35 more
Caused by:

Re: Query vs Filter Query Usage

2011-08-25 Thread Lance Norskog
The point of filter queries is that they are applied very early in the
searching algorithm, and thus cut the amount of work later on. Some
complex queries take a lot of time and so this pre-trimming helps a
lot.

On Thu, Aug 25, 2011 at 2:37 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Thu, Aug 25, 2011 at 5:19 PM, Michael Ryan mr...@moreover.com wrote:
 10,000,000 document index
 Internal Document id is 32 bit unsigned int
 Max Memory Used by a single cache slot in the filter cache = 32 bits x
 10,000,000 docs = 320,000,000 bits or 38 MB

 I think it depends on where exactly the result set was generated. I believe 
 the result set will usually be represented by a BitDocSet, which requires 1 
 bit per doc in your index (result set size doesn't matter), so in your case 
 it would be about 1.2MB.

 Right - and Solr switches between the implementation depending on set
 size... so if the number of documents in the set were 100, then it
 would only take up 400 bytes.

 -Yonik
 http://www.lucidimagination.com




-- 
Lance Norskog
goks...@gmail.com


Is it possible to do a partial update of a doc's fields in the index?

2011-08-25 Thread Goran Pocina
New to Solr and Lucene.   We're indexing text, pdf, html docs located on
local Unix file systems, and need the ability to search for file owner,
group, and other Linux file metadata, in addition to the file contents.  It
would be great if we could use nutch to index everything, and then crawl
through the file system again with a 10 line shell script that passed the
missing metadata to solr, and updated the existing docs.

But <add><doc> deletes all the old fields even if they're not present in the
new document.

If partial updates aren't possible,  what would be the best way to
accomplish what we need?  Do we want to to modify the source code for each
of the different doc format parsers to add support for this metadata?

Thanks,


Re: Solr indexing process: keep a persistent Mysql connection throu all the indexing process

2011-08-25 Thread Erick Erickson
Put it anywhere you want G. Here's a good place
to start: http://www.javapractices.com/topic/TopicAction.do?Id=46
where the distributePresents method is the one you have
that returns the connection.

Here's a sample class that doesn't do much...
public enum MyEnum {
  INSTANCE;
  private String _tester = "";

  public void doStuff(String stuff) {
    if (_tester.length() == 0) {
      _tester = stuff;
      System.out.println("In initialization");
    }
    System.out.println("Tester is " + _tester + " Stuff is " + stuff);
  }
}

you can imagine doStuff as getConnection with logic to initialize the connection
where the string _tester is defined.

Now you can call it from anywhere the enum is available like this:

public class MyMain {
  public static void main(String[] args) {
    MyEnum.INSTANCE.doStuff("first time");
    MyEnum.INSTANCE.doStuff("second time");
    MyEnum.INSTANCE.doStuff("third time");
  }
}

Best
Erick

On Thu, Aug 25, 2011 at 9:43 AM, samuele.mattiuzzo samum...@gmail.com wrote:
 since i'm barely new to solr, can you please give some guidelines or provide
 an example i can look at for starters?

  i already thought about a singleton implementation, but i'm not sure where i
 have to put it and how should i start coding it

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-indexing-process-keep-a-persistent-Mysql-connection-throu-all-the-indexing-process-tp3278608p3283901.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Paging over mutlivalued field results?

2011-08-25 Thread Erick Erickson
Hmm, I don't quite understand what you want. An example
or two would help.

Best
Erick

On Thu, Aug 25, 2011 at 12:11 PM, Darren Govoni dar...@ontrenet.com wrote:
 Hi,
  Is it possible to construct a query in Solr where the paged results are
 matching multivalued fields and not documents?

 thanks,
 Darren



Re: Highlight on alternateField

2011-08-25 Thread Koji Sekiguchi

(11/08/26 2:32), Val Minyaylo wrote:

Hi there,
I am trying to utilize highlighting alternateField and can't get highlights on 
the results from
targeted fields. Is this expected behavior or am I understanding 
alternateFields wrong?


Yes, it is expected behavior.


solrconfig.xml:
str name=f.description.hl.alternateFielddescription_highlighting/str
str name=f.description.hl.alternateFieldLen100/str


With hl=on&hl.fl=description parameters, you can get the first 100 chars of
the (raw) stored description_highlighting field value if highlighter cannot 
generate
snippets on description field for some reason.

koji
--
Check out Query Log Visualizer for Apache Solr
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/


Re: Query parameter changes from solr 1.4 to 3.3

2011-08-25 Thread Yonik Seeley
On Tue, Aug 23, 2011 at 7:11 AM, Samarendra Pratap samarz...@gmail.com wrote:
  We are upgrading solr 1.4 (with collapsing patch solr-236) to solr 3.3. I
 was looking for the required changes in query parameters (or parameter
 names) if any.

There should be very few (but check CHANGES.txt as Erick pointed out).
We try to keep the main HTTP APIs very stable, even across major versions.

  One thing I know for sure is that collapse and its sub-options are now
 known by group, but didn't find anything else.

Field collapsing/grouping was never in any 1.4 release.

-Yonik
http://www.lucidimagination.com


RE: how to deal with URLDatasource which needs authorization?

2011-08-25 Thread deniz
Well, let me explain the problem in detail...

I have a website www.blablabla.com on which users can have profiles, with
any kind of information. And each user has an id, something like user_xyz.
So www.blablabla.com/user_xyz shows user profile, and
www.blablabla.com/solr/index/user_xyz shows an xml file, holding all of the
static information about the user. Solr uses
www.blablabla.com/solr/index/user_xyz to index the data.

Currently www.blablabla.com/solr/index/user_xyz is accessible by everyone,
both users and non-users of the site... 

I would like to put some kind of security thing which only allows solr to
access www.blablabla.com/solr/index/user_xyz, and preventing both users and
non users to access it. So that link will be a 'solr only' link.

is there any other options than restricting ip address for access this link?
or that is the only option?

-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-deal-with-URLDatasource-which-needs-authorization-tp3280515p3285579.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is it possible to do a partial update of a doc's fields in the index?

2011-08-25 Thread Goran Pocina
Got an answer from the excellent support folks at LucidWorks:

 Currently Lucene/Solr can't do field-level updating. So whenever a new
document is indexed, if it has the same unique identifier (in this case
id) field, then the new document replaces the older document. There is an
open JIRA issue in Lucene to add field-level updates, but it is a very
major, long-term task.

Thanks,

On Thu, Aug 25, 2011 at 6:45 PM, Goran Pocina gpoc...@gmail.com wrote:

 New to Solr and Lucene.   We're indexing text, pdf, html docs located on
 local Unix file systems, and need the ability to search for file owner,
 group, and other Linux file metadata, in addition to the file contents.  It
 would be great if we could use nutch to index everything, and then crawl
 through the file system again with a 10 line shell script that passed the
 missing metadata to solr, and updated the existing docs.

  But <add><doc> deletes all the old fields even if they're not present in
  the new document.

 If partial updates aren't possible,  what would be the best way to
 accomplish what we need?  Do we want to to modify the source code for each
 of the different doc format parsers to add support for this metadata?

 Thanks,
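Until field-level updates exist, the usual workaround is read-merge-rewrite: fetch the stored fields of the existing document, overlay the new file metadata, and re-add the whole thing. That only works if every field you care about is stored. Sketched here with plain maps (in SolrJ the map's role would be played by a SolrInputDocument; the class name is made up):

```java
import java.util.HashMap;
import java.util.Map;

public class ReindexMerge {
    /** Overlay new file metadata on the stored fields of the existing document;
     *  the result is re-sent as a whole replacement document. */
    public static Map<String, Object> merge(Map<String, Object> storedFields,
                                            Map<String, Object> newMetadata) {
        Map<String, Object> doc = new HashMap<>(storedFields); // keep everything stored
        doc.putAll(newMetadata);                               // add/overwrite owner, group, ...
        return doc;
    }
}
```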



Re: Paging over mutlivalued field results?

2011-08-25 Thread Darren Govoni

Hi Erick,
   Sure thing.

I have a document schema where I put the sentences of that document in a 
multivalued field sentences.

I search that field in a query but get back the document results, naturally.

I then need to further find which exact sentences matched the query (for 
each document result)
and then do my own paging since I am only returning pages of sentences 
and not the whole document.

(i.e. I don't want to page the document results).

Does this make sense? Or is there a better way Solr can accommodate this?

Much appreciated.

Darren

On 08/25/2011 07:24 PM, Erick Erickson wrote:

Hmm, I don't quite understand what you want. An example
or two would help.

Best
Erick

On Thu, Aug 25, 2011 at 12:11 PM, Darren Govonidar...@ontrenet.com  wrote:

Hi,
  Is it possible to construct a query in Solr where the paged results are
matching multivalued fields and not documents?

thanks,
Darren
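One way to handle this client-side, since Solr returns whole documents: pull back the multivalued sentences field, keep only the sentences that match, and page over that list yourself. The term-containment filter below is a crude stand-in for real analysis and matching (running the highlighter over the multivalued field is another option), and the class name is invented:

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class SentencePager {
    /** Crude client-side matching and paging over a multivalued sentences field. */
    public static List<String> page(List<String> sentences, String term, int start, int rows) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = sentences.stream()
                .filter(s -> s.toLowerCase(Locale.ROOT).contains(needle))
                .collect(Collectors.toList());
        int from = Math.min(start, hits.size());
        int to = Math.min(start + rows, hits.size());
        return hits.subList(from, to);
    }
}
```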





missing field in schema browser on solr admin

2011-08-25 Thread deniz
hi all...

i have added a new field to index... but now when i check solr admin, i see
some interesting stuff...

i can see the field in schema and also db config file but there is nothing
about the field in the schema browser... in addition i can't make a search in
that field... all of the config files seem correct but still no change...


any ideas or anyone who has ever had a similar problem?

-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/missing-field-in-schema-browser-on-solr-admin-tp3285739p3285739.html
Sent from the Solr - User mailing list archive at Nabble.com.


making a Solr search that returns documents where every field in the document is matched

2011-08-25 Thread Henry Ho
Hi everybody,

I've just started using solr. Not sure if this is too specific of a problem,
but here goes.

My situation is I have a semi-long query and I'm searching on very short
documents. The basic issue is I want this query to return documents where
every word/token in the document is matched.  I also have a synonyms file.

The way I've been going about this is using the dismax parser plugin's minimum
match and indexing each document's length, and then doing max_tokens_document #
of queries, where each query is mm=x, doc_length=x, q=query_string.

Example:
dog, dogs, puppy, and canine are synonyms
query = "dog dog cat love puppy canine water not"
doc1 = "cat love dog" -- matches
doc2 = "cat hate water" -- doesn't match
doc3 = "hot dog contests rule" -- matches, but I don't want it to.


doc3 matches because dog, dog, puppy, and canine all match, so that's four
matches, and doc3's token_length is also four.  Four words in my query are
matching to the same word in the doc, and each of them counts toward the
minimum match count.

Thanks,

Henry


Re: SolrServer instances

2011-08-25 Thread Jonty Rhods
Dear all, please help; I am stuck here as I don't have much experience...

thanks

On Thu, Aug 25, 2011 at 6:51 PM, Jonty Rhods jonty.rh...@gmail.com wrote:

 Hi All,

 I am using SolrJ (3.1) and Tomcat 6.x. I want to open the solr server once (20
 concurrent connections) and reuse it across the whole site, something like a
 connection pool as we use for the DB (i.e. Apache DBCP). There is a way to
 use a static method, but I want a better solution from you people.



 I read one threade where Ahmet suggest to use something like that

 String serverPath = "http://localhost:8983/solr";
 HttpClient client = new HttpClient(new
 MultiThreadedHttpConnectionManager());
 URL url = new URL(serverPath);
 CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer(url, client);

 But how to use instance of this across all class.

 Please suggest.

 regards
 Jonty
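One common answer is the enum-singleton pattern Erick showed in another thread today: a single lazily built object that every class shares. To keep this sketch compilable without the SolrJ jar, java.net.URL stands in for CommonsHttpSolrServer; in real code the field and constructor call would use the SolrJ server (built once with the MultiThreadedHttpConnectionManager-backed HttpClient from the quoted snippet):

```java
import java.net.MalformedURLException;
import java.net.URL;

public enum SolrServerHolder {
    INSTANCE;

    private URL server; // would be a CommonsHttpSolrServer in real code

    /** Lazily build the shared object once; every caller gets the same instance. */
    public synchronized URL get() {
        if (server == null) {
            try {
                server = new URL("http://localhost:8983/solr"); // path from the thread
            } catch (MalformedURLException e) {
                throw new IllegalStateException("bad solr URL", e);
            }
        }
        return server;
    }
}
```

Any class can then call SolrServerHolder.INSTANCE.get(); the multithreaded connection manager inside the real server object does the pooling.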



FileDataSource baseDir to be solr.data.dir

2011-08-25 Thread abhayd
hi 

we have multiple environments where we run solr and use DIH to index
promotions.xml file.
here is snippet from dih config file

entity name=f processor=FileListEntityProcessor
baseDir=/sites/ fileName=promotions.xml 

how do i set base dir to be solr.data.dir? on each server solr.data.dir is
different. I use multi core solr instance




--
View this message in context: 
http://lucene.472066.n3.nabble.com/FileDataSource-baseDir-to-be-solr-data-dir-tp3285872p3285872.html
Sent from the Solr - User mailing list archive at Nabble.com.
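If the file always lives under the running core's data dir, DIH can resolve a few implicit core properties by variable substitution, so something along these lines may work (hedged: whether ${solr.core.dataDir} resolves depends on your Solr/DIH version -- verify against yours before relying on it):

```xml
<entity name="f" processor="FileListEntityProcessor"
        baseDir="${solr.core.dataDir}"
        fileName="promotions.xml">
  <!-- same inner entities/fields as in the existing config -->
</entity>
```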