What request handlers to use for query strings in Chinese or Japanese?

2011-03-17 Thread Andy
Hi,

For my Solr server, some of the query strings will be in Asian languages such 
as Chinese or Japanese. 

For such query strings, would the Standard or Dismax request handler work? My 
understanding is that both the Standard and the Dismax handler tokenize the 
query string by whitespace. And that wouldn't work for Chinese or Japanese, 
right? 

In that case, what request handler should I use? And if I need to set up custom 
request handlers for those languages, how do I do it?

Thanks.

Andy


  


Re: Solrj performance bottleneck

2011-03-17 Thread rahul
thanks for all your info.

I will try increasing the RAM and check it.

thanks,

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrj-performance-bottleneck-tp2682797p2692503.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: What request handlers to use for query strings in Chinese or Japanese?

2011-03-17 Thread Li Li
That's the job of the analyzer configured for your field, not the request handler itself.
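For example, if the field type in schema.xml uses a CJK-aware analyzer (such as
solr.CJKTokenizerFactory, or Lucene's contrib CJKAnalyzer), the query text is
split into overlapping bigrams no matter where the whitespace is. Below is a
rough, standalone sketch of what that analyzer does, assuming the Lucene 3.1
contrib analyzers jar is on the classpath; the field name and sample string are
just placeholders:

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class CjkAnalysisDemo {
    public static void main(String[] args) throws Exception {
        // CJKAnalyzer emits overlapping bigrams for CJK text, so the query
        // string does not need to be pre-tokenized by whitespace.
        CJKAnalyzer analyzer = new CJKAnalyzer(Version.LUCENE_31);
        TokenStream ts = analyzer.tokenStream("title", new StringReader("中文分词测试"));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.println(term.toString());   // 中文, 文分, 分词, 词测, 测试
        }
        ts.end();
        ts.close();
    }
}

The standard and dismax handlers just hand the query text to whatever analyzer
the field type declares, so this is configured in schema.xml rather than in the
request handler.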


2011/3/17 Andy angelf...@yahoo.com:
 Hi,

 For my Solr server, some of the query strings will be in Asian languages such 
 as Chinese or Japanese.

 For such query strings, would the Standard or Dismax request handler work? My 
 understanding is that both the Standard and the Dismax handler tokenize the 
 query string by whitespace. And that wouldn't work for Chinese or Japanese, 
 right?

 In that case, what request handler should I use? And if I need to set up 
 custom request handlers for those languages, how do I do it?

 Thanks.

 Andy






Re: Solr Autosuggest help

2011-03-17 Thread rahul
Hi,

One more query.

Currently in the autosuggestion Solr returns words like below:

googl
googl _
googl search
googl chrome
googl map

The last letter seems to be missing in the autosuggestions. I have sent the query
as
?qt=/terms&terms=true&terms.fl=mydata&terms.lower=goog&terms.prefix=goog.
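
(In case it helps, the same request expressed through SolrJ would look roughly
like this; a minimal sketch, with the server URL as a placeholder:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TermsQueryDemo {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery();
        q.setQueryType("/terms");       // qt=/terms
        q.set("terms", true);           // terms=true
        q.set("terms.fl", "mydata");    // terms.fl=mydata
        q.set("terms.lower", "goog");   // terms.lower=goog
        q.set("terms.prefix", "goog");  // terms.prefix=goog
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getResponse());  // the raw NamedList with term/count pairs
    }
}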

The following is my schema.xml definition for the text field.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="1"
            catenateWords="0" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
            outputUnigrams="true" outputUnigramIfNoNgram="true"/>
  </analyzer>
</fieldType>

Could anyone explain what could be wrong? Why does the last letter go missing? It
occurs for a few words only; suggestions for other words are fine.

One more query: how will the word 'sci/tech' be indexed in Solr? If I search
on sci/tech it won't return any results.

Thanks in Advance.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Autosuggest-help-tp2580944p2692651.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sorting 0 values last

2011-03-17 Thread MOuli
Okay.

When I use the map function with ...&sort=map(price, 0, 0, 0, 1) desc then
Solr outputs an error: 
17.03.2011 09:42:58 org.apache.solr.common.SolrException log
SCHWERWIEGEND: org.apache.solr.common.SolrException: Missing sort order.
at
org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:254)
at org.apache.solr.search.QParser.getSort(QParser.java:211)
at
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:90)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)

17.03.2011 09:42:58 org.apache.solr.core.SolrCore execute
INFO: [de] webapp=/solr path=/select
params={sort=map(calc_curr,0,0,1)+descqt=nonequery} status=400 QTime=1


fyi: I use solr 1.4.1 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-0-values-last-tp2681612p2692701.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Replication slows down massively during high load

2011-03-17 Thread Vadim Kisselmann
Hello Shawn,

Primary assumption:  You have a 64-bit OS and a 64-bit JVM.

Yep, it's running 64-bit Linux with a 64-bit JVM

It sounds to me like you're I/O bound, because your machine cannot
keep enough of your index in RAM.  Relative to your 100GB index, you
only have a maximum of 14GB of RAM available to the OS disk cache,
since Java's heap size is 10GB.

The load test seems to be more CPU bound than I/O bound. 
All cores are fully busy and iostat says that there isn't 
much more disk I/O going on than without load test. The 
index is on a RAID10 array with four disks.

How much disk space do all of the index files that end in x take up?
 I would venture a guess that it's significantly more than 14GB.  On
Linux, you could do this command to tally it quickly:

# du -hc *x

27G total

# du -hc `ls | egrep -v "tvf|fdt"`

51G total

If you installed enough RAM so the disk cache can be much larger than
the total size of those files ending in x, you'd probably stop
having these performance issues.
Alternatively, you could take steps to reduce the size of your index,
or perhaps add more machines to go distributed.

Unfortunately, this doesn't seem to be the problem. 
The queries themselves are running fine. The problem 
is that the replication is crawling when there are 
many queries going on and that the replication speed 
stays low even after the load is gone.



Cheers
Vadim


Re: hierarchical faceting, SOLR-792 - confused on config

2011-03-17 Thread Toke Eskildsen
On Wed, 2011-03-16 at 18:36 +0100, Erik Hatcher wrote:
 Sorry, I missed the original mail on this thread
 
 I put together that hierarchical faceting wiki page a couple
 of years ago when helping a customer evaluate SOLR-64 vs.
 SOLR-792 vs.other approaches.  Since then, SOLR-792 morphed
 and is committed as pivot faceting.  SOLR-64 spawned a
 PathTokenizer which is part of Solr now too.
 
 Recently Toke updated that page with some additional info.
 It's definitely not a how to page, and perhaps should get
 renamed/moved/revamped?  Toke?

Unfortunately or luckily, depending on ones point of view, I am hit by a
child #3 and buying house combo. A lot of intentions, but no promises
for the next month or two. 


I think we need both an overview and a detailed how-to of the different
angles on extended faceting in Solr, seen from a user-perspective.

I am not sure I fully understand the different methods myself, so maybe
we could start by discussing them here? Below is a quick outline of how
I see them; please expand & correct. I plan to back up the claims about
scale later with a wiki-page with performance tests.


http://www.lucidimagination.com/solutions/webcasts/faceting @27-33 min:

- Requires the user to transform the paths to multiple special terms
- Step-by-step drill down: If a visual tree is needed, it requires one 
  call for each branch.
- Supports multiple paths/document
- Constraints on output works just as standard faceting
- Scales very well when a single branch is requested

Example use case:
Click-to-expand tree structure of categories for books.


PathHierarchyTokenizer (trunk):
Changes /A/B/C to /A, /A/B and /A/B/C.

I don't know how this can be used directly for hierarchical faceting.
The Lucid Imagination webcast uses the tokenization 0/A, 1/A/B and
2/A/B/C so they seem incompatible to me. The discussion on SOLR-1057
indicates that it can be used with SOLR-64, but SOLR-64 does its own
tokenization!?  Little help here?


SOLR-64 (not up to date with trunk?):

- Uses a custom tokenizer to handle delimited paths (A/B/C).
- Single-path hierarchical faceting
- Constraints can be given on the depth of the hierarchy but not on the 
  number of entries at a given level (huge result set when a wide 
  hierarchy is analyzed)
- Fine (speed & memory) for small taxonomies
- Does not scale well (speed) to large taxonomies

Example use case:
Tree structure of addresses for stores.


SOLR-792 aka pivot faceting (Solr 4.0):

- Uses multiple independent fields as input: Not suitable for taxonomies
- Multi-value but not multi-path
- Supports taxonomies by restraining to single-path/document(?)
- Constraints can be given on entry count, but sorting cannot be done 
  on recursive counting of entries (and it would be very CPU expensive
  to do so(?))
- Fine (speed & memory) for small taxonomies
- Scales well (speed & memory) to large taxonomies
- Scales poorly (speed) to large taxonomies and large result size

Example use case:
Tree structure with price, rating and stock


SOLR-2412 (trunk, highly experimental):

- Multi-path hierarchical faceting
- Uses a field with delimited paths as input (A/B/C)
- Constraints can be given on depth as well as entry count, but sorting
  cannot be done on recursive counting of entries (the number is there 
  though, so it would be fairly easy to add such a sorter)
- Fine (speed & memory) for small taxonomies
- Scales well (speed & memory) to large taxonomies & result size

Example use case:
Tree structure of categories for books.



SOLR building problems

2011-03-17 Thread royr
Hello,

The apache wiki gives me this information:

Skip this section if you have a binary distribution of Solr. These
instructions describe building Solr from source, if you have a nightly tarball
or have checked out the trunk from subversion at
http://svn.apache.org/repos/asf/lucene/dev/trunk. Assumes that you have JDK
1.6 already installed.

In the source directory, run ant dist to build the .war file under dist.
Build the example for the Solr tutorial by running ant example. Change to
the 'example' directory, run java -jar start.jar and visit
localhost:8983/solr/admin to test that the example works with the Jetty
container. 

I have run this code: svn checkout
http://svn.apache.org/repos/asf/lucene/dev/trunk
After that I try the 'ant example' command. This doesn't work, I got the
following error message:

common.compile-core:
[javac] Compiling 508 source files to
somedir/trunk/lucene/build/classes/java
[javac] --
[javac] 1. ERROR in
dir/trunk/lucene/src/java/org/apache/lucene/document/DateTools.java (at line
1)
[javac] package org.apache.lucene.document;
[javac] ^^
[javac] The type Enum is not generic; it cannot be parameterized with
arguments 
[javac] --
[javac] 1 problem (1 error)

BUILD FAILED
somedir/trunk/solr/common-build.xml:249: The following error occurred while
executing this line:
somedir/trunk/lucene/contrib/contrib-build.xml:58: The following error
occurred while executing this line:
somedir/trunk/lucene/common-build.xml:296: The following error occurred
while executing this line:
somedir/trunk/lucene/common-build.xml:717: Compile failed; see the compiler
error output for details.

Ant is installed correctly, I think: 
ant -version = Apache Ant(TM) version 1.8.2 compiled on December 20 2010

What goes wrong?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-building-problems-tp2692916p2692916.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Version Incompatibility(Invalid version (expected 2, but 1) or the data in not in 'javabin' format)

2011-03-17 Thread Isha Garg

On Thursday 17 March 2011 03:18 AM, Ahmet Arslan wrote:

 I am using Solr 4.0 api to search from index (made using solr1.4 version). I am
 getting error Invalid version (expected 2, but 1) or the data in not in 'javabin'
 format. Can anyone help me to fix problem.

You need to use solrj version 1.4 which is compatible to
your index format/version.

Actually there exists another solution. Using XMLResponseParser instead of 
BinaryResponseParser which is the default.

new CommonsHttpSolrServer(new URL("http://solr1.4.0Instance:8080/solr"), null, 
new XMLResponseParser(), false);
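
A shorter variant of the same idea, if I'm not mistaken, is to switch the parser
on an existing server instance (the URL is the same placeholder as above):

CommonsHttpSolrServer server =
    new CommonsHttpSolrServer("http://solr1.4.0Instance:8080/solr");
server.setParser(new XMLResponseParser());   // talk XML instead of javabin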


   Hi,
   

Thanks !!!




Re: Error: Unbuffered entity enclosing request can not be repeated.

2011-03-17 Thread Erick Erickson
What happens if you submit the 9th batch first? I'm wondering if the
9th batch is just mal-formed and has nothing to do with the
previous batches.

As to the time, what merge factor are you using? And how are you
committing? Via autocommit parameters or explicitly or not at all?

Best
Erick

On Wed, Mar 16, 2011 at 1:13 PM, André Santos manofi...@gmail.com wrote:
 Hi all!

 I created a SolrJ project to test Solr. So, I am inserting batches of
 7000 records, each with 200 attributes, which adds up to approximately 13.77
 MB per batch.

 I am measuring the time it takes to add and commit each set of 7000
 records to an instantiation of CommonsHttpSolrServer.
 Each of the first 6 batches takes approximately 17 to 21 seconds.
 The 7th batch takes 42sec and the 8th takes 1min.

 And when it adds the 9th batch to the server it generates this error:

 Mar 16, 2011 4:56:20 PM org.apache.commons.httpclient.HttpMethodDirector
 executeWithRetry
 INFO: I/O exception (java.net.SocketException) caught when processing
 request: Connection reset
 Mar 16, 2011 4:56:21 PM org.apache.commons.httpclient.HttpMethodDirector
 executeWithRetry
 INFO: Retrying request
 Exception in thread main org.apache.solr.client.solrj.SolrServerException:
 org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing
 request can not be repeated.
        at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:480)
        at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)


 I googled this error and one of the suggestions consists of the reduction of
 the number of records per batch. But I want to achieve a solution with at
 least 7000 records per batch.
 Any help would be appreciated.
 André



Re: i don't get why my index didn't grow more...

2011-03-17 Thread Erick Erickson
This page: http://lucene.apache.org/java/3_0_2/fileformats.html#file-names,
when combined with what Yonik said may help you figure it out...

And if you're still stumped, please post the fieldType and field
definitions you used

Best
Erick

On Wed, Mar 16, 2011 at 5:10 PM, Robert Petersen rober...@buy.com wrote:
 OK I have a 30 GB index where there are lots of sparsely populated int
 fields and then one title field and one catchall field with title and
 everything else we want as keywords, the catchall field.  I figure it is
 the biggest field in our documents, which as I mentioned is otherwise
 composed of a variety of int fields and a title.



 So my puzzlement is that my biggest field is copied into a double
 metaphone field and now I added another copyfield to also copy the
 catchall field into a newly created soundex field for an experiment to
 compare the effectiveness of the two.  I expected the index to grow by
 at least 25% to 30%, but it barely grew at all.  Can someone explain
 this to me?  Thanks!






Re: Error: Unbuffered entity enclosing request can not be repeated.

2011-03-17 Thread André Santos
Hi, Eric!

I suspect that the problem resides in Tomcat. I think that the connection
server-client times out.

What happens if you submit the 9th batch first? I'm wondering if the
 9th batch is just mal-formed and has nothing to do with the
 previous batches.


The 9th batch is ok, like the other batches. It is filled up with random
data. I received that error in many executions (normally in 7th, 8th or 9th
batch) when batches have more than 10Mb approximately.



 As to the time, what merge factor are you using? And how are you
 committing? Via autocommit parameters or explicitly or not at all?


The merge factor is 25.

I do the commit explicitly:

for (int k = 0; k < nregisters; k++) {
...
docs.add( doc );
}
server.add(docs);
server.commit();
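
For reference, the batch-size reduction I mentioned in my first mail would look
roughly like this; a minimal sketch, the chunk size is arbitrary and the document
building is elided:

int chunkSize = 1000;
List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
for (int k = 0; k < nregisters; k++) {
    SolrInputDocument doc = new SolrInputDocument();
    // ... populate the ~200 attributes ...
    docs.add(doc);
    if (docs.size() == chunkSize) {
        server.add(docs);    // smaller request bodies, same total volume
        docs.clear();
    }
}
if (!docs.isEmpty()) {
    server.add(docs);
}
server.commit();             // single commit at the end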

André


Re: SOLR building problems

2011-03-17 Thread Erick Erickson
What Java version do you have installed? (java -version)

Best
Erick

On Thu, Mar 17, 2011 at 6:30 AM, royr r...@blixem.nl wrote:
 Hello,

 The apache wiki gives me this information:

 Skip this section if you have a binary distribution of Solr. These
 instructions describe building Solr from source, if you have a nightly tarball
 or have checked out the trunk from subversion at
 http://svn.apache.org/repos/asf/lucene/dev/trunk. Assumes that you have JDK
 1.6 already installed.

 In the source directory, run ant dist to build the .war file under dist.
 Build the example for the Solr tutorial by running ant example. Change to
 the 'example' directory, run java -jar start.jar and visit
 localhost:8983/solr/admin to test that the example works with the Jetty
 container.

 I have run this code: svn checkout
 http://svn.apache.org/repos/asf/lucene/dev/trunk
 After that I try the 'ant example' command. This doesn't work, I got the
 following error message:

 common.compile-core:
    [javac] Compiling 508 source files to
 somedir/trunk/lucene/build/classes/java
    [javac] --
    [javac] 1. ERROR in
 dir/trunk/lucene/src/java/org/apache/lucene/document/DateTools.java (at line
 1)
    [javac]     package org.apache.lucene.document;
    [javac]     ^^
    [javac] The type Enum is not generic; it cannot be parameterized with
 arguments
    [javac] --
    [javac] 1 problem (1 error)

 BUILD FAILED
 somedir/trunk/solr/common-build.xml:249: The following error occurred while
 executing this line:
 somedir/trunk/lucene/contrib/contrib-build.xml:58: The following error
 occurred while executing this line:
 somedir/trunk/lucene/common-build.xml:296: The following error occurred
 while executing this line:
 somedir/trunk/lucene/common-build.xml:717: Compile failed; see the compiler
 error output for details.

 Ant is installed correctly i think:
 ant -version = Apache Ant(TM) version 1.8.2 compiled on December 20 2010

 What goes wrong?




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SOLR-building-problems-tp2692916p2692916.html
 Sent from the Solr - User mailing list archive at Nabble.com.



SOLR-2242-distinctFacet.patch

2011-03-17 Thread Isha Garg

Hi,
  I want to enquire about the patch (SOLR-2242-distinctFacet.patch): is it
available with the Solr 4.0 trunk?


On Monday 14 March 2011 08:05 PM, Jonathan Rochkind wrote:
It's not easy if you have lots of facet values (in my case, can even 
be up to a million), but there is no way built-in to Solr to get 
this.  I have been told that some of the faceting strategies (there 
are actually several in use in Solr based on your parameters and the 
nature of your data) return the page of facet values without actually 
counting all possible facet values, which is what would make this 
difficult. But I have not looked at the code myself.


Jonathan

On 3/11/2011 7:33 AM, Erick Erickson wrote:

There's nothing that I know of that gives you this, but it's
simple to count the members of the list yourself...
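
With SolrJ, for example, something like this would do it; a minimal sketch that
assumes facet.limit=-1 so all values come back, and that 'server' is an existing
SolrServer instance:

SolrQuery q = new SolrQuery("*:*");
q.setFacet(true);
q.addFacetField("StudyID");
q.setFacetLimit(-1);       // return all facet values, not just the top N
q.setFacetMinCount(1);     // skip zero-count values
QueryResponse rsp = server.query(q);
int distinctStudyIds = rsp.getFacetField("StudyID").getValueCount();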

Best
Erick

On Fri, Mar 11, 2011 at 3:34 AM, rajini maskirajinima...@gmail.com  
wrote:

Query on facet field results...


   When I run a facet query on some field, say facet=on&facet.field=StudyID,
I get a list of distinct StudyID values with the count that tells how many
times each study occurred in the search query. But I also need the count of
these distinct StudyID values. Any Solr query to get the count of it?



Example:



<lst name="facet_fields">
  <lst name="StudyID">
    <int name="105">135164</int>
    <int name="179">79820</int>
    <int name="107">70815</int>
    <int name="120">37076</int>
    <int name="134">35276</int>
  </lst>
</lst>

I wanted the count attribute that shall return the count of number of
different studyID occurred.. In above example it could be: Count = 5
(105,179,107,120,134)

<lst name="facet_fields">
  <lst name="StudyID" COUNT="5">
    <int name="105">135164</int>
    <int name="179">79820</int>
    <int name="107">70815</int>
    <int name="120">37076</int>
    <int name="134">35276</int>
  </lst>
</lst>





Re: Multiple Blocked threads on UnInvertedField.getUnInvertedField() SegmentReader$CoreReaders.getTermsReader

2011-03-17 Thread Rachita Choudhary
Hi Yonik,

I have another question related to fieldValueCache.
When we uninvert a facet field, and if the termInstances = 0 for a
particular field, then also it gets added to the FieldValueCache.
What is the reason for caching facet fields with termInstances=0?

In our case, a lot of time is being spent in the 'uninvert' process. From
the 'time' values, I checked that it goes up to 20 secs for certain facet fields.

Eg :
UnInverted multi-valued field
{field=product_brands_61936,memSize=4224,tindexSize=32,time=20202,phase1=20202,nTerms=0,bigTerms=0,termInstances=0,uses=0}

Also for the same facet field, the time and phase1 time varies from 3 msec
to 20 secs.
What is the reason for this variation ?
Also what does nTerms represent ?

Thanks,
Rachita

On Mon, Mar 7, 2011 at 8:22 PM, Yonik Seeley yo...@lucidimagination.comwrote:

 On Mon, Mar 7, 2011 at 9:44 AM, Rachita Choudhary
 rachita.choudh...@burrp.com wrote:
  As enum method , will create a bitset for all the unique values

 It's more complex than that.
  - small sets will use a sorted int set... not a bitset
  - you can control what gets cached via facet.enum.cache.minDf parameter

 -Yonik
 http://lucidimagination.com



Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11

2011-03-17 Thread François Schiettecatte
Lewis

My update from Tomcat 7.0.8 to 7.0.11 went with no hitches. I checked my 
context file and it does not have the xml preamble yours has, specifically: 
'<?xml version="1.0" encoding="utf-8"?>'. 


Here is my context file:

<Context docBase="/home/omim/lib/java/apache-solr-4.0-2011-02-09_08-06-20.war"
         debug="0" crossContext="true">
   <Environment name="solr/home" type="java.lang.String"
                value="/home/omim/index/" override="true" />
</Context>
---

Hope this helps.

Cheers

François


On Mar 16, 2011, at 2:38 PM, McGibbney, Lewis John wrote:

 Hello list,
 
 Is anyone running Solr (in my case 1.4.1) on above Tomcat dist? In the
 past I have been using guidance in accordance with
 http://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat
 but having upgraded from Tomcat 7.0.8 to 7.0.11 I am having problems
 E.g.
 
 INFO: Deploying configuration descriptor wombra.xml  This is my context
 fragment
 from /home/lewis/Downloads/apache-tomcat-7.0.11/conf/Catalina/localhost
 16-Mar-2011 16:57:36 org.apache.tomcat.util.digester.Digester fatalError
 SEVERE: Parse Fatal Error at line 4 column 6: The processing instruction
 target matching [xX][mM][lL] is not allowed.
 org.xml.sax.SAXParseException: The processing instruction target
 matching [xX][mM][lL] is not allowed.
 ...
 16-Mar-2011 16:57:36 org.apache.catalina.startup.HostConfig
 deployDescriptor
 SEVERE: Error deploying configuration descriptor wombra.xml
 org.xml.sax.SAXParseException: The processing instruction target
 matching [xX][mM][lL] is not allowed.
 ...
 some more
 ...
 
 My configuration descriptor is as follows
 <?xml version="1.0" encoding="utf-8"?>
 <Context docBase="/home/lewis/Downloads/wombra/wombra.war"
          crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/home/lewis/Downloads/wombra" override="true"/>
 </Context>
 
 Preferably I would upload a WAR file, but I have been working well with
 the configuration I have been using up until now therefore I didn't
 question change.
 I am unfamiliar with the above errors. Can anyone please point me in the
 right direction?
 
 Thank you
 Lewis
 



Re: i don't get why my index didn't grow more...

2011-03-17 Thread Yonik Seeley
Without even looking at the different segment files, things look odd:
You say that you optimize every day, yet I see segments up to 4 days old.
Also look at all the segments_??? files... each represents a commit
point of the index.
So it looks like you have 16 snapshots (or commit points) of the index.
Do you have a deletion policy configured to do this for some reason?

Anyway, this is why when you changed how you index, you didn't see
much of a size increase (comparatively).

-Yonik
http://lucidimagination.com



On Wed, Mar 16, 2011 at 7:46 PM, Robert Petersen rober...@buy.com wrote:
 Thanks for the reply Yonik, Here are the results of Ls -l on the master 
 server index folder, also please note we have hundreds of those small 
 sparsely populated fields and I run optimize once a day at midnight.  We 
 index 24/7 off a queue at a clip of about 200K docs per hour so the index has 
 had hundreds of commits since last night at midnight.

[...]


RE: Using Solr 1.4.1 on most recent Tomcat 7.0.11

2011-03-17 Thread Pierre GOSSE
I do have the xml preamble <?xml version="1.0" encoding="UTF-8"?> in my config 
file in conf/Catalina/localhost/ and Solr starts OK with Tomcat 7.0.8. Haven't 
tried with 7.0.11 yet.

I wonder why your exception points to line 4 column 6, however. Shouldn't it 
point to line 1 column 1? Do you have some blank lines at the start of your 
XML file, or some non-blank lines?

Pierre

-----Original Message-----
From: François Schiettecatte [mailto:fschietteca...@gmail.com] 
Sent: Thursday, 17 March 2011 14:48
To: solr-user@lucene.apache.org
Subject: Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11 

Lewis

My update from tomcat 7.0.8 to 7.0.11 went with no hitches, I checked my 
context file and it does not have the xml preamble your has, specifically: 
'?xml version=1.0 encoding=utf-8?', 


Here is my context file:

Context docBase=/home/omim/lib/java/apache-solr-4.0-2011-02-09_08-06-20.war 
debug=0 crossContext=true 
   Environment name=solr/home type=java.lang.String 
value=/home/omim/index/ override=true /
/Context
---

Hope this helps.

Cheers

François


On Mar 16, 2011, at 2:38 PM, McGibbney, Lewis John wrote:

 Hello list,
 
 Is anyone running Solr (in my case 1.4.1) on above Tomcat dist? In the
 past I have been using guidance in accordance with
 http://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat
 but having upgraded from Tomcat 7.0.8 to 7.0.11 I am having problems
 E.g.
 
 INFO: Deploying configuration descriptor wombra.xml  This is my context
 fragment
 from /home/lewis/Downloads/apache-tomcat-7.0.11/conf/Catalina/localhost
 16-Mar-2011 16:57:36 org.apache.tomcat.util.digester.Digester fatalError
 SEVERE: Parse Fatal Error at line 4 column 6: The processing instruction
 target matching [xX][mM][lL] is not allowed.
 org.xml.sax.SAXParseException: The processing instruction target
 matching [xX][mM][lL] is not allowed.
 ...
 16-Mar-2011 16:57:36 org.apache.catalina.startup.HostConfig
 deployDescriptor
 SEVERE: Error deploying configuration descriptor wombra.xml
 org.xml.sax.SAXParseException: The processing instruction target
 matching [xX][mM][lL] is not allowed.
 ...
 some more
 ...
 
 My configuration descriptor is as follows
 ?xml version=1.0 encoding=utf-8?
 Context docBase=/home/lewis/Downloads/wombra/wombra.war
 crossContext=true
  Environment name=solr/home type=java.lang.String
 value=/home/lewis/Downloads/wombra override=true/
 /Context
 
 Preferably I would upload a WAR file, but I have been working well with
 the configuration I have been using up until now therefore I didn't
 question change.
 I am unfamiliar with the above errors. Can anyone please point me in the
 right direction?
 
 Thank you
 Lewis
 



Re: SOLR building problems

2011-03-17 Thread royr
java version 1.6.0_21
Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
Java HotSpot(TM) Server VM (build 17.0-b16, mixed mode)


--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-building-problems-tp2692916p2693574.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Using Solr 1.4.1 on most recent Tomcat 7.0.11

2011-03-17 Thread McGibbney, Lewis John
Hi François,

Thank you for your reply. I had made a simple mistake of including comments 
before
'<?xml version="1.0" encoding="utf-8"?>', therefore I was getting a SAX error.
As you have correctly pointed out, it is not essential to include the snippet 
as above in the context file (if using one), however it might be useful to know 
that Tomcat 7 now validates XML files by default. In time I will get round to 
editing the wiki accordingly to mitigate against this in the future.

Thanks for looking in to this.

Lewis
___
From: François Schiettecatte [fschietteca...@gmail.com]
Sent: 17 March 2011 13:47
To: solr-user@lucene.apache.org
Subject: Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11

Lewis

My update from tomcat 7.0.8 to 7.0.11 went with no hitches, I checked my 
context file and it does not have the xml preamble your has, specifically: 
'?xml version=1.0 encoding=utf-8?',


Here is my context file:

Context docBase=/home/omim/lib/java/apache-solr-4.0-2011-02-09_08-06-20.war 
debug=0 crossContext=true 
   Environment name=solr/home type=java.lang.String 
value=/home/omim/index/ override=true /
/Context
---

Hope this helps.

Cheers

François



Re: hierarchical faceting, SOLR-792 - confused on config

2011-03-17 Thread Erik Hatcher
Yes, pivot faceting is committed to trunk.  But it is not part of the upcoming 3.1 
release.

Erik

On Mar 16, 2011, at 15:00 , McGibbney, Lewis John wrote:

 Hi Erik,
 
 I have been reading about the progression of SOLR-792 into pivot faceting, 
 however can you expand to comment on
 where it is committed. Are you referring to trunk?
 The reason I am asking is that I have been using 1.4.1 for some time now and 
 have been thinking of upgrading to trunk... or branch
 
 Thank you Lewis
 
 From: Erik Hatcher [erik.hatc...@gmail.com]
 Sent: 16 March 2011 17:36
 To: solr-user@lucene.apache.org
 Subject: Re: hierarchical faceting, SOLR-792 - confused on config
 
 Sorry, I missed the original mail on this thread
 
 I put together that hierarchical faceting wiki page a couple of years ago 
 when helping a customer evaluate SOLR-64 vs. SOLR-792 vs.other approaches.  
 Since then, SOLR-792 morphed and is committed as pivot faceting.  SOLR-64 
 spawned a PathTokenizer which is part of Solr now too.
 
 Recently Toke updated that page with some additional info.  It's definitely 
 not a how to page, and perhaps should get renamed/moved/revamped?  Toke?
 
Erik
 
 



Re: Solr Autosuggest help

2011-03-17 Thread rahul
hi,

We have found that 'EnglishPorterFilterFactory' causes that issue. I believe
it is used for stemming words. Once we commented out that factory, it works
fine.

And another thing: currently I am checking how the word 'sci/tech'
will be indexed in Solr. As mentioned in my previous email, if I search on
sci/tech it won't return any results. But Solr has the term sci/tech. When
I search on other terms which also contain sci/tech, it returns both the
words.

Please let me know, if you have any idea regarding that.. If I came to know
I will update this thread.

thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Autosuggest-help-tp2580944p2693601.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: hierarchical faceting, SOLR-792 - confused on config

2011-03-17 Thread Erik Hatcher

On Mar 16, 2011, at 14:53 , Jonathan Rochkind wrote:

 Interesting, any documentation on the PathTokenizer anywhere? Or just have to 
 find and look at the source? That's something I hadn't known about, which may 
 be useful to some stuff I've been working on depending on how it works.

  
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PathHierarchyTokenizerFactory

Sorry, I said PathTokenizer which is what SOLR-1057 called it for a bit 
before it got renamed.

Erik



Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11

2011-03-17 Thread François Schiettecatte
Pierre

That is a very good point, I have been caught in the past by poor xml (RSS 
feeds) that included control characters before the '<?xml...>'.

And I have added the preamble to my solr.xml files for good form :)

François


On Mar 17, 2011, at 10:02 AM, Pierre GOSSE wrote:

 I do have the xml preamble ?xml version=1.0 encoding=UTF-8? in my 
 config file in conf/Catalina/localhost/ and solr starts ok with Tomcat 7.0.8. 
 Haven't try with 7.0.11 yet.
 
 I wonder why your exception point to line 4 column 6, however. Shouldn't it 
 point to line 1 column 1 ? Do you have some blank lines at the start of your 
 XML file or some non blank lines ?
 
 Pierre
 
  -----Original Message-----
  From: François Schiettecatte [mailto:fschietteca...@gmail.com] 
  Sent: Thursday, 17 March 2011 14:48
  To: solr-user@lucene.apache.org
  Subject: Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11 
 
 Lewis
 
 My update from tomcat 7.0.8 to 7.0.11 went with no hitches, I checked my 
 context file and it does not have the xml preamble your has, specifically: 
 '?xml version=1.0 encoding=utf-8?', 
 
 
 Here is my context file:
 
 Context 
 docBase=/home/omim/lib/java/apache-solr-4.0-2011-02-09_08-06-20.war 
 debug=0 crossContext=true 
   Environment name=solr/home type=java.lang.String 
 value=/home/omim/index/ override=true /
 /Context
 ---
 
 Hope this helps.
 
 Cheers
 
 François
 
 
 On Mar 16, 2011, at 2:38 PM, McGibbney, Lewis John wrote:
 
 Hello list,
 
 Is anyone running Solr (in my case 1.4.1) on above Tomcat dist? In the
 past I have been using guidance in accordance with
 http://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat
 but having upgraded from Tomcat 7.0.8 to 7.0.11 I am having problems
 E.g.
 
 INFO: Deploying configuration descriptor wombra.xml  This is my context
 fragment
 from /home/lewis/Downloads/apache-tomcat-7.0.11/conf/Catalina/localhost
 16-Mar-2011 16:57:36 org.apache.tomcat.util.digester.Digester fatalError
 SEVERE: Parse Fatal Error at line 4 column 6: The processing instruction
 target matching [xX][mM][lL] is not allowed.
 org.xml.sax.SAXParseException: The processing instruction target
 matching [xX][mM][lL] is not allowed.
 ...
 16-Mar-2011 16:57:36 org.apache.catalina.startup.HostConfig
 deployDescriptor
 SEVERE: Error deploying configuration descriptor wombra.xml
 org.xml.sax.SAXParseException: The processing instruction target
 matching [xX][mM][lL] is not allowed.
 ...
 some more
 ...
 
 My configuration descriptor is as follows
 ?xml version=1.0 encoding=utf-8?
 Context docBase=/home/lewis/Downloads/wombra/wombra.war
 crossContext=true
 Environment name=solr/home type=java.lang.String
 value=/home/lewis/Downloads/wombra override=true/
 /Context
 
 Preferably I would upload a WAR file, but I have been working well with
 the configuration I have been using up until now therefore I didn't
 question change.
 I am unfamiliar with the above errors. Can anyone please point me in the
 right direction?
 
 Thank you
 Lewis
 
 



Re: Parent-child options

2011-03-17 Thread Yonik Seeley
On Thu, Mar 17, 2011 at 1:49 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 The dreaded parent-child without denormalization question.  What are one's
 options for the following example:

 parent: shoes
 3 children. each with 2 attributes/fields: color and size
  * color: red black orange
  * size: 10 11 12

 The goal is to be able to search for:
 1) color:red AND size:10 and get 1 hit for the above
 2) color:red AND size:12 and get *no* matches because there are no red shoes 
 of
 size 12, only size 10.

What if you had this instead:

  color: red red orange
  size: 10 11 12

Do you need for color:red to return 1 or 2 (i.e. is the final answer
in units of child hits or parent hits)?

-Yonik
http://lucidimagination.com


Re: Replication slows down massively during high load

2011-03-17 Thread Shawn Heisey

On 3/17/2011 3:43 AM, Vadim Kisselmann wrote:

Unfortunately, this doesn't seem to be the problem. The queries themselves are 
running fine. The problem is that the replications is crawling when there are 
many queries going on and that the replication speed stays low even after the 
load is gone.


If you run iostat 5 what are typical values on each iteration for the 
various CPU states while you're doing load testing and replication at 
the same time?  In particular, %iowait is important.




from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Bernd Fehling


Is there a way to have a kind of casting for copyField?

I have author names in multiValued string field and need a sorting on it,
but sort on field is only for multiValued=false.

I'm trying to get multiValued content from one field to a
non-multiValued text or string field for sorting.
And this, if possible, during loading with copyField.

Or any other solution?

I need this solution due to patch SOLR-2339, which is now more strict.
May be anyone else also.

Regards,
Bernd


Re: from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Gora Mohanty
On Thu, Mar 17, 2011 at 8:04 PM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:

 Is there a way to have a kind of casting for copyField?

 I have author names in multiValued string field and need a sorting on it,
 but sort on field is only for multiValued=false.

 I'm trying to get multiValued content from one field to a
 non-multiValued text or string field for sorting.
 And this, if possible, during loading with copyField.

 Or any other solution?
[...]

Not sure about CopyField, but you could use a transformer to
extract values from a multiValued field, and stick them into a
single-valued field.
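
One way to sketch that, for example, is a small custom UpdateRequestProcessor
that copies the first value of the multiValued field into a single-valued sort
field at index time. The field names author/author_sort below are just examples,
and it would still need a matching UpdateRequestProcessorFactory registered in
solrconfig.xml:

import java.io.IOException;
import java.util.Collection;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class AuthorSortProcessor extends UpdateRequestProcessor {
    public AuthorSortProcessor(UpdateRequestProcessor next) {
        super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Collection<Object> authors = doc.getFieldValues("author");
        if (authors != null && !authors.isEmpty()) {
            // copy only the first author into the single-valued sort field
            doc.addField("author_sort", authors.iterator().next().toString());
        }
        super.processAdd(cmd);
    }
}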

Regards,
Gora


Rename fields in a query

2011-03-17 Thread Fabiano Nunes
Given a Query object (name:firefox name:opera), is it possible to 'rename'
the field names to, for example, (content:firefox content:opera)?


Re: Sorting on multiValued fields via function query

2011-03-17 Thread Yonik Seeley
On Wed, Mar 16, 2011 at 6:08 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 Also... if lucene is already capable of sorting on multi-valued field by
 choosing the largest value largest vs. smallest is presumably just
 arbitrary there, there is presumably no performance implication to choosing
 the smallest instead of the largest. It just chooses the largest, according
 to Yonik.

It's a little more complicated than that.
It's not so much an explicit feature in lucene, but just what
naturally happens when building the field cache via uninverting an
indexed field.

It's pretty much this:

for every term in the field:
  for every document that matches that term:
    value[document] = term

And since terms are iterated from smallest to largest (and no, you
can't reverse this)
larger values end up overwriting smaller values.
There's no simple patch to pick the smallest rather than the largest.

In the past, lucene used to try and detect this multi-valued case by
checking the number of values set in the whole array.  This was
unreliable though and the check was discarded.

-Yonik
http://lucidimagination.com


Re: from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Yonik Seeley
On Thu, Mar 17, 2011 at 10:34 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:

 Is there a way to have a kind of casting for copyField?

 I have author names in multiValued string field and need a sorting on it,
 but sort on field is only for multiValued=false.

 I'm trying to get multiValued content from one field to a
 non-multiValued text or string field for sorting.
 And this, if possible, during loading with copyField.

 Or any other solution?

 I need this solution due to patch SOLR-2339, which is now more strict.
 May be anyone else also.

Hmmm, you're the second person that's relied on that (sorting on a
multiValued field working).
Was SOLR-2339 a mistake?

-Yonik
http://lucidimagination.com


Re: from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Bernd Fehling


Good idea.
Was also just looking into this area.

Assuming my input record looks like this:
<documents>
  <document id="foobar">
    <element name="author"><value>author_1 ; author_2 ; author_3</value></element>
  </document>
</documents>

Do you know if I can use something like this:
...
<entity name="records" processor="XPathEntityProcessor"
        transformer="RegexTransformer"
...
  <field column="author"      xpath="/documents/document/element[@name='author']/value" />
  <field column="author_sort" xpath="/documents/document/element[@name='author']/value" />
  <field column="author"      splitBy=" ; " />
...

To just double the input and make author multiValued and author_sort a string 
field?

Regards
Bernd


Am 17.03.2011 15:39, schrieb Gora Mohanty:

On Thu, Mar 17, 2011 at 8:04 PM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de  wrote:


Is there a way to have a kind of casting for copyField?

I have author names in multiValued string field and need a sorting on it,
but sort on field is only for multiValued=false.

I'm trying to get multiValued content from one field to a
non-multiValued text or string field for sorting.
And this, if possible, during loading with copyField.

Or any other solution?

[...]

Not sure about CopyField, but you could use a transformer to
extract values from a multiValued field, and stick them into a
single-valued field.

Regards,
Gora


Re: Sorting on multiValued fields via function query

2011-03-17 Thread Bill Bell
Here is a workaround: stick the high value and low value into other fields, 
and use those fields for sorting.
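
At indexing time with SolrJ that could look like this; a minimal sketch, the
field names are made up and the usual java.util imports are assumed:

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "shoe-123");
List<Integer> sizes = Arrays.asList(10, 11, 12);
for (Integer s : sizes) {
    doc.addField("size", s);                        // the multi-valued field
}
doc.addField("size_min", Collections.min(sizes));   // sort on this ascending
doc.addField("size_max", Collections.max(sizes));   // sort on this descending
server.add(doc);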

Bill Bell
Sent from mobile


On Mar 17, 2011, at 8:49 AM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Wed, Mar 16, 2011 at 6:08 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 Also... if lucene is already capable of sorting on multi-valued field by
 choosing the largest value largest vs. smallest is presumably just
 arbitrary there, there is presumably no performance implication to choosing
 the smallest instead of the largest. It just chooses the largest, according
 to Yonik.
 
 It's a little more complicated than that.
 It's not so much an explicit feature in lucene, but just what
 naturally happens when building the field cache via uninverting an
 indexed field.
 
 It's pretty much this:
 
 for every term in the field:
  for every document that matches that term:
value[document] = term
 
 And since terms are iterated from smallest to largest (and no, you
 can't reverse this)
 larger values end up overwriting smaller values.
 There's no simple patch to pick the smallest rather than the largest.
 
 In the past, lucene used to try and detect this multi-valued case by
 checking the number of values set in the whole array.  This was
 unreliable though and the check was discarded.
 
 -Yonik
 http://lucidimagination.com


Re: Sorting on multiValued fields via function query

2011-03-17 Thread Bill Bell
By the way, this could be done automatically by Solr or Lucene behind the 
scenes. 

Bill Bell
Sent from mobile


On Mar 17, 2011, at 9:02 AM, Bill Bell billnb...@gmail.com wrote:

 Here is a work around. Stick the high value and low value into other fields. 
 Use those fields for sorting.
 
 Bill Bell
 Sent from mobile
 
 
 On Mar 17, 2011, at 8:49 AM, Yonik Seeley yo...@lucidimagination.com wrote:
 
 On Wed, Mar 16, 2011 at 6:08 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 Also... if lucene is already capable of sorting on multi-valued field by
 choosing the largest value largest vs. smallest is presumably just
 arbitrary there, there is presumably no performance implication to choosing
 the smallest instead of the largest. It just chooses the largest, according
 to Yonik.
 
 It's a little more complicated than that.
 It's not so much an explicit feature in lucene, but just what
 naturally happens when building the field cache via uninverting an
 indexed field.
 
 It's pretty much this:
 
 for every term in the field:
 for every document that matches that term:
   value[document] = term
 
 And since terms are iterated from smallest to largest (and no, you
 can't reverse this)
 larger values end up overwriting smaller values.
 There's no simple patch to pick the smallest rather than the largest.
 
 In the past, lucene used to try and detect this multi-valued case by
 checking the number of values set in the whole array.  This was
 unreliable though and the check was discarded.
 
 -Yonik
 http://lucidimagination.com


Re: Sorting on multiValued fields via function query

2011-03-17 Thread Jonathan Rochkind

Aha, oh well, not quite as good/flexible as I hoped.

Still, if lucene is now behaving somewhat more predictably/rationally 
when sorting on multi-valued fields, then I think, in response to your 
other email on a similar thread, perhaps SOLR-2339  is now a mistake.


When lucene was returning completely unpredictable results -- and even 
sometimes crashing entirely -- when sorting on a multi-valued field --- 
then I think in that situation it made a lot of sense for Solr to 
prevent you from doing that, which is I think what SOLR-2339 does?  So I 
don't think it was necessarily a mistake in that context.


But if lucene now can sort a multi-valued field without crashing when 
there are 'too many' unique values, and with easily described and 
predictable semantics (use the minimal value in the multi-valued field 
as sort key) -- then it probably makes more sense for Solr to let you do 
that if you really want to, give you enough rope to hang yourself.


Jonathan

On 3/17/2011 10:49 AM, Yonik Seeley wrote:

On Wed, Mar 16, 2011 at 6:08 PM, Jonathan Rochkindrochk...@jhu.edu  wrote:

Also... if lucene is already capable of sorting on multi-valued field by
choosing the largest value largest vs. smallest is presumably just
arbitrary there, there is presumably no performance implication to choosing
the smallest instead of the largest. It just chooses the largest, according
to Yonik.

It's a little more complicated than that.
It's not so much an explicit feature in lucene, but just what
naturally happens when building the field cache via uninverting an
indexed field.

It's pretty much this:

for every term in the field:
   for every document that matches that term:
 value[document] = term

And since terms are iterated from smallest to largest (and no, you
can't reverse this)
larger values end up overwriting smaller values.
There's no simple patch to pick the smallest rather than the largest.

In the past, lucene used to try and detect this multi-valued case by
checking the number of values set in the whole array.  This was
unreliable though and the check was discarded.

-Yonik
http://lucidimagination.com



Re: from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Bill Bell
Do you use the DIH handler? A script can do this easily.

Bill Bell
Sent from mobile


On Mar 17, 2011, at 9:02 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de 
wrote:

 
 Good idea.
 Was also just looking into this area.
 
 Assuming my input record looks like this:
 documents
  document id=foobar
element name=authorvalueauthor_1 ; author_2 ; 
 author_3/value/element
  /document
 /documents
 
 Do you know if I can use something like this:
 ...
 entity name=records processor=XPathEntityProcessor
transformer=RegexTransformer
 ...
 field column=author  
 xpath=/documents/document/element[@name='author']/value /
 field column=author_sort 
 xpath=/documents/document/element[@name='author']/value /
 field column=author  splitBy= ;  /
 ...
 
 To just double the input and make author multiValued and author_sort a string 
 field?
 
 Regards
 Bernd
 
 
 Am 17.03.2011 15:39, schrieb Gora Mohanty:
 On Thu, Mar 17, 2011 at 8:04 PM, Bernd Fehling
 bernd.fehl...@uni-bielefeld.de  wrote:
 
 Is there a way to have a kind of casting for copyField?
 
 I have author names in multiValued string field and need a sorting on it,
 but sort on field is only for multiValued=false.
 
 I'm trying to get multiValued content from one field to a
 non-multiValued text or string field for sorting.
 And this, if possible, during loading with copyField.
 
 Or any other solution?
 [...]
 
 Not sure about CopyField, but you could use a transformer to
 extract values from a multiValued field, and stick them into a
 single-valued field.
 
 Regards,
 Gora


Re: Replication slows down massively during high load

2011-03-17 Thread Vadim Kisselmann
On Mar 17, 2011, at 3:19 PM, Shawn Heisey wrote:

On 3/17/2011 3:43 AM, Vadim Kisselmann wrote:
Unfortunately, this doesn't seem to be the problem. The queries
themselves are running fine. The problem is that the replications is
crawling when there are many queries going on and that the replication
speed stays low even after the load is gone.

If you run iostat 5 what are typical values on each iteration for
the various CPU states while you're doing load testing and replication
at the same time?  In particular, %iowait is important.



CPU stats from top (iostat doesn't seem to show CPU load correctly):

90.1%us,  4.5%sy,  0.0%ni,  5.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Seems like I/O is not the bottleneck here.

Another interesting thing: when Solr starts its replication under heavy
load, it tries to download the whole index from the master.

From /solr/admin/replication/index.jsp:

Current Replication Status

Start Time: Thu Mar 17 15:57:20 CET 2011
Files Downloaded: 9 / 163
Downloaded: 83,04 MB / 97,75 GB [0.0%]
Downloading File: _d5x.nrm, Downloaded: 86,82 KB / 86,82 KB [100.0%]
Time Elapsed: 419s, Estimated Time Remaining: 504635s, Speed: 202,94 
KB/s


Re: from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Bernd Fehling

Hi Yonik,

actually some applications misused sorting on a multiValued field,
like VuFind. And as a matter of fact FAST doesn't support this either,
because it doesn't make sense.
FAST distinguishes between multiValue and singleValue by just adding
the separator FieldAttribute to the field. So I moved this from the FAST
index-profile to Solr DIH and placed the separator there.

But now I'm looking for a solution for VuFind.
Easiest thing would be to have a kind of casting, may be for copyField.

Regards,
Bernd


Am 17.03.2011 15:58, schrieb Yonik Seeley:

On Thu, Mar 17, 2011 at 10:34 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de  wrote:


Is there a way to have a kind of casting for copyField?

I have author names in multiValued string field and need a sorting on it,
but sort on field is only for multiValued=false.

I'm trying to get multiValued content from one field to a
non-multiValued text or string field for sorting.
And this, if possible, during loading with copyField.

Or any other solution?

I need this solution due to patch SOLR-2339, which is now more strict.
May be anyone else also.


Hmmm, you're the second person that's relied on that (sorting on a
multiValued field working).
Was SOLR-2339 a mistake?

-Yonik
http://lucidimagination.com


Re: from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Bernd Fehling

Hi Bill,
yes DIH is in use.

Thanks,
Bernd

Am 17.03.2011 16:09, schrieb Bill Bell:

Do you use Dih handler? A script can do this easily.

Bill Bell
Sent from mobile


On Mar 17, 2011, at 9:02 AM, Bernd Fehlingbernd.fehl...@uni-bielefeld.de  
wrote:



Good idea.
Was also just looking into this area.

Assuming my input record looks like this:
documents
  document id=foobar
element name=authorvalueauthor_1 ; author_2 ; 
author_3/value/element
  /document
/documents

Do you know if I can use something like this:
...
entity name=records processor=XPathEntityProcessor
transformer=RegexTransformer
...
field column=author  
xpath=/documents/document/element[@name='author']/value /
field column=author_sort 
xpath=/documents/document/element[@name='author']/value /
field column=author  splitBy= ;  /
...

To just double the input and make author multiValued and author_sort a string 
field?

Regards
Bernd


Am 17.03.2011 15:39, schrieb Gora Mohanty:

On Thu, Mar 17, 2011 at 8:04 PM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de   wrote:


Is there a way to have a kind of casting for copyField?

I have author names in multiValued string field and need a sorting on it,
but sort on field is only for multiValued=false.

I'm trying to get multiValued content from one field to a
non-multiValued text or string field for sorting.
And this, if possible, during loading with copyField.

Or any other solution?

[...]

Not sure about CopyField, but you could use a transformer to
extract values from a multiValued field, and stick them into a
single-valued field.

Regards,
Gora


--
*
Bernd Fehling                    Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)               Universitätsstr. 25
Tel. +49 521 106-4060            Fax. +49 521 106-4052
bernd.fehl...@uni-bielefeld.de   33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Re: Replication slows down massively during high load

2011-03-17 Thread Bill Bell
You could always rsync the index dir and reload (the old scripts). But this is 
still something we should investigate. I had this same issue under high load and 
never really found a solution. Did you try another NIC? Is the NIC configured 
right? Routing? Transfer speed?

Bill Bell
Sent from mobile


On Mar 17, 2011, at 9:11 AM, Vadim Kisselmann v.kisselm...@googlemail.com 
wrote:

 On Mar 17, 2011, at 3:19 PM, Shawn Heisey wrote:
 
 On 3/17/2011 3:43 AM, Vadim Kisselmann wrote:
 Unfortunately, this doesn't seem to be the problem. The queries
 themselves are running fine. The problem is that the replications is
 crawling when there are many queries going on and that the replication
 speed stays low even after the load is gone.
 
 If you run iostat 5 what are typical values on each iteration for
 the various CPU states while you're doing load testing and replication
 at the same time?  In particular, %iowait is important.
 
 
 
 CPU stats from top (iostat doesn't seem to show CPU load correctly):
 
 90.1%us,  4.5%sy,  0.0%ni,  5.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
 
 Seems like I/O is not the bottleneck here.
 
 Other interesting thing: When Solr starts its replication under heavy
 load, it tries to download the whole index from master.
 
 From /solr/admin/replication/index.jsp:
 
Current Replication Status
 
Start Time: Thu Mar 17 15:57:20 CET 2011
Files Downloaded: 9 / 163
Downloaded: 83,04 MB / 97,75 GB [0.0%]
Downloading File: _d5x.nrm, Downloaded: 86,82 KB / 86,82 KB [100.0%]
Time Elapsed: 419s, Estimated Time Remaining: 504635s, Speed: 202,94 KB/s


Re: Parent-child options

2011-03-17 Thread Otis Gospodnetic
Hi,



- Original Message 
 From: Yonik Seeley yo...@lucidimagination.com
 Subject: Re: Parent-child options
 
 On Thu, Mar 17, 2011 at 1:49 AM, Otis Gospodnetic
 otis_gospodne...@yahoo.com  wrote:
  The dreaded parent-child without denormalization question.  What  are one's
  options for the following example:
 
  parent:  shoes
  3 children. each with 2 attributes/fields: color and size
* color: red black orange
   * size: 10 11 12
 
  The goal is  to be able to search for:
  1) color:red AND size:10 and get 1 hit for the  above
  2) color:red AND size:12 and get *no* matches because there are no  red 
  shoes 
of
  size 12, only size 10.
 
 What if you had this  instead:
 
   color: red red orange
   size: 10 11 12
 
 Do  you need for color:red to return 1 or 2 (i.e. is the final answer
 in units of  child hits or parent hits)?

The final answer is the parent, which is shoes in this example.
So:
if the query is color:red AND size:10 the answer is: Yes, we got red shoes size 
10
if the query is color:red AND size:11 the answer is: Yes, we got red shoes size 
11
if the query is color:red AND size:12 the answer is: No, we don't have red 
shoes 
size 12

Thanks,
Otis


Re: Solr Autosuggest help

2011-03-17 Thread Otis Gospodnetic
Rahul,

Go to your Solr Admin Analysis page, enter sci/tech, check appropriate check 
boxes, and see how sci/tech gets analyzed.  This will lead you in the right 
direction.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: rahul asharud...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thu, March 17, 2011 10:12:27 AM
 Subject: Re: Solr Autosuggest help
 
 hi,
 
 We have found that 'EnglishPorterFilterFactory' causes that issue. I believe
 that is used for stemming words. Once we commented out that factory, it works
 fine.
 
 And another thing: currently I am checking how the word 'sci/tech'
 will be indexed in Solr. As mentioned in my previous email, if I search on
 sci/tech it won't return any results. But Solr has the term as sci/tech. When
 I search on other terms which also contain sci/tech, it returns both
 words.
 
 Please let me know if you have any idea regarding that. If I find out anything,
 I will update this thread.
 
 thanks.
 
 
 
 --
 View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Autosuggest-help-tp2580944p2693601.html
 Sent  from the Solr - User mailing list archive at Nabble.com.
 


Re: from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Jonathan Rochkind
Perhaps the easiest thing for you right now, which you can do in any version 
of Solr, is to translate your data at indexing time so you don't have to 
sort on a multi-valued field.  Put the values in an additional field used 
only for sorting, where at index time you put only the greatest or least 
value (your choice) from the multi-valued set, giving you a single-valued field.


Your earlier sorting on a multi-valued field, while Solr let you do it, was 
almost certainly producing unpredictable results in some cases that 
you just hadn't noticed. Better to fix it up so it's predictable and 
reliable instead, no?
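For illustration, a rough SolrJ-style sketch of that approach (the field names 
"author"/"author_sort" and the server URL are just assumptions, not part of your 
setup):

  import java.util.Arrays;
  import java.util.Collections;
  import java.util.List;

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class IndexWithSortField {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

      List<String> authors = Arrays.asList("Miller, B.", "Adams, A.", "Zuse, K.");

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "example-1");
      for (String a : authors) {
        doc.addField("author", a);                     // multiValued field for search/display
      }
      // single-valued companion field used only for sorting: here the least value
      doc.addField("author_sort", Collections.min(authors));

      server.add(doc);
      server.commit();
    }
  }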


On 3/17/2011 11:14 AM, Bernd Fehling wrote:

Hi Yonik,

actually some applications misused sorting on a multiValued field,
like VuFind. And as a matter oft fact also FAST doesn't support this
because it doesn't make sense.
FAST distinguishes between multiValue and singleValue by just adding
the seperator-FieldAttribute to the field. So I moved this from FAST
index-profile to Solr DIH and placed the seperator there.

But now I'm looking for a solution for VuFind.
Easiest thing would be to have a kind of casting, may be for copyField.

Regards,
Bernd


Am 17.03.2011 15:58, schrieb Yonik Seeley:

On Thu, Mar 17, 2011 at 10:34 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de   wrote:

Is there a way to have a kind of casting for copyField?

I have author names in multiValued string field and need a sorting on it,
but sort on field is only for multiValued=false.

I'm trying to get multiValued content from one field to a
non-multiValued text or string field for sorting.
And this, if possible, during loading with copyField.

Or any other solution?

I need this solution due to patch SOLR-2339, which is now more strict.
May be anyone else also.

Hmmm, you're the second person that's relied on that (sorting on a
multiValued field working).
Was SOLR-2339 a mistake?

-Yonik
http://lucidimagination.com


Re: Parent-child options

2011-03-17 Thread Yonik Seeley
On Thu, Mar 17, 2011 at 11:21 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Hi,



 - Original Message 
 From: Yonik Seeley yo...@lucidimagination.com
 Subject: Re: Parent-child options

 On Thu, Mar 17, 2011 at 1:49 AM, Otis Gospodnetic
 otis_gospodne...@yahoo.com  wrote:
  The dreaded parent-child without denormalization question.  What  are one's
  options for the following example:
 
  parent:  shoes
  3 children. each with 2 attributes/fields: color and size
    * color: red black orange
   * size: 10 11 12
 
  The goal is  to be able to search for:
  1) color:red AND size:10 and get 1 hit for the  above
  2) color:red AND size:12 and get *no* matches because there are no  red 
  shoes
of
  size 12, only size 10.

 What if you had this  instead:

   color: red red orange
   size: 10 11 12

 Do  you need for color:red to return 1 or 2 (i.e. is the final answer
 in units of  child hits or parent hits)?

 The final answer is the parent, which is shoes in this example.
 So:
 if the query is color:red AND size:10 the answer is: Yes, we got red shoes 
 size
 10
 if the query is color:red AND size:11 the answer is: Yes, we got red shoes 
 size
 11
 if the query is color:red AND size:12 the answer is: No, we don't have red 
 shoes
 size 12

Then yes, the join patch would work (as long as it's just filtering
and you don't need relevancy of child hits to propagate to the
parent).

parent {category:shoes}
child {parent:shoes, color:red, size:10}

q={!join from=parent to=category}color:red AND size:10

If you had a query on the parent type docs, the join could also be
used as an fq.
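For example, a rough SolrJ sketch of that usage (this assumes the join patch is 
applied and reuses the field names from the example above):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class JoinAsFilterExample {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

      SolrQuery q = new SolrQuery();
      q.setQuery("category:shoes");   // main query runs against the parent docs
      // child-side constraint applied as a filter via the join
      q.addFilterQuery("{!join from=parent to=category}color:red AND size:10");

      QueryResponse rsp = server.query(q);
      System.out.println("matching parents: " + rsp.getResults().getNumFound());
    }
  }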

-Yonik
http://lucidimagination.com


FuzzyQuery rewrite

2011-03-17 Thread Fabiano Nunes
Is rewriting fuzzy queries against the spellchecker index a good practice?
When I rewrite these queries against the main index, the rewrite time is about
3.5 - 4 secs. Against the spellchecker index, the rewrite takes a few milliseconds.
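A rough Lucene sketch of what I mean by rewriting against the spellchecker index 
(the index path, the 0.7 similarity and the "word" field name are assumptions):

  import java.io.File;

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.FuzzyQuery;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.store.FSDirectory;

  public class FuzzyRewriteSketch {
    public static void main(String[] args) throws Exception {
      // open the (much smaller) spellchecker index instead of the main index
      IndexReader spellReader = IndexReader.open(FSDirectory.open(new File("spellchecker")), true);

      FuzzyQuery fuzzy = new FuzzyQuery(new Term("word", "analsis"), 0.7f);
      Query rewritten = fuzzy.rewrite(spellReader);   // expands to the terms found in that index

      System.out.println(rewritten);
      spellReader.close();
    }
  }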


Re: from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Bernd Fehling

 Better to fix it up so it's predictable and reliable instead, no?

Yes, you are absolutely right. That's why I'm looking into this.

But how would I get, say, always author_1 from a multi-valued field
into a single-valued (string or text) field?

Ok, another solution comes to mind:
writing a processor for the updateRequestProcessorChain, that might work.
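A minimal sketch of such a processor (the field names "author"/"author_sort" are 
examples only, and package names may need adjusting for the Solr version in use):

  import java.io.IOException;
  import java.util.Collection;

  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.response.SolrQueryResponse;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;
  import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

  /** Copies the first value of the multiValued "author" field into the
   *  single-valued "author_sort" field before the document is indexed. */
  public class FirstAuthorToSortFieldFactory extends UpdateRequestProcessorFactory {
    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                              UpdateRequestProcessor next) {
      return new UpdateRequestProcessor(next) {
        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
          SolrInputDocument doc = cmd.getSolrInputDocument();
          Collection<Object> authors = doc.getFieldValues("author");
          if (authors != null && !authors.isEmpty()) {
            doc.setField("author_sort", authors.iterator().next().toString());
          }
          super.processAdd(cmd);
        }
      };
    }
  }

The factory would then be referenced from an updateRequestProcessorChain in
solrconfig.xml and that chain used by the update handler.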

Regards,
Bernd


Am 17.03.2011 16:27, schrieb Jonathan Rochkind:

Perhaps easiest thing for you right now, that you can do in any version of 
Solr, is translate your data at indexing time so you don't have to
sort on a multi-valued field. Put the stuff in an additional field for sorting, 
where at index time you only put the greatest or least value
(your choice) from a multi-valued set in, to have a single-valued field.

Your sorting on a multi-valued field before, while Solr let you, was almost 
certainly resulting in unpredictable results in some cases, that you
just hadn't noticed. Better to fix it up so it's predictable and reliable 
instead, no?

On 3/17/2011 11:14 AM, Bernd Fehling wrote:

Hi Yonik,

actually some applications misused sorting on a multiValued field,
like VuFind. And as a matter oft fact also FAST doesn't support this
because it doesn't make sense.
FAST distinguishes between multiValue and singleValue by just adding
the seperator-FieldAttribute to the field. So I moved this from FAST
index-profile to Solr DIH and placed the seperator there.

But now I'm looking for a solution for VuFind.
Easiest thing would be to have a kind of casting, may be for copyField.

Regards,
Bernd


Am 17.03.2011 15:58, schrieb Yonik Seeley:

On Thu, Mar 17, 2011 at 10:34 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:

Is there a way to have a kind of casting for copyField?

I have author names in multiValued string field and need a sorting on it,
but sort on field is only for multiValued=false.

I'm trying to get multiValued content from one field to a
non-multiValued text or string field for sorting.
And this, if possible, during loading with copyField.

Or any other solution?

I need this solution due to patch SOLR-2339, which is now more strict.
May be anyone else also.

Hmmm, you're the second person that's relied on that (sorting on a
multiValued field working).
Was SOLR-2339 a mistake?

-Yonik
http://lucidimagination.com


--
*
Bernd FehlingUniversitätsbibliothek Bielefeld
Dipl.-Inform. (FH)Universitätsstr. 25
Tel. +49 521 106-4060   Fax. +49 521 106-4052
bernd.fehl...@uni-bielefeld.de33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Re: Replication slows down massively during high load

2011-03-17 Thread Vadim Kisselmann
Hi Bill,

 You could always rsync the index dir and reload (old scripts).

I used them previously but was getting problems with them. The
application querying the Solr doesn't cause enough load on it to
trigger the issue. Yet.

 But this is still something we should investigate.

Indeed :-)

 See if the Nic is configured right? Routing? Speed of transfer?

Network doesn't seem to be the problem. Testing with iperf from slave
to master yields a full gigabit, even while Solrmeter is hammering the
server.

 Bill Bell

Vadim


Re: Parent-child options

2011-03-17 Thread Jonathan Rochkind
The standard answer, which is a kind of de-normalizing, is to index 
tokens like this:


red_10   red_11   orange_12

in another field, you could do these things with size first:

10_red 11_red 12_orange

Now if you want to see what sizes of red you have, you can do a facet 
query with facet.prefix=red_ .  You'll need to do a bit of 
parsing/interpreting client side to translate from the results you get 
(red_10, red_11) to telling the users sizes 10 and 11 are 
available.  The second field with size first lets you do the same thing 
to answer "what colors do we have in size X?".


That gets unmanageable with more than 2-3 facet combinations, but with 
just 2 (or, pushing it, 3), it can work out okay. You'd probably ALSO want 
to keep the facets you have with plain values red red orange etc., to 
support that first level of faceting for the user. There is a bit more work 
to do on the client side with this approach; Solr isn't just giving you 
exactly what you want in its response, you've got to have logic for 
when to use the top-level facets and when to go to that second-level 
combo facet (red_12), but it's do-able.
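A rough SolrJ sketch of querying that second-level field (the field name 
"color_size" and the URL are assumptions, following the token scheme above):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.FacetField;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class ComboFacetExample {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

      SolrQuery q = new SolrQuery("category:shoes");
      q.setFacet(true);
      q.addFacetField("color_size");        // field holding tokens like red_10, red_11
      q.set("facet.prefix", "red_");        // only the combos for the already-chosen color

      QueryResponse rsp = server.query(q);
      FacetField combos = rsp.getFacetField("color_size");
      if (combos != null && combos.getValues() != null) {
        for (FacetField.Count c : combos.getValues()) {
          // client-side parsing: "red_10" -> size "10"
          String size = c.getName().substring("red_".length());
          System.out.println("size " + size + " available (" + c.getCount() + ")");
        }
      }
    }
  }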


On 3/17/2011 11:21 AM, Otis Gospodnetic wrote:

Hi,



- Original Message 

From: Yonik Seeleyyo...@lucidimagination.com
Subject: Re: Parent-child options

On Thu, Mar 17, 2011 at 1:49 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com   wrote:

The dreaded parent-child without denormalization question.  What  are one's
options for the following example:

parent:  shoes
3 children. each with 2 attributes/fields: color and size
   * color: red black orange
  * size: 10 11 12

The goal is  to be able to search for:
1) color:red AND size:10 and get 1 hit for the  above
2) color:red AND size:12 and get *no* matches because there are no  red shoes

of

size 12, only size 10.

What if you had this  instead:

   color: red red orange
   size: 10 11 12

Do  you need for color:red to return 1 or 2 (i.e. is the final answer
in units of  child hits or parent hits)?

The final answer is the parent, which is shoes in this example.
So:
if the query is color:red AND size:10 the answer is: Yes, we got red shoes size
10
if the query is color:red AND size:11 the answer is: Yes, we got red shoes size
11
if the query is color:red AND size:12 the answer is: No, we don't have red shoes
size 12

Thanks,
Otis



memory not getting released in tomcat after pushing large documents

2011-03-17 Thread Geeta Subramanian
Hi,

I am very new to SOLR and facing a lot of issues when using SOLR to push large 
documents.
I have Solr running in Tomcat. I have allocated about 4 GB of memory (-Xmx), but 
when I push about twenty-five 100 MB documents it runs out of heap space and fails.

Also I tried pushing just 1 document. It went through successfully, but the Tomcat 
memory does not come down. It consumes about a gig of memory for just one 100 MB 
document and does not release it.

Please let me know if I am making any mistake in configuration or set up.

Here is the stack trace:
SEVERE: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
at java.lang.StringBuffer.append(StringBuffer.java:306)
at java.io.StringWriter.write(StringWriter.java:77)
at 
com.sun.org.apache.xml.internal.serializer.ToStream.processDirty(ToStream.java:1570)
at 
com.sun.org.apache.xml.internal.serializer.ToStream.characters(ToStream.java:1488)
at 
com.sun.org.apache.xml.internal.serializer.ToHTMLStream.characters(ToHTMLStream.java:1529)
at 
com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.characters(TransformerHandlerImpl.java:168)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
at 
org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
at 
org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
at 
org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
at 
org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
at 
org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175)
at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
at 
com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load(CVExtractingDocumentLoader.java:349)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)


Thanks for help,
Geeta



Info about Debugging SOLR in Eclipse

2011-03-17 Thread Geeta Subramanian
Hi,

Can someone please let me know the steps for debugging the Solr code in my 
Eclipse?

I tried to compile the source, use the resulting jars in the Tomcat where I am 
running Solr, and do remote debugging, but it did not stop at any breakpoint.
I also tried to write a sample standalone Java class to push a document, but I 
only stepped into the SolrJ classes, not the Solr server classes.


Please let me know if I am making any mistake.

Regards,
Geeta



Re: memory not getting released in tomcat after pushing large documents

2011-03-17 Thread Markus Jelsma
Hi,

25*100MB=2.5GB will most likely fail with just 4GB of heap space. But 
consecutive single `pushes` as you call it, of 25MB documents should work 
fine. Heap memory will only drop after the garbage collector comes along.

Cheers,

On Thursday 17 March 2011 17:12:46 Geeta Subramanian wrote:
 Hi,
 
 I am very new to SOLR and facing a lot of issues when using SOLR to push
 large documents. I have solr running in tomcat. I have allocated about 4gb
 memory (-Xmx) but I am pushing about twenty five 100 mb documents and
 gives heap space and fails.
 
 Also I tried pushing just 1 document. It went thru successfully, but the
 tomcat memory does not come down. It consumes about a gig memory for just
 one 100 mb document and does not release it.
 
 Please let me know if I am making any mistake in configuration/ or set up.
 
 Here is the stack trace:
 SEVERE: java.lang.OutOfMemoryError: Java heap space
   at java.util.Arrays.copyOf(Arrays.java:2882)
   at
 java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:
 100) at
 java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515) at
 java.lang.StringBuffer.append(StringBuffer.java:306)
   at java.io.StringWriter.write(StringWriter.java:77)
   at
 com.sun.org.apache.xml.internal.serializer.ToStream.processDirty(ToStream.
 java:1570) at
 com.sun.org.apache.xml.internal.serializer.ToStream.characters(ToStream.ja
 va:1488) at
 com.sun.org.apache.xml.internal.serializer.ToHTMLStream.characters(ToHTMLS
 tream.java:1529) at
 com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.charac
 ters(TransformerHandlerImpl.java:168) at
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecor
 ator.java:124) at
 org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.j
 ava:153) at
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecor
 ator.java:124) at
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecor
 ator.java:124) at
 org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:
 39) at
 org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
 at
 org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
 at
 org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:
 151) at
 org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.jav
 a:175) at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144) at
 org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142) at
 org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99) at
 com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load(CVEx
 tractingDocumentLoader.java:349) at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(Content
 StreamHandlerBase.java:54) at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBas
 e.java:131) at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleReque
 st(RequestHandlers.java:237) at
 org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
   at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java
 :337) at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.jav
 a:240) at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applicati
 onFilterChain.java:235) at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilter
 Chain.java:206) at
 filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.jav
 a:122)
 
 
 Thanks for help,
 Geeta
 
 
 
 
 
 
 
 
 
 
 
 
 
 **Legal Disclaimer***
 This communication may contain confidential and privileged material
 for the sole use of the intended recipient.  Any unauthorized review,
 use or distribution by others is strictly prohibited.  If you have
 received the message in error, please advise the sender by reply
 email and delete the message. Thank you.
 

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Info about Debugging SOLR in Eclipse

2011-03-17 Thread Markus Jelsma

http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-
eclipse



On Thursday 17 March 2011 17:17:30 Geeta Subramanian wrote:
 Hi,
 
 Can some please let me know the steps on how can I debug the solr code in
 my eclipse?
 
 I tried to compile the source, use the jars and place in tomcat where I am
 running solr. And do remote debugging, but it did not stop at any break
 point. I also tried to write a sample standalone java class to push the
 document. But I stopped at solr j classes and not solr server classes.
 
 
 Please let me know if I am making any mistake.
 
 Regards,
 Geeta
 
 
 
 
 
 
 
 
 
 
 
 
 
 **Legal Disclaimer***
 This communication may contain confidential and privileged material
 for the sole use of the intended recipient.  Any unauthorized review,
 use or distribution by others is strictly prohibited.  If you have
 received the message in error, please advise the sender by reply
 email and delete the message. Thank you.
 

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: memory not getting released in tomcat after pushing large documents

2011-03-17 Thread Yonik Seeley
On Thu, Mar 17, 2011 at 12:12 PM, Geeta Subramanian
gsubraman...@commvault.com wrote:
        at 
 com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load(CVExtractingDocumentLoader.java:349)

Looks like you're using a custom update handler.  Perhaps that's
accidentally hanging onto memory?

-Yonik
http://lucidimagination.com


RE: memory not getting released in tomcat after pushing large documents

2011-03-17 Thread Geeta Subramanian
Hi,

 Thanks for the reply.
I am sorry, the logs I posted from do have a custom update handler.

But I have a local setup which does not have a custom update handler (it is as 
downloaded from the Solr site), and even that gives me a heap space error.

at java.util.Arrays.copyOf(Unknown Source)  
at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)   
at java.lang.AbstractStringBuilder.append(Unknown Source)   
at java.lang.StringBuilder.append(Unknown Source)   
at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:257)  
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
 
at 
org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
   
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
 
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
 
at 
org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)   
 
at 
org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)   
at 
org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)   
at 
org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)  
 
at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175)
 
at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)   
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)   
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)  
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112) 
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193)
  
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
 
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
   
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337) 
 



Also, in general, if I post 25 * 100 MB docs to Solr, what would the ideal heap 
size be?
Also, I see that when I push a single document of 100 MB, the task manager shows 
about 900 MB of memory in use, and subsequent pushes keep the memory at about 
900 MB, so at what point can an OOM crash happen?

When I ran the YourKit profiler, I saw that around 1 GB of memory was consumed 
just by char[] and String[]. 
How can I find out who is creating these (is it SOLR or TIKA) and free up these 
objects?


Thank you so much for your time and help,



Regards,
Geeta



-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: 17 March, 2011 12:21 PM
To: solr-user@lucene.apache.org
Cc: Geeta Subramanian
Subject: Re: memory not getting released in tomcat after pushing large documents

On Thu, Mar 17, 2011 at 12:12 PM, Geeta Subramanian 
gsubraman...@commvault.com wrote:
        at 
 com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load(
 CVExtractingDocumentLoader.java:349)

Looks like you're using a custom update handler.  Perhaps that's accidentally 
hanging onto memory?

-Yonik
http://lucidimagination.com



RE: Info about Debugging SOLR in Eclipse

2011-03-17 Thread Geeta Subramanian
Hi Markus,

Thanks, I had already followed the steps on that site.
But I am not able to debug the Solr classes, though I am able to run Solr.

I want to see the code flow on the server side, especially the point where 
Solr calls Tika and gets the content back from Tika.

Thanks for the time & help,
Regards,
Geeta

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: 17 March, 2011 12:22 PM
To: solr-user@lucene.apache.org
Cc: Geeta Subramanian
Subject: Re: Info about Debugging SOLR in Eclipse


http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-
eclipse



On Thursday 17 March 2011 17:17:30 Geeta Subramanian wrote:
 Hi,
 
 Can some please let me know the steps on how can I debug the solr code 
 in my eclipse?
 
 I tried to compile the source, use the jars and place in tomcat where 
 I am running solr. And do remote debugging, but it did not stop at any 
 break point. I also tried to write a sample standalone java class to 
 push the document. But I stopped at solr j classes and not solr server 
 classes.
 
 
 Please let me know if I am making any mistake.
 
 Regards,
 Geeta
 
 
 
 
 
 
 
 
 
 
 
 
 
 **Legal Disclaimer***
 This communication may contain confidential and privileged material 
 for the sole use of the intended recipient.  Any unauthorized review, 
 use or distribution by others is strictly prohibited.  If you have 
 received the message in error, please advise the sender by reply email 
 and delete the message. Thank you.
 

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Sorting on multiValued fields via function query

2011-03-17 Thread Chris Hostetter

: But if lucene now can sort a multi-valued field without crashing when there
: are 'too many' unique values, and with easily described and predictable
: semantics (use the minimal value in the multi-valued field as sort key) --
: then it probably makes more sense for Solr to let you do that if you really
: want to, give you enough rope to hang yourself.

(Clarification: it's the *maximal* value that gets used by lucene in 
that situation) 

I disagree.  

If we do what you describe we'd be relying on users to recognize when the 
sort logic is silently doing something tricky under the covers and to make 
a conscious decision as to whether that is what they want, and if not, to 
change their indexing to account for it.  

That seems like a recipe for confusion and unexpected behavior.

With SOLR-2339 in place, we tell users explicitly and up front that what they 
are attempting to do can not work as specified and we force them to 
decide in advance how they want to deal with it -- by indexing either the 
lowest value or the highest value (or both in distinct fields).

As the code stands now: we fail fast and let the person building the index 
make a decision.  If we silently sort on the maximal value, we leave a nasty 
headache for people who don't realize they are misusing a multiValued 
field and then wonder why some sorts don't do what they expect in some 
situations.

Bottom line: from day 1, we have always documented that sorting on 
multiValued fields (or fields that produce more than one term per 
document) didn't work.  If people didn't notice that documentation, they 
aren't likely to notice any documentation that says it will sort on the 
maximal value either -- SOLR-2339 may introduce a pain point for people 
upgrading, but it introduces it early and loudly, not quietly at some 
arbitrary moment in the future when they're beating their heads against a 
desk wondering why some sort isn't working the way they expect it to 
because they added some more values to a few documents.




-Hoss


Segments and Memory Correlate?

2011-03-17 Thread danomano
Hi folks, I ran into a problem today where I am no longer able to execute any
queries :( due to Out of Memory issues.

I am in the process of investigating the use of different mergeFactors, or
even different merge policies altogether.
My question: if I have many segments (i.e. smaller segments), will
that also reduce the total RAM required for searching?  (My system
is currently allocated 8GB of RAM and has a ~255GB index.)  (I'm not fully up
on the 'default merge policy', but I believe with a mergeFactor of 10, that
would mean each segment should be approaching about 25GB, with ~543 million
documents.)

Of note: this is all running on 1 server.

As seen below.

SEVERE: java.lang.OutOfMemoryError: Java heap space
at
org.apache.lucene.search.cache.LongValuesCreator.fillLongValues(LongValuesCreator.java:141)
at
org.apache.lucene.search.cache.LongValuesCreator.validate(LongValuesCreator.java:84)
at
org.apache.lucene.search.cache.LongValuesCreator.create(LongValuesCreator.java:74)
at
org.apache.lucene.search.cache.LongValuesCreator.create(LongValuesCreator.java:37)
at
org.apache.lucene.search.FieldCacheImpl$Cache.createValue(FieldCacheImpl.java:155)
at
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:188)
at
org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:337)
at
org.apache.lucene.search.FieldComparator$LongComparator.setNextReader(FieldComparator.java:504)
at
org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:97)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:207)
at org.apache.lucene.search.Searcher.search(Searcher.java:101)
at
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1389)
at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1285)
at
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:344)
at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:273)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:210)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1324)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
at
com.openmarket.servletfilters.LogToCSVFilter.doFilter(LogToCSVFilter.java:89)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
at
com.openmarket.servletfilters.GZipAutoDeflateFilter.doFilter(GZipAutoDeflateFilter.java:66)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
...etc

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Segments-and-Memory-Correlate-tp2694747p2694747.html
Sent from the Solr - User mailing list archive at Nabble.com.


OOM for large files

2011-03-17 Thread Geeta Subramanian
Hi,



I am getting an OOM after posting a 100 MB document to SOLR, with this trace:

Exception in thread main org.apache.solr.common.SolrException: Java heap 
space  java.lang.OutOfMemoryError: Java heap space

at java.util.Arrays.copyOf(Unknown Source)

at java.lang.AbstractStringBuilder.expandCapacity(Unknown 
Source)

at java.lang.AbstractStringBuilder.append(Unknown Source)

at java.lang.StringBuilder.append(Unknown Source)

        at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:257)

at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)

at 
org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)

at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)

at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)

at 
org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)

at 
org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)

at 
org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)

at 
org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)

at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175)

at 
org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)

at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)

at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)

at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)

at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193)

at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)

at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)

at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)

at org.apache.solr.se







I have given it 1024M of memory.

But this still fails, so can somebody tell me the minimum heap size required, 
relative to file size, so that the document gets indexed successfully?



Also, just a weird question:

In Tika's code, there is a place where a char[] is initialized to 4096. Then, when 
this is used in a StringWriter and the array is full, it does an expandCapacity (as 
highlighted in the logs), which involves an array copy operation. So with just 4 KB, 
if I want to process a 100 MB document, a lot of char arrays will be generated and 
we need to depend on GC to clean them up.



Does anyone know whether, if I change the Tika code to initialize the char array to 
more than ~4 KB, there will be any performance improvement?



Thanks for your time,

Regards,

Geeta


Helpful new JVM parameters

2011-03-17 Thread Dyer, James
We're on the final stretch in getting our product database in Production with 
Solr.  We have 13m wide-ish records with quite a few stored fields in a 
single index (no shards).  We sort on at least a dozen fields and facet on 
20-30.  One thing that came up in QA testing is we were getting full gc's due 
to promotion failed conditions.  This led us to believe we were dealing with 
large objects being created and a fragmented old generation.  After improving, 
but not solving, the problem by tweaking conventional jvm parameters, our JVM 
expert learned about some newer tuning params included in Sun/Oracle's JDK 
1.6.0_24 (we're running RHEL x64, but I think these are available on other 
platforms too):

These 3 options dramatically reduced the number of objects getting promoted into the 
Old Gen, reducing fragmentation and CMS frequency & time:
-XX:+UseStringCache
-XX:+OptimizeStringConcat
-XX:+UseCompressedStrings

This uses compressed pointers on a 64-bit JVM, significantly reducing the 
memory & performance penalty of using a 64-bit JVM over a 32-bit one.  This reduced 
our new GC (ParNew) time significantly:
-XX:+UseCompressedOops

The default for this was causing CMS to begin too late sometimes.  (The 
documented 68% proved false in our case; we figured it was defaulting close 
to 90%.)  Much lower than 75%, though, and CMS ran far too often:
-XX:CMSInitiatingOccupancyFraction=75

This made the stop-the-world pauses during CMS much shorter:
-XX:+CMSParallelRemarkEnabled

We use these in conjunction with CMS/ParNew and a 22gb heap (64gb total on the 
box), with a 1.2G newSize/maxNewSize.

In case anyone else is having similar issues, we thought we would share our 
experience with these newer options.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311



Re: Helpful new JVM parameters

2011-03-17 Thread Jonathan Rochkind
Awesome, very helpful. Do you maybe want to add this to the Solr wiki 
somewhere?  Finding some advice for JVM tuning for Solr can be 
challenging, and you've explained what you did and why very well.


On 3/17/2011 2:59 PM, Dyer, James wrote:

We're on the final stretch in getting our product database in Production with Solr.  We have 13m 
wide-ish records with quite a few stored fields in a single index (no shards).  We sort on at 
least a dozen fields and facet on 20-30.  One thing that came up in QA testing is we were getting full gc's 
due to promotion failed conditions.  This led us to believe we were dealing with large objects 
being created and a fragmented old generation.  After improving, but not solving, the problem by tweaking 
conventional jvm parameters, our JVM expert learned about some newer tuning params included in 
Sun/Oracle's JDK 1.6.0_24 (we're running RHEL x64, but I think these are available on other platforms too):

These 3 options dramatically reduced the # objects getting promoted into the Old 
Gen, reducing fragmentation and CMS frequency  time:
-XX:+UseStringCache
-XX:+OptimizeStringConcat
-XX:+UseCompressedStrings

This uses compressed pointers on a 64-bit JVM, significantly reducing the 
memory  performance penalty in using a 64-bit jvm over 32-bit.  This reduced 
our new GC (ParNew) time significantly:
-XX:+UseCompressedOops

The default for this was causing CMS to begin too late sometimes.  (the 
documentated 68% proved false in our case.  We figured it was defaulting close 
to 90%)  Much lower than 75%, though, and CMS ran far too often:
-XX:CMSInitiatingOccupancyFraction=75

This made the stop-the-world pauses during CMS much shorter:
-XX:+CMSParallelRemarkEnabled

We use these in conjunction with CMS/ParNew and a 22gb heap (64gb total on the 
box), with a 1.2G newSize/maxNewSize.

In case anyone else is having similar issues, we thought we would share our 
experience with these newer options.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311




Re: Sorting on multiValued fields via function query

2011-03-17 Thread Yonik Seeley
On Thu, Mar 17, 2011 at 2:12 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
 As the code stands now: we fail fast and let the person building hte index
 make a decision.

Indexing two fields when one could work is unfortunate though.
I think what we should support (eventually) is a max() function that will also
work on a multi-valued field and select the maximum value (i.e. it will
simply bypass the check for multi-valued fields).

Then one can utilize sort-by-function to do
sort=max(author) asc

-Yonik
http://lucidimagination.com


dismax 1.4.1 and pure negative queries

2011-03-17 Thread Jonathan Rochkind

Should 1.4.1 dismax query parser be able to handle pure negative queries
like:

q="-foo"
q="-foo -bar"

It kind of seems to me, trying it out, that it can NOT.  Can anyone else
verify?  The documentation I can find doesn't say one way or another.
Which is odd because the documentation for the straight solr-lucene query
parser at http://wiki.apache.org/solr/SolrQuerySyntax suggests that the
straight solr-lucene query parser _can_ handle pure negative.  That
seems odd that the solr-lucene Q.P. can, but dismax can't? Maybe I'm
misinterpreting or misunderstanding my experimental results.




Re: Info about Debugging SOLR in Eclipse

2011-03-17 Thread Peter Keegan
Can you use jetty?
http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse

On Thu, Mar 17, 2011 at 12:17 PM, Geeta Subramanian 
gsubraman...@commvault.com wrote:

 Hi,

 Can some please let me know the steps on how can I debug the solr code in
 my eclipse?

 I tried to compile the source, use the jars and place in tomcat where I am
 running solr. And do remote debugging, but it did not stop at any break
 point.
 I also tried to write a sample standalone java class to push the document.
 But I stopped at solr j classes and not solr server classes.


 Please let me know if I am making any mistake.

 Regards,
 Geeta













 **Legal Disclaimer***
 This communication may contain confidential and privileged material
 for the sole use of the intended recipient.  Any unauthorized review,
 use or distribution by others is strictly prohibited.  If you have
 received the message in error, please advise the sender by reply
 email and delete the message. Thank you.
 



Re: Info about Debugging SOLR in Eclipse

2011-03-17 Thread Peter Keegan
The instructions refer to the 'Run configuration' menu. Did you try 'Debug
configurations'?


On Thu, Mar 17, 2011 at 3:27 PM, Peter Keegan peterlkee...@gmail.comwrote:

 Can you use jetty?


 http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse

 On Thu, Mar 17, 2011 at 12:17 PM, Geeta Subramanian 
 gsubraman...@commvault.com wrote:

 Hi,

 Can some please let me know the steps on how can I debug the solr code in
 my eclipse?

 I tried to compile the source, use the jars and place in tomcat where I am
 running solr. And do remote debugging, but it did not stop at any break
 point.
 I also tried to write a sample standalone java class to push the document.
 But I stopped at solr j classes and not solr server classes.


 Please let me know if I am making any mistake.

 Regards,
 Geeta













 **Legal Disclaimer***
 This communication may contain confidential and privileged material
 for the sole use of the intended recipient.  Any unauthorized review,
 use or distribution by others is strictly prohibited.  If you have
 received the message in error, please advise the sender by reply
 email and delete the message. Thank you.
 





Adding the suggest component

2011-03-17 Thread Brian Lamb
Hi all,

When I installed Solr, I downloaded the most recent version (1.4.1) I
believe. I wanted to implement the Suggester (
http://wiki.apache.org/solr/Suggester). I copied and pasted the information
there into my solrconfig.xml file but I'm getting the following error:

Error loading class 'org.apache.solr.spelling.suggest.Suggester'

I read up on this error and found that I needed to check out a newer version
from SVN. I checked out a full copy and copied the contents of
src/java/org/apache/spelling/suggest to the same location in my setup.
However, I am still receiving this error.

Did I not put the files in the right place? What am I doing incorrectly?

Thanks,

Brian Lamb


Re: memory not getting released in tomcat after pushing large documents

2011-03-17 Thread Yonik Seeley
In your solrconfig.xml,
Are you specifying ramBufferSizeMB or maxBufferedDocs?

-Yonik
http://lucidimagination.com


On Thu, Mar 17, 2011 at 12:27 PM, Geeta Subramanian
gsubraman...@commvault.com wrote:
 Hi,

  Thanks for the reply.
 I am sorry, the logs from where I posted does have a Custom Update Handler.

 But I have a local setup, which does not have a custome update handler, its 
 as its downloaded from SOLR site, even that gives me heap space.

 at java.util.Arrays.copyOf(Unknown Source)
        at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
        at java.lang.AbstractStringBuilder.append(Unknown Source)
        at java.lang.StringBuilder.append(Unknown Source)
        at org.apache.solr.handler.extraction.Solrtik   
 ContentHandler.characters(SolrContentHandler.java:257)
        at 
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at 
 org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
        at 
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at 
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at 
 org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
        at 
 org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
        at 
 org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
        at 
 org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
        at 
 org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175)
        at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)
        at 
 org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)
        at 
 org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
        at 
 org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
        at 
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193)
        at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at 
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)



 Also, in general, if I post 25 * 100 mb docs to solr, how much should be the 
 ideal heap space set?
 Also, I see that when I push a single document of 100 mb, in task manager I 
 see that about 900 mb memory is been used up, and some subsequent push keeps 
 the memory about 900mb, so at what point there can be OOM crash?

 When I ran the YourKit Profiler, I saw that around 1 gig of memory was just 
 consumed by char[] , String [].
 How can I find out who is creating these(is it SOLR or TIKA) and free up 
 these objects?


 Thank you so much for your time and help,



 Regards,
 Geeta



 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
 Sent: 17 March, 2011 12:21 PM
 To: solr-user@lucene.apache.org
 Cc: Geeta Subramanian
 Subject: Re: memory not getting released in tomcat after pushing large 
 documents

 On Thu, Mar 17, 2011 at 12:12 PM, Geeta Subramanian 
 gsubraman...@commvault.com wrote:
        at
 com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load(
 CVExtractingDocumentLoader.java:349)

 Looks like you're using a custom update handler.  Perhaps that's accidentally 
 hanging onto memory?

 -Yonik
 http://lucidimagination.com













 **Legal Disclaimer***
 This communication may contain confidential and privileged material
 for the sole use of the intended recipient.  Any unauthorized review,
 use or distribution by others is strictly prohibited.  If you have
 received the message in error, please advise the sender by reply
 email and delete the message. Thank you.
 



RE: memory not getting released in tomcat after pushing large documents

2011-03-17 Thread Geeta Subramanian
Hi Yonik,

I am not setting the ramBufferSizeMB or maxBufferedDocs params...
Do I need to for indexing?

Regards,
Geeta

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: 17 March, 2011 3:45 PM
To: Geeta Subramanian
Cc: solr-user@lucene.apache.org
Subject: Re: memory not getting released in tomcat after pushing large documents

In your solrconfig.xml,
Are you specifying ramBufferSizeMB or maxBufferedDocs?

-Yonik
http://lucidimagination.com


On Thu, Mar 17, 2011 at 12:27 PM, Geeta Subramanian 
gsubraman...@commvault.com wrote:
 Hi,

  Thanks for the reply.
 I am sorry, the logs from where I posted does have a Custom Update Handler.

 But I have a local setup, which does not have a custome update handler, its 
 as its downloaded from SOLR site, even that gives me heap space.

 at java.util.Arrays.copyOf(Unknown Source)
        at java.lang.AbstractStringBuilder.expandCapacity(Unknown 
 Source)
        at java.lang.AbstractStringBuilder.append(Unknown Source)
        at java.lang.StringBuilder.append(Unknown Source)
        at org.apache.solr.handler.extraction.Solrtik   
 ContentHandler.characters(SolrContentHandler.java:257)
        at 
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerD
 ecorator.java:124)
        at 
 org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandl
 er.java:153)
        at 
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerD
 ecorator.java:124)
        at 
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerD
 ecorator.java:124)
        at 
 org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.j
 ava:39)
        at 
 org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java
 :61)
        at 
 org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:
 113)
        at 
 org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.j
 ava:151)
        at 
 org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler
 .java:175)
        at 
 org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)
        at 
 org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)
        at 
 org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99
 )
        at 
 org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:11
 2)
        at 
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(Extra
 ctingDocumentLoader.java:193)
        at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(Con
 tentStreamHandlerBase.java:54)
        at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandle
 rBase.java:131)
        at 
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleR
 equest(RequestHandlers.java:237)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.
 java:337)



 Also, in general, if I post 25 * 100 mb docs to solr, how much should be the 
 ideal heap space set?
 Also, I see that when I push a single document of 100 mb, in task manager I 
 see that about 900 mb memory is been used up, and some subsequent push keeps 
 the memory about 900mb, so at what point there can be OOM crash?

 When I ran the YourKit Profiler, I saw that around 1 gig of memory was just 
 consumed by char[] , String [].
 How can I find out who is creating these(is it SOLR or TIKA) and free up 
 these objects?


 Thank you so much for your time and help,



 Regards,
 Geeta



 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik 
 Seeley
 Sent: 17 March, 2011 12:21 PM
 To: solr-user@lucene.apache.org
 Cc: Geeta Subramanian
 Subject: Re: memory not getting released in tomcat after pushing large 
 documents

 On Thu, Mar 17, 2011 at 12:12 PM, Geeta Subramanian 
 gsubraman...@commvault.com wrote:
        at
 com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load
 (
 CVExtractingDocumentLoader.java:349)

 Looks like you're using a custom update handler.  Perhaps that's accidentally 
 hanging onto memory?

 -Yonik
 http://lucidimagination.com













 **Legal Disclaimer***
 This communication may contain confidential and privileged material 
 for the sole use of the intended recipient.  Any unauthorized review, 
 use or distribution by others is strictly prohibited.  If you have 
 received the message in error, please advise the sender by reply email 
 and delete the message. Thank you.
 













Re: dismax 1.4.1 and pure negative queries

2011-03-17 Thread Markus Jelsma
Hi,

It works just as expected, but not in a phrase query. Get rid of your quotes 
and you'll be fine.

Cheers,

 Should 1.4.1 dismax query parser be able to handle pure negative queries
 like:
 
 q=-foo
 q=-foo -bar
 
 It kind of seems to me trying it out that it can NOT.  Can anyone else
 verify?  The documentation I can find doesn't say one way or another.
 Which is odd because the documentation for straight solr-lucene query
 parser athttp://wiki.apache.org/solr/SolrQuerySyntax  suggests that
 straight solr-lucene query parser_can_  handle pure negative.  That
 seems odd that solr-lucene Q.P. can, but dismax can't? Maybe I'm
 misinterpreting or misunderstanding my experimental results.


Re: dismax 1.4.1 and pure negative queries

2011-03-17 Thread Jonathan Rochkind
My fault for putting in the quotes in the email, I actually don't have 
quotes in my tests, just tried again to make sure.


And I always get 0 results on a pure negative Solr 1.4.1 dismax query. I 
think it does not actually work?


On 3/17/2011 3:52 PM, Markus Jelsma wrote:

Hi,

It works just as expected, but not in a phrase query. Get rid of your quotes
and you'll be fine.

Cheers,


Should 1.4.1 dismax query parser be able to handle pure negative queries
like:

q=-foo
q=-foo -bar

It kind of seems to me trying it out that it can NOT.  Can anyone else
verify?  The documentation I can find doesn't say one way or another.
Which is odd because the documentation for straight solr-lucene query
parser athttp://wiki.apache.org/solr/SolrQuerySyntax  suggests that
straight solr-lucene query parser_can_  handle pure negative.  That
seems odd that solr-lucene Q.P. can, but dismax can't? Maybe I'm
misinterpreting or misunderstanding my experimental results.


Re: dismax 1.4.1 and pure negative queries

2011-03-17 Thread Markus Jelsma
Oh i see, i overlooked your first query. A query with one term that is negated 
will yield zero results, it doesn't return all documents because nothing 
matches. It's, if i remember correctly, the same as when you're looking for a 
field that doesn't have a value: q=-field:[* TO *].

 My fault for putting in the quotes in the email, I actually don't have
 tests in my quotes, just tried again to make sure.
 
 And I always get 0 results on a pure negative Solr 1.4.1 dismax query. I
 think it does not actually work?
 
 On 3/17/2011 3:52 PM, Markus Jelsma wrote:
  Hi,
  
  It works just as expected, but not in a phrase query. Get rid of your
  quotes and you'll be fine.
  
  Cheers,
  
  Should 1.4.1 dismax query parser be able to handle pure negative queries
  like:
  
  q=-foo
  q=-foo -bar
  
  It kind of seems to me trying it out that it can NOT.  Can anyone else
  verify?  The documentation I can find doesn't say one way or another.
  Which is odd because the documentation for straight solr-lucene query
  parser athttp://wiki.apache.org/solr/SolrQuerySyntax  suggests that
  straight solr-lucene query parser_can_  handle pure negative.  That
  seems odd that solr-lucene Q.P. can, but dismax can't? Maybe I'm
  misinterpreting or misunderstanding my experimental results.


Re: dismax 1.4.1 and pure negative queries

2011-03-17 Thread Erik Hatcher
purely negative queries work with Solr's default (lucene) query parser.  But 
don't with dismax.   Or so that seems with my experience testing this out just 
now, on trunk.

In chatting with Jonathan further off-list we discussed having the best of both 
worlds 

   q={!lucene}*:* AND NOT _query_:"{!dismax ...}<inverse of original negative query>"

But this of course requires detecting that a query is all negative.  edismax 
can handle purely negative, FWIW, -ipod = +(-DisjunctionMaxQuery((text:ipod)) 
+MatchAllDocsQuery(*:*))

Erik



On Mar 17, 2011, at 16:45 , Markus Jelsma wrote:

 Oh i see, i overlooked your first query. A query with one term that is 
 negated 
 will yield zero results, it doesn't return all documents because nothing 
 matches. It's, if i remember correctly, the same as when you're looking for a 
 field that doesn't have a value: q=-field:[* TO *].
 
 My fault for putting in the quotes in the email, I actually don't have
 tests in my quotes, just tried again to make sure.
 
 And I always get 0 results on a pure negative Solr 1.4.1 dismax query. I
 think it does not actually work?
 
 On 3/17/2011 3:52 PM, Markus Jelsma wrote:
 Hi,
 
 It works just as expected, but not in a phrase query. Get rid of your
 quotes and you'll be fine.
 
 Cheers,
 
 Should 1.4.1 dismax query parser be able to handle pure negative queries
 like:
 
 q=-foo
 q=-foo -bar
 
 It kind of seems to me trying it out that it can NOT.  Can anyone else
 verify?  The documentation I can find doesn't say one way or another.
 Which is odd because the documentation for straight solr-lucene query
 parser athttp://wiki.apache.org/solr/SolrQuerySyntax  suggests that
 straight solr-lucene query parser_can_  handle pure negative.  That
 seems odd that solr-lucene Q.P. can, but dismax can't? Maybe I'm
 misinterpreting or misunderstanding my experimental results.



Re: dismax 1.4.1 and pure negative queries

2011-03-17 Thread Jonathan Rochkind
Yeah, looks to me like two or more negated terms do the same thing, 
not just one.


q=-foo -bar -baz

Also always returns zero hits. For the same reason. I understand why 
(sort of), although at the same time there is a logical answer to this 
question -foo -bar -baz, and oddly, 1.4.1 _lucene_ query parser _can_ 
handle it.


Erik Hatcher in IRC gave me one transformation of this query that still 
uses dismax as a unit, but can get you a solution.  (I want to use 
dismax in this case for its convenient aggregation of multiple fields 
in qf, not so much for actual disjunction-maximum behavior).


defType=lucene
q=*:* AND NOT _query_:{!dismax} foo bar baz

I might be able to work with that in my situation.  But it also seems 
like something that dismax could take care of for you in such a 
situation. It looks from the documentation like the newer (not in 1.4.1) 
edismax does in at least some cases, where the pure negative query is 
inside grouping/subquery parens, it's not clear to me if it does it in 
general or not.


On 3/17/2011 4:45 PM, Markus Jelsma wrote:

Oh i see, i overlooked your first query. A query with one term that is negated
will yield zero results, it doesn't return all documents because nothing
matches. It's, if i remember correctly, the same as when you're looking for a
field that doesn't have a value: q=-field:[* TO *].


My fault for putting in the quotes in the email, I actually don't have
tests in my quotes, just tried again to make sure.

And I always get 0 results on a pure negative Solr 1.4.1 dismax query. I
think it does not actually work?

On 3/17/2011 3:52 PM, Markus Jelsma wrote:

Hi,

It works just as expected, but not in a phrase query. Get rid of your
quotes and you'll be fine.

Cheers,


Should 1.4.1 dismax query parser be able to handle pure negative queries
like:

q=-foo
q=-foo -bar

It kind of seems to me trying it out that it can NOT.  Can anyone else
verify?  The documentation I can find doesn't say one way or another.
Which is odd because the documentation for straight solr-lucene query
parser athttp://wiki.apache.org/solr/SolrQuerySyntax  suggests that
straight solr-lucene query parser_can_  handle pure negative.  That
seems odd that solr-lucene Q.P. can, but dismax can't? Maybe I'm
misinterpreting or misunderstanding my experimental results.


RE: Info about Debugging SOLR in Eclipse

2011-03-17 Thread Geeta Subramanian
Hi All,

Thanks for the help... I am now able to debug my solr. :-)

-Original Message-
From: pkeegan01...@gmail.com [mailto:pkeegan01...@gmail.com] On Behalf Of Peter 
Keegan
Sent: 17 March, 2011 3:33 PM
To: solr-user@lucene.apache.org
Subject: Re: Info about Debugging SOLR in Eclipse

The instructions refer to the 'Run configuration' menu. Did you try 'Debug 
configurations'?


On Thu, Mar 17, 2011 at 3:27 PM, Peter Keegan peterlkee...@gmail.comwrote:

 Can you use jetty?


 http://www.lucidimagination.com/developers/articles/setting-up-apache-
 solr-in-eclipse

 On Thu, Mar 17, 2011 at 12:17 PM, Geeta Subramanian  
 gsubraman...@commvault.com wrote:

 Hi,

 Can someone please let me know the steps for how I can debug the solr 
 code in my eclipse?

 I tried to compile the source, use the jars and place them in the tomcat 
 where I am running solr, and do remote debugging, but it did not stop at 
 any break point.
 I also tried to write a sample standalone java class to push the document,
 but it only stopped at SolrJ classes and not solr server classes.


 Please let me know if I am making any mistake.

 Regards,
 Geeta
















Re: dismax 1.4.1 and pure negative queries

2011-03-17 Thread Jonathan Rochkind

On 3/17/2011 5:02 PM, Jonathan Rochkind wrote:

defType=lucene
q=*:* AND NOT _query_:{!dismax} foo bar baz



Oops, forgot a part, for anyone reading this and wanting to use it as a 
solution.


You can transform:

$defType=dismax
q=-foo -bar -baz

To:

defType=lucene
q=*:* AND NOT _query_:{!dismax mm=1}foo bar baz

And have basically equivalent semantics to what you meant but which 
dismax won't do.  The mm=1 is important, left that out before.
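
For anyone wiring this up from a client, here is a rough SolrJ sketch of that same 
transformation. The core URL and the qf fields are placeholders, not taken from anyone's 
actual setup, and note that the nested dismax query needs to be quoted inside the q string:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class NegativeDismaxExample {
    public static void main(String[] args) throws Exception {
        // Placeholder core URL -- point this at your own Solr instance.
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Instead of defType=dismax with q=-foo -bar -baz (zero hits in 1.4.1),
        // match everything and subtract the positive dismax query, with mm=1.
        SolrQuery query = new SolrQuery();
        query.setParam("defType", "lucene");
        query.setQuery("*:* AND NOT _query_:\"{!dismax qf='name content' mm=1}foo bar baz\"");

        QueryResponse rsp = server.query(query);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}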


Jonathan



I might be able to work with that in my situation.  But it also seems
like something that dismax could take care of for you in such a
situation. It looks from the documentation like the newer (not in 1.4.1)
edismax does in at least some cases, where the pure negative query is
inside grouping/subquery parens, it's not clear to me if it does it in
general or not.

On 3/17/2011 4:45 PM, Markus Jelsma wrote:

Oh i see, i overlooked your first query. A query with one term that is negated
will yield zero results, it doesn't return all documents because nothing
matches. It's, if i remember correctly, the same as when you're looking for a
field that doesn't have a value: q=-field:[* TO *].


My fault for putting in the quotes in the email, I actually don't have
tests in my quotes, just tried again to make sure.

And I always get 0 results on a pure negative Solr 1.4.1 dismax query. I
think it does not actually work?

On 3/17/2011 3:52 PM, Markus Jelsma wrote:

Hi,

It works just as expected, but not in a phrase query. Get rid of your
quotes and you'll be fine.

Cheers,


Should 1.4.1 dismax query parser be able to handle pure negative queries
like:

q=-foo
q=-foo -bar

It kind of seems to me trying it out that it can NOT.  Can anyone else
verify?  The documentation I can find doesn't say one way or another.
Which is odd because the documentation for straight solr-lucene query
parser athttp://wiki.apache.org/solr/SolrQuerySyntax  suggests that
straight solr-lucene query parser_can_  handle pure negative.  That
seems odd that solr-lucene Q.P. can, but dismax can't? Maybe I'm
misinterpreting or misunderstanding my experimental results.


DIH Issue(newbie to solr)

2011-03-17 Thread neha
I am a newbie to solr. I have an issue with DIH but am unable to pinpoint what is
causing it. I am using the demo jetty installation of Solr and tried
to create a project with new schema.xml, solrconfig.xml and data-config.xml
files. When I run
http://131.187.88.221:8983/solr/dataimport?command=full-import; this is
what I get:
I am unable to index documents(it doesn't throw any error though).

##

[DIH status response; the XML markup was stripped by the mail archive. The
recoverable values were: config file test-data-config.xml, command full-import,
status idle, counters 0 / 1 / 0, timestamps 2011-03-17 17:07:18, the status
message "Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.",
time taken 0:0:0.119, and the note "This response format is experimental.  It is
likely to change in the future."]


#

I do not find any log files(except on the console). And here are the
messages from the console:

###
INFO: Starting Full Import
Mar 17, 2011 5:08:20 PM org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties
INFO: Read dataimport.properties
Mar 17, 2011 5:08:20 PM org.apache.solr.update.DirectUpdateHandler2
deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Mar 17, 2011 5:08:20 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
   
commit{dir=/local/home/neha/ruby/Solr/apache-solr-1.4.1/example/solr/data/index,segFN=segments_k,version=1300286691490,generation=20,filenames=[segments_k]
Mar 17, 2011 5:08:20 PM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 1300286691490
Mar 17, 2011 5:08:20 PM org.apache.solr.handler.dataimport.DocBuilder finish
INFO: Import completed successfully
Mar 17, 2011 5:08:20 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit(optimize=true,waitFlush=false,waitSearcher=true,expungeDeletes=false)
Mar 17, 2011 5:08:20 PM org.apache.solr.core.SolrDeletionPolicy onCommit
INFO: SolrDeletionPolicy.onCommit: commits:num=2
   
commit{dir=/local/home/neha/ruby/Solr/apache-solr-1.4.1/example/solr/data/index,segFN=segments_k,version=1300286691490,generation=20,filenames=[segments_k]
   
commit{dir=/local/home/neha/ruby/Solr/apache-solr-1.4.1/example/solr/data/index,segFN=segments_l,version=1300286691491,generation=21,filenames=[segments_l]
Mar 17, 2011 5:08:20 PM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 1300286691491
Mar 17, 2011 5:08:20 PM org.apache.solr.search.SolrIndexSearcher 
INFO: Opening Searcher@d1329 main
Mar 17, 2011 5:08:20 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@d1329 main from Searcher@1dcc2a3 main
   
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=8,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0,item_subject_topic_facet={field=subject_topic_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_subject_geo_facet={field=subject_geo_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_subject_era_facet={field=subject_era_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_pub_date={field=pub_date,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_language_facet={field=language_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_lc_b4cutter_facet={field=lc_b4cutter_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_lc_alpha_facet={field=lc_alpha_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_lc_1letter_facet={field=lc_1letter_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2}}
Mar 17, 2011 5:08:20 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for Searcher@d1329 main
   
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Mar 17, 2011 5:08:20 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@d1329 main from Searcher@1dcc2a3 main
   
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=2,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Mar 17, 2011 5:08:20 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for Searcher@d1329 main
   
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=2,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Mar 17, 2011 5:08:20 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@d1329 main from Searcher@1dcc2a3 main
   

Retrieving Ranking (Position)

2011-03-17 Thread Jae Joo
Hi,

I am looking for a way to retrieve the ranking (or position) of a matched
document in the result set.

I can get the data and then parse it to find the position of the matched
document, but I am wondering whether there is a built-in feature for this.

Thanks,

Jae
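
I'm not aware of a built-in parameter for this, so as a client-side fallback here is a 
rough SolrJ sketch that walks the result set of a query and reports the 1-based position 
of a given document. The core URL, the unique key field name ("id"), the sample query and 
the page size are assumptions, not taken from Jae's setup:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class PositionFinder {
    /** Returns the 1-based rank of docId in the result set of q, or -1 if it never matches. */
    static long findPosition(SolrServer server, String q, String docId) throws Exception {
        int rows = 100;                                   // page size, pick what suits you
        for (int start = 0; ; start += rows) {
            SolrQuery query = new SolrQuery(q).setStart(start).setRows(rows);
            query.setFields("id");                        // only the unique key is needed
            SolrDocumentList page = server.query(query).getResults();
            for (int i = 0; i < page.size(); i++) {
                SolrDocument doc = page.get(i);
                if (docId.equals(String.valueOf(doc.getFieldValue("id")))) {
                    return start + i + 1;                 // 1-based position in the ranking
                }
            }
            if (start + rows >= page.getNumFound()) {
                return -1;                                // walked the whole result set
            }
        }
    }

    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        System.out.println(findPosition(server, "title:lucene", "DOC-42"));
    }
}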


Re: memory not getting released in tomcat after pushing large documents

2011-03-17 Thread Yonik Seeley
On Thu, Mar 17, 2011 at 3:55 PM, Geeta Subramanian
gsubraman...@commvault.com wrote:
 Hi Yonik,

 I am not setting the ramBufferSizeMB or maxBufferedDocs params...
 DO I need to for Indexing?

No, the default settings that come with Solr should be fine.
You should verify that they have not been changed however.

An older solrconfig that used maxBufferedDocs could cause an OOM with
large documents since it buffered a certain amount of documents
instead a certain amount of RAM.

Perhaps post your solrconfig (or at least the sections related to
index configuration).

-Yonik
http://lucidimagination.com


 Regards,
 Geeta

 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
 Sent: 17 March, 2011 3:45 PM
 To: Geeta Subramanian
 Cc: solr-user@lucene.apache.org
 Subject: Re: memory not getting released in tomcat after pushing large 
 documents

 In your solrconfig.xml,
 Are you specifying ramBufferSizeMB or maxBufferedDocs?

 -Yonik
 http://lucidimagination.com


 On Thu, Mar 17, 2011 at 12:27 PM, Geeta Subramanian 
 gsubraman...@commvault.com wrote:
 Hi,

  Thanks for the reply.
 I am sorry, the logs I posted from do have a Custom Update Handler.

 But I have a local setup which does not have a custom update handler (it is 
 as downloaded from the SOLR site), and even that gives me a heap space error.

        at java.util.Arrays.copyOf(Unknown Source)
        at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
        at java.lang.AbstractStringBuilder.append(Unknown Source)
        at java.lang.StringBuilder.append(Unknown Source)
        at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:257)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175)
        at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)



 Also, in general, if I post 25 * 100 mb docs to solr, how much should be the 
 ideal heap space set?
 Also, I see that when I push a single document of 100 mb, in task manager I 
 see that about 900 mb memory is been used up, and some subsequent push keeps 
 the memory about 900mb, so at what point there can be OOM crash?

 When I ran the YourKit Profiler, I saw that around 1 gig of memory was just 
 consumed by char[] , String [].
 How can I find out who is creating these(is it SOLR or TIKA) and free up 
 these objects?


 Thank you so much for your time and help,



 Regards,
 Geeta



 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
 Seeley
 Sent: 17 March, 2011 12:21 PM
 To: solr-user@lucene.apache.org
 Cc: Geeta Subramanian
 Subject: Re: memory not getting released in tomcat after pushing large
 documents

 On Thu, Mar 17, 2011 at 12:12 PM, Geeta Subramanian 
 gsubraman...@commvault.com wrote:
        at com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load(CVExtractingDocumentLoader.java:349)

 Looks like you're using a custom update handler.  Perhaps that's 
 accidentally hanging onto memory?

 -Yonik
 http://lucidimagination.com














Re: Rename fields in a query

2011-03-17 Thread Ahmet Arslan
 Given a Query object (name:firefox
 name:opera), is it possible 'rename'
 the fields names to, for example, (content:firefox
 content:opera)?

By saying object, do you mean solrJ?

Anyway, if that helps, with the df parameter you can change fields. 

q=firefox opera&df=name will be parsed into name:firefox name:opera 
q=firefox opera&df=content will be parsed into content:firefox content:opera
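
If a client-side illustration helps, a minimal SolrJ sketch of the same idea (the core URL 
and the field name are placeholders only):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DefaultFieldExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("firefox opera");
        query.setParam("df", "content");   // same terms, now parsed against the content field
        System.out.println(server.query(query).getResults().getNumFound());
    }
}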


  


Re: memory not getting released in tomcat after pushing large documents

2011-03-17 Thread Yonik Seeley
On Thu, Mar 17, 2011 at 5:50 PM, Geeta Subramanian
gsubraman...@commvault.com wrote:
 Here is the attached xml.
 In our xml, maxBufferedDocs is commented. I hope that's not causing any issue.
 The ramBufferSizeMB is 32MB, will changing this be of any use to me?

Nope... your index settings are fine.
Perhaps something in extracting request handler or tika is holding onto memory.
Has anyone else experienced/reproduced this?

Geeta, can you open a JIRA issue?  If you're actually giving the JVM
4G of heap (is this a 64 bit JVM?), this looks like a bug somewhere.

-Yonik
http://lucidimagination.com


Spacial Search Field Type

2011-03-17 Thread Tanner Postert
I am using Solr 1.4.1 (Solr Implementation Version: 1.4.1 955763M - mark -
2010-06-17 18:06:42) to be exact.

I'm trying to implement the GeoSpatial field type by adding to the schema:

 <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_latLon"/>
<dynamicField name="*_latlon" type="location" index="true" stored="true" />

<field name="geo" type="location" index="true" stored="true" multiValued="false" />


but I get the following errors:


org.apache.solr.common.SolrException: Unknown fieldtype 'location'
specified on field geo

and


org.apache.solr.common.SolrException: Error loading class 'solr.LatLonType'


I thought I read that you had to have Solr 4.0 for the LatLon field
type, but isn't 1.4 = 4.0? Do I need some type of patch or different
version of Solr to use that field type?


Re: Spacial Search Field Type

2011-03-17 Thread Ahmet Arslan
 I thought I read that you had to have Solr 4.0 for the
 LatLon field
 type, but isn't 1.4 = 4.0? Do I need some type of patch or
 different
 version of Solr to use that field type?

No, 1.4 and 4.0 are different. You can checkout trunk

http://wiki.apache.org/solr/HowToContribute#Getting_the_source_code 


  


Re: Rename fields in a query

2011-03-17 Thread Fabiano Nunes
hi, Arslan!
By object, I meant an instance of [org.apache.lucene.search.Query].
For performance purposes, I want to rewrite a fuzzy query in one field and
then query in another.

Thank you!
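
For illustration only, here is a rough Lucene sketch of one way to do such a rewrite for 
term and boolean queries; fuzzy, phrase and other query types would need analogous 
branches, and this is written against the Lucene API that Solr 1.4 ships with, so treat 
the class and method names as assumptions to verify against your version:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class FieldRenamer {
    /** Rebuilds a query, mapping TermQuery fields from one name to another. */
    static Query rename(Query q, String from, String to) {
        if (q instanceof TermQuery) {
            Term t = ((TermQuery) q).getTerm();
            if (t.field().equals(from)) {
                TermQuery renamed = new TermQuery(new Term(to, t.text()));
                renamed.setBoost(q.getBoost());
                return renamed;
            }
            return q;
        }
        if (q instanceof BooleanQuery) {
            BooleanQuery rewritten = new BooleanQuery();
            for (BooleanClause clause : ((BooleanQuery) q).clauses()) {
                rewritten.add(rename(clause.getQuery(), from, to), clause.getOccur());
            }
            rewritten.setBoost(q.getBoost());
            return rewritten;
        }
        return q; // other query types (fuzzy, phrase, ...) would need their own branches
    }
}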


On Thu, Mar 17, 2011 at 18:43, Ahmet Arslan iori...@yahoo.com wrote:

  Given a Query object (name:firefox
  name:opera), is it possible 'rename'
  the fields names to, for example, (content:firefox
  content:opera)?

 By saying object you mean solrJ?

 Anyway, it that helps, with df parameter you can change fields.

 q=firefox operadf=name  will be parsed into name:firefox name:opera
 q=firefox operadf=content will be parsed into content:firefox
 content:opera






Re: Smart Pagination queries

2011-03-17 Thread Chris Hostetter


: In order to paint Next links app would have to know total number of
: records that user is eligible for read. getNumFound() will tell me that
: there are total 4K records that Solr returned. If there wasn't any
: entitlement rules then it could have been easier to determine how many
: Next links to paint and when user clicks on Next pass in start
: position appropriately in solr query. Since I have to apply post filter as
: and when results are fetched from Solr is there a better way to achieve

In an ideal world, you would do this using a custom plugin -- either a 
SearchComponent or a QParser used in a filter query.

if you really have to do this client side, then a few basic rules come to 
mind...

1) always over-request.  if you estimate that your user can only view 1/X 
docs in your total collection, and you want to show Y results per page, 
then your rows param should be at least 2*X*Y (i picked 2 just for good 
measure, just because you know the average doesn't mean you know the real 
distribution)

2) however many rows you get back, you need to keep track of the real 
start param you used, and at what point in the current page you had enough docs 
to show the user -- that will determine your next start param.

3) whether you have a next link or not depends on:
3a) whether you had any left over the first time you over-requested (see 
#2 above)
3b) whether numFound was greater than the index of the last item you got.
...if 3a and 3b are both false, you definitely don't need a next link. 
 if either of them is true then you probably *should* give them a next 
link, but you still need to be prepared for the possibility that you won't 
have any more docs (they might only be half way through the result set, 
but every remaining doc might be something they aren't allowed to see)

there's really no clean way to avoid the possibility completely, unless 
you really crank up how aggressively you over-request -- ultimately if you 
over-request *all* matches, then you can know definitively whether to give 
them a next link at any point.
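
Below is a rough Java sketch of rules 1-3, with made-up SearchClient, Entitlements and 
SearchResults interfaces standing in for whatever search client and entitlement service 
you actually have; none of the names come from a real API:

import java.util.ArrayList;
import java.util.List;

public class EntitledPage {
    final List<String> idsToShow = new ArrayList<String>();
    int nextStart = -1;           // start param for the "next" link, -1 if there is none
    boolean maybeMore = false;    // whether a "next" link should be offered at all

    /** One page of entitled results; pageSize is Y, overRequestFactor roughly 2*X from rule #1. */
    static EntitledPage fetch(SearchClient solr, Entitlements user,
                              String q, int start, int pageSize, int overRequestFactor) {
        EntitledPage page = new EntitledPage();
        int rows = pageSize * overRequestFactor;              // rule #1: always over-request
        SearchResults results = solr.search(q, start, rows);

        int i = 0;
        for (; i < results.size() && page.idsToShow.size() < pageSize; i++) {
            if (user.canRead(results.id(i))) {                // client-side post filter
                page.idsToShow.add(results.id(i));
            }
        }
        page.nextStart = start + i;                           // rule #2: where we really stopped
        boolean leftover = i < results.size();                              // rule #3a
        boolean moreInIndex = results.numFound() > start + results.size();  // rule #3b
        page.maybeMore = leftover || moreInIndex;
        if (!page.maybeMore) page.nextStart = -1;
        return page;
    }

    /** Hypothetical minimal interfaces, only here to keep the sketch self-contained. */
    interface SearchClient { SearchResults search(String q, int start, int rows); }
    interface Entitlements { boolean canRead(String docId); }
    interface SearchResults { int size(); String id(int i); long numFound(); }
}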

-Hoss


Re: Custom search filters

2011-03-17 Thread Chris Hostetter

: Hi all, I am trying to use a custom search filter
: (org.apache.lucene.search.Filter) but I am unsure of where I should configure
: this.
: 
: Would I have to create my own SearchHandler that would wrap this logic in? Any
: example/suggestions out there?

the easiest way to plug in a custom Filter is to wrap it in a 
ConstantScoreQuery and use it as part of the filters that 
SolrIndexSearcher applies (that way it will be cached independently and 
can be reused)

you could do this in a SearchComponent where you decide when to 
generate the Filter based on query params and then add it explicitly (see 
ResponseBuilder.getFilters()).

or you could do it in a QParserPlugin, in which case clients 
could optionally enable it by referring to your QParser by name in the 
local params of an fq param.
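
A rough sketch of the SearchComponent route, against the 1.4 APIs as I understand them; 
the class name and the filter-building method are made up, and you would still register 
the component in solrconfig.xml alongside the standard ones:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.Query;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class MyFilterComponent extends SearchComponent {

    // Hypothetical factory for your own org.apache.lucene.search.Filter.
    private Filter buildCustomFilter(ResponseBuilder rb) {
        return null; // replace with your real Filter, built from rb.req.getParams()
    }

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        Filter custom = buildCustomFilter(rb);
        if (custom == null) return;

        // Wrap the Filter so SolrIndexSearcher can treat (and cache) it like any other fq.
        Query wrapped = new ConstantScoreQuery(custom);

        List<Query> filters = rb.getFilters();
        if (filters == null) {
            filters = new ArrayList<Query>();
            rb.setFilters(filters);
        }
        filters.add(wrapped);
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // nothing to do at process time for this sketch
    }

    @Override
    public String getDescription() { return "adds a custom Filter as an extra fq"; }

    @Override
    public String getSourceId() { return ""; }

    @Override
    public String getSource() { return ""; }

    @Override
    public String getVersion() { return ""; }
}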



-Hoss


Re: Helpful new JVM parameters

2011-03-17 Thread Li Li
will UseCompressedOops be useful? for applications using less than 4GB of
memory, it will be better than a 64-bit reference. But for applications
using larger heaps, it will not be cache friendly.
The JRockit definitive guide says: "Naturally, 64 GB isn't a
theoretical limit but just an example. It was mentioned because
compressed references on 64-GB heaps have proven beneficial compared
to full 64-bit pointers in some benchmarks and applications. What
really matters is how many bits can be spared and the performance
benefit of this approach. In some cases, it might just be easier to
use full length 64-bit pointers."

2011/3/18 Dyer, James james.d...@ingrambook.com:
 We're on the final stretch in getting our product database in Production with 
 Solr.  We have 13m wide-ish records with quite a few stored fields in a 
 single index (no shards).  We sort on at least a dozen fields and facet on 
 20-30.  One thing that came up in QA testing is we were getting full gc's due 
 to promotion failed conditions.  This led us to believe we were dealing 
 with large objects being created and a fragmented old generation.  After 
 improving, but not solving, the problem by tweaking conventional jvm 
 parameters, our JVM expert learned about some newer tuning params included in 
 Sun/Oracle's JDK 1.6.0_24 (we're running RHEL x64, but I think these are 
 available on other platforms too):

 These 3 options dramatically reduced the # objects getting promoted into the 
 Old Gen, reducing fragmentation and CMS frequency & time:
 -XX:+UseStringCache
 -XX:+OptimizeStringConcat
 -XX:+UseCompressedStrings

 This uses compressed pointers on a 64-bit JVM, significantly reducing the 
 memory & performance penalty in using a 64-bit jvm over 32-bit.  This reduced 
 our new GC (ParNew) time significantly:
 -XX:+UseCompressedOops

 The default for this was causing CMS to begin too late sometimes.  (the 
 documented 68% proved false in our case.  We figured it was defaulting 
 close to 90%)  Much lower than 75%, though, and CMS ran far too often:
 -XX:CMSInitiatingOccupancyFraction=75

 This made the stop-the-world pauses during CMS much shorter:
 -XX:+CMSParallelRemarkEnabled

 We use these in conjunction with CMS/ParNew and a 22gb heap (64gb total on 
 the box), with a 1.2G newSize/maxNewSize.

 In case anyone else is having similar issues, we thought we would share our 
 experience with these newer options.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311




RE: Helpful new JVM parameters

2011-03-17 Thread Dyer, James
Our tests showed, in our situation, the compressed oops flag caused our minor 
(ParNew) generation time to decrease significantly.   We're using a larger heap 
(22gb) and our index size is somewhere in the 40's gb total.  I guess with any 
of these jvm parameters, it all depends on your situation and you need to test. 
 In our case, this flag solved a real problem we were having.  Whoever wrote 
the JRockit book you refer to no doubt had other scenarios in mind...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Li Li [mailto:fancye...@gmail.com] 
Sent: Thursday, March 17, 2011 10:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Helpful new JVM parameters

will UseCompressedOops be useful? for application using less than 4GB
memory, it will be better that 64bit reference. But for larger memory
using application, it will not be cache friendly.
JRocket the definite guide says: Naturally, 64 GB isn't a
theoretical limit but just an example. It was mentioned because
compressed references on 64-GB heaps have proven beneficial compared
to full 64-bit pointers in some benchmarks and applications. What
really matters, is how many bits can be spared and the performance
benefit of this approach. In some cases, it might just be easier to
use full length 64-bit pointers.

2011/3/18 Dyer, James james.d...@ingrambook.com:
 We're on the final stretch in getting our product database in Production with 
 Solr.  We have 13m wide-ish records with quite a few stored fields in a 
 single index (no shards).  We sort on at least a dozen fields and facet on 
 20-30.  One thing that came up in QA testing is we were getting full gc's due 
 to promotion failed conditions.  This led us to believe we were dealing 
 with large objects being created and a fragmented old generation.  After 
 improving, but not solving, the problem by tweaking conventional jvm 
 parameters, our JVM expert learned about some newer tuning params included in 
 Sun/Oracle's JDK 1.6.0_24 (we're running RHEL x64, but I think these are 
 available on other platforms too):

 These 3 options dramatically reduced the # objects getting promoted into the 
 Old Gen, reducing fragmentation and CMS frequency  time:
 -XX:+UseStringCache
 -XX:+OptimizeStringConcat
 -XX:+UseCompressedStrings

 This uses compressed pointers on a 64-bit JVM, significantly reducing the 
 memory  performance penalty in using a 64-bit jvm over 32-bit.  This reduced 
 our new GC (ParNew) time significantly:
 -XX:+UseCompressedOops

 The default for this was causing CMS to begin too late sometimes.  (the 
 documentated 68% proved false in our case.  We figured it was defaulting 
 close to 90%)  Much lower than 75%, though, and CMS ran far too often:
 -XX:CMSInitiatingOccupancyFraction=75

 This made the stop-the-world pauses during CMS much shorter:
 -XX:+CMSParallelRemarkEnabled

 We use these in conjunction with CMS/ParNew and a 22gb heap (64gb total on 
 the box), with a 1.2G newSize/maxNewSize.

 In case anyone else is having similar issues, we thought we would share our 
 experience with these newer options.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311