What request handlers to use for query strings in Chinese or Japanese?

2011-03-17 Thread Andy
Hi,

For my Solr server, some of the query strings will be in Asian languages such 
as Chinese or Japanese. 

For such query strings, would the Standard or Dismax request handler work? My 
understanding is that both the Standard and the Dismax handler tokenize the 
query string by whitespace. And that wouldn't work for Chinese or Japanese, 
right? 

In that case, what request handler should I use? And if I need to set up custom 
request handlers for those languages, how do I do it?

Thanks.

Andy


  


Re: Solrj performance bottleneck

2011-03-17 Thread rahul
thanks for all your info.

I will try increasing the RAM and check it.

thanks,

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrj-performance-bottleneck-tp2682797p2692503.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: What request handlers to use for query strings in Chinese or Japanese?

2011-03-17 Thread Li Li
That's the job of the analyzer configured for your field, not the request handler itself.
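For example, if the field type in schema.xml uses a CJK-aware analyzer (such as
solr.CJKTokenizerFactory, or Lucene's contrib CJKAnalyzer), the query text is
split into overlapping bigrams no matter where the whitespace is. Below is a
rough, standalone sketch of what that analyzer does, assuming the Lucene 3.1
contrib analyzers jar is on the classpath; the field name and sample string are
just placeholders:

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class CjkAnalysisDemo {
    public static void main(String[] args) throws Exception {
        // CJKAnalyzer emits overlapping bigrams for CJK text, so the query
        // string does not need to be pre-tokenized by whitespace.
        CJKAnalyzer analyzer = new CJKAnalyzer(Version.LUCENE_31);
        TokenStream ts = analyzer.tokenStream("title", new StringReader("中文分词测试"));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.println(term.toString());   // 中文, 文分, 分词, 词测, 测试
        }
        ts.end();
        ts.close();
    }
}

The standard and dismax handlers just hand the query text to whatever analyzer
the field type declares, so this is configured in schema.xml rather than in the
request handler.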


2011/3/17 Andy angelf...@yahoo.com:
 Hi,

 For my Solr server, some of the query strings will be in Asian languages such 
 as Chinese or Japanese.

 For such query strings, would the Standard or Dismax request handler work? My 
 understanding is that both the Standard and the Dismax handler tokenize the 
 query string by whitespace. And that wouldn't work for Chinese or Japanese, 
 right?

 In that case, what request handler should I use? And if I need to set up 
 custom request handlers for those languages, how do I do it?

 Thanks.

 Andy






Re: Solr Autosuggest help

2011-03-17 Thread rahul
Hi,

One more query.

Currently in the autosuggestion Solr returns words like below:

googl
googl _
googl search
googl chrome
googl map

The last letter seems to be missing in the autosuggestions. I have sent the query
as
?qt=/terms&terms=true&terms.fl=mydata&terms.lower=goog&terms.prefix=goog.
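
(In case it helps, the same request expressed through SolrJ would look roughly
like this; a minimal sketch, with the server URL as a placeholder:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TermsQueryDemo {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery();
        q.setQueryType("/terms");       // qt=/terms
        q.set("terms", true);           // terms=true
        q.set("terms.fl", "mydata");    // terms.fl=mydata
        q.set("terms.lower", "goog");   // terms.lower=goog
        q.set("terms.prefix", "goog");  // terms.prefix=goog
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getResponse());  // the raw NamedList with term/count pairs
    }
}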

The following is my schema.xml definition for the text field.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0" generateNumberParts="1"
            catenateWords="0" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
            outputUnigrams="true" outputUnigramIfNoNgram="true"/>
  </analyzer>
</fieldType>

Could anyone explain what could be wrong? Why does the last letter go missing? It
occurs for a few words only; suggestions for other words are fine.

One more query: how will the word 'sci/tech' be indexed in Solr? If I search
on sci/tech it won't return any results.

Thanks in Advance.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Autosuggest-help-tp2580944p2692651.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sorting 0 values last

2011-03-17 Thread MOuli
Okay.

When I use the map function with ...&sort=map(price, 0, 0, 0, 1) desc then
Solr outputs an error: 
17.03.2011 09:42:58 org.apache.solr.common.SolrException log
SCHWERWIEGEND: org.apache.solr.common.SolrException: Missing sort order.
at
org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:254)
at org.apache.solr.search.QParser.getSort(QParser.java:211)
at
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:90)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)

17.03.2011 09:42:58 org.apache.solr.core.SolrCore execute
INFO: [de] webapp=/solr path=/select
params={sort=map(calc_curr,0,0,1)+descqt=nonequery} status=400 QTime=1


fyi: I use solr 1.4.1 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-0-values-last-tp2681612p2692701.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Replication slows down massively during high load

2011-03-17 Thread Vadim Kisselmann
Hello Shawn,

Primary assumption:  You have a 64-bit OS and a 64-bit JVM.

Yep, it's running 64-bit Linux with a 64-bit JVM

It sounds to me like you're I/O bound, because your machine cannot
keep enough of your index in RAM.  Relative to your 100GB index, you
only have a maximum of 14GB of RAM available to the OS disk cache,
since Java's heap size is 10GB.

The load test seems to be more CPU bound than I/O bound. 
All cores are fully busy and iostat says that there isn't 
much more disk I/O going on than without load test. The 
index is on a RAID10 array with four disks.

How much disk space do all of the index files that end in x take up?
 I would venture a guess that it's significantly more than 14GB.  On
Linux, you could do this command to tally it quickly:

# du -hc *x

27G total

# du -hc `ls | egrep -v "tvf|fdt"`

51G total

If you installed enough RAM so the disk cache can be much larger than
the total size of those files ending in x, you'd probably stop
having these performance issues.
Alternatively, you could take steps to reduce the size of your index,
or perhaps add more machines to go distributed.

Unfortunately, this doesn't seem to be the problem. 
The queries themselves are running fine. The problem 
is that the replication is crawling when there are 
many queries going on and that the replication speed 
stays low even after the load is gone.



Cheers
Vadim


Re: hierarchical faceting, SOLR-792 - confused on config

2011-03-17 Thread Toke Eskildsen
On Wed, 2011-03-16 at 18:36 +0100, Erik Hatcher wrote:
 Sorry, I missed the original mail on this thread
 
 I put together that hierarchical faceting wiki page a couple
 of years ago when helping a customer evaluate SOLR-64 vs.
 SOLR-792 vs.other approaches.  Since then, SOLR-792 morphed
 and is committed as pivot faceting.  SOLR-64 spawned a
 PathTokenizer which is part of Solr now too.
 
 Recently Toke updated that page with some additional info.
 It's definitely not a how to page, and perhaps should get
 renamed/moved/revamped?  Toke?

Unfortunately or luckily, depending on ones point of view, I am hit by a
child #3 and buying house combo. A lot of intentions, but no promises
for the next month or two. 


I think we need both an overview and a detailed how-to of the different
angles on extended faceting in Solr, seen from a user-perspective.

I am not sure I fully understand the different methods myself, so maybe
we could start by discussing them here? Below is a quick outline of how
I see them; please expand & correct. I plan to back up the claims about
scale later with a wiki-page with performance tests.


http://www.lucidimagination.com/solutions/webcasts/faceting @27-33 min:

- Requires the user to transform the paths to multiple special terms
- Step-by-step drill down: If a visual tree is needed, it requires one 
  call for each branch.
- Supports multiple paths/document
- Constraints on output works just as standard faceting
- Scales very well when a single branch is requested

Example use case:
Click-to-expand tree structure of categories for books.


PathHierarchyTokenizer (trunk):
Changes /A/B/C to /A, /A/B and /A/B/C.

I don't know how this can be used directly for hierarchical faceting.
The Lucid Imagination webcast uses the tokenization 0/A, 1/A/B and
2/A/B/C so they seem incompatible to me. The discussion on SOLR-1057
indicates that it can be used with SOLR-64, but SOLR-64 does its own
tokenization!?  Little help here?


SOLR-64 (not up to date with trunk?):

- Uses a custom tokenizer to handle delimited paths (A/B/C).
- Single-path hierarchical faceting
- Constraints can be given on the depth of the hierarchy but not on the 
  number of entries at a given level (huge result set when a wide 
  hierarchy is analyzed)
- Fine (speed & memory) for small taxonomies
- Does not scale well (speed) to large taxonomies

Example use case:
Tree structure of addresses for stores.


SOLR-792 aka pivot faceting (Solr 4.0):

- Uses multiple independent fields as input: Not suitable for taxonomies
- Multi-value but not multi-path
- Supports taxonomies by restraining to single-path/document(?)
- Constraints can be given on entry count, but sorting cannot be done 
  on recursive counting of entries (and it would be very CPU expensive
  to do so(?))
- Fine (speed & memory) for small taxonomies
- Scales well (speed & memory) to large taxonomies
- Scales poorly (speed) to large taxonomies and large result size

Example use case:
Tree structure with price, rating and stock


SOLR-2412 (trunk, highly experimental):

- Multi-path hierarchical faceting
- Uses a field with delimited paths as input (A/B/C)
- Constraints can be given on depth as well as entry count, but sorting
  cannot be done on recursive counting of entries (the number is there 
  though, so it would be fairly easy to add such a sorter)
- Fine (speed & memory) for small taxonomies
- Scales well (speed & memory) to large taxonomies & result size

Example use case:
Tree structure of categories for books.



SOLR building problems

2011-03-17 Thread royr
Hello,

The apache wiki gives me this information:

Skip this section if you have a binary distribution of Solr. These
instructions describe building Solr from source, if you have a nightly tarball
or have checked out the trunk from subversion at
http://svn.apache.org/repos/asf/lucene/dev/trunk. Assumes that you have JDK
1.6 already installed.

In the source directory, run ant dist to build the .war file under dist.
Build the example for the Solr tutorial by running ant example. Change to
the 'example' directory, run java -jar start.jar and visit
localhost:8983/solr/admin to test that the example works with the Jetty
container. 

I have run this code: svn checkout
http://svn.apache.org/repos/asf/lucene/dev/trunk
After that I try the 'ant example' command. This doesn't work, I got the
following error message:

common.compile-core:
[javac] Compiling 508 source files to
somedir/trunk/lucene/build/classes/java
[javac] --
[javac] 1. ERROR in
dir/trunk/lucene/src/java/org/apache/lucene/document/DateTools.java (at line
1)
[javac] package org.apache.lucene.document;
[javac] ^^
[javac] The type Enum is not generic; it cannot be parameterized with
arguments 
[javac] --
[javac] 1 problem (1 error)

BUILD FAILED
somedir/trunk/solr/common-build.xml:249: The following error occurred while
executing this line:
somedir/trunk/lucene/contrib/contrib-build.xml:58: The following error
occurred while executing this line:
somedir/trunk/lucene/common-build.xml:296: The following error occurred
while executing this line:
somedir/trunk/lucene/common-build.xml:717: Compile failed; see the compiler
error output for details.

Ant is installed correctly, I think: 
ant -version = Apache Ant(TM) version 1.8.2 compiled on December 20 2010

What goes wrong?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-building-problems-tp2692916p2692916.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Version Incompatibility(Invalid version (expected 2, but 1) or the data in not in 'javabin' format)

2011-03-17 Thread Isha Garg

On Thursday 17 March 2011 03:18 AM, Ahmet Arslan wrote:

 I am using Solr 4.0 api to search from index (made using solr1.4 version). I am
 getting error Invalid version (expected 2, but 1) or the data in not in 'javabin'
 format. Can anyone help me to fix problem.

You need to use solrj version 1.4 which is compatible to
your index format/version.

Actually there exists another solution. Using XMLResponseParser instead of 
BinaryResponseParser which is the default.

new CommonsHttpSolrServer(new URL("http://solr1.4.0Instance:8080/solr"), null, 
new XMLResponseParser(), false);
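
A shorter variant of the same idea, if I'm not mistaken, is to switch the parser
on an existing server instance (the URL is the same placeholder as above):

CommonsHttpSolrServer server =
    new CommonsHttpSolrServer("http://solr1.4.0Instance:8080/solr");
server.setParser(new XMLResponseParser());   // talk XML instead of javabin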


   Hi,
   

Thanks !!!




Re: Error: Unbuffered entity enclosing request can not be repeated.

2011-03-17 Thread Erick Erickson
What happens if you submit the 9th batch first? I'm wondering if the
9th batch is just mal-formed and has nothing to do with the
previous batches.

As to the time, what merge factor are you using? And how are you
committing? Via autocommit parameters or explicitly or not at all?

Best
Erick

On Wed, Mar 16, 2011 at 1:13 PM, André Santos manofi...@gmail.com wrote:
 Hi all!

 I created a SolrJ project to test Solr. So, I am inserting batches of
 7000 records, each with 200 attributes, which adds up to approximately 13.77
 MB per batch.

 I am measuring the time it takes to add and commit each set of 7000
 records to an instantiation of CommonsHttpSolrServer.
 Each of the first 6 batches takes approximately 17 to 21 seconds.
 The 7th batch takes 42sec and the 8th takes 1min.

 And when it adds the 9th batch to the server it generates this error:

 Mar 16, 2011 4:56:20 PM org.apache.commons.httpclient.HttpMethodDirector
 executeWithRetry
 INFO: I/O exception (java.net.SocketException) caught when processing
 request: Connection reset
 Mar 16, 2011 4:56:21 PM org.apache.commons.httpclient.HttpMethodDirector
 executeWithRetry
 INFO: Retrying request
 Exception in thread main org.apache.solr.client.solrj.SolrServerException:
 org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing
 request can not be repeated.
        at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:480)
        at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at
 org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)


 I googled this error and one of the suggestions consists of the reduction of
 the number of records per batch. But I want to achieve a solution with at
 least 7000 records per batch.
 Any help would be appreciated.
 André



Re: i don't get why my index didn't grow more...

2011-03-17 Thread Erick Erickson
This page: http://lucene.apache.org/java/3_0_2/fileformats.html#file-names,
when combined with what Yonik said may help you figure it out...

And if you're still stumped, please post the fieldType and field
definitions you used

Best
Erick

On Wed, Mar 16, 2011 at 5:10 PM, Robert Petersen rober...@buy.com wrote:
 OK I have a 30 GB index where there are lots of sparsely populated int
 fields and then one title field and one catchall field with title and
 everything else we want as keywords, the catchall field.  I figure it is
 the biggest field in our documents, which as I mentioned is otherwise
 composed of a variety of int fields and a title.



 So my puzzlement is that my biggest field is copied into a double
 metaphone field and now I added another copyfield to also copy the
 catchall field into a newly created soundex field for an experiment to
 compare the effectiveness of the two.  I expected the index to grow by
 at least 25% to 30%, but it barely grew at all.  Can someone explain
 this to me?  Thanks!






Re: Error: Unbuffered entity enclosing request can not be repeated.

2011-03-17 Thread André Santos
Hi, Eric!

I suspect that the problem resides in Tomcat. I think that the connection
server-client times out.

What happens if you submit the 9th batch first? I'm wondering if the
 9th batch is just mal-formed and has nothing to do with the
 previous batches.


The 9th batch is ok, like the other batches. It is filled up with random
data. I received that error in many executions (normally in 7th, 8th or 9th
batch) when batches have more than 10Mb approximately.



 As to the time, what merge factor are you using? And how are you
 committing? Via autocommit parameters or explicitly or not at all?


The merge factor is 25.

I do the commit explicitly:

for (int k = 0; k < nregisters; k++) {
...
docs.add( doc );
}
server.add(docs);
server.commit();
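
For reference, the batch-size reduction I mentioned in my first mail would look
roughly like this; a minimal sketch, the chunk size is arbitrary and the document
building is elided:

int chunkSize = 1000;
List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
for (int k = 0; k < nregisters; k++) {
    SolrInputDocument doc = new SolrInputDocument();
    // ... populate the ~200 attributes ...
    docs.add(doc);
    if (docs.size() == chunkSize) {
        server.add(docs);    // smaller request bodies, same total volume
        docs.clear();
    }
}
if (!docs.isEmpty()) {
    server.add(docs);
}
server.commit();             // single commit at the end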

André


Re: SOLR building problems

2011-03-17 Thread Erick Erickson
What Java version do you have installed? (java -version)

Best
Erick

On Thu, Mar 17, 2011 at 6:30 AM, royr r...@blixem.nl wrote:
 Hello,

 The apache wiki gives me this information:

 Skip this section if you have a binary distribution of Solr. These
 instructions describe building Solr from source, if you have a nightly tarball
 or have checked out the trunk from subversion at
 http://svn.apache.org/repos/asf/lucene/dev/trunk. Assumes that you have JDK
 1.6 already installed.

 In the source directory, run ant dist to build the .war file under dist.
 Build the example for the Solr tutorial by running ant example. Change to
 the 'example' directory, run java -jar start.jar and visit
 localhost:8983/solr/admin to test that the example works with the Jetty
 container.

 I have run this code: svn checkout
 http://svn.apache.org/repos/asf/lucene/dev/trunk
 After that I try the 'ant example' command. This doesn't work, I got the
 following error message:

 common.compile-core:
    [javac] Compiling 508 source files to
 somedir/trunk/lucene/build/classes/java
    [javac] --
    [javac] 1. ERROR in
 dir/trunk/lucene/src/java/org/apache/lucene/document/DateTools.java (at line
 1)
    [javac]     package org.apache.lucene.document;
    [javac]     ^^
    [javac] The type Enum is not generic; it cannot be parameterized with
 arguments
    [javac] --
    [javac] 1 problem (1 error)

 BUILD FAILED
 somedir/trunk/solr/common-build.xml:249: The following error occurred while
 executing this line:
 somedir/trunk/lucene/contrib/contrib-build.xml:58: The following error
 occurred while executing this line:
 somedir/trunk/lucene/common-build.xml:296: The following error occurred
 while executing this line:
 somedir/trunk/lucene/common-build.xml:717: Compile failed; see the compiler
 error output for details.

 Ant is installed correctly i think:
 ant -version = Apache Ant(TM) version 1.8.2 compiled on December 20 2010

 What goes wrong?




 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SOLR-building-problems-tp2692916p2692916.html
 Sent from the Solr - User mailing list archive at Nabble.com.



SOLR-2242-distinctFacet.patch

2011-03-17 Thread Isha Garg

Hi,
  I want to enquire about the patch (SOLR-2242-distinctFacet.patch): is it
available with the Solr 4.0 trunk?


On Monday 14 March 2011 08:05 PM, Jonathan Rochkind wrote:
It's not easy if you have lots of facet values (in my case, can even 
be up to a million), but there is no way built-in to Solr to get 
this.  I have been told that some of the faceting strategies (there 
are actually several in use in Solr based on your parameters and the 
nature of your data) return the page of facet values without actually 
counting all possible facet values, which is what would make this 
difficult. But I have not looked at the code myself.


Jonathan

On 3/11/2011 7:33 AM, Erick Erickson wrote:

There's nothing that I know of that gives you this, but it's
simple to count the members of the list yourself...
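
With SolrJ, for example, something like this would do it; a minimal sketch that
assumes facet.limit=-1 so all values come back, and that 'server' is an existing
SolrServer instance:

SolrQuery q = new SolrQuery("*:*");
q.setFacet(true);
q.addFacetField("StudyID");
q.setFacetLimit(-1);       // return all facet values, not just the top N
q.setFacetMinCount(1);     // skip zero-count values
QueryResponse rsp = server.query(q);
int distinctStudyIds = rsp.getFacetField("StudyID").getValueCount();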

Best
Erick

On Fri, Mar 11, 2011 at 3:34 AM, rajini maskirajinima...@gmail.com  
wrote:

Query on facet field results...


   When I run a facet query on some field, say facet=on&facet.field=StudyID,
I get a list of distinct StudyID values with the count that tells how many
times each study occurred in the search query. But I also need the count of
these distinct StudyID values. Any Solr query to get the count of it?



Example:



<lst name="facet_fields">
  <lst name="StudyID">
    <int name="105">135164</int>
    <int name="179">79820</int>
    <int name="107">70815</int>
    <int name="120">37076</int>
    <int name="134">35276</int>
  </lst>
</lst>

I wanted the count attribute that shall return the count of number of
different studyID occurred.. In above example it could be: Count = 5
(105,179,107,120,134)

<lst name="facet_fields">
  <lst name="StudyID" COUNT="5">
    <int name="105">135164</int>
    <int name="179">79820</int>
    <int name="107">70815</int>
    <int name="120">37076</int>
    <int name="134">35276</int>
  </lst>
</lst>





Re: Multiple Blocked threads on UnInvertedField.getUnInvertedField() SegmentReader$CoreReaders.getTermsReader

2011-03-17 Thread Rachita Choudhary
Hi Yonik,

I have another question related to fieldValueCache.
When we uninvert a facet field, and if the termInstances = 0 for a
particular field, then also it gets added to the FieldValueCache.
What is the reason for caching facet fields with termInstances=0?

In our case, a lot of time is being spent in the 'uninvert' process. From
the 'time' values, I checked that it goes up to 20 secs for certain facet fields.

Eg :
UnInverted multi-valued field
{field=product_brands_61936,memSize=4224,tindexSize=32,time=20202,phase1=20202,nTerms=0,bigTerms=0,termInstances=0,uses=0}

Also for the same facet field, the time and phase1 time varies from 3 msec
to 20 secs.
What is the reason for this variation ?
Also what does nTerms represent ?

Thanks,
Rachita

On Mon, Mar 7, 2011 at 8:22 PM, Yonik Seeley yo...@lucidimagination.comwrote:

 On Mon, Mar 7, 2011 at 9:44 AM, Rachita Choudhary
 rachita.choudh...@burrp.com wrote:
  As enum method , will create a bitset for all the unique values

 It's more complex than that.
  - small sets will use a sorted int set... not a bitset
  - you can control what gets cached via facet.enum.cache.minDf parameter

 -Yonik
 http://lucidimagination.com



Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11

2011-03-17 Thread François Schiettecatte
Lewis

My update from Tomcat 7.0.8 to 7.0.11 went with no hitches. I checked my 
context file and it does not have the xml preamble yours has, specifically: 
'<?xml version="1.0" encoding="utf-8"?>'. 


Here is my context file:

<Context docBase="/home/omim/lib/java/apache-solr-4.0-2011-02-09_08-06-20.war"
         debug="0" crossContext="true">
   <Environment name="solr/home" type="java.lang.String"
                value="/home/omim/index/" override="true" />
</Context>
---

Hope this helps.

Cheers

François


On Mar 16, 2011, at 2:38 PM, McGibbney, Lewis John wrote:

 Hello list,
 
 Is anyone running Solr (in my case 1.4.1) on above Tomcat dist? In the
 past I have been using guidance in accordance with
 http://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat
 but having upgraded from Tomcat 7.0.8 to 7.0.11 I am having problems
 E.g.
 
 INFO: Deploying configuration descriptor wombra.xml  This is my context
 fragment
 from /home/lewis/Downloads/apache-tomcat-7.0.11/conf/Catalina/localhost
 16-Mar-2011 16:57:36 org.apache.tomcat.util.digester.Digester fatalError
 SEVERE: Parse Fatal Error at line 4 column 6: The processing instruction
 target matching [xX][mM][lL] is not allowed.
 org.xml.sax.SAXParseException: The processing instruction target
 matching [xX][mM][lL] is not allowed.
 ...
 16-Mar-2011 16:57:36 org.apache.catalina.startup.HostConfig
 deployDescriptor
 SEVERE: Error deploying configuration descriptor wombra.xml
 org.xml.sax.SAXParseException: The processing instruction target
 matching [xX][mM][lL] is not allowed.
 ...
 some more
 ...
 
 My configuration descriptor is as follows
 <?xml version="1.0" encoding="utf-8"?>
 <Context docBase="/home/lewis/Downloads/wombra/wombra.war"
          crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/home/lewis/Downloads/wombra" override="true"/>
 </Context>
 
 Preferably I would upload a WAR file, but I have been working well with
 the configuration I have been using up until now therefore I didn't
 question change.
 I am unfamiliar with the above errors. Can anyone please point me in the
 right direction?
 
 Thank you
 Lewis
 



Re: i don't get why my index didn't grow more...

2011-03-17 Thread Yonik Seeley
Without even looking at the different segment files, things look odd:
You say that you optimize every day, yet I see segments up to 4 days old.
Also look at all the segments_??? files... each represents a commit
point of the index.
So it looks like you have 16 snapshots (or commit points) of the index.
Do you have a deletion policy configured to do this for some reason?

Anyway, this is why when you changed how you index, you didn't see
much of a size increase (comparatively).

-Yonik
http://lucidimagination.com



On Wed, Mar 16, 2011 at 7:46 PM, Robert Petersen rober...@buy.com wrote:
 Thanks for the reply Yonik, Here are the results of Ls -l on the master 
 server index folder, also please note we have hundreds of those small 
 sparsely populated fields and I run optimize once a day at midnight.  We 
 index 24/7 off a queue at a clip of about 200K docs per hour so the index has 
 had hundreds of commits since last night at midnight.

[...]


RE: Using Solr 1.4.1 on most recent Tomcat 7.0.11

2011-03-17 Thread Pierre GOSSE
I do have the xml preamble <?xml version="1.0" encoding="UTF-8"?> in my config 
file in conf/Catalina/localhost/ and Solr starts OK with Tomcat 7.0.8. Haven't 
tried with 7.0.11 yet.

I wonder why your exception points to line 4 column 6, however. Shouldn't it 
point to line 1 column 1? Do you have some blank lines at the start of your 
XML file, or some non-blank lines?

Pierre

-----Original Message-----
From: François Schiettecatte [mailto:fschietteca...@gmail.com] 
Sent: Thursday, 17 March 2011 14:48
To: solr-user@lucene.apache.org
Subject: Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11 

Lewis

My update from tomcat 7.0.8 to 7.0.11 went with no hitches, I checked my 
context file and it does not have the xml preamble your has, specifically: 
'?xml version=1.0 encoding=utf-8?', 


Here is my context file:

Context docBase=/home/omim/lib/java/apache-solr-4.0-2011-02-09_08-06-20.war 
debug=0 crossContext=true 
   Environment name=solr/home type=java.lang.String 
value=/home/omim/index/ override=true /
/Context
---

Hope this helps.

Cheers

François


On Mar 16, 2011, at 2:38 PM, McGibbney, Lewis John wrote:

 Hello list,
 
 Is anyone running Solr (in my case 1.4.1) on above Tomcat dist? In the
 past I have been using guidance in accordance with
 http://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat
 but having upgraded from Tomcat 7.0.8 to 7.0.11 I am having problems
 E.g.
 
 INFO: Deploying configuration descriptor wombra.xml  This is my context
 fragment
 from /home/lewis/Downloads/apache-tomcat-7.0.11/conf/Catalina/localhost
 16-Mar-2011 16:57:36 org.apache.tomcat.util.digester.Digester fatalError
 SEVERE: Parse Fatal Error at line 4 column 6: The processing instruction
 target matching [xX][mM][lL] is not allowed.
 org.xml.sax.SAXParseException: The processing instruction target
 matching [xX][mM][lL] is not allowed.
 ...
 16-Mar-2011 16:57:36 org.apache.catalina.startup.HostConfig
 deployDescriptor
 SEVERE: Error deploying configuration descriptor wombra.xml
 org.xml.sax.SAXParseException: The processing instruction target
 matching [xX][mM][lL] is not allowed.
 ...
 some more
 ...
 
 My configuration descriptor is as follows
 ?xml version=1.0 encoding=utf-8?
 Context docBase=/home/lewis/Downloads/wombra/wombra.war
 crossContext=true
  Environment name=solr/home type=java.lang.String
 value=/home/lewis/Downloads/wombra override=true/
 /Context
 
 Preferably I would upload a WAR file, but I have been working well with
 the configuration I have been using up until now therefore I didn't
 question change.
 I am unfamiliar with the above errors. Can anyone please point me in the
 right direction?
 
 Thank you
 Lewis
 



Re: SOLR building problems

2011-03-17 Thread royr
java version 1.6.0_21
Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
Java HotSpot(TM) Server VM (build 17.0-b16, mixed mode)


--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-building-problems-tp2692916p2693574.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Using Solr 1.4.1 on most recent Tomcat 7.0.11

2011-03-17 Thread McGibbney, Lewis John
Hi François,

Thank you for your reply. I had made a simple mistake of including comments 
before
'<?xml version="1.0" encoding="utf-8"?>', therefore I was getting a SAX error.
As you have correctly pointed out, it is not essential to include the snippet 
as above in the context file (if using one), however it might be useful to know 
that Tomcat 7 now validates XML files by default. In time I will get round to 
editing the wiki accordingly to mitigate against this in the future.

Thanks for looking in to this.

Lewis
___
From: François Schiettecatte [fschietteca...@gmail.com]
Sent: 17 March 2011 13:47
To: solr-user@lucene.apache.org
Subject: Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11

Lewis

My update from tomcat 7.0.8 to 7.0.11 went with no hitches, I checked my 
context file and it does not have the xml preamble your has, specifically: 
'?xml version=1.0 encoding=utf-8?',


Here is my context file:

Context docBase=/home/omim/lib/java/apache-solr-4.0-2011-02-09_08-06-20.war 
debug=0 crossContext=true 
   Environment name=solr/home type=java.lang.String 
value=/home/omim/index/ override=true /
/Context
---

Hope this helps.

Cheers

François



Re: hierarchical faceting, SOLR-792 - confused on config

2011-03-17 Thread Erik Hatcher
Yes, pivot faceting is committed to trunk.  But it is not part of the upcoming 3.1 
release.

Erik

On Mar 16, 2011, at 15:00 , McGibbney, Lewis John wrote:

 Hi Erik,
 
 I have been reading about the progression of SOLR-792 into pivot faceting, 
 however can you expand to comment on
 where it is committed. Are you referring to trunk?
 The reason I am asking is that I have been using 1.4.1 for some time now and 
 have been thinking of upgrading to trunk... or branch
 
 Thank you Lewis
 
 From: Erik Hatcher [erik.hatc...@gmail.com]
 Sent: 16 March 2011 17:36
 To: solr-user@lucene.apache.org
 Subject: Re: hierarchical faceting, SOLR-792 - confused on config
 
 Sorry, I missed the original mail on this thread
 
 I put together that hierarchical faceting wiki page a couple of years ago 
 when helping a customer evaluate SOLR-64 vs. SOLR-792 vs.other approaches.  
 Since then, SOLR-792 morphed and is committed as pivot faceting.  SOLR-64 
 spawned a PathTokenizer which is part of Solr now too.
 
 Recently Toke updated that page with some additional info.  It's definitely 
 not a how to page, and perhaps should get renamed/moved/revamped?  Toke?
 
Erik
 
 



Re: Solr Autosuggest help

2011-03-17 Thread rahul
hi,

We have found that 'EnglishPorterFilterFactory' causes that issue. I believe
it is used for stemming words. Once we commented out that factory, it works
fine.

And another thing: currently I am checking how the word 'sci/tech'
will be indexed in Solr. As mentioned in my previous email, if I search on
sci/tech it won't return any results. But Solr has the term sci/tech. When
I search on other terms which also contain sci/tech, it returns both the
words.

Please let me know, if you have any idea regarding that.. If I came to know
I will update this thread.

thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Autosuggest-help-tp2580944p2693601.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: hierarchical faceting, SOLR-792 - confused on config

2011-03-17 Thread Erik Hatcher

On Mar 16, 2011, at 14:53 , Jonathan Rochkind wrote:

 Interesting, any documentation on the PathTokenizer anywhere? Or just have to 
 find and look at the source? That's something I hadn't known about, which may 
 be useful to some stuff I've been working on depending on how it works.

  
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PathHierarchyTokenizerFactory

Sorry, I said PathTokenizer which is what SOLR-1057 called it for a bit 
before it got renamed.

Erik



Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11

2011-03-17 Thread François Schiettecatte
Pierre

That is a very good point, I have been caught in the past by poor xml (RSS 
feeds) that included control characters before the '<?xml...>'.

And I have added the preamble to my solr.xml files for good form :)

François


On Mar 17, 2011, at 10:02 AM, Pierre GOSSE wrote:

 I do have the xml preamble ?xml version=1.0 encoding=UTF-8? in my 
 config file in conf/Catalina/localhost/ and solr starts ok with Tomcat 7.0.8. 
 Haven't try with 7.0.11 yet.
 
 I wonder why your exception point to line 4 column 6, however. Shouldn't it 
 point to line 1 column 1 ? Do you have some blank lines at the start of your 
 XML file or some non blank lines ?
 
 Pierre
 
  -----Original Message-----
  From: François Schiettecatte [mailto:fschietteca...@gmail.com] 
  Sent: Thursday, 17 March 2011 14:48
  To: solr-user@lucene.apache.org
  Subject: Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11 
 
 Lewis
 
 My update from tomcat 7.0.8 to 7.0.11 went with no hitches, I checked my 
 context file and it does not have the xml preamble your has, specifically: 
 '?xml version=1.0 encoding=utf-8?', 
 
 
 Here is my context file:
 
 Context 
 docBase=/home/omim/lib/java/apache-solr-4.0-2011-02-09_08-06-20.war 
 debug=0 crossContext=true 
   Environment name=solr/home type=java.lang.String 
 value=/home/omim/index/ override=true /
 /Context
 ---
 
 Hope this helps.
 
 Cheers
 
 François
 
 
 On Mar 16, 2011, at 2:38 PM, McGibbney, Lewis John wrote:
 
 Hello list,
 
 Is anyone running Solr (in my case 1.4.1) on above Tomcat dist? In the
 past I have been using guidance in accordance with
 http://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat
 but having upgraded from Tomcat 7.0.8 to 7.0.11 I am having problems
 E.g.
 
 INFO: Deploying configuration descriptor wombra.xml  This is my context
 fragment
 from /home/lewis/Downloads/apache-tomcat-7.0.11/conf/Catalina/localhost
 16-Mar-2011 16:57:36 org.apache.tomcat.util.digester.Digester fatalError
 SEVERE: Parse Fatal Error at line 4 column 6: The processing instruction
 target matching [xX][mM][lL] is not allowed.
 org.xml.sax.SAXParseException: The processing instruction target
 matching [xX][mM][lL] is not allowed.
 ...
 16-Mar-2011 16:57:36 org.apache.catalina.startup.HostConfig
 deployDescriptor
 SEVERE: Error deploying configuration descriptor wombra.xml
 org.xml.sax.SAXParseException: The processing instruction target
 matching [xX][mM][lL] is not allowed.
 ...
 some more
 ...
 
 My configuration descriptor is as follows
 ?xml version=1.0 encoding=utf-8?
 Context docBase=/home/lewis/Downloads/wombra/wombra.war
 crossContext=true
 Environment name=solr/home type=java.lang.String
 value=/home/lewis/Downloads/wombra override=true/
 /Context
 
 Preferably I would upload a WAR file, but I have been working well with
 the configuration I have been using up until now therefore I didn't
 question change.
 I am unfamiliar with the above errors. Can anyone please point me in the
 right direction?
 
 Thank you
 Lewis
 
 



Re: Parent-child options

2011-03-17 Thread Yonik Seeley
On Thu, Mar 17, 2011 at 1:49 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 The dreaded parent-child without denormalization question.  What are one's
 options for the following example:

 parent: shoes
 3 children. each with 2 attributes/fields: color and size
  * color: red black orange
  * size: 10 11 12

 The goal is to be able to search for:
 1) color:red AND size:10 and get 1 hit for the above
 2) color:red AND size:12 and get *no* matches because there are no red shoes 
 of
 size 12, only size 10.

What if you had this instead:

  color: red red orange
  size: 10 11 12

Do you need for color:red to return 1 or 2 (i.e. is the final answer
in units of child hits or parent hits)?

-Yonik
http://lucidimagination.com


Re: Replication slows down massively during high load

2011-03-17 Thread Shawn Heisey

On 3/17/2011 3:43 AM, Vadim Kisselmann wrote:

Unfortunately, this doesn't seem to be the problem. The queries themselves are 
running fine. The problem is that the replications is crawling when there are 
many queries going on and that the replication speed stays low even after the 
load is gone.


If you run iostat 5 what are typical values on each iteration for the 
various CPU states while you're doing load testing and replication at 
the same time?  In particular, %iowait is important.




from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Bernd Fehling


Is there a way to have a kind of casting for copyField?

I have author names in multiValued string field and need a sorting on it,
but sort on field is only for multiValued=false.

I'm trying to get multiValued content from one field to a
non-multiValued text or string field for sorting.
And this, if possible, during loading with copyField.

Or any other solution?

I need this solution due to patch SOLR-2339, which is now more strict.
May be anyone else also.

Regards,
Bernd


Re: from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Gora Mohanty
On Thu, Mar 17, 2011 at 8:04 PM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:

 Is there a way to have a kind of casting for copyField?

 I have author names in multiValued string field and need a sorting on it,
 but sort on field is only for multiValued=false.

 I'm trying to get multiValued content from one field to a
 non-multiValued text or string field for sorting.
 And this, if possible, during loading with copyField.

 Or any other solution?
[...]

Not sure about CopyField, but you could use a transformer to
extract values from a multiValued field, and stick them into a
single-valued field.
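
One way to sketch that, for example, is a small custom UpdateRequestProcessor
that copies the first value of the multiValued field into a single-valued sort
field at index time. The field names author/author_sort below are just examples,
and it would still need a matching UpdateRequestProcessorFactory registered in
solrconfig.xml:

import java.io.IOException;
import java.util.Collection;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;

public class AuthorSortProcessor extends UpdateRequestProcessor {
    public AuthorSortProcessor(UpdateRequestProcessor next) {
        super(next);
    }

    @Override
    public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        Collection<Object> authors = doc.getFieldValues("author");
        if (authors != null && !authors.isEmpty()) {
            // copy only the first author into the single-valued sort field
            doc.addField("author_sort", authors.iterator().next().toString());
        }
        super.processAdd(cmd);
    }
}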

Regards,
Gora


Rename fields in a query

2011-03-17 Thread Fabiano Nunes
Given a Query object (name:firefox name:opera), is it possible to 'rename'
the field names to, for example, (content:firefox content:opera)?


Re: Sorting on multiValued fields via function query

2011-03-17 Thread Yonik Seeley
On Wed, Mar 16, 2011 at 6:08 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 Also... if lucene is already capable of sorting on multi-valued field by
 choosing the largest value largest vs. smallest is presumably just
 arbitrary there, there is presumably no performance implication to choosing
 the smallest instead of the largest. It just chooses the largest, according
 to Yonik.

It's a little more complicated than that.
It's not so much an explicit feature in lucene, but just what
naturally happens when building the field cache via uninverting an
indexed field.

It's pretty much this:

for every term in the field:
  for every document that matches that term:
    value[document] = term

And since terms are iterated from smallest to largest (and no, you
can't reverse this)
larger values end up overwriting smaller values.
There's no simple patch to pick the smallest rather than the largest.

In the past, lucene used to try and detect this multi-valued case by
checking the number of values set in the whole array.  This was
unreliable though and the check was discarded.

-Yonik
http://lucidimagination.com


Re: from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Yonik Seeley
On Thu, Mar 17, 2011 at 10:34 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:

 Is there a way to have a kind of casting for copyField?

 I have author names in multiValued string field and need a sorting on it,
 but sort on field is only for multiValued=false.

 I'm trying to get multiValued content from one field to a
 non-multiValued text or string field for sorting.
 And this, if possible, during loading with copyField.

 Or any other solution?

 I need this solution due to patch SOLR-2339, which is now more strict.
 May be anyone else also.

Hmmm, you're the second person that's relied on that (sorting on a
multiValued field working).
Was SOLR-2339 a mistake?

-Yonik
http://lucidimagination.com


Re: from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Bernd Fehling


Good idea.
Was also just looking into this area.

Assuming my input record looks like this:
<documents>
  <document id="foobar">
    <element name="author"><value>author_1 ; author_2 ; author_3</value></element>
  </document>
</documents>

Do you know if I can use something like this:
...
<entity name="records" processor="XPathEntityProcessor"
        transformer="RegexTransformer"
...
  <field column="author"      xpath="/documents/document/element[@name='author']/value" />
  <field column="author_sort" xpath="/documents/document/element[@name='author']/value" />
  <field column="author"      splitBy=" ; " />
...

To just double the input and make author multiValued and author_sort a string 
field?

Regards
Bernd


Am 17.03.2011 15:39, schrieb Gora Mohanty:

On Thu, Mar 17, 2011 at 8:04 PM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de  wrote:


Is there a way to have a kind of casting for copyField?

I have author names in multiValued string field and need a sorting on it,
but sort on field is only for multiValued=false.

I'm trying to get multiValued content from one field to a
non-multiValued text or string field for sorting.
And this, if possible, during loading with copyField.

Or any other solution?

[...]

Not sure about CopyField, but you could use a transformer to
extract values from a multiValued field, and stick them into a
single-valued field.

Regards,
Gora


Re: Sorting on multiValued fields via function query

2011-03-17 Thread Bill Bell
Here is a workaround: stick the high value and low value into other fields, 
and use those fields for sorting.
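
At indexing time with SolrJ that could look like this; a minimal sketch, the
field names are made up and the usual java.util imports are assumed:

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "shoe-123");
List<Integer> sizes = Arrays.asList(10, 11, 12);
for (Integer s : sizes) {
    doc.addField("size", s);                        // the multi-valued field
}
doc.addField("size_min", Collections.min(sizes));   // sort on this ascending
doc.addField("size_max", Collections.max(sizes));   // sort on this descending
server.add(doc);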

Bill Bell
Sent from mobile


On Mar 17, 2011, at 8:49 AM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Wed, Mar 16, 2011 at 6:08 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 Also... if lucene is already capable of sorting on multi-valued field by
 choosing the largest value largest vs. smallest is presumably just
 arbitrary there, there is presumably no performance implication to choosing
 the smallest instead of the largest. It just chooses the largest, according
 to Yonik.
 
 It's a little more complicated than that.
 It's not so much an explicit feature in lucene, but just what
 naturally happens when building the field cache via uninverting an
 indexed field.
 
 It's pretty much this:
 
 for every term in the field:
  for every document that matches that term:
value[document] = term
 
 And since terms are iterated from smallest to largest (and no, you
 can't reverse this)
 larger values end up overwriting smaller values.
 There's no simple patch to pick the smallest rather than the largest.
 
 In the past, lucene used to try and detect this multi-valued case by
 checking the number of values set in the whole array.  This was
 unreliable though and the check was discarded.
 
 -Yonik
 http://lucidimagination.com


Re: Sorting on multiValued fields via function query

2011-03-17 Thread Bill Bell
By the way, this could be done automatically by Solr or Lucene behind the 
scenes. 

Bill Bell
Sent from mobile


On Mar 17, 2011, at 9:02 AM, Bill Bell billnb...@gmail.com wrote:

 Here is a work around. Stick the high value and low value into other fields. 
 Use those fields for sorting.
 
 Bill Bell
 Sent from mobile
 
 
 On Mar 17, 2011, at 8:49 AM, Yonik Seeley yo...@lucidimagination.com wrote:
 
 On Wed, Mar 16, 2011 at 6:08 PM, Jonathan Rochkind rochk...@jhu.edu wrote:
 Also... if lucene is already capable of sorting on multi-valued field by
 choosing the largest value largest vs. smallest is presumably just
 arbitrary there, there is presumably no performance implication to choosing
 the smallest instead of the largest. It just chooses the largest, according
 to Yonik.
 
 It's a little more complicated than that.
 It's not so much an explicit feature in lucene, but just what
 naturally happens when building the field cache via uninverting an
 indexed field.
 
 It's pretty much this:
 
 for every term in the field:
 for every document that matches that term:
   value[document] = term
 
 And since terms are iterated from smallest to largest (and no, you
 can't reverse this)
 larger values end up overwriting smaller values.
 There's no simple patch to pick the smallest rather than the largest.
 
 In the past, lucene used to try and detect this multi-valued case by
 checking the number of values set in the whole array.  This was
 unreliable though and the check was discarded.
 
 -Yonik
 http://lucidimagination.com


Re: Sorting on multiValued fields via function query

2011-03-17 Thread Jonathan Rochkind

Aha, oh well, not quite as good/flexible as I hoped.

Still, if lucene is now behaving somewhat more predictably/rationally 
when sorting on multi-valued fields, then I think, in response to your 
other email on a similar thread, perhaps SOLR-2339  is now a mistake.


When lucene was returning completely unpredictable results -- and even 
sometimes crashing entirely -- when sorting on a multi-valued field --- 
then I think in that situation it made a lot of sense for Solr to 
prevent you from doing that, which is I think what SOLR-2339 does?  So I 
don't think it was necessarily a mistake in that context.


But if lucene now can sort a multi-valued field without crashing when 
there are 'too many' unique values, and with easily described and 
predictable semantics (use the minimal value in the multi-valued field 
as sort key) -- then it probably makes more sense for Solr to let you do 
that if you really want to, give you enough rope to hang yourself.


Jonathan

On 3/17/2011 10:49 AM, Yonik Seeley wrote:

On Wed, Mar 16, 2011 at 6:08 PM, Jonathan Rochkindrochk...@jhu.edu  wrote:

Also... if lucene is already capable of sorting on multi-valued field by
choosing the largest value largest vs. smallest is presumably just
arbitrary there, there is presumably no performance implication to choosing
the smallest instead of the largest. It just chooses the largest, according
to Yonik.

It's a little more complicated than that.
It's not so much an explicit feature in lucene, but just what
naturally happens when building the field cache via uninverting an
indexed field.

It's pretty much this:

for every term in the field:
   for every document that matches that term:
 value[document] = term

And since terms are iterated from smallest to largest (and no, you
can't reverse this)
larger values end up overwriting smaller values.
There's no simple patch to pick the smallest rather than the largest.

In the past, lucene used to try and detect this multi-valued case by
checking the number of values set in the whole array.  This was
unreliable though and the check was discarded.

-Yonik
http://lucidimagination.com



Re: from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Bill Bell
Do you use the DIH handler? A script can do this easily.

Bill Bell
Sent from mobile


On Mar 17, 2011, at 9:02 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de 
wrote:

 
 Good idea.
 Was also just looking into this area.
 
 Assuming my input record looks like this:
 documents
  document id=foobar
element name=authorvalueauthor_1 ; author_2 ; 
 author_3/value/element
  /document
 /documents
 
 Do you know if I can use something like this:
 ...
 entity name=records processor=XPathEntityProcessor
transformer=RegexTransformer
 ...
 field column=author  
 xpath=/documents/document/element[@name='author']/value /
 field column=author_sort 
 xpath=/documents/document/element[@name='author']/value /
 field column=author  splitBy= ;  /
 ...
 
 To just double the input and make author multiValued and author_sort a string 
 field?
 
 Regards
 Bernd
 
 
 Am 17.03.2011 15:39, schrieb Gora Mohanty:
 On Thu, Mar 17, 2011 at 8:04 PM, Bernd Fehling
 bernd.fehl...@uni-bielefeld.de  wrote:
 
 Is there a way to have a kind of casting for copyField?
 
 I have author names in multiValued string field and need a sorting on it,
 but sort on field is only for multiValued=false.
 
 I'm trying to get multiValued content from one field to a
 non-multiValued text or string field for sorting.
 And this, if possible, during loading with copyField.
 
 Or any other solution?
 [...]
 
 Not sure about CopyField, but you could use a transformer to
 extract values from a multiValued field, and stick them into a
 single-valued field.
 
 Regards,
 Gora


Re: Replication slows down massively during high load

2011-03-17 Thread Vadim Kisselmann
On Mar 17, 2011, at 3:19 PM, Shawn Heisey wrote:

On 3/17/2011 3:43 AM, Vadim Kisselmann wrote:
Unfortunately, this doesn't seem to be the problem. The queries
themselves are running fine. The problem is that the replications is
crawling when there are many queries going on and that the replication
speed stays low even after the load is gone.

If you run iostat 5 what are typical values on each iteration for
the various CPU states while you're doing load testing and replication
at the same time?  In particular, %iowait is important.



CPU stats from top (iostat doesn't seem to show CPU load correctly):

90.1%us,  4.5%sy,  0.0%ni,  5.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Seems like I/O is not the bottleneck here.

Another interesting thing: when Solr starts its replication under heavy
load, it tries to download the whole index from the master.

From /solr/admin/replication/index.jsp:

Current Replication Status

Start Time: Thu Mar 17 15:57:20 CET 2011
Files Downloaded: 9 / 163
Downloaded: 83,04 MB / 97,75 GB [0.0%]
Downloading File: _d5x.nrm, Downloaded: 86,82 KB / 86,82 KB [100.0%]
Time Elapsed: 419s, Estimated Time Remaining: 504635s, Speed: 202,94 
KB/s


Re: from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Bernd Fehling

Hi Yonik,

actually some applications misused sorting on a multiValued field,
like VuFind. And as a matter of fact FAST doesn't support this either,
because it doesn't make sense.
FAST distinguishes between multiValue and singleValue by just adding
the separator FieldAttribute to the field. So I moved this from the FAST
index-profile to Solr DIH and placed the separator there.

But now I'm looking for a solution for VuFind.
Easiest thing would be to have a kind of casting, may be for copyField.

Regards,
Bernd


Am 17.03.2011 15:58, schrieb Yonik Seeley:

On Thu, Mar 17, 2011 at 10:34 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de  wrote:


Is there a way to have a kind of casting for copyField?

I have author names in multiValued string field and need a sorting on it,
but sort on field is only for multiValued=false.

I'm trying to get multiValued content from one field to a
non-multiValued text or string field for sorting.
And this, if possible, during loading with copyField.

Or any other solution?

I need this solution due to patch SOLR-2339, which is now more strict.
May be anyone else also.


Hmmm, you're the second person that's relied on that (sorting on a
multiValued field working).
Was SOLR-2339 a mistake?

-Yonik
http://lucidimagination.com


Re: from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Bernd Fehling

Hi Bill,
yes DIH is in use.

Thanks,
Bernd

Am 17.03.2011 16:09, schrieb Bill Bell:

Do you use Dih handler? A script can do this easily.

Bill Bell
Sent from mobile


On Mar 17, 2011, at 9:02 AM, Bernd Fehlingbernd.fehl...@uni-bielefeld.de  
wrote:



Good idea.
Was also just looking into this area.

Assuming my input record looks like this:
documents
  document id=foobar
element name=authorvalueauthor_1 ; author_2 ; 
author_3/value/element
  /document
/documents

Do you know if I can use something like this:
...
entity name=records processor=XPathEntityProcessor
transformer=RegexTransformer
...
field column=author  
xpath=/documents/document/element[@name='author']/value /
field column=author_sort 
xpath=/documents/document/element[@name='author']/value /
field column=author  splitBy= ;  /
...

To just double the input and make author multiValued and author_sort a string 
field?

Regards
Bernd


Am 17.03.2011 15:39, schrieb Gora Mohanty:

On Thu, Mar 17, 2011 at 8:04 PM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de   wrote:


Is there a way to have a kind of casting for copyField?

I have author names in multiValued string field and need a sorting on it,
but sort on field is only for multiValued=false.

I'm trying to get multiValued content from one field to a
non-multiValued text or string field for sorting.
And this, if possible, during loading with copyField.

Or any other solution?

[...]

Not sure about CopyField, but you could use a transformer to
extract values from a multiValued field, and stick them into a
single-valued field.

Regards,
Gora


--
*
Bernd Fehling                    Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)               Universitätsstr. 25
Tel. +49 521 106-4060            Fax. +49 521 106-4052
bernd.fehl...@uni-bielefeld.de   33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Re: Replication slows down massively during high load

2011-03-17 Thread Bill Bell
You could always rsync the index dir and reload (the old scripts). But this is 
still something we should investigate. I had this same issue under high load and 
never really found a solution. Did you try another NIC? Is the NIC configured 
right? Routing? Transfer speed?

Bill Bell
Sent from mobile


On Mar 17, 2011, at 9:11 AM, Vadim Kisselmann v.kisselm...@googlemail.com 
wrote:

 On Mar 17, 2011, at 3:19 PM, Shawn Heisey wrote:
 
 On 3/17/2011 3:43 AM, Vadim Kisselmann wrote:
 Unfortunately, this doesn't seem to be the problem. The queries
 themselves are running fine. The problem is that the replications is
 crawling when there are many queries going on and that the replication
 speed stays low even after the load is gone.
 
 If you run iostat 5 what are typical values on each iteration for
 the various CPU states while you're doing load testing and replication
 at the same time?  In particular, %iowait is important.
 
 
 
 CPU stats from top (iostat doesn't seem to show CPU load correctly):
 
 90.1%us,  4.5%sy,  0.0%ni,  5.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
 
 Seems like I/O is not the bottleneck here.
 
 Other interesting thing: When Solr starts its replication under heavy
 load, it tries to download the whole index from master.
 
 From /solr/admin/replication/index.jsp:
 
Current Replication Status
 
Start Time: Thu Mar 17 15:57:20 CET 2011
Files Downloaded: 9 / 163
Downloaded: 83,04 MB / 97,75 GB [0.0%]
Downloading File: _d5x.nrm, Downloaded: 86,82 KB / 86,82 KB [100.0%]
Time Elapsed: 419s, Estimated Time Remaining: 504635s, Speed: 202,94 KB/s


Re: Parent-child options

2011-03-17 Thread Otis Gospodnetic
Hi,



- Original Message 
 From: Yonik Seeley yo...@lucidimagination.com
 Subject: Re: Parent-child options
 
 On Thu, Mar 17, 2011 at 1:49 AM, Otis Gospodnetic
 otis_gospodne...@yahoo.com  wrote:
  The dreaded parent-child without denormalization question.  What  are one's
  options for the following example:
 
  parent:  shoes
  3 children. each with 2 attributes/fields: color and size
* color: red black orange
   * size: 10 11 12
 
  The goal is  to be able to search for:
  1) color:red AND size:10 and get 1 hit for the  above
  2) color:red AND size:12 and get *no* matches because there are no  red 
  shoes 
of
  size 12, only size 10.
 
 What if you had this  instead:
 
   color: red red orange
   size: 10 11 12
 
 Do  you need for color:red to return 1 or 2 (i.e. is the final answer
 in units of  child hits or parent hits)?

The final answer is the parent, which is shoes in this example.
So:
if the query is color:red AND size:10 the answer is: Yes, we got red shoes size 
10
if the query is color:red AND size:11 the answer is: Yes, we got red shoes size 
11
if the query is color:red AND size:12 the answer is: No, we don't have red 
shoes 
size 12

Thanks,
Otis


Re: Solr Autosuggest help

2011-03-17 Thread Otis Gospodnetic
Rahul,

Go to your Solr Admin Analysis page, enter sci/tech, check appropriate check 
boxes, and see how sci/tech gets analyzed.  This will lead you in the right 
direction.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
 From: rahul asharud...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thu, March 17, 2011 10:12:27 AM
 Subject: Re: Solr Autosuggest help
 
 hi,
 
 We have found that 'EnglishPorterFilterFactory' causes that issue. I believe
 that is used for stemming words. Once we commented out that factory, it works
 fine.
 
 And another thing: currently I am checking how the word 'sci/tech'
 will be indexed in Solr. As mentioned in my previous email, if I search on
 sci/tech it won't return any results. But Solr has the term as sci/tech. When
 I search on other terms which also contain sci/tech, it returns both
 words.
 
 Please let me know if you have any idea regarding that. If I find out anything,
 I will update this thread.
 
 thanks.
 
 
 
 --
 View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Autosuggest-help-tp2580944p2693601.html
 Sent  from the Solr - User mailing list archive at Nabble.com.
 


Re: from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Jonathan Rochkind
Perhaps the easiest thing for you right now, which you can do in any version 
of Solr, is to translate your data at indexing time so you don't have to 
sort on a multi-valued field.  Put the values in an additional field used 
only for sorting, where at index time you put only the greatest or least 
value (your choice) from the multi-valued set, giving you a single-valued field.


Your earlier sorting on a multi-valued field, while Solr let you do it, was 
almost certainly producing unpredictable results in some cases that 
you just hadn't noticed. Better to fix it up so it's predictable and 
reliable instead, no?
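For illustration, a rough SolrJ-style sketch of that approach (the field names 
"author"/"author_sort" and the server URL are just assumptions, not part of your 
setup):

  import java.util.Arrays;
  import java.util.Collections;
  import java.util.List;

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class IndexWithSortField {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

      List<String> authors = Arrays.asList("Miller, B.", "Adams, A.", "Zuse, K.");

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "example-1");
      for (String a : authors) {
        doc.addField("author", a);                     // multiValued field for search/display
      }
      // single-valued companion field used only for sorting: here the least value
      doc.addField("author_sort", Collections.min(authors));

      server.add(doc);
      server.commit();
    }
  }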


On 3/17/2011 11:14 AM, Bernd Fehling wrote:

Hi Yonik,

actually some applications misused sorting on a multiValued field,
like VuFind. And as a matter oft fact also FAST doesn't support this
because it doesn't make sense.
FAST distinguishes between multiValue and singleValue by just adding
the seperator-FieldAttribute to the field. So I moved this from FAST
index-profile to Solr DIH and placed the seperator there.

But now I'm looking for a solution for VuFind.
Easiest thing would be to have a kind of casting, may be for copyField.

Regards,
Bernd


Am 17.03.2011 15:58, schrieb Yonik Seeley:

On Thu, Mar 17, 2011 at 10:34 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de   wrote:

Is there a way to have a kind of casting for copyField?

I have author names in multiValued string field and need a sorting on it,
but sort on field is only for multiValued=false.

I'm trying to get multiValued content from one field to a
non-multiValued text or string field for sorting.
And this, if possible, during loading with copyField.

Or any other solution?

I need this solution due to patch SOLR-2339, which is now more strict.
May be anyone else also.

Hmmm, you're the second person that's relied on that (sorting on a
multiValued field working).
Was SOLR-2339 a mistake?

-Yonik
http://lucidimagination.com


Re: Parent-child options

2011-03-17 Thread Yonik Seeley
On Thu, Mar 17, 2011 at 11:21 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Hi,



 - Original Message 
 From: Yonik Seeley yo...@lucidimagination.com
 Subject: Re: Parent-child options

 On Thu, Mar 17, 2011 at 1:49 AM, Otis Gospodnetic
 otis_gospodne...@yahoo.com  wrote:
  The dreaded parent-child without denormalization question.  What  are one's
  options for the following example:
 
  parent:  shoes
  3 children. each with 2 attributes/fields: color and size
    * color: red black orange
   * size: 10 11 12
 
  The goal is  to be able to search for:
  1) color:red AND size:10 and get 1 hit for the  above
  2) color:red AND size:12 and get *no* matches because there are no  red 
  shoes
of
  size 12, only size 10.

 What if you had this  instead:

   color: red red orange
   size: 10 11 12

 Do  you need for color:red to return 1 or 2 (i.e. is the final answer
 in units of  child hits or parent hits)?

 The final answer is the parent, which is shoes in this example.
 So:
 if the query is color:red AND size:10 the answer is: Yes, we got red shoes 
 size
 10
 if the query is color:red AND size:11 the answer is: Yes, we got red shoes 
 size
 11
 if the query is color:red AND size:12 the answer is: No, we don't have red 
 shoes
 size 12

Then yes, the join patch would work (as long as it's just filtering
and you don't need relevancy of child hits to propagate to the
parent).

parent {category:shoes}
child {parent:shoes, color:red, size:10}

q={!join from=parent to=category}color:red AND size:10

If you had a query on the parent type docs, the join could also be
used as an fq.
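For example, a rough SolrJ sketch of that usage (this assumes the join patch is 
applied and reuses the field names from the example above):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class JoinAsFilterExample {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

      SolrQuery q = new SolrQuery();
      q.setQuery("category:shoes");   // main query runs against the parent docs
      // child-side constraint applied as a filter via the join
      q.addFilterQuery("{!join from=parent to=category}color:red AND size:10");

      QueryResponse rsp = server.query(q);
      System.out.println("matching parents: " + rsp.getResults().getNumFound());
    }
  }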

-Yonik
http://lucidimagination.com


FuzzyQuery rewrite

2011-03-17 Thread Fabiano Nunes
Is rewriting fuzzy queries against the spellchecker index a good practice?
When I rewrite these queries against the main index, the rewrite time is about
3.5 - 4 secs. Against the spellchecker index, the rewrite takes a few milliseconds.
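A rough Lucene sketch of what I mean by rewriting against the spellchecker index 
(the index path, the 0.7 similarity and the "word" field name are assumptions):

  import java.io.File;

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.FuzzyQuery;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.store.FSDirectory;

  public class FuzzyRewriteSketch {
    public static void main(String[] args) throws Exception {
      // open the (much smaller) spellchecker index instead of the main index
      IndexReader spellReader = IndexReader.open(FSDirectory.open(new File("spellchecker")), true);

      FuzzyQuery fuzzy = new FuzzyQuery(new Term("word", "analsis"), 0.7f);
      Query rewritten = fuzzy.rewrite(spellReader);   // expands to the terms found in that index

      System.out.println(rewritten);
      spellReader.close();
    }
  }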


Re: from multiValued field to non-multiValued field with copyField?

2011-03-17 Thread Bernd Fehling

 Better to fix it up so it's predictable and reliable instead, no?

Yes, you are absolutely right. That's why I'm looking into this.

But how would I get, say, always author_1 from a multi-valued field
into a single-valued (string or text) field?

Ok, another solution comes to mind:
writing a processor for the updateRequestProcessorChain, that might work.
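A minimal sketch of such a processor (the field names "author"/"author_sort" are 
examples only, and package names may need adjusting for the Solr version in use):

  import java.io.IOException;
  import java.util.Collection;

  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.response.SolrQueryResponse;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;
  import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

  /** Copies the first value of the multiValued "author" field into the
   *  single-valued "author_sort" field before the document is indexed. */
  public class FirstAuthorToSortFieldFactory extends UpdateRequestProcessorFactory {
    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                              UpdateRequestProcessor next) {
      return new UpdateRequestProcessor(next) {
        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
          SolrInputDocument doc = cmd.getSolrInputDocument();
          Collection<Object> authors = doc.getFieldValues("author");
          if (authors != null && !authors.isEmpty()) {
            doc.setField("author_sort", authors.iterator().next().toString());
          }
          super.processAdd(cmd);
        }
      };
    }
  }

The factory would then be referenced from an updateRequestProcessorChain in
solrconfig.xml and that chain used by the update handler.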

Regards,
Bernd


Am 17.03.2011 16:27, schrieb Jonathan Rochkind:

Perhaps easiest thing for you right now, that you can do in any version of 
Solr, is translate your data at indexing time so you don't have to
sort on a multi-valued field. Put the stuff in an additional field for sorting, 
where at index time you only put the greatest or least value
(your choice) from a multi-valued set in, to have a single-valued field.

Your sorting on a multi-valued field before, while Solr let you, was almost 
certainly resulting in unpredictable results in some cases, that you
just hadn't noticed. Better to fix it up so it's predictable and reliable 
instead, no?

On 3/17/2011 11:14 AM, Bernd Fehling wrote:

Hi Yonik,

actually some applications misused sorting on a multiValued field,
like VuFind. And as a matter oft fact also FAST doesn't support this
because it doesn't make sense.
FAST distinguishes between multiValue and singleValue by just adding
the seperator-FieldAttribute to the field. So I moved this from FAST
index-profile to Solr DIH and placed the seperator there.

But now I'm looking for a solution for VuFind.
Easiest thing would be to have a kind of casting, may be for copyField.

Regards,
Bernd


Am 17.03.2011 15:58, schrieb Yonik Seeley:

On Thu, Mar 17, 2011 at 10:34 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:

Is there a way to have a kind of casting for copyField?

I have author names in multiValued string field and need a sorting on it,
but sort on field is only for multiValued=false.

I'm trying to get multiValued content from one field to a
non-multiValued text or string field for sorting.
And this, if possible, during loading with copyField.

Or any other solution?

I need this solution due to patch SOLR-2339, which is now more strict.
May be anyone else also.

Hmmm, you're the second person that's relied on that (sorting on a
multiValued field working).
Was SOLR-2339 a mistake?

-Yonik
http://lucidimagination.com


--
*
Bernd FehlingUniversitätsbibliothek Bielefeld
Dipl.-Inform. (FH)Universitätsstr. 25
Tel. +49 521 106-4060   Fax. +49 521 106-4052
bernd.fehl...@uni-bielefeld.de33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Re: Replication slows down massively during high load

2011-03-17 Thread Vadim Kisselmann
Hi Bill,

 You could always rsync the index dir and reload (old scripts).

I used them previously but was getting problems with them. The
application querying the Solr doesn't cause enough load on it to
trigger the issue. Yet.

 But this is still something we should investigate.

Indeed :-)

 See if the Nic is configured right? Routing? Speed of transfer?

Network doesn't seem to be the problem. Testing with iperf from slave
to master yields a full gigabit, even while Solrmeter is hammering the
server.

 Bill Bell

Vadim


Re: Parent-child options

2011-03-17 Thread Jonathan Rochkind
The standard answer, which is a kind of de-normalizing, is to index 
tokens like this:


red_10   red_11   orange_12

in another field, you could do these things with size first:

10_red 11_red 12_orange

Now if you want to see what sizes of red you have, you can do a facet 
query with facet.prefix=red_ .  You'll need to do a bit of 
parsing/interpreting client side to translate from the results you get 
(red_10, red_11) to telling the users sizes 10 and 11 are 
available.  The second field with size first lets you do the same thing 
to answer "what colors do we have in size X?".


That gets unmanageable with more than 2-3 facet combinations, but with 
just 2 (or, pushing it, 3), it can work out okay. You'd probably ALSO want 
to keep the facets you have with plain values red red orange etc., to 
support that first level of faceting for the user. There is a bit more work 
to do on the client side with this approach; Solr isn't just giving you 
exactly what you want in its response, you've got to have logic for 
when to use the top-level facets and when to go to that second-level 
combo facet (red_12), but it's do-able.
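A rough SolrJ sketch of querying that second-level field (the field name 
"color_size" and the URL are assumptions, following the token scheme above):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.FacetField;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class ComboFacetExample {
    public static void main(String[] args) throws Exception {
      SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

      SolrQuery q = new SolrQuery("category:shoes");
      q.setFacet(true);
      q.addFacetField("color_size");        // field holding tokens like red_10, red_11
      q.set("facet.prefix", "red_");        // only the combos for the already-chosen color

      QueryResponse rsp = server.query(q);
      FacetField combos = rsp.getFacetField("color_size");
      if (combos != null && combos.getValues() != null) {
        for (FacetField.Count c : combos.getValues()) {
          // client-side parsing: "red_10" -> size "10"
          String size = c.getName().substring("red_".length());
          System.out.println("size " + size + " available (" + c.getCount() + ")");
        }
      }
    }
  }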


On 3/17/2011 11:21 AM, Otis Gospodnetic wrote:

Hi,



- Original Message 

From: Yonik Seeleyyo...@lucidimagination.com
Subject: Re: Parent-child options

On Thu, Mar 17, 2011 at 1:49 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com   wrote:

The dreaded parent-child without denormalization question.  What  are one's
options for the following example:

parent:  shoes
3 children. each with 2 attributes/fields: color and size
   * color: red black orange
  * size: 10 11 12

The goal is  to be able to search for:
1) color:red AND size:10 and get 1 hit for the  above
2) color:red AND size:12 and get *no* matches because there are no  red shoes

of

size 12, only size 10.

What if you had this  instead:

   color: red red orange
   size: 10 11 12

Do  you need for color:red to return 1 or 2 (i.e. is the final answer
in units of  child hits or parent hits)?

The final answer is the parent, which is shoes in this example.
So:
if the query is color:red AND size:10 the answer is: Yes, we got red shoes size
10
if the query is color:red AND size:11 the answer is: Yes, we got red shoes size
11
if the query is color:red AND size:12 the answer is: No, we don't have red shoes
size 12

Thanks,
Otis



memory not getting released in tomcat after pushing large documents

2011-03-17 Thread Geeta Subramanian
Hi,

I am very new to SOLR and facing a lot of issues when using SOLR to push large 
documents.
I have Solr running in Tomcat. I have allocated about 4 GB of memory (-Xmx), but 
when I push about twenty-five 100 MB documents it runs out of heap space and fails.

Also I tried pushing just 1 document. It went through successfully, but the Tomcat 
memory does not come down. It consumes about a gig of memory for just one 100 MB 
document and does not release it.

Please let me know if I am making any mistake in configuration or set up.

Here is the stack trace:
SEVERE: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at 
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
at java.lang.StringBuffer.append(StringBuffer.java:306)
at java.io.StringWriter.write(StringWriter.java:77)
at 
com.sun.org.apache.xml.internal.serializer.ToStream.processDirty(ToStream.java:1570)
at 
com.sun.org.apache.xml.internal.serializer.ToStream.characters(ToStream.java:1488)
at 
com.sun.org.apache.xml.internal.serializer.ToHTMLStream.characters(ToHTMLStream.java:1529)
at 
com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.characters(TransformerHandlerImpl.java:168)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
at 
org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
at 
org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
at 
org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
at 
org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
at 
org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175)
at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
at 
com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load(CVExtractingDocumentLoader.java:349)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)


Thanks for help,
Geeta



Info about Debugging SOLR in Eclipse

2011-03-17 Thread Geeta Subramanian
Hi,

Can someone please let me know the steps for debugging the Solr code in my 
Eclipse?

I tried to compile the source, use the resulting jars in the Tomcat where I am 
running Solr, and do remote debugging, but it did not stop at any breakpoint.
I also tried to write a sample standalone Java class to push a document, but I 
only stepped into the SolrJ classes, not the Solr server classes.


Please let me know if I am making any mistake.

Regards,
Geeta



Re: memory not getting released in tomcat after pushing large documents

2011-03-17 Thread Markus Jelsma
Hi,

25*100MB=2.5GB will most likely fail with just 4GB of heap space. But 
consecutive single `pushes` as you call it, of 25MB documents should work 
fine. Heap memory will only drop after the garbage collector comes along.

Cheers,

On Thursday 17 March 2011 17:12:46 Geeta Subramanian wrote:
 Hi,
 
 I am very new to SOLR and facing a lot of issues when using SOLR to push
 large documents. I have solr running in tomcat. I have allocated about 4gb
 memory (-Xmx) but I am pushing about twenty five 100 mb documents and
 gives heap space and fails.
 
 Also I tried pushing just 1 document. It went thru successfully, but the
 tomcat memory does not come down. It consumes about a gig memory for just
 one 100 mb document and does not release it.
 
 Please let me know if I am making any mistake in configuration/ or set up.
 
 Here is the stack trace:
 SEVERE: java.lang.OutOfMemoryError: Java heap space
   at java.util.Arrays.copyOf(Arrays.java:2882)
   at
 java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:
 100) at
 java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515) at
 java.lang.StringBuffer.append(StringBuffer.java:306)
   at java.io.StringWriter.write(StringWriter.java:77)
   at
 com.sun.org.apache.xml.internal.serializer.ToStream.processDirty(ToStream.
 java:1570) at
 com.sun.org.apache.xml.internal.serializer.ToStream.characters(ToStream.ja
 va:1488) at
 com.sun.org.apache.xml.internal.serializer.ToHTMLStream.characters(ToHTMLS
 tream.java:1529) at
 com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.charac
 ters(TransformerHandlerImpl.java:168) at
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecor
 ator.java:124) at
 org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.j
 ava:153) at
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecor
 ator.java:124) at
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecor
 ator.java:124) at
 org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:
 39) at
 org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
 at
 org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
 at
 org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:
 151) at
 org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.jav
 a:175) at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144) at
 org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142) at
 org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99) at
 com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load(CVEx
 tractingDocumentLoader.java:349) at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(Content
 StreamHandlerBase.java:54) at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBas
 e.java:131) at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleReque
 st(RequestHandlers.java:237) at
 org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
   at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java
 :337) at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.jav
 a:240) at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Applicati
 onFilterChain.java:235) at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilter
 Chain.java:206) at
 filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.jav
 a:122)
 
 
 Thanks for help,
 Geeta
 
 
 
 
 
 
 
 
 
 
 
 
 
 **Legal Disclaimer***
 This communication may contain confidential and privileged material
 for the sole use of the intended recipient.  Any unauthorized review,
 use or distribution by others is strictly prohibited.  If you have
 received the message in error, please advise the sender by reply
 email and delete the message. Thank you.
 

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Info about Debugging SOLR in Eclipse

2011-03-17 Thread Markus Jelsma

http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-
eclipse



On Thursday 17 March 2011 17:17:30 Geeta Subramanian wrote:
 Hi,
 
 Can some please let me know the steps on how can I debug the solr code in
 my eclipse?
 
 I tried to compile the source, use the jars and place in tomcat where I am
 running solr. And do remote debugging, but it did not stop at any break
 point. I also tried to write a sample standalone java class to push the
 document. But I stopped at solr j classes and not solr server classes.
 
 
 Please let me know if I am making any mistake.
 
 Regards,
 Geeta
 
 
 
 
 
 
 
 
 
 
 
 
 
 **Legal Disclaimer***
 This communication may contain confidential and privileged material
 for the sole use of the intended recipient.  Any unauthorized review,
 use or distribution by others is strictly prohibited.  If you have
 received the message in error, please advise the sender by reply
 email and delete the message. Thank you.
 

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: memory not getting released in tomcat after pushing large documents

2011-03-17 Thread Yonik Seeley
On Thu, Mar 17, 2011 at 12:12 PM, Geeta Subramanian
gsubraman...@commvault.com wrote:
        at 
 com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load(CVExtractingDocumentLoader.java:349)

Looks like you're using a custom update handler.  Perhaps that's
accidentally hanging onto memory?

-Yonik
http://lucidimagination.com


RE: memory not getting released in tomcat after pushing large documents

2011-03-17 Thread Geeta Subramanian
Hi,

 Thanks for the reply.
I am sorry, the logs I posted from do have a custom update handler.

But I have a local setup which does not have a custom update handler (it is as 
downloaded from the Solr site), and even that gives me a heap space error.

at java.util.Arrays.copyOf(Unknown Source)  
at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)   
at java.lang.AbstractStringBuilder.append(Unknown Source)   
at java.lang.StringBuilder.append(Unknown Source)   
at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:257)  
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
 
at 
org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
   
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
 
at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
 
at 
org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)   
 
at 
org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)   
at 
org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)   
at 
org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)  
 
at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175)
 
at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)   
at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)   
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)  
at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112) 
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193)
  
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
 
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
   
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337) 
 



Also, in general, if I post 25 * 100 MB docs to Solr, what would the ideal heap 
size be?
Also, I see that when I push a single document of 100 MB, the task manager shows 
about 900 MB of memory in use, and subsequent pushes keep the memory at about 
900 MB, so at what point can an OOM crash happen?

When I ran the YourKit profiler, I saw that around 1 GB of memory was consumed 
just by char[] and String[]. 
How can I find out who is creating these (is it SOLR or TIKA) and free up these 
objects?


Thank you so much for your time and help,



Regards,
Geeta



-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: 17 March, 2011 12:21 PM
To: solr-user@lucene.apache.org
Cc: Geeta Subramanian
Subject: Re: memory not getting released in tomcat after pushing large documents

On Thu, Mar 17, 2011 at 12:12 PM, Geeta Subramanian 
gsubraman...@commvault.com wrote:
        at 
 com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load(
 CVExtractingDocumentLoader.java:349)

Looks like you're using a custom update handler.  Perhaps that's accidentally 
hanging onto memory?

-Yonik
http://lucidimagination.com



RE: Info about Debugging SOLR in Eclipse

2011-03-17 Thread Geeta Subramanian
Hi Markus,

Thanks, I had already followed the steps on that site.
But I am not able to debug the Solr classes, though I am able to run Solr.

I want to see the code flow on the server side, especially the point where 
Solr calls Tika and gets the content back from Tika.

Thanks for the time & help,
Regards,
Geeta

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: 17 March, 2011 12:22 PM
To: solr-user@lucene.apache.org
Cc: Geeta Subramanian
Subject: Re: Info about Debugging SOLR in Eclipse


http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-
eclipse



On Thursday 17 March 2011 17:17:30 Geeta Subramanian wrote:
 Hi,
 
 Can some please let me know the steps on how can I debug the solr code 
 in my eclipse?
 
 I tried to compile the source, use the jars and place in tomcat where 
 I am running solr. And do remote debugging, but it did not stop at any 
 break point. I also tried to write a sample standalone java class to 
 push the document. But I stopped at solr j classes and not solr server 
 classes.
 
 
 Please let me know if I am making any mistake.
 
 Regards,
 Geeta
 
 
 
 
 
 
 
 
 
 
 
 
 
 **Legal Disclaimer***
 This communication may contain confidential and privileged material 
 for the sole use of the intended recipient.  Any unauthorized review, 
 use or distribution by others is strictly prohibited.  If you have 
 received the message in error, please advise the sender by reply email 
 and delete the message. Thank you.
 

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Sorting on multiValued fields via function query

2011-03-17 Thread Chris Hostetter

: But if lucene now can sort a multi-valued field without crashing when there
: are 'too many' unique values, and with easily described and predictable
: semantics (use the minimal value in the multi-valued field as sort key) --
: then it probably makes more sense for Solr to let you do that if you really
: want to, give you enough rope to hang yourself.

(Clarification: it's the *maximal* value that gets used by lucene in 
that situation) 

I disagree.  

If we do what you describe we'd be relying on users to recognize when the 
sort logic is silently doing something tricky under the covers and to make 
a conscious decision as to whether that is what they want, and if not, to 
change their indexing to account for it.  

That seems like a recipe for confusion and unexpected behavior.

With SOLR-2339 in place, we tell users explicitly and up front that what they 
are attempting to do can not work as specified and we force them to 
decide in advance how they want to deal with it -- by indexing either the 
lowest value or the highest value (or both in distinct fields).

As the code stands now: we fail fast and let the person building the index 
make a decision.  If we silently sort on the maximal value, we leave a nasty 
headache for people who don't realize they are misusing a multiValued 
field and then wonder why some sorts don't do what they expect in some 
situations.

Bottom line: from day 1, we have always documented that sorting on 
multiValued fields (or fields that produce more than one term per 
document) didn't work.  If people didn't notice that documentation, they 
aren't likely to notice any documentation that says it will sort on the 
maximal value either -- SOLR-2339 may introduce a pain point for people 
upgrading, but it introduces it early and loudly, not quietly at some 
arbitrary moment in the future when they're beating their heads against a 
desk wondering why some sort isn't working the way they expect it to 
because they added some more values to a few documents.




-Hoss


Segments and Memory Correlate?

2011-03-17 Thread danomano
Hi folks, I ran into a problem today where I am no longer able to execute any
queries :( due to Out of Memory issues.

I am in the process of investigating the use of different mergeFactors, or
even different merge policies altogether.
My question: if I have many segments (i.e. smaller segments), will
that also reduce the total RAM required for searching?  (My system
is currently allocated 8GB of RAM and has a ~255GB index.)  (I'm not fully up
on the 'default merge policy', but I believe with a mergeFactor of 10, that
would mean each segment should be approaching about 25GB, with ~543 million
documents.)

Of note: this is all running on 1 server.

As seen below.

SEVERE: java.lang.OutOfMemoryError: Java heap space
at
org.apache.lucene.search.cache.LongValuesCreator.fillLongValues(LongValuesCreator.java:141)
at
org.apache.lucene.search.cache.LongValuesCreator.validate(LongValuesCreator.java:84)
at
org.apache.lucene.search.cache.LongValuesCreator.create(LongValuesCreator.java:74)
at
org.apache.lucene.search.cache.LongValuesCreator.create(LongValuesCreator.java:37)
at
org.apache.lucene.search.FieldCacheImpl$Cache.createValue(FieldCacheImpl.java:155)
at
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:188)
at
org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:337)
at
org.apache.lucene.search.FieldComparator$LongComparator.setNextReader(FieldComparator.java:504)
at
org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:97)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:207)
at org.apache.lucene.search.Searcher.search(Searcher.java:101)
at
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1389)
at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1285)
at
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:344)
at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:273)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:210)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1324)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
at
com.openmarket.servletfilters.LogToCSVFilter.doFilter(LogToCSVFilter.java:89)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
at
com.openmarket.servletfilters.GZipAutoDeflateFilter.doFilter(GZipAutoDeflateFilter.java:66)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
...etc

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Segments-and-Memory-Correlate-tp2694747p2694747.html
Sent from the Solr - User mailing list archive at Nabble.com.


OOM for large files

2011-03-17 Thread Geeta Subramanian
Hi,



I am getting an OOM after posting a 100 MB document to SOLR, with this trace:

Exception in thread main org.apache.solr.common.SolrException: Java heap 
space  java.lang.OutOfMemoryError: Java heap space

at java.util.Arrays.copyOf(Unknown Source)

at java.lang.AbstractStringBuilder.expandCapacity(Unknown 
Source)

at java.lang.AbstractStringBuilder.append(Unknown Source)

at java.lang.StringBuilder.append(Unknown Source)

        at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:257)

at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)

at 
org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)

at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)

at 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)

at 
org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)

at 
org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)

at 
org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)

at 
org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)

at 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175)

at 
org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)

at 
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)

at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)

at 
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)

at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193)

at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)

at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)

at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)

at org.apache.solr.se







I have given it 1024M of memory.

But this still fails, so can somebody tell me the minimum heap size required, 
relative to file size, so that the document gets indexed successfully?



Also, just a weird question:

In Tika's code, there is a place where a char[] is initialized to 4096. Then, when 
this is used in a StringWriter and the array is full, it does an expandCapacity (as 
highlighted in the logs), which involves an array copy operation. So with just 4 KB, 
if I want to process a 100 MB document, a lot of char arrays will be generated and 
we need to depend on GC to clean them up.



Does anyone know whether, if I change the Tika code to initialize the char array to 
more than ~4 KB, there will be any performance improvement?



Thanks for your time,

Regards,

Geeta


Helpful new JVM parameters

2011-03-17 Thread Dyer, James
We're on the final stretch in getting our product database in Production with 
Solr.  We have 13m wide-ish records with quite a few stored fields in a 
single index (no shards).  We sort on at least a dozen fields and facet on 
20-30.  One thing that came up in QA testing is we were getting full gc's due 
to promotion failed conditions.  This led us to believe we were dealing with 
large objects being created and a fragmented old generation.  After improving, 
but not solving, the problem by tweaking conventional jvm parameters, our JVM 
expert learned about some newer tuning params included in Sun/Oracle's JDK 
1.6.0_24 (we're running RHEL x64, but I think these are available on other 
platforms too):

These 3 options dramatically reduced the number of objects getting promoted into the 
Old Gen, reducing fragmentation and CMS frequency & time:
-XX:+UseStringCache
-XX:+OptimizeStringConcat
-XX:+UseCompressedStrings

This uses compressed pointers on a 64-bit JVM, significantly reducing the 
memory & performance penalty of using a 64-bit JVM over a 32-bit one.  This reduced 
our new GC (ParNew) time significantly:
-XX:+UseCompressedOops

The default for this was causing CMS to begin too late sometimes.  (The 
documented 68% proved false in our case; we figured it was defaulting close 
to 90%.)  Much lower than 75%, though, and CMS ran far too often:
-XX:CMSInitiatingOccupancyFraction=75

This made the stop-the-world pauses during CMS much shorter:
-XX:+CMSParallelRemarkEnabled

We use these in conjunction with CMS/ParNew and a 22gb heap (64gb total on the 
box), with a 1.2G newSize/maxNewSize.

In case anyone else is having similar issues, we thought we would share our 
experience with these newer options.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311



Re: Helpful new JVM parameters

2011-03-17 Thread Jonathan Rochkind
Awesome, very helpful. Do you maybe want to add this to the Solr wiki 
somewhere?  Finding some advice for JVM tuning for Solr can be 
challenging, and you've explained what you did and why very well.


On 3/17/2011 2:59 PM, Dyer, James wrote:

We're on the final stretch in getting our product database in Production with Solr.  We have 13m 
wide-ish records with quite a few stored fields in a single index (no shards).  We sort on at 
least a dozen fields and facet on 20-30.  One thing that came up in QA testing is we were getting full gc's 
due to promotion failed conditions.  This led us to believe we were dealing with large objects 
being created and a fragmented old generation.  After improving, but not solving, the problem by tweaking 
conventional jvm parameters, our JVM expert learned about some newer tuning params included in 
Sun/Oracle's JDK 1.6.0_24 (we're running RHEL x64, but I think these are available on other platforms too):

These 3 options dramatically reduced the # objects getting promoted into the Old 
Gen, reducing fragmentation and CMS frequency  time:
-XX:+UseStringCache
-XX:+OptimizeStringConcat
-XX:+UseCompressedStrings

This uses compressed pointers on a 64-bit JVM, significantly reducing the 
memory  performance penalty in using a 64-bit jvm over 32-bit.  This reduced 
our new GC (ParNew) time significantly:
-XX:+UseCompressedOops

The default for this was causing CMS to begin too late sometimes.  (the 
documentated 68% proved false in our case.  We figured it was defaulting close 
to 90%)  Much lower than 75%, though, and CMS ran far too often:
-XX:CMSInitiatingOccupancyFraction=75

This made the stop-the-world pauses during CMS much shorter:
-XX:+CMSParallelRemarkEnabled

We use these in conjunction with CMS/ParNew and a 22gb heap (64gb total on the 
box), with a 1.2G newSize/maxNewSize.

In case anyone else is having similar issues, we thought we would share our 
experience with these newer options.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311




Re: Sorting on multiValued fields via function query

2011-03-17 Thread Yonik Seeley
On Thu, Mar 17, 2011 at 2:12 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
 As the code stands now: we fail fast and let the person building hte index
 make a decision.

Indexing two fields when one could work is unfortunate though.
I think what we should support (eventually) is a max() function that will also
work on a multi-valued field and select the maximum value (i.e. it will
simply bypass the check for multi-valued fields).

Then one can utilize sort-by-function to do
sort=max(author) asc

-Yonik
http://lucidimagination.com


dismax 1.4.1 and pure negative queries

2011-03-17 Thread Jonathan Rochkind

Should 1.4.1 dismax query parser be able to handle pure negative queries
like:

q="-foo"
q="-foo -bar"

It kind of seems to me, trying it out, that it can NOT.  Can anyone else
verify?  The documentation I can find doesn't say one way or another.
Which is odd because the documentation for the straight solr-lucene query
parser at http://wiki.apache.org/solr/SolrQuerySyntax suggests that the
straight solr-lucene query parser _can_ handle pure negative.  That
seems odd that the solr-lucene Q.P. can, but dismax can't? Maybe I'm
misinterpreting or misunderstanding my experimental results.




Re: Info about Debugging SOLR in Eclipse

2011-03-17 Thread Peter Keegan
Can you use jetty?
http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse

On Thu, Mar 17, 2011 at 12:17 PM, Geeta Subramanian 
gsubraman...@commvault.com wrote:

 Hi,

 Can some please let me know the steps on how can I debug the solr code in
 my eclipse?

 I tried to compile the source, use the jars and place in tomcat where I am
 running solr. And do remote debugging, but it did not stop at any break
 point.
 I also tried to write a sample standalone java class to push the document.
 But I stopped at solr j classes and not solr server classes.


 Please let me know if I am making any mistake.

 Regards,
 Geeta













 **Legal Disclaimer***
 This communication may contain confidential and privileged material
 for the sole use of the intended recipient.  Any unauthorized review,
 use or distribution by others is strictly prohibited.  If you have
 received the message in error, please advise the sender by reply
 email and delete the message. Thank you.
 



Re: Info about Debugging SOLR in Eclipse

2011-03-17 Thread Peter Keegan
The instructions refer to the 'Run configuration' menu. Did you try 'Debug
configurations'?


On Thu, Mar 17, 2011 at 3:27 PM, Peter Keegan peterlkee...@gmail.comwrote:

 Can you use jetty?


 http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse

 On Thu, Mar 17, 2011 at 12:17 PM, Geeta Subramanian 
 gsubraman...@commvault.com wrote:

 Hi,

 Can some please let me know the steps on how can I debug the solr code in
 my eclipse?

 I tried to compile the source, use the jars and place in tomcat where I am
 running solr. And do remote debugging, but it did not stop at any break
 point.
 I also tried to write a sample standalone java class to push the document.
 But I stopped at solr j classes and not solr server classes.


 Please let me know if I am making any mistake.

 Regards,
 Geeta













 **Legal Disclaimer***
 This communication may contain confidential and privileged material
 for the sole use of the intended recipient.  Any unauthorized review,
 use or distribution by others is strictly prohibited.  If you have
 received the message in error, please advise the sender by reply
 email and delete the message. Thank you.
 





Adding the suggest component

2011-03-17 Thread Brian Lamb
Hi all,

When I installed Solr, I downloaded the most recent version (1.4.1) I
believe. I wanted to implement the Suggester (
http://wiki.apache.org/solr/Suggester). I copied and pasted the information
there into my solrconfig.xml file but I'm getting the following error:

Error loading class 'org.apache.solr.spelling.suggest.Suggester'

I read up on this error and found that I needed to check out a newer version
from SVN. I checked out a full copy and copied the contents of
src/java/org/apache/spelling/suggest to the same location in my setup.
However, I am still receiving this error.

Did I not put the files in the right place? What am I doing incorrectly?

Thanks,

Brian Lamb


Re: memory not getting released in tomcat after pushing large documents

2011-03-17 Thread Yonik Seeley
In your solrconfig.xml,
Are you specifying ramBufferSizeMB or maxBufferedDocs?

-Yonik
http://lucidimagination.com


On Thu, Mar 17, 2011 at 12:27 PM, Geeta Subramanian
gsubraman...@commvault.com wrote:
 Hi,

  Thanks for the reply.
 I am sorry, the logs from where I posted does have a Custom Update Handler.

 But I have a local setup, which does not have a custome update handler, its 
 as its downloaded from SOLR site, even that gives me heap space.

 at java.util.Arrays.copyOf(Unknown Source)
        at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
        at java.lang.AbstractStringBuilder.append(Unknown Source)
        at java.lang.StringBuilder.append(Unknown Source)
        at org.apache.solr.handler.extraction.Solrtik   
 ContentHandler.characters(SolrContentHandler.java:257)
        at 
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at 
 org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
        at 
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at 
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at 
 org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
        at 
 org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
        at 
 org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
        at 
 org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
        at 
 org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175)
        at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)
        at 
 org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)
        at 
 org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
        at 
 org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
        at 
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193)
        at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at 
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)



 Also, in general, if I post 25 * 100 mb docs to solr, how much should be the 
 ideal heap space set?
 Also, I see that when I push a single document of 100 mb, in task manager I 
 see that about 900 mb memory is been used up, and some subsequent push keeps 
 the memory about 900mb, so at what point there can be OOM crash?

 When I ran the YourKit Profiler, I saw that around 1 gig of memory was just 
 consumed by char[] , String [].
 How can I find out who is creating these(is it SOLR or TIKA) and free up 
 these objects?


 Thank you so much for your time and help,



 Regards,
 Geeta



 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
 Sent: 17 March, 2011 12:21 PM
 To: solr-user@lucene.apache.org
 Cc: Geeta Subramanian
 Subject: Re: memory not getting released in tomcat after pushing large 
 documents

 On Thu, Mar 17, 2011 at 12:12 PM, Geeta Subramanian 
 gsubraman...@commvault.com wrote:
        at
 com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load(
 CVExtractingDocumentLoader.java:349)

 Looks like you're using a custom update handler.  Perhaps that's accidentally 
 hanging onto memory?

 -Yonik
 http://lucidimagination.com













 **Legal Disclaimer***
 This communication may contain confidential and privileged material
 for the sole use of the intended recipient.  Any unauthorized review,
 use or distribution by others is strictly prohibited.  If you have
 received the message in error, please advise the sender by reply
 email and delete the message. Thank you.
 



RE: memory not getting released in tomcat after pushing large documents

2011-03-17 Thread Geeta Subramanian
Hi Yonik,

I am not setting the ramBufferSizeMB or maxBufferedDocs params...
Do I need to for indexing?

Regards,
Geeta

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: 17 March, 2011 3:45 PM
To: Geeta Subramanian
Cc: solr-user@lucene.apache.org
Subject: Re: memory not getting released in tomcat after pushing large documents

In your solrconfig.xml,
Are you specifying ramBufferSizeMB or maxBufferedDocs?

-Yonik
http://lucidimagination.com


On Thu, Mar 17, 2011 at 12:27 PM, Geeta Subramanian 
gsubraman...@commvault.com wrote:
 Hi,

  Thanks for the reply.
 I am sorry, the logs from where I posted does have a Custom Update Handler.

 But I have a local setup, which does not have a custome update handler, its 
 as its downloaded from SOLR site, even that gives me heap space.

 at java.util.Arrays.copyOf(Unknown Source)
        at java.lang.AbstractStringBuilder.expandCapacity(Unknown 
 Source)
        at java.lang.AbstractStringBuilder.append(Unknown Source)
        at java.lang.StringBuilder.append(Unknown Source)
        at org.apache.solr.handler.extraction.Solrtik   
 ContentHandler.characters(SolrContentHandler.java:257)
        at 
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerD
 ecorator.java:124)
        at 
 org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandl
 er.java:153)
        at 
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerD
 ecorator.java:124)
        at 
 org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerD
 ecorator.java:124)
        at 
 org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.j
 ava:39)
        at 
 org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java
 :61)
        at 
 org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:
 113)
        at 
 org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.j
 ava:151)
        at 
 org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler
 .java:175)
        at 
 org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)
        at 
 org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)
        at 
 org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99
 )
        at 
 org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:11
 2)
        at 
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(Extra
 ctingDocumentLoader.java:193)
        at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(Con
 tentStreamHandlerBase.java:54)
        at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandle
 rBase.java:131)
        at 
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleR
 equest(RequestHandlers.java:237)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.
 java:337)



 Also, in general, if I post 25 * 100 mb docs to solr, how much should be the 
 ideal heap space set?
 Also, I see that when I push a single document of 100 mb, in task manager I 
 see that about 900 mb memory is been used up, and some subsequent push keeps 
 the memory about 900mb, so at what point there can be OOM crash?

 When I ran the YourKit Profiler, I saw that around 1 gig of memory was just 
 consumed by char[] , String [].
 How can I find out who is creating these(is it SOLR or TIKA) and free up 
 these objects?


 Thank you so much for your time and help,



 Regards,
 Geeta



 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik 
 Seeley
 Sent: 17 March, 2011 12:21 PM
 To: solr-user@lucene.apache.org
 Cc: Geeta Subramanian
 Subject: Re: memory not getting released in tomcat after pushing large 
 documents

 On Thu, Mar 17, 2011 at 12:12 PM, Geeta Subramanian 
 gsubraman...@commvault.com wrote:
        at
 com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load
 (
 CVExtractingDocumentLoader.java:349)

 Looks like you're using a custom update handler.  Perhaps that's accidentally 
 hanging onto memory?

 -Yonik
 http://lucidimagination.com













 **Legal Disclaimer***
 This communication may contain confidential and privileged material 
 for the sole use of the intended recipient.  Any unauthorized review, 
 use or distribution by others is strictly prohibited.  If you have 
 received the message in error, please advise the sender by reply email 
 and delete the message. Thank you.
 













Re: dismax 1.4.1 and pure negative queries

2011-03-17 Thread Markus Jelsma
Hi,

It works just as expected, but not in a phrase query. Get rid of your quotes 
and you'll be fine.

Cheers,

 Should 1.4.1 dismax query parser be able to handle pure negative queries
 like:
 
 q=-foo
 q=-foo -bar
 
 It kind of seems to me trying it out that it can NOT.  Can anyone else
 verify?  The documentation I can find doesn't say one way or another.
 Which is odd because the documentation for straight solr-lucene query
 parser athttp://wiki.apache.org/solr/SolrQuerySyntax  suggests that
 straight solr-lucene query parser_can_  handle pure negative.  That
 seems odd that solr-lucene Q.P. can, but dismax can't? Maybe I'm
 misinterpreting or misunderstanding my experimental results.


Re: dismax 1.4.1 and pure negative queries

2011-03-17 Thread Jonathan Rochkind
My fault for putting in the quotes in the email, I actually don't have 
quotes in my tests, just tried again to make sure.


And I always get 0 results on a pure negative Solr 1.4.1 dismax query. I 
think it does not actually work?


On 3/17/2011 3:52 PM, Markus Jelsma wrote:

Hi,

It works just as expected, but not in a phrase query. Get rid of your quotes
and you'll be fine.

Cheers,


Should 1.4.1 dismax query parser be able to handle pure negative queries
like:

q=-foo
q=-foo -bar

It kind of seems to me trying it out that it can NOT.  Can anyone else
verify?  The documentation I can find doesn't say one way or another.
Which is odd because the documentation for straight solr-lucene query
parser athttp://wiki.apache.org/solr/SolrQuerySyntax  suggests that
straight solr-lucene query parser_can_  handle pure negative.  That
seems odd that solr-lucene Q.P. can, but dismax can't? Maybe I'm
misinterpreting or misunderstanding my experimental results.


Re: dismax 1.4.1 and pure negative queries

2011-03-17 Thread Markus Jelsma
Oh i see, i overlooked your first query. A query with one term that is negated 
will yield zero results, it doesn't return all documents because nothing 
matches. It's, if i remember correctly, the same as when you're looking for a 
field that doesn't have a value: q=-field:[* TO *].

 My fault for putting in the quotes in the email, I actually don't have
 tests in my quotes, just tried again to make sure.
 
 And I always get 0 results on a pure negative Solr 1.4.1 dismax query. I
 think it does not actually work?
 
 On 3/17/2011 3:52 PM, Markus Jelsma wrote:
  Hi,
  
  It works just as expected, but not in a phrase query. Get rid of your
  quotes and you'll be fine.
  
  Cheers,
  
  Should 1.4.1 dismax query parser be able to handle pure negative queries
  like:
  
  q=-foo
  q=-foo -bar
  
  It kind of seems to me trying it out that it can NOT.  Can anyone else
  verify?  The documentation I can find doesn't say one way or another.
  Which is odd because the documentation for straight solr-lucene query
  parser athttp://wiki.apache.org/solr/SolrQuerySyntax  suggests that
  straight solr-lucene query parser_can_  handle pure negative.  That
  seems odd that solr-lucene Q.P. can, but dismax can't? Maybe I'm
  misinterpreting or misunderstanding my experimental results.


Re: dismax 1.4.1 and pure negative queries

2011-03-17 Thread Erik Hatcher
purely negative queries work with Solr's default (lucene) query parser.  But 
don't with dismax.   Or so that seems with my experience testing this out just 
now, on trunk.

In chatting with Jonathan further off-list we discussed having the best of both 
worlds 

   q={!lucene}*:* AND NOT _query_:"{!dismax ...}<inverse of original negative query>"

But this of course requires detecting that a query is all negative.  edismax 
can handle purely negative, FWIW, -ipod = +(-DisjunctionMaxQuery((text:ipod)) 
+MatchAllDocsQuery(*:*))

Erik



On Mar 17, 2011, at 16:45 , Markus Jelsma wrote:

 Oh i see, i overlooked your first query. A query with one term that is 
 negated 
 will yield zero results, it doesn't return all documents because nothing 
 matches. It's, if i remember correctly, the same as when you're looking for a 
 field that doesn't have a value: q=-field:[* TO *].
 
 My fault for putting in the quotes in the email, I actually don't have
 tests in my quotes, just tried again to make sure.
 
 And I always get 0 results on a pure negative Solr 1.4.1 dismax query. I
 think it does not actually work?
 
 On 3/17/2011 3:52 PM, Markus Jelsma wrote:
 Hi,
 
 It works just as expected, but not in a phrase query. Get rid of your
 quotes and you'll be fine.
 
 Cheers,
 
 Should 1.4.1 dismax query parser be able to handle pure negative queries
 like:
 
 q=-foo
 q=-foo -bar
 
 It kind of seems to me trying it out that it can NOT.  Can anyone else
 verify?  The documentation I can find doesn't say one way or another.
 Which is odd because the documentation for straight solr-lucene query
 parser athttp://wiki.apache.org/solr/SolrQuerySyntax  suggests that
 straight solr-lucene query parser_can_  handle pure negative.  That
 seems odd that solr-lucene Q.P. can, but dismax can't? Maybe I'm
 misinterpreting or misunderstanding my experimental results.



Re: dismax 1.4.1 and pure negative queries

2011-03-17 Thread Jonathan Rochkind
Yeah, looks to me like two or more negated terms do the same thing, 
not just one.


q=-foo -bar -baz

Also always returns zero hits. For the same reason. I understand why 
(sort of), although at the same time there is a logical answer to this 
question -foo -bar -baz, and oddly, 1.4.1 _lucene_ query parser _can_ 
handle it.


Erik Hatcher in IRC gave me one transformation of this query that still 
uses dismax as a unit, but can get you a solution.  (I want to use 
dismax in this case for its convenient aggregation of multiple fields 
in qf, not so much for actual disjunction-maximum behavior).


defType=lucene
q=*:* AND NOT _query_:{!dismax} foo bar baz

I might be able to work with that in my situation.  But it also seems 
like something that dismax could take care of for you in such a 
situation. It looks from the documentation like the newer (not in 1.4.1) 
edismax does in at least some cases, where the pure negative query is 
inside grouping/subquery parens, it's not clear to me if it does it in 
general or not.


On 3/17/2011 4:45 PM, Markus Jelsma wrote:

Oh i see, i overlooked your first query. A query with one term that is negated
will yield zero results, it doesn't return all documents because nothing
matches. It's, if i remember correctly, the same as when you're looking for a
field that doesn't have a value: q=-field:[* TO *].


My fault for putting in the quotes in the email, I actually don't have
tests in my quotes, just tried again to make sure.

And I always get 0 results on a pure negative Solr 1.4.1 dismax query. I
think it does not actually work?

On 3/17/2011 3:52 PM, Markus Jelsma wrote:

Hi,

It works just as expected, but not in a phrase query. Get rid of your
quotes and you'll be fine.

Cheers,


Should 1.4.1 dismax query parser be able to handle pure negative queries
like:

q=-foo
q=-foo -bar

It kind of seems to me trying it out that it can NOT.  Can anyone else
verify?  The documentation I can find doesn't say one way or another.
Which is odd because the documentation for straight solr-lucene query
parser athttp://wiki.apache.org/solr/SolrQuerySyntax  suggests that
straight solr-lucene query parser_can_  handle pure negative.  That
seems odd that solr-lucene Q.P. can, but dismax can't? Maybe I'm
misinterpreting or misunderstanding my experimental results.


RE: Info about Debugging SOLR in Eclipse

2011-03-17 Thread Geeta Subramanian
Hi All,

Thanks for the help... I am now able to debug my solr. :-)

-Original Message-
From: pkeegan01...@gmail.com [mailto:pkeegan01...@gmail.com] On Behalf Of Peter 
Keegan
Sent: 17 March, 2011 3:33 PM
To: solr-user@lucene.apache.org
Subject: Re: Info about Debugging SOLR in Eclipse

The instructions refer to the 'Run configuration' menu. Did you try 'Debug 
configurations'?


On Thu, Mar 17, 2011 at 3:27 PM, Peter Keegan peterlkee...@gmail.comwrote:

 Can you use jetty?


 http://www.lucidimagination.com/developers/articles/setting-up-apache-
 solr-in-eclipse

 On Thu, Mar 17, 2011 at 12:17 PM, Geeta Subramanian  
 gsubraman...@commvault.com wrote:

 Hi,

 Can someone please let me know the steps for how I can debug the solr 
 code in my eclipse?

 I tried to compile the source, use the jars and place them in the tomcat 
 where I am running solr, and do remote debugging, but it did not stop at 
 any break point.
 I also tried to write a sample standalone java class to push the document,
 but it only stopped at SolrJ classes and not solr server classes.


 Please let me know if I am making any mistake.

 Regards,
 Geeta
















Re: dismax 1.4.1 and pure negative queries

2011-03-17 Thread Jonathan Rochkind

On 3/17/2011 5:02 PM, Jonathan Rochkind wrote:

defType=lucene
q=*:* AND NOT _query_:{!dismax} foo bar baz



Oops, forgot a part, for anyone reading this and wanting to use it as a 
solution.


You can transform:

$defType=dismax
q=-foo -bar -baz

To:

defType=lucene
q=*:* AND NOT _query_:{!dismax mm=1}foo bar baz

And have basically equivalent semantics to what you meant but which 
dismax won't do.  The mm=1 is important, left that out before.
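
For anyone wiring this up from a client, here is a rough SolrJ sketch of that same 
transformation. The core URL and the qf fields are placeholders, not taken from anyone's 
actual setup, and note that the nested dismax query needs to be quoted inside the q string:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class NegativeDismaxExample {
    public static void main(String[] args) throws Exception {
        // Placeholder core URL -- point this at your own Solr instance.
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Instead of defType=dismax with q=-foo -bar -baz (zero hits in 1.4.1),
        // match everything and subtract the positive dismax query, with mm=1.
        SolrQuery query = new SolrQuery();
        query.setParam("defType", "lucene");
        query.setQuery("*:* AND NOT _query_:\"{!dismax qf='name content' mm=1}foo bar baz\"");

        QueryResponse rsp = server.query(query);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}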


Jonathan



I might be able to work with that in my situation.  But it also seems
like something that dismax could take care of for you in such a
situation. It looks from the documentation like the newer (not in 1.4.1)
edismax does in at least some cases, where the pure negative query is
inside grouping/subquery parens, it's not clear to me if it does it in
general or not.

On 3/17/2011 4:45 PM, Markus Jelsma wrote:

Oh i see, i overlooked your first query. A query with one term that is negated
will yield zero results, it doesn't return all documents because nothing
matches. It's, if i remember correctly, the same as when you're looking for a
field that doesn't have a value: q=-field:[* TO *].


My fault for putting in the quotes in the email, I actually don't have
tests in my quotes, just tried again to make sure.

And I always get 0 results on a pure negative Solr 1.4.1 dismax query. I
think it does not actually work?

On 3/17/2011 3:52 PM, Markus Jelsma wrote:

Hi,

It works just as expected, but not in a phrase query. Get rid of your
quotes and you'll be fine.

Cheers,


Should 1.4.1 dismax query parser be able to handle pure negative queries
like:

q=-foo
q=-foo -bar

It kind of seems to me trying it out that it can NOT.  Can anyone else
verify?  The documentation I can find doesn't say one way or another.
Which is odd because the documentation for straight solr-lucene query
parser athttp://wiki.apache.org/solr/SolrQuerySyntax  suggests that
straight solr-lucene query parser_can_  handle pure negative.  That
seems odd that solr-lucene Q.P. can, but dismax can't? Maybe I'm
misinterpreting or misunderstanding my experimental results.


DIH Issue(newbie to solr)

2011-03-17 Thread neha
I am a newbie to solr. I have an issue with DIH but am unable to pinpoint what is
causing it. I am using the demo jetty installation of Solr and tried
to create a project with new schema.xml, solrconfig.xml and data-config.xml
files. When I run
http://131.187.88.221:8983/solr/dataimport?command=full-import; this is
what I get:
I am unable to index documents(it doesn't throw any error though).

##

[DIH status response; the XML markup was stripped by the mail archive. The
recoverable values were: config file test-data-config.xml, command full-import,
status idle, counters 0 / 1 / 0, timestamps 2011-03-17 17:07:18, the status
message "Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.",
time taken 0:0:0.119, and the note "This response format is experimental.  It is
likely to change in the future."]


#

I do not find any log files(except on the console). And here are the
messages from the console:

###
INFO: Starting Full Import
Mar 17, 2011 5:08:20 PM org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties
INFO: Read dataimport.properties
Mar 17, 2011 5:08:20 PM org.apache.solr.update.DirectUpdateHandler2
deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Mar 17, 2011 5:08:20 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
   
commit{dir=/local/home/neha/ruby/Solr/apache-solr-1.4.1/example/solr/data/index,segFN=segments_k,version=1300286691490,generation=20,filenames=[segments_k]
Mar 17, 2011 5:08:20 PM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 1300286691490
Mar 17, 2011 5:08:20 PM org.apache.solr.handler.dataimport.DocBuilder finish
INFO: Import completed successfully
Mar 17, 2011 5:08:20 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit(optimize=true,waitFlush=false,waitSearcher=true,expungeDeletes=false)
Mar 17, 2011 5:08:20 PM org.apache.solr.core.SolrDeletionPolicy onCommit
INFO: SolrDeletionPolicy.onCommit: commits:num=2
   
commit{dir=/local/home/neha/ruby/Solr/apache-solr-1.4.1/example/solr/data/index,segFN=segments_k,version=1300286691490,generation=20,filenames=[segments_k]
   
commit{dir=/local/home/neha/ruby/Solr/apache-solr-1.4.1/example/solr/data/index,segFN=segments_l,version=1300286691491,generation=21,filenames=[segments_l]
Mar 17, 2011 5:08:20 PM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 1300286691491
Mar 17, 2011 5:08:20 PM org.apache.solr.search.SolrIndexSearcher 
INFO: Opening Searcher@d1329 main
Mar 17, 2011 5:08:20 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@d1329 main from Searcher@1dcc2a3 main
   
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=8,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0,item_subject_topic_facet={field=subject_topic_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_subject_geo_facet={field=subject_geo_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_subject_era_facet={field=subject_era_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_pub_date={field=pub_date,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_language_facet={field=language_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_lc_b4cutter_facet={field=lc_b4cutter_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_lc_alpha_facet={field=lc_alpha_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2},item_lc_1letter_facet={field=lc_1letter_facet,memSize=4224,tindexSize=32,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=2}}
Mar 17, 2011 5:08:20 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for Searcher@d1329 main
   
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Mar 17, 2011 5:08:20 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@d1329 main from Searcher@1dcc2a3 main
   
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=2,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Mar 17, 2011 5:08:20 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for Searcher@d1329 main
   
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=2,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Mar 17, 2011 5:08:20 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming Searcher@d1329 main from Searcher@1dcc2a3 main
   

Retrieving Ranking (Position)

2011-03-17 Thread Jae Joo
Hi,

I am looking for a way to retrieve the ranking (or position) of a matched
document in the result set.

I can get the data and then parse it to find the position of the matched
document, but I am wondering whether there is a built-in feature for this.

Thanks,

Jae
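
I'm not aware of a built-in parameter for this, so as a client-side fallback here is a 
rough SolrJ sketch that walks the result set of a query and reports the 1-based position 
of a given document. The core URL, the unique key field name ("id"), the sample query and 
the page size are assumptions, not taken from Jae's setup:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class PositionFinder {
    /** Returns the 1-based rank of docId in the result set of q, or -1 if it never matches. */
    static long findPosition(SolrServer server, String q, String docId) throws Exception {
        int rows = 100;                                   // page size, pick what suits you
        for (int start = 0; ; start += rows) {
            SolrQuery query = new SolrQuery(q).setStart(start).setRows(rows);
            query.setFields("id");                        // only the unique key is needed
            SolrDocumentList page = server.query(query).getResults();
            for (int i = 0; i < page.size(); i++) {
                SolrDocument doc = page.get(i);
                if (docId.equals(String.valueOf(doc.getFieldValue("id")))) {
                    return start + i + 1;                 // 1-based position in the ranking
                }
            }
            if (start + rows >= page.getNumFound()) {
                return -1;                                // walked the whole result set
            }
        }
    }

    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        System.out.println(findPosition(server, "title:lucene", "DOC-42"));
    }
}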


Re: memory not getting released in tomcat after pushing large documents

2011-03-17 Thread Yonik Seeley
On Thu, Mar 17, 2011 at 3:55 PM, Geeta Subramanian
gsubraman...@commvault.com wrote:
 Hi Yonik,

 I am not setting the ramBufferSizeMB or maxBufferedDocs params...
 DO I need to for Indexing?

No, the default settings that come with Solr should be fine.
You should verify that they have not been changed however.

An older solrconfig that used maxBufferedDocs could cause an OOM with
large documents since it buffered a certain amount of documents
instead a certain amount of RAM.

Perhaps post your solrconfig (or at least the sections related to
index configuration).

-Yonik
http://lucidimagination.com


 Regards,
 Geeta

 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
 Sent: 17 March, 2011 3:45 PM
 To: Geeta Subramanian
 Cc: solr-user@lucene.apache.org
 Subject: Re: memory not getting released in tomcat after pushing large 
 documents

 In your solrconfig.xml,
 Are you specifying ramBufferSizeMB or maxBufferedDocs?

 -Yonik
 http://lucidimagination.com


 On Thu, Mar 17, 2011 at 12:27 PM, Geeta Subramanian 
 gsubraman...@commvault.com wrote:
 Hi,

  Thanks for the reply.
 I am sorry, the logs I posted from do have a Custom Update Handler.

 But I have a local setup which does not have a custom update handler (it is 
 as downloaded from the SOLR site), and even that gives me a heap space error.

        at java.util.Arrays.copyOf(Unknown Source)
        at java.lang.AbstractStringBuilder.expandCapacity(Unknown Source)
        at java.lang.AbstractStringBuilder.append(Unknown Source)
        at java.lang.StringBuilder.append(Unknown Source)
        at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:257)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:153)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:124)
        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:39)
        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:61)
        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:113)
        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:151)
        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:175)
        at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:144)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:142)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:193)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:237)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)



 Also, in general, if I post 25 * 100 mb docs to solr, how much should be the 
 ideal heap space set?
 Also, I see that when I push a single document of 100 mb, in task manager I 
 see that about 900 mb memory is been used up, and some subsequent push keeps 
 the memory about 900mb, so at what point there can be OOM crash?

 When I ran the YourKit Profiler, I saw that around 1 gig of memory was just 
 consumed by char[] , String [].
 How can I find out who is creating these(is it SOLR or TIKA) and free up 
 these objects?


 Thank you so much for your time and help,



 Regards,
 Geeta



 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
 Seeley
 Sent: 17 March, 2011 12:21 PM
 To: solr-user@lucene.apache.org
 Cc: Geeta Subramanian
 Subject: Re: memory not getting released in tomcat after pushing large
 documents

 On Thu, Mar 17, 2011 at 12:12 PM, Geeta Subramanian 
 gsubraman...@commvault.com wrote:
        at com.commvault.solr.handler.extraction.CVExtractingDocumentLoader.load(CVExtractingDocumentLoader.java:349)

 Looks like you're using a custom update handler.  Perhaps that's 
 accidentally hanging onto memory?

 -Yonik
 http://lucidimagination.com














Re: Rename fields in a query

2011-03-17 Thread Ahmet Arslan
 Given a Query object (name:firefox
 name:opera), is it possible 'rename'
 the fields names to, for example, (content:firefox
 content:opera)?

By saying object, do you mean solrJ?

Anyway, if that helps, with the df parameter you can change fields. 

q=firefox opera&df=name will be parsed into name:firefox name:opera 
q=firefox opera&df=content will be parsed into content:firefox content:opera
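
If a client-side illustration helps, a minimal SolrJ sketch of the same idea (the core URL 
and the field name are placeholders only):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DefaultFieldExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("firefox opera");
        query.setParam("df", "content");   // same terms, now parsed against the content field
        System.out.println(server.query(query).getResults().getNumFound());
    }
}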


  


Re: memory not getting released in tomcat after pushing large documents

2011-03-17 Thread Yonik Seeley
On Thu, Mar 17, 2011 at 5:50 PM, Geeta Subramanian
gsubraman...@commvault.com wrote:
 Here is the attached xml.
 In our xml, maxBufferedDocs is commented. I hope that's not causing any issue.
 The ramBufferSizeMB is 32MB, will changing this be of any use to me?

Nope... your index settings are fine.
Perhaps something in extracting request handler or tika is holding onto memory.
Has anyone else experienced/reproduced this?

Geeta, can you open a JIRA issue?  If you're actually giving the JVM
4G of heap (is this a 64 bit JVM?), this looks like a bug somewhere.

-Yonik
http://lucidimagination.com


Spacial Search Field Type

2011-03-17 Thread Tanner Postert
I am using Solr 1.4.1 (Solr Implementation Version: 1.4.1 955763M - mark -
2010-06-17 18:06:42) to be exact.

I'm trying to implement the GeoSpatial field type by adding to the schema:

 <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_latLon"/>
<dynamicField name="*_latlon" type="location" index="true" stored="true" />

<field name="geo" type="location" index="true" stored="true" multiValued="false" />


but I get the following errors:


org.apache.solr.common.SolrException: Unknown fieldtype 'location'
specified on field geo

and


org.apache.solr.common.SolrException: Error loading class 'solr.LatLonType'


I thought I read that you had to have Solr 4.0 for the LatLon field
type, but isn't 1.4 = 4.0? Do I need some type of patch or different
version of Solr to use that field type?


Re: Spacial Search Field Type

2011-03-17 Thread Ahmet Arslan
 I thought I read that you had to have Solr 4.0 for the
 LatLon field
 type, but isn't 1.4 = 4.0? Do I need some type of patch or
 different
 version of Solr to use that field type?

No, 1.4 and 4.0 are different. You can checkout trunk

http://wiki.apache.org/solr/HowToContribute#Getting_the_source_code 


  


Re: Rename fields in a query

2011-03-17 Thread Fabiano Nunes
hi, Arslan!
By object, I meant an instance of [org.apache.lucene.search.Query].
For performance purposes, I want to rewrite a fuzzy query in one field and
then query in another.

Thank you!
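
For illustration only, here is a rough Lucene sketch of one way to do such a rewrite for 
term and boolean queries; fuzzy, phrase and other query types would need analogous 
branches, and this is written against the Lucene API that Solr 1.4 ships with, so treat 
the class and method names as assumptions to verify against your version:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class FieldRenamer {
    /** Rebuilds a query, mapping TermQuery fields from one name to another. */
    static Query rename(Query q, String from, String to) {
        if (q instanceof TermQuery) {
            Term t = ((TermQuery) q).getTerm();
            if (t.field().equals(from)) {
                TermQuery renamed = new TermQuery(new Term(to, t.text()));
                renamed.setBoost(q.getBoost());
                return renamed;
            }
            return q;
        }
        if (q instanceof BooleanQuery) {
            BooleanQuery rewritten = new BooleanQuery();
            for (BooleanClause clause : ((BooleanQuery) q).clauses()) {
                rewritten.add(rename(clause.getQuery(), from, to), clause.getOccur());
            }
            rewritten.setBoost(q.getBoost());
            return rewritten;
        }
        return q; // other query types (fuzzy, phrase, ...) would need their own branches
    }
}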


On Thu, Mar 17, 2011 at 18:43, Ahmet Arslan iori...@yahoo.com wrote:

  Given a Query object (name:firefox
  name:opera), is it possible 'rename'
  the fields names to, for example, (content:firefox
  content:opera)?

 By saying object you mean solrJ?

 Anyway, it that helps, with df parameter you can change fields.

 q=firefox operadf=name  will be parsed into name:firefox name:opera
 q=firefox operadf=content will be parsed into content:firefox
 content:opera






Re: Smart Pagination queries

2011-03-17 Thread Chris Hostetter


: In order to paint Next links app would have to know total number of
: records that user is eligible for read. getNumFound() will tell me that
: there are total 4K records that Solr returned. If there wasn't any
: entitlement rules then it could have been easier to determine how many
: Next links to paint and when user clicks on Next pass in start
: position appropriately in solr query. Since I have to apply post filter as
: and when results are fetched from Solr is there a better way to achieve

In an ideal world, you would do this using a custom plugin -- either a 
SearchComponent or a QParser used in a filter query.

if you really have to do this client side, then a few basic rules come to 
mind...

1) always over-request.  if you estimate that your user can only view 1/X 
docs in your total collection, and you want to show Y results per page, 
then your rows param should be at least 2*X*Y (i picked 2 just for good 
measure, just because you know the average doesn't mean you know the real 
distribution)

2) however many rows you get back, you need to keep track of the real 
start param you used, and at what point in the current page you had enough docs 
to show the user -- that will determine your next start param.

3) whether you have a next link or not depends on:
3a) whether you had any left over the first time you over-requested (see 
#2 above)
3b) whether numFound was greater than the index of the last item you got.
...if 3a and 3b are both false, you definitely don't need a next link. 
 if either of them is true then you probably *should* give them a next 
link, but you still need to be prepared for the possibility that you won't 
have any more docs (they might only be half way through the result set, 
but every remaining doc might be something they aren't allowed to see)

there's really no clean way to avoid the possibility completely, unless 
you really crank up how aggressively you over-request -- ultimately if you 
over-request *all* matches, then you can know definitively whether to give 
them a next link at any point.
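
Below is a rough Java sketch of rules 1-3, with made-up SearchClient, Entitlements and 
SearchResults interfaces standing in for whatever search client and entitlement service 
you actually have; none of the names come from a real API:

import java.util.ArrayList;
import java.util.List;

public class EntitledPage {
    final List<String> idsToShow = new ArrayList<String>();
    int nextStart = -1;           // start param for the "next" link, -1 if there is none
    boolean maybeMore = false;    // whether a "next" link should be offered at all

    /** One page of entitled results; pageSize is Y, overRequestFactor roughly 2*X from rule #1. */
    static EntitledPage fetch(SearchClient solr, Entitlements user,
                              String q, int start, int pageSize, int overRequestFactor) {
        EntitledPage page = new EntitledPage();
        int rows = pageSize * overRequestFactor;              // rule #1: always over-request
        SearchResults results = solr.search(q, start, rows);

        int i = 0;
        for (; i < results.size() && page.idsToShow.size() < pageSize; i++) {
            if (user.canRead(results.id(i))) {                // client-side post filter
                page.idsToShow.add(results.id(i));
            }
        }
        page.nextStart = start + i;                           // rule #2: where we really stopped
        boolean leftover = i < results.size();                              // rule #3a
        boolean moreInIndex = results.numFound() > start + results.size();  // rule #3b
        page.maybeMore = leftover || moreInIndex;
        if (!page.maybeMore) page.nextStart = -1;
        return page;
    }

    /** Hypothetical minimal interfaces, only here to keep the sketch self-contained. */
    interface SearchClient { SearchResults search(String q, int start, int rows); }
    interface Entitlements { boolean canRead(String docId); }
    interface SearchResults { int size(); String id(int i); long numFound(); }
}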

-Hoss


Re: Custom search filters

2011-03-17 Thread Chris Hostetter

: Hi all, I am trying to use a custom search filter
: (org.apache.lucene.search.Filter) but I am unsure of where I should configure
: this.
: 
: Would I have to create my own SearchHandler that would wrap this logic in? Any
: example/suggestions out there?

the easiest way to plug in a custom Filter is to wrap it in a 
ConstantScoreQuery and use it as part of the filters that 
SolrIndexSearcher applies (that way it will be cached independently and 
can be reused)

you could do this in a SearchComponent where you decide when to 
generate the Filter based on query params and then add it explicitly (see 
ResponseBuilder.getFilters()).

or you could do it in a QParserPlugin, in which case clients 
could optionally enable it by referring to your QParser by name in the 
local params of an fq param.
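
A rough sketch of the SearchComponent route, against the 1.4 APIs as I understand them; 
the class name and the filter-building method are made up, and you would still register 
the component in solrconfig.xml alongside the standard ones:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.Query;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class MyFilterComponent extends SearchComponent {

    // Hypothetical factory for your own org.apache.lucene.search.Filter.
    private Filter buildCustomFilter(ResponseBuilder rb) {
        return null; // replace with your real Filter, built from rb.req.getParams()
    }

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        Filter custom = buildCustomFilter(rb);
        if (custom == null) return;

        // Wrap the Filter so SolrIndexSearcher can treat (and cache) it like any other fq.
        Query wrapped = new ConstantScoreQuery(custom);

        List<Query> filters = rb.getFilters();
        if (filters == null) {
            filters = new ArrayList<Query>();
            rb.setFilters(filters);
        }
        filters.add(wrapped);
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // nothing to do at process time for this sketch
    }

    @Override
    public String getDescription() { return "adds a custom Filter as an extra fq"; }

    @Override
    public String getSourceId() { return ""; }

    @Override
    public String getSource() { return ""; }

    @Override
    public String getVersion() { return ""; }
}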



-Hoss


Re: Helpful new JVM parameters

2011-03-17 Thread Li Li
will UseCompressedOops be useful? for applications using less than 4GB of
memory, it will be better than a 64-bit reference. But for applications
using larger heaps, it will not be cache friendly.
The JRockit definitive guide says: "Naturally, 64 GB isn't a
theoretical limit but just an example. It was mentioned because
compressed references on 64-GB heaps have proven beneficial compared
to full 64-bit pointers in some benchmarks and applications. What
really matters is how many bits can be spared and the performance
benefit of this approach. In some cases, it might just be easier to
use full length 64-bit pointers."

2011/3/18 Dyer, James james.d...@ingrambook.com:
 We're on the final stretch in getting our product database in Production with 
 Solr.  We have 13m wide-ish records with quite a few stored fields in a 
 single index (no shards).  We sort on at least a dozen fields and facet on 
 20-30.  One thing that came up in QA testing is we were getting full gc's due 
 to promotion failed conditions.  This led us to believe we were dealing 
 with large objects being created and a fragmented old generation.  After 
 improving, but not solving, the problem by tweaking conventional jvm 
 parameters, our JVM expert learned about some newer tuning params included in 
 Sun/Oracle's JDK 1.6.0_24 (we're running RHEL x64, but I think these are 
 available on other platforms too):

 These 3 options dramatically reduced the # objects getting promoted into the 
 Old Gen, reducing fragmentation and CMS frequency & time:
 -XX:+UseStringCache
 -XX:+OptimizeStringConcat
 -XX:+UseCompressedStrings

 This uses compressed pointers on a 64-bit JVM, significantly reducing the 
 memory & performance penalty in using a 64-bit jvm over 32-bit.  This reduced 
 our new GC (ParNew) time significantly:
 -XX:+UseCompressedOops

 The default for this was causing CMS to begin too late sometimes.  (the 
 documented 68% proved false in our case.  We figured it was defaulting 
 close to 90%)  Much lower than 75%, though, and CMS ran far too often:
 -XX:CMSInitiatingOccupancyFraction=75

 This made the stop-the-world pauses during CMS much shorter:
 -XX:+CMSParallelRemarkEnabled

 We use these in conjunction with CMS/ParNew and a 22gb heap (64gb total on 
 the box), with a 1.2G newSize/maxNewSize.

 In case anyone else is having similar issues, we thought we would share our 
 experience with these newer options.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311




RE: Helpful new JVM parameters

2011-03-17 Thread Dyer, James
Our tests showed, in our situation, the compressed oops flag caused our minor 
(ParNew) generation time to decrease significantly.   We're using a larger heap 
(22gb) and our index size is somewhere in the 40's gb total.  I guess with any 
of these jvm parameters, it all depends on your situation and you need to test. 
 In our case, this flag solved a real problem we were having.  Whoever wrote 
the JRockit book you refer to no doubt had other scenarios in mind...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Li Li [mailto:fancye...@gmail.com] 
Sent: Thursday, March 17, 2011 10:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Helpful new JVM parameters

will UseCompressedOops be useful? for application using less than 4GB
memory, it will be better that 64bit reference. But for larger memory
using application, it will not be cache friendly.
JRocket the definite guide says: Naturally, 64 GB isn't a
theoretical limit but just an example. It was mentioned because
compressed references on 64-GB heaps have proven beneficial compared
to full 64-bit pointers in some benchmarks and applications. What
really matters, is how many bits can be spared and the performance
benefit of this approach. In some cases, it might just be easier to
use full length 64-bit pointers.

2011/3/18 Dyer, James james.d...@ingrambook.com:
 We're on the final stretch in getting our product database in Production with 
 Solr.  We have 13m wide-ish records with quite a few stored fields in a 
 single index (no shards).  We sort on at least a dozen fields and facet on 
 20-30.  One thing that came up in QA testing is we were getting full gc's due 
 to promotion failed conditions.  This led us to believe we were dealing 
 with large objects being created and a fragmented old generation.  After 
 improving, but not solving, the problem by tweaking conventional jvm 
 parameters, our JVM expert learned about some newer tuning params included in 
 Sun/Oracle's JDK 1.6.0_24 (we're running RHEL x64, but I think these are 
 available on other platforms too):

 These 3 options dramatically reduced the # objects getting promoted into the 
 Old Gen, reducing fragmentation and CMS frequency  time:
 -XX:+UseStringCache
 -XX:+OptimizeStringConcat
 -XX:+UseCompressedStrings

 This uses compressed pointers on a 64-bit JVM, significantly reducing the 
 memory  performance penalty in using a 64-bit jvm over 32-bit.  This reduced 
 our new GC (ParNew) time significantly:
 -XX:+UseCompressedOops

 The default for this was causing CMS to begin too late sometimes.  (the 
 documentated 68% proved false in our case.  We figured it was defaulting 
 close to 90%)  Much lower than 75%, though, and CMS ran far too often:
 -XX:CMSInitiatingOccupancyFraction=75

 This made the stop-the-world pauses during CMS much shorter:
 -XX:+CMSParallelRemarkEnabled

 We use these in conjunction with CMS/ParNew and a 22gb heap (64gb total on 
 the box), with a 1.2G newSize/maxNewSize.

 In case anyone else is having similar issues, we thought we would share our 
 experience with these newer options.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311